Issue metadata
Sign in to add a comment
|
Access tokens don’t refresh |
||||||||||||||||||||||||
Issue descriptionChrome Version: 70.0.3534.0 (Official Build) canary (64-bit) OS: Mac, 10.13.6 (17G65) I upgraded Canary on Mon Aug 27 11:27:27. Since then, several of my access tokens on chrome://signin-internals have not refreshed, breaking Sync (see attached screenshot). However, some of my tokens *have* refreshed and continue to do so. Some details from signin-internals in text form: Basic Information Chrome Version 70.0.3534.0 canary Account Consistency DICE Signin Status Signed In Last Signin Details Gaia Authentication Result RefreshToken Received Successful (106082914502521534264) 8/28/18, 10:31:45 AM SigninManager Started SigninManager Completed 4/10/18, 11:26:57 AM Successful This sounds possibly similar to crbug.com/869376 . Separately, when Sync attempts to use the token, it reports NETWORK_CONNECTION_UNAVAILABLE instead of AUTH_ERROR, which distinguishes this from crbug.com/875315.
,
Aug 29
,
Aug 29
Also, I tried running Canary with a different user-dir and didn’t run into this problem, so it’s tied to Chrome’s state, not the account. I will try to remember not to restart or upgrade until someone’s told me it’s OK.
,
Aug 30
Current state attached. I have more tokens. Some of the expired tokens have been successfully fetched since yesterday, but then expired.
,
Aug 30
Adding more signin folks. This still looks quite similar to bug 875315. There, it was the case that token requests got wrongly canceled under some circumstances, and *some* consumers had custom retry logic that hid this problem. And the set of services with a valid token looks quite similar (gcm_network_channel, suggestions_service, extensions_identity_api), so I'm suspecting this is a similar problem.
,
Sep 3
Back from vacation now :). 1. sfiera@, could you provide a screenshot of sync-internals as well? 2. Wrt this bug and crbug.com/875315, there is something that I'm not understanding: Sync has custom retry logic. I detailed my investigation of the impact of crbug.com/869376 here: https://docs.google.com/document/d/1oyzXBl4Z-PLh36br9l9T2yhsHiPnZVmY3XF-B7wI2DI/edit#heading=h.al2bfduymcp. My expectation (as I wrote up there) was that sync would *not* be affected. So either (a) there's something that I'm not yet understanding about the impact of crbug.com/869376 (which could also imply that my fix was not complete), or (b) there's something else going on (of course, it's very suspicious that Nicolas' problem has seemed to go away once he updated to the version with the cherrypick). I'd like to try to understand why sync seems to not be retrying and obtaining a valid access token.
,
Sep 3
Hmm. Sync is working now after the weekend, though there’s still something weird about my signin state.
,
Sep 3
Oh, and sorry I didn’t capture sync-internals in the former, bad state. My recollection is that all datatypes were listed as “Backed off”, and in the Sync Protocol Log, the client would issue a request and receive NETWORK_CONNECTION_UNAVAILABLE every ten minutes. In particular, I don’t recall what was in the “Identity” and “Credentials” sections, but whatever it was, it wasn’t anything obviously bad [to us] as it took quite a bit more investigation before we realized the signin state was bad.
,
Sep 5
We’re back out of business!
Screenshots attached, plus the specific excerpt from “Dump status” that I assume you were looking for:
{
"data": [ {
"is_valid": true,
"stat_name": "Requested Token",
"stat_value": "Monday, September 3, 2018 at 9:41:57 AM"
}, {
"is_valid": true,
"stat_name": "Received Token",
"stat_value": "Monday, September 3, 2018 at 9:41:57 AM"
}, {
"is_valid": true,
"stat_name": "Token Request Status",
"stat_value": "OK"
}, {
"is_valid": true,
"stat_name": "Next Token Request",
"stat_value": "not scheduled"
}],
"is_sensitive": false,
"title": "Credentials"
},
,
Sep 5
Hm, looking over the code some more, I think the NETWORK_CONNECTION_UNAVAILABLE error might be the key: It looks like on this error, we don't even get to the point of trying to acquire an access token. Which makes sense in itself, but the question is why we're getting a network error in the first place. Also weird is that gaia_account_tracker and profile_downloader seem to have the same problem.
,
Sep 5
Still, the Sync client does issue a request to the server. The attached request from chrome://net-internals corresponds to: 9/5/2018, 11:29:23 AM GetUpdates Response Received error: NETWORK_CONNECTION_UNAVAILABLE So the Sync client may not get to the point of requesting an access token, but it definitely gets *past* that point (and a 401, obviously).
,
Sep 5
,
Sep 5
The behavior of the services other than sync appears to be similar on 68, which is before crbug.com/869376 (see attachment). So I think that we can concentrate on sync for now.
,
Sep 5
Wrt comment #10, it's possible that the error here isn't related to access tokens at all. If sync isn't getting to the point of making a request with the access token and having it fail due to invalid credentials, then sync won't try to request a new access token. Hence it could end up in a state where it has an expired access token but no next access token fetch scheduled. Agreed with Marc that the pressing question is why we're getting the NETWORK_CONNECTION_UNAVAILABLE error.
,
Sep 5
(Re: comment #14, the behavior that I described is my understanding of sync's logic, at least).
,
Sep 5
It looks like NETWORK_CONNECTION_AVAILABLE is set only when the server_status is set to HttpResponse::CONNECTION_UNAVAILABLE (cf. https://cs.chromium.org/search/?q=%22NETWORK_CONNECTION_UNAVAILABLE%22&sq=package:chromium&type=cs). In turn, HttpResponse::CONNECTION_UNAVAILABLE is set in the following places: https://cs.chromium.org/search/?q=%22%3D+HttpResponse::CONNECTION_UNAVAILABLE%22&sq=package:chromium&type=cs Unsurprisingly, comments on server_connection_manager.h indicate that SYNC_AUTH_ERROR should be set in response to receiving a 401 error (https://cs.chromium.org/chromium/src/components/sync/engine_impl/net/server_connection_manager.h?type=cs&q=SYNC_AUTH_ERROR&sq=package:chromium&g=0&l=49). It's not at all obvious to me how we're getting from reception of the 401 error to setting HttpResponse::CONNECTION_UNAVAILABLE.
,
Sep 5
Marc, is it possible that https://chromium-review.googlesource.com/c/chromium/src/+/1051242 has affected the flow here in a way that tickles some weirdness? This method: https://cs.chromium.org/chromium/src/components/sync/engine_impl/sync_manager_impl.cc?sq=package:chromium&g=0&l=476 returns early if its call to SetAuthToken() returns false. SetAuthToken(), in turn, returns false if the passed-in auth token is empty: https://cs.chromium.org/chromium/src/components/sync/engine_impl/net/server_connection_manager.cc?q=SetAuthToken&sq=package:chromium&g=0&l=173. Thus, it's possible that not setting the auth token to "credentials_lost" in ProfileSyncService has some behavioral impact.
,
Sep 5
Thanks for the investigation so far! That CL does look plausibly related, though I still don't see how the empty token would result in a CONNECTION_UNAVAILABLE error. It looks like before that CL, we would have done a "scheduler_->OnCredentialsUpdated()" call that we're not doing anymore now, maybe that somehow got us out of the bad state again? But the connection to the connection error eludes me...
,
Sep 5
Cross-ref: bug 880633 is almost certainly a dupe, and the investigation there has found a likely culprit.
,
Sep 5
Whoohoo! Going to dupe this bug into that one, where the relevant discussion is happening.
,
Sep 11
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/05b9b976f63eb1f9fcca0e01c15a6eeeb719dd69 commit 05b9b976f63eb1f9fcca0e01c15a6eeeb719dd69 Author: Antonio Gomes <tonikitoo@igalia.com> Date: Tue Sep 11 17:10:45 2018 Fix sync logic failing with repeated instance of "network connection unavailable" This is a follow on of [1], where the sync::HttpBridge logic switched away from URLFetcher to SimpleURLLoader, as part of the network servicification effort. [1] https://crrev.com/c/1174655/ It turns out that the way sync::HttpBridge interpreted the response of an upload request changed after the migration: By default, SimpleURLLoader does not pass a valid response body in case of either HTTP or network errors. What is more, the new logic in HttpBridge::OnURLLoadComplete relied precisely on null-checking the responsebody to verify whether the upload was successful or not. This CL changes this particular check logic so that it matches the previous behavior: only actual network errors account for sync failures (which map to "network connection unavailable" errors) whereas HTTP errors (>= 400 && < 600) still account as a successful uploads. 401 specifically is handled to request expired user credentials in [2]. [2] https://cs.chromium.org/chromium/src/components/sync/engine_impl/net/sync_server_connection_manager.cc?l=84 BUG= 844968 , 878718 , 880633 Change-Id: I9d0b93b527f103149a7c78ba405dde27cae779f2 Reviewed-on: https://chromium-review.googlesource.com/1217122 Commit-Queue: Antonio Gomes <tonikitoo@igalia.com> Reviewed-by: Matt Menke <mmenke@chromium.org> Reviewed-by: Marc Treib <treib@chromium.org> Cr-Commit-Position: refs/heads/master@{#590363} [modify] https://crrev.com/05b9b976f63eb1f9fcca0e01c15a6eeeb719dd69/components/sync/engine/net/http_bridge.cc [modify] https://crrev.com/05b9b976f63eb1f9fcca0e01c15a6eeeb719dd69/components/sync/engine/net/http_bridge_unittest.cc
,
Sep 14
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/bb324165da2232905e108c07ef42e288db468fac commit bb324165da2232905e108c07ef42e288db468fac Author: Antonio Gomes <tonikitoo@igalia.com> Date: Fri Sep 14 06:31:30 2018 Fix sync logic failing with repeated instance of "network connection unavailable" This is a follow on of [1], where the sync::HttpBridge logic switched away from URLFetcher to SimpleURLLoader, as part of the network servicification effort. [1] https://crrev.com/c/1174655/ It turns out that the way sync::HttpBridge interpreted the response of an upload request changed after the migration: By default, SimpleURLLoader does not pass a valid response body in case of either HTTP or network errors. What is more, the new logic in HttpBridge::OnURLLoadComplete relied precisely on null-checking the responsebody to verify whether the upload was successful or not. This CL changes this particular check logic so that it matches the previous behavior: only actual network errors account for sync failures (which map to "network connection unavailable" errors) whereas HTTP errors (>= 400 && < 600) still account as a successful uploads. 401 specifically is handled to request expired user credentials in [2]. [2] https://cs.chromium.org/chromium/src/components/sync/engine_impl/net/sync_server_connection_manager.cc?l=84 BUG= 844968 , 878718 , 880633 Change-Id: I9d0b93b527f103149a7c78ba405dde27cae779f2 Reviewed-on: https://chromium-review.googlesource.com/1217122 Commit-Queue: Antonio Gomes <tonikitoo@igalia.com> Reviewed-by: Matt Menke <mmenke@chromium.org> Reviewed-by: Marc Treib <treib@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#590363}(cherry picked from commit 05b9b976f63eb1f9fcca0e01c15a6eeeb719dd69) Reviewed-on: https://chromium-review.googlesource.com/1225490 Reviewed-by: Antonio Gomes <tonikitoo@igalia.com> Cr-Commit-Position: refs/branch-heads/3538@{#400} Cr-Branched-From: 79f7c91a2b2a2932cd447fa6f865cb6662fa8fa6-refs/heads/master@{#587811} [modify] https://crrev.com/bb324165da2232905e108c07ef42e288db468fac/components/sync/engine/net/http_bridge.cc [modify] https://crrev.com/bb324165da2232905e108c07ef42e288db468fac/components/sync/engine/net/http_bridge_unittest.cc |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by sfiera@chromium.org
, Aug 29