Variations updates can get wedged on inaccurate "network down" state |
||||
Issue descriptionChrome Version: 65.0.3284.0 OS Version: OS X 10.12.6 What steps will reproduce the problem? 1. Run Chrome for a while – say, days. Leave your computer connected to the network during this time. 2. Look at chrome://histograms/Variations.ResourceRequestsAllowed What is the expected result? Almost all entries should be in bucket 0. What happens instead of that? Many entries are in bucket 4, which corresponds to "no network connection". This appears to be a regression, introduced in August: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=50e708cc98036226289576b104819223 The corresponding changelog is: https://chromium.googlesource.com/chromium/src/+log/62.0.3193.0..62.0.3194.0?pretty=fuller&n=10000 The change that most jumps out is: https://chromium.googlesource.com/chromium/src/+/705f27524a76b857f61d9240f7e401f85fc8e9b1 – "Change ResourceRequestAllowedNotifier to use NetworkChangeObserver" It seems like the ResourceRequestAllowedNotifier is getting wedged in a state where it believes there is no active network connection. Somewhat oddly, this seems to be a Mac-specific issue. Helen and Paul, you two seem like people who are quite familiar with the ResourceRequestAllowedNotifier and NetworkChangeObserver logic. Does any of this sound familiar / would you be able to help me dive into this further?
,
Dec 8 2017
In r496283, the OnNetworkChanged() signal is sent with type CONNECTION_NONE immediately prior to online signals, this will make waiting_for_network_ flip to true momentarily which could block resource requests, and it will also trigger an OnResourceRequestsAllowed() when transitioning from online to online, which the comment at the beginning of ResourceRequestAllowedNotifier::OnNetworkChanged() says isn't wanted. I think we may want to add this to the beginning of ResourceRequestAllowedNotifier::OnNetworkChanged(): if (type == CONNECTION_NONE && !NetworkChangeNotifier::IsOffline()) return; It may also be a good idea to change most uses of waiting_for_network_ outside of ResourceRequestAllowedNotifier::OnNetworkChanged() to be calls to NetworkChangeNotifier::IsOffline()
,
Dec 9 2017
Thank you both for chiming in! Re #2: Yes, I think those are both reasonable-sounding changes. Would you like me to send you a CL with those?
,
Dec 9 2017
Sure.
,
Dec 9 2017
I've uploaded a fix in https://chromium-review.googlesource.com/#/c/chromium/src/+/818527 – I realized after thinking about the code more deeply that Helen is probably correct, and we don't need the |waiting_for_network_| boolean at all. Thanks!
,
Dec 11 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/dcaadbb611d296ae6165897b8f5e760f4ff557c0 commit dcaadbb611d296ae6165897b8f5e760f4ff557c0 Author: Ilya Sherman <isherman@chromium.org> Date: Mon Dec 11 19:09:24 2017 Don't attempt to cache network state in the ResourceRequestAllowedNotifier. Caching the network state is dangerous, because a single missed network connection update can indefinitely block all subsequent requests. Moreover, there is no benefit to caching the state – it's just extra overhead. R=pauljensen@chromium.org Bug: 793103 Change-Id: I21d6318b3b85fb7125cc53b8e3d54842c2ce7564 Reviewed-on: https://chromium-review.googlesource.com/818527 Reviewed-by: Paul Jensen <pauljensen@chromium.org> Commit-Queue: Ilya Sherman <isherman@chromium.org> Cr-Commit-Position: refs/heads/master@{#523151} [modify] https://crrev.com/dcaadbb611d296ae6165897b8f5e760f4ff557c0/components/web_resource/resource_request_allowed_notifier.cc [modify] https://crrev.com/dcaadbb611d296ae6165897b8f5e760f4ff557c0/components/web_resource/resource_request_allowed_notifier.h [modify] https://crrev.com/dcaadbb611d296ae6165897b8f5e760f4ff557c0/components/web_resource/resource_request_allowed_notifier_unittest.cc
,
Dec 11 2017
Thanks for the fix!
,
Dec 11 2017
thanks, and +1 data point that it seems ok on windows (left a windows canary running for a few days)
,
Dec 11 2017
The code should automatically recover from failures now. I'll keep an eye on the timeline graphs to make sure that the observed failure rates do go down after this change.
,
Dec 15 2017
The timeline data shows that this fix did indeed seem to address the regression: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=48ae82c05f5867763abcf7195872d444 |
||||
►
Sign in to add a comment |
||||
Comment 1 by xunji...@chromium.org
, Dec 8 2017