New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 793103 link

Starred by 4 users

Issue metadata

Status: Verified
Owner:
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug



Sign in to add a comment

Variations updates can get wedged on inaccurate "network down" state

Project Member Reported by isherman@chromium.org, Dec 7 2017

Issue description

Chrome Version: 65.0.3284.0
OS Version: OS X 10.12.6

What steps will reproduce the problem?
1. Run Chrome for a while – say, days.  Leave your computer connected to the network during this time.
2. Look at chrome://histograms/Variations.ResourceRequestsAllowed

What is the expected result?
Almost all entries should be in bucket 0.

What happens instead of that?
Many entries are in bucket 4, which corresponds to "no network connection".

This appears to be a regression, introduced in August: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=50e708cc98036226289576b104819223

The corresponding changelog is: https://chromium.googlesource.com/chromium/src/+log/62.0.3193.0..62.0.3194.0?pretty=fuller&n=10000

The change that most jumps out is: https://chromium.googlesource.com/chromium/src/+/705f27524a76b857f61d9240f7e401f85fc8e9b1 – "Change ResourceRequestAllowedNotifier to use NetworkChangeObserver"

It seems like the ResourceRequestAllowedNotifier is getting wedged in a state where it believes there is no active network connection.  Somewhat oddly, this seems to be a Mac-specific issue.  Helen and Paul, you two seem like people who are quite familiar with the ResourceRequestAllowedNotifier and NetworkChangeObserver logic.  Does any of this sound familiar / would you be able to help me dive into this further?
 
ResourceRequestAllowedNotifier::OnNetworkChanged seems correct to me.
It's possible that somehow there is an online signal but |waiting_for_network_| is false.

I don't quite get why we need the |waiting_for_network_| boolean. Connection change signals can get lost on some platforms. 
In r496283, the OnNetworkChanged() signal is sent with type CONNECTION_NONE immediately prior to online signals, this will make waiting_for_network_ flip to true momentarily which could block resource requests, and it will also trigger an OnResourceRequestsAllowed() when transitioning from online to online, which the comment at the beginning of ResourceRequestAllowedNotifier::OnNetworkChanged() says isn't wanted.  I think we may want to add this to the beginning of ResourceRequestAllowedNotifier::OnNetworkChanged():

if (type == CONNECTION_NONE && !NetworkChangeNotifier::IsOffline())
  return;

It may also be a good idea to change most uses of waiting_for_network_ outside of ResourceRequestAllowedNotifier::OnNetworkChanged() to be calls to NetworkChangeNotifier::IsOffline()

Thank you both for chiming in! Re #2: Yes, I think those are both reasonable-sounding changes. Would you like me to send you a CL with those?
Sure.
Status: Started (was: Assigned)
I've uploaded a fix in https://chromium-review.googlesource.com/#/c/chromium/src/+/818527 – I realized after thinking about the code more deeply that Helen is probably correct, and we don't need the |waiting_for_network_| boolean at all. Thanks!
Project Member

Comment 6 by bugdroid1@chromium.org, Dec 11 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/dcaadbb611d296ae6165897b8f5e760f4ff557c0

commit dcaadbb611d296ae6165897b8f5e760f4ff557c0
Author: Ilya Sherman <isherman@chromium.org>
Date: Mon Dec 11 19:09:24 2017

Don't attempt to cache network state in the ResourceRequestAllowedNotifier.

Caching the network state is dangerous, because a single missed network
connection update can indefinitely block all subsequent requests. Moreover,
there is no benefit to caching the state – it's just extra overhead.

R=pauljensen@chromium.org

Bug:  793103 
Change-Id: I21d6318b3b85fb7125cc53b8e3d54842c2ce7564
Reviewed-on: https://chromium-review.googlesource.com/818527
Reviewed-by: Paul Jensen <pauljensen@chromium.org>
Commit-Queue: Ilya Sherman <isherman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#523151}
[modify] https://crrev.com/dcaadbb611d296ae6165897b8f5e760f4ff557c0/components/web_resource/resource_request_allowed_notifier.cc
[modify] https://crrev.com/dcaadbb611d296ae6165897b8f5e760f4ff557c0/components/web_resource/resource_request_allowed_notifier.h
[modify] https://crrev.com/dcaadbb611d296ae6165897b8f5e760f4ff557c0/components/web_resource/resource_request_allowed_notifier_unittest.cc

Thanks for the fix!

Comment 8 by skare@chromium.org, Dec 11 2017

thanks, and +1 data point that it seems ok on windows (left a windows canary running for a few days)
Status: Fixed (was: Started)
The code should automatically recover from failures now.  I'll keep an eye on the timeline graphs to make sure that the observed failure rates do go down after this change.
Status: Verified (was: Fixed)
The timeline data shows that this fix did indeed seem to address the regression: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=48ae82c05f5867763abcf7195872d444

Sign in to add a comment