Issue 610436

Starred by 3 users

Issue metadata

Status: Archived
Owner: ----
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Bug


Spike in EMPTY_RESPONSE on Android, all channels (maybe CrOS beta too)

Reported by mattm@chromium.org, May 9 2016

Issue description

Net.ErrorCodesForMainFrame3 EMPTY_RESPONSE spikes on Android starting around May 4th. It seems to occur on dev, beta, and stable, so maybe it's related to a Finch trial? CrOS beta also shows a spike, but the graph looks a bit noisier; I don't know if it's related.

stable: https://uma.googleplex.com/timeline_v2?q=%7B%22day_count%22%3A%2247%22%2C%22end_date%22%3A%222016%2F05%2F05%22%2C%22entries%22%3A%5B%7B%22bucket%22%3A%22EMPTY_RESPONSE%22%2C%22logScale%22%3Afalse%2C%22measure%22%3A%22bucketProp%22%2C%22percentile%22%3A%2250%22%2C%22showLowVolumeData%22%3Atrue%2C%22zeroBased%22%3Afalse%7D%5D%2C%22filters%22%3A%5B%7B%22fieldId%22%3A%22channel%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%224%22%7D%2C%7B%22fieldId%22%3A%22platform%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%22A%22%7D%2C%7B%22fieldId%22%3A%22milestone%22%2C%22operator%22%3A%22GE%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%2248%22%7D%5D%2C%22histograms%22%3A%5B%22Net.ErrorCodesForMainFrame3%22%5D%2C%22window_size%22%3A3%7D
beta: https://uma.googleplex.com/timeline_v2?q=%7B%22day_count%22%3A%2247%22%2C%22end_date%22%3A%222016%2F05%2F05%22%2C%22entries%22%3A%5B%7B%22bucket%22%3A%22EMPTY_RESPONSE%22%2C%22logScale%22%3Afalse%2C%22measure%22%3A%22bucketProp%22%2C%22percentile%22%3A%2250%22%2C%22showLowVolumeData%22%3Atrue%2C%22zeroBased%22%3Afalse%7D%5D%2C%22filters%22%3A%5B%7B%22fieldId%22%3A%22channel%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%223%22%7D%2C%7B%22fieldId%22%3A%22platform%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%22A%22%7D%2C%7B%22fieldId%22%3A%22milestone%22%2C%22operator%22%3A%22GE%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%2248%22%7D%5D%2C%22histograms%22%3A%5B%22Net.ErrorCodesForMainFrame3%22%5D%2C%22window_size%22%3A3%7D
dev: https://uma.googleplex.com/timeline_v2?q=%7B%22day_count%22%3A%2247%22%2C%22end_date%22%3A%222016%2F05%2F05%22%2C%22entries%22%3A%5B%7B%22bucket%22%3A%22EMPTY_RESPONSE%22%2C%22logScale%22%3Afalse%2C%22measure%22%3A%22bucketProp%22%2C%22percentile%22%3A%2250%22%2C%22showLowVolumeData%22%3Atrue%2C%22zeroBased%22%3Afalse%7D%5D%2C%22filters%22%3A%5B%7B%22fieldId%22%3A%22channel%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%222%22%7D%2C%7B%22fieldId%22%3A%22platform%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%22A%22%7D%2C%7B%22fieldId%22%3A%22milestone%22%2C%22operator%22%3A%22GE%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%2248%22%7D%5D%2C%22histograms%22%3A%5B%22Net.ErrorCodesForMainFrame3%22%5D%2C%22window_size%22%3A3%7D
(cros beta https://uma.googleplex.com/timeline_v2?q=%7B%22day_count%22%3A%2247%22%2C%22end_date%22%3A%222016%2F05%2F05%22%2C%22entries%22%3A%5B%7B%22bucket%22%3A%22EMPTY_RESPONSE%22%2C%22logScale%22%3Afalse%2C%22measure%22%3A%22bucketProp%22%2C%22percentile%22%3A%2250%22%2C%22showLowVolumeData%22%3Atrue%2C%22zeroBased%22%3Afalse%7D%5D%2C%22filters%22%3A%5B%7B%22fieldId%22%3A%22channel%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%223%22%7D%2C%7B%22fieldId%22%3A%22platform%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%22C%22%7D%2C%7B%22fieldId%22%3A%22milestone%22%2C%22operator%22%3A%22GE%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%2247%22%7D%5D%2C%22histograms%22%3A%5B%22Net.ErrorCodesForMainFrame3%22%5D%2C%22window_size%22%3A3%7D)
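The dashboard links above carry their whole query as URL-encoded JSON in the `q` parameter, which makes the filters hard to compare by eye. A minimal Python sketch for decoding them follows; the function names are hypothetical, and the reading of the channel values (4 = stable, 3 = beta, 2 = dev) and platform codes (A = Android, C = ChromeOS) is inferred from how the links are labeled in this report, not from any documented UMA schema.

```python
import json
from urllib.parse import parse_qs, urlsplit


def decode_uma_query(url: str) -> dict:
    """Decode the JSON payload carried in the `q` query parameter of a timeline link."""
    # parse_qs percent-decodes values, so %7B%22... becomes {"...
    params = parse_qs(urlsplit(url).query)
    return json.loads(params["q"][0])


def summarize_link(url: str) -> dict:
    """Pull out the histogram, bucket(s), filters, and smoothing window of a link."""
    q = decode_uma_query(url)
    return {
        "histograms": q["histograms"],
        "buckets": [entry["bucket"] for entry in q["entries"]],
        # Map each filter's fieldId to its (operator, value) pair.
        "filters": {f["fieldId"]: (f["operator"], f["value"]) for f in q["filters"]},
        "window_size": q["window_size"],
    }
```

Run on the stable link, `summarize_link` should report the `Net.ErrorCodesForMainFrame3` histogram, the `EMPTY_RESPONSE` bucket, and the filters channel EQ 4, platform EQ A, milestone GE 48; the CrOS beta link differs in channel (3), platform (C), and milestone (GE 47).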

 
Labels: -OS-Chrome
The spike in CrOS seems to have gone away. Android's is still there though.
There does not appear to be a correlation with versions. Net.ErrorCodesForHTTPSGoogleMainFrame2 shows a corresponding but much less pronounced rise.
Interestingly, the corresponding rise on Net.ErrorCodesForHTTPSGoogleMainFrame2 is not present on beta. It also isn't present on Net.ErrorCodesForSubresources2 or Net.ErrorCodesForImages in stable.

I'm not sure what to make of this. That it occurs at the same time on all channels means it's not a code change. That it's not on subresources suggests to me that either it is reliably reproducible against a host when it occurs, or it is somehow specifically correlated with main frame resources. (If we assume most subresources are first-party, we would expect reliable errors against a host not to show up on subresources, since the main frame never loads. By contrast, an error that occurs intermittently against a host would fire roughly equally on main frames and subresources, since (1-p) * p and p are close for small p.)
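The parenthetical argument can be made concrete with a toy model. This is a sketch under the comment's own assumptions: subresources are first-party, they are only requested once the main frame has loaded, and an intermittent failure hits each request independently with probability p. The function name is hypothetical.

```python
def expected_errors_per_page(p: float, reliable: bool) -> tuple[float, float]:
    """Expected (main-frame errors, errors per subresource) for one page-load attempt.

    Toy model: the host either fails every request (reliable=True) or fails each
    request independently with probability p (reliable=False). A subresource is
    only requested if the main frame succeeded.
    """
    if reliable:
        # The main frame always fails, so subresources are never requested.
        return 1.0, 0.0
    main_frame = p
    # A subresource error needs a successful main frame (1 - p), then a failure (p).
    subresource = (1 - p) * p
    return main_frame, subresource


for p in (0.001, 0.01, 0.1):
    main, sub = expected_errors_per_page(p, reliable=False)
    print(f"p={p}: main frame {main:.4f}, per subresource {sub:.4f}")
```

For p = 0.01 the two values are 0.01 and 0.0099, while the reliable case gives 1 and 0, which is why only a reliable failure would be expected to show up on main frames but not on subresources.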

If some particular server made a change, I would expect Net.ErrorCodesForHTTPSGoogleMainFrame2 either to show it much more visibly (if it's our servers) or not at all (if it's someone else's servers). Instead, it ranges from not visible to visible but not clearly so.

If I show data over the past year, the Google-only graph is much noisier. I'm tempted to guess it does not significantly affect our servers... yet it's too aligned to be a coincidence.

One possibility is that it somehow only occurs on non-QUIC and the muted Google-only graph is because most of our connections use QUIC. However, there was a recent dip in Net.QuicSession.QuicVersion's total count with no corresponding change in the other histogram, which I think I'd have expected if that were the case.
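The QUIC hypothesis predicts a specific relationship, sketched here under the assumption (the comment's hypothesis, not a measured fact) that the error can only happen on non-QUIC connections: the Google-only error rate is then the non-QUIC error rate diluted by the QUIC share, so a dip in QUIC usage should push the Google-only rate up. Function names and the numbers in the usage lines are hypothetical and for illustration only.

```python
def google_only_error_rate(non_quic_error_rate: float, quic_share: float) -> float:
    """Observed Google main-frame error rate if only non-QUIC loads can fail.

    quic_share is the fraction of Google main-frame loads served over QUIC.
    """
    if not 0.0 <= quic_share <= 1.0:
        raise ValueError("quic_share must be in [0, 1]")
    return (1.0 - quic_share) * non_quic_error_rate


# Illustrative values only: the same non-QUIC error rate, before and after a QUIC dip.
before = google_only_error_rate(non_quic_error_rate=0.001, quic_share=0.8)
after = google_only_error_rate(non_quic_error_rate=0.001, quic_share=0.6)
print(f"{before:.6f} -> {after:.6f}")
```

Doubling the non-QUIC share doubles the expected Google-only rate in this model, which is the reasoning behind expecting a visible change in the Google-only histogram when the QUIC session count dipped.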

Perhaps the stable-only change in ErrorCodesForHTTPSGoogleMainFrame2 is just a coincidence (that graph is quite noisy) and it's really just some random large server which started acting up.

Comment 4 by mmenke@chromium.org, Jun 23 2016

And the spike on Android seems to be going away... but we still have a spike on Linux.

Note that one unique thing about main frame loads is that they're probably most likely to occur on connections that have been idle for a while, though we *do* retry on ERR_EMPTY_RESPONSE on reused sockets.
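A hypothetical model of the retry behavior mentioned above, to show why stale idle sockets alone should not surface as EMPTY_RESPONSE: an empty reply on a reused socket is retried on a fresh connection, so the error is only reported if the fresh attempt also comes back empty. This is not Chromium's actual network-stack code; `send`, the function name, and the string error codes are stand-ins.

```python
from typing import Callable

EMPTY_RESPONSE = "ERR_EMPTY_RESPONSE"


def fetch_with_reuse_retry(send: Callable[[bool], str], reused_socket: bool) -> str:
    """Issue one request, retrying once on a fresh socket after an empty reply on a reused one.

    send(reused) performs a single attempt and returns "OK" or an error string.
    """
    result = send(reused_socket)
    if result == EMPTY_RESPONSE and reused_socket:
        # The server may have closed the idle connection; try a new one.
        result = send(False)
    return result
```

In this model, any EMPTY_RESPONSE that reaches the histogram came from a fresh connection, so the idle-connection explanation for main frames would also require the new connection to fail.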

I wonder if it would be worth breaking down these histograms by protocol, though a lot of errors happen before we even know whether we're talking HTTP/2 or HTTP/1.x. And with QUIC races, if both QUIC and HTTP fail, I'm not sure we'd consistently pick the same one to blame.
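One way such a per-protocol breakdown could look, sketched with hypothetical names: errors that occur before protocol negotiation go into an explicit "unknown" bucket, and when a QUIC/TCP race ends with both jobs failing, a fixed rule picks which error to record, so attribution does not depend on which job happened to finish last. Always blaming the TCP-side error is an arbitrary assumption for illustration, not what Chromium does.

```python
from collections import Counter
from typing import Optional, Tuple


def record_error(histogram: Counter, error: str, protocol: Optional[str]) -> None:
    """Count an error under (protocol, error); protocol is None if not yet negotiated."""
    histogram[(protocol or "unknown", error)] += 1


def blame_for_race(
    quic_error: Optional[str], tcp_error: Optional[str], tcp_protocol: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Pick the (protocol, error) to record when QUIC raced a TCP-based connection.

    Returns (None, None) if either job succeeded. When both fail, the TCP-side
    error is always chosen, a deterministic (if arbitrary) rule.
    """
    if quic_error is None or tcp_error is None:
        return None, None
    return tcp_protocol, tcp_error


hist: Counter = Counter()
protocol, error = blame_for_race("ERR_QUIC_PROTOCOL_ERROR", "ERR_EMPTY_RESPONSE", "http/1.1")
if error is not None:
    record_error(hist, error, protocol)
# A DNS failure happens before any protocol is known.
record_error(hist, "ERR_NAME_NOT_RESOLVED", None)
```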

Comment 5 by mmenke@chromium.org, Jun 27 2016

Hrm...I take that back about the Android spike going away - must have been looking at the wrong graph.
Cc: pauljensen@chromium.org
Things are taking another bump ~June 20. I thought maybe this could be from captive portals, but none of the captive portal metrics show abnormalities. Ccing pauljensen@.
June 20th bump seems to be subsiding.
It still hasn't returned to the previous level, and it looks like the Google-specific EMPTY_RESPONSE is on the rise, though that doesn't seem to be correlated with the ErrorCodesForMainFrame graph.
There are sudden spikes up and down in M53, going up to 0.03%.

https://uma.googleplex.com/timeline_v2?sid=3841713ead52bd3b5366e9c5600372c4
Cc: cbentzel@chromium.org
Labels: -Pri-2 Pri-3
Doing the numbers, about 1/100th of the errors that users get are EMPTY_RESPONSE.  My inclination is that a spike on that response at this level isn't worth putting developer time into.  Please argue if you disagree.
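The "1/100th of the errors" figure is a share among error buckets only, whereas the dashboard links above use a bucketProp measure that appears to be a share of all recorded main-frame loads. A small sketch of the difference follows, assuming the histogram also records successful loads under an "OK" bucket; the function name, the aggregated "OTHER_ERRORS" bucket, and the counts are all made up for illustration and are not real UMA data.

```python
def bucket_share(counts: dict[str, int], bucket: str, errors_only: bool = False) -> float:
    """Fraction of samples in `bucket`, optionally ignoring the "OK" bucket."""
    relevant = {k: v for k, v in counts.items() if not (errors_only and k == "OK")}
    total = sum(relevant.values())
    return relevant.get(bucket, 0) / total if total else 0.0


# Made-up counts, for illustration only.
counts = {"OK": 990_000, "EMPTY_RESPONSE": 100, "OTHER_ERRORS": 9_900}
print(f"{bucket_share(counts, 'EMPTY_RESPONSE'):.4%} of all loads")
print(f"{bucket_share(counts, 'EMPTY_RESPONSE', errors_only=True):.2%} of errors")
```

With these toy counts the same bucket is 0.01% of all loads but 1% of errors, so the two kinds of figure should not be compared directly.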

Status: Available (was: Untriaged)
This may be related to issue 638712 (similar spike on ChromeOS).

Also transitioning to Available to get this off the triager's radar.

Comment 12 by sheriffbot@chromium.org, Aug 25 2017

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Status: Archived (was: Untriaged)
This thread has been inactive for a while. Given that histogram data is time-sensitive, archiving this one.