Spike in EMPTY_RESPONSE on Android, all channels (maybe CrOS beta too).
Issue description

Net.ErrorCodesForMainFrame3 EMPTY_RESPONSE spikes on Android starting around May 4th. It seems to occur on dev, beta, and stable, so maybe it's related to a Finch trial? CrOS beta also shows a spike, but that graph looks a bit noisier, so I'm not sure if it's related.

stable: https://uma.googleplex.com/timeline_v2?q=%7B%22day_count%22%3A%2247%22%2C%22end_date%22%3A%222016%2F05%2F05%22%2C%22entries%22%3A%5B%7B%22bucket%22%3A%22EMPTY_RESPONSE%22%2C%22logScale%22%3Afalse%2C%22measure%22%3A%22bucketProp%22%2C%22percentile%22%3A%2250%22%2C%22showLowVolumeData%22%3Atrue%2C%22zeroBased%22%3Afalse%7D%5D%2C%22filters%22%3A%5B%7B%22fieldId%22%3A%22channel%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%224%22%7D%2C%7B%22fieldId%22%3A%22platform%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%22A%22%7D%2C%7B%22fieldId%22%3A%22milestone%22%2C%22operator%22%3A%22GE%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%2248%22%7D%5D%2C%22histograms%22%3A%5B%22Net.ErrorCodesForMainFrame3%22%5D%2C%22window_size%22%3A3%7D

beta: https://uma.googleplex.com/timeline_v2?q=%7B%22day_count%22%3A%2247%22%2C%22end_date%22%3A%222016%2F05%2F05%22%2C%22entries%22%3A%5B%7B%22bucket%22%3A%22EMPTY_RESPONSE%22%2C%22logScale%22%3Afalse%2C%22measure%22%3A%22bucketProp%22%2C%22percentile%22%3A%2250%22%2C%22showLowVolumeData%22%3Atrue%2C%22zeroBased%22%3Afalse%7D%5D%2C%22filters%22%3A%5B%7B%22fieldId%22%3A%22channel%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%223%22%7D%2C%7B%22fieldId%22%3A%22platform%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%22A%22%7D%2C%7B%22fieldId%22%3A%22milestone%22%2C%22operator%22%3A%22GE%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%2248%22%7D%5D%2C%22histograms%22%3A%5B%22Net.ErrorCodesForMainFrame3%22%5D%2C%22window_size%22%3A3%7D

dev: https://uma.googleplex.com/timeline_v2?q=%7B%22day_count%22%3A%2247%22%2C%22end_date%22%3A%222016%2F05%2F05%22%2C%22entries%22%3A%5B%7B%22bucket%22%3A%22EMPTY_RESPONSE%22%2C%22logScale%22%3Afalse%2C%22measure%22%3A%22bucketProp%22%2C%22percentile%22%3A%2250%22%2C%22showLowVolumeData%22%3Atrue%2C%22zeroBased%22%3Afalse%7D%5D%2C%22filters%22%3A%5B%7B%22fieldId%22%3A%22channel%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%222%22%7D%2C%7B%22fieldId%22%3A%22platform%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%22A%22%7D%2C%7B%22fieldId%22%3A%22milestone%22%2C%22operator%22%3A%22GE%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%2248%22%7D%5D%2C%22histograms%22%3A%5B%22Net.ErrorCodesForMainFrame3%22%5D%2C%22window_size%22%3A3%7D

CrOS beta: https://uma.googleplex.com/timeline_v2?q=%7B%22day_count%22%3A%2247%22%2C%22end_date%22%3A%222016%2F05%2F05%22%2C%22entries%22%3A%5B%7B%22bucket%22%3A%22EMPTY_RESPONSE%22%2C%22logScale%22%3Afalse%2C%22measure%22%3A%22bucketProp%22%2C%22percentile%22%3A%2250%22%2C%22showLowVolumeData%22%3Atrue%2C%22zeroBased%22%3Afalse%7D%5D%2C%22filters%22%3A%5B%7B%22fieldId%22%3A%22channel%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%223%22%7D%2C%7B%22fieldId%22%3A%22platform%22%2C%22operator%22%3A%22EQ%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%22C%22%7D%2C%7B%22fieldId%22%3A%22milestone%22%2C%22operator%22%3A%22GE%22%2C%22selected%22%3A%5B%5D%2C%22value%22%3A%2247%22%7D%5D%2C%22histograms%22%3A%5B%22Net.ErrorCodesForMainFrame3%22%5D%2C%22window_size%22%3A3%7D
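For reference, each timeline link above carries its whole query as URL-encoded JSON in the q= parameter, so the filters can be inspected locally. A minimal sketch (the script name and invocation are just illustrative, not an official UMA tool):

```python
# decode_uma_link.py -- illustrative helper, not part of the UMA dashboard.
# Usage: python decode_uma_link.py '<one of the timeline_v2 links above>'
import json
import sys
from urllib.parse import parse_qs, urlparse

url = sys.argv[1]
# parse_qs percent-decodes the q= value, which is a JSON blob.
query = json.loads(parse_qs(urlparse(url).query)["q"][0])

print(query["histograms"])            # ['Net.ErrorCodesForMainFrame3']
print(query["entries"][0]["bucket"])  # 'EMPTY_RESPONSE'
for f in query["filters"]:            # stable link: channel EQ 4, platform EQ A,
    print(f["fieldId"], f["operator"], f["value"])  # milestone GE 48
```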
Comment 1 by davidben@chromium.org, Jun 22 2016
There does not appear to be a correlation with versions. Net.ErrorCodesForHTTPSGoogleMainFrame2 shows a corresponding but much less pronounced rise.
Jun 22 2016
Interestingly, the corresponding rise on Net.ErrorCodesForHTTPSGoogleMainFrame2 is not present on beta. It also isn't present on Net.ErrorCodesForSubresources2 or Net.ErrorCodesForImages in stable. I'm not sure what to make of this.

That it occurs at the same time on all channels means it's not a code change. That it's not on subresources says to me it's easily reproducible when it occurs, or it is actually correlated with main frame resources somehow. (If we assume most subresources are first-party, we would expect reliable errors against a host not to show up on subresources, since we won't load the main frame at all. Whereas something which occurs intermittently against a host would fire roughly equally on top-level loads and subresources, since (1-p) * p and p are close for small p.)

If some particular server made a change, I would expect Net.ErrorCodesForHTTPSGoogleMainFrame2 either to show it much more visibly (if it's our servers) or not at all (if it's someone else's servers). Instead, I see something in between: not clearly visible, but not absent either. If I show data over the past year, the Google-only graph is much noisier. I'm tempted to guess it does not significantly affect our servers... yet it's too aligned to be a coincidence.

One possibility is that it somehow only occurs on non-QUIC connections, and the muted Google-only graph is because most of our connections use QUIC. However, there was a recent dip in Net.QuicSession.QuicVersion's total count with no corresponding change in the other histogram, which I think I'd have expected if that were the case. Perhaps the stable-only change in ErrorCodesForHTTPSGoogleMainFrame2 is just a coincidence (that graph is quite noisy) and it's really just some random large server which started acting up.
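To make the parenthetical about intermittent vs. reliable errors concrete, a quick back-of-the-envelope check (plain arithmetic, not UMA data):

```python
# If an error hits a host with probability p per request, a main-frame load
# sees it with probability p, while a (first-party) subresource is only
# attempted when the main frame succeeded, so it sees it with probability
# (1 - p) * p. For small p the two rates are nearly equal; for a reliable
# error (p near 1) the subresource rate collapses toward zero.
for p in (1.0, 0.9, 0.1, 0.01, 0.001):
    print(f"p={p:<6} main_frame={p:.4f} subresource={(1 - p) * p:.4f}")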
Jun 23 2016
And the spike on Android seems to be going away... but we still have a spike on Linux. Note that one thing unique to main frame loads is that they're probably the most likely to occur on connections that have been idle for a while, though we *do* retry on ERR_EMPTY_RESPONSE on reused sockets. I wonder if it would be worth breaking these histograms down by protocol, though a lot of errors happen before we even know whether we're talking HTTP/2 or HTTP/1.x. And with QUIC races, if both QUIC and HTTP fail, I'm not sure we'd consistently pick the same one to blame.
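For context, the reused-socket retry mentioned above follows the usual keep-alive pattern. A rough sketch of the idea (illustrative Python with hypothetical names, not Chrome's actual network stack):

```python
# Illustrative only: the general shape of "retry on an empty response from a
# reused socket". A keep-alive connection that sat idle may have been closed
# by the server just as we reused it, so one retry on a fresh connection is
# cheap; an empty response on a brand-new socket is treated as a real error.
class EmptyResponseError(Exception):
    pass

def fetch(pool, request):
    conn = pool.get(request.host)          # hypothetical pool; may hand back an idle, reused socket
    try:
        return conn.send(request)
    except EmptyResponseError:
        if not conn.was_reused:
            raise                          # fresh socket: surface ERR_EMPTY_RESPONSE
        retry_conn = pool.open_new(request.host)
        return retry_conn.send(request)    # single retry on a fresh connection
```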
Jun 27 2016
Hrm... I take that back about the Android spike going away; I must have been looking at the wrong graph.
Jun 29 2016
Things are taking another bump ~June 20. I thought maybe this could be from captive portals but none of the captive portal metrics show abnormalities. ccing pauljenson@.
Jun 29 2016
June 20th bump seems to be subsiding.
Aug 11 2016
The rate still hasn't returned to the previous level, and it looks like the Google-specific EMPTY_RESPONSE is on the rise, though it doesn't seem to be correlated with the ErrorCodesForMainFrame graph.
Aug 19 2016
There are sudden spikes up and down in M53, going up to 0.03%. https://uma.googleplex.com/timeline_v2?sid=3841713ead52bd3b5366e9c5600372c4
Aug 24 2016
Doing the numbers, about 1/100th of the errors that users get are EMPTY_RESPONSE. My inclination is that a spike on that response at this level isn't worth putting developer time into. Please argue if you disagree.
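For the record, the "doing the numbers" step is just a share-of-total calculation over the error histogram. A sketch with placeholder counts (not real UMA values):

```python
# Placeholder bucket counts, purely for illustration; the real numbers would
# come from the Net.ErrorCodesForMainFrame3 dashboard.
error_buckets = {
    "EMPTY_RESPONSE": 1_000,
    "ALL_OTHER_ERRORS": 99_000,   # sum of every other error bucket
}
share = error_buckets["EMPTY_RESPONSE"] / sum(error_buckets.values())
print(f"EMPTY_RESPONSE share of errors: {share:.1%}")  # ~1%, i.e. about 1/100
```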
Aug 24 2016
This may be related to issue 638712 (similar spike on ChromeOS). Also transitioning to Available to get this off the triager's radar.
Aug 25 2017
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Aug 29 2017
This thread has been inactive for a while. Given that histogram data is time-sensitive, archiving this one.