
Issue 760656

Starred by 2 users

Issue metadata

Status: WontFix
Owner: ----
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug




Regression in Net.HttpTimeToFirstByte on Android Canary

Project Member Reported by mmenke@chromium.org, Aug 30 2017

Issue description

Cc: alexilin@chromium.org
+alexilin it looks like https://chromium-review.googlesource.com/c/chromium/src/+/612380 is in the regression range; would you please take a look?

Note that PageLoad timing metrics are similarly being (slightly) affected:
https://uma.googleplex.com/timeline_v2?sid=fb52c44fb02ed80444c96321f71973a2 in what appears to be the same regression range.
Hm, actually, looking at the split, I think the regression is between 3194 and 3196.
https://uma.googleplex.com/timeline_v2?sid=54daa7a249e61d75cb5e68bb46dc0f7a

There is a related CL, https://chromium-review.googlesource.com/c/chromium/src/+/628522, which could be a cause, but I haven't looked into it.


Cc: xunji...@chromium.org
Here's the regression range I think the culprit is in:
https://chromium.googlesource.com/chromium/src/+log/62.0.3194.0..62.0.3196.0?pretty=fuller&n=10000
Given that this CL also triggers some DCHECKs ( Issue 757458 ), we should probably do a speculative revert and see if the metrics drop.
OK, I'll go ahead.
It appears to be dropping back to normal on its own:
https://uma.googleplex.com/timeline_v2?sid=6797a8ea8a5c1b0ef8f5e3899139334f
Cc: zhongyi@chromium.org
It looks like 3198 is showing OK metrics.
https://uma.googleplex.com/timeline_v2?sid=0ec6690ef4f3cc0c48d33d861aedddcb

It looks like it's returning to the previous median, but I don't see anything in the range between 3196 and 3198 that would explain it, such as a revert of something that landed in 3196.

It's possible there was an optimization in 3198 that skewed these metrics. Maybe r497288? +zhongyi


BTW, in 3196 it looks like all of the change came from the 0ms bucket, which has ~5x fewer counts than in other versions.
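To illustrate how a drop in the 0ms bucket alone can push a median up, here is a minimal Python sketch over a bucketed histogram. The bucket bounds and counts are hypothetical (not real UMA data), and the median is approximated as the lower bound of the bucket holding the middle sample; the real dashboard may interpolate within buckets.

```python
def histogram_median(buckets):
    """Approximate median of a bucketed histogram.

    `buckets` is a list of (lower_bound_ms, count) pairs sorted by bound.
    Returns the lower bound of the bucket containing the middle sample.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        raise ValueError("empty histogram")
    half = total / 2
    running = 0
    for lower, count in buckets:
        running += count
        if running >= half:
            return lower


# Hypothetical counts only: identical except the 0ms bucket is 5x smaller.
normal = [(0, 500), (50, 200), (100, 200), (150, 200), (200, 150)]
fewer_zero = [(0, 100), (50, 200), (100, 200), (150, 200), (200, 150)]

print(histogram_median(normal))      # 50
print(histogram_median(fewer_zero))  # 100
```

Nothing in the non-zero buckets changed, yet the median moves because the 0ms samples no longer make up such a large share of the total.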
Cc: ckrasic@chromium.org
The change I landed in r497288 was to delay TCP if we are in startup and QUIC requires confirmation, which would lead to more QUIC usage. I could see this change affecting Android more, since Android restarts more often. If QUIC improves this metric significantly, it's possible that this change led to the drop. However, I didn't see a huge difference between QUIC and non-QUIC in Finch experiments: https://uma.googleplex.com/p/chrome/variations/?sid=ca3bf47717059683ee8445e08bddfbfe.

I remember Buck mentioning that hanging GETs affect metrics like HttpJob.Totaltime*; could this also be affected by that? +ckrasic
The first CL, https://chromium-review.googlesource.com/c/chromium/src/+/612380, could affect the metrics, but it landed in 3194, which looks OK, so it's unlikely to be the cause.

The CL https://chromium-review.googlesource.com/c/chromium/src/+/628522 is unlikely to be the cause of the regression, because the metrics dropped back in 3198, before the revert happened in 3201, and the change isn't really relevant to the problem.
I'm planning to reland this CL with the DCHECKs issue fix.

#11, SGTM. I'm going to keep an eye on the metrics to make sure the regression really is going down in 3198.

Still, very mysterious :)
This spike is also within historical norms: https://uma.googleplex.com/timeline_v2?sid=72477afab914a05ef29d15d69eaadd8d

I think it'd be worth including links to 365-day (all versions) charts in chirp reports, as I typically find that most chirp reports don't fall outside of historical norms.
It doesn't really look within historical norms to me, especially if you look at 1 day aggregation.

The previous highest median was ~M58 at 130ms. The peak of this spike is 158ms. That's over a 20% increase.
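For reference, the relative increase implied by the two medians quoted above works out as follows (using only the 130ms and 158ms figures from the comment):

```latex
\frac{158\,\text{ms} - 130\,\text{ms}}{130\,\text{ms}} = \frac{28}{130} \approx 0.215 \approx 21.5\%
```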
Ah, I somehow had canary+dev in my link. I agree, on canary it's abnormal: https://uma.googleplex.com/timeline_v2?sid=2d6e13a994dd15daea31388168ec1466


DNS.AttemptSuccessDuration seems to have gone back down:
https://uma.googleplex.com/timeline_v2?sid=f2bcafe4d096a5fb6fcbec3f058f6eb1

Net.HttpTimeToFirstByte has certainly recovered, but it's not clear whether it has fully returned to historical norms yet. I suspect it has, but maybe we should wait for a bit more data to be sure.
https://uma.googleplex.com/timeline_v2?sid=2b9e65e29d344fbe7720a3d1ae22cd23

Comment 17 by rch@chromium.org, Sep 25 2017

Status: WontFix (was: Untriaged)
Both have now recovered, thankfully
