Regression in Event.Latency.TouchToFirstScrollUpdateSwapBegin from M48 to M49
Issue description

In the median:
- GT-I9500, GT-I9505, Nexus 5X, Nexus 6P all regressed by ~1.3ms
- Nexus 5 and Nexus 6 improved by ~0.75ms

In the 99th percentile:
- GT-I9505 regressed by ~50ms
- GT-I9500, Nexus 5X, Nexus 6P, Nexus 5 and Nexus 6 improved by ~50ms

Telemetry shows a regression in first_gesture_scroll_update_latency that has since recovered; the regression window overlaps with when M49 was branched. It's possible we should just wait until M50 hits beta to see if this is resolved. Any thoughts on what this could be?
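The median and the 99th percentile can move in opposite directions because they are read from different parts of the same distribution. As a rough illustration, here is a minimal sketch of reading a percentile off a bucketed latency histogram, assuming linear interpolation within each bucket. The function name and the bucket bounds are hypothetical; UMA's actual bucketing and percentile computation may differ.

```python
def bucketed_percentile(bounds, counts, p):
    """Estimate the p-th percentile (0-100) of a bucketed histogram.

    bounds: ascending bucket edges in ms, one more entry than counts.
    counts: number of samples in each bucket [bounds[i], bounds[i + 1]).
    Assumes samples are spread uniformly inside each bucket, so the
    result is linearly interpolated within the bucket holding the rank.
    """
    if len(bounds) != len(counts) + 1:
        raise ValueError("need exactly one more bound than counts")
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    target = total * p / 100.0  # rank we are looking for
    cumulative = 0
    for i, count in enumerate(counts):
        if count and cumulative + count >= target:
            # Fraction of this bucket needed to reach the target rank.
            fraction = (target - cumulative) / count
            lo, hi = bounds[i], bounds[i + 1]
            return lo + fraction * (hi - lo)
        cumulative += count
    return bounds[-1]
```

With hypothetical bucket edges [0, 10, 20, 40] ms and counts [50, 30, 20], the median lands at the top of the first bucket (10ms) while the 99th percentile comes out at about 39ms, near the top of the last bucket. A change confined to the tail buckets can therefore shift p99 without touching the median.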
,
Mar 15 2016
Hit send too soon. I meant to say it's weird that there were opposite changes in the 99th percentile for different devices. Are we sure those numbers are statistically meaningful at the 99th percentile? (I sort of hope they are, but I've been led astray by randomness before.)
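One way to check whether a 99th-percentile shift is meaningful rather than noise is a bootstrap confidence interval on the difference. The sketch below is a generic percentile bootstrap, not how the UMA dashboard computes significance; it assumes independent per-sample latencies (in ms) for the two versions, and the function names are hypothetical. Real UMA data is clustered by client, so a more careful analysis would resample clients rather than individual samples.

```python
import math
import random


def percentile(samples, p):
    """Nearest-rank percentile (p in 0-100) of a non-empty list."""
    ordered = sorted(samples)
    # Smallest value with at least p% of the samples at or below it.
    rank = max(1, math.ceil(len(ordered) * p / 100))
    return ordered[rank - 1]


def bootstrap_diff_ci(before, after, p=99, iterations=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for percentile(after, p) - percentile(before, p).

    Resamples each version's samples with replacement and assumes the
    samples are independent, which per-client UMA data is not.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    diffs = []
    for _ in range(iterations):
        resampled_before = rng.choices(before, k=len(before))
        resampled_after = rng.choices(after, k=len(after))
        diffs.append(percentile(resampled_after, p) - percentile(resampled_before, p))
    diffs.sort()
    lo = diffs[int(iterations * alpha / 2)]
    hi = diffs[int(iterations * (1 - alpha / 2)) - 1]
    return lo, hi
```

If the returned interval excludes zero, the shift is unlikely to be explained by resampling noise alone. Tail percentiles need many samples: with only a few hundred reports per device, the 99th percentile rests on a handful of values and the interval will be wide.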
,
Mar 15 2016
Right, I suspect the first_gesture_scroll_update_latency regression was also caused by us turning off expensive task blocking. Incidentally, I've been trying to find out whether any metrics move in the finch experiment that re-enables that blocking. So far it's been very inconclusive, but maybe we can find the numbers corresponding to the devices listed here. Here's GT-I9505, for example: go/piwfx. Tim, does that look similar to the regression you saw?
,
Mar 15 2016
These numbers are statistically significant at the 99th percentile (the timeline view shows relatively stable numbers). In UMA overall, GT-I9505 regressed by 80ms, compared to a ~15ms regression from the Finch trial. It doesn't look like this accounts for the whole regression.
,
Mar 16 2016
Something that might explain the difference is that the new policy is less aggressive than the one we disabled in M49. For example it doesn't block anything during main thread driven gestures anymore. We might be able to bring it back with more heuristics, but I'm not sure if it's worth the complexity.
,
Mar 16 2016
Is my understanding of the timeline here correct?
- M46: No task blocking
- M47: Aggressive task blocking (merged back from M48)
- M48: Aggressive task blocking
- M49: Less aggressive task blocking behind a finch trial
- M50: Less aggressive task blocking behind a finch trial
,
Mar 16 2016
Almost, M48 had the task blocking turned off.
,
Mar 21 2016
In that case, from M48 to M49, we went from no task blocking to non-aggressive task blocking behind a finch trial. How would that result in a regression in M49? The relevant UMA graph is here (internal only): https://uma.googleplex.com/p/chrome/timeline_v2/?q=%7B%22day_count%22%3A%22All%22%2C%22end_date%22%3A%22latest%22%2C%22window_size%22%3A%221%22%2C%22filters%22%3A%5B%7B%22fieldId%22%3A%22channel%22%2C%22operator%22%3A%22EQ%22%2C%22study%22%3A%22%22%2C%22selected%22%3A%5B%224%22%5D%7D%2C%7B%22fieldId%22%3A%22version_tags%22%2C%22operator%22%3A%22CONTAINS%22%2C%22study%22%3A%22%22%2C%22selected%22%3A%5B%22D%22%5D%7D%2C%7B%22fieldId%22%3A%22platform%22%2C%22operator%22%3A%22EQ%22%2C%22study%22%3A%22%22%2C%22selected%22%3A%5B%22A%22%5D%7D%2C%7B%22fieldId%22%3A%22short_hw_class%22%2C%22operator%22%3A%22COMPARE%22%2C%22study%22%3A%22%22%2C%22selected%22%3A%5B%22Nexus%205%22%2C%22Nexus%205X%22%2C%22Nexus%206%22%2C%22Nexus%206P%22%2C%22GT-I9500%22%2C%22GT-I9505%22%2C%22SM_G900H%22%5D%7D%5D%2C%22histograms%22%3A%5B%22Event.Latency.TouchToFirstScrollUpdateSwapBegin%22%2C%22Event.Latency.TouchToScrollUpdateSwapBegin%22%5D%2C%22default_entry_values%22%3A%7B%22measureModel%22%3A%7B%22measure%22%3A%22%22%2C%22buckets%22%3A%5B%5D%2C%22percentiles%22%3A%5B%2250%22%5D%2C%22selectedFormulas%22%3A%5B%5D%2C%22allFormulas%22%3A%5B%5D%7D%2C%22zeroBased%22%3Atrue%2C%22logScale%22%3Afalse%2C%22showLowVolumeData%22%3Afalse%2C%22showVersionAnnotations%22%3Atrue%7D%2C%22entries%22%3A%5B%7B%22measureModel%22%3A%7B%22measure%22%3A%22percentile%22%2C%22percentiles%22%3A%5B%2299%22%5D%7D%2C%22zeroBased%22%3Afalse%7D%2C%7B%22measureModel%22%3A%7B%22measure%22%3A%22percentile%22%7D%7D%5D%7D
,
Mar 23 2016
Sami, I'm not clear on how this regression could be related to the scheduler change. If M48 didn't have task blocking, how could the M48-M49 regression be related to task blocking?
,
Mar 23 2016
Right, I think I need to double-check the dates here. M48 had blocking, but it was reverted once we found the problems. If we're comparing M48 before the revert to M49, then there could be a difference. Can we figure out revision ranges for the regressions? (I'll be OoO for the rest of the week, but I'll return to this next week.)
,
Mar 23 2016
On some devices there is some movement during M48, but performance appears to be improving over time, not regressing. Performance in M49 is worse than the worst performance in M48 though. We see M48 become dominant on Feb 4, and then its performance gradually improves until Feb 24, and then it's about flat until March 12, when M49 becomes dominant.
,
Apr 7 2016
Any status updates here? Not sure if you guys are investigating off-thread.
,
Apr 7 2016
Sorry, this has been on my backlog for a while. I'll try to get to it.
,
Apr 14 2016
These are the ranges for the two regressions:
---
First (smaller) regression: 47.0.2526.83...48.0.2564.95
I'm guessing this is the difference between M47 having task blocking and it being disabled in M48.
---
Second regression: 48.0.2564.95...49.0.2623.91
No clear culprit for this one :( There are a number of changes in the display scheduler, cc scheduler and the Blink scheduler, but none of them seem obviously problematic.

I took some traces of scrolling at ToT, and the only problem I saw was jank caused by the long-press touch highlight rect that sometimes activated when I was trying to scroll (trace attached). However, M48 has the same problem. I also compared M48 and ToT with a high-speed camera on a couple of sites but couldn't see any quantifiable difference. Deep reports might be the next thing to look at here.
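Narrowing a range like 48.0.2564.95...49.0.2623.91 to a culprit is usually done by bisecting. Here is a minimal sketch of a binary search over an ordered list of builds, assuming a hypothetical is_regressed(build) check (for example, repeated Telemetry runs compared against a threshold) and assuming the regression persists in every build after it lands. Versions are compared as integer tuples, since plain string sorting would put 48.0.2564.10 before 48.0.2564.9.

```python
from typing import Callable, Sequence, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted Chrome version like '48.0.2564.95' into a sortable tuple."""
    return tuple(int(part) for part in version.split("."))


def first_bad(builds: Sequence[str], is_regressed: Callable[[str], bool]) -> str:
    """Binary-search an ordered build list for the first regressed build.

    Assumes at least two builds, builds[0] known good, builds[-1] known bad,
    and that once the regression lands it is present in every later build.
    is_regressed is a hypothetical caller-supplied check.
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_regressed(builds[mid]):
            bad = mid  # culprit is at or before mid
        else:
            good = mid  # culprit is after mid
    return builds[bad]
```

With a noisy latency metric, each is_regressed call should aggregate several runs; a single misclassified step sends the search into the wrong half of the range.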
,
May 10 2016
This recovers in M50. https://uma.googleplex.com/p/chrome/timeline_v2/?q=%7B%22day_count%22%3A%22All%22%2C%22end_date%22%3A%22latest%22%2C%22window_size%22%3A%221%22%2C%22filters%22%3A%5B%7B%22fieldId%22%3A%22channel%22%2C%22operator%22%3A%22EQ%22%2C%22study%22%3A%22%22%2C%22selected%22%3A%5B%223%22%5D%7D%2C%7B%22fieldId%22%3A%22version_tags%22%2C%22operator%22%3A%22CONTAINS%22%2C%22study%22%3A%22%22%2C%22selected%22%3A%5B%22D%22%5D%7D%2C%7B%22fieldId%22%3A%22platform%22%2C%22operator%22%3A%22EQ%22%2C%22study%22%3A%22%22%2C%22selected%22%3A%5B%22A%22%5D%7D%5D%2C%22histograms%22%3A%5B%5B%22Event.Latency.TouchToFirstScrollUpdateSwapBegin%22%5D%5D%2C%22default_entry_values%22%3A%7B%22measureModel%22%3A%7B%22measure%22%3A%22%22%2C%22buckets%22%3A%5B%5D%2C%22percentiles%22%3A%5B%2250%22%5D%2C%22selectedFormulas%22%3A%5B%5D%2C%22allFormulas%22%3A%5B%5D%7D%2C%22zeroBased%22%3Atrue%2C%22logScale%22%3Afalse%2C%22showLowVolumeData%22%3Afalse%2C%22showVersionAnnotations%22%3Atrue%7D%2C%22entries%22%3A%5B%7B%22measureModel%22%3A%7B%22measure%22%3A%22percentile%22%7D%2C%22zeroBased%22%3Afalse%7D%5D%7D 
https://uma.googleplex.com/p/chrome/timeline_v2/?q=%7B%22day_count%22%3A%22All%22%2C%22end_date%22%3A%22latest%22%2C%22window_size%22%3A%221%22%2C%22filters%22%3A%5B%7B%22fieldId%22%3A%22channel%22%2C%22operator%22%3A%22EQ%22%2C%22study%22%3A%22%22%2C%22selected%22%3A%5B%224%22%5D%7D%2C%7B%22fieldId%22%3A%22version_tags%22%2C%22operator%22%3A%22CONTAINS%22%2C%22study%22%3A%22%22%2C%22selected%22%3A%5B%22D%22%5D%7D%2C%7B%22fieldId%22%3A%22platform%22%2C%22operator%22%3A%22EQ%22%2C%22study%22%3A%22%22%2C%22selected%22%3A%5B%22A%22%5D%7D%5D%2C%22histograms%22%3A%5B%5B%22Event.Latency.TouchToFirstScrollUpdateSwapBegin%22%5D%5D%2C%22default_entry_values%22%3A%7B%22measureModel%22%3A%7B%22measure%22%3A%22%22%2C%22buckets%22%3A%5B%5D%2C%22percentiles%22%3A%5B%2250%22%5D%2C%22selectedFormulas%22%3A%5B%5D%2C%22allFormulas%22%3A%5B%5D%7D%2C%22zeroBased%22%3Atrue%2C%22logScale%22%3Afalse%2C%22showLowVolumeData%22%3Afalse%2C%22showVersionAnnotations%22%3Atrue%7D%2C%22entries%22%3A%5B%7B%22measureModel%22%3A%7B%22measure%22%3A%22percentile%22%7D%2C%22zeroBased%22%3Afalse%7D%5D%7D

I did a bit more digging: this only occurs on Samsung devices, so we don't have much Telemetry coverage, but we do have some: https://chromeperf.appspot.com/report?sid=6779fd0c536d8e89c211a6a93213a351d3267d39655b0a46061da3a4bf189f87&start_rev=311770&end_rev=392588
,
May 17 2016
,
May 31 2016
Event.Latency.Browser.TouchAcked didn't regress, but Event.Latency.ScrollUpdate.TouchToHandled_Main did regress. This implies that it's the actual main thread scrolling machinery (or scheduling) that regressed.
,
Jul 29 2016
Marking WontFix, as this is old and has recovered.
,
Aug 22 2016
,
Aug 25 2016
Comment 1 by alexclarke@chromium.org, Mar 15 2016