Issue 878077

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 9
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android , Windows
Pri: 1
Type: Bug




Event.Latency.ScrollBegin.Touch.TimeToScrollUpdateSwapBegin2 99th percentile regressed by almost 75% in M70

Project Member Reported by sahel@chromium.org, Aug 27

Issue description

The regression is clearly visible in Canary:
https://uma.googleplex.com/p/chrome/timeline_v2/?sid=8967b4696dc94c2c89e1d71ccaf467f9

The change range for Canary is 70.0.3521.0 - 70.0.3522.0:
https://chromium.googlesource.com/chromium/src/+log/70.0.3521.0..70.0.3522.0?pretty=fuller&n=10000


The regression is touch-only and affects only ScrollBegin (I did not see any regressions in the update metrics in this range). It is visible on both Windows and Android.

Event.Latency.ScrollBegin.Touch.TimeToHandled2_Impl is the regressed submetric.

 
Issue 877478 has been merged into this issue.
Cc: xidac...@chromium.org
I wonder if this is caused by Xida's crash dump reporting.

If you look at a modified histogram of the % of crash reports throttled, you see an uptick around the same time as well.

https://uma.googleplex.com/p/chrome/timeline_v2/?sid=0e2bb8579445eebba20de2518bfd64e4

Unfortunately these numbers aren't reported for Android, but I wonder if the act of generating the crash report is what causes the latency regressions.
It is possible.
After this CL landed, there were more than 1000 DumpWithoutCrashing reports per day:
https://chromium-review.googlesource.com/c/chromium/src/+/1174696

Let's keep an eye on this. I have landed a CL to remove the DumpWithoutCrashing. It should go into the canary build tomorrow.
The regression range is 70.0.3521.0 - 70.0.3522.0, but https://chromium-review.googlesource.com/c/chromium/src/+/1174696 first landed in 70.0.3524.0.
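
The version argument above can be checked mechanically. The following is a minimal sketch, assuming Chrome versions are dotted numeric strings that can be compared component by component; `parse_version` and `landed_within` are hypothetical helper names, not part of any Chromium tooling.

```python
def parse_version(version: str) -> tuple:
    """Turn '70.0.3524.0' into (70, 0, 3524, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def landed_within(first_build: str, range_start: str, range_end: str) -> bool:
    """True if a change first shipped after range_start and no later than range_end."""
    start = parse_version(range_start)
    end = parse_version(range_end)
    return start < parse_version(first_build) <= end


# Regression range reported for Canary, and the first build containing CL 1174696
# (both taken from the comments in this issue).
print(landed_within("70.0.3524.0", "70.0.3521.0", "70.0.3522.0"))  # False
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "70.0.999.0" sorting after "70.0.3521.0".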
Owner: nzolghadr@chromium.org
Handing off to nzolghadr@ since I will be OOO.

Thanks Navid!
Components: Speed>Release
This seems to have recovered completely on the Canary channel over multiple releases, up to the Aug 31st release (70.0.3535.2), but it is hard to pinpoint a single range for the recovery. Over the same period, the Dev build only partly recovered, from 70.0.3529.0 to 70.0.3535.2.

https://chromium.googlesource.com/chromium/src/+log/70.0.3529.0..70.0.3535.2?pretty=fuller

It was puzzling from the beginning that Dev regressed quite a bit more than Canary.
Just to add another data point: UKM data doesn't show a clear trend here either.
Cc: bokan@chromium.org
Owner: sahel@chromium.org
Adding Sahel to this bug again.
Issue 878610 has been merged into this issue.
As mentioned in comment #7, the regression on Canary has completely recovered. However, the Dev regression has not fully recovered, and the UKM data does not show any regression on either channel.

I also checked Critique for Finch experiments that are Dev-only and could not find any.
Since the regression happened in M70, we have a few weeks to resolve the issue before the stable release.

tdresser@ PTAL, do you have any ideas?
Cc: altimin@chromium.org
Here is a better UMA link showing the regression in Canary and Dev, the full recovery in Canary, the partial recovery in Dev, and the regression hitting Beta:

https://uma.googleplex.com/p/chrome/timeline_v2/?sid=f56f0bcc04f7facd86d2d64990a2ff07

altimin@, are you aware of any changes on the scheduling side, or maybe Finch trials, that would look like this and affect M70?
Please note that D4 and C20 from the link in comment #14 both have version 70.0.3535.2 but different performance.
Cc: weiliangc@chromium.org sadrul@chromium.org
Only on ScrollBegin, but has nothing to do with main thread business...

It looks like it's pretty much only the overflow bucket that changed.

Sadrul or Wei, any ideas? Maybe something about compositor side hit testing?
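
To illustrate why a change confined to the overflow bucket can move the 99th percentile so much, here is a minimal sketch. The bucket bounds and sample counts are made up for illustration and are not taken from the UMA data; `percentile_bucket` is a hypothetical helper that returns the lower bound of the bucket containing the requested percentile, without the within-bucket interpolation a real dashboard may apply.

```python
def percentile_bucket(bucket_lower_bounds, counts, percentile):
    """Return the lower bound of the bucket that contains the given percentile."""
    threshold = sum(counts) * percentile / 100.0
    running = 0
    for lower, count in zip(bucket_lower_bounds, counts):
        running += count
        if running >= threshold:
            return lower
    return bucket_lower_bounds[-1]


# Hypothetical latency buckets in ms; the last bucket is the overflow bucket.
bounds = [0, 10, 20, 40, 80, 160]
before = [500, 300, 150, 42, 5, 3]
after = [500, 300, 150, 42, 5, 15]  # only the overflow bucket grew

print(percentile_bucket(bounds, before, 99))  # 40
print(percentile_bucket(bounds, after, 99))   # 160
```

In this made-up example, a dozen extra samples in the overflow bucket are enough to move the 99th percentile from the 40 ms bucket to the overflow bucket, while lower percentiles stay put.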
New data shows that Dev has recovered as well:
https://uma.googleplex.com/p/chrome/timeline_v2/?sid=fff471a0a40d06ea114948755069edee

I checked all the Finch experiments since July 1st with min version 70 to see if they expired or were disabled later on, and could not find anything.

I also compared the change lists for the regressed and improved ranges on both Dev and Canary and could not find any obvious reverts.

UKM did not show any regression on Dev or Canary (probably due to high noise), but it shows the regression for YouTube on Beta:
https://uma.googleplex.com/ukm/timeline?sid=042a7622665cafd4af9ead4079e88aeb

I will try to locally reproduce the regression now that we have a URL.
sahel@: I just checked my CLs and I don't think my changes to touch action would impact this.

The only change that could possibly cause the regression is this CL, which adds DumpWithoutCrashing:
https://chromium-review.googlesource.com/c/chromium/src/+/1174696
The above CL is in 70.0.3524.0

I then removed all the DumpWithoutCrashing calls in this CL:
https://chromium-review.googlesource.com/c/chromium/src/+/1193745
The above CL is in 70.0.3536.0

The current Beta channel is 70.0.3538.35, which includes my CL that removes the DumpWithoutCrashing calls, so my change cannot be the cause of this regression.
Awesome, glad we could find the regression in UKM on Beta!
I just had a chat with Navid, and realized this clarification on comment #17 is necessary:

The regression is not YouTube-only; the desktop version of Facebook is another example that is affected. To a lesser degree it is also visible on Windows:
https://uma.googleplex.com/ukm/timeline?sid=ad207689a6ecfde2e65bd75e104d461f
Issue 884885 has been merged into this issue.
Status: Fixed (was: Assigned)
Good news here: everything has recovered, and whatever change fixed it has now also been merged into Beta.
https://uma.googleplex.com/p/chrome/timeline_v2/?sid=68fdb1f1305f86365bdeb61b02a703e8
I'm closing this for now, but we can discuss it later in one of our meetings and record the lessons learned here somewhere.
Issue 886502 has been merged into this issue.
