Issue 883153

Starred by 3 users

Issue metadata

Status: WontFix
Owner:
Closed: Nov 12
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




100% regression in Event.Latency.ScrollUpdate.Touch.TimeToScrollUpdateSwapBegin4

Project Member Reported by nzolghadr@chromium.org, Sep 12

Issue description

Owner: sadrul@chromium.org
Status: Assigned (was: Available)
Sadrul, the regression also seems to be happening for Event.Latency.ScrollUpdate.Touch.BrowserNotifiedToBeforeGpuSwap2.

Does anything catch your eye in the regression range?
I went over the regression range mentioned in the OP. It doesn't look like there were many cc- or viz-related changes in there, so it's difficult to tell. Were there any Finch trials happening around that time?

Looking at this range: https://chromium.googlesource.com/chromium/src/+log/70.0.3537.0..70.0.3538.2?pretty=fuller There are some interesting changes in there, including:
 . https://chromium-review.googlesource.com/1197334
 . https://chromium-review.googlesource.com/1157874

I don't think the InProcessCommandBuffer is currently used on Android, so the second CL shouldn't be related. The first CL seems to be a reland ... but it's in nearby areas, and perhaps something changed between the revert and the reland?

Is it possible to bisect?
Cc: bokan@chromium.org
https://chromium-review.googlesource.com/1197334 is a cleanup patch that removes dead renderer-related fling code. This cannot be the root cause, since latency info for GSUs generated from fling is excluded from the ScrollUpdate metrics and measured in the ScrollInertial metrics instead.
Labels: -Pri-3 Pri-2
Sadrul, since this happened in 70, I was wondering if you could take a closer look, as BrowserNotifiedToBeforeGpuSwap2 has regressed here.
Labels: -Pri-2 Pri-1
Adding a new UMA URL that shows the problem better:
https://uma.googleplex.com/p/chrome/timeline_v2?sid=b032034cd7a8355682ee57662023a2ba

It seems that although we see the regression in Canary and Dev on 70, it does not show up in Beta. Sadrul, is there a Finch trial or something similar that you might be aware of?
Issue 889244 has been merged into this issue.
This hasn't recovered yet, and we are about to hit stable this week. Sadrul, how do you think we should proceed here?
Cc: khushals...@chromium.org
I don't know of any finch trials that would affect this.

Looking at this graph: https://uma.googleplex.com/p/chrome/timeline_v2?sid=1f759a351b665dbaf809e4864f0efaae

For the beta-channel, I looked at the diff between B10 (70.0.3538.17) and B11 (70.0.3538.27): https://chromium.googlesource.com/chromium/src/+log/70.0.3538.17..70.0.3538.27?pretty=fuller I don't see any obvious candidates.

+khushal in case 8f470e057e10a730e9e81bafb7266367071b10a8 is relevant.

+nzolghadr@ any chance 05a520420b86ce02f12addfa8f35c50832b903c5 is relevant?

Does this repro locally on those builds? If the regression can be reproduced locally (as sahel@ did for the previous bug in issue 867581), it would be easier to bisect, investigate, etc.
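If a local repro were available, finding the culprit in a range such as 70.0.3537.0..70.0.3538.2 would be a binary search over the builds in between. A minimal sketch, assuming the builds are ordered oldest to newest, the first is known good, the last is known bad, and the regression persists once introduced; `first_bad_version`, `is_regressed`, and the `build_N` labels are hypothetical stand-ins for flashing a build and measuring the metric on a device:

```python
from typing import Callable, Sequence


def first_bad_version(versions: Sequence[str],
                      is_regressed: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_regressed() is True.

    Assumes versions[0] is good, versions[-1] is bad, and that once a
    build regresses every later build does too (monotonic), which is
    what makes binary search valid.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_regressed(versions[mid]):
            hi = mid  # culprit is at or before mid
        else:
            lo = mid  # culprit is after mid
    return versions[hi]


# Hypothetical builds; in practice each probe would be an on-device measurement.
builds = [f"build_{i}" for i in range(5)]
regressed = {"build_3", "build_4"}
print(first_bad_version(builds, lambda v: v in regressed))  # build_3
```

Each probe halves the remaining range, so n builds need about log2(n) measurements.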
Hmmm, I'd be surprised if https://chromium.googlesource.com/chromium/src/+/8f470e057e10a730e9e81bafb7266367071b10a8 was related. It adds a lock to protect access to an object, but except on WebView, that object should always be used from the GPU main thread, so the lock should be practically free.

Another thing is that the code only affects OOP raster, which is being rolled out using finch. So checking whether the regression only affects the population in the experiment would help narrow down whether this is related.
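The check suggested here amounts to comparing the before/after change separately for each Finch group. A minimal sketch, assuming a (before, after) value of the same latency percentile is available per group; the group names, the numbers, and the 10% threshold are all hypothetical:

```python
from typing import Dict, Set, Tuple


def regressed_arms(metrics: Dict[str, Tuple[float, float]],
                   threshold: float = 0.10) -> Set[str]:
    """Return the groups whose metric grew by more than `threshold`.

    metrics maps a group name to (before, after) values of the same
    latency percentile; the 10% default threshold is an arbitrary choice.
    """
    return {arm for arm, (before, after) in metrics.items()
            if (after - before) / before > threshold}


def confined_to(arm: str, metrics: Dict[str, Tuple[float, float]],
                threshold: float = 0.10) -> bool:
    """True if `arm` is the only group that regressed."""
    return regressed_arms(metrics, threshold) == {arm}


# Hypothetical numbers: only the experiment group moved.
only_experiment = {"control": (100.0, 101.0), "oop_raster": (100.0, 130.0)}
# Hypothetical numbers: both groups moved by the same amount.
both_groups = {"control": (100.0, 130.0), "oop_raster": (100.0, 130.0)}

print(confined_to("oop_raster", only_experiment))  # True
print(confined_to("oop_raster", both_groups))      # False
```

A result like the second case, where every group shows the same change, would point away from the OOP raster change.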
khushalsagar@, can you provide the name of the Finch trial so I can take a look, or the link if you have it handy?
It doesn't work with the version tag dominant feature. If I look at the timeline without it though, looks like both experiment groups have identical changes.

https://uma.googleplex.com/p/chrome/timeline_v2/?sid=733f3ab4d73a7c9e25f8b53e3471bac4
sahel@: any luck repro-ing locally on device?
Cc: sadrul@chromium.org
Owner: sahel@chromium.org
--> sahel@ do you think you would be able to attempt a repro locally? That would be very useful.
Owner: nzolghadr@chromium.org
Comment #15: I am sorry I missed this earlier. I am already working on a bug that should be fixed for 70 and is missing tomorrow's 10% :  crbug.com/884728 
I don't think I will have the time for this repro.

Talked to nzolghadr@ and he is gonna investigate.
Just to answer some of the previous comments:

khushalsagar@ I don't know the current state of OOP raster, or which Finch group number is the correct one for the latest experiment. But there seems to be a significant jump in the metrics in the Dev channel for group 2:
https://uma.googleplex.com/p/chrome/variations/?sid=7a31a0e4ec04ad40fb48e7003f096b5a

Although the effect is much smaller in Beta, it is worth keeping an eye on. Feel free to cc me on anything you find related.

sadrul@ I think the beta range you looked at is too wide and is giving you some unrelated CLs. Looking at this:
https://uma.googleplex.com/p/chrome/timeline_v2?sid=5f4241e0dfcfb82151c09d53fd622677

the widest imaginable regression range in Canary is somewhere between c29 and c31, which is
https://chromium.googlesource.com/chromium/src/+log/70.0.3537.0..71.0.3545.0?pretty=fuller

which contains neither my change nor Khushal's changes.


Still looking to find the problem here...



Owner: sadrul@chromium.org
From the UKM data, the regression seems very visible on the m.facebook website. I used our Galaxy Note 5 and was able to get these results (the 99th percentile of Event.Latency.ScrollUpdate.Touch.BrowserNotifiedToBeforeGpuSwap2 for the two versions I tested locally). Latency is in microseconds.

70.0.3538.17:
  experiment 1 (498 samples):   42650 µs {99.0%}
  experiment 2 (1542 samples):  38193 µs {99.2%}

70.0.3538.27:
  experiment 1 (795 samples):   53185 µs {98.9%}
  experiment 2 (1651 samples): 143622 µs {99.0%}


These are obviously quite noisy, but I believe it is fairly clear that 70.0.3538.27 has some sort of regression.
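To put the local numbers in proportion, the relative change in the reported 99th percentile between the two builds can be computed directly. The values below are copied from the runs above; since the bracketed percentages show each value sits at a slightly different percentile (98.9% to 99.2%), the comparison is approximate:

```python
# 99th-percentile BrowserNotifiedToBeforeGpuSwap2 latency (microseconds),
# copied from the local Galaxy Note 5 runs above.
p99_us = {
    "experiment 1": {"70.0.3538.17": 42650, "70.0.3538.27": 53185},
    "experiment 2": {"70.0.3538.17": 38193, "70.0.3538.27": 143622},
}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


for group, by_version in p99_us.items():
    change = percent_change(by_version["70.0.3538.17"], by_version["70.0.3538.27"])
    print(f"{group}: {change:+.1f}%")
# experiment 1: +24.7%
# experiment 2: +276.0%
```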



Status: Started (was: Assigned)


Sadrul, the regression seems to have recovered:

https://uma.googleplex.com/p/chrome/timeline_v2?sid=86cb071e0f99e9e05d496b3a23efb160

Are you aware of any changes in that area? For some reason, even though it lasted for the whole of beta M70 and only got fixed in beta M71, stable M70 does not show any similar regression. So this might also have been something like a version-guarded Finch trial.

Is it alright to close this now?
Status: WontFix (was: Started)
I think we can close it now, yes. I will mark it WontFix as does-not-repro.

I am not sure what triggered the initial regression, or what fixed it, unfortunately. I think having corresponding metrics in telemetry (issue 878420) would make testing/debugging this kind of regression much easier.
