100% regression in Event.Latency.ScrollUpdate.Touch.TimeToScrollUpdateSwapBegin4
Issue description
There seems to be a regression coming back in from 70.0.3537.0 to 70.0.3538.2, and it gets worse in the next version (71.0.3545.0): https://uma.googleplex.com/p/chrome/timeline_v2?sid=9362326e8120a91fbb887296134aab40
Here is the Canary breakdown split by version: https://uma.googleplex.com/p/chrome/timeline_v2?sid=35be6ef7850e96d62819fd35def97fc2#Event.Latency.ScrollUpdate.Touch.BrowserNotifiedToBeforeGpuSwap2
Here is the change range: https://chromium.googlesource.com/chromium/src/+log/70.0.3537.0..71.0.3545.0?pretty=fuller
Sep 12
I went over the regression range mentioned in the OP. It doesn't look like there were many cc- or viz-related changes in there, so it's hard to tell. Were there any Finch trials happening around that time?
Looking at this range: https://chromium.googlesource.com/chromium/src/+log/70.0.3537.0..70.0.3538.2?pretty=fuller there are some interesting changes, including:
. https://chromium-review.googlesource.com/1197334
. https://chromium-review.googlesource.com/1157874
I don't think the InProcessCommandBuffer is used on Android at the moment, so the second CL shouldn't be related. The first CL seems to be a reland... but it's in close-by areas, and perhaps something changed between the revert and the reland? Is it possible to bisect?
Sep 12
https://chromium-review.googlesource.com/1197334 is a cleanup patch that removes dead renderer-side fling code. This cannot be the root cause, since latency info for GSUs generated from a fling is excluded from the ScrollUpdate metrics and is measured in the ScrollInertial metrics instead.
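For anyone unfamiliar with that split, here is a rough sketch of the bucketing described above. It is illustrative only, not the actual Chromium reporting code: ReportTouchScrollLatency() and RecordLatency() are made-up stand-ins for the real latency-tracker plumbing and UMA macros, and the ScrollInertial histogram name simply mirrors the ScrollUpdate pattern.

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// Stand-in for Chromium's UMA histogram macros.
void RecordLatency(const std::string& histogram, int64_t latency_us) {
  std::cout << histogram << ": " << latency_us << " us\n";
}

void ReportTouchScrollLatency(bool generated_by_fling, int64_t latency_us) {
  // Fling-generated GestureScrollUpdates go to the ScrollInertial bucket,
  // so they cannot affect the ScrollUpdate histograms tracked in this bug.
  const std::string histogram =
      generated_by_fling
          ? "Event.Latency.ScrollInertial.Touch.TimeToScrollUpdateSwapBegin4"
          : "Event.Latency.ScrollUpdate.Touch.TimeToScrollUpdateSwapBegin4";
  RecordLatency(histogram, latency_us);
}

int main() {
  ReportTouchScrollLatency(/*generated_by_fling=*/false, 42650);
  ReportTouchScrollLatency(/*generated_by_fling=*/true, 38193);
  return 0;
}
```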
Sep 26
Sadrul, since this happened on 70, I was wondering if you could take a closer look, as BrowserNotifiedToBeforeGpuSwap2 has regressed here.
Oct 1
Just to add a new UMA URL that shows the problem better: https://uma.googleplex.com/p/chrome/timeline_v2?sid=b032034cd7a8355682ee57662023a2ba It seems that although we see the regression in Canary and Dev on 70, it does not show up in Beta. Sadrul, is there a Finch trial or something else that you might be aware of?
Oct 4
Issue 889244 has been merged into this issue.
Oct 9
This hasn't recovered yet and we are about to hit Stable this week. Sadrul, how do you think we should proceed here?
Oct 10
I don't know of any Finch trials that would affect this. Looking at this graph: https://uma.googleplex.com/p/chrome/timeline_v2?sid=1f759a351b665dbaf809e4864f0efaae
For the Beta channel, I looked at the diff between B10 (70.0.3538.17) and B11 (70.0.3538.27): https://chromium.googlesource.com/chromium/src/+log/70.0.3538.17..70.0.3538.27?pretty=fuller I don't see any obvious candidates.
+khushal in case 8f470e057e10a730e9e81bafb7266367071b10a8 is relevant.
+nzolghadr@: any chance 05a520420b86ce02f12addfa8f35c50832b903c5 is relevant?
Does this repro locally on those builds? If the regression can be reproduced locally (like sahel@ did for the previous bug, issue 867581), it would be easier to bisect and investigate.
Oct 10
Hmmm, I'd be surprised if https://chromium.googlesource.com/chromium/src/+/8f470e057e10a730e9e81bafb7266367071b10a8 was related. It adds a lock to protect access to an object, but other than WebView the object should always be used from the GPU main thread, so the lock should be practically free. Another thing is that the code only affects OOP raster, which is being rolled out via Finch, so checking whether the regression only affects the population in the experiment would help narrow down whether this is related.
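To illustrate why such a lock is practically free when only one thread ever takes it, here is a minimal sketch of the pattern (the class and member names are hypothetical, not taken from the CL): an uncontended std::mutex lock/unlock is little more than a pair of atomic operations.

```cpp
#include <mutex>

// Hypothetical stand-in for the object the CL guards; not the real class.
class RasterState {
 public:
  int pending_work() const {
    // Only WebView touches this from multiple threads; on Android Chrome the
    // GPU main thread is the sole caller, so this lock is never contended.
    std::lock_guard<std::mutex> hold(lock_);
    return pending_work_;
  }

  void set_pending_work(int value) {
    std::lock_guard<std::mutex> hold(lock_);
    pending_work_ = value;
  }

 private:
  mutable std::mutex lock_;
  int pending_work_ = 0;
};

int main() {
  RasterState state;
  state.set_pending_work(3);
  return state.pending_work() == 3 ? 0 : 1;
}
```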
Oct 10
khushalsagar@, can you provide the name of the Finch study to take a look at, or the link if you have it handy?
Oct 10
Sure, here is the Finch JSON file: https://cs.corp.google.com/piper///depot/google3/googledata/googleclient/chrome/finch/studies/OutOfProcessRasterization_Android.json I tried the Finch study filter on the UMA link from this bug, but that didn't work: https://uma.googleplex.com/p/chrome/timeline_v2?sid=acd3e4ada8d1b3a3bb7fa36a6eb25140
Oct 10
The filter doesn't work with the version tag as the dominant feature. If I look at the timeline without it, though, it looks like both experiment groups show identical changes: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=733f3ab4d73a7c9e25f8b53e3471bac4
Oct 12
sahel@: any luck repro-ing locally on device?
Oct 15
sahel@, do you think you would be able to attempt a repro locally? That would be very useful.
Oct 15
Re comment #15: I am sorry I missed this earlier. I am already working on a bug that should be fixed for 70 and is missing tomorrow's 10% rollout: crbug.com/884728 I don't think I will have time for this repro. I talked to nzolghadr@ and he is going to investigate.
Oct 15
Just to answer some of the previous comments:
khushalsagar@: I don't know the current state of OOP raster or which Finch group number corresponds to the latest experiment, but there seems to be a significant jump in the metrics in the Dev channel for group 2: https://uma.googleplex.com/p/chrome/variations/?sid=7a31a0e4ec04ad40fb48e7003f096b5a The effect is much smaller in Beta, though. It is worth keeping an eye on it; feel free to cc me on anything related that you find.
sadrul@: I think the Beta range you looked at is too wide and is giving you some unrelated CLs. Looking at this: https://uma.googleplex.com/p/chrome/timeline_v2?sid=5f4241e0dfcfb82151c09d53fd622677 the widest imaginable regression range in Canary is somewhere between c29 and c31, which is https://chromium.googlesource.com/chromium/src/+log/70.0.3537.0..71.0.3545.0?pretty=fuller and doesn't include either my change or Khushal's. Still looking for the problem here...
Oct 17
From the UKM data it seems that the regression is very visible on the m.facebook website. I used our Galaxy Note 5 and was able to collect the following results (the ~99th percentile of Event.Latency.ScrollUpdate.Touch.BrowserNotifiedToBeforeGpuSwap2) for the two versions I tested locally. Latency is in microseconds.
**** 70.0.3538.17:
  experiment 1 (498 samples):  42650 {99.0%}
  experiment 2 (1542 samples): 38193 {99.2%}
**** 70.0.3538.27:
  experiment 1 (795 samples):  53185 {98.9%}
  experiment 2 (1651 samples): 143622 {99.0%}
These numbers are obviously quite noisy, but I believe it is fairly clear that 70.0.3538.27 has some sort of regression.
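For reference, here is a hedged sketch of how such a summary could be computed from locally collected samples: a nearest-rank 99th percentile over the BrowserNotifiedToBeforeGpuSwap2 latencies in microseconds. The exact aggregation behind the numbers above isn't spelled out in this thread, so this is just one reasonable way to reproduce it; the sample values in the sketch are placeholders, not the recorded data.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Nearest-rank percentile over a copy of the samples (pct in [0, 100]).
int64_t Percentile(std::vector<int64_t> samples, double pct) {
  std::sort(samples.begin(), samples.end());
  const size_t rank =
      static_cast<size_t>(pct / 100.0 * (samples.size() - 1) + 0.5);
  return samples[rank];
}

int main() {
  // Placeholder example values, not the actual recorded samples.
  std::vector<int64_t> latencies_us = {38000, 39500, 41000, 42650, 143622};
  std::cout << "p99 latency: " << Percentile(latencies_us, 99.0) << " us\n";
  return 0;
}
```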
Nov 7
Sadrul, the regression seems to have recovered: https://uma.googleplex.com/p/chrome/timeline_v2?sid=86cb071e0f99e9e05d496b3a23efb160 Are you aware of any changes happening in that area? For some reason, even though the regression lasted for the whole of M70 Beta and only got fixed in M71 Beta, M70 Stable does not show any similar regression, so this might be somewhat of a version-guarded Finch experiment as well. Is it alright to close this now?
Nov 12
I think we can close now, yes. I will WontFix as does-not-repro. I am not sure what triggered the initial regression, or what fixed it, unfortunately. I think having corresponding metrics in telemetry (issue 878420) would make testing/debugging this kind of regression much easier.
Comment 1 by nzolghadr@chromium.org, Sep 12. Status: Assigned (was: Available)