New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 867581 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 6
Cc:
Components:
EstimatedDays: ----
NextAction: 2018-08-30
OS: Android
Pri: 1
Type: Bug



Sign in to add a comment

Event.Latency.ScrollUpdate.Touch.TimeToScrollUpdateSwapBegin2 99th percentile regressed by almost 100% in M69

Project Member Reported by nzolghadr@chromium.org, Jul 25

Issue description

Owner: sadrul@chromium.org
Status: Assigned (was: Available)
Here is the sub-metric (Event.Latency.ScrollUpdate.Touch.BrowserNotifiedToBeforeGpuSwap2) that shows the same regression:
https://uma.googleplex.com/p/chrome/timeline_v2?sid=c5e19bdd729d16790073c58c7b25d479

Sadrul do you know what might have caused this?
Labels: -Pri-3 Pri-1
sardul@ this is a pretty significant regression. Could you take a look?
Cc: piman@chromium.org jdarpinian@chromium.org sunn...@chromium.org
+jdarpinian@, sunnyps@, piman@

Is it possible that https://chromium-review.googlesource.com/c/chromium/src/+/1077435 may have caused this regression?

Some context: the regression here is that the display-compositor attempts a gpu-swap much later than it used to (after receiving a compositor-frame). The timestamp for the display-compositor receiving a compositor-frame is set here https://cs.chromium.org/chromium/src/components/viz/service/frame_sinks/compositor_frame_sink_support.cc?l=368. The timestamp for the gpu swap-begin is set here: https://cs.chromium.org/chromium/src/components/viz/service/display/output_surface.cc?type=cs&sq=package:chromium&targetos=chromium-android&g=0&l=52 (I believe the timestamp comes from https://cs.chromium.org/chromium/src/gpu/ipc/service/pass_through_image_transport_surface.cc?type=cs&sq=package:chromium&targetos=chromium-android&g=0&l=205)
SwapBuffers explicitly flushes (https://cs.chromium.org/chromium/src/gpu/command_buffer/client/gles2_implementation.cc?q=GLES2Implementation::SwapBuffers&sq=package:chromium&g=0&l=1369), so a priori changes in implicit flushes shouldn't have an effect.
Cc: sadrul@chromium.org
Owner: nzolghadr@chromium.org
We discussed this offline. We will attempt to repro this locally, because that would make it much easier to investigate/bisect. Navid is going to try a local-repro, and we will then investigate after that.
Owner: sahel@chromium.org
We couldn't find a Pixel device and Pixel 2 wasn't showing the regression. Sahel found a Galaxy device and she is working on reproducing the issue.
Cc: bokan@chromium.org
Owner: sadrul@chromium.org
The issue is easily reproducible on a Samsung Galaxy Note5 device:
1- Download the two canary builds (for arm) with versions 69.0.3445.2 (good), 69.0.3447.3 (bad).
2- Install the older version. Open canary and navigate to https://m.facebook.com/login.php?next=https%3A%2F%2Fm.facebook.com%2Fhome.php&refsrc=https%3A%2F%2Fm.facebook.com%2Fhome.php&_rdr
3- Do a few scrolls up and down without logging in.
4- Go to chrome://histograms and look for BrowserNotifiedToBeforeGpuSwap2, the max value for the good version should be below 50000.
5- Do steps 2-4 with the new version and check that the max value for BrowserNotifiedToBeforeGpuSwap2 hits 200000.

Cc: fsam...@chromium.org
Status: Started (was: Assigned)
The regression seems to be at least partly attributable to crrev.com/562737. Following is the BrowserNotifiedToBeforeGpuSwap2 on the device on the various revisions in the rage (range being 562690-563438):

            r563438   r562737   r562736   r562690
50%-ile          9K       13K       11K       11K
90%-ile         47K       59K       30K       30K
99%-ile        128K      179K       53K       53K

There seems to be a large regression with 562737, some of which seems to have been eventually recovered by 563438, but not all.
I'm confused how this caused a regression. What is this metric measuring? Surface sync is still off by default...
Fix is up for review at https://chromium-review.googlesource.com/c/chromium/src/+/1187277 I think it'd make sense to land this and try out on canary before attempting to merge to stable, as we try to understand the regression better.
I'm reluctant to land a CL without understanding why it improves things...
I agree. But in this case, ideally, it would be a revert of the original CL, since it also caused a regression ( issue 855433 ). But because of the changes in the pagesets at that time, it took until Jul 4th to find the culprit CL (which, unfortunately, did not get triaged in due time).
Project Member

Comment 15 by bugdroid1@chromium.org, Aug 25

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5d1f8253fad3b518ca58c5fbf535dd0cc186826e

commit 5d1f8253fad3b518ca58c5fbf535dd0cc186826e
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Sat Aug 25 20:45:07 2018

viz: Recreate the surface layer without surface-sync.

crrev.com/562737 made a change to update the surface-layer with the
appropriate primary/fallback surfaces (and bounds) during surface
activation. This was to improve performance with surface-sync. However,
this ended up triggering a regression in the input pipeline (see linked
bugs). So, update the code so that when surface-sync is not used, the
surface-layer is destroyed during eviction, and recreated as needed.

BUG= 867581 

Change-Id: I0c07f5782c06de171bec131014b66eab9a09b879
Reviewed-on: https://chromium-review.googlesource.com/1187277
Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org>
Reviewed-by: Fady Samuel <fsamuel@chromium.org>
Cr-Commit-Position: refs/heads/master@{#586143}
[modify] https://crrev.com/5d1f8253fad3b518ca58c5fbf535dd0cc186826e/ui/android/delegated_frame_host_android.cc

NextAction: 2018-08-30
It looks like the UMA is beginning to recover after the fix: https://uma.googleplex.com/p/chrome/timeline_v2?sid=1168838c39ed27aad049e55b2b543aed I will track for a few more days before requesting a merge.
The fix has been live for a few days, and the UMA is recovering pretty nicely. So requesting a merge for crrev.com/586143 to M69.
Labels: Merge-Request-69
Labels: OS-Android
Project Member

Comment 20 by sheriffbot@chromium.org, Aug 30

Labels: -Merge-Request-69 Merge-Review-69 Hotlist-Merge-Review
This bug requires manual review: We are only 4 days from stable.
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), kariahda@(iOS), cindyb@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
The NextAction date has arrived: 2018-08-30
Labels: -Hotlist-Merge-Review -Merge-Review-69 Merge-Approved-69
Approved for merge into 69, branch 3497.
Sadrul, beside this original bug and the commit range we have  issue 878059  and  issue 876753  which are regressions after this bug. From the graph It seems that the latest change recovered them all.
https://uma.googleplex.com/p/chrome/timeline_v2?sid=e3617cba547f2f40e569261658c374d5

Can you confirm that your revert indeed fixed them all and it was intentional?
Do you mind also comment in the hints on the graph of the other regressions about the result of the investigations.
Project Member

Comment 24 by bugdroid1@chromium.org, Aug 31

Labels: -merge-approved-69 merge-merged-3497
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f2f429ddd35b34d9eafeeef83a23b1a95f8d6507

commit f2f429ddd35b34d9eafeeef83a23b1a95f8d6507
Author: Sadrul Habib Chowdhury <sadrul@chromium.org>
Date: Fri Aug 31 06:36:36 2018

[Merge to M69] viz: Recreate the surface layer without surface-sync.

crrev.com/562737 made a change to update the surface-layer with the
appropriate primary/fallback surfaces (and bounds) during surface
activation. This was to improve performance with surface-sync. However,
this ended up triggering a regression in the input pipeline (see linked
bugs). So, update the code so that when surface-sync is not used, the
surface-layer is destroyed during eviction, and recreated as needed.

BUG= 867581 

(cherry picked from commit 5d1f8253fad3b518ca58c5fbf535dd0cc186826e)

Change-Id: I0c07f5782c06de171bec131014b66eab9a09b879
Reviewed-on: https://chromium-review.googlesource.com/1187277
Commit-Queue: Sadrul Chowdhury <sadrul@chromium.org>
Reviewed-by: Fady Samuel <fsamuel@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#586143}
Reviewed-on: https://chromium-review.googlesource.com/1198745
Reviewed-by: Sadrul Chowdhury <sadrul@chromium.org>
Cr-Commit-Position: refs/branch-heads/3497@{#859}
Cr-Branched-From: 271eaf50594eb818c9295dc78d364aea18c82ea8-refs/heads/master@{#576753}
[modify] https://crrev.com/f2f429ddd35b34d9eafeeef83a23b1a95f8d6507/ui/android/delegated_frame_host_android.cc

Status: Fixed (was: Started)
From looking at the graph https://uma.googleplex.com/p/chrome/timeline_v2?sid=21b083f14143e73dfa17d5240a017e75 it does look like the CL fixes all the regressions. So I am closing all three bugs.

However, it does look like there has been a new regression since then :-/

Sign in to add a comment