New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 591725 link

Starred by 4 users

Issue metadata

Status: WontFix
Owner:
Closed: Aug 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug-Regression



Sign in to add a comment

132.9% regression in smoothness.top_25_smooth at 367701:367719

Project Member Reported by tdres...@chromium.org, Mar 3 2016

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=591725

Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?keys=agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDAq8XfuQoM


Bot(s) for this bug's original alert(s):

android-nexus5
Cc: tdres...@chromium.org
Owner: siev...@chromium.org
This regression has occurred twice - the only patch common to both cases is https://chromium.googlesource.com/chromium/src/+/7f045e4f2e9f7142c79e2eeb30b70716a1ea8f0b.

This looks extremely likely to be the issue.
sievers@, can you take a look?

Comment 3 by kouhei@chromium.org, Mar 14 2016

friendly ping
Cc: siev...@chromium.org
Status: Available (was: Assigned)
Sorry it'll still be a few days until I'll be able to get to this. So sorry.
sievers: ping?
How does one find out what's being measured where in the code for a metric?
(code search for 'mean_input_event_latency' is not very conclusive)
48.0.2564.95 does not have my change.
49.0.2623.91 does have my change.

Now I've looked at all the interesting histograms computed in render_widget_host_latency_tracker.cc:ComputeInputLatencyHistograms() which seem to cover anything in between touch event and swap happening on the GPU thread in various combinations. But I don't see any interesting changes there in the mean values between the two releases.
Oh and I also missed that this is specific to youtube.com, where instead of around 12ms we now have a more unstable measure between roughly 9 an 28ms.

That looks like an extra frame's time (on top of the 12ms baseline) in a lot of the measurements. But this really needs understanding what we're measuring here.

Cc: vmi...@chromium.org
tdresser, vmiura, is there documentation on mean_input_event_latency anywhere? The smoothness benchmark documentation links to http://www.chromium.org/developers/design-documents/rendering-benchmarks but it does not include anything about input.
Cc: rbyers@chromium.org
rbyers@ could you help with the question above on mean_input_event_latency?
I've added:
"mean_input_event_latency: time between receiving a touchmove event and performing the frame swap for the corresponding scroll event."

To https://sites.google.com/a/chromium.org/dev/developers/design-documents/rendering-benchmarks.

Is that enough information?
So basically the mean of this?

void ComputeScrollLatencyHistograms(
...
 for (size_t i = 0; i < original_component.event_count; i++) {
    UMA_HISTOGRAM_TOUCH_TO_SCROLL_LATENCY(
        "Event.Latency.TouchToScrollUpdateSwapBegin", original_component,
        gpu_swap_begin_component);
 }
Yup, it's the same values.

mean_input_event_latency doesn't give you any results because the mean is prepended separately. See
https://code.google.com/p/chromium/codesearch#chromium/src/third_party/catapult/telemetry/telemetry/web_perf/metrics/smoothness.py&l=69
for a reasonable place to start, or 
https://code.google.com/p/chromium/codesearch#chromium/src/third_party/catapult/telemetry/telemetry/web_perf/metrics/rendering_stats.py&l=66
where the latency is actually determined.
Cool, thanks, that'll get me started - I'll need to look at some traces.
Cc: briander...@chromium.org enne@chromium.org sunn...@chromium.org
Ok here's what I've found out so far:

The slow frames that are distorting the mean have a browser composite (besides just being a frame from the renderer) which is caused by the top-controls logic (omnibox hiding... we disable the actual movement for tests... I think to reduce noise, but still go through the code.. so looks like we did that at the wrong level).

The reason that this shows up on youtube only is I think because the scroll is really short on this page. I get around 15 frames of which the ones around the ends (which trigger the top-controls path and browser composite) are slow which is good enough to distort the mean.

I do see that there is some slight change in scheduling now that the browser compositor uses cc scheduler. It overall does more work, and it splits work up into more tasks so things drift a little more. I'm not sure this explains it alone yet, I have to try to restore the old behavior and compare side-by-side.
Again note that this only affects the frames when the top controls are shifting.

One other thing that stands out and could be responsible related to that is this sequence I see now on bad frames:
BC: browser compositor
DC: display scheduler

1) VSYNC
2) BC_beginframe
3) BC_NotifyReadyToCommit
4) OnSwapCompositorFrame (from renderer)
5) BC_OnBeginImplFrameDeadline
6) DS_OnBeginImplFrameDeadline

*IIUC* and from following the scroll update flow event, this actually adds a frame of latency because the browser commit 3) happens before 4) so the display scheduler is probably latching the previous frame (remember it has to for it to match the top controls scroll offset - we force that by deleting the renderer's Surface in 4)).

This would explain things since before my patch we'd have more likely forced the sequence of: OnSwapCompositorFrame->Browser_Composite.

And finally +enne@ for just something flaky I noticed on the way:

The CommitVSyncParameters() logic and how it interacts with the display scheduler is probably a bit prone to drift. When we call CommitVSyncParameters() is (totally unintentionally) dependent on whether there is a browser composite (and that requests a vsync). This will totally get better with the UBF stuff enne@ is doing. I don't think it's the main problem here though.




The display scheduler latches the previous frame because 4 happens before 5, right? You said it happens because 3 happens before 4 and I don't see why that's a problem.
I actually haven't dug into how this all works in detail wrt display scheduler and surface aggregation. In RWHVAndroid we destroy the current surface and recreate the browser compositor's SurfaceLayer if the top control offset changes in 4). And FWIU that has the intended side effect of needing a new browser frame for this new renderer frame.

So I was guessing that a browser commit after 4) would produce a proper Surface state for all of this to go on the screen together.
So in case it wasn't clear: the intention of the code is to basically bypass the direct Surface submission from the renderer when the top control offset changes because there is a layer in the *browser* Surface with a texture for the omnibox which needs to be positioned according to the scroll offset of the renderer Surface's current frame (send through CompositorFrame metadata).

But my comments might be inaccurate in terms of how and if that actually works in detail.
Cc: nyerramilli@chromium.org
 Issue 571188  has been merged into this issue.
 Issue 600375  has been merged into this issue.
Ok looks like I can't defer the browser compositor with a deadline to wait for the renderer frame as long as there is an update to the main tree involved, because BeginMainFrame will still happen immediately and if UpdateLayerTreeHost() causes any 'needs commit' then it will clobber my deadline to immediately.

On the other hand, deferring BF itself also seems like a hack.

Sunny suggests that it'd be easier if the animations on the browser compositor tree were impl-side. So that'd have to at least cover omnibox hiding, but maybe  also overscroll glow.



I can restore the previous ordering which yields lower latency by simply posting an immediate BeginFrame when NeedsBeginFrames() turns on (when the renderer frame arrived and the change to the SurfaceLayer causes SetNeedsCommit).

Then on the next vsync it will defer the BeginFrame because nothing changed (*) but immediately post it when the top controls offset changes again on the next OnSwapCompositorFrame. And so forth.

(*)

I did have to remove this requestRender() call here (which causes SetNeedsAnimate() for no reason):

CompositorViewHolder.java:
 public void onVisibleContentOffsetChanged(float offset) {
        mLastVisibleContentOffset = offset;
        propagateViewportToLayouts(getWidth(), getHeight());
        requestRender(); <---
    }

Ok two more things I noticed while debugging this:

a) retro begin frame args cause badness because we don't clean them up when we go idle and turn off the BFS. Once we come back from idle it causes a skipped deferred first frame (effectively pushing us into high latency mode), I think because the timestamp is too old.

b) we are not particularly good with SetNeedsBeginMainFrame() so you can also end up in high latency mode because we start another BeginMainFrame though there were no changes. This is because WillBeginMainFrame() can go on and off repeatedly before there is an actual BeginFrame started. For example if you request SetNeedsCommit() and SetNeedsAnimate() from the same callstack, it causes two BeginMainFrames with SingleThreadProxy.

Were you able to land the changes you described?
Not yet, still discussing what really needs to happen for this to work best, while at the same time the BeginFrame plumbing is changing (also see  crbug.com/604007  which I think is similar).
Project Member

Comment 28 by bugdroid1@chromium.org, Apr 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6414a4526b77dd087581d8a9c1cec50affc66333

commit 6414a4526b77dd087581d8a9c1cec50affc66333
Author: sievers <sievers@chromium.org>
Date: Fri Apr 22 21:33:20 2016

Android: Browser-side scheduling latency tweaks

    - Emit an immediate 'missed' BeginFrame when an
      BFObserver is added instead of waiting for next vsync
    - Remove unneeded requestRender() (=SetNeedsAnimate()) call

    The former helps both the browser compositor and display
    scheduler (now that it uses the same BeginFrameSource).

    Regarding the browser compositor:
    When the omnibox is hiding (or selection handles are showing)
    we need to browser composite with the correct scroll offset
    for a given renderer frame.

    Without this patch we run:
    ScrollBegin VSync SwapCompositorFrame VSync BrowserComposite

    Now we do:
    ScrollBegin VSync SwapCF MissedBF BrowserComposite VSync

    In the next interval, this works because if nothing changed yet,
    the scheduler will defer the BeginMainFrame until there is
    another change in the browser compositor tree (MODE_LATE).

    Calling SetNeedsAnimate()) from CompositorViewHolder when the
    top control offset changes is unnecessary, since this is a
    renderer-driven animation (causes an invalidation in the browser
    compositor tree), so will already cause the necessary
    UpdateLayerTreeHost(). Requesting SetNeedsAnimate()
    causes an extra BeginMainFrame() (SingleThreadProxy+scheduler
    are not particularly good at tracking these between needs_commit
    and needs_animate) and puts us into high latency mode otherwise.
    Fileed bug  crbug.com/602489  to make this more robust in
    SingleThreadProxy and scheduler.

TBR=dtrainor@chromium.org
BUG= 591725 , 602489 , 602485 , 604007 

Review URL: https://codereview.chromium.org/1879833002

Cr-Commit-Position: refs/heads/master@{#389248}

[modify] https://crrev.com/6414a4526b77dd087581d8a9c1cec50affc66333/chrome/android/java/src/org/chromium/chrome/browser/compositor/CompositorViewHolder.java
[modify] https://crrev.com/6414a4526b77dd087581d8a9c1cec50affc66333/content/browser/renderer_host/compositor_impl_android.cc

The CL mentioned in #27 (Remove retro frames from scheduler.) Has not yet landed, and the metric has not recovered.

@sunnyps, could you comment?
The CL in #28 should have addressed this.
The graph may have gotten a little better, but it's not fully resolved. sievers, can you investigate again?
Friendly Perf Sheriff ping!
sievers@chromium.org, Did you get chance to investigate #31
Friendly perf-sheriff ping. Any update re #31?
Labels: -performance-sheriff Performance-Sheriff
Sievers, ping!

This is still pretty bad.
sievers, this still needs some investigation.

Though it is only a single bot/metric, so maybe we're okay...
Emailed sievers.
Downgrading since this only affects latency while the omnibox is animating and therefore also wasn't noticeable in the overall UMA histograms from the field for these latency measurements. It was only noticeable in one test IIRC that did a very short scroll of a few frames duration during which the scroll offset changed the whole time - ironically the omnibox hiding was supposed to be disabled for noise like this in tests, but it turns out it's disabled 'visually' (the layer doesn't move) but still hits this codepath with the side effects.

The reason this happens is that we use cc surfaces and the cc (as well as display) scheduler now. Before we had some special cased logic that dealt with this specific use case a little better.

Unfortunately, now it's up to the cc scheduling logic to handle this use case in an ideal fashion. It's probably an uncommon and Android-specific use case where the renderer frame causes the UI to invalidate (because of the scroll offset and omnibox needing to move in the UI layer tree in the browser). In fact, with cc Surfaces the whole design was centered around having less dependencies (and less of a hierarchy) between the different compositors.

However, maybe it's not totally unreasonable to support this optimization where the browser compositor handles the MISSED BeginFrame from SetNeedsBeginFrame() going high when the renderer sends a frame where the scroll offset changed.
I thought I had it pretty much working with the patch I landed. I did see the retroactive begin frame logic in the scheduler causing issues occasionally, so it's possible that finishing and landing 
https://codereview.chromium.org/1887243002/ would help here.

It's also possible though that at the end of the day it remains a bit fragile, esp. if there are independent invalidations for the UI also (the browser compositor scheduler would have to know that it might want to wait for a renderer frame). But then that was sort of the case before also. In other words, the previous logic wasn't necessarily better overall (it was simpler, and less generic, so hard to say).

Labels: -Pri-2 Pri-3
I should say though that in terms of weighing complexity (in terms of modifying the scheduler) vs. the severity of the problem, the task of removing retro frames from the scheduler will actually remove complexity :)

So definitely worth seeing how this helps.
In terms of the tests we do here I think it would be better to:
- totally disable omnibox hiding for all the regular smoothness tests
- maybe have a specific test that measures the latency for omnibox hiding specifically

Not sure how to find someone to do that though.
Perf sheriff ping: reminder to follow up on possible performance issues
Owner: sunn...@chromium.org
Over to Sunny. It's possible that things are better with the retroactive frame logic gone.
Ping!
Owner: vmi...@chromium.org
Status: Assigned (was: Available)
Anyway we can get some of Sunny's time to investigate?
Cc: skyos...@chromium.org
+ more scheduling folks.  I'm afraid Sunny may not have time to look at this till Jan.
This hasn't moved in 8 months. The graph eventually came back down. Should we WontFix?
Status: WontFix (was: Assigned)
#46, agree this looks like it fully recovered at one point.

Sign in to add a comment