New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 737379 link

Starred by 8 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug-Regression



Sign in to add a comment

50.1%-162.4% regression in smoothness.top_25_smooth at 482390:482500

Project Member Reported by sullivan@chromium.org, Jun 28 2017

Issue description

See the link to graphs below.
 
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Jun 28 2017


=== BISECT JOB RESULTS ===
Bisect failed for unknown reasons

Please contact the team (see below) and report the error.


Bisect Details
  Configuration: android_nexus6_perf_bisect
  Benchmark    : smoothness.top_25_smooth
  Metric       : mean_input_event_latency/LinkedIn


To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=LinkedIn smoothness.top_25_smooth

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8975581277313720416

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5527677768826880


| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection.  Thank you!
Project Member

Comment 5 by 42576172...@developer.gserviceaccount.com, Jun 28 2017

Cc: chongz@chromium.org
Owner: chongz@chromium.org

=== Auto-CCing suspected CL author chongz@chromium.org ===

Hi chongz@chromium.org, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Chong Zhang
  Commit : 379f637084f5ccfe96bba687afaad8a75f53c3d5
  Date   : Mon Jun 26 21:23:59 2017
  Subject: Re-enable VsyncAlignedInput to Field Trial

Bisect Details
  Configuration: android_nexus6_perf_bisect
  Benchmark    : smoothness.top_25_smooth
  Metric       : mean_input_event_latency/LinkedIn
  Change       : 172.67% | 5.097 -> 13.8978333333

Revision             Result                  N
chromium@482389      5.097 +- 0.433862       6      good
chromium@482403      6.174 +- 3.31565        6      good
chromium@482404      5.187 +- 0.506958       6      good
chromium@482405      13.1253 +- 3.59629      6      bad       <--
chromium@482407      14.2202 +- 2.93612      6      bad
chromium@482410      13.7988 +- 1.60684      6      bad
chromium@482417      13.9547 +- 3.11471      6      bad
chromium@482445      14.899 +- 2.73114       6      bad
chromium@482500      13.8978 +- 2.3433       6      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=LinkedIn smoothness.top_25_smooth

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8975577441039686128

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5527677768826880


| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection.  Thank you!
Project Member

Comment 6 by 42576172...@developer.gserviceaccount.com, Jun 28 2017

Cc: tdres...@chromium.org
 Issue 737547  has been merged into this issue.
Project Member

Comment 7 by 42576172...@developer.gserviceaccount.com, Jun 28 2017

 Issue 737383  has been merged into this issue.
Status: Assigned (was: Untriaged)
Cc: rmcilroy@chromium.org
 Issue 739299  has been merged into this issue.
Project Member

Comment 10 by bugdroid1@chromium.org, Jul 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/990aaa397ddbd256f8184a4168cf6873ef2455a2

commit 990aaa397ddbd256f8184a4168cf6873ef2455a2
Author: Chong Zhang <chongz@chromium.org>
Date: Thu Jul 20 22:31:29 2017

[Android] Use oldest historical event time as hardware timestamp

On other platforms we recored hardware timestamp to measure end-to-end
latency, however on Android the 'hardware timestamp' we got is actually
a resample (or latest) timestamp due to it's event batching mechanism.

This patch switched to use the oldest historical event time to match
other platforms and for better measuring of end-to-end performance.

Expecting to see regressions in UMA similar to
'Event.Latency.ScrollUpdate.Touch.TimeToScrollUpdateSwapBegin2',
e.g. Probably a 10ms ~ 20ms increase.

Note: This patch has no effect on telemetry tests.

Bug:  737379 
Change-Id: I3bb4349ec1a47dc497817d70c2c5a64ed40f910a
Reviewed-on: https://chromium-review.googlesource.com/577962
Reviewed-by: Dave Tapuska <dtapuska@chromium.org>
Reviewed-by: Alexandre Elias <aelias@chromium.org>
Reviewed-by: Theresa <twellington@chromium.org>
Commit-Queue: Chong Zhang <chongz@chromium.org>
Cr-Commit-Position: refs/heads/master@{#488447}
[modify] https://crrev.com/990aaa397ddbd256f8184a4168cf6873ef2455a2/ui/android/java/src/org/chromium/ui/base/EventForwarder.java

Labels: Hotlist-Input-Dev OS-All
Status: Fixed (was: Assigned)
There are 3 issues involved in this bug:
1. All Platforms: 1ms (10%) delay in |mean_input_event_latency|
  * We will accept the latency in order to have events aligned by VSync
2. macOS: Extra 1 frame delay in |first_gesture_scroll_update_latency|
  * Fixed in  issue 737547 #c4
3. Android: Extra 4ms (30%) delay in |mean_input_event_latency|
  * The telemetry tests didn't simulate Android's native event batching behavior and the measurement didn't take this latency into account. In fact the latency should be improved according to high speed camera tests.
  * The patch in #c10 fixed the UMA, and we should see improvements in it later.

Closing as fixed.

Cc: briander...@chromium.org sunn...@chromium.org
chongz@, there was a regression in smoothness (jank_count / percentage_smooth) with this change.

Related graphs: https://chromeperf.appspot.com/report?sid=fee25f97188f0d6dfe1b6136a92090a3dd37f4cc7d4c501200045c2b39a66ef2&start_rev=461341&end_rev=494400

Do we have an understanding why that is, and if smoothness really regressed?  For example the Blogger page jank_count increased from 0~1 to 5~6 range.
Status: Started (was: Fixed)
Re #12: Hmmm I don't see those graphs in the bug but it looks like to be related. Will dig into it, thanks!
 Issue 737977  has been merged into this issue.
Cc: vmi...@chromium.org
Re #12: After inspection I think the regression on telemetry tests is expected.

In short, regressions on Android jank_count / percentage_smooth are caused by longer frame_times. 
e.g. frame_times for Blogger increased from 16.8+-1.4ms to 17.2+-2.8ms, and we are using 17ms as smooth_threshold.

However as indicated in #11-3. this doesn't reflect real world performance.
The real world performance should be worse than old telemetry tests result, and my change moves the telemetry tests result closer to real world performance.

Does that sound reasonable?
chongz@, I think the The change in latency info shouldn't have affected the frame_times.  The frame_times come from swap timing SurfaceFlinger, and we shouldn't be seeing a regression in the length or regularity of frames.  That indicates the experience got worse than before.

I've attached before & after traces from the bots which do seem to show the swap latencyInfo events on CrGpuMain thread became more irregular.

I will kick off a bisect to confirm if the change in jank_count is related to your change or something else.
before-trace-file-id_7-2017-06-26_21-48-38-50966.html
3.6 MB View Download
after-trace-file-id_7-2017-06-27_02-18-24-53928.html
3.6 MB View Download
Project Member

Comment 18 by 42576172...@developer.gserviceaccount.com, Aug 30 2017

Status: Assigned (was: Started)

=== Auto-CCing suspected CL author chongz@chromium.org ===

Hi chongz@chromium.org, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Chong Zhang
  Commit : 379f637084f5ccfe96bba687afaad8a75f53c3d5
  Date   : Mon Jun 26 21:23:59 2017
  Subject: Re-enable VsyncAlignedInput to Field Trial

Bisect Details
  Configuration: android_nexus5_perf_bisect
  Benchmark    : smoothness.gpu_rasterization.top_25_smooth
  Metric       : jank_count/Blogger
  Change       : 485.71% | 1.16666666667 -> 6.83333333333

Revision             Result                  N
chromium@482365      1.16667 +- 4.33974      6      good
chromium@482396      2.33333 +- 4.16333      6      good
chromium@482404      1.5 +- 3.39116          6      good
chromium@482405      7.0 +- 4.47214          6      bad       <--
chromium@482406      5.83333 +- 2.61406      6      bad
chromium@482408      6.16667 +- 3.85141      6      bad
chromium@482411      5.77778 +- 3.09121      9      bad
chromium@482426      7.16667 +- 2.61406      6      bad
chromium@482487      6.83333 +- 2.97209      6      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=Blogger smoothness.gpu_rasterization.top_25_smooth

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8969800955303446912


For feedback, file a bug with component Speed>Bisection
Cc: pmeenan@chromium.org mustaq@chromium.org
 Issue 740120  has been merged into this issue.
vmiura@, thank you for the trace! I'm not exactly sure how SurfaceFlinger works, but please let me explain how I interpret the trace & |frame_time| regression:

1. I assume we don't produce a frame if cc didn't receive an input.
2. My feature queues input in |InputHandlerProxy|, and forward them to cc on |WillBeginImplFrame()|.
3. According to the attach picture, 'Event A' arrives too late to be picked up by current begin frame, thus will be delayed to next begin frame.
4. CrGpuMain won't do |DrawAndSwap()| since cc didn't receive an input this frame. However |frame_time| will be reported as 32ms as if we lost a frame.

I will keep learning how SurfaceFlinger works, but please let me know if the explanation above makes sense to you (or assumption 1. doesn't hold).

Thanks!
after-trace-file-id_7-2017-06-27_02-18-24-53928-slice.png
181 KB View Download
It does look like the regression isn't real and is just an alignment issue with our synthetic input events.

We'll need to fix the synthetic input events so they are more realistic though.

If we made the synthetic input event rate double the display refresh rate, that would more reflect how some phones/tablets work if we request raw input events. It would also make sure we get at least 1 input event per frame.

Is that something we can do? The hardest part would probably be determining what is signal vs. noise in our benchmarks during the transition.

Alternatively/additionally, if we had input extrapolation, we could avoid the input vs. BeginFrame race. The race would still be a problem even if the input events come in at double the refresh rate since some frames might see 1 or 3 events, which wouldn't look good even if we are hitting 60fps.
chongz@ I think that explanation makes sense.

What we're looking from these tests is to show that our systems can sustain a solid 60fps in scrolls.  If our synthetic gestures aren't generating 60fps of scrolling it's a problem as it conflates with what the test is trying to measure.
> If we made the synthetic input event rate double the display refresh rate, that would more reflect how some phones/tablets work if we request raw input events. It would also make sure we get at least 1 input event per frame.

This should be the next step of  issue 722921  as described in the doc:
https://docs.google.com/document/d/1_Apee7VRgTFKWt8Mq_1cEk_WMRYxwyhR_G0MgM-8H9c/edit


> Is that something we can do? The hardest part would probably be determining what is signal vs. noise in our benchmarks during the transition.

I'd imagine we could shorten the timer in |SyntheticGestureController::StartTimer|, which currently is 16666us.
But yes the hard part is to analysis perf.


> Alternatively/additionally, if we had input extrapolation, we could avoid the input vs. BeginFrame race. The race would still be a problem even if the input events come in at double the refresh rate since some frames might see 1 or 3 events, which wouldn't look good even if we are hitting 60fps.

By 'input extrapolation' I assume you are referring to event resampling? eirage@ has plan on this feature so |InputHandlerProxy| could present smoother events to cc. (No tracking bug yet though)

> What we're looking from these tests is to show that our systems can sustain a solid 60fps in scrolls.  If our synthetic gestures aren't generating 60fps of scrolling it's a problem as it conflates with what the test is trying to measure.

That sounds reasonable, thanks!

Anyway I will consider this regression about jank_count / percentage_smooth as fixed unless I found something new about SurfaceFlinger, and this bug shouldn't be blocked by synthetic events issues.
Components: Blink>Input
Status: Fixed (was: Assigned)
Mark as fixed as per #23.
Issue 777732 has been merged into this issue.

Sign in to add a comment