New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 749357 link

Starred by 0 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug-Regression



Sign in to add a comment

103.5% regression in smoothness.sync_scroll.key_mobile_sites_smooth at 487936:488051

Project Member Reported by m...@chromium.org, Jul 27 2017

Issue description

See the link to graphs below.
 
Project Member

Comment 1 by 42576172...@developer.gserviceaccount.com, Jul 27 2017

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=749357

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=f4c6399fb0fa0033429299de2d378031f3c9c6306bb955e183770ddbd9c9e945


Bot(s) for this bug's original alert(s):

android-nexus6
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Jul 27 2017

Cc: bokan@chromium.org
Owner: bokan@chromium.org

=== Auto-CCing suspected CL author bokan@chromium.org ===

Hi bokan@chromium.org, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : David Bokan
  Commit : 6933fe8a1b1d14e0d36a69cb27ef026217595d95
  Date   : Wed Jul 19 22:39:54 2017
  Subject: Remove inert-visual-viewport flag (enabled by default)

Bisect Details
  Configuration: android_nexus6_perf_bisect
  Benchmark    : smoothness.sync_scroll.key_mobile_sites_smooth
  Metric       : first_gesture_scroll_update_latency/Wikipedia (1 tab) - delayed scroll start
  Change       : 85.87% | 10.2928333333 -> 19.1313333333

Revision             Result                  N
chromium@487935      10.2928 +- 5.43773      6      good
chromium@487993      10.0962 +- 4.49807      6      good
chromium@488008      11.0813 +- 7.99643      6      good
chromium@488015      9.27817 +- 3.79522      6      good
chromium@488017      10.4223 +- 9.39564      9      good
chromium@488018      17.4914 +- 8.23915      9      bad       <--
chromium@488019      17.6368 +- 8.91969      9      bad
chromium@488022      18.4819 +- 8.36105      9      bad
chromium@488051      19.1313 +- 2.59009      6      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=Wikipedia..1.tab....delayed.scroll.start smoothness.sync_scroll.key_mobile_sites_smooth

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8972959419037463728


For feedback, file a bug with component Speed>Bisection

Comment 4 by bokan@chromium.org, Jul 27 2017

Labels: OS-Android
Status: Started (was: Untriaged)
Repro'd locally, taking a look today.

Comment 5 by bokan@chromium.org, Jul 27 2017

Cc: -bokan@chromium.org tdres...@chromium.org
Tim: I'm having a hard time understanding where the numbers are coming from here, I'm wondering if you could help shed some light.

I've attached screenshots of the trace (from bots) for 488051 (bad - 18.865ms) and 487935 (good - 9.791ms). To me it seems that the first GSU latency is equal between the two, and it's the following GSUs that are taking a longer time in the bad trace. Am I looking at the wrong thing?

Also, without getting more trace data it'll be hard to figure out what the difference is. I've tried adding --profiler=trace to run_benchmark but that throws an exception (presumably because tracing is already being run to collect this data). I've noticed I can add categories using --extra-chrome-categories but I can't find where/if the tracing data is being saved anywhere so I can't get at it. Do you know how this is supposed to work? Any debugging tips here would be much appreciated. Thanks.


bad.png
238 KB View Download
good.png
221 KB View Download
Yeah, getting a trace and the metrics in the same run will be pretty necessary here.

It's annoying, but here's what I do:

tools/perf/run_benchmark --browser=android-chrome-dev smoothness.top_25_smooth --output-format=json --output-dir=~/foo

# in timeline_based_measurement.py (third_party/catapult)
class Options::__init__
overhead_level = DEBUG_OVERHEAD_LEVEL
Once you've got the traces, can you attach them on the bug?

Comment 8 by m...@chromium.org, Jul 27 2017

Cc: -m...@chromium.org

Comment 9 by bokan@chromium.org, Jul 27 2017

Awesome, thanks Tim - traces attached. Locally it seems that the difference on the first GSU latency is more stark so maybe I just got unlucky with traces from the dashboard. Still not sure why the numbers in the metric are detached from the length of the first GSU latency though...

Can't see yet what difference my change made but I'll dig in a little deeper on Monday (I'm off tomorrow).
good.html
4.7 MB View Download
bad.html
5.6 MB View Download
The way the trace event is drawn doesn't quite match the way we compute the duration for the metric, if I recall correctly.

Code is here for computing this metric:
https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/web_perf/metrics/rendering_stats.py?type=cs&q=gesture_scr+file:%5C.py&sq=package:chromium&l=30

The end timestamp is INPUT_EVENT_GPU_SWAP_BUFFER_COMPONENT.
The start timestamp is INPUT_EVENT_LATENCY_ORIGINAL_COMPONENT.

Good:164322450046 - 164322437000 = 13046
Bad: 164057305358 - 164057284000 = 21358

That's about the same as the difference in slice duration...

It would be worth checking which delta between components is causing the change.
Ok, I've figured out what's going on. This is indeed caused because innerWidth/innerHeight report the layout viewport size rather than the visual.

On the Wikipedia case, the page initially loads into a larger viewport. You can verify this as the page is initially able to zoom out. As it loads, the minimum-scale eventually becomes 1, and so the layout viewport becomes device-width (this can happen either because content is removed so it no longer overflows the device-width or by explicitly setting minimum-scale in the viewport meta, I suspect the latter here).

What appears to be happening here is that the page is checking the innerWidth/Height during loading and using that to determine if the user is on a tablet and showing a different UI (I think just the headings are all loaded expanded). Since we now report layout viewport for innerWidth/Height, the page sees the intermittent large layout viewport and thinks its on a tablet. This caused more work and the buffer swaps on the GPU to get pushed back (the traces show that in the "tablet mode" the swaps were happening half way into the next frame, rather than right at the beginning) and so the GSU doesn't get shown until the next frame, hence the large delay.

I suspect this has been fixed since I don't see the same problem when I visit the live page today. I think the fix here is to rerecord the WPR? Tim, do you know what the process is for updating these? 

There's a bit of compat concern here as this could come up on other pages but I expect this not to be too bad since we've already shipped this for a milestone and haven't seen this in the wild. Hopefully this kind of viewport trickery is rare since it relies on timing and there's much better ways to do this (e.g. media queries).
Docs on updating wpr recordings are here: https://www.chromium.org/developers/telemetry/record_a_page_set.

Your analysis SGTM, rerecording is the right approach.
Ok, so I created a new recording and, though there's no difference based on inert anymore, it seems that the page now has some heavier JS on it...the first GSU latency went way up (90-130ms) as we're inside a long running JS function (trace attached) when the first GSU comes in. This has made the metric much noisier. Additionally, there's a hiccup every ~700ms as a JS timer fires which will probably reduce reliability on the other metrics. Given that, it seems like a bad trade to me so I'm inclined to just leave the current recording and use the new number as the  baseline. WDYT?
Wikipedia__1_tab____delayed_scroll_start02017-08-03_12-20-15.html
6.7 MB View Download
Cc: nednguyen@chromium.org
+Ned

I'd be inclined to perform the update.
If the web gets worse, we still want our benchmarks to reflect the current state of the web.

We have lots of noise vs representativeness tradeoffs, and much of the time I side with getting rid of noise, but in this case, I think representativeness should win out.

Ned, what do you think?
+1 to perform the update. Though please do it with "--use-wpr-go" flag (we are migrating to use wprgo - bit.ly/wpr-go-project).

I agree that representativeness should win out as this effect how we optimize Chrome code. We would be better addressing noise later by improving the way we compute metrics, tuning CPU cores, repeat the test more time... 
yeah, I thought about that but wasn't sure where we draw the line so figured you guys might have more insight. I'm off tomorrow but I'll update the recording after the (long) weekend. Thanks for the help!

Comment 17 by bokan@chromium.org, Aug 15 2017

Hey Ned, I tried using --use-wpr-go but it says that's already the default (perhaps it's been flipped in the meantime). However, I'm running into  issue 755654 . It looks like it doesn't like the fact that the existing archives are .wpr but it's trying to save a .wprgo. How should I proceed?
I answered in  issue 755654 
Project Member

Comment 19 by bugdroid1@chromium.org, Aug 23 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2331c0f6e3a7a6df0883486ac70a04f2bb539e8c

commit 2331c0f6e3a7a6df0883486ac70a04f2bb539e8c
Author: David Bokan <bokan@chromium.org>
Date: Wed Aug 23 13:45:31 2017

Update Wikipedia story in key_mobile_sites

The recording of this was old and causing the site to "misload" when the
inert-visual-viewport feature was turned on. The recording has been
uploaded to cloud storage. This patch simply updates the hash to match
the uploaded file.

Bug:  749357 
Change-Id: I2a877d5f45eab5e7597f3389dda833dc27c2b72d
Reviewed-on: https://chromium-review.googlesource.com/621295
Commit-Queue: David Bokan <bokan@chromium.org>
Reviewed-by: Ned Nguyen (ooo til 08/30) <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#496671}
[modify] https://crrev.com/2331c0f6e3a7a6df0883486ac70a04f2bb539e8c/tools/perf/page_sets/data/key_mobile_sites_009.wprgo.sha1

Comment 20 by bokan@chromium.org, Aug 25 2017

Status: Fixed (was: Started)
Ok, I re-recorded the benchmark, looks like the first_gesture_scroll_update latency came back to normal but so far seems a tad higher than before my patch. That's expected since, as I mentioned above, the re-recorded page seems to be doing more work than before:

https://chromeperf.appspot.com/report?sid=430e14d43cffd9e26a403cf9c3ffae7d243ad454b1d99303b148a71b6378c6ed&start_rev=358213&end_rev=497340

I've checked some of the other tests, a little movement for both better and worse due to rerecording but overall seems fine to me. I'm going to close this bug.

Sign in to add a comment