Issue metadata
1%-22.5% regression in smoothness.gpu_rasterization.tough_path_rendering_cases at 483316:483448 |
Issue description: See the link to graphs below.
,
Jul 5 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8974917928249046496
,
Jul 5 2017
=== Auto-CCing suspected CL author krb@chromium.org ===
Hi krb@chromium.org, the bisect results pointed to your CL, please take a look at the results.

=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author  : Kevin Bailey
  Commit  : 3dc51db95f327e4f86962c94d6d5455ee75cf433
  Date    : Thu Jun 29 14:43:43 2017
  Subject : [omnibox] Select all when we revert all

Bisect Details
  Configuration : winx64_high_dpi_perf_bisect
  Benchmark     : smoothness.tough_webgl_cases
  Metric        : frame_times/http___kenrussell.github.io_webgl-animometer_Animometer_tests_3d_webgl.html
  Change        : 24.83% | 101.17461779 -> 126.297406433

  Revision         Result               N
  chromium@483319  101.175 +- 4.96642   6  good
  chromium@483344  101.757 +- 2.34893   6  good
  chromium@483356  100.505 +- 3.12129   6  good
  chromium@483359  101.183 +- 2.42339   6  good
  chromium@483361  106.244 +- 15.8564   6  good
  chromium@483362  129.72  +- 5.338     6  bad  <--
  chromium@483368  123.032 +- 4.75483   6  bad
  chromium@483416  126.297 +- 4.52276   6  bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=http...kenrussell.github.io.webgl.animometer.Animometer.tests.3d.webgl.html smoothness.tough_webgl_cases

More information on addressing performance regressions: http://g.co/ChromePerformanceRegressions
Debug information about this bisect: https://chromeperf.appspot.com/buildbucket_job_status/8974917928249046496
For feedback, file a bug with component Speed>Bisection
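(Side note: the "Change : 24.83%" figure above is just the relative difference between the good and bad endpoint means. A quick sketch, not part of the bisect tooling:)

```python
# Sketch (not bisect tooling): the bisect's "Change" percentage is the
# relative difference between the good and bad endpoint means.
def percent_change(good, bad):
    return (bad - good) / good * 100.0

print(round(percent_change(101.17461779, 126.297406433), 2))  # 24.83
```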
,
Jul 6 2017
Talked with krb@ offline -- need to do two runs locally, one with top of tree, and one with 3dc51db95f327e4f86962c94d6d5455ee75cf433 reverted. Let's look at mean_frame_times for the WebGL Animometer case for both. If it looks like there is a real performance regression then maybe the Omnibox is now rapidly triggering browser repaints, or something similar.
,
Jul 6 2017
You said, "Let's" so I assume you want to be cc'd. Sorry if that's presumptuous.
I suspect this is what you're looking for, from the 2 runs I did this
morning:
"charts": {
  "mean_frame_time": {
    "summary": {
      "std": 0.0,
      "name": "mean_frame_time",
      "improvement_direction": "down",
      "important": true,
      "values": [
-       30.563
+       35.171
      ],
      "units": "ms",
      "type": "list_of_scalar_values",
      "description": "Arithmetic mean of frame times."
    },
This is from Linux. I'm still working on the Windows version.
I can't imagine how this change would trigger a lot more redraws, and I added some logging to verify that the paint routines for the elements in question weren't being explicitly called more often. In case I'm looking at the wrong routines, is there something specific and Omnibox-related that anyone would like to see numbers on?
,
Jul 7 2017
Fine to CC: me. Please confirm which number is from which run -- i.e., from top-of-tree, or with 3dc51db95f327e4f86962c94d6d5455ee75cf433 reverted. Please gather 5 runs of each so we can get an idea locally of how noisy this benchmark is on your machine.
,
Jul 7 2017
Both are ToT. 30.563ms is with my change removed via a patch. (Think "before and after".) I will get the additional samples tomorrow morning.
,
Jul 7 2017
If the per-frame time is reliably 35 ms with your change in the code base and 30 ms without it, then there's clearly been a huge performance regression and you should revert it ASAP while the cause is being investigated. I suspect that the Omnibox is triggering a lot of UI repaints where it wasn't before. These flags might help:

/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --user-data-dir=/tmp/t1 --show-paint-rects --ui-show-paint-rects

I'd also suggest gathering traces with and without your patch via about:tracing.
,
Jul 7 2017
Here are the numbers. I got 2 sets of 5 for each build:
before: {32.286, 34.674, 34.475, 31.52, 31.227} = 32.8544 +- 1.48493
        {32.685, 32.375, 34.221, 36.37, 33.727} = 33.8756 +- 1.41653
after:  {36.275, 30.792, 37.2, 31.966, 34.517}  = 34.15   +- 2.4494
        {32.901, 36.658, 30.639, 33.516, 30.018} = 32.7464 +- 2.35805
I don't like the standard deviation on the runs with my change, but it doesn't look to me like the average has changed. Again, this is Linux; let me get the Windows numbers.
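(For reference, the mean +- SD summaries above can be reproduced with Python's statistics module; the quoted spread appears to match the population standard deviation, checked here against the second "after" set. A sketch, not the benchmark's own code:)

```python
# Sketch: reproduce one "mean +- SD" line above. The quoted spread
# matches the population standard deviation (pstdev), not sample stdev.
import statistics

after_run2 = [32.901, 36.658, 30.639, 33.516, 30.018]
mean = statistics.mean(after_run2)
sd = statistics.pstdev(after_run2)
print(round(mean, 4), round(sd, 5))  # 32.7464 2.35805
```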
,
Jul 7 2017
The Windows numbers are much worse: ~105 ms vs. ~130 ms. I tried rearranging a few things, but it doesn't get any better. I have no idea why the time is so different when we're essentially just moving a call from one place to another. I also tried capturing a trace, but I don't see any Omnibox events despite checking the box. I'll just revert it.
,
Jul 7 2017
OK. Thanks for investigating this. Again, I suspect that your change is causing the browser's UI to repaint a lot more often for some reason. If you can dig into why then we can re-land your patch.
,
Jul 7 2017
Sure, and thanks for the suggestions. Unfortunately, I'm no closer to understanding why this is happening. Is "mean_frame_time" for a single navigation, or per-frame (several times per second)? I verified that it is the SelectAll() in OmniboxViewViews::Update() that triggers it, and that it is called only once, for the navigation itself (in case Peter has any theories).
,
Jul 7 2017
mean_frame_time is per-frame, averaged across the entire run of the benchmark. The increase seen on Windows from 105 ms to 130 ms is huge, and should be easily visible if you gather a trace from about:tracing with type "Javascript and Rendering". I'd suggest you try viewing the benchmark page in your browser build outside the benchmark harness, gathering traces before and after your patch, and see if anything obvious comes up. You ought to be able to easily see the browser, renderer and GPU processes' work on a per-frame basis (start with a clean browser session with no other tabs). In fact if you'd like to attach before-and-after traces here, I and others may be able to help diagnose them.
,
Jul 10 2017
To make sure I'm doing this correctly, I went to chrome://tracing, clicked 'Record', then 'Javascript and rendering', then edited the selections by adding 'omnibox' and 'views'. I then hit 'Start', flipped to a tab where I had the WebGL test cued up, and hit Enter. I let it run for about 10 seconds before stopping the recording. When I zoom in to about 200 ms, I don't see any blocks that are 30 ms long (except for a couple of things that are in the "good" trace too). I'm not familiar enough with this data to spot any differences or out-of-place blocks. I have 3 traces, but it will only let me upload one at a time, so I've attached the latest "bad" trace. If there's a better place to put the others, I'd be happy to upload them there.
,
Jul 10 2017
The 10 MB limit is per comment. Could you please upload a "good" trace to a new comment?
,
Jul 10 2017
Here is the "good" trace.
,
Jul 12 2017
,
Jul 13 2017
sunnyps@ helped me look through not only these traces but also the ones from the original alert, specifically on:

ChromiumPerf/win-high-dpi/smoothness.tough_webgl_cases / mean_frame_time / http___kenrussell.github.io_webgl-animometer_Animometer_tests_3d_webgl.html

Trace before the alert: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_8-2017-06-28_23-27-58-20273.html
Trace after the alert: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_8-2017-06-29_15-31-05-14635.html

Sunny pointed out that the traces gathered by hand might not reproduce the original problem. They were so big that the browser had a difficult time loading both simultaneously, anyway.

After much analysis, Sunny concluded that the frame_times and mean_frame_time metrics are flawed on Windows. These metrics seem to be looking at both BenchmarkInstrumentation::ImplThreadRenderingStats and BenchmarkInstrumentation::DisplayRenderingStats trace events on this platform. In the "good" trace, the browser UI is occasionally redrawing (presumably caused by the Omnibox), and drawing *more* frames than in the "bad" trace. Other than that, the times to draw WebGL appear unchanged.

So krb, your CL is *improving* things by reducing the number of frames drawn! The additional frames aren't useful, but they are skewing the frame_times and mean_frame_time metrics. These metrics are apparently computed differently on different platforms -- on Android, they are tied into SurfaceFlinger -- which is why they regressed only on Windows. It also explains the dramatic improvement in these metrics on Windows for the IE Chalkboard benchmark, and the regression seen when krb's CL was reverted in 5604cb88fc88d2c3b7886702e8253b21ceea600e.

krb@, let's re-land your original CL and close this regression bug as WontFix.
brianderson@ mentioned there was ongoing work to improve these metrics; we should reference that bug here.
,
Jul 13 2017
A small correction to #18: The reason we don't see this regression on Android is probably because this code doesn't run there. However, benchmarks like this can be affected by spurious frames injected by the UI code on Android too. (I don't know if this happens today for sure.)
,
Jul 13 2017
Thanks, Sunny, for clarifying. Then it's also somewhat of a mystery why this didn't show up on Linux (where Views is used); since Mac has different Omnibox code, that may explain why the regression wasn't seen there.
,
Jul 13 2017
+Tim: fyi
,
Jul 13 2017
Ken and Sunny, thank you for tracking this down. It's good to get to the bottom of a mystery. By the way, I can't re-land it quite yet, as there's a small change we need to make first: https://bugs.chromium.org/p/chromium/issues/detail?id=738843
,
Aug 7 2017
,
Aug 10 2017
Why is this wontfix? krb?
,
Aug 10 2017
@24: See comment 18.
,
Aug 25 2017
,
Sep 22 2017
,
Oct 10 2017
Comment 1 by rmcilroy@chromium.org, Jul 5 2017