Issue metadata
Sign in to add a comment
|
7.5% regression in thread_times.simple_mobile_sites at 377357:377427 |
||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Mar 7 2016
=== Auto-CCing suspected CL author ericrk@chromium.org === Hi ericrk@chromium.org, the bisect results pointed to your CL below as possibly causing a regression. Please have a look at this info and see whether your CL be related. ===== BISECT JOB RESULTS ===== Status: completed ===== SUSPECTED CL(s) ===== Subject : Refactor signaling in RWP Author : ericrk Commit description: A previous patch had moved RasterWorkerPool to make heavy use of broadcast in order to simplify logic a bit. It turns out that broadcast is not ideal in performance critical situations, as it increases lock contention (see https://code.google.com/p/chromium/codesearch#chromium/src/base/synchronization/condition_variable.h&l=34). This change removes all uses of broadcast other than in Shutdown, where performance is less of a concern and we actually need all threads to wake up. To achieve this, we need to re-factor RWP to use: - N foreground threads - 1 background thread rather than N threads which may be able to run foreground/background tasks. In order to ensure that we don't overload the system, this change introduces a limiting system where the background priority thread can only run tasks if no other threads are doing so. BUG= CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel Review URL: https://codereview.chromium.org/1666283002 Cr-Commit-Position: refs/heads/master@{#377379} Commit : f064193fe71b8b055d060adb10fbc4e3cca8c604 Date : Wed Feb 24 21:18:12 2016 ===== TESTED REVISIONS ===== Revision Mean Value Std. Dev. Num Values Good? chromium@377356 2.984972 0.077132 8 good chromium@377375 3.020454 0.071806 8 good chromium@377377 2.990694 0.043098 5 good chromium@377378 3.01307 0.044526 5 good chromium@377379 3.129479 0.060257 8 bad chromium@377383 3.16841 0.063349 8 bad chromium@377393 3.135948 0.06011 8 bad chromium@377427 3.168396 0.037785 5 bad Bisect job ran on: android_nexus6_perf_bisect Bug ID: 592292 Test Command: src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --also-run-disabled-tests thread_times.simple_mobile_sites Test Metric: thread_renderer_main_cpu_time_per_frame/thread_renderer_main_cpu_time_per_frame Relative Change: 6.20% Score: 99.9 Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/android_nexus6_perf_bisect/builds/2001 Job details: https://chromeperf.appspot.com/buildbucket_job_status/9018879277035874576 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with label Cr-Tests-AutoBisect. Thank you!
,
Mar 14 2016
ericrk@: Would you please take a look?
,
Mar 14 2016
Investigating now.
,
Mar 14 2016
===== BISECT JOB RESULTS ===== Status: completed ===== SUSPECTED CL(s) ===== Subject : Refactor signaling in RWP Author : ericrk Commit description: A previous patch had moved RasterWorkerPool to make heavy use of broadcast in order to simplify logic a bit. It turns out that broadcast is not ideal in performance critical situations, as it increases lock contention (see https://code.google.com/p/chromium/codesearch#chromium/src/base/synchronization/condition_variable.h&l=34). This change removes all uses of broadcast other than in Shutdown, where performance is less of a concern and we actually need all threads to wake up. To achieve this, we need to re-factor RWP to use: - N foreground threads - 1 background thread rather than N threads which may be able to run foreground/background tasks. In order to ensure that we don't overload the system, this change introduces a limiting system where the background priority thread can only run tasks if no other threads are doing so. BUG= CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel Review URL: https://codereview.chromium.org/1666283002 Cr-Commit-Position: refs/heads/master@{#377379} Commit : f064193fe71b8b055d060adb10fbc4e3cca8c604 Date : Wed Feb 24 21:18:12 2016 ===== TESTED REVISIONS ===== Revision Mean Value Std. Dev. Num Values Good? chromium@377356 2.965524 0.050674 12 good chromium@377375 2.993519 0.082888 12 good chromium@377377 3.016277 0.055761 8 good chromium@377378 2.966529 0.064105 5 good chromium@377379 3.119201 0.044459 8 bad chromium@377383 3.08487 0.029491 8 bad chromium@377393 3.106822 0.088513 8 bad chromium@377427 3.103297 0.058215 8 bad Bisect job ran on: android_nexus6_perf_bisect Bug ID: 592292 Test Command: src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --also-run-disabled-tests thread_times.simple_mobile_sites Test Metric: thread_renderer_main_cpu_time_per_frame/thread_renderer_main_cpu_time_per_frame Relative Change: 4.90% Score: 99.8 Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/android_nexus6_perf_bisect/builds/2017 Job details: https://chromeperf.appspot.com/buildbucket_job_status/9018190294894129152 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with label Cr-Tests-AutoBisect. Thank you!
,
Mar 15 2016
Did some investigation into this: It looks like a regression in some thread times is expected - we now have a background and foreground compositor thread on Android. While the background thread is designed to remain idle while foreground work is in progress, long running background tasks can cause some overlap which would bump up thread times. Why renderer_main (as opposed to renderer_compositor or raster) thread time is affected is a bit of a mystery - locally this CL seemed to cause a slight bump in most frame times, so it's possible that increased thread contention causes our readings to increase a bit globally? This bump in thread times should also result in work happening more efficiently, especially when scrolling (where the background work ends up being used). While it's a bit tricky to pinpoint in the graphs, it does appear that this CL may have given us an improvement in main_thread_scroll_latency (https://chromeperf.appspot.com/report?sid=9aaccb8b0901f44a4c0fd1d538a11234ccd5b8b707aa69a77af326e00d3ef349&start_rev=376332&end_rev=379495). In terms of avoiding this cost, we can modify the task runner to block foreground work until non-cancel-able background work completes. I've confirmed locally and this does appear to fix the issue - see crrev.com/1796383002 for an idea of the change. However, in my local setup fixing the thread_times comes at the cost of first_gesture_Scroll_update_latency (see attached trace for the site that showed the largest regression, ebay.co.uk). All things considered, I'm in favor of leaving this in. This change seems to have gotten us some perf wins, and the current behavior works better with with the Android little core work that's being done. |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by ericwilligers@chromium.org
, Mar 7 2016