31.6% regression in performance_browser_tests at 408143:408178
Issue description: See the link to graphs below.
Jul 28 2016
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/9005905528418313952
Jul 28 2016
=== Auto-CCing suspected CL author enne@chromium.org ===
Hi enne@chromium.org, the bisect results pointed to your CL below as possibly causing a regression. Please have a look at this info and see whether your CL may be related.

===== BISECT JOB RESULTS =====
Status: completed

===== SUSPECTED CL(s) =====
Subject: Turn on enable begin frame scheduling by default
Author: enne
Commit description:
This turns on --enable-begin-frame-scheduling for all platforms but mus and Blimp. This change means browser->renderer begin frame ticks instead of sending vsync information and having a synthetic source on the renderer side. This feature was already on for Android, so it should only be a real change for desktop / ChromeOS platforms.
Lots of cleanup can follow from this, like removing all commit vsync / authoritative vsync / CompositorVSyncManager things, but this is a smaller patch to suss out any performance regressions. This previously landed and regressed a number of metrics. Some of these have been investigated and fixed, and the rest are WontFix.
CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel
Review-Url: https://codereview.chromium.org/1939253002
Cr-Commit-Position: refs/heads/master@{#408172}
Commit: d020d554c9ffc0476d43f5364a47c522b64e3d9a
Date: Wed Jul 27 17:45:34 2016

===== TESTED REVISIONS =====
Revision         Mean     Std Dev    N  Good?
chromium@408142  18.9603  0.346778   5  good
chromium@408159  18.9014  0.251633   5  good
chromium@408169  19.186   0.11121    5  good
chromium@408171  19.0275  0.0844004  5  good
chromium@408172  25.1898  0.305958   5  bad  <--
chromium@408174  25.1296  0.185025   5  bad
chromium@408178  25.1645  0.337451   5  bad

Bisect job ran on: win_x64_perf_bisect
Bug ID: 632292
Test Command: .\src\out\Release_x64\performance_browser_tests.exe --test-launcher-print-test-stdio=always --enable-gpu
Test Metric: TabCapturePerformance/Capture
Relative Change: 32.72%
Score: 99.9
Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/win_x64_perf_bisect/builds/1353
Job details: https://chromeperf.appspot.com/buildbucket_job_status/9005905528418313952

Not what you expected? We'll investigate and get back to you! https://chromeperf.appspot.com/bad_bisect?try_job_id=5273785255591936
Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq for more information on addressing perf regression bugs. For feedback, file a bug with component Tests>AutoBisect. Thank you!
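As a sanity check, the reported relative change is consistent with the tested-revision means in the table above (illustrative arithmetic only; the bisect bot's exact baseline choice is an assumption):

```python
# Mean scores from the bisect table.
good_mean = 18.9603  # chromium@408142, last known-good revision
bad_mean = 25.1898   # chromium@408172, first bad revision

relative_change = (bad_mean - good_mean) / good_mean * 100
print(f"relative change: {relative_change:.2f}%")
```

This gives roughly 32.9%, close to the bot's reported 32.72% (the small difference presumably comes from how the bot aggregates the good-revision samples).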
Sep 23 2016
enne, looks like your patch caused a regression. Could you investigate?
Oct 6 2016
perf fixit: unassigning owner for sheriff-owned bugs to clarify that these are triaged by a rotation.
Oct 6 2016
Ugh... Looks like this bug never got assigned to enne@, so this regression made it out to M54. On many flavors of the testing, it's actually a full 50% regression in performance. enne: PTAL as soon as possible, and please merge fixes for M55.
Oct 7 2016
I landed and reverted this patch in the past. Ultimately, when the browser is under load, it ends up being a latency/throughput tradeoff and frame times increase. This is sad from a perf graph perspective, but it's an architectural win for future improvements. Knowing when frames start allows input to be batched and sent rather than delivered midframe. This will help both with overall load (not needing to defensively raster content ahead) and perf in the compositor (being able to start frames immediately). We also can't easily throttle the framerate of oopifs without the browser controlling when frames start, which this patch enables. So, this is mostly setting up architecture to help for the future. See this doc for details: https://docs.google.com/document/d/1aTaqW0yJj5C-9uTdkiQ7U6bCtBg6hdskzZcNOYrhTr8/view#

So, many of these regressions are expected and WontFix:
- linux-release/smoothness.gpu_rasterization_and_decoding.image_decoding_cases/frame_times (known, wontfix)
- image_decoding_cases/chromium-rel-mac-retina/scheduler.tough_scheduling_case/queueing_durations (recovered)
- chromium-rel-mac11/power.top_25/idle_wakeups_total (known, wontfix)
- chromium-rel-mac11/jetstream/Score (recovered)
- chromium-rel-mac11/smoothness.tough_path_rendering_cases/frame_times (known, wontfix)
- chromium-rel-mac11/media.tough_video_cases/idle_wakeups_gpu (known, wontfix)
- chromium-rel-mac10/power.gpu_rasterization.top_25/idle_wakeups_total (recovered)
- chromium-rel-mac10/media.tough_video_cases/idle_wakeups_browser (known, wontfix)
- android-nexus7v2/smoothness.top_25_smooth/frame_times (known, wontfix)
- linux-release/smoothness.tough_filters_cases/frame_times/animometer (known, wontfix)
- chromium-rel-win7-dual/smoothness.tough_animation_cases/frame_times (known, wontfix)
- linux-release/smoothness.gpu_rasterization.tough_filters_cases/frame_times/animometer focus (known, wontfix)

There are a few that I want to look into further. In particular, the tab capture test results are the surprising result here. I'll take a closer look and see what's going on:
- chromium-rel-win7-dual/performance_browser_tests/TabCapturePerformance_webrtc/CaptureSucceeded
- chromium-rel-mac10/smoothness.top_25_smooth/percentage_smooth/Blogger
- chromium-rel-win7-dual/smoothness.top_25_smooth/percentage_smooth/percentage_smooth/Docs
Oct 10 2016
Managed to hack up tracing locally on Linux, synced to this CL, to spit out traces for tab capture; I manually fixed the traces up until they loaded properly.
Oct 14 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ff408189176e09e6c8160cc3ac6d44bc9846ebef

commit ff408189176e09e6c8160cc3ac6d44bc9846ebef
Author: enne <enne@chromium.org>
Date: Fri Oct 14 20:15:20 2016

Fix scheduler bug in skipping main frames

Previously, the scheduler would think that it could skip a main frame if the begin frame's start time + estimated time < deadline. However, if this begin frame gets delivered very late, then this is incorrect; it's even possible that start time + estimated time < Now(), i.e. entirely in the past, which doesn't make sense. Instead, check if Now() + estimated time < deadline. This is the right logic to see whether it's possible to finish activation from the current moment before the deadline.

This was causing problems where the following events would occur:
- browser has a 40ms draw
- browser delivers a begin frame
- renderer mistakenly thinks the activation fits below the deadline even though estimated time > one frame
- renderer skips this begin main frame
- browser has another 40ms draw
- renderer now does a begin main frame, >40ms late

This was causing tab capture performance tests, where the browser was pegged with long draw operations, to see renderer frame times go from 60 to 100ms.

R=sunnyps@chromium.org
BUG=632292
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel
Review-Url: https://codereview.chromium.org/2418593002
Cr-Commit-Position: refs/heads/master@{#425449}

[modify] https://crrev.com/ff408189176e09e6c8160cc3ac6d44bc9846ebef/cc/scheduler/scheduler.cc
[modify] https://crrev.com/ff408189176e09e6c8160cc3ac6d44bc9846ebef/cc/scheduler/scheduler.h
[modify] https://crrev.com/ff408189176e09e6c8160cc3ac6d44bc9846ebef/cc/scheduler/scheduler_unittest.cc
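The logic change described in the commit message can be sketched as a simplified model (this is not the actual cc/scheduler C++ code; function and variable names here are illustrative, and times are in milliseconds):

```python
def can_skip_main_frame_old(frame_time, bmf_estimate, deadline):
    """Buggy check: uses the begin frame's start time, which can be far
    in the past if the begin frame was delivered late."""
    return frame_time + bmf_estimate < deadline

def can_skip_main_frame_new(now, bmf_estimate, deadline):
    """Fixed check: can activation finish from the current moment
    before the deadline?"""
    return now + bmf_estimate < deadline

# A begin frame stamped at t=0 but delivered at t=40 (after a long
# browser draw), with a 30 ms main-frame estimate and a t=50 deadline:
frame_time, now, deadline, estimate = 0.0, 40.0, 50.0, 30.0

# Old logic wrongly concludes the work fits (0 + 30 < 50), so the
# renderer skips the begin main frame and falls a whole frame behind.
assert can_skip_main_frame_old(frame_time, estimate, deadline)

# New logic sees that from "now" it cannot fit (40 + 30 >= 50), so the
# main frame is not skipped.
assert not can_skip_main_frame_new(now, estimate, deadline)
```

This matches the event sequence in the commit message: the stale start time makes a late begin frame look cheap to defer, and each deferral adds another long browser draw of latency.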
Oct 21 2016
Looking at tab capture, I fixed one bug, but this looks like the same known case of a test that's overloading the browser. As my change helps latency and decreases throughput in this case, it makes tab capture slower. I think a more realistic tab capture case wouldn't have 50ms draw operations. That's the only other thing I really wanted to look at, so I'm going to WontFix this whole batch as an expected regression.
Oct 21 2016
enne: is the known case of the test overloading the browser logged anywhere? I want to make sure things that are sending spurious alerts are tracked.
Oct 22 2016
The original author of the performance_browser_tests focused heavily on stress-testing as a method of performance measurement. Meaning, some of these tests attempt to push execution to an overloaded level, and then we measure regressions as the machine becomes more overloaded before/after a code change. The purpose of this is to know when low-end user machines will go from "decent" to some lesser performance level, even though high-end machines would see no difference.

An example to clarify: Let's say a low-end machine can capture a frame in 30 ms, while a high-end one can capture a frame in 10 ms. Both can maintain 30 FPS capture (33 ms per frame). Then a regressing change is committed where tab capture takes 20% longer per frame. Now the low-end machine cannot keep up a 30 FPS capture rate, while the high-end machine (such as a perf bot) is still just fine.

I wouldn't mind some guidance here. Is this the right approach to perf testing? Otherwise, how do we detect "slowly-creeping regressions" over time? Also worth mentioning: bug 567848 is open, where I'd really like to re-visit the design and monitoring of the performance_browser_tests. As always, the problem is eng resources. ;-)
Oct 24 2016
sullivan: If you'd really like me to add a bug, I can. However, I don't expect further regressions from here. Also, the GPU team is hoping to move the display compositor doing these long draws from the browser process to the gpu process, which will involve a lot of thinking about this particular problem (and scheduling this work against gpu work there). I don't think having an open bug will keep this from being forgotten; if anything, it'll just be an open nag that will eventually get closed.

miu: I think what these tests are doing is one way to do perf testing. In this case, there's just a bad edge case (really long browser draws causing increased latency). If you overload everything, then there's some argument about whether you're still testing a realistic scenario or just testing an edge case. I think these 50ms draws are not usual for most pages, even on low-end machines, so in this case it's a question of content more than machine perf.

In general, I prefer testing that records thread times and isn't limited by vsync. If you're limited by vsync, then unless you're overloaded you can't see changes, because everything runs at 60Hz or whatever until things get really bad. Thread times (or power?) can give a more fine-grained sense of how much work is being done. That won't catch things like poor pipelining or weird scheduling, but it will help with gradual slowness. I'm not sure what the right answer is here either. :)
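The point about vsync-limited metrics hiding gradual slowdowns can be illustrated with a toy model (an assumed simplification, not Telemetry code): if frames only present on vsync boundaries, frame time is quantized up to a multiple of the vsync interval, so thread-time growth is invisible until the work exceeds the budget.

```python
import math

VSYNC_MS = 1000 / 60  # ~16.7 ms at 60 Hz

def presented_frame_time(work_ms):
    """Frames present on vsync boundaries: frame time rounds up to the
    next multiple of the vsync interval."""
    return math.ceil(work_ms / VSYNC_MS) * VSYNC_MS

# Thread time creeps from 8 ms to 14 ms, a 75% regression in work done,
# yet the presented frame time is identical (one vsync interval).
assert presented_frame_time(8.0) == presented_frame_time(14.0)

# Only once work exceeds the budget does frame time jump, and then by a
# whole vsync interval at once.
assert presented_frame_time(18.0) == 2 * VSYNC_MS
```

This is the "cliff" behavior: frame-time metrics stay flat under light load and then regress in large steps, whereas thread-time (or power) metrics would have shown the 8 ms to 14 ms creep directly.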
Oct 24 2016
Hmm... The answer might be both: some perf tests centered on the user experience, which would generate bugs as "almost P0 regressions" if they trigger an alert, while others directly measure CPU execution time of expensive code paths (via tracing, perhaps?) to detect "regression creep." I'll have to sit down and think about this a bit more when I get started on bug 567848. Thanks for your input. :)
Mar 5 2017
I just happened to see the TabCapturePerformance/CaptureSucceeded jump linked to this bug. The doc linked from comment 7 doesn't mention that metric, and comments 12-14 suggest this is a real regression. Why is this jump WontFix?
Mar 6 2017
We don't believe the test is representative of real behaviour, and the outcome is explained. See comments 10 to 14.
Comment 1 by kouhei@chromium.org, Jul 28 2016