5% regression in power.desktop at 576324:576365
Issue description: See the link to graphs below.
Jul 23
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/16d3588ba40000
Jul 23
📍 Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/16d3588ba40000
Remove content border for opaque frame in Refresh by pbos@chromium.org
https://chromium.googlesource.com/chromium/src/+/5e35d8a8cb7c2547cd2d13e1b5ab283a112cf900
0.1469 → 0.152 (+0.00507)
Understanding performance regressions: http://g.co/ChromePerformanceRegressions
Jul 25
charliea@, I believe you're the owner of this suite. The best hypothesis we have is that this change made the content area 2 columns (2px) wider and one row (1px) taller. This might've added additional tiles in the compositor, so the effect might not be entirely proportional to client-area size. Is there any way for us to test this hypothesis, e.g. rerun the same tests with a slightly smaller window?
Jul 30
Ned, do you have any idea if this is possible in Telemetry?
Jul 30
Yes. You can hack this locally & run the benchmark: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/desktop_browser_finder.py?rcl=dd9e5b9c8c10464e1c4ec227ae7edbc626138acc&l=191
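Roughly, the local hack would look something like this (a sketch only; the constant names below are made up for illustration, the real code lives at the linked line of desktop_browser_finder.py and may be structured differently):

# Hypothetical sketch -- see the desktop_browser_finder.py line linked above
# for the real code.

# Telemetry starts the desktop browser at a fixed window size; shrinking it by
# 2 columns and 1 row would undo the client-area growth we want to test.
DEFAULT_WINDOW_SIZE = (1280, 1024)             # assumed original size
EXPERIMENT_WINDOW_SIZE = (1280 - 2, 1024 - 1)  # 2px narrower, 1px shorter

def _WindowSizeStartupArgs(window_size=EXPERIMENT_WINDOW_SIZE):
  # Chrome accepts --window-size=<width>,<height> on desktop platforms.
  return ['--window-size=%d,%d' % window_size]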
Jul 30
Awesome. pbos@, to clarify:
1) Change the resolution that Ned linked to to one that you think will make the problem go away (it sounds like 1px shorter and 2px narrower).
2) Upload a patchset with that change.
3) Run a perf try job on the affected platform (linux-perf) using the instructions at https://chromium.googlesource.com/chromium/src/+/master/docs/speed/perf_trybots.md. When asked for the patch URL, link to the URL of the patch uploaded in step 2. When asked for the bug, list this one.
4) Wait until the perf try job results come back.
5) Make sure that you see the drop in cpu_time_percentage_avg that you expected.
Aug 1
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/128d91dba40000
Aug 1
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/128d91dba40000 Buildbucket says the build completed successfully, but Pinpoint can't find the isolate hash.
Aug 1
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/11aaf933a40000
Aug 1
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/11aff167a40000
Aug 1
📍 Job complete. See results below. https://pinpoint-dot-chromeperf.appspot.com/job/11aff167a40000
Aug 1
📍 Job complete. See results below. https://pinpoint-dot-chromeperf.appspot.com/job/11aaf933a40000
Aug 1
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/149adf40640000
Aug 1
📍 Job complete. See results below. https://pinpoint-dot-chromeperf.appspot.com/job/149adf40640000
Aug 2
https://pinpoint-dot-chromeperf.appspot.com/job/11aaf933a40000 tests our hypothesis by removing 1 row and 2 columns.
https://pinpoint-dot-chromeperf.appspot.com/job/149adf40640000 reverts the original patch.
Neither of these shows any significant improvement (cpu_time_percentage remains within a percent) as far as I can tell. Can you help us interpret these results otherwise, figure out what we're setting up wrong when trying to reproduce this, or otherwise retriage? We'd be happy to help, but if we can't even revert in a tryjob to get back the original CPU usage, I'm not sure what to do here.
Aug 2
Assign to Bruce, power benchmark owner
Aug 17
Sorry for the two-week delay - I was tracking a gnarly memory leak.

I think the reason the pinpoint job results don't show an improvement is that only a single benchmark regressed (TrivialGifPageSharedPageState), but multiple tests are run in the pinpoint jobs. Once that is accounted for, the revert does show a distinct improvement.

I took a look at the https://pinpoint-dot-chromeperf.appspot.com/job/149adf40640000 results (reverting the original patch), used Export to get CSV data, pasted that into Sheets, told it to split text to columns, set text wrapping to overflow, deleted unwanted columns, and then filtered down to just the TrivialGifPageSharedPageState results with the cpu_time_percentage metric. These are the results over the ten runs of this benchmark with the two patches:

TrivialGifPageSharedPageState
8768493   8768493 + 1f669c2
0.194     0.184
0.117     0.116
0.192     0.183
0.201     0.182
0.193     0.185
0.191     0.186
0.116     0.116
0.191     0.185
0.193     0.183
0.192     0.186
0.178     0.171   (mean)

In short, reverting the patch improved the results from 0.178 to 0.171, which corresponds to the TrivialGifPageSharedPageState regression shown here: https://chromeperf.appspot.com/group_report?sid=a8213cf6bd08dfc3d0ecc1ccadc8ae234ca8d97a36ef09cce528354e29e92ca3

The bimodal nature of the data (either ~0.116 or ~0.185) is worrisome, but both runs had an equal number of fast runs (always the second and seventh?), so that shouldn't matter.

I did the same thing for the other pinpoint results at https://pinpoint-dot-chromeperf.appspot.com/job/11aaf933a40000 (removing 1 row and 2 columns) and got these results:

TrivialGifPageSharedPageState
ab656f5   ab656f5 + 638910f
0.193     0.184
0.194     0.190
0.117     0.116
0.196     0.186
0.192     0.191
0.117     0.115
0.118     0.118
0.116     0.117
0.116     0.116
0.117     0.118
0.148     0.145   (mean)

The results are again bimodal, although in a different way, but again it doesn't seem to matter. In this case the difference is *probably* within the noise, but I'm not sure. My take on the results is that the regression is real.

I have some questions for Ned and Peter:
1) Is crbug.com/866090 related? It shows up on the regression graphs and it regresses both TrivialGifPageSharedPageState and TrivialGifPageSharedPageState_ref. I don't understand the difference between the two metrics and the first (non-ref) and second (both) regressions.
2) Is there an easier way to see the performance of individual tests on these pinpoint runs? There must be.
3) Any other ideas about what from that patch could be causing the regression?
4) The change from reverting the patch is small and noisy - can we get another pinpoint job to double-check that?
5) Is there some reasonable way of comparing the trace files to see where the differences are coming from? I've tried spot-checking various parts, but I can't tell what is real and it doesn't scale well.

Given that it looks like the change *is* causing a (small, Linux-only, but real) regression, I'm assigning back to pbos@.

A (Google only) spreadsheet with the analysis shown above is here: https://docs.google.com/spreadsheets/d/1tHCupAM7tfgtRP_wtreA5tX3YfBfdtoEMHP9FgyhtHk/edit?usp=sharing
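In case it saves anyone the spreadsheet gymnastics, here's a rough pandas equivalent of the filtering above (the column names 'name', 'story', 'change' and 'avg' are guesses at the Pinpoint CSV export format, not a documented schema):

import pandas as pd

# CSV saved via the Export button on the Pinpoint job page.
df = pd.read_csv('pinpoint_export.csv')

# Keep only the cpu_time_percentage rows for the TrivialGifPageSharedPageState
# stories, then compare the per-change means (control vs. control + patch).
mask = (df['name'] == 'cpu_time_percentage') & \
       (df['story'].str.contains('TrivialGifPageSharedPageState', na=False))
print(df[mask].groupby('change')['avg'].agg(['mean', 'count']))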
Aug 17
dtu@ to answer Bruce's question in #9
Aug 17
-> tbergquist@, who's agreed to take a look while I'm out for 4 weeks; hopefully we can get to the bottom of it. There might be a partial revert that helps narrow down where the regression actually is.
Aug 27
Thanks, Bruce, for the analysis. The bimodal distribution is really weird. It looks like the regression only appears in one of the modes, namely the ~0.185 one. If that's the case, the second job has only four eligible data points, not nearly enough to go on. I'll kick off another pair of jobs. As an aside, this kind of distribution leads me to suspect environmental differences between the benchmark hosts (hardware config, software config, or resource contention if they're on shared hardware). The control and test runs do appear to be paired (i.e. run on the same host), which supports that hypothesis.
Aug 27
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/1628e805640000
Aug 27
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/1414bb6d640000
Aug 27
📍 Job complete. See results below. https://pinpoint-dot-chromeperf.appspot.com/job/1414bb6d640000
Aug 27
📍 Job complete. See results below. https://pinpoint-dot-chromeperf.appspot.com/job/1628e805640000
Aug 27
https://pinpoint-dot-chromeperf.appspot.com/job/1628e805640000 tests the hypothesis.
https://pinpoint-dot-chromeperf.appspot.com/job/1414bb6d640000 tests the revert.
Looks like the revert fails to merge now, which I guess is fine since I didn't need that data anyway. Here's the data:

df15dab         df15dab + 638910f   test/control ratio
0.1899694621    0.1801701545        0.9484164059
0.1085795548    0.1083581776        0.9979611524
0.1905268334    0.1816287077        0.9532972573
0.1875011751    0.1871311161        0.9980263646
0.1114206924    0.1102842407        0.9898003531
0.1907618192    0.1799659337        0.9434064659
0.191645106     0.1808107497        0.9434665643
0.1103246912    0.1129628108        1.023912322
0.1908240446    0.1818222499        0.9528267276
0.1097089277    0.1102283813        1.004734834

It's bimodal again, and it recovers from the regression in the ~0.185 mode to the same degree as the revert did above. I think we can conclude that the regression is indeed caused by the enlarged client area.
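For completeness, a small script that reproduces the per-mode breakdown (the 0.15 cutoff separating the ~0.11 and ~0.19 modes is just an assumption read off the data above):

# (control, with-patch) cpu_time_percentage pairs from the table above.
pairs = [
    (0.1899694621, 0.1801701545), (0.1085795548, 0.1083581776),
    (0.1905268334, 0.1816287077), (0.1875011751, 0.1871311161),
    (0.1114206924, 0.1102842407), (0.1907618192, 0.1799659337),
    (0.191645106, 0.1808107497), (0.1103246912, 0.1129628108),
    (0.1908240446, 0.1818222499), (0.1097089277, 0.1102283813),
]

MODE_THRESHOLD = 0.15  # assumed cut between the ~0.11 and ~0.19 modes

for label, in_mode in [('~0.19 mode', lambda c: c >= MODE_THRESHOLD),
                       ('~0.11 mode', lambda c: c < MODE_THRESHOLD)]:
  ratios = [test / control for control, test in pairs if in_mode(control)]
  print('%s: %d pairs, mean test/control ratio %.3f'
        % (label, len(ratios), sum(ratios) / len(ratios)))

The improvement shows up only in the ~0.19 mode (mean ratio roughly 0.96 vs. 1.00 in the ~0.11 mode), which matches the conclusion above.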
Aug 27
Sorry for the slow response, this fell off my radar for a bit. Here's an experimental view of the original bisect job, showing means overlaid on histograms: https://dev-dtu-63920e4c-dot-pinpoint-dot-chromeperf.appspot.com/job/16d3588ba40000 It shows a strong bimodality, and the proportion of low vs. high runs is consistent across all revisions. I agree with Taylor's assessment: the regression only shows up in the top mode, not the bottom mode. And because Pinpoint has device pinning (every revision runs on the same set of devices, in the same order), the paired results mean that the mode depends on the device. The hardware is not shared, so the differences must be in the hardware or software environment.
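For anyone who wants a similar view locally before that ships, here's a minimal sketch of an overlaid per-revision histogram built from an exported job CSV (the 'change' and 'avg' column names, and the bin range tuned to this metric, are assumptions, not the documented export format):

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('pinpoint_export.csv')  # Export from the Pinpoint job page.

# One semi-transparent histogram per revision/change; bimodality shows up as
# two separate clusters of runs.
bins = [x / 1000.0 for x in range(100, 221, 5)]  # 0.100 .. 0.220
for change, group in df.groupby('change'):
  plt.hist(group['avg'], bins=bins, alpha=0.5, label=str(change))

plt.xlabel('cpu_time_percentage')
plt.ylabel('runs')
plt.legend()
plt.show()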
Aug 27
Wow, that's an amazing view. Is that going live soon? In terms of this regression, given the above ^, it's clear that the observed regression is entirely due to the enlarged client area in the test. There is no regression in real performance; we're just measuring performance slightly differently now. I'll wontfix this.