
Issue 866574

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Last visit > 30 days ago
Closed: Aug 27
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression




5% regression in power.desktop at 576324:576365

Project Member Reported by sullivan@chromium.org, Jul 23

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=866574

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=a8213cf6bd08dfc3d0ecc1ccadc8ae234ca8d97a36ef09cce528354e29e92ca3


Bot(s) for this bug's original alert(s):

linux-perf
Cc: pbos@chromium.org
Owner: pbos@chromium.org
Status: Assigned (was: Untriaged)
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/16d3588ba40000

Remove content border for opaque frame in Refresh by pbos@chromium.org
https://chromium.googlesource.com/chromium/src/+/5e35d8a8cb7c2547cd2d13e1b5ab283a112cf900
0.1469 → 0.152 (+0.00507)

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions
Cc: charliea@chromium.org
charliea@ I believe you're the owner of this suite. The best hypothesis we have is that this change made the content area 2 pixel columns wider and one pixel row taller. This might've added additional tiles in the compositor, so the cost might not be entirely proportional to client-area size.

Is there any way for us to test this hypothesis, e.g. rerun the same tests with a slightly smaller window?
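
For intuition on why a 1-2px growth could matter, here is a back-of-the-envelope tile count. The 256x256 tile size and the window dimensions are illustrative assumptions, not values from the bot; real Chromium tile sizes vary by platform and GPU.

# Back-of-the-envelope check of the extra-tiles hypothesis. Tile size and
# window dimensions below are assumptions for illustration only.
import math

TILE_W = TILE_H = 256  # assumed compositor tile dimensions

def tile_count(width, height):
    """Tiles needed to cover a width x height client area."""
    return math.ceil(width / TILE_W) * math.ceil(height / TILE_H)

# If the old client area ended exactly on tile boundaries, growing it by
# 2px in width and 1px in height pays for a whole extra column and row:
print(tile_count(1280, 768))  # 5 * 3 = 15 tiles
print(tile_count(1282, 769))  # 6 * 4 = 24 tiles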
Cc: nednguyen@chromium.org
Ned, do you have any idea if this is possible in Telemetry?
Awesome. pbos@, to clarify:

1) Edit the resolution that Ned linked to down to a resolution that you think will make the problem go away (it sounds like 1px shorter and 2px narrower; one possible way to do this is sketched after this list)
2) Upload a patchset with that change
3) Run a perf try job on the affected platform (linux-perf) using the instructions at https://chromium.googlesource.com/chromium/src/+/master/docs/speed/perf_trybots.md. When asked for the patch URL, link to the URL of the patch uploaded in step 2. When asked for the bug, list this one.
4) Wait until the perf try job results come back
5) Make sure that you see the drop in cpu_time_percentage_avg that you expected.
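
As a rough sketch of step 1, assuming the harness honors Chromium's --window-size switch: the class name and target resolution below are made up for illustration, and the real change would edit whatever default Ned linked to.

# Hypothetical Telemetry benchmark variant that shrinks the browser window.
# The class name and the 1278x767 resolution are illustrative assumptions.
from core import perf_benchmark


class PowerDesktopSmallerWindow(perf_benchmark.PerfBenchmark):
  """power.desktop variant with a client area 2px narrower and 1px shorter."""

  def SetExtraBrowserOptions(self, options):
    # Pass Chromium's --window-size switch through to the browser.
    options.AppendExtraBrowserArgs(['--window-size=1278,767'])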
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/128d91dba40000

Buildbucket says the build completed successfully, but Pinpoint can't find the isolate hash.
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/11aff167a40000
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/11aaf933a40000
Owner: cyan@chromium.org
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/149adf40640000
Cc: cyan@chromium.org
Owner: nednguyen@chromium.org
https://pinpoint-dot-chromeperf.appspot.com/job/11aaf933a40000 tests our hypothesis by removing 1 row and 2 columns

https://pinpoint-dot-chromeperf.appspot.com/job/149adf40640000 reverts the original patch

Neither of these shows any significant improvement (cpu_time_percentage remains within a percent) as far as I can tell. Can you help us interpret these results differently, figure out what we're setting up wrong when trying to reproduce this, or otherwise retriage? We'd be happy to help, but if we can't even revert in a tryjob to recover the original cpu usage, I'm not sure what to do here.
Owner: brucedaw...@chromium.org
Assign to Bruce, power benchmark owner
Owner: pbos@chromium.org
Sorry for the two-week delay - I was tracking a gnarly memory leak.

I think the reason that the pinpoint job results don't show an improvement is that only a single benchmark regressed (TrivialGifPageSharedPageState), but multiple tests are run in the pinpoint jobs. Once that is allowed for, the revert does show a distinct improvement.

I took a look at the https://pinpoint-dot-chromeperf.appspot.com/job/149adf40640000 results (reverting the original patch), used Export to get CSV data, pasted that into Sheets, told it to split text to columns, set text wrapping to overflow, deleted unwanted columns, and then filtered down to just the TrivialGifPageSharedPageState results with the cpu_time_percentage metric. These are the results over the ten runs of this benchmark with the two patches:

TrivialGifPageSharedPageState
        8768493    8768493 + 1f669c2
        0.194      0.184
        0.117      0.116
        0.192      0.183
        0.201      0.182
        0.193      0.185
        0.191      0.186
        0.116      0.116
        0.191      0.185
        0.193      0.183
        0.192      0.186
Mean:   0.178      0.171

In short, reverting the patch improved the results from 0.178 to 0.171, which corresponds to the TrivialGifPageSharedPageState regression shown here:
 https://chromeperf.appspot.com/group_report?sid=a8213cf6bd08dfc3d0ecc1ccadc8ae234ca8d97a36ef09cce528354e29e92ca3

The bimodal nature of the data (either ~116 ms or ~185 ms) is worrisome, but both runs had an equal number of fast runs (always the second and seventh?) so that shouldn't matter.
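
Incidentally, the export/filter/mean steps above can be scripted. A rough pandas sketch follows; the column names ('name', 'story', 'displayLabel', 'avg') are guesses at the Pinpoint CSV export schema, not confirmed fields, so adjust them to the actual header row.

# Rough script version of the Sheets workflow above. Column names are
# assumed, not confirmed; match them to the exported CSV's header row.
import pandas as pd

df = pd.read_csv('pinpoint_export.csv')
rows = df[(df['name'] == 'cpu_time_percentage') &
          (df['story'] == 'TrivialGifPageSharedPageState')]
# Mean per patch column (e.g. '8768493' vs. '8768493 + 1f669c2'):
print(rows.groupby('displayLabel')['avg'].mean())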

I did the same thing for the other pinpoint results at https://pinpoint-dot-chromeperf.appspot.com/job/11aaf933a40000 (removing 1 row and 2 columns) and got these results:

TrivialGifPageSharedPageState
        ab656f5    ab656f5 + 638910f
        0.193      0.184
        0.194      0.190
        0.117      0.116
        0.196      0.186
        0.192      0.191
        0.117      0.115
        0.118      0.118
        0.116      0.117
        0.116      0.116
        0.117      0.118
Mean:   0.148      0.145

The results are again bimodal, although in a different way, but again it doesn't seem to matter. In this case the difference is *probably* within the noise, but I'm not sure.

My take on the results is that the regression is real. I have some questions for Ned and Peter:
1) Is crbug.com/866090 related? It shows up on the regression graphs, and it regresses both TrivialGifPageSharedPageState and TrivialGifPageSharedPageState_ref. I don't understand the difference between the two metrics, or why the first regression affects only the non-ref one while the second affects both.
2) Is there an easier way to see the performance of individual tests on these pinpoint runs? There must be.
3) Any other ideas about what from that patch could be causing the regression?
4) The change from reverting the patch is small and noisy - can we get another pinpoint job to double-check that?
5) Is there some reasonable way of comparing the trace files to see where the differences are coming from? I've tried spot-checking various parts, but I can't tell what is real, and it doesn't scale well.

Given that it looks like the change *is* causing a (small, Linux-only, but real) regression, I'm assigning back to pbos@.

A (Google only) spreadsheet with the analysis shown above is here:
https://docs.google.com/spreadsheets/d/1tHCupAM7tfgtRP_wtreA5tX3YfBfdtoEMHP9FgyhtHk/edit?usp=sharing
Cc: dtu@chromium.org
dtu@ to answer Bruce's question in #9
Owner: tbergquist@chromium.org
-> tbergquist@, who's agreed to take a look while I'm out for 4 weeks; hopefully we can get to the bottom of it. There might be a partial revert that helps narrow down where the regression actually is.
Status: Started (was: Assigned)
Thanks, Bruce, for the analysis.
The bimodal distribution is really weird. It looks like the regression only appears in one of the modes, namely the ~185ms one. If that's the case, the second job has only four eligible data points, not nearly enough to go on. I'll kick off another pair of jobs.

As an aside, this kind of distribution leads me to suspect environmental differences in the benchmark hosts (hardware config, software config, or resource contention if they're on shared hardware).  The control and test runs do appear to be paired (i.e. on the same host), which supports that hypothesis.
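
One quick way to check the "regression lives only in one mode" reading is to split the paired runs at a cut between the two clusters and compare means within each mode. The sketch below uses the revert-job numbers from the table in the earlier comment; the 0.15 threshold is an arbitrary cut between the ~0.116 and ~0.185 clusters.

# Split the paired revert-job runs (table above) into the two modes and
# compare means within each. THRESHOLD is an arbitrary cut between modes.
control  = [0.194, 0.117, 0.192, 0.201, 0.193, 0.191, 0.116, 0.191, 0.193, 0.192]
reverted = [0.184, 0.116, 0.183, 0.182, 0.185, 0.186, 0.116, 0.185, 0.183, 0.186]

THRESHOLD = 0.15
for mode, keep in (('fast', lambda c: c < THRESHOLD),
                   ('slow', lambda c: c >= THRESHOLD)):
    pairs = [(c, r) for c, r in zip(control, reverted) if keep(c)]
    delta = sum(r - c for c, r in pairs) / len(pairs)
    print('%s mode: %d pairs, mean delta %+.4f' % (mode, len(pairs), delta))
# fast mode: 2 pairs, mean delta -0.0005 (essentially no change)
# slow mode: 8 pairs, mean delta -0.0091 (the revert's entire win)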
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/1414bb6d640000
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/1628e805640000
https://pinpoint-dot-chromeperf.appspot.com/job/1628e805640000 tests the hypothesis.
https://pinpoint-dot-chromeperf.appspot.com/job/1414bb6d640000 tests the revert.

Looks like the revert fails to merge now, which I guess is fine since I didn't need that data anyway.

Here's the data:

df15dab         df15dab + 638910f   ratio (new/old)
0.1899694621    0.1801701545        0.9484164059
0.1085795548    0.1083581776        0.9979611524
0.1905268334    0.1816287077        0.9532972573
0.1875011751    0.1871311161        0.9980263646
0.1114206924    0.1102842407        0.9898003531
0.1907618192    0.1799659337        0.9434064659
0.191645106     0.1808107497        0.9434665643
0.1103246912    0.1129628108        1.023912322
0.1908240446    0.1818222499        0.9528267276
0.1097089277    0.1102283813        1.004734834

It's bimodal again, and it recovers from the regression in the ~185 mode to the same degree as the revert did above. I think we can conclude that the regression is indeed caused by the enlarged client area.
Sorry for the slow response; this fell off my radar for a bit.

https://dev-dtu-63920e4c-dot-pinpoint-dot-chromeperf.appspot.com/job/16d3588ba40000
Here's an experimental view of the original bisect job, showing means overlaid on histograms. It shows strong bimodality, and the proportion of low vs. high runs is consistent across all revisions.

I agree with Taylor's assessment: the regression only shows up in the top mode, not the bottom mode. And because Pinpoint has device pinning (every revision runs on the same set of devices, in the same order), the paired results mean that the mode depends on the device. The hardware is not shared, so the differences must be in the hardware or software environment.
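
That pairing is easy to sanity-check from the tables above: if the mode belongs to the run position (device) rather than the revision, the control and test values in each position should always fall in the same mode. A small sketch, using values rounded from the most recent table:

# Sanity check of the device-pinning reading. Data are rounded from the
# most recent table above; THRESHOLD is an arbitrary cut between modes.
control = [0.190, 0.109, 0.191, 0.188, 0.111, 0.191, 0.192, 0.110, 0.191, 0.110]
test    = [0.180, 0.108, 0.182, 0.187, 0.110, 0.180, 0.181, 0.113, 0.182, 0.110]

THRESHOLD = 0.15
paired = all((c < THRESHOLD) == (t < THRESHOLD) for c, t in zip(control, test))
print('mode follows run position in every pair:', paired)  # True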
Status: WontFix (was: Started)
Wow, that's an amazing view.  Is that going live soon?

In terms of this regression, given the above ^, it's clear that the observed regression is entirely due to an enlarged client area in the test.  There is no regression in real performance; we're just measuring performance slightly differently now.  I'll wontfix this.
