New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 762300 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Nov 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression



Sign in to add a comment

1.8% regression in system_health.memory_desktop at 498509:498605

Project Member Reported by briander...@chromium.org, Sep 5 2017

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=762300

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=01d5acaa67beb823760cbb25ec3ce702b0972d358c1758a0c7aeaf74b92db791


Bot(s) for this bug's original alert(s):

chromium-rel-win10
Cc: b...@chromium.org
Owner: b...@chromium.org
Status: Assigned (was: Untriaged)

=== Auto-CCing suspected CL author bnc@chromium.org ===

Hi bnc@chromium.org, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Bence Béky
  Commit : 230ac614ca52360e6d51f1820f6c25857e3d8a27
  Date   : Wed Aug 30 19:17:08 2017
  Subject: Ignore AltSvc response headers on connections with certificate errors.

Bisect Details
  Configuration: winx64_10_perf_bisect
  Benchmark    : system_health.memory_desktop
  Metric       : memory:chrome:all_processes:reported_by_chrome:malloc:effective_size_avg/browse_media/browse_media_youtube
  Change       : 1.43% | 21957975.6667 -> 22271903.1667

Revision             Result                  N
chromium@498508      21957976 +- 147556      6      good
chromium@498557      21997566 +- 184867      6      good
chromium@498560      22014967 +- 257705      6      good
chromium@498562      22046693 +- 410401      9      good
chromium@498563      22331099 +- 167789      6      bad       <--
chromium@498569      22273735 +- 161875      6      bad
chromium@498581      22333277 +- 245940      6      bad
chromium@498605      22271903 +- 118691      6      bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=browse.media.youtube system_health.memory_desktop

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8969252336227568816


For feedback, file a bug with component Speed>Bisection
bnc: ping? bisect reproduced a 284kib regression at your CL. Can you take a look?

Comment 5 by b...@chromium.org, Oct 19 2017

This is a runtime memory usage regression, not a binary size regression, is that correct?  This affects the production build, not any particular test target, is that correct?

I find it difficult to believe that this is caused by the CL mentioned above, so I'd like to confirm the regression locally.  What --browser argument shall I pass to run_benchmark?  If I pass --browser=release_x64, it says "Cannot find browser of type release_x64."  Is run_benchmark running gn gen and ninja, or do I have to do that?  If I do it, how do I tell run_benchmark where the gn output directory is?  --browser=list tells me beta, dev, stable, system, but none of these are local builds.

I read the documentation at https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md but I was not able to find the answers to any of these questions.

Thank you for your help.

Comment 6 by b...@chromium.org, Oct 19 2017

Invoking `tools/perf/run_benchmark try winx64-10 --story-filter=browse.media.youtube system_health.memory_desktop` on https://crrev.com/c/728700 instead of trying to reproduce locally.
Cc: nedngu...@google.com
+Ned can you help with #5?
releases_x64 should be supported, it's probably because you use a non traditional outdir.

What happen if you run the benchmark with "--chromium-output-directory=<path to your out dir>"?

If that doesn't work, you can also try:

--browser=exact --browser-executable==<path to your browser binary>

Comment 9 by b...@chromium.org, Oct 19 2017

Thank you, --browser-executable does the trick.  Awesome!

Comment 10 by b...@chromium.org, Oct 20 2017

Status: Started (was: Assigned)
Try job winx64_10_perf_bisect finished on https://crrev.com/c/728700, see https://build.chromium.org/p/tryserver.chromium.perf/builders/winx64_10_perf_bisect/builds/1551.  HTML results, Select a reference column => TOT, show all: every single line is marked as insignificant.

I'm now running the same try job on a full revert, reverting both production and test changes: https://crrev.com/c/731404.

Also running the same benchmark locally on the same full revert.
Cc: benjhayden@chromium.org
+Ben in case you need help with interpreting HTML results 

Comment 12 by b...@chromium.org, Oct 21 2017

The try job finished for https://crrev.com/c/731404.  I need help interpreting the results.  The numbers seem to be all over the place.  I do not actually find the memory:chrome:all_processes:reported_by_chrome:malloc:effective_size_avg/browse_media/browse_media_youtube metric.  Many metrics seem to be better on ToT than with the patch (that revert the culprit CL above), a few seem to be worse, is any of this statistically significant?  Thanks!

Again, does this memory regression relate to production only, or do tests also affect the metrics?  Knowing this might help understand where the regression is coming from.
Here's a link to the html results, filtered to only show the top-level memory measurements, grouping results by story name:

https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/html-results/results-2017-10-20_16-48-14#q=reported&r=TOT&s=%25%CE%94avg&g=name.stories&c=1&d=&n=0

It might take a few seconds to load because there are a lot of stories in that benchmark.
Make sure the black arrow at the top of the Patch column points down. (There might be a bug with deep-linking the sort direction.)

The stories are like test cases. The dashboard computes regression alerts on a per-story bases. You want to make sure that each and every one of your test cases (stories) doesn't regress. On the left side of the Name column, click the black arrows to expand sub rows to look at per-story results.

Effective_size (averaged over all stories) increased by 2.7%.
load:tools:drive increased by 5.7%, but apparently that's not statistically significant, possibly because there are only 3 samples because pageset-repeat was only 3. If you want to be sure that the 5.7% increase is not statistically significant, you might want to set pageset-repeat to 10 or 20. (Of course, with more repeats, the results.html will take proportionally longer to load.)

effective_size in long_running:tools:gmail-foreground increased by 5.595%, and that is statistically significant, possibly partly because there are 306 samples. I'm not sure why there are so many more samples even though the pageset-repeat was only 3, maybe something about how memoryMetric works or something about how the benchmark is configured. Anyway, click on the memory:chrome:all_processes:reported_by_chrome:effective_size > long_running:tools:gmail-foreground cell in the Patch column to see more about it. The breakdowns (stacked bar chart + table) in the histogram diagnostics show that the renderer_processes' v8:heap:old_space measurement is the biggest contributor. Click the links in the breakdown to narrow in on that measurement. Fortunately, it still has the red frown emoji, the regression is here.

In the grouping-picker (checkboxes and left/right arrows for groupings like name, stories, storysetRepeats, etc.), enable storysetRepeats to see which repetition exhibits the most significant change. In this case, it's 1.
Click the histogram icon (3 blue bars) in the Name column to open the cells in both columns to compare the distributions and breakdowns.

Click on the metadata tab to see traceUrls. If you group by stories and storysetRepeats, there'll only be 1 link:
https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/long_running_tools_gmail_foreground_2017-10-20_13-36-22_58293.html
You can click on the memory dump dots at the top of the trace and explore them, but I don't know how exactly that's going to help you because I don't know how your change works. If you have questions about how to interpret the memory dumps, you can ask memory-infra (hjd, perezju).

Once you figure out why v8:heap:old_space regressed for that page, and either fix it or agree with memory-infra to ignore it, then go back to load:tools:drive and maybe some of the other stories to see why their effective_size increased so much. Once you are happy with effective_size, go back to some of the other measurements and dig down into them, as well.

Does that help?
Please feel free to schedule a meeting with me if you have more questions about interpreting results.html.
I'm in the process of writing a doc about how to interpret this pivot table containing all this information and more. Can I ask for your feedback when it's ready?

Comment 14 by b...@chromium.org, Oct 30 2017

Re comment #13: Thank you for the detailed walkthrough, it is really helpful.

When looking at, for example, the three storysetRepeats of long_running:tools:gmail-foreground, the numbers are all over the place: +28.860%, +9.706%, and -16.286%.  Surely the mean is +5.595%, which is scary, but I have difficulty believing that this is significant.

Most importantly so because this is the regression of a patch using ToT as a baseline.  Now the patch in question is the revert of the suspected culprit above.  A regression of a revert means the original change was an improvement.  When I look through every story in that file, I see no improvement of the revert, that is, no regression of the suspected culprit.

I ran the benchmark with --story-filter=browse.media.youtube locally, both on https://crrev.com/c/728700 and https://crrev.com/c/731404.  It shows no regressions.

I think this bug can be closed as WontFix, but I invite you to prove me wrong.  If you can find a regression, I'd be glad to look at the traces to see if there is anything relevant to the change.  Thank you.
It looks to me like it's probably ok to mark this WontFix. I don't know if pageset-repeat=3 is enough data to convince one way or another, especially if the noise between repetitions is so high, but it was only 1.8% to begin with, and the chart in comment #1 has improved more than that since the regression.

I'll keep thinking and talking about how to make it easier to find regressions or lack thereof in results.html.

Comment 16 by b...@chromium.org, Nov 2 2017

Status: WontFix (was: Started)
Thank you.  I look forward to further improvements to the memory regression tool, I think it is really amazing already (despite potential false positives)!

Sign in to add a comment