1.8% regression in system_health.memory_desktop at 498509:498605
Issue description: See the link to graphs below.
Sep 5 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8969252336227568816
Sep 6 2017
=== Auto-CCing suspected CL author bnc@chromium.org ===

Hi bnc@chromium.org, the bisect results pointed to your CL, please take a look at the results.

=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Bence Béky
  Commit : 230ac614ca52360e6d51f1820f6c25857e3d8a27
  Date   : Wed Aug 30 19:17:08 2017
  Subject: Ignore AltSvc response headers on connections with certificate errors.

Bisect Details
  Configuration: winx64_10_perf_bisect
  Benchmark    : system_health.memory_desktop
  Metric       : memory:chrome:all_processes:reported_by_chrome:malloc:effective_size_avg/browse_media/browse_media_youtube
  Change       : 1.43% | 21957975.6667 -> 22271903.1667

Revision          Result               N
chromium@498508   21957976 +- 147556   6   good
chromium@498557   21997566 +- 184867   6   good
chromium@498560   22014967 +- 257705   6   good
chromium@498562   22046693 +- 410401   9   good
chromium@498563   22331099 +- 167789   6   bad   <--
chromium@498569   22273735 +- 161875   6   bad
chromium@498581   22333277 +- 245940   6   bad
chromium@498605   22271903 +- 118691   6   bad

Please refer to the following doc on diagnosing memory regressions:
https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=browse.media.youtube system_health.memory_desktop

More information on addressing performance regressions:
http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
https://chromeperf.appspot.com/buildbucket_job_status/8969252336227568816

For feedback, file a bug with component Speed>Bisection
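As a quick sanity check, the reported 1.43% change follows directly from the two endpoint means in the bisect table above; a minimal Python sketch, with the values copied verbatim from the table:

    # Reproduce the "Change: 1.43%" figure from the endpoint means
    # printed in the bisect table above.
    good_mean = 21957975.6667  # chromium@498508, last good revision
    bad_mean  = 22271903.1667  # chromium@498605, last bad revision tested
    change = (bad_mean - good_mean) / good_mean
    print(f"{change:.2%}")  # -> 1.43%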
Oct 18 2017
bnc: ping? The bisect reproduced a 284 KiB regression at your CL. Can you take a look?
Oct 19 2017
This is a runtime memory usage regression, not a binary size regression, is that correct? And it affects the production build, not any particular test target, is that correct?

I find it difficult to believe that this is caused by the CL mentioned above, so I'd like to confirm the regression locally. What --browser argument shall I pass to run_benchmark? If I pass --browser=release_x64, it says "Cannot find browser of type release_x64." Does run_benchmark run gn gen and ninja, or do I have to do that? If I do it, how do I tell run_benchmark where the gn output directory is? --browser=list tells me beta, dev, stable, system, but none of these are local builds.

I read the documentation at https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md but was not able to find the answers to any of these questions. Thank you for your help.
Oct 19 2017
Invoking `tools/perf/run_benchmark try winx64-10 --story-filter=browse.media.youtube system_health.memory_desktop` on https://crrev.com/c/728700 instead of trying to reproduce locally.
Oct 19 2017
+Ned can you help with #5?
Oct 19 2017
release_x64 should be supported; the problem is probably that you are using a non-traditional output directory. What happens if you run the benchmark with --chromium-output-directory=<path to your out dir>? If that doesn't work, you can also try: --browser=exact --browser-executable=<path to your browser binary>
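Concretely, assuming your build lives in out/Release (adjust the path to your own output directory), the two suggestions would look like:

  tools/perf/run_benchmark -v --browser=release_x64 --chromium-output-directory=out/Release --story-filter=browse.media.youtube system_health.memory_desktop

or, pointing at the binary directly (chrome.exe here assumes a Windows build):

  tools/perf/run_benchmark -v --browser=exact --browser-executable=out/Release/chrome.exe --story-filter=browse.media.youtube system_health.memory_desktop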
Oct 19 2017
Thank you, --browser-executable does the trick. Awesome!
Oct 20 2017
Try job winx64_10_perf_bisect finished on https://crrev.com/c/728700; see https://build.chromium.org/p/tryserver.chromium.perf/builders/winx64_10_perf_bisect/builds/1551. In the HTML results, with the reference column set to TOT and all rows shown, every single line is marked as insignificant.

I'm now running the same try job on a full revert, reverting both production and test changes: https://crrev.com/c/731404. I am also running the same benchmark locally on the same full revert.
Oct 20 2017
+Ben in case you need help with interpreting HTML results
Oct 21 2017
The try job finished for https://crrev.com/c/731404. I need help interpreting the results; the numbers seem to be all over the place. I do not actually find the memory:chrome:all_processes:reported_by_chrome:malloc:effective_size_avg/browse_media/browse_media_youtube metric. Many metrics seem to be better on ToT than with the patch (which reverts the culprit CL above), and a few seem to be worse; is any of this statistically significant? Thanks!

Again, does this memory regression relate to production only, or do tests also affect the metrics? Knowing this might help in understanding where the regression is coming from.
Oct 24 2017
Here's a link to the HTML results, filtered to only show the top-level memory measurements, grouping results by story name: https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/html-results/results-2017-10-20_16-48-14#q=reported&r=TOT&s=%25%CE%94avg&g=name.stories&c=1&d=&n=0 It might take a few seconds to load because there are a lot of stories in that benchmark. Make sure the black arrow at the top of the Patch column points down. (There might be a bug with deep-linking the sort direction.)

The stories are like test cases. The dashboard computes regression alerts on a per-story basis. You want to make sure that each and every one of your test cases (stories) doesn't regress. On the left side of the Name column, click the black arrows to expand sub-rows and look at per-story results.

effective_size (averaged over all stories) increased by 2.7%. load:tools:drive increased by 5.7%, but apparently that's not statistically significant, possibly because there are only 3 samples, since pageset-repeat was only 3. If you want to be sure that the 5.7% increase is not statistically significant, you might want to set pageset-repeat to 10 or 20 (see the significance sketch at the end of this comment). Of course, with more repeats, the results.html will take proportionally longer to load.

effective_size in long_running:tools:gmail-foreground increased by 5.595%, and that is statistically significant, possibly partly because there are 306 samples. I'm not sure why there are so many more samples even though pageset-repeat was only 3; maybe it is something about how memoryMetric works or about how the benchmark is configured.

Anyway, click on the memory:chrome:all_processes:reported_by_chrome:effective_size > long_running:tools:gmail-foreground cell in the Patch column to see more about it. The breakdowns (stacked bar chart + table) in the histogram diagnostics show that the renderer_processes' v8:heap:old_space measurement is the biggest contributor. Click the links in the breakdown to narrow in on that measurement. Fortunately, it still has the red frown emoji: the regression is here.

In the grouping picker (checkboxes and left/right arrows for groupings like name, stories, storysetRepeats, etc.), enable storysetRepeats to see which repetition exhibits the most significant change. In this case, it's 1. Click the histogram icon (3 blue bars) in the Name column to open the cells in both columns and compare the distributions and breakdowns. Click on the metadata tab to see traceUrls. If you group by stories and storysetRepeats, there will be only one link: https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/long_running_tools_gmail_foreground_2017-10-20_13-36-22_58293.html

You can click on the memory dump dots at the top of the trace and explore them, but I don't know how exactly that's going to help you because I don't know how your change works. If you have questions about how to interpret the memory dumps, you can ask memory-infra (hjd, perezju).

Once you figure out why v8:heap:old_space regressed for that page, and either fix it or agree with memory-infra to ignore it, go back to load:tools:drive and maybe some of the other stories to see why their effective_size increased so much. Once you are happy with effective_size, go back to some of the other measurements and dig down into them as well.

Does that help? Please feel free to schedule a meeting with me if you have more questions about interpreting results.html.
I'm in the process of writing a doc about how to interpret this pivot table containing all this information and more. Can I ask for your feedback when it's ready?
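To make the significance question concrete, here is a minimal sketch of the kind of check you can run yourself on raw sample values exported from results.html. It uses scipy's Mann-Whitney U test, which is similar in spirit to the significance test results.html applies; the sample arrays below are made-up placeholders, not values from this job:

    # Compare two sets of memory samples with a Mann-Whitney U test.
    # Placeholder values; substitute raw samples from results.html.
    from scipy import stats

    tot   = [21.9e6, 22.0e6, 21.8e6]  # reference (TOT) repeats
    patch = [22.3e6, 22.2e6, 22.4e6]  # with-patch repeats

    _, p = stats.mannwhitneyu(tot, patch, alternative="two-sided")
    print(f"p = {p:.3f}")
    # An exact two-sided Mann-Whitney test on 3-vs-3 samples can never
    # report p below 0.1, so with pageset-repeat=3 even a consistent
    # shift looks "insignificant"; the 306-sample gmail story has far
    # more statistical power, which is why its 5.595% is significant.

With 10 repeats per side, the exact test can in principle reach p of about 1e-5, which is why bumping pageset-repeat makes borderline cases like the 5.7% load:tools:drive delta answerable.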
Oct 30 2017
Re comment #13: Thank you for the detailed walkthrough; it is really helpful.

When looking at, for example, the three storysetRepeats of long_running:tools:gmail-foreground, the numbers are all over the place: +28.860%, +9.706%, and -16.286%. Granted, the mean is +5.595%, which is scary, but I have difficulty believing that this is significant. Most importantly, this is the regression of a patch using ToT as the baseline, and the patch in question is the revert of the suspected culprit above; a regression of a revert means the original change was an improvement. When I look through every story in that file, I see no improvement from the revert, that is, no regression from the suspected culprit.

I ran the benchmark with --story-filter=browse.media.youtube locally, both on https://crrev.com/c/728700 and https://crrev.com/c/731404. It shows no regressions.

I think this bug can be closed as WontFix, but I invite you to prove me wrong. If you can find a regression, I'd be glad to look at the traces to see if there is anything relevant to the change. Thank you.
Nov 1 2017
It looks to me like it's probably OK to mark this WontFix. I don't know whether pageset-repeat=3 is enough data to be convincing either way, especially when the noise between repetitions is so high, but the regression was only 1.8% to begin with, and the chart in comment #1 has improved by more than that since the regression. I'll keep thinking and talking about how to make it easier to find regressions, or the lack thereof, in results.html.
Nov 2 2017
Thank you. I look forward to further improvements to the memory regression tooling; I think it is really amazing already (despite the potential for false positives)!