29.1%-255.8% regression in system_health.memory_mobile at 484350:484459
Issue description: See the link to graphs below.
Jul 7 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8974716540724065008
Jul 7 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8974716502302334464
Jul 7 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8974716489189056976
Jul 7 2017

=== Auto-CCing suspected CL author fsamuel@chromium.org ===
Hi fsamuel@chromium.org, the bisect results pointed to your CL, please take a look at the results.

=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Fady Samuel
  Commit : 28c5a8185d81adc5252e7280d5d0b652d998018f
  Date   : Thu Jul 06 00:10:04 2017
  Subject: Decouple cc::Surface from cc::CompositorFrameSinkSupport.

Bisect Details
  Configuration: android_nexus7_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg/browse_news/browse_news_cricbuzz
  Change       : 268.75% | 27850780.0 -> 102698502.667

Revision         Result                  N
chromium@484349  27850780 +- 2013918     6  good
chromium@484373  27850780 +- 2072846     6  good
chromium@484376  28595112 +- 2007183     6  good
chromium@484377  93601061 +- 31331934    6  bad  <--
chromium@484378  102335324 +- 8153341    6  bad
chromium@484379  99391022 +- 17458239    6  bad
chromium@484385  103040414 +- 4940776    6  bad
chromium@484397  94697079 +- 23675780    6  bad
chromium@484445  102698503 +- 7806230    6  bad

Please refer to the following doc on diagnosing memory regressions:
https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=browse.news.cricbuzz system_health.memory_mobile

More information on addressing performance regressions:
http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
https://chromeperf.appspot.com/buildbucket_job_status/8974716540724065008

For feedback, file a bug with component Speed>Bisection
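As a sanity check, the "Change" figure reported by the bisect above is just the relative difference between the last-good mean (chromium@484349) and the final bad mean (chromium@484445). A minimal sketch of that arithmetic, using the values from the table (the `percent_change` helper is illustrative, not part of the bisect tooling):

```python
def percent_change(good_mean, bad_mean):
    """Relative change, in percent, between baseline (good) and regressed (bad) means."""
    return (bad_mean - good_mean) / good_mean * 100.0

# Means in bytes, taken from the bisect table above.
good = 27850780.0      # chromium@484349
bad = 102698502.667    # chromium@484445

print(f"{percent_change(good, bad):.2f}%")  # ~268.75%, matching the reported Change
```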
Jul 7 2017

=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Fady Samuel
  Commit : 28c5a8185d81adc5252e7280d5d0b652d998018f
  Date   : Thu Jul 06 00:10:04 2017
  Subject: Decouple cc::Surface from cc::CompositorFrameSinkSupport.

Bisect Details
  Configuration: android_nexus7_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg/browse_shopping/browse_shopping_amazon
  Change       : 189.17% | 41314856.0 -> 119469266.667

Revision         Result                  N
chromium@484349  41314856 +- 1095.45     6  good
chromium@484373  41498162 +- 641592      6  good
chromium@484376  41314856 +- 1095.45     6  good
chromium@484377  120864437 +- 8666336    6  bad  <--
chromium@484378  126981531 +- 26279092   6  bad
chromium@484379  118076022 +- 13603738   6  bad
chromium@484385  125291248 +- 20978677   6  bad
chromium@484397  129295093 +- 15512006   6  bad
chromium@484445  119469267 +- 9730861    6  bad

Please refer to the following doc on diagnosing memory regressions:
https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=browse.shopping.amazon system_health.memory_mobile

More information on addressing performance regressions:
http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
https://chromeperf.appspot.com/buildbucket_job_status/8974716489189056976

For feedback, file a bug with component Speed>Bisection
Jul 7 2017

=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Fady Samuel
  Commit : 28c5a8185d81adc5252e7280d5d0b652d998018f
  Date   : Thu Jul 06 00:10:04 2017
  Subject: Decouple cc::Surface from cc::CompositorFrameSinkSupport.

Bisect Details
  Configuration: android_nexus7_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg/browse_news/browse_news_cnn
  Change       : 232.16% | 31612114.0 -> 105003692.0

Revision         Result                  N
chromium@484349  31612114 +- 86726.4     6  good
chromium@484373  31627948 +- 0.0         6  good
chromium@484376  31627948 +- 0.0         6  good
chromium@484377  107581441 +- 11483733   6  bad  <--
chromium@484378  107860519 +- 13784607   6  bad
chromium@484379  110435538 +- 14193622   6  bad
chromium@484385  105055575 +- 21926734   6  bad
chromium@484397  109569367 +- 12776588   6  bad
chromium@484445  105003692 +- 21344369   6  bad

Please refer to the following doc on diagnosing memory regressions:
https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=browse.news.cnn system_health.memory_mobile

More information on addressing performance regressions:
http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
https://chromeperf.appspot.com/buildbucket_job_status/8974716502302334464

For feedback, file a bug with component Speed>Bisection
Jul 7 2017
Issue 740127 has been merged into this issue.
Jul 8 2017
Issue 740123 has been merged into this issue.
Jul 8 2017
Issue 740119 has been merged into this issue.
Jul 11 2017

The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/c085745ac17b783a62ddbd9d843f7cabe0fd4ea1

commit c085745ac17b783a62ddbd9d843f7cabe0fd4ea1
Author: Fady Samuel <fsamuel@chromium.org>
Date: Tue Jul 11 14:37:15 2017

cc: Make sure resources are returned for old surfaces

If a CompositorFrame is submitted to a CompositorFrameSink with a new LocalSurfaceId, then the old surface becomes a candidate for garbage collection. After the new surface is used in a display frame, the resources of the old frame should be returned back to the client for reuse. However, with this CL: https://chromium-review.googlesource.com/c/558472/ that behavior was broken because resources on a surface were unref'ed as long as that surface existed. A surface may not survive until the next display frame to be unref'ed. This problem is most apparent during resize.

Bug: 740121
Change-Id: Idf681a387a535767fa38c966ee2191675502d9a3
Reviewed-on: https://chromium-review.googlesource.com/566332
Reviewed-by: John Bauman <jbauman@chromium.org>
Commit-Queue: Fady Samuel <fsamuel@chromium.org>
Cr-Commit-Position: refs/heads/master@{#485627}

[modify] https://crrev.com/c085745ac17b783a62ddbd9d843f7cabe0fd4ea1/cc/surfaces/surface.h
[modify] https://crrev.com/c085745ac17b783a62ddbd9d843f7cabe0fd4ea1/cc/surfaces/surface_aggregator.cc
[modify] https://crrev.com/c085745ac17b783a62ddbd9d843f7cabe0fd4ea1/cc/surfaces/surface_aggregator.h
[modify] https://crrev.com/c085745ac17b783a62ddbd9d843f7cabe0fd4ea1/cc/surfaces/surface_aggregator_unittest.cc
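The lifecycle described in the commit message can be sketched with a toy model. This is an illustrative Python sketch under assumed names (`ResourceTracker`, `submit_frame`, and `retire_surface` are hypothetical, not the real cc/surfaces API). It shows the invariant the fix restores: when an old surface is garbage-collected, its refs are dropped and fully-unreferenced resources are returned to the client, even if that surface never appears in another display frame.

```python
class ResourceTracker:
    """Toy model of resource return for retired surfaces (hypothetical API)."""

    def __init__(self):
        self.refs = {}       # resource_id -> refcount
        self.returned = []   # resources handed back to the client for reuse

    def submit_frame(self, resource_ids):
        # Submitting a CompositorFrame takes a ref on each resource it uses.
        for rid in resource_ids:
            self.refs[rid] = self.refs.get(rid, 0) + 1

    def retire_surface(self, resource_ids):
        # When an old surface is garbage-collected, its refs must be dropped
        # even if the surface never made it into another display frame; the
        # bug fixed above tied this unref to the surface still being alive.
        for rid in resource_ids:
            self.refs[rid] -= 1
            if self.refs[rid] == 0:
                del self.refs[rid]
                self.returned.append(rid)


tracker = ResourceTracker()
tracker.submit_frame([1, 2])    # old surface's frame uses resources 1 and 2
tracker.submit_frame([2, 3])    # new surface's frame shares resource 2
tracker.retire_surface([1, 2])  # old surface is garbage-collected
print(sorted(tracker.returned)) # [1] -- resource 1 returned; 2 still in use
```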
Jul 11 2017
This should be fixed now.
Jul 13 2017
Issue 741819 has been merged into this issue.
Jul 13 2017

Re-opening. fsamuel: How do we know that this is fixed? The original CL was never reverted, which means that when the "fix" landed, telemetry did not have a clean slate against which to compare it. The original CL had an 80MB memory regression. How do we know that the original CL plus the fix has a 0MB regression, rather than just a 2MB regression? The easiest solution would be to revert both the fix and the original CL, then land them together and see whether alerts fire. A harder solution requires you to look at the graphs/UMA graphs [see Issue 741819] and convince me that the problem is fixed.
Jul 13 2017

The original CL was intended to be a refactor, but a small simplification was proposed that turned out to be a silly mistake on my part. The fix is to restore the original behavior, putting the net change at just a mechanical refactor without any functional change.
Jul 13 2017

"""
The fix is to restore the original behavior, putting the net change at just a mechanical refactor without any functional change.
"""

Why not just revert the original and reland a fixed refactor? Given the magnitude of the problem [a multi-GB leak on Canary], I'd really like a more thorough approach to make sure that the problem is fully gone [and not just *mostly* gone].
Jul 13 2017

Do we have another way to verify the fix? A number of large refactors have landed on top of these CLs, so there's no trivial revert...
Jul 13 2017
Can you post snapshots [or better yet, full memory dumps] of the telemetry tests before CL1, after CL1, and after CL2 [the fix], to confirm that all the numbers have recovered to expected levels?
Jul 13 2017
Sure, I'll look into this tomorrow! Thanks!
Aug 21 2017
Perf sheriff checking in. fsamuel: any update on this?
Sep 21 2017
fsamuel: ping on the memory snapshots/dumps for #21?
Jan 5 2018
fsamuel: ping on #21?
Jan 5 2018

Thanks for the ping! This got lost in the bugs assigned to me. I'll follow up soon as part of my backlog of bugs for the viz finch trial.
Jan 25 2018
fsamuel: Perf sheriff ping again on #20
Jan 25 2018

+sadrul@: gpu team metrics lead

Marking this particular issue as fixed. Now that we have a full-time gpu metrics team, I think this work will be done in a far more systematic way than I could possibly hope to do on my own. I'm confident that we'll have the mechanisms in place to catch regressions without my further immediate action.
Jan 26 2018

Looking at this graph: https://chromeperf.appspot.com/report?sid=6e2130103e91a194bdae5e586b86b136b7f9c47cfc9ca2606f39dcb3aef8fae0&start_rev=483912&end_rev=489905

The initial regression in all three graphs (by ~7MB, ~5MB, and ~10MB respectively) includes crrev.com/484377, which was detected as the CL causing the regressions. The last dip in all three graphs (by ~6MB, ~6MB, and ~10MB respectively) includes the subsequent fix (crrev.com/485627).

However, there appear to have been additional regressions in all three cases between the initial regression and the subsequent fix. That makes it tricky to claim with confidence that the last dip recovers all of the regression from the first spike (it may have recovered some of the subsequent regressions instead). I think erikchen@'s advice to completely revert first and reland with the fix is what we should do in such instances, so that no-regression claims can be made confidently.
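To make the ambiguity concrete, here is the arithmetic on the approximate figures quoted above (the MB values are rough readings from the graphs, as stated in the comment; this only shows that the spike and dip sizes are close, not that the original regression was what got recovered):

```python
# Rough per-graph readings (MB) from the dashboard graphs referenced above.
initial_regression = [7, 5, 10]   # spike containing crrev.com/484377
final_dip = [6, 6, 10]            # recovery containing crrev.com/485627

# Residual = how much of the original spike the dip fails to account for.
# Because extra regressions landed in between, a near-zero residual here
# cannot prove that the original regression itself was recovered.
residuals = [spike - dip for spike, dip in zip(initial_regression, final_dip)]
print(residuals)  # [1, -1, 0] -- similar magnitudes, but ambiguous attribution
```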
Comment 1 by 42576172...@developer.gserviceaccount.com, Jul 7 2017