New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 740121 link

Starred by 5 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Jan 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression



Sign in to add a comment

29.1%-255.8% regression in system_health.memory_mobile at 484350:484459

Project Member Reported by mustaq@chromium.org, Jul 7 2017

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=740121

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=b40edb5cf906b3c8199e5893dc32042704a1a43e01f11a95d4c3a5cbfed275ec


Bot(s) for this bug's original alert(s):

android-nexus6
android-nexus7v2
android-webview-nexus5X
Cc: fsam...@chromium.org
Owner: fsam...@chromium.org

=== Auto-CCing suspected CL author fsamuel@chromium.org ===

Hi fsamuel@chromium.org, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Fady Samuel
  Commit : 28c5a8185d81adc5252e7280d5d0b652d998018f
  Date   : Thu Jul 06 00:10:04 2017
  Subject: Decouple cc::Surface from cc::CompositorFrameSinkSupport.

Bisect Details
  Configuration: android_nexus7_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg/browse_news/browse_news_cricbuzz
  Change       : 268.75% | 27850780.0 -> 102698502.667

Revision             Result                    N
chromium@484349      27850780 +- 2013918       6      good
chromium@484373      27850780 +- 2072846       6      good
chromium@484376      28595112 +- 2007183       6      good
chromium@484377      93601061 +- 31331934      6      bad       <--
chromium@484378      102335324 +- 8153341      6      bad
chromium@484379      99391022 +- 17458239      6      bad
chromium@484385      103040414 +- 4940776      6      bad
chromium@484397      94697079 +- 23675780      6      bad
chromium@484445      102698503 +- 7806230      6      bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=browse.news.cricbuzz system_health.memory_mobile

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8974716540724065008


For feedback, file a bug with component Speed>Bisection

=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Fady Samuel
  Commit : 28c5a8185d81adc5252e7280d5d0b652d998018f
  Date   : Thu Jul 06 00:10:04 2017
  Subject: Decouple cc::Surface from cc::CompositorFrameSinkSupport.

Bisect Details
  Configuration: android_nexus7_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg/browse_shopping/browse_shopping_amazon
  Change       : 189.17% | 41314856.0 -> 119469266.667

Revision             Result                     N
chromium@484349      41314856 +- 1095.45        6      good
chromium@484373      41498162 +- 641592         6      good
chromium@484376      41314856 +- 1095.45        6      good
chromium@484377      120864437 +- 8666336       6      bad       <--
chromium@484378      126981531 +- 26279092      6      bad
chromium@484379      118076022 +- 13603738      6      bad
chromium@484385      125291248 +- 20978677      6      bad
chromium@484397      129295093 +- 15512006      6      bad
chromium@484445      119469267 +- 9730861       6      bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=browse.shopping.amazon system_health.memory_mobile

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8974716489189056976


For feedback, file a bug with component Speed>Bisection

=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Fady Samuel
  Commit : 28c5a8185d81adc5252e7280d5d0b652d998018f
  Date   : Thu Jul 06 00:10:04 2017
  Subject: Decouple cc::Surface from cc::CompositorFrameSinkSupport.

Bisect Details
  Configuration: android_nexus7_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:chrome:all_processes:reported_by_chrome:cc:effective_size_avg/browse_news/browse_news_cnn
  Change       : 232.16% | 31612114.0 -> 105003692.0

Revision             Result                     N
chromium@484349      31612114 +- 86726.4        6      good
chromium@484373      31627948 +- 0.0            6      good
chromium@484376      31627948 +- 0.0            6      good
chromium@484377      107581441 +- 11483733      6      bad       <--
chromium@484378      107860519 +- 13784607      6      bad
chromium@484379      110435538 +- 14193622      6      bad
chromium@484385      105055575 +- 21926734      6      bad
chromium@484397      109569367 +- 12776588      6      bad
chromium@484445      105003692 +- 21344369      6      bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=browse.news.cnn system_health.memory_mobile

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8974716502302334464


For feedback, file a bug with component Speed>Bisection
 Issue 740127  has been merged into this issue.
 Issue 740127  has been merged into this issue.
Issue 740123 has been merged into this issue.
Issue 740123 has been merged into this issue.
 Issue 740119  has been merged into this issue.
Issue 740123 has been merged into this issue.
Project Member

Comment 14 by bugdroid1@chromium.org, Jul 11 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c085745ac17b783a62ddbd9d843f7cabe0fd4ea1

commit c085745ac17b783a62ddbd9d843f7cabe0fd4ea1
Author: Fady Samuel <fsamuel@chromium.org>
Date: Tue Jul 11 14:37:15 2017

cc: Make sure resources are returned for old surfaces

If a CompositorFrame is submitted to a CompositorFrameSink with a new
LocalSurfaceId, then the old surface becomes a candidate for garbage
collection. After the new surface is used in a display frame, the resources
of the old frame should be returned back to the client for reuse.

However, with this CL: https://chromium-review.googlesource.com/c/558472/ that
behavior was broken because resources on a surface were unref'ed as long as that
surface existed. A surface may not survive until the next display frame to be
unref'ed. This problem is most apparent during resize.

Bug:  740121 
Change-Id: Idf681a387a535767fa38c966ee2191675502d9a3
Reviewed-on: https://chromium-review.googlesource.com/566332
Reviewed-by: John Bauman <jbauman@chromium.org>
Commit-Queue: Fady Samuel <fsamuel@chromium.org>
Cr-Commit-Position: refs/heads/master@{#485627}
[modify] https://crrev.com/c085745ac17b783a62ddbd9d843f7cabe0fd4ea1/cc/surfaces/surface.h
[modify] https://crrev.com/c085745ac17b783a62ddbd9d843f7cabe0fd4ea1/cc/surfaces/surface_aggregator.cc
[modify] https://crrev.com/c085745ac17b783a62ddbd9d843f7cabe0fd4ea1/cc/surfaces/surface_aggregator.h
[modify] https://crrev.com/c085745ac17b783a62ddbd9d843f7cabe0fd4ea1/cc/surfaces/surface_aggregator_unittest.cc

Status: Fixed (was: Untriaged)
This should be fixed now.

Comment 16 by piman@chromium.org, Jul 13 2017

Cc: benhenry@chromium.org piman@chromium.org primiano@chromium.org vmi...@chromium.org ssid@chromium.org mariakho...@chromium.org dskiba@chromium.org ericrk@chromium.org
 Issue 741819  has been merged into this issue.
Cc: erikc...@chromium.org
Status: Assigned (was: Fixed)
Re-opening.

fsamuel: How do we know that this is fixed?

The original CL was never reverted, so that means that when the "fix" landed, telemetry would not have a clean slate against which to compare the fix. 

The original CL had a 80MB memory regression. How do we know that original CL + fix has 0MB regression, rather than just a 2MB regression?

The easiest solution would be to revert both the fix and the original CL, and then land both together, and see if alerts are fired. A harder solution requires you to look at the graphs/UMA graphs [See Isssue 741819] and convince me that the problem is fixed.

The original CL was intended to be a refactor but a small simplification was proposed that turned out to be a silly mistake on my part. The fix is to restore the original behavior, putting the net change at just a mechanical refactor without any functional change.
"""
The fix is to restore the original behavior, putting the net change at just a mechanical refactor without any functional change.
"""

Why not just revert the original and reland a fixed refactor? 

Given the magnitude of the problem [multi-GB leak on Canary], I'd really like a more thorough approach to make sure that the problem is fully gone [and not just *mostly* gone].
Do we have another way to verify the fix? A number of large refactors have landed on top of these CLs and so there's no real trivial revert...
Can you post snapshots [or better yet, full memory dumps] of the telemetry tests before CL1, after CL1, and after CL2 [the fix], to confirm that all the numbers have recovered to expected levels?
Sure, I'll look into this tomorrow! Thanks!
Perf sheriff checking in. fsamuel: any update on this?
fsamuel: ping on the memory snapshots/dumps for #21?
fsamuel: ping on #21?
Thanks for the ping! This got lost in the bugs assigned to me. I'll follow up soon on my backlog of bugs for viz finch trial.
fsamuel: Perf sheriff ping again on #20

Comment 28 by fsamuel@google.com, Jan 25 2018

Status: Fixed (was: Assigned)
+sadrul@: gpu team metrics lead

Marking as fixed for this particular issue.

Now that we have a full time gpu metrics team I think this work will be done in a far more systematic than I could possibly hope to do on my own. I'm confident that we'll have the mechanisms in place to catch regressions without my further immediate action.

Comment 29 by fsamuel@google.com, Jan 25 2018

Cc: sadrul@chromium.org
Looking at this graph:

https://chromeperf.appspot.com/report?sid=6e2130103e91a194bdae5e586b86b136b7f9c47cfc9ca2606f39dcb3aef8fae0&start_rev=483912&end_rev=489905

The initial regression in all three-graphs (by ~7MB, ~5MB, and ~10MB respectively) includes crrev.com/484377, which was detected as the CL causing the regressions.

The last dip in all three graphs above (by ~6MB, ~6MB, ~10MB respectively) includes the subsequent fix (crrev.com/485627).

However, it looks like there were some additional regressions in all three cases between the initial regression, and the subsequent fix. That makes it tricky to claim with confidence that the last dip recovers all of the regression in the first spike (because maybe it recovered some of the subsequent regressions instead?).

I think erikchen@'s advice to completely revert first, and reland with the fix, is what we should do in such instances, so that it's easier to make no-regression claims confidently.

Sign in to add a comment