
Issue 732294 link

Starred by 9 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Aug 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Linux, Android, Windows, Mac
Pri: 1
Type: Bug-Regression




1%-6.5% regression in system_health.memory_mobile at 478081:478217

Project Member Reported by alexclarke@chromium.org, Jun 12 2017

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=732294

Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?keys=agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDg1oLi7gsM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglrSl0ggM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglsWN-woM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDgltOcuwoM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDgltPIqgsM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDg5oXOlwgM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglpePqQsM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDg1tDJvgoM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDg1u79qAoM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDg1ryZuQoM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDg1urC7QgM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglr_x8AgM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglta3zAoM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDg1pCVsgkM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDg1rqUswkM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglpe0twsM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglpe0twkM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglunpuQkM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDg1uzbiAoM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglu7CowoM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglpbb3gkM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglvXqoQoM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDg1rjJqwkM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDgltb5hAoM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglta3jAgM,agxzfmNocm9tZXBlcmZyFAsSB0Fub21hbHkYgIDglrGtuQoM


Bot(s) for this bug's original alert(s):

android-nexus6
android-one
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Jun 12 2017

Cc: sa...@chromium.org
Owner: sa...@chromium.org

=== Auto-CCing suspected CL author sammc@chromium.org ===

Hi sammc@chromium.org, the bisect results pointed to your CL; please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Sam McNally
  Commit : cd0c21dfb32b6235c5c4bed740dac090b4e25bae
  Date   : Fri Jun 09 01:31:36 2017
  Subject: Enable PrefService by default.

Bisect Details
  Configuration: android_nexus6_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:chrome:all_processes:reported_by_os:system_memory:java_heap:proportional_resident_size_avg/browse_tools/browse_tools_maps
  Change       : 6.40% | 22340949.3333 -> 23771136.0

Revision             Result                  N
chromium@478080      22340949 +- 255708      6      good
chromium@478149      22274048 +- 266838      6      good
chromium@478154      22297941 +- 303386      6      good
chromium@478156      22379520 +- 286061      6      good
chromium@478157      23713621 +- 212146      6      bad       <--
chromium@478158      23764992 +- 349911      6      bad
chromium@478166      23744512 +- 290729      6      bad
chromium@478183      23828821 +- 280545      6      bad
chromium@478217      23771136 +- 219337      6      bad
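The "Change" figure the bot reports above can be reproduced from the table: it is the percent difference between the endpoint averages (first good revision vs. last bad revision). A quick sanity check, using only the numbers quoted in the bisect summary:

```python
# Sanity check of the bisect summary above: "Change: 6.40%" is the percent
# difference between the first-good and last-bad endpoint averages.
first_good = 22340949.3333   # chromium@478080 average, bytes
last_bad = 23771136.0        # chromium@478217 average, bytes

change_pct = (last_bad - first_good) / first_good * 100
print(f"{change_pct:.2f}%")  # matches the 6.40% reported by the bot
```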

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=browse.tools.maps system_health.memory_mobile

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8977002017683656944

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5282316034768896


| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection.  Thank you!
Project Member

Comment 4 by 42576172...@developer.gserviceaccount.com, Jun 12 2017

Cc: chcunningham@chromium.org
 Issue 732483  has been merged into this issue.
Project Member

Comment 5 by 42576172...@developer.gserviceaccount.com, Jun 12 2017

 Issue 732483  has been merged into this issue.
Project Member

Comment 6 by 42576172...@developer.gserviceaccount.com, Jun 12 2017

 Issue 732483  has been merged into this issue.
Project Member

Comment 7 by 42576172...@developer.gserviceaccount.com, Jun 13 2017

 Issue 732484  has been merged into this issue.

Comment 8 by sa...@chromium.org, Jun 16 2017

Status: WontFix (was: Untriaged)
The native heap/malloc memory increases are expected. The Java heap changes don't make much sense since the change was C++ only.
Project Member

Comment 9 by 42576172...@developer.gserviceaccount.com, Jun 26 2017

Cc: benhenry@chromium.org mlippautz@chromium.org sullivan@chromium.org dtseng@chromium.org erikc...@chromium.org etienneb@chromium.org
 Issue 732849  has been merged into this issue.
sammc, looks like this bug was closed on Jun 16 with "The native heap/malloc memory increases are expected". More alerts have been associated with this bug in the meantime.
Did you have any discussion about this with our release managers (+benhenry)? Was there a performance improvement that justifies the extra memory cost?

Labels: -Pri-2 ReleaseBlock-Stable Pri-1
Status: Assigned (was: WontFix)
re-opening b/c of  crbug.com/732849  and b/c of comment #10.
Cc: roc...@chromium.org jam@chromium.org
Please add appropriate OSs.

Comment 14 by jam@chromium.org, Jun 26 2017

Looks like the extra memory increase is 1.3MB. Is that really expected?

Comment 15 by sa...@chromium.org, Jun 26 2017

No, but the large increases are in java_heap, and the pref service is C++-only. Unless I'm missing some hidden meaning in java_heap, I don't see how we could have caused that.
Project Member

Comment 16 by sheriffbot@chromium.org, Jun 27 2017

This issue is marked as a release blocker with no OS labels associated. Please add an appropriate OS label.

All release blocking issues should have OS labels associated with them, so that the issue can be tracked and promptly verified once it gets fixed.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
I've kicked off another bisect, this time for desktop memory. I'm reasonably confident the bots will find the culprit based on the shape of the graph. Let's see if they find the same patch or not.
Project Member

Comment 20 by 42576172...@developer.gserviceaccount.com, Jun 27 2017


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Sam McNally
  Commit : cd0c21dfb32b6235c5c4bed740dac090b4e25bae
  Date   : Fri Jun 09 01:31:36 2017
  Subject: Enable PrefService by default.

Bisect Details
  Configuration: winx64ati_perf_bisect
  Benchmark    : memory.desktop
  Metric       : memory:chrome:all_processes:reported_by_chrome:shim_allocated_objects_size_avg/TrivialWebGLPageSharedPageState
  Change       : 3.12% | 9509251.0 -> 9806084.33333

Revision             Result                  N
chromium@478108      9509251 +- 87818.4      6      good
chromium@478154      9510022 +- 51289.3      6      good
chromium@478156      9510600 +- 65855.5      6      good
chromium@478157      9775120 +- 86178.8      6      bad       <--
chromium@478160      9805899 +- 80062.6      6      bad
chromium@478166      9805868 +- 75937.2      6      bad
chromium@478177      9802059 +- 51677.4      6      bad
chromium@478200      9806084 +- 64232.5      6      bad

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=TrivialWebGLPageSharedPageState memory.desktop

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8975621582938381200

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5247606977986560


| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection.  Thank you!
OK, the bots think it is that patch. The regression is quite large; should we revert?
Labels: OS-Android OS-Linux OS-Mac OS-Windows
Project Member

Comment 23 by 42576172...@developer.gserviceaccount.com, Jun 27 2017


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Sam McNally
  Commit : cd0c21dfb32b6235c5c4bed740dac090b4e25bae
  Date   : Fri Jun 09 01:31:36 2017
  Subject: Enable PrefService by default.

Bisect Details
  Configuration: winx64nvidia_perf_bisect
  Benchmark    : system_health.memory_desktop
  Metric       : memory:chrome:all_processes:reported_by_chrome:malloc:effective_size_avg/load_search/load_search_google
  Change       : 3.76% | 11428035.5556 -> 11858022.7778

Revision             Result                   N
chromium@478081      11428036 +- 1203195      9      good
chromium@478123      11311448 +- 61739.1      6      good
chromium@478144      11295788 +- 72381.5      6      good
chromium@478155      11441713 +- 1214693      9      good
chromium@478156      11444093 +- 1191470      9      good
chromium@478157      11707865 +- 1260066      9      bad       <--
chromium@478158      11715948 +- 1241436      9      bad
chromium@478160      11723741 +- 1209891      9      bad
chromium@478165      11858023 +- 1619732      9      bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=load.search.google system_health.memory_desktop

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8975621716917651616

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5641042893733888


| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection.  Thank you!
Sam - can you please investigate? Primiano - can you think of ways C++ would change java_heap or help us understand what we mean by java_heap in this context?
Given that we have clear bisects on multiple OSes, please revert the CL in question.

Comment 26 by sa...@chromium.org, Jun 28 2017

It's still a few hundred KiB (which is quite challenging to see thanks to the lack of digit separators), and that is still expected.

Further, I find the choice of earliest value and latest value rather than the values immediately before and after the change pretty questionable. Using the values from immediately before and after results in a change of 2.3% rather than 3.76%.
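The two figures being compared here can be reproduced from the desktop bisect table in comment #23 (underscore digit separators added for readability, since their absence is complained about above):

```python
# Illustrating the endpoint-vs-adjacent point using the numbers from the
# desktop bisect in comment #23: the bot compares the earliest and latest
# revision averages, not the revisions immediately around the culprit.
endpoint_good, endpoint_bad = 11_428_036, 11_858_023  # chromium@478081 / @478165
adjacent_good, adjacent_bad = 11_444_093, 11_707_865  # chromium@478156 / @478157

endpoint_pct = (endpoint_bad - endpoint_good) / endpoint_good * 100
adjacent_pct = (adjacent_bad - adjacent_good) / adjacent_good * 100
print(f"endpoints: {endpoint_pct:.2f}%")  # 3.76%, as reported by the bot
print(f"adjacent:  {adjacent_pct:.2f}%")  # ~2.3%, as argued here
```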
I did split the java_heap regression in a separate bug ( Issue 737498 ). It is possible that some other nearby CL in the same range caused this.
As a metric owner, I am personally fine ignoring the java heap regression here if we don't find a plausible explanation: the Java Runtime can have some odd time-related behaviors, we had some occasional cases like this in the past.
The problem is that there is, instead, a definite regression in native code, affecting Windows, Mac and Android, up to ~500 KB, as reported by
https://chromeperf.appspot.com/group_report?bug_id=732294 (linked in #0)
and as confirmed by the bisects above.

> No, but the large increases are java_heap and pref service is C++-only. Unless I'm missing some hidden meaning in java_heap I don't see how we could have caused that.

The original bug reports regressions both in native code and java_heap. I get the point about java being puzzling, but from what I understand your line of reasoning here is:
"I find the java heap regression extremely unlikely to be caused by this CL. Hence I am going to completely ignore this alert and WontFix the bug without even letting anybody know about this".
Other peers in the same situation at least dropped an email or reassigned the bug saying "Hey, this is weird, I'm doing a C++ change but there is also a java heap regression here. Can somebody take a look?"

> It's still a few hundred KiB (which is quite challenging to see thanks to the lack of digit separators), which is still expected.
Can you define "expected"? In #10 I asked
> Did you have any discussion about this with our release managers (+benhenry)? was there a performance improvement that justifies the extra memory cost?"
and haven't seen a reply to that yet.

Please, try to understand the situation. Our project sees ~250 CLs per day (not counting subprojects like v8, skia, etc.).
If even 0.4% of those (1 CL out of 250) "expects" a ~500 KB regression, our footprint would increase by ~182 MB per year, which is far from desirable.
We had a number of multi-quarter projects to save some MB of memory (e.g., the v8 people completely rewrote their compiler pipeline w/ ignition, see [1] for a list of some recent work).
What is missing here is a discussion of the form: "here's why I think it is fine to cause a ~500 KB regression, equivalent to several weeks of work for 3 engineers on the v8 team".
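The extrapolation above is simple arithmetic on the stated assumptions (one ~500 KB "expected" regression per day, compounded over a year, using decimal MB):

```python
# Back-of-the-envelope check of the scaling argument above.
cls_per_day = 250
fraction_regressing = 1 / 250      # i.e. 0.4% of CLs
regression_kb = 500                # ~500 KB per "expected" regression
days_per_year = 365

yearly_mb = cls_per_day * fraction_regressing * regression_kb * days_per_year / 1000
print(f"~{yearly_mb:.1f} MB/year")  # ~182.5 MB/year, matching the ~182 MB figure
```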

At this point given that:
- This bug was assigned on Jun 12; four different perf sheriffs (erikchen, alexclarke, chcunningham and myself) found alerts that were duped against this bug.
- Our perf sheriffing rules [2] clearly say "If you are certain that a specific CL caused a performance regression, and the author does not have an immediate plan to address the problem, please revert the CL."
- From the discussion there doesn't seem to be any plan to address this.
- The CL itself is just a flag flip, so reverting won't cause rebase pain for the author.

I am going to revert this CL https://chromium-review.googlesource.com/c/552142/ .
Please don't take it personally and understand the situation and the scaling problem.
See [3] for docs on how to reproduce the benchmarks locally and feel free to reach out if you need assistance to debug the perf regression.


[1] https://chromium.googlesource.com/chromium/src.git/+/master/docs/memory/achievements.md
[2] https://chromium.googlesource.com/chromium/src/+/master/docs/speed/perf_regression_sheriffing.md
[3] https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md
Cc: parisa@chromium.org
Owner: benhenry@chromium.org
> I am going to revert this CL https://chromium-review.googlesource.com/c/552142/ .
Well looks like sammc@ NOT-LGTM-d and unticked the revert https://chromium-review.googlesource.com/c/552142/ .
moving this to benhenry@

+parisa@ FYI
Project Member

Comment 29 by 42576172...@developer.gserviceaccount.com, Jun 28 2017

Cc: primiano@chromium.org
 Issue 737498  has been merged into this issue.

Comment 30 by sa...@chromium.org, Jun 28 2017

Well, it did get approved for launch in M60 (issue 707602). It's an extra copy of prefs in-memory to enable servicification of prefs users. There are no plans to change the way it works; the fun alternatives are inserting blocking IPC for every pref access, shared memory and interprocess locking or not servicifying anything that depends on prefs.

Again, I would like an explanation for why the measurements furthest away from the chosen CL are reported. In every bisect result listed in this bug that change is greater than the difference at the blamed CL. Also, if no one can explain why C++-only changes are affecting java heap numbers that seems pretty bad for the reliability of those numbers.
When the LR review was conducted did they know this would be regressing memory?
> Again, I would like an explanation for why the measurements furthest away from the chosen CL are reported.
> In every bisect result listed in this bug that change is greater than the difference at the blamed CL.

the bisect in #20 says 200 KB
the bisect in #23 says 263 KB

which is aligned with the results of the waterfall in https://chromeperf.appspot.com/group_report?bug_id=732294

> Also, if no one can explain why C++-only changes are affecting java heap numbers that seems pretty bad for the reliability of those numbers.
I would love to have an answer straight away, unfortunately I don't. I have zero context about your code and what it does. 

>Also, if no one can explain why C++-only changes are affecting java heap numbers that seems pretty bad for the reliability of those numbers.
Agreed; that's why I moved the java heap metric out of this bug.
But we are still left with the native_heap/malloc regressions up to 500KB and the discussion on this bug is all about this.

> When the LR review was conducted did they know this would be regressing memory?
+1 on the question (specifically, did they know how much?).

I managed to find the launch bug (Issue 707602) and realized that there is a finch trial for this.
Awesome, so I looked into the Finch memory histograms for that:

Android:
https://uma.googleplex.com/p/chrome/variations/?sid=c44ae7b26a77bf0072bd033013309487
median: +200 KB
95th: +500 KB

Windows:
https://uma.googleplex.com/p/chrome/variations/?sid=afb6d353a32768815776ef6e596db95a
median: +300 KB 
95th percentile: +3.1 MB

Linux:
https://uma.googleplex.com/p/chrome/variations/?sid=79e472f5327aeb449865c826c6741245
median: +7.3 MB
95th percentile: +157 MB

Now this is interesting because those metrics are heartbeat metrics, so there should be Chirp alerts associated with them.

...digging more...

There have been actual chirp alerts since April

https://groups.google.com/a/google.com/forum/#!searchin/finch-chirp/prefservice$20memory%7Csort:relevance
Looks like both sammc@ and tibell@ were cc'd on those alerts.

So it seems that 2 months before our labs got to this, the authors had the finch data telling them about multi-MB impact of this.

benhenry@: At this point the only question left is: how does launch review work? It seems that all the pieces of the pipeline worked well here.
Actually, looking more at the new UMA metrics [1], which also count swap, the impact on Linux is +1.9 GB at the 95th percentile (233 MB in the browser process, 10 MB per renderer)
https://uma.googleplex.com/p/chrome/variations/?sid=d453bc33c91e0ad1cc7356634b40009a

and 490 MB for Mac OSX @ 95th percentile:
https://uma.googleplex.com/p/chrome/variations/?sid=4ae34f1744a6bea57907cd263f7eb473

[1] context: https://groups.google.com/a/google.com/forum/#!topic/chrome-memory/WorRmlzoN7o

Comment 35 by jam@chromium.org, Jun 28 2017

Cc: tibell@chromium.org
@sammc @tibell: it would be helpful to have a document explaining the regression and alternatives to lowering the memory usage, something like what the mojoloading team did: https://docs.google.com/document/d/1wissv31WUjeHaSaf78uAH0FiFQfSU6sqv14Lcuo1toA
Project Member

Comment 36 by bugdroid1@chromium.org, Jun 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5e829660f4e678f3fd1edc36b6a6785de4714bb4

commit 5e829660f4e678f3fd1edc36b6a6785de4714bb4
Author: Primiano Tucci <primiano@chromium.org>
Date: Wed Jun 28 17:07:53 2017

Revert "Enable PrefService by default."

This reverts commit cd0c21dfb32b6235c5c4bed740dac090b4e25bae.

Reason for revert:
The CL caused perf regressions ( crbug.com/732294 ).
No immediate plan to address the problem showed up.
Reverting in accordance with perf sheriffing rules in
https://chromium.googlesource.com/chromium/src/+/master/docs/speed/perf_regression_sheriffing.md

Original change's description:
> Enable PrefService by default.
>
> This also changes a DCHECK in RegistryHashStoreContentsWin to be more
> permissive to workaround  crbug.com/727282 , which is triggered more
> reliably with PrefService enabled.
>
> Bug:  654988 , 707602,  727282 
> Change-Id: I39e94dd6d2c7360d8763608a1ac9897f98141902
> Reviewed-on: https://chromium-review.googlesource.com/527959
> Reviewed-by: Johan Tibell <tibell@chromium.org>
> Commit-Queue: Sam McNally <sammc@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#478157}

TBR=sammc@chromium.org,jam@chromium.org

Bug:  732294 ,  654988 , 707602,  727282 
Change-Id: I2636bf309da1f32b76f5c6eff3e10145e769f1a2
Reviewed-on: https://chromium-review.googlesource.com/552142
Commit-Queue: Erik Chen <erikchen@chromium.org>
Reviewed-by: Primiano Tucci <primiano@chromium.org>
Cr-Commit-Position: refs/heads/master@{#483036}
[modify] https://crrev.com/5e829660f4e678f3fd1edc36b6a6785de4714bb4/chrome/common/chrome_features.cc

Thanks for doing a thorough investigation, primiano. I'm confirming that the UMA metrics in c#34 report a 1.9GB increase in memory usage on Linux at the 95th percentile. 

I've reverted the CL that enables this by default:
https://chromium-review.googlesource.com/c/552142/

I think we need either an informal or formal post-mortem on this, given that we almost launched a massive performance regression; had primiano@ not been super diligent [looking at WontFix-ed performance regressions], this would have launched with no one the wiser.
benhenry@, can you review what happened [both in L-R, and this thread, chirp alerts, etc.] and put together a small analysis with next steps for how to improve this process?
I started a doc, now I have a bunch of meetings.

go/sr-pm-8
A friendly reminder that the M61 branch is coming soon on 07/20! Your bug is labelled ReleaseBlock-Stable; please make sure to land the fix to trunk ASAP. This way we branch M61 from a high-quality trunk. Thank you.
Labels: -ReleaseBlock-Stable
I think we can remove the RB since the patch has been reverted.
Status: Fixed (was: Assigned)
