1%-6.5% regression in system_health.memory_mobile at 478081:478217 |
|||||||||||
Issue descriptionSee the link to graphs below.
,
Jun 12 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8977002017683656944
,
Jun 12 2017
=== Auto-CCing suspected CL author sammc@chromium.org === Hi sammc@chromium.org, the bisect results pointed to your CL, please take a look at the results. === BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : Sam McNally Commit : cd0c21dfb32b6235c5c4bed740dac090b4e25bae Date : Fri Jun 09 01:31:36 2017 Subject: Enable PrefService by default. Bisect Details Configuration: android_nexus6_perf_bisect Benchmark : system_health.memory_mobile Metric : memory:chrome:all_processes:reported_by_os:system_memory:java_heap:proportional_resident_size_avg/browse_tools/browse_tools_maps Change : 6.40% | 22340949.3333 -> 23771136.0 Revision Result N chromium@478080 22340949 +- 255708 6 good chromium@478149 22274048 +- 266838 6 good chromium@478154 22297941 +- 303386 6 good chromium@478156 22379520 +- 286061 6 good chromium@478157 23713621 +- 212146 6 bad <-- chromium@478158 23764992 +- 349911 6 bad chromium@478166 23744512 +- 290729 6 bad chromium@478183 23828821 +- 280545 6 bad chromium@478217 23771136 +- 219337 6 bad Please refer to the following doc on diagnosing memory regressions: https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md To Run This Test src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=browse.tools.maps system_health.memory_mobile Debug Info https://chromeperf.appspot.com/buildbucket_job_status/8977002017683656944 Is this bisect wrong? https://chromeperf.appspot.com/bad_bisect?try_job_id=5282316034768896 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Speed>Bisection. Thank you!
,
Jun 12 2017
,
Jun 12 2017
Issue 732483 has been merged into this issue.
,
Jun 12 2017
Issue 732483 has been merged into this issue.
,
Jun 13 2017
Issue 732484 has been merged into this issue.
,
Jun 16 2017
The native heap/malloc memory increases are expected. The Java heap changes don't make much sense since the change was C++ only.
,
Jun 26 2017
Issue 732849 has been merged into this issue.
,
Jun 26 2017
sammc, looks like this bug has been closed on Jun 16 with "he native heap/malloc memory increases are expected". More alerts have been associated to this bug in the meantime. Did you have any discussion about this with our release managers (+benhenry)? was there a performance improvement that justifies the extra memory cost?
,
Jun 26 2017
re-opening b/c of crbug.com/732849 and b/c of comment #10.
,
Jun 26 2017
,
Jun 26 2017
Please add appropriate OSs.
,
Jun 26 2017
Looks like the extra memory increase is 1.3MB. Is that really expected?
,
Jun 26 2017
No, but the large increases are java_heap and pref service is C++-only. Unless I'm missing some hidden meaning in java_heap I don't see how we could have caused that.
,
Jun 27 2017
This issue is marked as a release blocker with no OS labels associated. Please add an appropriate OS label. All release blocking issues should have OS labels associated to it, so that the issue can tracked and promptly verified, once it gets fixed. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jun 27 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8975621716917651616
,
Jun 27 2017
I've kicked off another bisect this time for desktop memory. I'm reasonably confident the bots will find the culprit based on the shape of the graph. Lets see if they find the same patch or not.
,
Jun 27 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8975621582938381200
,
Jun 27 2017
=== BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : Sam McNally Commit : cd0c21dfb32b6235c5c4bed740dac090b4e25bae Date : Fri Jun 09 01:31:36 2017 Subject: Enable PrefService by default. Bisect Details Configuration: winx64ati_perf_bisect Benchmark : memory.desktop Metric : memory:chrome:all_processes:reported_by_chrome:shim_allocated_objects_size_avg/TrivialWebGLPageSharedPageState Change : 3.12% | 9509251.0 -> 9806084.33333 Revision Result N chromium@478108 9509251 +- 87818.4 6 good chromium@478154 9510022 +- 51289.3 6 good chromium@478156 9510600 +- 65855.5 6 good chromium@478157 9775120 +- 86178.8 6 bad <-- chromium@478160 9805899 +- 80062.6 6 bad chromium@478166 9805868 +- 75937.2 6 bad chromium@478177 9802059 +- 51677.4 6 bad chromium@478200 9806084 +- 64232.5 6 bad To Run This Test src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=TrivialWebGLPageSharedPageState memory.desktop Debug Info https://chromeperf.appspot.com/buildbucket_job_status/8975621582938381200 Is this bisect wrong? https://chromeperf.appspot.com/bad_bisect?try_job_id=5247606977986560 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Speed>Bisection. Thank you!
,
Jun 27 2017
OK the bots think it is that patch. The regression is quite large, should we revert?
,
Jun 27 2017
,
Jun 27 2017
=== BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : Sam McNally Commit : cd0c21dfb32b6235c5c4bed740dac090b4e25bae Date : Fri Jun 09 01:31:36 2017 Subject: Enable PrefService by default. Bisect Details Configuration: winx64nvidia_perf_bisect Benchmark : system_health.memory_desktop Metric : memory:chrome:all_processes:reported_by_chrome:malloc:effective_size_avg/load_search/load_search_google Change : 3.76% | 11428035.5556 -> 11858022.7778 Revision Result N chromium@478081 11428036 +- 1203195 9 good chromium@478123 11311448 +- 61739.1 6 good chromium@478144 11295788 +- 72381.5 6 good chromium@478155 11441713 +- 1214693 9 good chromium@478156 11444093 +- 1191470 9 good chromium@478157 11707865 +- 1260066 9 bad <-- chromium@478158 11715948 +- 1241436 9 bad chromium@478160 11723741 +- 1209891 9 bad chromium@478165 11858023 +- 1619732 9 bad Please refer to the following doc on diagnosing memory regressions: https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md To Run This Test src/tools/perf/run_benchmark -v --browser=release_x64 --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=load.search.google system_health.memory_desktop Debug Info https://chromeperf.appspot.com/buildbucket_job_status/8975621716917651616 Is this bisect wrong? https://chromeperf.appspot.com/bad_bisect?try_job_id=5641042893733888 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Speed>Bisection. Thank you!
,
Jun 27 2017
Sam - can you please investigate? Primiano - can you think of ways C++ would change java_heap or help us understand what we mean by java_heap in this context?
,
Jun 27 2017
Given that we have clear bisects on multiple OSes, please revert the CL in question.
,
Jun 28 2017
It's still a few hundred KiB (which is quite challenging to see thanks to the lack of digit separators), which is still expected. Further, I find the choice of earliest value and latest value rather than the values immediately before and after the change pretty questionable. Using the values from immediately before and after results in a change of 2.3% rather than 3.76%.
,
Jun 28 2017
I did split the java_heap regression in a separate bug ( Issue 737498 ). It is possible that some other nearby CL in the same range caused this. As a metric owner, I am personally fine ignoring the java heap regression here if we don't find a plausible explanation: the Java Runtime can have some odd time-related behaviors, we had some occasional cases like this in the past. The problem is that there is, instead, a definite regression in native code, affecting Windows, Mac and Android, up to ~500 KB, as reported by https://chromeperf.appspot.com/group_report?bug_id=732294 (linked in #0) and as confirmed by the bisects above. > No, but the large increases are java_heap and pref service is C++-only. Unless I'm missing some hidden meaning in java_heap I don't see how we could have caused that. The original bug reports regressions both in native code and java_heap. I get the point about java being puzzling, but from what I understand your line of reasoning here is: "I find the java heap regression extremely unlikely to be caused by this CL. Hence I am going to completely ignore this alert and WontFix the bug without even letting anybody know about this". Other peers in the same situation at least dropped an email or reassigned the bug saying "Hey, this is weird, I'm doing a C++ change but there is also a java heap regression here. Can somebody take a look?" > It's still a few hundred KiB (which is quite challenging to see thanks to the lack of digit separators), which is still expected. Can you define "expected"? In #10 I asked > Did you have any discussion about this with our release managers (+benhenry)? was there a performance improvement that justifies the extra memory cost?" and haven't seen a reply to that yet. Please, try to understand the situation. Our project sees ~250 CLs per day (without counting subprojects like v8, skia etc). If even 0.4% of those (1 CL out of 250) "expects" a ~500 KB regressions, our footprint would increase by 182 MB per year, which is far from being desirable. We had a number of multi-quarter projects to save some MB of memory (e.g., the v8 people completely rewrote their compiler pipeline w/ ignition, see [1] for a list of some recent work). What is missing here is a discussion of the form: "here's why I think that it is fine to cause a ~500KB regression, equivalent to some weeks of work of 3 eng in the v8 team" At this point given that: - This bug has been assigned on Jun 12, four different perf sheriffs (erikchen, alexclarke, chcunningham and myself) found alerts that were duped on this bug on this bug. - Our perf sheriffing rules [1] clearly say "If you are certain that a specific CL caused a performance regression, and the author does not have an immediate plan to address the problem, please revert the CL." - From the discussion there doesn't seem to be any plan to address this. - The CL itself is just a flag, so revering won't cause rebase pain to the author. I am going to revert this CL https://chromium-review.googlesource.com/c/552142/ . Please don't take it personally and understand the situation and the scaling problem. See [3] for docs on how to reproduce the benchmarks locally and feel free to reach out if you need assistance to debug the perf regression. [1] https://chromium.googlesource.com/chromium/src.git/+/master/docs/memory/achievements.md [2] https://chromium.googlesource.com/chromium/src/+/master/docs/speed/perf_regression_sheriffing.md [3] https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md
,
Jun 28 2017
> I am going to revert this CL https://chromium-review.googlesource.com/c/552142/ . Well looks like sammc@ NOT-LGTM-d and unticked the revert https://chromium-review.googlesource.com/c/552142/ . moving this to benhenry@ +parisa@ FYI
,
Jun 28 2017
,
Jun 28 2017
Well, it did get approved for launch in M60 (issue 707602). It's an extra copy of prefs in-memory to enable servicification of prefs users. There are no plans to change the way it works; the fun alternatives are inserting blocking IPC for every pref access, shared memory and interprocess locking or not servicifying anything that depends on prefs. Again, I would like an explanation for why the measurements furthest away from the chosen CL are reported. In every bisect result listed in this bug that change is greater than the difference at the blamed CL. Also, if no one can explain why C++-only changes are affecting java heap numbers that seems pretty bad for the reliability of those numbers.
,
Jun 28 2017
When the LR review was conducted did they know this would be regressing memory?
,
Jun 28 2017
> Again, I would like an explanation for why the measurements furthest away from the chosen CL are reported. > In every bisect result listed in this bug that change is greater than the difference at the blamed CL. the bisect in #20 says 200 KB the bisect in #23 says 263 KB which is aligned with the results of the waterfall in https://chromeperf.appspot.com/group_report?bug_id=732294 > Also, if no one can explain why C++-only changes are affecting java heap numbers that seems pretty bad for the reliability of those numbers. I would love to have an answer straight away, unfortunately I don't. I have zero context about your code and what it does. >Also, if no one can explain why C++-only changes are affecting java heap numbers that seems pretty bad for the reliability of those numbers. agrees. that's why I moved the java heap metric out of this bug. But we are still left with the native_heap/malloc regressions up to 500KB and the discussion on this bug is all about this. > When the LR review was conducted did they know this would be regressing memory? +1 on the question (specifically, did they know how much?).
,
Jun 28 2017
I managed to find the launch bug (Issue 707602) and realized that there is a finch trial for this. Awesome, so I looked into the Finch memory histograms for that: Android: https://uma.googleplex.com/p/chrome/variations/?sid=c44ae7b26a77bf0072bd033013309487 median: +200 KB 95th: +500 KB Windows: https://uma.googleplex.com/p/chrome/variations/?sid=afb6d353a32768815776ef6e596db95a median: +300 KB 95th percentile: +3.1 MB Linux: https://uma.googleplex.com/p/chrome/variations/?sid=79e472f5327aeb449865c826c6741245 median: +7.3 MB 95th percentile: +157 MB Now this is interesting because those metrics are heartbeat metrics, so there should be Chirp alerts associated with them. ...digging more... There have been actual chirp alerts since April https://groups.google.com/a/google.com/forum/#!searchin/finch-chirp/prefservice$20memory%7Csort:relevance looks like both sammc@ and tibell@ were cc-ed to those alerts. So it seems that 2 months before our labs got to this, the authors had the finch data telling them about multi-MB impact of this. benhenry@: At this point the only question left is: how does launch review work? It seems that all the pieces pipeline worked well here.
,
Jun 28 2017
Actually by looking more at the new UMA metrics [1] which count also swap, the impact on Linux is +1.9 GB on 95th percentile (233 MB on the browser process, 10 MB per renderer) https://uma.googleplex.com/p/chrome/variations/?sid=d453bc33c91e0ad1cc7356634b40009a and 490 MB for Mac OSX @ 95th percentile: https://uma.googleplex.com/p/chrome/variations/?sid=4ae34f1744a6bea57907cd263f7eb473 [1] context: https://groups.google.com/a/google.com/forum/#!topic/chrome-memory/WorRmlzoN7o
,
Jun 28 2017
@sammc @tibell: it would be helpful to have a document explaining the regression and alternatives to lowering the memory usage, something like what the mojoloading team did: https://docs.google.com/document/d/1wissv31WUjeHaSaf78uAH0FiFQfSU6sqv14Lcuo1toA
,
Jun 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5e829660f4e678f3fd1edc36b6a6785de4714bb4 commit 5e829660f4e678f3fd1edc36b6a6785de4714bb4 Author: Primiano Tucci <primiano@chromium.org> Date: Wed Jun 28 17:07:53 2017 Revert "Enable PrefService by default." This reverts commit cd0c21dfb32b6235c5c4bed740dac090b4e25bae. Reason for revert: The CL caused perf regressions ( crbug.com/732294 ). No immediate plan to address the problem showed up. Reverting in accordance with perf sheriffing rules in https://chromium.googlesource.com/chromium/src/+/master/docs/speed/perf_regression_sheriffing.md Original change's description: > Enable PrefService by default. > > This also changes a DCHECK in RegistryHashStoreContentsWin to be more > permissive to workaround crbug.com/727282 , which is triggered more > reliably with PrefService enabled. > > Bug: 654988 , 707602, 727282 > Change-Id: I39e94dd6d2c7360d8763608a1ac9897f98141902 > Reviewed-on: https://chromium-review.googlesource.com/527959 > Reviewed-by: Johan Tibell <tibell@chromium.org> > Commit-Queue: Sam McNally <sammc@chromium.org> > Cr-Commit-Position: refs/heads/master@{#478157} TBR=sammc@chromium.org,jam@chromium.org Bug: 732294 , 654988 , 707602, 727282 Change-Id: I2636bf309da1f32b76f5c6eff3e10145e769f1a2 Reviewed-on: https://chromium-review.googlesource.com/552142 Commit-Queue: Erik Chen <erikchen@chromium.org> Reviewed-by: Primiano Tucci <primiano@chromium.org> Cr-Commit-Position: refs/heads/master@{#483036} [modify] https://crrev.com/5e829660f4e678f3fd1edc36b6a6785de4714bb4/chrome/common/chrome_features.cc
,
Jun 28 2017
Thanks for doing a thorough investigation, primiano. I'm confirming that the UMA metrics in c#34 report a 1.9GB increase in memory usage on Linux at the 95th percentile. I've reverted the CL that enables this by default: https://chromium-review.googlesource.com/c/552142/ I think we need either an informal or formal post-mortem on this, given that we almost launched a massive performance regression, and had primiano@ not been super diligent [looking at WontFix-ed performance regressions], this would have launched without anyone else the wiser.
,
Jun 28 2017
benhenry@, can you review what happened [both in L-R, and this thread, chirp alerts, etc.] and put together a small analysis with next steps for how to improve this process?
,
Jun 28 2017
I started a doc, now I have a bunch of meetings. go/sr-pm-8
,
Jul 11 2017
A friendly reminder that M61 branch is coming soon on 07/20! Your bug is labelled as Stable ReleaseBlock, pls make sure to land the fix ASAP to trunk. This way we branch M61 from a high quality trunk. Thank you.
,
Jul 12 2017
I think we can remove the RB since the patch has been reverted.
,
Aug 3 2017
|
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by alexclarke@chromium.org
, Jun 12 2017