2.1%-16.9% regression in system_health.memory_mobile at 451826:452905
Issue description: See the link to graphs below.
Mar 2 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8986217139123905408
Mar 2 2017
=== Auto-CCing suspected CL author yhirano@chromium.org ===
Hi yhirano@chromium.org, the bisect results pointed to your CL, please take a look at the results.

=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : yhirano
  Commit : 865407349e3755b33d6db5642501f88c8fb7784d
  Date   : Fri Feb 24 14:43:30 2017
  Subject: Do not reuse a loading resource associated with another fetcher

Bisect Details
  Configuration: android_webview_nexus6_aosp_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:webview:all_processes:reported_by_chrome:v8:effective_size_avg/load_social/load_social_tumblr
  Change       : 16.62% | 62133163.1111 -> 72477434.6667

  Revision         Result                 N
  chromium@452819  62133163 +- 927194     9  good
  chromium@452822  62238464 +- 1096079    6  good
  chromium@452823  72572268 +- 3863796    6  bad  <--
  chromium@452824  72618091 +- 2360980    6  bad
  chromium@452828  73214161 +- 1970314    6  bad
  chromium@452838  71380713 +- 13016875   9  bad
  chromium@452857  72477435 +- 2093261    6  bad

Please refer to the following doc on diagnosing memory regressions:
https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-webview --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=load.social.tumblr system_health.memory_mobile

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986217139123905408

Is this bisect wrong? https://chromeperf.appspot.com/bad_bisect?try_job_id=5322701910048768

Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq for more information on addressing perf regression bugs. For feedback, file a bug with component Speed>Bisection. Thank you!
Mar 3 2017
Who is the metrics owner? This is a security fix and hence I don't want to revert it.
Mar 3 2017
Adding perezju, primiano as test/metric owners. This is a massive performance regression; some of the regressions are over 10 MiB.
Mar 3 2017
yhirano, can you run a try-job with a revert of your CL? It should help assess the impact it has across all system health stories. Instructions can be found in the relevant link in #3.
Mar 8 2017
Taking this to investigate the results in #9 and try to make a call on the overall effects of this CL. (Also slightly blocked due to go/catabug/3344)
Mar 8 2017
To provide more context on this: we were sharing blink::Resources where we shouldn't have, which led to a security bug. The CL fixes the issue by forcing Blink not to share blink::Resources in those cases. So now we are in a dilemma: the new code is secure, but we use more memory, since we are no longer sharing those blink::Resources.
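For illustration only, here is a minimal, self-contained sketch of the kind of guard the CL adds (all names are hypothetical, not the actual Blink code): a cached resource that is still loading on behalf of another fetcher is no longer handed out, and a fresh resource is created instead, which is where the extra memory comes from.

#include <map>
#include <memory>
#include <string>

// Hypothetical sketch, not the real Blink code.
struct Resource {
  std::string url;
  bool loading = false;
  const void* owning_fetcher = nullptr;  // fetcher that started the load
};

class ResourceCacheSketch {
 public:
  // Returns a shared resource when sharing is safe, otherwise a new one.
  std::shared_ptr<Resource> GetOrCreate(const std::string& url,
                                        const void* fetcher) {
    auto it = cache_.find(url);
    if (it != cache_.end()) {
      std::shared_ptr<Resource> cached = it->second;
      // The guard: never reuse a resource that another fetcher is still
      // loading. This avoids the cross-fetcher sharing behind the security
      // bug, at the cost of holding the same bytes twice (the regression).
      if (!(cached->loading && cached->owning_fetcher != fetcher))
        return cached;
    }
    auto fresh = std::make_shared<Resource>();
    fresh->url = url;
    fresh->loading = true;
    fresh->owning_fetcher = fetcher;
    cache_[url] = fresh;
    return fresh;
  }

 private:
  std::map<std::string, std::shared_ptr<Resource>> cache_;
};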
Mar 14 2017
Quick update here, I had a look at this with Primiano today.
- I've split off to issue 701376 some gpu regressions that were grouped here; at this point it's unclear whether they have the same cause or not.
- On the remaining v8 regressions, the regressions appear to recover after r455720 (switching on Ignition). However, on other pages that did _not_ regress on this alert, Ignition brings even more memory savings: https://chromeperf.appspot.com/report?sid=41b49f4c1f4d8753a76f29a1f1d9b374c284b0e7297288a73878bb9dcc35ebda

So even if things look "OK" now, it seems that r452823 is, on some pages, eating into the savings that Ignition would otherwise yield.

yhirano, could we ask you to re-run your tryjob to get a new set of results now that Ignition is on? Also, do we know why keeping these blink::Resources could be causing a regression in v8?
Mar 14 2017
I took a look with perezju here. Here's a set of things we should check:
- First of all, as perezju@ said in #14, there are two different jumps involved. Let's check whether the GPU memory regression is also due to this CL (the bisect he kicked off should answer this soon).
- The v8 regression instead seems real. If you look at the timeline the alert seems recovered, but that is just due to the fact that we shipped Ignition+TurboFan and the memory saving there is comparable.

So essentially what I am saying is that, on the page affected by this bug, this CL is completely obliterating the effects of a team-wide year-long engineering effort (V8 Ignition+TF). As such, my question is: do we understand why retaining a resource in Blink is causing a regression in v8? It doesn't seem directly obvious to me, and at the very least we should come up with an explanation of why this change is causing this effect in v8. I am definitely up for having the "let's trade off security vs. memory" discussion, but before that I want to make sure that this CL is really intended to retain this much memory in v8, and that this is not an accidental side effect we are not thinking about.
Mar 15 2017
> #14: Done (https://codereview.chromium.org/2736623002/#ps20001), but the command doesn't accept android-nexus5. I specified 'all' instead, but android-s5 doesn't seem to be working.
Mar 15 2017
More updates: the bisect in issue 701376 came back and was duped to this issue. Those are some 4-6 MiB regressions in gpu that have not recovered since. +simonhatch: are run_benchmark tryjobs working now?
Mar 15 2017
I filed issue 701697 to fix the regression.
Mar 15 2017
(Note that I don't have an idea of how to fix the regression at the moment.)
Mar 15 2017
Re #c17: Yep, they should be working again.
Mar 17 2017
I succeeded in requesting android_nexus5_perf_bisect jobs, but they fail :(
Mar 17 2017
Sorry about that, I've filed issue 702555 to get that fixed.
Mar 17 2017
Re #22: android_nexus5_perf_bisect should hopefully be back now.
Mar 21 2017
From the results: https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/html-results/results-2017-03-20_23-54-27

We still see that the revert would recover some 3.5 MiB in load:social:tumblr. The largest regression appears here, so I would suggest focusing on load:social:tumblr for the follow-up investigations.

Here are a couple of traces from that run:
(ToT) https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/trace-file-id_42-2017-03-20_23-51-12-48554.html
(with revert) https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/trace-file-id_42-2017-03-20_20-58-47-64309.html

I would suggest comparing those two to drill down into the memory measurements and try to figure out the source of the regression.
Mar 22 2017
Thanks!

> #15: Based on local observations, Tumblr has some script resources affected by this change. Such resources were shared but are not shared any more. According to hajimehoshi@, ScriptResource's data should be counted in web_cache, not in v8, so I don't have an answer to your question. cc-ing hajimehoshi@.
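To illustrate the accounting point, here is a minimal sketch with hypothetical names (not the actual Blink dump provider): under memory-infra, the component whose OnMemoryDump reports a buffer determines which bucket it appears in, so a ScriptResource body held by the resource cache would show up under web_cache rather than v8.

#include <cstdint>
#include "base/trace_event/memory_allocator_dump.h"
#include "base/trace_event/memory_dump_provider.h"
#include "base/trace_event/process_memory_dump.h"

// Hypothetical sketch: shows how bytes held by the resource cache would be
// attributed to the "web_cache" bucket in a memory-infra dump.
class WebCacheDumpProviderSketch
    : public base::trace_event::MemoryDumpProvider {
 public:
  bool OnMemoryDump(const base::trace_event::MemoryDumpArgs& args,
                    base::trace_event::ProcessMemoryDump* pmd) override {
    // Duplicated ScriptResource bodies would grow this allocator dump,
    // not anything under "v8/...".
    auto* dump = pmd->CreateAllocatorDump("web_cache/script_resources");
    dump->AddScalar(base::trace_event::MemoryAllocatorDump::kNameSize,
                    base::trace_event::MemoryAllocatorDump::kUnitsBytes,
                    script_resource_bytes_);
    return true;
  }

 private:
  uint64_t script_resource_bytes_ = 0;  // illustrative only
};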
Sep 14 2017
Any update on this?
Oct 3 2017
Oops, sorry, I mis-merged the issue.
Jan 5 2018
yhirano, are you still working on this issue?
Jan 11 2018
Sorry for not updating. At this moment all I can say is that this change is critical for security: Content-Security-Policy can be bypassed without this fix, which leads to script execution in a way the site owner does not expect.
Jan 11 2018
From #30 it sounds like there is a good rationale for the increased memory usage, a trade-off for security in this case. So I don't think anybody is arguing for reverting this change. It would be great if there were a way to keep the security aspects intact while still being able to share a few more resources and use less memory; if that's doable, maybe file a new bug for it. Otherwise I think it's fine to close this, since the trade-off has been justified.
Jan 15 2018
Thanks, let me close this issue.