New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 697910 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Jan 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug-Regression

Blocked on:
issue 702555



Sign in to add a comment

2.1%-16.9% regression in system_health.memory_mobile at 451826:452905

Project Member Reported by sullivan@chromium.org, Mar 2 2017

Issue description

See the link to graphs below.
 
Cc: yhirano@chromium.org
Owner: yhirano@chromium.org

=== Auto-CCing suspected CL author yhirano@chromium.org ===

Hi yhirano@chromium.org, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : yhirano
  Commit : 865407349e3755b33d6db5642501f88c8fb7784d
  Date   : Fri Feb 24 14:43:30 2017
  Subject: Do not reuse a loading resource associated with another fetcher

Bisect Details
  Configuration: android_webview_nexus6_aosp_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:webview:all_processes:reported_by_chrome:v8:effective_size_avg/load_social/load_social_tumblr
  Change       : 16.62% | 62133163.1111 -> 72477434.6667

Revision             Result                    N
chromium@452819      62133163 +- 927194        9      good
chromium@452822      62238464 +- 1096079       6      good
chromium@452823      72572268 +- 3863796       6      bad       <--
chromium@452824      72618091 +- 2360980       6      bad
chromium@452828      73214161 +- 1970314       6      bad
chromium@452838      71380713 +- 13016875      9      bad
chromium@452857      72477435 +- 2093261       6      bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-webview --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=load.social.tumblr system_health.memory_mobile

Debug Info
  https://chromeperf.appspot.com/buildbucket_job_status/8986217139123905408

Is this bisect wrong?
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5322701910048768


| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Speed>Bisection.  Thank you!

Comment 4 Deleted

Who is the metrics owner? This is a security fix and hence I don't want to revert it.

Cc: primiano@chromium.org perezju@chromium.org benhenry@chromium.org hjd@chromium.org
Labels: -Pri-2 Pri-1
Adding perezju, primiano as test/metric owners. This is a massive performance regression, some of the regressions are over 10MiB.
Status: Assigned (was: Untriaged)
yhirano, can you run a try-job with a revert of your CL? It should help assess the impact it has across all system health stories.

Instructions can be found in the relevant link on #3
Owner: perezju@chromium.org
Taking this to investigate the results in #9 and try to make a call on the overall effects of this CL.

(Also slightly blocked due to go/catabug/3344)
To provide more context on this:

We were sharing blink::Resources where we shouldn't have, which lead to a security bug.
The CL fixes the issue by properly forcing Blink to not share blink::Resources in those cases.

So now we are in a dilemma. The new code is now secure. However, we now use more memory, as we are not sharing that blink::Resource anymore.
Cc: kouhei@chromium.org
Labels: Performance-Memory
Quick update here, I had a look at this with Primiano today.

- I've split off to  issue 701376  some gpu regressions that were grouped here, at this point it's unclear if they have the same cause or not.

- On the remaining v8 regressions, the regressions appear to recover after r455720 (switching on ignition). However, on other pages that did _not_ regress on this alert, ignition brings even more memory savings:

https://chromeperf.appspot.com/report?sid=41b49f4c1f4d8753a76f29a1f1d9b374c284b0e7297288a73878bb9dcc35ebda

So even if things look "OK" now, it seems that r452823 is eating up on some pages the savings that ignition would otherwise yield.

yhirano, could we ask you to re-run your tryjob to get a new set of results now after ignition? Also, do we know why keeping these blink::Resources's could be causing a regression in v8?
I took a look with perezju here. Here's a set of things that we should check:
- First of all, as perezju@ said in #14, there are two different jumps associated. Let's check that also the GPU memory regression is due to this CL (the bisect he kicked should answer this soon).
- The v8 regression instead seem real. If you look at the timeline the alert seems recovered, but that is just due to the fact that we shipped Ignition+Turbofan and the memory saving there is comparable. So essentially what I am saying is that, on the page affected by this bug, this CL is completely obliterating the effects of a team-wide year-long engineering effort (V8 Ignition+TF).

As such, my question is: do we understand why retaining a resource in blink is causing a regression in v8? Doesn't seem directly obvious to me and at very least, we should came up with an explanation of why this change is causing this effect in v8.
I am definitely up for having the "let's trade off security vs memory" discussion, but before that I want to make sure that this CL is really intending to retain this much memory in v8, and make sure that this is not an accidental side effect we are not thinking about.
>#14:

Done (https://codereview.chromium.org/2736623002/#ps20001), but the command doesn't accept android-nexus5. I specified 'all' instead, but android-s5 seems not working.
Cc: simonhatch@chromium.org
More updates: The bisect in  issue 701376  came back and was duped to this issue. Those are some 4-6 MiB regressions in gpu that have not recovered ever since.

+simonhatch: are run_benchmark tryjobs working now?
Cc: kenjibaheux@chromium.org
I filed issue 701697 to fix the regression.
(Note that I don't have an idea how to fix the regression at this moment)
re: #c17

Yep they should be working again.
I succeeded to request android_nexus5_perf_bisect but they fail :(
Blockedon: 702555
Sorry about that, I've filed issue 702555 to get that fixed.
Re #22: android_nexus5_perf_bisect should hopefully be back now.
From the results:
https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/html-results/results-2017-03-20_23-54-27

We still see that the revert would recover some 3.5 MiB in load:social:tumblr.

The largest regression appears here, so I would suggest to focus on load:social:tumblr for the follow up investigations.

Here are a couple of traces from that run:

(ToT)
https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/trace-file-id_42-2017-03-20_23-51-12-48554.html

(with revert)
https://console.developers.google.com/m/cloudstorage/b/chromium-telemetry/o/trace-file-id_42-2017-03-20_20-58-47-64309.html

I would suggest comparing those two to drill down into the memory measurements and try to figure out the source of the regression.
v8_regression.png
76.4 KB View Download
Owner: yhirano@chromium.org
Cc: hajimehoshi@chromium.org
Thanks!

> #15

Based on local observations, Tumblr has some script resources affected by this change. Such resources were shared but are not shared any more. According to hajimehoshi@, ScriptResource's data should be counted in web_cache, not in v8. So, I don't have an answer for your question.

cc-ing hajimehoshi@.
Any update on this?
Mergedinto: 721238
Status: Duplicate (was: Assigned)
Status: Assigned (was: Duplicate)
Oops, sorry, I mis-merged the issue.
yhirano, are you still working on this issue?
Sorry for not updating.

At this moment all I can say is that this change is critical for security: Content-Security-Policy can be bypassed without this fix, which leads to script execution in a way that the site owner is not expecting.
From #30 it sounds like there is a good rationale for the increased memory usage, a trade off for security in this case.

So I don't think anybody is arguing for reverting this change.

It would be great if there was a way to keep the security aspects intact, while still being able to share a few more resources and use less memory. If that's doable maybe file a new bug for that.

Otherwise I think it's fine to close this since the trade off has been justified.
Status: WontFix (was: Assigned)
Thanks, let me close this issue.

Sign in to add a comment