
Issue 778620


Issue metadata

Status: WontFix
Owner:
Closed: Jan 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression




168.4% regression in system_health.memory_mobile at 510791:510889

Project Member Reported by kraynov@chromium.org, Oct 26 2017

Issue description

See the link to graphs below.
 
Comment 1 by 42576172...@developer.gserviceaccount.com (Project Member), Oct 26 2017

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=778620

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=9d2c827e59bc1befdd4ba2c8390dd6e5cf07f11d68dc50de7084e6d0a8afc10e


Bot(s) for this bug's original alert(s):

android-nexus5X
Comment 3 by 42576172...@developer.gserviceaccount.com (Project Member), Oct 26 2017

Cc: shivanisha@chromium.org
Owner: shivanisha@chromium.org
Status: Assigned (was: Untriaged)

=== Auto-CCing suspected CL author shivanisha@chromium.org ===

Hi shivanisha@chromium.org, the bisect results pointed to your CL; please take a look at the results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Shivani Sharma
  Commit : c18f976cdcb74086ce6bab2d2574718166c53517
  Date   : Mon Oct 23 16:43:23 2017
  Subject: Integrate HttpCache::Writers with HttpCache and HttpCache::Transaction layers.

Bisect Details
  Configuration: android_nexus5X_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:chrome:all_processes:reported_by_chrome:gpu:effective_size_avg/long_running_tools/long_running_tools_gmail-foreground
  Change       : 168.00% | 1248312.0 -> 3345464.0

Revision             Result                  N
chromium@510790      1248312 +- 0.0          6      good
chromium@510815      1248312 +- 0.0          9      good
chromium@510817      1481329 +- 1977214      9      good
chromium@510818      1248312 +- 0.0          6      good
chromium@510819      3112447 +- 1977214      9      bad       <--
chromium@510822      3345464 +- 0.0          6      bad
chromium@510828      3345464 +- 0.0          6      bad
chromium@510840      3345464 +- 0.0          6      bad
chromium@510889      3345464 +- 0.0          6      bad
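
As a quick sanity check (not part of the bot output), the 168% figure in the Change line follows directly from the before/after averages in the table above:

  # Verify the bisect's "Change" line from the values reported above.
  before = 1248312.0   # bytes, chromium@510790..510818
  after = 3345464.0    # bytes, chromium@510819 onwards
  delta = after - before
  print("delta  = %.0f bytes = %.2f MiB" % (delta, delta / 2**20))   # 2097152 bytes = 2.00 MiB
  print("change = %.2f%%" % (100.0 * delta / before))                # 168.00%, as reported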

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=long.running.tools.gmail.foreground system_health.memory_mobile
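
(Side note, not from the bot output: to my understanding Telemetry treats --story-filter as a regular expression, which is why the dotted pattern above matches the underscored/hyphenated story name from the metric path. A quick check:)

  import re

  # --story-filter is, to my understanding, interpreted as a regular expression,
  # so the dots in the pattern act as wildcards matching '_' and '-' in the story name.
  story_filter = "long.running.tools.gmail.foreground"   # value passed above
  story_name = "long_running_tools_gmail-foreground"     # story from the metric path
  print(bool(re.search(story_filter, story_name)))       # True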

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8964676780520069536


For feedback, file a bug with component Speed>Bisection
shivanisha, any update here? The bisect is reproducing a 1 MiB memory regression on Android.
Looking into this.
Cc: perezju@chromium.org
It seems unlikely that this CL is the cause of the regression. The CL is not a functionality change: it only covers the reduced case of at most one transaction writing to the cache, which was already the existing behavior. It does add a new HttpCache::Writers object for every URLRequest, but that is a lightweight object and shouldn't be the root cause.
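
(To put rough, illustrative numbers on that argument; the per-object size and request count below are assumptions, not measurements. Even a generous estimate of per-URLRequest bookkeeping overhead is far below the ~2 MiB seen in the metric:)

  # Back-of-envelope estimate using assumed (not measured) numbers.
  assumed_object_size_bytes = 256   # guess for a lightweight per-request object such as HttpCache::Writers
  assumed_live_requests = 100       # guess for simultaneously live URLRequests
  overhead_kib = assumed_object_size_bytes * assumed_live_requests / 1024.0
  print("~%.0f KiB total" % overhead_kib)   # ~25 KiB, versus the ~2 MiB regression in GPU memory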

=== Auto-CCing suspected CL author shivanisha@chromium.org ===

Hi shivanisha@chromium.org, the bisect results pointed to your CL; please take a look at the results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Shivani Sharma
  Commit : c18f976cdcb74086ce6bab2d2574718166c53517
  Date   : Mon Oct 23 16:43:23 2017
  Subject: Integrate HttpCache::Writers with HttpCache and HttpCache::Transaction layers.

Bisect Details
  Configuration: android_nexus5X_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:chrome:all_processes:reported_by_chrome:gpu:effective_size_avg/long_running_tools/long_running_tools_gmail-foreground
  Change       : 168.00% | 1248312.0 -> 3345464.0

Revision             Result                  N
chromium@510790      1248312 +- 0.0          6      good
chromium@510815      1248312 +- 0.0          6      good
chromium@510817      1248312 +- 0.0          6      good
chromium@510818      1248312 +- 0.0          6      good
chromium@510819      3345464 +- 0.0          6      bad       <--
chromium@510822      3345464 +- 0.0          6      bad
chromium@510828      2995939 +- 1914429      6      bad
chromium@510840      2995939 +- 1914429      6      bad
chromium@510889      3345464 +- 0.0          6      bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=long.running.tools.gmail.foreground system_health.memory_mobile

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8958199865643720528


For feedback, file a bug with component Speed>Bisection
Those are two rock-solid bisects pointing to the same CL.

I would recommend looking at and comparing a couple of traces:

before:
https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/long_running_tools_gmail_foreground_2017-10-23_10-47-57_36721.html

after:
https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/long_running_tools_gmail_foreground_2017-10-23_21-07-24_69612.html

In the renderer process, looking at the last memory dump, the "after" GPU memory shows an extra 2 MiB.

Maybe those traces can help figure out the source of the regression?
Cc: primiano@chromium.org erikc...@chromium.org
I compared the "before" and "after" runs, looking at the renderer process's GPU memory. In "after" it shows 2.1 MiB from the very first memory dump through the last, so it's not a gradual increase (which would suggest a leak) or an abrupt one.
Moreover, the CL pointed to by the bisect is not a functionality change, and all of its changes reside entirely in the browser process, so I'm not sure how it could be related.

Adding erikchen and primiano as per perezju@'s suggestion.
erikchen@, primiano@, could you PTAL to help debug this further? Thanks!

Cc: ericrk@chromium.org
Attaching before and after screenshots.

ericrk, can you help with interpretation? Does "free_size" mean that the "real" increase is 0?
Screen Shot 2018-01-08 at 5.08.52 PM.png (81.2 KB)
Screen Shot 2018-01-08 at 5.08.49 PM.png (86.4 KB)
"Free size" in this case just indicates that the buffer is available for immediate use, not that it's not using up space, so I think this is a real regression.

Looking at this, it appears that something has caused us to create a media or raster command buffer where previously we didn't have one. Things that might cause this include longer load times resulting in a spinner / progress bar being displayed, etc., so maybe something not directly related to the GPU could have tripped this.

On the other hand, this could be a timing issue. Do you know at what point these memory dumps are taken? Is it after a low-memory signal, or just at periodic intervals? I might suspect a timing change that causes us to grab a dump at a different point than before; however, the consistency with which this extra buffer appears makes me doubt that a bit.


> Do you know at what point these memory dumps are taken?

At periodic intervals. On the "after" trace, all memory dumps (from the very first up to the last one) show the regressed value of ~2.1 MiB.
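
(For context: these periodic dumps are requested via the memory_dump_config section of the tracing config, as described in the memory-infra doc linked above. A rough sketch of what such a config looks like is below; the exact keys and intervals are from memory and may differ between Chrome versions and benchmarks.)

  # Rough sketch of a tracing config requesting periodic memory dumps, written as a
  # Python dict mirroring the JSON trace config described in the memory-infra docs.
  # Key names and intervals are approximate and may not match this benchmark exactly.
  trace_config = {
      "included_categories": ["disabled-by-default-memory-infra"],
      "memory_dump_config": {
          "triggers": [
              {"mode": "light", "periodic_interval_ms": 250},
              {"mode": "detailed", "periodic_interval_ms": 2000},
          ],
      },
  }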
Status: WontFix (was: Assigned)
Marking as WontFix since the memory regression cannot be reproduced locally on an Android device (as per comment #15).
Status: Assigned (was: WontFix)
I don't think being unable to reproduce locally is a reason for WontFix-ing. Both dashboards and bisects *are* reproducing a pretty clear and large regression.

Which kind of device did you use? Note the regression was detected on an N5X.

Also have a look at go/telemetry-device-setup to set up your device as similarly as possible to the bots on the waterfall.

Alternatively, if you don't have access to the right kind of device, you can run a try-job.
Status: WontFix (was: Assigned)
Hi perezju,
I spoke with shivanisha during this process. Her change is unrelated to the GPU stack; it's a network-stack change. If you look at the traces she uploaded from her local runs, both of them have ~2.1 MB of GPU buffer in the renderer process, which matches the GPU buffer size in the renderer process on the Nexus 5X post-change. This suggests that this is a timing issue related to how network requests get loaded by WPR on the Nexus 5X device. Looking at the Nexus 5X trace pre-change, there's only 0.1 MB reported by the GPU MDP in the renderer process. This is quite suspicious, since we'd expect the command buffer plus some textures to occupy much more than that; again, that points at loading/timing issues.

While it would be possible to debug this issue further using try-jobs, the marginal utility of doing so for what appears to be an unrelated change is low. In general, we don't have good mechanisms for noticing/debugging subtle loading/timing changes in WPR pages.
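
(A quick consistency check of the numbers above, not from the original thread: the renderer GPU MDP going from ~0.1 MiB pre-change to ~2.1 MiB post-change accounts for essentially the entire metric delta seen by the bisect.)

  # Reconcile the per-process GPU MDP readings quoted above with the bisect's metric change.
  mdp_before_mib = 0.1                                # renderer GPU memory, N5X pre-change
  mdp_after_mib = 2.1                                 # renderer GPU memory, N5X post-change
  metric_delta_mib = (3345464 - 1248312) / 2.0**20    # from the bisect results above
  print("MDP delta    ~ %.1f MiB" % (mdp_after_mib - mdp_before_mib))   # ~2.0 MiB
  print("metric delta = %.2f MiB" % metric_delta_mib)                   # 2.00 MiB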
That sounds good to me. Thanks for the detailed explanation.
