
Issue 905172 link

Starred by 4 users

Issue metadata

Status: WontFix
Owner:
Closed: Jan 11
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression

Blocked on:
issue 906547




56.3% regression in system_health.memory_desktop at 607067:607096

Project Member Reported by chiniforooshan@chromium.org, Nov 14

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=905172

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=6901d3188f2bb22cd70f0dcb2d7de9bb3818bb50d695f9f90f578a19fc39023f


Bot(s) for this bug's original alert(s):

mac-10_13_laptop_high_end-perf

system_health.memory_desktop - Benchmark documentation link:
  https://bit.ly/system-health-benchmarks
Cc: khushals...@chromium.org
Owner: khushals...@chromium.org
Status: Assigned (was: Untriaged)
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/112647cfe40000

cc/gpu: Use same budget for transfer cache and discardable textures. by khushalsagar@chromium.org
https://chromium.googlesource.com/chromium/src/+/3477c6da77a41684e23f9533c6058dc1c64102ea
memory:chrome:all_processes:reported_by_chrome:skia:effective_size: 1.074e+07 → 1.691e+07 (+6.166e+06)

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions

Benchmark documentation link:
  https://bit.ly/system-health-benchmarks
The desktop regressions are just from the cache limit increasing to match GPU raster. The previous limit was 128M and the new limit is 192M; I can see that usage has gone from 98M to 133M.

The confusing one to me is the android-go regression. Since it's a low-end Android device, this change should have reduced its budget from 768K to 512K. I still see an additional 2.7M entry in the transfer cache resulting in the regression. Maybe it's locked? But why would this change affect that? ericrk@, any ideas?
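To make the budget discussion concrete, here is a minimal sketch of a platform-dependent cache budget. This is not actual Chromium code; the function and constant names are made up for illustration, and only the 192M / 768K / 512K figures come from the comments in this thread:

```cpp
#include <cstddef>

// Hypothetical sketch of per-platform transfer cache budgeting.
// Constant and function names are assumptions; the sizes are the
// figures quoted in this bug.
constexpr size_t kDesktopBudget = 192 * 1024 * 1024;  // was 128M before the change
constexpr size_t kAndroidBudget = 768 * 1024;
constexpr size_t kLowEndAndroidBudget = 512 * 1024;   // was 768K before the change

size_t TransferCacheBudget(bool is_desktop, bool is_low_end_device) {
  if (is_desktop)
    return kDesktopBudget;
  return is_low_end_device ? kLowEndAndroidBudget : kAndroidBudget;
}
```

Under this model, desktop usage is allowed to grow toward the new 192M limit, while low-end Android should see its budget shrink, which is why the android-go regression was surprising.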


Cc: ericrk@chromium.org
+Eric for realz.
I'm going to do another bisect for the android regressions. The desktop ones make sense: the cache limit has been increased to match what GPU raster was using, and all memory increases are within the new limit. But I would expect this change to be a no-op for Android. The cache budget should be the same for normal Android, and a little lower for low end.
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/12f5e100140000

HTTP status code 401: Login Required

😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/1224a3cfe40000

HTTP status code 401: Login Required

😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/10366517e40000

All of the runs failed. The most common error (10/20 runs) was:
HTTPException: HTTP status code 400: {"error": {"message": "CIPD package path is required. Use \".\" to install to run dir."}}
Cc: sullivan@chromium.org nednguyen@chromium.org
Components: Speed>Benchmarks>Waterfall
+sullivan, ned
All my bisects seem to be failing. Is there an ongoing issue with the perfbots?
Cc: dtu@chromium.org st...@chromium.org
Components: Speed>Bisection
Dave, Shoutao for #13
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/1065ee50140000

All of the runs failed. The most common error (10/20 runs) was:
HTTPException: HTTP status code 400: {"error": {"message": "CIPD package path is required. Use \".\" to install to run dir."}}
Blockedon: 906547
I couldn't repro this locally on a 5x with my change reverted, and I verified that the cache budget, which is the only thing this change could affect, is the same for Android. I'm going to wait for a bisect now before looking into this further.
Cc: jbudorick@chromium.org
📍 Found significant differences after each of 2 commits.
https://pinpoint-dot-chromeperf.appspot.com/job/13672eb7e40000

Build internal android targets in telemetry_chrome_test. by jbudorick@chromium.org
https://chromium.googlesource.com/chromium/src/+/e0c7bc0e869c70995d8b53d9a22e68294f07c7f2
memory:chrome:all_processes:reported_by_chrome:gpu:effective_size: 6.015e+06 → No values

Revert "Build internal android targets in telemetry_chrome_test." by jbudorick@chromium.org
https://chromium.googlesource.com/chromium/src/+/c7539ef6291836eff22077b38ae55bb62f536b99
memory:chrome:all_processes:reported_by_chrome:gpu:effective_size: No values → 1.314e+07

Sigh, since the perf bots broke while r606139 was there, the bisect couldn't get any numbers for the changes before it was reverted. I have started another job for the range preceding the revert, but with the revert patch applied for the complete job (didn't know this existed, pretty cool). I suppose that should work?
Cc: -nednguyen@chromium.org
I don't know what's going on with pinpoint. The job started 6 days ago and still shows running...
Cc: majidvp@chromium.org
Issue 908091 has been merged into this issue.
Issue 908156 has been merged into this issue.
Owner: ----
Status: Untriaged (was: Assigned)
Ah, I think the Android regressions are explained by: https://chromium-review.googlesource.com/c/1321190. The Android Nexus5x Webview Perf bot did not break with the change mentioned in #22, so it had the correct range for when the regression happened.

The PaintCache makes it so we stop using the TransferCache for high-frequency small items during OOP raster. These items contribute little in size but a lot in count, so they were forcing eviction of large image entries from the TransferCache when we hit the count limit of 2000. That eviction is now gone, since images and these entries no longer share the same cache. You can see this by looking at the traces: the before one has many entries marked "cpu" in the memory dump but the after one does not. I'm going to start a bisect on the Webview to confirm this theory, in which case this is a WontFix.
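The eviction pattern described above can be illustrated with a toy count-limited LRU cache. This is a sketch, not Chromium's ServiceTransferCache; it assumes unique entry ids and exists only to show how many tiny entries can push a few large ones out when eviction is driven by entry count rather than by bytes:

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

// Toy LRU cache that evicts by entry COUNT, not by total size.
// With a 2000-entry cap, a flood of tiny paint items evicts large
// image entries even though the tiny items add little in bytes.
class CountLimitedCache {
 public:
  explicit CountLimitedCache(size_t max_entries) : max_entries_(max_entries) {}

  // Assumes |id| has not been inserted before (kept simple for the sketch).
  void Insert(int id, size_t bytes) {
    lru_.push_front(id);
    entries_[id] = bytes;
    total_bytes_ += bytes;
    while (entries_.size() > max_entries_) {  // count limit hit: evict LRU
      int victim = lru_.back();
      lru_.pop_back();
      total_bytes_ -= entries_[victim];
      entries_.erase(victim);
    }
  }

  bool Contains(int id) const { return entries_.count(id) != 0; }
  size_t total_bytes() const { return total_bytes_; }

 private:
  size_t max_entries_;
  size_t total_bytes_ = 0;
  std::list<int> lru_;                        // front = most recent
  std::unordered_map<int, size_t> entries_;   // id -> size in bytes
};
```

Inserting one 4M image followed by 2000 1K paint items evicts the image, so splitting paint items into their own cache (as the PaintCache change did) stops that eviction and leaves the large entries resident, which shows up as higher reported memory.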
Owner: khushals...@chromium.org
Status: Assigned (was: Untriaged)
📍 Found significant differences after each of 2 commits.
https://pinpoint-dot-chromeperf.appspot.com/job/12595997e40000

Build internal android targets in telemetry_chrome_test. by jbudorick@chromium.org
https://chromium.googlesource.com/chromium/src/+/e0c7bc0e869c70995d8b53d9a22e68294f07c7f2
memory:webview:all_processes:reported_by_chrome:gpu:effective_size: 6.415e+06 → No values

Revert "Build internal android targets in telemetry_chrome_test." by jbudorick@chromium.org
https://chromium.googlesource.com/chromium/src/+/c7539ef6291836eff22077b38ae55bb62f536b99
memory:webview:all_processes:reported_by_chrome:gpu:effective_size: No values → 6.752e+06

😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/112e63c8140000

All of the runs failed. The most common error (20/20 runs) was:
ReadValueError: Could not find values matching: {'story': u'browse:tech:discourse_infinite_scroll:2018', 'tir_label': u'browse_tech', 'histogram': u'memory:chrome:all_processes:reported_by_chrome:gpu:effective_size'}
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/16094870140000

All of the runs failed. The most common error (20/20 runs) was:
ReadValueError: Could not find values matching: {'story': u'browse:tech:discourse_infinite_scroll:2018', 'tir_label': u'browse_tech', 'histogram': u'memory:chrome:all_processes:reported_by_chrome:gpu:effective_size'}
Dave, Shoutao, any idea what's going wrong here? I tried scheduling a bisect with a narrower range to see the impact of the change mentioned in #27.
Owner: ----
Status: Available (was: Assigned)
I went through all the mobile regressions and I see the pattern mentioned in #27 in the gpu memory dumps in all traces except the "background" test cases, which report 0 memory. My intuition is that the root cause is the same, which increases the avg private_dirty_size, but because we release all memory on being backgrounded it's not obvious from the traces.

It would still be nice to get some validation from the bots with a bisect, but I'm having trouble avoiding it always bisecting to jbudorick's revert. Looping the perf sheriff back in to help set up a bisect for that. If it does bisect to r606400, then this is a WontFix.
Owner: perezju@chromium.org
Juan, do you think we should get more validation from bisect bots, or close this?
Retrying the job in #38. Regardless of the memory regression, I think we should make sure that Pinpoint can provide answers to cases like this.

Not sure, however, who should be the right person to make the call whether the regression is acceptable or sufficiently explained.
📍 Couldn't reproduce a difference.
https://pinpoint-dot-chromeperf.appspot.com/job/14afa3a3940000
Cc: piman@chromium.org enne@chromium.org vmi...@chromium.org
Re #39, thanks for redoing the tryjob to make sure we have an accurate bisect. On whether the regression is acceptable, my rule of thumb is to cc people who are experts in that code area for an opinion, and to weigh the regression against other improvements associated with the change.

To reiterate, there are 2 changes that have been bundled into this regression:

1) cc/gpu: Use same budget for transfer cache and discardable textures. (r607080)
The motivation behind this change is to ensure parity between the current GPU raster path and the new feature (OOP raster) in terms of image cache budgeting. We were initially focused on Android, so the limits were set with that in mind; now that we are working on enabling OOP raster for desktop platforms, the change configures the budget based on the platform.

2) cc/gpu: Add client driven paint caching for OOP raster. (r606400)
The motivation behind this change was to improve caching of paint data structures in OOP raster. These structures previously shared a single cache with images, which caused a large number of small entries to evict a few large image entries; that eviction pattern was an unintentional artifact of the caching strategy.
Also, I expect a lot of the thread times improvements here (https://chromeperf.appspot.com/group_report?rev=606400) to be from this change.

Adding a few more chrome-gpu folks to check my assumptions here. Let me know if there is a different process suggested for making a call in cases like these. :)
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/1293ea85940000

All of the runs failed. The most common error (20/20 runs) was:
SwarmingTaskError: The swarming task failed with state "BOT_DIED".
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/17a2da25940000

All of the runs failed. The most common error (20/20 runs) was:
SwarmingTaskError: The swarming task failed with state "BOT_DIED".
For #1, did we confirm that we achieve parity between GPU raster and OOP raster? If so, the regression is acceptable.
For #2, it seems like the right tradeoff as well, though we may want to follow up with additional tuning of the paint cache budget on low-end platforms.
Re #46: good point, I didn't confirm with a run that we were at parity between OOP and GPU raster. I just assumed so, since this change set an equivalent budget for the corresponding caches in the two modes, and the change in cache size lined up with the budget change.

I triggered a run with OOP disabled on the bots to compare the current state for the 2. I'll try a local run as well.
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/12297835940000

Pinpoint job failed. I tried a local run on my machine for browse:media:imgur. One thing to note is that GPU memory used by images has moved in terms of where it is reported: earlier it was in the GpuImageDecodeCache in the renderer, and now it's in the ServiceTransferCache in the GPU process.

This is what I got over 3 runs:
OOP: 149M, 143M, 147M
GPU: 145M, 175M, 146M

The OOP and GPU numbers are pretty equivalent.

This is a WontFix from my side. perezju@, feel free to close this if you don't think there is anything needed to make sure we get pinpoint bisects working for this.
Status: WontFix (was: Available)
I think the most recent Pinpoint failures above are now being tracked in issue 916877 and issue 920750.

Fine to close this.
Project Member

Comment 52 by bugdroid1@chromium.org, Jan 14

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6e802e79e19b5a7ca3fe3de3c34719bf399d668a

commit 6e802e79e19b5a7ca3fe3de3c34719bf399d668a
Author: Khushal Sagar <khushalsagar@chromium.org>
Date: Mon Jan 14 11:02:17 2019

gpu: Tune PaintCache budget for low-end devices.

Lower the PaintCache budget if running on low-end devices from 4M to
256K.

R=ericrk@chromium.org

Bug:  905172 
Change-Id: I9e243372b2f1266e13288d726d962da68a0868d6
Reviewed-on: https://chromium-review.googlesource.com/c/1405959
Reviewed-by: Eric Karl <ericrk@chromium.org>
Commit-Queue: Khushal <khushalsagar@chromium.org>
Cr-Commit-Position: refs/heads/master@{#622416}
[modify] https://crrev.com/6e802e79e19b5a7ca3fe3de3c34719bf399d668a/gpu/command_buffer/client/raster_implementation.cc
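The commit above can be pictured with a minimal sketch. This is not the actual change to raster_implementation.cc; the names are assumptions, and only the 4M and 256K values come from the commit message:

```cpp
#include <cstddef>

// Hypothetical illustration of the follow-up tuning in r622416:
// use a much smaller PaintCache budget on low-end devices.
// Names are made up; the 4M -> 256K values are from the commit message.
constexpr size_t kNormalPaintCacheBudget = 4 * 1024 * 1024;
constexpr size_t kLowEndPaintCacheBudget = 256 * 1024;

size_t PaintCacheBudget(bool is_low_end_device) {
  return is_low_end_device ? kLowEndPaintCacheBudget : kNormalPaintCacheBudget;
}
```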
