Issue metadata
Sign in to add a comment
|
56.3% regression in system_health.memory_desktop at 607067:607096 |
||||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Nov 14
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/112647cfe40000
,
Nov 14
📍 Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/112647cfe40000 cc/gpu: Use same budget for transfer cache and discardable textures. by khushalsagar@chromium.org https://chromium.googlesource.com/chromium/src/+/3477c6da77a41684e23f9533c6058dc1c64102ea memory:chrome:all_processes:reported_by_chrome:skia:effective_size: 1.074e+07 → 1.691e+07 (+6.166e+06) Understanding performance regressions: http://g.co/ChromePerformanceRegressions Benchmark documentation link: https://bit.ly/system-health-benchmarks
,
Nov 14
The desktop regressions are just from the cache limit increasing to match GPU raster. The previous limit was 128M and the new limit is 192M, I see that the usage has gone from 98M to 133M. The confusing one to me is the android-go regression. Since its a low end android device, this change should have reduced the budget for that from 768K to 512K. I still see an additional entry of 2.7M in the transfer cache resulting in the regression. Maybe its locked? But why would this change affect that. ericrk@, any ideas?
,
Nov 14
+Eric for realz.
,
Nov 16
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/12f5e100140000
,
Nov 16
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/1224a3cfe40000
,
Nov 16
I'm going to do another bisect for the android regressions. The desktop ones make sense, the cache limit has been increased to match what GPU raster was using and all memory increases are within the new limit. But I would expect this change to be a no-op for Android. The cache budget should be the same for normal Android, and a little lower for low end.
,
Nov 17
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/12f5e100140000 HTTP status code 401: Login Required
,
Nov 17
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/1224a3cfe40000 HTTP status code 401: Login Required
,
Nov 17
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/10366517e40000
,
Nov 17
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/10366517e40000 All of the runs failed. The most common error (10/20 runs) was: HTTPException: HTTP status code 400: {"error": {"message": "CIPD package path is required. Use \".\" to install to run dir."}}
,
Nov 17
+sullivan, ned All my bisects seem to be failing. Is there an ongoing issue with the perfbots?
,
Nov 17
Dave, Shoutao for #13
,
Nov 18
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/1065ee50140000
,
Nov 18
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/1065ee50140000 All of the runs failed. The most common error (10/20 runs) was: HTTPException: HTTP status code 400: {"error": {"message": "CIPD package path is required. Use \".\" to install to run dir."}}
,
Nov 19
,
Nov 19
I couldn't repro this locally on a 5x with my change reverted. And verified that the cache budget, which is the only thing that could be affected by this change, is the same for Android. I'm going to wait for a bisect now before looking into this further.
,
Nov 20
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/13672eb7e40000
,
Nov 20
📍 Found significant differences after each of 2 commits. https://pinpoint-dot-chromeperf.appspot.com/job/13672eb7e40000 Build internal android targets in telemetry_chrome_test. by jbudorick@chromium.org https://chromium.googlesource.com/chromium/src/+/e0c7bc0e869c70995d8b53d9a22e68294f07c7f2 memory:chrome:all_processes:reported_by_chrome:gpu:effective_size: 6.015e+06 → No values Revert "Build internal android targets in telemetry_chrome_test." by jbudorick@chromium.org https://chromium.googlesource.com/chromium/src/+/c7539ef6291836eff22077b38ae55bb62f536b99 memory:chrome:all_processes:reported_by_chrome:gpu:effective_size: No values → 1.314e+07 Understanding performance regressions: http://g.co/ChromePerformanceRegressions Benchmark documentation link: https://bit.ly/system-health-benchmarks
,
Nov 20
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/143de524140000
,
Nov 20
Sigh, since the perf bots broke while r606139 was there, the bisect couldn't get any numbers for the changes before it was reverted. I have started another job for the range preceding the revert but with the revert patch applied for the complete job (didn't know this existed, pretty cool). I suppose that should work?
,
Nov 20
,
Nov 26
I don't know what's going on with pinpoint. The job started 6 days ago and still shows running...
,
Nov 26
,
Nov 26
Issue 908156 has been merged into this issue.
,
Nov 27
Ah, I think the Android regressions are explained by: https://chromium-review.googlesource.com/c/1321190. The Android Nexus5x Webview Perf bot did not break with the change mentioned in #22 so it had the correct range for when the regression happened. The PaintCache makes it so we stop using TransferCache for high frequency small items during OOP raster. These items contribute less in size but since are a lot in count, they force eviction of large image entries from the TransferCache because we hit the count limit of 2000. And that eviction is now gone since images and these entries don't share the same cache anymore. You can see this by looking at the traces, the before one has many entries marked "cpu" in the memory dump but the after one does not. I'm going to start a bisect on the Webview to confirm this theory, in which case this is a WontFix.
,
Nov 27
,
Nov 27
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/12595997e40000
,
Nov 27
📍 Found significant differences after each of 2 commits. https://pinpoint-dot-chromeperf.appspot.com/job/12595997e40000 Build internal android targets in telemetry_chrome_test. by jbudorick@chromium.org https://chromium.googlesource.com/chromium/src/+/e0c7bc0e869c70995d8b53d9a22e68294f07c7f2 memory:webview:all_processes:reported_by_chrome:gpu:effective_size: 6.415e+06 → No values Revert "Build internal android targets in telemetry_chrome_test." by jbudorick@chromium.org https://chromium.googlesource.com/chromium/src/+/c7539ef6291836eff22077b38ae55bb62f536b99 memory:webview:all_processes:reported_by_chrome:gpu:effective_size: No values → 6.752e+06 Understanding performance regressions: http://g.co/ChromePerformanceRegressions Benchmark documentation link: https://bit.ly/system-health-benchmarks
,
Nov 29
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/112e63c8140000
,
Nov 29
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/112e63c8140000 All of the runs failed. The most common error (20/20 runs) was: ReadValueError: Could not find values matching: {'story': u'browse:tech:discourse_infinite_scroll:2018', 'tir_label': u'browse_tech', 'histogram': u'memory:chrome:all_processes:reported_by_chrome:gpu:effective_size'}
,
Nov 30
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/16094870140000
,
Nov 30
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/16094870140000 All of the runs failed. The most common error (20/20 runs) was: ReadValueError: Could not find values matching: {'story': u'browse:tech:discourse_infinite_scroll:2018', 'tir_label': u'browse_tech', 'histogram': u'memory:chrome:all_processes:reported_by_chrome:gpu:effective_size'}
,
Nov 30
Dave, Shoutao, any idea what's going wrong here? I tried scheduling a bisect with a narrower range to see the impact of the change mentioned in #27.
,
Jan 9
I went through all the mobile regressions and I'm seeing the pattern mentioned in #27 in the gpu memory dumps in all traces except the "background" test cases which report 0 memory. My intuition is that the root cause is the same, which increases the avg private_dirty_size, but because we release all memory on being backgrounded its not obvious from the traces. It would still be nice to get some validation from the bots with a bisect but I'm having trouble avoiding it always bisecting to jbudorick's revert. Looping the perf sheriff back to help setup a bisect for that. If it does bisect to r606400, then this is a WontFix.
,
Jan 9
Juan, do you think we should get more validation from bisect bots, or close this?
,
Jan 10
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/14afa3a3940000
,
Jan 10
Retying the job in #38. Regardless of the memory regression, I think we should make sure that pinpoint can provide answers to cases like this. Not sure, however, who should be the right person to make the call whether the regression is acceptable or sufficiently explained.
,
Jan 10
📍 Couldn't reproduce a difference. https://pinpoint-dot-chromeperf.appspot.com/job/14afa3a3940000
,
Jan 10
Re #39, thanks for redoing the tryjob to make sure we have an accurate bisect. About understanding whether the regression is acceptable, my rule of thumb is to cc other people who are an expert in that code area for an opinion and weigh in with other improvements associated with the change. To reiterate, there are 2 changes that have been bundled into this regression: 1) cc/gpu: Use same budget for transfer cache and discardable textures, (r607080) The motivation behind this change is to ensure parity between the current GPU raster path and the new feature (OOP raster) in terms of image cache budgeting. Since we were initially focused on Android, the limits were set keeping that in mind but the change now configures it based on the platform. Since we are now working on enabling this for desktop platforms. 2) cc/gpu: Add client driven paint caching for OOP raster. (r606400) The motivation behind this change was to improve caching of paint data structures in OOP raster. The fact that they were sharing a single cache with images which was causing a large number of small sized entries to evict a few large sized image entries was an unintentional pattern from this caching strategy. Also, I expect a lot of the thread times improvements here (https://chromeperf.appspot.com/group_report?rev=606400) to be from this change. Adding a few more chrome-gpu folks to check my assumptions here. Let me know if there is a different process suggested for making a call in cases like these. :)
,
Jan 10
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/1293ea85940000
,
Jan 10
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/1293ea85940000 All of the runs failed. The most common error (20/20 runs) was: SwarmingTaskError: The swarming task failed with state "BOT_DIED".
,
Jan 10
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/17a2da25940000
,
Jan 10
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/17a2da25940000 All of the runs failed. The most common error (20/20 runs) was: SwarmingTaskError: The swarming task failed with state "BOT_DIED".
,
Jan 10
For #1, did we confirm that we achieve parity between GPU raster and OOP raster? If so, the regression is acceptable. For #2, it seems like the right tradeoff as well, though we may want to follow up with additional tuning of the paint cache budget on low-end platforms.
,
Jan 10
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/12297835940000
,
Jan 10
Re #46, Good point, I didn't confirm with a run that we were at parity between OOP and GPU. Just assumed so since this was setting an equivalent budget between the corresponding caches in the 2 modes, and the change in cache size lined up with the budget change. I triggered a run with OOP disabled on the bots to compare the current state for the 2. I'll try a local run as well.
,
Jan 10
😿 Pinpoint job stopped with an error. https://pinpoint-dot-chromeperf.appspot.com/job/12297835940000 guid
,
Jan 11
Pinpoint job failed. I tried a local run on my machine for browse:media:imgur. One thing to note is that GPU memory used by images have moved in terms of where they are reported. Earlier it was the GpuImageDecodeCache in the renderer and now its the ServiceTransferCache in the gpu process. This is what I got over 3 runs: OOP: 149M, 143M, 147M GPU: 145M, 175M, 146M The OOP and GPU numbers are pretty equivalent. This is a WontFix from my side. perezju@, feel free to close this if you don't think there is anything needed to make sure we get pinpoint bisects working for this.
,
Jan 11
I think the most recent pinpoint failures above are now being tracked in issue 916877 and issue 920750 . Fine to close this.
,
Jan 14
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6e802e79e19b5a7ca3fe3de3c34719bf399d668a commit 6e802e79e19b5a7ca3fe3de3c34719bf399d668a Author: Khushal Sagar <khushalsagar@chromium.org> Date: Mon Jan 14 11:02:17 2019 gpu: Tune PaintCache budget for low-end devices. Lower the PaintCache budget if running on low-end devices from 4M to 256K. R=ericrk@chromium.org Bug: 905172 Change-Id: I9e243372b2f1266e13288d726d962da68a0868d6 Reviewed-on: https://chromium-review.googlesource.com/c/1405959 Reviewed-by: Eric Karl <ericrk@chromium.org> Commit-Queue: Khushal <khushalsagar@chromium.org> Cr-Commit-Position: refs/heads/master@{#622416} [modify] https://crrev.com/6e802e79e19b5a7ca3fe3de3c34719bf399d668a/gpu/command_buffer/client/raster_implementation.cc |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Nov 14