
Issue 859635

Starred by 7 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 26
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug-Regression

Blocked on:
issue 863995




Turning on OOP Raster regresses memory

Project Member Reported by bsheedy@google.com, Jul 2

Issue description

Bisect points to https://chromium-review.googlesource.com/1117275. There are also a number of other regressions in the same revision range that I suspect are caused by this, but I have yet to confirm that.

There are a few improvements in the revision range, but there are far more regressions than improvements.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=859635

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=2160a3cfaa99c0534568ff895ca364db9ba971639083ffc0fcc274307dfab786


Bot(s) for this bug's original alert(s):

pixel_2_xl
pixel_xl
Components: UI>Browser>VR
Labels: VR-Perf
Cc: khushals...@chromium.org
Components: Internals>Compositing>OOP-Raster
This is most likely due to  issue 859419 . I'm on it.
Cc: enne@chromium.org hjd@google.com qnnguyen@chromium.org
 Issue 859947  has been merged into this issue.
 Issue 859920  has been merged into this issue.
Cc: piman@chromium.org vmi...@chromium.org ericrk@chromium.org
The regressions in skia size in the GPU process are expected; this is simply all of skia's caching moving from the renderer to the GPU process, since with OOP raster skia runs in the GPU process. Also, the changes reported currently are minor since this is only the glyph cache. I have a change up to report the GPU memory consumed by skia (https://chromium-review.googlesource.com/c/chromium/src/+/1123894).
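For context, a rough sketch of how skia's GPU resource cache usage could be surfaced via memory-infra; this is an illustration of the idea only, not the contents of the change linked above, and the class name and dump path are assumptions.

  // Hedged sketch: query the GrContext's resource cache and report it as an
  // allocator dump in the GPU process. GrContext::getResourceCacheUsage and
  // the MemoryDumpProvider interface are existing APIs; the class name and
  // the "skia/gpu_resources" dump path are illustrative.
  #include "base/trace_event/memory_allocator_dump.h"
  #include "base/trace_event/memory_dump_provider.h"
  #include "base/trace_event/process_memory_dump.h"
  #include "third_party/skia/include/gpu/GrContext.h"

  class SkiaGpuCacheDumpProvider
      : public base::trace_event::MemoryDumpProvider {
   public:
    explicit SkiaGpuCacheDumpProvider(GrContext* gr_context)
        : gr_context_(gr_context) {}

    bool OnMemoryDump(const base::trace_event::MemoryDumpArgs& args,
                      base::trace_event::ProcessMemoryDump* pmd) override {
      int resource_count = 0;
      size_t resource_bytes = 0;
      gr_context_->getResourceCacheUsage(&resource_count, &resource_bytes);

      auto* dump = pmd->CreateAllocatorDump("skia/gpu_resources");
      dump->AddScalar(base::trace_event::MemoryAllocatorDump::kNameSize,
                      base::trace_event::MemoryAllocatorDump::kUnitsBytes,
                      resource_bytes);
      dump->AddScalar("resource_count",
                      base::trace_event::MemoryAllocatorDump::kUnitsObjects,
                      resource_count);
      return true;
    }

   private:
    GrContext* const gr_context_;
  };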

The regressions in overall PSS are likely due to not purging skia's GPU cache as described in  issue 859419  (patch up at https://chromium-review.googlesource.com/c/chromium/src/+/1123775). This fixed a couple of cases I tried locally, so I'm going to wait for it to land before investigating further.

There is one case where the regression is real and unavoidable (imgur). The reason is that this page hits the greater-than-max-texture-size image path in our image cache stack. Since these images can't be uploaded directly, we cache the bitmap ourselves and hand it over to skia at draw time. With GPU raster this bitmap was cached in discardable memory, but now it is also cached in the GPU process in memory managed by the TransferCache. It's not possible to switch to discardable memory in the GPU since the renderer has to be able to lock the entry with just the discardable handle. Also, this case is rare, and the transfer cache respects memory pressure signals to release everything if necessary. +ericrk for his thoughts too.
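To illustrate the decision described above with a minimal sketch (the helper name is hypothetical; only the max-texture-size check itself is the point):

  #include "third_party/skia/include/core/SkImageInfo.h"

  // Illustrative only: an image wider or taller than the GPU's maximum
  // texture dimension cannot be uploaded as a single texture, so it takes the
  // bitmap fallback path that is handed to skia at draw time (and, with OOP
  // raster, is also held in TransferCache-managed memory in the GPU process).
  bool CanUploadAsSingleTexture(const SkImageInfo& info,
                                int max_texture_size) {
    return info.width() <= max_texture_size &&
           info.height() <= max_texture_size;
  }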
Components: -UI>Browser>VR
Labels: OS-Android
 Issue 860193  has been merged into this issue.
Summary: Turning on OOP Raster regresses memory (was: A zero-to-nonzero to 1.1% regression in xr.browsing.static at 571576:571697)
 Issue 860269  has been merged into this issue.
 Issue 860294  has been merged into this issue.
 Issue 860188  has been merged into this issue.
 Issue 860267  has been merged into this issue.
 Issue 860178  has been merged into this issue.
 Issue 860284  has been merged into this issue.
 Issue 860298  has been merged into this issue.
 Issue 860282  has been merged into this issue.
 Issue 860266  has been merged into this issue.
 Issue 860280  has been merged into this issue.
 Issue 860261  has been merged into this issue.
Owner: khushals...@chromium.org
Cc: tdres...@chromium.org
 Issue 861897  has been merged into this issue.
I think we also need to make the ServiceTransferCache shared across all raster decoders, to ensure we have a unified cache for uploaded images, similar to GPU discardable memory with GPU raster. This is especially important because we are not clearing a decoder's cache once the corresponding renderer is backgrounded.
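Roughly the shape I have in mind (just a sketch with illustrative names, not a patch; it also assumes the cache is default-constructible):

  #include <utility>

  #include "base/memory/ref_counted.h"
  #include "gpu/command_buffer/service/service_transfer_cache.h"

  namespace gpu {

  // Hedged sketch: one ServiceTransferCache lives in a ref-counted state
  // object shared by every raster decoder in the GPU process, so a single
  // budget covers all renderers instead of one full-budget cache per decoder.
  struct SharedRasterState
      : public base::RefCountedThreadSafe<SharedRasterState> {
    ServiceTransferCache transfer_cache;

   private:
    friend class base::RefCountedThreadSafe<SharedRasterState>;
    ~SharedRasterState() = default;
  };

  // Each decoder references the shared cache rather than owning its own.
  class RasterDecoderSketch {
   public:
    explicit RasterDecoderSketch(scoped_refptr<SharedRasterState> state)
        : state_(std::move(state)) {}

    ServiceTransferCache* transfer_cache() {
      return &state_->transfer_cache;
    }

   private:
    scoped_refptr<SharedRasterState> state_;
  };

  }  // namespace gpu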
Cc: perezju@chromium.org
 Issue 861905  has been merged into this issue.
 Issue 861904  has been merged into this issue.
Project Member

Comment 28 by bugdroid1@chromium.org, Jul 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4

commit 7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4
Author: Khushal <khushalsagar@chromium.org>
Date: Tue Jul 10 20:01:45 2018

gpu: Drop GrContext cache after an idle period.

The renderer drops the GrContext cache after a second of idle period,
where idle period is identified as a continuous period where the
ContextProvider backing a GrContext is not used. Implement the same
optimization for OOP raster in the GPU process.

R=ericrk@chromium.org

Bug:  859419 , 859635 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Ic2ae533212b94320cd8c1b8b1149cf5ba4ce686e
Reviewed-on: https://chromium-review.googlesource.com/1123775
Reviewed-by: Eric Karl <ericrk@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: Khushal <khushalsagar@chromium.org>
Cr-Commit-Position: refs/heads/master@{#573871}
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/BUILD.gn
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/command_buffer/service/BUILD.gn
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/command_buffer/service/command_buffer_direct.h
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/command_buffer/service/decoder_client.h
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/command_buffer/service/gles2_cmd_decoder_unittest_base.h
[add] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/command_buffer/service/gr_cache_controller.cc
[add] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/command_buffer/service/gr_cache_controller.h
[add] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/command_buffer/service/gr_cache_controller_unittest.cc
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/command_buffer/service/memory_program_cache_unittest.cc
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/command_buffer/service/program_manager_unittest.cc
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/command_buffer/service/raster_decoder.cc
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/command_buffer/service/raster_decoder_unittest.cc
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/command_buffer/service/raster_decoder_unittest_base.h
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/ipc/in_process_command_buffer.cc
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/ipc/in_process_command_buffer.h
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/ipc/service/command_buffer_stub.cc
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/ipc/service/command_buffer_stub.h
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/ipc/service/gpu_channel_manager.cc
[modify] https://crrev.com/7324ec4b07a7c91aec1fb4e65d2c34964c4a8ab4/gpu/ipc/service/gpu_channel_manager.h

 Issue 862149  has been merged into this issue.
 Issue 862272  has been merged into this issue.
 Issue 862252  has been merged into this issue.
 Issue 862258  has been merged into this issue.
 Issue 862263  has been merged into this issue.
 Issue 862248  has been merged into this issue.
 Issue 862266  has been merged into this issue.
Project Member

Comment 36 by bugdroid1@chromium.org, Jul 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/441bb2d8704de8d4ae008d2fc212674d8b6d5670

commit 441bb2d8704de8d4ae008d2fc212674d8b6d5670
Author: Khushal <khushalsagar@chromium.org>
Date: Wed Jul 11 17:42:39 2018

gpu: Follow up on GrContext idle cleanup patch.

Follow up for remaining comments on:
https://chromium-review.googlesource.com/c/chromium/src/+/1123775

R=ericrk@chromium.org

Bug:  859419 , 859635 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I78577f183d4050d6380024dc15aad2573c24714e
Reviewed-on: https://chromium-review.googlesource.com/1132350
Reviewed-by: Eric Karl <ericrk@chromium.org>
Commit-Queue: Khushal <khushalsagar@chromium.org>
Cr-Commit-Position: refs/heads/master@{#574228}
[modify] https://crrev.com/441bb2d8704de8d4ae008d2fc212674d8b6d5670/gpu/command_buffer/service/gr_cache_controller.cc
[modify] https://crrev.com/441bb2d8704de8d4ae008d2fc212674d8b6d5670/gpu/ipc/service/gpu_channel_manager.cc

I'll summarize all the regressions that have been duped here so far, and the expectation is that most of the actionable ones should be fixed by the change in #28.

1) xr.browsing.wpr.static
The reported_by_chrome regressions in the GPU process are skia's glyph cache moving from the renderer to the GPU process. If you look at the skia tab in before/after traces, you should see the corresponding reduction in the renderer.

The regressions in PSS/malloc for the GPU process/all processes are legit, and most are already fixed by #28 on the bots that have caught up with the change. I'll wait for the rest of the bots to get there.

There are still a few that might warrant more investigation, since they didn't recover completely:
a) pixel_2_xl gl:pss_avg on index GPU PSS usage.
b) pixel_xl renderer_processes:reported_by_chrome:malloc:effective_size_avg on index and https_m_facebook_com_rihanna.

There are regressions in memory:chrome:renderer_processes:reported_by_chrome:shared_memory:effective_size_avg, all of them very minor (~2-6k), which I think is somewhat expected since OOP raster uses the transfer buffer for sending paint commands instead of the command buffer. On that note, I guess in this mode we could reduce the size of the command buffer to 64k, similar to mailbox contexts, since we're not actually sending GL commands here anymore?

2) system_health.memory_mobile/memory.top_10_mobile
There are a bunch of all_processes:reported_by_chrome:skia:effective_size_avg regressions from the glyph cache. The increase is well within the cache budget. I'm going to guess some event in the renderer caused us to purge this cache, and the same purge is not replicated in the GPU process. Need to verify this.

The reported_by_os:system_memory regressions should again be fixed by #28; the bots haven't caught up with that change yet.
> On that note, I guess in this mode we could reduce the size of the command buffer to 64k similar to mailbox contexts since we're not actually sending GL commands here anymore

Yes, we should make the default command buffer size for raster contexts much smaller. 64k as a start sounds good; I suspect it's even larger than we need.
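For reference, the sizes live in gpu/command_buffer/client/shared_memory_limits.h; a hedged sketch of the idea (the struct and factory names here are illustrative, not the real API, and the default shown is approximate):

  #include <stddef.h>

  // Hedged sketch: with OOP raster, paint commands go through the transfer
  // buffer rather than the command buffer, so the command buffer only carries
  // a small amount of raster-interface traffic and can be much smaller than
  // the GLES2 default.
  struct SharedMemoryLimitsSketch {
    size_t command_buffer_size = 1024 * 1024;  // Approximate GLES2 default.
    size_t start_transfer_buffer_size = 64 * 1024;

    static SharedMemoryLimitsSketch ForOOPRasterContext() {
      SharedMemoryLimitsSketch limits;
      // 64k, similar to mailbox contexts; possibly still more than needed.
      limits.command_buffer_size = 64 * 1024;
      return limits;
    }
  };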
Project Member

Comment 39 by bugdroid1@chromium.org, Jul 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2fc1e3d1e3915159b3c7053d2c37b116fd18bb31

commit 2fc1e3d1e3915159b3c7053d2c37b116fd18bb31
Author: Khushal <khushalsagar@chromium.org>
Date: Thu Jul 12 21:36:03 2018

content: Use lower command buffer memory for OOP raster contexts.

Since most commands with OOP raster are paint commands sent via the
transfer buffer and not GL commands sent via the command buffer, we
can use a much smaller command buffer.

R=piman@chromium.org

Bug:  859635 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I8a52395b40f240d774ea2f54cc7f4961ab0ac2ce
Reviewed-on: https://chromium-review.googlesource.com/1134471
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: Khushal <khushalsagar@chromium.org>
Cr-Commit-Position: refs/heads/master@{#574732}
[modify] https://crrev.com/2fc1e3d1e3915159b3c7053d2c37b116fd18bb31/content/renderer/render_thread_impl.cc
[modify] https://crrev.com/2fc1e3d1e3915159b3c7053d2c37b116fd18bb31/gpu/command_buffer/client/shared_memory_limits.h

Project Member

Comment 40 by bugdroid1@chromium.org, Jul 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/996e9917b9e14b73beddb285f28086f3cfdb3d81

commit 996e9917b9e14b73beddb285f28086f3cfdb3d81
Author: Khushal <khushalsagar@chromium.org>
Date: Fri Jul 13 08:31:00 2018

gpu: Use a shared ServiceTransferCache between RasterDecoders.

Currently each decoder has its own cache which means each decoder could
potentially keep the budget's maximum capacity allocated at any time.
This is particularly bad because the GPU process has no notion of which
decoder belongs to a background renderer, so the memory is not released
even if the renderer is invisible.

R=ericrk@chromium.org

Bug:  859635 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel;master.tryserver.blink:linux_trusty_blink_rel
Change-Id: I9aa709b860270f94ca83bb096deb74e5f6de0ff6
Reviewed-on: https://chromium-review.googlesource.com/1132330
Commit-Queue: Khushal <khushalsagar@chromium.org>
Reviewed-by: Eric Karl <ericrk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#574862}
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/cc/paint/transfer_cache_unittest.cc
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/command_buffer/service/raster_decoder.cc
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/command_buffer/service/raster_decoder.h
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/command_buffer/service/raster_decoder_context_state.cc
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/command_buffer/service/raster_decoder_context_state.h
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/command_buffer/service/raster_decoder_mock.h
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/command_buffer/service/service_transfer_cache.cc
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/command_buffer/service/service_transfer_cache.h
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/command_buffer/service/service_transfer_cache_unittest.cc
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/ipc/in_process_command_buffer.cc
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/ipc/in_process_command_buffer.h
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/ipc/raster_in_process_context.cc
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/ipc/raster_in_process_context.h
[modify] https://crrev.com/996e9917b9e14b73beddb285f28086f3cfdb3d81/gpu/ipc/service/gpu_channel_manager.cc

So the regression with skia's glyph cache is the idle handler in the renderer purging all of skia's caches. The logic in the renderer runs this handler at intervals of 30s, and pushes it out by 2 handler runs each time an input event is observed. Btw, it seems like the notification to IdleUserDetector that pushes it out is only made if the input is handled on the main thread: https://cs.chromium.org/chromium/src/content/renderer/render_view_impl.cc?type=cs&targetos=chromium&g=0&l=2405. Should it not also consider input on the compositor thread?

I could do something similar in the GPU process, accounting for idle periods on the GPU main thread to purge these caches.
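A rough sketch of what that could look like; this is not an actual patch, and the class name, delay, and the choice to purge only the font/glyph cache (the GrContext resource cache is already handled by #28) are all illustrative:

  #include <utility>

  #include "base/bind.h"
  #include "base/location.h"
  #include "base/memory/weak_ptr.h"
  #include "base/single_thread_task_runner.h"
  #include "base/time/time.h"
  #include "third_party/skia/include/core/SkGraphics.h"

  // Hedged sketch: every time raster work touches skia on the GPU main
  // thread, (re)schedule a purge task; if no further work arrives before the
  // delay fires, drop the glyph/font cache, mirroring what the renderer's
  // idle handler used to do.
  class GpuIdleGlyphCachePurger {
   public:
    explicit GpuIdleGlyphCachePurger(
        scoped_refptr<base::SingleThreadTaskRunner> gpu_main_task_runner)
        : task_runner_(std::move(gpu_main_task_runner)) {}

    // Call whenever a raster decoder does work.
    void OnRasterWork() {
      // Cancel any pending purge and push it out again, approximating
      // "purge after N seconds of idleness".
      weak_factory_.InvalidateWeakPtrs();
      task_runner_->PostDelayedTask(
          FROM_HERE,
          base::BindOnce(&GpuIdleGlyphCachePurger::Purge,
                         weak_factory_.GetWeakPtr()),
          base::TimeDelta::FromSeconds(30));
    }

   private:
    void Purge() {
      // Drops skia's CPU-side font/glyph cache for this process.
      SkGraphics::PurgeFontCache();
    }

    scoped_refptr<base::SingleThreadTaskRunner> task_runner_;
    base::WeakPtrFactory<GpuIdleGlyphCachePurger> weak_factory_{this};
  };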
 Issue 861910  has been merged into this issue.
Cc: cbiesin...@chromium.org
 Issue 861908  has been merged into this issue.
 Issue 861907  has been merged into this issue.
Cc: simonhatch@chromium.org schenney@chromium.org
 Issue 862247  has been merged into this issue.
Blockedon: 863995
Bisects on the new bot names were kicked off using legacy recipe bisect instead of Pinpoint: blocking on  issue 863995 
Project Member

Comment 48 by bugdroid1@chromium.org, Jul 17

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f935630e5c264482a6e07b82a899bfed37b6f6d5

commit f935630e5c264482a6e07b82a899bfed37b6f6d5
Author: Khushal <khushalsagar@chromium.org>
Date: Tue Jul 17 01:03:42 2018

gpu: Release all skia's cached memory when app backgrounded.

R=ericrk@chromium.org

Bug:  859635 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I0afc80a877b5fdae5a4459bb5fa6f767ad842a7d
Reviewed-on: https://chromium-review.googlesource.com/1138750
Reviewed-by: Eric Karl <ericrk@chromium.org>
Commit-Queue: Khushal <khushalsagar@chromium.org>
Cr-Commit-Position: refs/heads/master@{#575504}
[modify] https://crrev.com/f935630e5c264482a6e07b82a899bfed37b6f6d5/gpu/ipc/service/gpu_channel_manager.cc

I investigated some of the remaining pixel_2_xl regressions, and for a few cases where the GPU memory didn't recover completely, I think it's because of the skia caches moving from the renderer to the GPU process, which is also reflected in the reported_by_os numbers.

Here are the graphs for reported_by_os PSS across all processes and separately for the GPU/renderer processes: https://chromeperf.appspot.com/report?sid=8288abdfa73a5afe06606fd469b362338696ade29788d040dc9e595b83bcfa6f and https://chromeperf.appspot.com/report?sid=50c5cc9053bcdc7d399f54c04fee1d1d64a6c653ec891bb9a337c126f87daf79

All regressions in overall PSS have recovered, and the slight increase in GPU process numbers is marked by a decrease in renderer numbers.

The memory:chrome:all_processes:reported_by_os:system_memory:private_dirty_size_avg for nexus 5 and 5x have also recovered: https://chromeperf.appspot.com/report?sid=05eab882d664b38096e4a851300e45b401be43d32d1020d19a3ef2a5e9f249c9, and the memory:chrome:all_processes:reported_by_os:system_memory:private_footprint_size_avg too: https://chromeperf.appspot.com/report?sid=105ab4363c146a7f2a8680c0a176d2f9b67e1cf848bdd7c12b5c86aafe5ecb3b.

Finally, I don't know what's happening with the webview bots. I thought the names of the bots changed from android-webview-nexus6 to Android Nexus6 WebView Perf. Looking at the timeline for those two, the last run on android-webview-nexus6 was r572249 and the first run on Android Nexus6 WebView Perf was r576490. Is there nothing for the range in between? (https://chromeperf.appspot.com/report?sid=5dfd4c3f57b3925f19e1238307e3450e6cc4a5ff30867b8c1e2a01bea364c2c7)

Another thing: the optimization in #41 is still pending, but I don't know whether it's worth it. Maybe it helps with OOPIFs, where each foreground renderer has an independent cache budget, but with unified skia caching in the GPU that's not the case anymore. Also, not a lot is cached by skia on the CPU since we're doing GPU raster, other than the glyph cache. #48 purges everything when in the background, but I'm not sure if we need to do this when in the foreground.
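For reference, the background purge in #48 amounts to something like the sketch below (this is my illustration, not the actual gpu_channel_manager.cc change; the function name is hypothetical):

  #include "third_party/skia/include/core/SkGraphics.h"
  #include "third_party/skia/include/gpu/GrContext.h"

  // Hedged sketch: on app background, drop both the GrContext's GPU resource
  // cache and skia's process-wide CPU caches (fonts/glyphs etc.).
  void PurgeAllSkiaCachesOnBackground(GrContext* gr_context) {
    if (gr_context)
      gr_context->freeGpuResources();
    SkGraphics::PurgeAllCaches();
  }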
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/11b48c59a40000

Buildbucket says the build completed successfully, but Pinpoint can't find the isolate hash.
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/17ca6969a40000

All of the runs failed. The most common error (20/20 runs) was:
TimeoutException: Timed out while waiting 60s for IsJavaScriptExpressionTrue.
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/1044de71a40000

All of the runs failed. The most common error (20/20 runs) was:
TimeoutException: Timed out while waiting 60s for IsJavaScriptExpressionTrue.
😿 Pinpoint job stopped with an error.
https://pinpoint-dot-chromeperf.appspot.com/job/1413abd5a40000

All of the runs failed. The most common error (20/20 runs) was:
TimeoutException: Timed out while waiting 60s for IsJavaScriptExpressionTrue.
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/17670bd5a40000
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/113f6eb6a40000
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/11b587e1a40000
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/12de8559a40000
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/1440f0c9a40000
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/1280f85da40000
Labels: ReleaseBlock-Beta
This is going to block us from shipping Chrome for Android in M69, as the regression is being reported for System Health as a ~50% regression in Android Graphics. Making this a release blocker for that purpose.
I don't think the experiment is running on beta, is it?
Labels: -ReleaseBlock-Beta
The experiment is not on beta at the moment, it doesn't need to be a blocker.

But we do plan to turn it on for beta in M69 eventually. All the regressions present in this bug have definitely been fixed for Clank. I wanted to verify the status for WebView, for which I had triggered some tryjobs, and WebView still seems to have a minor regression which I'm looking at. I'll make sure these are addressed before we turn this on for beta.
📍 Job complete. See results below.
https://pinpoint-dot-chromeperf.appspot.com/job/129f0fcda40000
Status: Fixed (was: Assigned)
So I tried a Pinpoint job with OOP raster disabled (https://pinpoint-dot-chromeperf.appspot.com/job/11b587e1a40000) to compare it against GPU raster, and it shows a minor regression in memory:webview:all_processes:reported_by_os:private_dirty_size, from 54.4M to 56.8M. It looks like this metric has become less noisy now; earlier the value oscillated between 45.3M and 53.8M, and now it's consistently at 53.8M. Digging through the trace, the only place with a consistent difference was the private dirty memory reported under Devices:GPU going from ~1.7M to ~3.2M. I don't see anything different/unexpected in the GPU memory reported by chrome or android_memtrack. The allocations under skia are identical (a single 4M texture). The cc tile size oscillates based on whether we have 6 or 7 tiles, but that also seems to be the same in both cases.

There are changes in when our cache clearing logic runs now, with it happening on the GPU thread instead of in the renderer. The fact that the metric has become consistent, instead of showing the noisy values from earlier, makes me believe it's more of a timing issue of when the clearing happens on the GPU side.

All the other regressions duped here have already been fixed, so closing this bug now.
