New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 899940 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Nov 26
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression



Sign in to add a comment

5.6%-12.9% regression in memory:chrome:all_processes:reported_by_os:system_memory:private_footprint_size at 603087:603145

Project Member Reported by 42576172...@developer.gserviceaccount.com, Oct 29

Issue description

See the link to graphs below.
 
All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=899940

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=08c72ec82ea43a5ef18b9d2aeffb5db7e4cc9090877f60c101278aaa2319340a


Bot(s) for this bug's original alert(s):

mac-10_12_laptop_low_end-perf
mac-10_13_laptop_high_end-perf

memory.desktop - Benchmark documentation link:
  None
Cc: penghuang@chromium.org
Owner: penghuang@chromium.org
Status: Assigned (was: Untriaged)
📍 Found a significant difference after 1 commit.
https://pinpoint-dot-chromeperf.appspot.com/job/15082489e40000

Make browser UI to use OOPR as well when OOPR is enabled. by penghuang@chromium.org
https://chromium.googlesource.com/chromium/src/+/392cff8628eed1ce42f41362d2328f60768e2ce5
memory:chrome:all_processes:reported_by_os:system_memory:private_footprint_size: 1.349e+08 → 1.466e+08 (+1.167e+07)

Understanding performance regressions:
  http://g.co/ChromePerformanceRegressions

Benchmark documentation link:
  None
Cc: rjkroege@chromium.org sadrul@chromium.org backer@chromium.org enne@chromium.org piman@chromium.org vmi...@chromium.org
Owner: backer@chromium.org
I'll take a look at this.
I believe that this is because Skia is caching more in the GPU process.

This is one exampled I looked at: https://chromeperf.appspot.com/report?sid=38dd9c2f33aa2cc5d19a2fc5c2641d52812840fdece6358fc45493c02bee41d0&start_rev=601054&end_rev=603888. I examined traces before and after we OOP-Red the UI. The total memory jumped about 5MB.  The increase in total size is attributable to the GPU process. If you examine the GPU process in the memory dumps, you will see the Skia category increases (2.1 MB vs 6.7 MB). 

Fortunately, the regression readily repros. I did a small hack:

--- a/gpu/command_buffer/service/raster_decoder_context_state.cc
+++ b/gpu/command_buffer/service/raster_decoder_context_state.cc
@@ -97,6 +97,7 @@ void RasterDecoderContextState::InitializeGrContext(
 bool RasterDecoderContextState::OnMemoryDump(
     const base::trace_event::MemoryDumpArgs& args,
     base::trace_event::ProcessMemoryDump* pmd) {
+  PurgeMemory(base::MemoryPressureListener::MemoryPressureLevel::MEMORY_PRESSURE_LEVEL_CRITICAL);
   if (gr_context)
     DumpGrMemoryStatistics(gr_context.get(), pmd, base::nullopt);
   return true;

This tells Skia to release as much as possible before we do a memory dump. This hack erased the difference in my local repro.
Cc: khushals...@chromium.org
+cc khushalsagar

Maybe we need to tune gpu/command_buffer/service/gr_cache_controller.cc ?
GrCacheController should work the same way as the ContextCacheController on the client side for how it triggers the cleanup. Maybe there is a minor timing difference in getting to the idle state in the client vs service side that is affecting whether the cleanup happens before or after the dump?
I think the biggest difference is that we are now sharing the cache. 

- With GPU raster for renderer and UI, the cache is client side and not shared. Each client has it's own ContextCacheController and there is no interaction. If either renderer or UI is idle for a while, it will freeGpuResources independently.

- With OOP-R for renderer and UI, the cache is service side and shared. The GrCacheController will only freeGpuResources if renderer and UI are idle for a while (e.g. open new tab with cursor blinking in omnibox and cursor blink will prevent freeing of old tabs cache).

I think overall it's an improvement because a shared service side cache is not as fragmented. Moreover, we're well under our cache cap size of 20MB -- if we hit the limit, I believe that we would drop purge LRU items.
Components: Internals>Services>Viz Internals>Compositing>Rasterization
Per your observation: the switch from distributed to global cache ought to actually reduce total system memory? Can we measure this? It would be nice to show that the viz work has given us a whole-system improvement.
Cc: -backer@chromium.org ericrk@chromium.org
There are some cases which do show improvements in overall memory usage: https://chromeperf.appspot.com/group_report?rev=603112.

I'm guessing this ends up depending on what the combined renderer + ui cache usage was for each story? I'd still expect the shared cache to be a net win even in cases where the ui is static but the renderer is doing continuous raster. Since even if don't go completely idle we still do time based eviction of things (https://cs.chromium.org/chromium/src/gpu/command_buffer/service/gr_cache_controller.cc?type=cs&g=0&l=37), so that should evict things used only by the ui. The timing for that means this may not show up in our benchmarks.
Cc: backer@chromium.org
To elaborate Khusal's point, I found an interesting example to show some of the timing problems with our test infrastructure.

First I added some additional trace instrumentation (https://pastebin.com/ZjuRsW1G)
Then I ran telemetry locally enable cc,gpu categories.

Attached is a trace for TrivialGifPageSharedPageState.

Notable timing info:
- test duration is 12 seconds, so performDeferredCleanup(15s) is a no-op
- Gif animation means that we are never idle enough service side to freeGpuResources

TrivialGifPageSharedPageState_2018-10-31_10-16-01_42300.html
5.1 MB View Download
Cc: bsalomon@chromium.org
+cc bsalomon

There may be an opportunity to tune things service side further.

I'm guessing that with client side GPU raster (ContextCacheController) that we often freeGpuResources (1 second of inactivity) well before we performDeferredCleanup (wasn't used for 15 seconds). As freeGpuResources is much stronger, we would only see 15 seconds constant in exceptional cirumstances (animated GIF in a loop as above).

Things change server side with the shared GrContext. We are less likely to hit freeGpuResources after 1 second of inactivity. I think that this is especially true with animated ads.

Maybe we should tune 15 seconds down? I see that bsalomon@ originally set it to 30 seconds. And then he further reduced it to 15 seconds.

Maybe we could go as low as 5 or 10?
15s was chosen arbitrarily. It likely would not be a problem to reduce it.
I did a CT memory run (https://ct.skia.org/results/cluster-telemetry/tasks/chromium_perf_runs/backer-20181106202303/html/index.html) reducing that constant from 15s to 5s.

The results were underwhelming (-0.224%) in memory:chrome:all_processes:reported_by_os:private_footprint_size. Maybe this is because linux (which CT uses) is instrumented differently than OSX and this measure does not capture GPU memory on linux?

CT did report a > 10% memory decrease used by Skia in the GPU process (memory:chrome:gpu_process:reported_by_chrome:skia:effective_size).

I'm going to kick off another CT run for rendering. And I don't see major regressions in performance (total CPU and smoothness), we'll land it.


Labels: Proj-Vulkanize
Here's the rendering results:
https://ct.skia.org/results/cluster-telemetry/tasks/chromium_perf_runs/backer-20181108162853/html/index.html

The descrepancies are slight --- probably just noise. I will put CL up for review.
Project Member

Comment 19 by bugdroid1@chromium.org, Nov 15

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a388764c6e5d04915eed520baa5ea6b6b939efdb

commit a388764c6e5d04915eed520baa5ea6b6b939efdb
Author: Jonathan Backer <backer@chromium.org>
Date: Thu Nov 15 23:36:17 2018

Cleanup old resources faster

These resources are mostly scratch space. As per  crbug.com/899940 :

- most of this is scratch space textures for intermediate results cached
  to avoid GPU memory churn

- with the UI using OOP-R we don't go idle as often (e.g. cursor blink,
  throbber)

- 15 seconds is a long time to keep it around (many of our regression
  tests don't run this long)

Bug:  899940 
Change-Id: Idc077c98551978afa97b18be3dd61784f8898806
Reviewed-on: https://chromium-review.googlesource.com/c/1320790
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Jonathan Backer <backer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#608579}
[modify] https://crrev.com/a388764c6e5d04915eed520baa5ea6b6b939efdb/gpu/command_buffer/service/gr_cache_controller.cc

Change in comment #19 did have some effect, but did not fix the regression.

https://chromeperf.appspot.com/report?sid=507f10daa470fb5f1723d0dde1421deecda24a6fb5b24c08a795212bfb929d4a&start_rev=601065&end_rev=610749

I'll take another look.
There is a 4MB texture with OOP-R UI showing up in GPU process Skia memory stats. It is 1024 wide and 2048 long. I believe that it's a glyph cache based on this stack that I captured when the texture is generated:

5   libskia.dylib                       0x000000011805dccc GrGLTexture::init(GrSurfaceDesc const&, GrGLTexture::IDDesc const&) + 364
6   libskia.dylib                       0x000000011805de28 GrGLTexture::GrGLTexture(GrGLGpu*, SkBudgeted, GrSurfaceDesc const&, GrGLTexture::IDDesc const&, GrMipMapsStatus) + 344
7   libskia.dylib                       0x000000011803a7c0 GrGLGpu::onCreateTexture(GrSurfaceDesc const&, SkBudgeted, GrMipLevel const*, int) + 768
8   libskia.dylib                       0x0000000117f2854c GrGpu::createTexture(GrSurfaceDesc const&, SkBudgeted, GrMipLevel const*, int) + 460
9   libskia.dylib                       0x0000000117f285f4 GrGpu::createTexture(GrSurfaceDesc const&, SkBudgeted) + 20
10  libskia.dylib                       0x0000000117f64ccc GrResourceProvider::createTexture(GrSurfaceDesc const&, SkBudgeted, GrResourceProvider::Flags) + 268
11  libskia.dylib                       0x0000000117f813d8 GrSurfaceProxy::createSurfaceImpl(GrResourceProvider*, int, bool, GrSurfaceFlags, GrMipMapped) const + 696
12  libskia.dylib                       0x0000000117f81a7e GrSurfaceProxy::instantiateImpl(GrResourceProvider*, int, bool, GrSurfaceFlags, GrMipMapped, GrUniqueKey const*) + 334
13  libskia.dylib                       0x0000000117f87359 GrTextureProxy::instantiate(GrResourceProvider*) + 105
14  libskia.dylib                       0x0000000117f21520 GrDrawOpAtlas::addToAtlas(GrResourceProvider*, unsigned long long*, GrDeferredUploadTarget*, int, int, void const*, SkIPoint16*) + 512
15  libskia.dylib                       0x000000011800e176 GrAtlasManager::addToAtlas(GrResourceProvider*, GrGlyphCache*, GrTextStrike*, unsigned long long*, GrDeferredUploadTarget*, GrMaskFormat, int, int, void const*, SkIPoint16*) + 230
16  libskia.dylib                       0x000000011801106a GrTextStrike::addGlyphToAtlas(GrResourceProvider*, GrDeferredUploadTarget*, GrGlyphCache*, GrAtlasManager*, GrGlyph*, SkGlyphCache*, GrMaskFormat, bool) + 3034
17  libskia.dylib                       0x000000011801878e bool GrTextBlob::VertexRegenerator::doRegen<false, false, true, false>(GrTextBlob::VertexRegenerator::Result*) + 894
18  libskia.dylib                       0x0000000117fa6284 GrAtlasTextOp::onPrepareDraws(GrMeshDrawOp::Target*) + 2212
19  libskia.dylib                       0x0000000117f564e7 GrRenderTargetOpList::onPrepare(GrOpFlushState*) + 423
20  libskia.dylib                       0x0000000117f2efc2 GrOpList::prepare(GrOpFlushState*) + 98
21  libskia.dylib                       0x0000000117f1bb24 GrDrawingManager::executeOpLists(int, int, GrOpFlushState*) + 452
22  libskia.dylib                       0x0000000117f1b3c8 GrDrawingManager::flush(GrSurfaceProxy*, int, GrBackendSemaphore*) + 2504
23  libskia.dylib                       0x0000000117f1bf89 GrDrawingManager::prepareSurfaceForExternalIO(GrSurfaceProxy*, int, GrBackendSemaphore*) + 185
24  libskia.dylib                       0x0000000117f51552 GrRenderTargetContext::prepareForExternalIO(int, GrBackendSemaphore*) + 290
25  libskia.dylib                       0x000000011808023a SkGpuDevice::flushAndSignalSemaphores(int, GrBackendSemaphore*) + 58
26  libgles2.dylib                      0x0000000132fe134a gpu::raster::RasterDecoderImpl::DoEndRasterCHROMIUM() + 314
27  libgles2.dylib                      0x0000000132fd38b9 gpu::raster::RasterDecoderImpl::HandleEndRasterCHROMIUM(unsigned int, void const volatile*) + 9
28  libgles2.dylib                      0x0000000132fd9617 gpu::error::Error gpu::raster::RasterDecoderImpl::DoCommandsImpl<false>(unsigned int, void const volatile*, int, int*) + 519

That is the glyph cache. Before OOP-R each renderer process would have had its own glyph cache texture (assuming the page contains text).
I've confirmed that this is definitely the GrContext::fGlyphCache. It is not released unless we call GrContext::freeGpuResources, which we only do if we go idle for 1 second in GrCacheController. As per comment #14, we are less likely to go idle for 1 second when UI is also using OOP-R.
We could make it so that the glyph cache does get purged, but it seems like a bad idea to me as it's highly likely going to be recreated the next time a frame needs to be produced and a fresh atlas will mean repopulating the texture with glyphs.
Status: WontFix (was: Assigned)
I confirmed that the 1 second idle mechanism does work. It does not freeGpuResources in this test because the test animates right up until the memory dump.

I'm marking this as WontFix (works as intended) because it's more and artifact of the test than a user visible regression. In general, sharing the GlyphCache will be beneficial (often have 1 instead of N) and we will reclaim it on idle.

Sign in to add a comment