New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 594928 link

Starred by 4 users

Issue metadata

Status: WontFix
Owner:
Closed: Nov 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug



Sign in to add a comment

MSAA GPU rasterization makes GPU process spend time in framebuffer initialization

Project Member Reported by kkinnu...@nvidia.com, Mar 15 2016

Issue description

MSAA GPU rasterization makes GPU process spend significant amount of time in framebuffer initialization. Maybe some of the spent time could be avoided.

The rasterization works as follows:

for each frame:
  Create new SkPicture p
  for each tile t:
    Create a SkSurface s to render to t
    Playback p to s
    Destroy SkSurface s

For MSAA rendering, Skia does this:
Create a SkSurface:
  glGenFramebuffers
  glGenRenderBuffers
  glRenderBufferStorageMultisample

Destroy SkSurface s:
  glDeleteRenderbuffers
  glDeleteFramebuffers

This is assuming MSAA rendering without multisampled-render-to-texture (as in NV hardware).

Tracing the GPU process, there is significant amount of samples in the framebuffer functions listed above. The functions themselves do not actually contribute to anything significant in "ideal" scenario. The FBOs could be preserved across frame boundaries. 

Naturally, from code structure perspective, creating and destroying the SkSurface makes sence, since it's cumbersome to associate the SkSurface to the "tile". Currently the Sk* concepts are accessed within raster "task" concept, which is more transient. 

Multisampled-render-to-texture would probably suffer from the same problem, though maybe less.

Conceptually better approach would be:
for once per init:
  for each tile t:
    // Run in gpu raster thread
    Create a SkSurface s to render to t
    surface(t) = s

for each frame:
  Create new SkPicture p
  for each tile t:
    Playback p to surface(t)

for once per deinit:
  for each tile t:
    // Run in gpu raster thread
    Destroy surface(t)


The effect of this can be achieved in many ways, of course, and it does not necessarily need to be structured above. The FBOs may be cacheable in Skia, if this design would be more desirable. 

The considerations include:
* Rendering to a tile is associated with plain GL texture ID
* Skia GrContext is associated with the GPU raster thread
* Rasterization "context" is associated with "raster task", which is transient
* No per-tile initialization / de-initialization seem to be easily available at the tile level in context with the gpu worker process

The Skia considerations related to solving this in caching FBOs in Skia:
* No concept of "framebuffer objects" that could be put to resource cache and recycled
** Even if there were, would be confusing since ideally the framebuffer objects would not need to unbind the stencil buffers and color buffers (non-FMS rendering), but at the same time, stencil buffers would go to resource cache.


 
Attached is a picture of a profile of the GPU process on Android, taken with a non-chromium sampling profiler. The content is a web page with a mobile viewport, showing the tiger svg with flashing background forcing repaint.

Majority of the calls to the framebuffer functions come from the tile initialization. The content rasterization should not induce any FBO calls.

The device is similar to NVIDIA Shield TV device. NV_path_rendering is disabled for this trace, just for clarity.
Cc: senorblanco@chromium.org bsalomon@chromium.org
I'd appreciate any insight on preferred place to fix this.

It is my understanding that Skia has been designed to function more in mode where the SkSurface should be re-used if it is intention to rasterize to the same texture. This is how Skia's benchmark app, nanobench, implements tiling SkPicture rasterization benchmark. Thus as far as I can see, the better approach would be to try to associate the SkSurface with the tile. Unless better solutions come, I can try to do that.
Cc: ericrk@chromium.org
I try to verify the impact of the fbo related functions. The time spent seems quite high in that trace screenshot.
Yeah, the FBO related functions indeed seem to have quite high sample count.
Here is maybe a more proper example of a capture (this time from Shield Tablet 8 device).
Screenshot from 2016-03-16 15:10:00.png
285 KB View Download
Maybe easier to fix in Skia, and then there's not that big problems with MSAA renderbuffer and stencilbuffers using memory.
Project Member

Comment 7 by bugdroid1@chromium.org, Apr 1 2016

Cc: enne@chromium.org vmp...@chromium.org
Components: Internals>Compositing>Rasterization
Labels: Performance Needs-Feedback GPU-NVidia Performance-Power
Adding enne@ and vmpstr@ (from cc OWNERS and recent reviews):

I'm trying to fix this issue: When cc selects MSAA-based GPU rasterization, the typical GPU rasterization spends fairly big amount of time just allocating and de-allocating renderbuffers -- more than on the actual rendering. This happens on HW that uses renderbuffers for MSAA instead of multisampled render to textures.

Would you be able to weigh in on how this should be solved, or perhaps point to a person who would be able to spend some thought on it?

I've implemented caching on Skia side in commit https://codereview.chromium.org/1810323002/ . However, that works against current Skia design of not caching such a structures.

Would it be an option to spend 4x or 8x more memory and associate the rendertargets together with the tiles, alongside with the textures that are currently associated with the tile? 

Do you have any other suggestions, such as maybe holding an explicit render target for a rasterization and then copying the results to the texture? This could be an option, as copying is something that is anyway done behind the scenes when MSAA rendering is done. There's no intuitive Skia API for this at the moment -- or, there is, but it suffers from the problem described in this bug.

Renderbuffer-based MSAA hardware is not the most common. It may be still fairly big chunk of any HW that runs the MSAA codepath, since MSAA seems to be blacklisted on many other HW.

Comment 9 by enne@chromium.org, Apr 7 2016

Cc: reve...@chromium.org

Comment 10 by enne@chromium.org, Apr 7 2016

I don't think caching at the tile level is the right design.  Tiles get rastered and then that resource isn't reused again until after the resource is freed.

reveman is probably the best person to comment here, but probably keeping this on something like the cc::ResourceProvider::Resource might be a better place than the tiles themselves.
Project Member

Comment 11 by bugdroid1@chromium.org, Apr 22 2016

The following revision refers to this bug:
  https://skia.googlesource.com/skia.git/+/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b

commit 2e6055b3ea14a04fcde1ac1974a70bf00b1e295b
Author: kkinnunen <kkinnunen@nvidia.com>
Date: Fri Apr 22 08:48:29 2016

Refactor to separate backend object lifecycle and GpuResource budget decision

Refactor GrGpuResource to contain two different pieces of state:
 a) instance is budgeted or not budgeted
 b) instance references wrapped backend objects or not

The "object lifecycle" was also attached to backend object
handles (ids), which made the code a bit unclear. Backend objects
would be associated with GrGpuResource::LifeCycle, even though
GrGpuResource::LifeCycle refers to the GpuResource, and individual
backend objects in one GpuResource might be governed with different
"lifecycle".

Mark the budgeted/not budgeted with SkBudgeted::kYes, SkBudgeted::kNo.
This was previously GrGpuResource::kCached_LifeCycle,
GrGpuResource::kUncached_LifeCycle.

Mark the "references wrapped object" with boolean. This was previously
GrGpuResource::kBorrowed_LifeCycle,
GrGpuResource::kAdopted_LifeCycle for GrGpuResource.

Associate the backend object ownership status with
GrBackendObjectOwnership for the backend object handles.

The resource type leaf constuctors, such has GrGLTexture or
GrGLTextureRenderTarget take "budgeted" parameter. This parameter
is passed to GrGpuResource::registerWithCache().

The resource type intermediary constructors, such as GrGLTexture
constructors for class GrGLTextureRenderTarget do not take "budgeted"
parameters, intermediary construtors do not call registerWithCache.

Removes the need for tagging GrGpuResource -derived subclass
constructors with "Derived" parameter.

Makes instances that wrap backend objects be registered with
a new function GrGpuResource::registerWithCacheWrapped().

Removes "budgeted" parameter from classes such as StencilAttahment, as
they are always cached and never wrap any external backend objects.

Removes the use of concept "external" from the member function names.
The API refers to the objects as "wrapped", so make all related
functions use the term consistently.

No change in functionality. Resources referencing wrapped objects are
always inserted to the cache with budget decision kNo.

BUG= 594928 
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1862043002

Review URL: https://codereview.chromium.org/1862043002

[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/bench/GrResourceCacheBench.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/include/gpu/GrBuffer.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/include/gpu/GrGpuResource.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/include/gpu/GrRenderTarget.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/include/gpu/GrSurface.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/include/gpu/GrTexture.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/include/gpu/GrTypesPriv.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrGpu.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrGpu.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrGpuResource.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrGpuResourceCacheAccess.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrGpuResourcePriv.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrPath.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrPathRange.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrResourceCache.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrResourceCache.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrStencilAttachment.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrTexture.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLBuffer.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLGpu.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLGpu.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLPath.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLPathRange.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLRenderTarget.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLRenderTarget.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLStencilAttachment.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLStencilAttachment.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLTexture.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLTexture.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLTextureRenderTarget.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLTextureRenderTarget.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkGpu.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkGpu.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkIndexBuffer.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkRenderTarget.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkRenderTarget.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkStencilAttachment.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkStencilAttachment.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkTexture.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkTexture.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkTextureRenderTarget.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkTextureRenderTarget.h
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkTransferBuffer.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkVertexBuffer.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/image/SkSurface_Gpu.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/tests/ResourceCacheTest.cpp
[modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/tools/gpu/GrTest.cpp

Project Member

Comment 12 by bugdroid1@chromium.org, Apr 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/516778f8da98f40c94bfa34702fc167542bcd522

commit 516778f8da98f40c94bfa34702fc167542bcd522
Author: skia-deps-roller <skia-deps-roller@chromium.org>
Date: Fri Apr 22 09:56:59 2016

Roll src/third_party/skia/ cb61a6452..2e6055b3e (1 commit).

https://chromium.googlesource.com/skia.git/+log/cb61a6452fbf..2e6055b3ea14

$ git log cb61a6452..2e6055b3e --date=short --no-merges --format='%ad %ae %s'
2016-04-22 kkinnunen Refactor to separate backend object lifecycle and GpuResource budget decision

BUG= 594928 

CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel
TBR=fmalita@google.com

Review URL: https://codereview.chromium.org/1912063002

Cr-Commit-Position: refs/heads/master@{#389065}

[modify] https://crrev.com/516778f8da98f40c94bfa34702fc167542bcd522/DEPS

Project Member

Comment 13 by bugdroid1@chromium.org, Jun 9 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d0d9b578d0fce1a364f157c6cd669fd4439a916a

commit d0d9b578d0fce1a364f157c6cd669fd4439a916a
Author: senorblanco <senorblanco@chromium.org>
Date: Thu Jun 09 19:53:23 2016

telemetry: Update tough_path_rendering_cases Chalkboard test URL

[Copy of https://codereview.chromium.org/1998543002/ by
kkinnunen@nvidia.org with the WPR uploaded.]

Update tough_path_rendering_cases Chalkboard test URL. The old
url would go to a page that would load the content in an iframe.
The injected script would fail.

Load the iframe URL instead. Loading the main document would inject
the script to the parent main document. The injected code would
start the test by click()ing an element cross domains, which
would be rejected.

Fixing is needed in order to record new content to
tough_path_rendering_cases.

BUG= 594928 

Review-Url: https://codereview.chromium.org/2055823004
Cr-Commit-Position: refs/heads/master@{#398985}

[modify] https://crrev.com/d0d9b578d0fce1a364f157c6cd669fd4439a916a/tools/perf/page_sets/data/tough_path_rendering_cases.json
[add] https://crrev.com/d0d9b578d0fce1a364f157c6cd669fd4439a916a/tools/perf/page_sets/data/tough_path_rendering_cases_001.wpr.sha1
[modify] https://crrev.com/d0d9b578d0fce1a364f157c6cd669fd4439a916a/tools/perf/page_sets/tough_path_rendering_cases.py

Project Member

Comment 14 by bugdroid1@chromium.org, Jun 15 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d0d9b578d0fce1a364f157c6cd669fd4439a916a

commit d0d9b578d0fce1a364f157c6cd669fd4439a916a
Author: senorblanco <senorblanco@chromium.org>
Date: Thu Jun 09 19:53:23 2016

telemetry: Update tough_path_rendering_cases Chalkboard test URL

[Copy of https://codereview.chromium.org/1998543002/ by
kkinnunen@nvidia.org with the WPR uploaded.]

Update tough_path_rendering_cases Chalkboard test URL. The old
url would go to a page that would load the content in an iframe.
The injected script would fail.

Load the iframe URL instead. Loading the main document would inject
the script to the parent main document. The injected code would
start the test by click()ing an element cross domains, which
would be rejected.

Fixing is needed in order to record new content to
tough_path_rendering_cases.

BUG= 594928 

Review-Url: https://codereview.chromium.org/2055823004
Cr-Commit-Position: refs/heads/master@{#398985}

[modify] https://crrev.com/d0d9b578d0fce1a364f157c6cd669fd4439a916a/tools/perf/page_sets/data/tough_path_rendering_cases.json
[add] https://crrev.com/d0d9b578d0fce1a364f157c6cd669fd4439a916a/tools/perf/page_sets/data/tough_path_rendering_cases_001.wpr.sha1
[modify] https://crrev.com/d0d9b578d0fce1a364f157c6cd669fd4439a916a/tools/perf/page_sets/tough_path_rendering_cases.py

Status: Assigned (was: Untriaged)
Is there further work required here?
Status: WontFix (was: Assigned)
Yep, as far as I know, the problem still exists.
The actual patch was still pending for your review comments.
This was the payload patch to fix this in Chromium: https://codereview.chromium.org/1876363005/
(The other approach wast to fix this in Skia)

IIRC you wanted a testcase to prove the benefit. There was an idea to to add another testcase (the tiger head) to get more path-related content. IIRC I implemented the telemetry changes but you never integrated. I can't do it, since it needs WPR result uploads that I cannot do.

I fixed the chalkboard to show the benefit.

There was a change related to fixing chalkboard numbers to be more reliable (not wait in idle at the end), but that got reverted because it (naturally) changed the number and a sheriff did not like that.

All the patches incidentally have also been disconnected from my account, part of the process where our company reserved all @nvidia google accounts. The patches are there, if one has the direct links.

I have had hard time getting changes in that would further improve the system, so it's probably pointless for you to spend time on this. Maybe the Skia process thing will eventually solve the problem.



> IIRC I implemented the telemetry changes but you never integrated.

By this I mean the chalkboard telemetry changes were integrated, but the tiger head changes were not.
I'm sorry if my negligence caused part of this work to go unused. Looking back, I think my feeling was that the number of devices affected didn't justify the extra complexity it added (which may or may not be true). Regardless, I could have articulated that more clearly.

Sign in to add a comment