MSAA GPU rasterization makes GPU process spend time in framebuffer initialization |
|||||||
Issue description
MSAA GPU rasterization makes GPU process spend significant amount of time in framebuffer initialization. Maybe some of the spent time could be avoided.
The rasterization works as follows:
for each frame:
Create new SkPicture p
for each tile t:
Create a SkSurface s to render to t
Playback p to s
Destroy SkSurface s
For MSAA rendering, Skia does this:
Create a SkSurface:
glGenFramebuffers
glGenRenderBuffers
glRenderBufferStorageMultisample
Destroy SkSurface s:
glDeleteRenderbuffers
glDeleteFramebuffers
This is assuming MSAA rendering without multisampled-render-to-texture (as in NV hardware).
Tracing the GPU process, there is significant amount of samples in the framebuffer functions listed above. The functions themselves do not actually contribute to anything significant in "ideal" scenario. The FBOs could be preserved across frame boundaries.
Naturally, from code structure perspective, creating and destroying the SkSurface makes sence, since it's cumbersome to associate the SkSurface to the "tile". Currently the Sk* concepts are accessed within raster "task" concept, which is more transient.
Multisampled-render-to-texture would probably suffer from the same problem, though maybe less.
Conceptually better approach would be:
for once per init:
for each tile t:
// Run in gpu raster thread
Create a SkSurface s to render to t
surface(t) = s
for each frame:
Create new SkPicture p
for each tile t:
Playback p to surface(t)
for once per deinit:
for each tile t:
// Run in gpu raster thread
Destroy surface(t)
The effect of this can be achieved in many ways, of course, and it does not necessarily need to be structured above. The FBOs may be cacheable in Skia, if this design would be more desirable.
The considerations include:
* Rendering to a tile is associated with plain GL texture ID
* Skia GrContext is associated with the GPU raster thread
* Rasterization "context" is associated with "raster task", which is transient
* No per-tile initialization / de-initialization seem to be easily available at the tile level in context with the gpu worker process
The Skia considerations related to solving this in caching FBOs in Skia:
* No concept of "framebuffer objects" that could be put to resource cache and recycled
** Even if there were, would be confusing since ideally the framebuffer objects would not need to unbind the stencil buffers and color buffers (non-FMS rendering), but at the same time, stencil buffers would go to resource cache.
,
Mar 15 2016
I'd appreciate any insight on preferred place to fix this. It is my understanding that Skia has been designed to function more in mode where the SkSurface should be re-used if it is intention to rasterize to the same texture. This is how Skia's benchmark app, nanobench, implements tiling SkPicture rasterization benchmark. Thus as far as I can see, the better approach would be to try to associate the SkSurface with the tile. Unless better solutions come, I can try to do that.
,
Mar 15 2016
,
Mar 15 2016
I try to verify the impact of the fbo related functions. The time spent seems quite high in that trace screenshot.
,
Mar 16 2016
Yeah, the FBO related functions indeed seem to have quite high sample count. Here is maybe a more proper example of a capture (this time from Shield Tablet 8 device).
,
Mar 18 2016
Maybe easier to fix in Skia, and then there's not that big problems with MSAA renderbuffer and stencilbuffers using memory.
,
Apr 1 2016
The following revision refers to this bug: https://skia.googlesource.com/skia.git/+/49c4c22b3744d18e1cb38ecabcb8839d2e26afe0 commit 49c4c22b3744d18e1cb38ecabcb8839d2e26afe0 Author: kkinnunen <kkinnunen@nvidia.com> Date: Fri Apr 01 11:50:37 2016 Remove ownership parameter from GrResourceProvider::wrapBackendTextureAsRenderTarget Remove ownership parameter from GrResourceProvider::wrapBackendTextureAsRenderTarget. The function leaks the texture id if kAdopt_LifeCycle is passed. There is no public API to access the parameter. BUG= 594928 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1842313003 Review URL: https://codereview.chromium.org/1842313003 [modify] https://crrev.com/49c4c22b3744d18e1cb38ecabcb8839d2e26afe0/src/gpu/GrGpu.cpp [modify] https://crrev.com/49c4c22b3744d18e1cb38ecabcb8839d2e26afe0/src/gpu/GrGpu.h [modify] https://crrev.com/49c4c22b3744d18e1cb38ecabcb8839d2e26afe0/src/gpu/GrResourceProvider.cpp [modify] https://crrev.com/49c4c22b3744d18e1cb38ecabcb8839d2e26afe0/src/gpu/GrResourceProvider.h [modify] https://crrev.com/49c4c22b3744d18e1cb38ecabcb8839d2e26afe0/src/gpu/gl/GrGLGpu.cpp [modify] https://crrev.com/49c4c22b3744d18e1cb38ecabcb8839d2e26afe0/src/gpu/gl/GrGLGpu.h [modify] https://crrev.com/49c4c22b3744d18e1cb38ecabcb8839d2e26afe0/src/gpu/vk/GrVkGpu.h [modify] https://crrev.com/49c4c22b3744d18e1cb38ecabcb8839d2e26afe0/tools/gpu/GrTest.cpp
,
Apr 7 2016
Adding enne@ and vmpstr@ (from cc OWNERS and recent reviews): I'm trying to fix this issue: When cc selects MSAA-based GPU rasterization, the typical GPU rasterization spends fairly big amount of time just allocating and de-allocating renderbuffers -- more than on the actual rendering. This happens on HW that uses renderbuffers for MSAA instead of multisampled render to textures. Would you be able to weigh in on how this should be solved, or perhaps point to a person who would be able to spend some thought on it? I've implemented caching on Skia side in commit https://codereview.chromium.org/1810323002/ . However, that works against current Skia design of not caching such a structures. Would it be an option to spend 4x or 8x more memory and associate the rendertargets together with the tiles, alongside with the textures that are currently associated with the tile? Do you have any other suggestions, such as maybe holding an explicit render target for a rasterization and then copying the results to the texture? This could be an option, as copying is something that is anyway done behind the scenes when MSAA rendering is done. There's no intuitive Skia API for this at the moment -- or, there is, but it suffers from the problem described in this bug. Renderbuffer-based MSAA hardware is not the most common. It may be still fairly big chunk of any HW that runs the MSAA codepath, since MSAA seems to be blacklisted on many other HW.
,
Apr 7 2016
,
Apr 7 2016
I don't think caching at the tile level is the right design. Tiles get rastered and then that resource isn't reused again until after the resource is freed. reveman is probably the best person to comment here, but probably keeping this on something like the cc::ResourceProvider::Resource might be a better place than the tiles themselves.
,
Apr 22 2016
The following revision refers to this bug: https://skia.googlesource.com/skia.git/+/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b commit 2e6055b3ea14a04fcde1ac1974a70bf00b1e295b Author: kkinnunen <kkinnunen@nvidia.com> Date: Fri Apr 22 08:48:29 2016 Refactor to separate backend object lifecycle and GpuResource budget decision Refactor GrGpuResource to contain two different pieces of state: a) instance is budgeted or not budgeted b) instance references wrapped backend objects or not The "object lifecycle" was also attached to backend object handles (ids), which made the code a bit unclear. Backend objects would be associated with GrGpuResource::LifeCycle, even though GrGpuResource::LifeCycle refers to the GpuResource, and individual backend objects in one GpuResource might be governed with different "lifecycle". Mark the budgeted/not budgeted with SkBudgeted::kYes, SkBudgeted::kNo. This was previously GrGpuResource::kCached_LifeCycle, GrGpuResource::kUncached_LifeCycle. Mark the "references wrapped object" with boolean. This was previously GrGpuResource::kBorrowed_LifeCycle, GrGpuResource::kAdopted_LifeCycle for GrGpuResource. Associate the backend object ownership status with GrBackendObjectOwnership for the backend object handles. The resource type leaf constuctors, such has GrGLTexture or GrGLTextureRenderTarget take "budgeted" parameter. This parameter is passed to GrGpuResource::registerWithCache(). The resource type intermediary constructors, such as GrGLTexture constructors for class GrGLTextureRenderTarget do not take "budgeted" parameters, intermediary construtors do not call registerWithCache. Removes the need for tagging GrGpuResource -derived subclass constructors with "Derived" parameter. Makes instances that wrap backend objects be registered with a new function GrGpuResource::registerWithCacheWrapped(). Removes "budgeted" parameter from classes such as StencilAttahment, as they are always cached and never wrap any external backend objects. Removes the use of concept "external" from the member function names. The API refers to the objects as "wrapped", so make all related functions use the term consistently. No change in functionality. Resources referencing wrapped objects are always inserted to the cache with budget decision kNo. BUG= 594928 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1862043002 Review URL: https://codereview.chromium.org/1862043002 [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/bench/GrResourceCacheBench.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/include/gpu/GrBuffer.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/include/gpu/GrGpuResource.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/include/gpu/GrRenderTarget.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/include/gpu/GrSurface.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/include/gpu/GrTexture.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/include/gpu/GrTypesPriv.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrGpu.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrGpu.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrGpuResource.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrGpuResourceCacheAccess.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrGpuResourcePriv.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrPath.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrPathRange.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrResourceCache.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrResourceCache.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrStencilAttachment.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/GrTexture.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLBuffer.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLGpu.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLGpu.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLPath.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLPathRange.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLRenderTarget.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLRenderTarget.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLStencilAttachment.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLStencilAttachment.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLTexture.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLTexture.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLTextureRenderTarget.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/gl/GrGLTextureRenderTarget.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkGpu.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkGpu.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkIndexBuffer.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkRenderTarget.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkRenderTarget.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkStencilAttachment.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkStencilAttachment.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkTexture.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkTexture.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkTextureRenderTarget.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkTextureRenderTarget.h [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkTransferBuffer.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/gpu/vk/GrVkVertexBuffer.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/src/image/SkSurface_Gpu.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/tests/ResourceCacheTest.cpp [modify] https://crrev.com/2e6055b3ea14a04fcde1ac1974a70bf00b1e295b/tools/gpu/GrTest.cpp
,
Apr 22 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/516778f8da98f40c94bfa34702fc167542bcd522 commit 516778f8da98f40c94bfa34702fc167542bcd522 Author: skia-deps-roller <skia-deps-roller@chromium.org> Date: Fri Apr 22 09:56:59 2016 Roll src/third_party/skia/ cb61a6452..2e6055b3e (1 commit). https://chromium.googlesource.com/skia.git/+log/cb61a6452fbf..2e6055b3ea14 $ git log cb61a6452..2e6055b3e --date=short --no-merges --format='%ad %ae %s' 2016-04-22 kkinnunen Refactor to separate backend object lifecycle and GpuResource budget decision BUG= 594928 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel TBR=fmalita@google.com Review URL: https://codereview.chromium.org/1912063002 Cr-Commit-Position: refs/heads/master@{#389065} [modify] https://crrev.com/516778f8da98f40c94bfa34702fc167542bcd522/DEPS
,
Jun 9 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d0d9b578d0fce1a364f157c6cd669fd4439a916a commit d0d9b578d0fce1a364f157c6cd669fd4439a916a Author: senorblanco <senorblanco@chromium.org> Date: Thu Jun 09 19:53:23 2016 telemetry: Update tough_path_rendering_cases Chalkboard test URL [Copy of https://codereview.chromium.org/1998543002/ by kkinnunen@nvidia.org with the WPR uploaded.] Update tough_path_rendering_cases Chalkboard test URL. The old url would go to a page that would load the content in an iframe. The injected script would fail. Load the iframe URL instead. Loading the main document would inject the script to the parent main document. The injected code would start the test by click()ing an element cross domains, which would be rejected. Fixing is needed in order to record new content to tough_path_rendering_cases. BUG= 594928 Review-Url: https://codereview.chromium.org/2055823004 Cr-Commit-Position: refs/heads/master@{#398985} [modify] https://crrev.com/d0d9b578d0fce1a364f157c6cd669fd4439a916a/tools/perf/page_sets/data/tough_path_rendering_cases.json [add] https://crrev.com/d0d9b578d0fce1a364f157c6cd669fd4439a916a/tools/perf/page_sets/data/tough_path_rendering_cases_001.wpr.sha1 [modify] https://crrev.com/d0d9b578d0fce1a364f157c6cd669fd4439a916a/tools/perf/page_sets/tough_path_rendering_cases.py
,
Jun 15 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d0d9b578d0fce1a364f157c6cd669fd4439a916a commit d0d9b578d0fce1a364f157c6cd669fd4439a916a Author: senorblanco <senorblanco@chromium.org> Date: Thu Jun 09 19:53:23 2016 telemetry: Update tough_path_rendering_cases Chalkboard test URL [Copy of https://codereview.chromium.org/1998543002/ by kkinnunen@nvidia.org with the WPR uploaded.] Update tough_path_rendering_cases Chalkboard test URL. The old url would go to a page that would load the content in an iframe. The injected script would fail. Load the iframe URL instead. Loading the main document would inject the script to the parent main document. The injected code would start the test by click()ing an element cross domains, which would be rejected. Fixing is needed in order to record new content to tough_path_rendering_cases. BUG= 594928 Review-Url: https://codereview.chromium.org/2055823004 Cr-Commit-Position: refs/heads/master@{#398985} [modify] https://crrev.com/d0d9b578d0fce1a364f157c6cd669fd4439a916a/tools/perf/page_sets/data/tough_path_rendering_cases.json [add] https://crrev.com/d0d9b578d0fce1a364f157c6cd669fd4439a916a/tools/perf/page_sets/data/tough_path_rendering_cases_001.wpr.sha1 [modify] https://crrev.com/d0d9b578d0fce1a364f157c6cd669fd4439a916a/tools/perf/page_sets/tough_path_rendering_cases.py
,
Nov 28 2016
Is there further work required here?
,
Nov 29 2016
Yep, as far as I know, the problem still exists. The actual patch was still pending for your review comments. This was the payload patch to fix this in Chromium: https://codereview.chromium.org/1876363005/ (The other approach wast to fix this in Skia) IIRC you wanted a testcase to prove the benefit. There was an idea to to add another testcase (the tiger head) to get more path-related content. IIRC I implemented the telemetry changes but you never integrated. I can't do it, since it needs WPR result uploads that I cannot do. I fixed the chalkboard to show the benefit. There was a change related to fixing chalkboard numbers to be more reliable (not wait in idle at the end), but that got reverted because it (naturally) changed the number and a sheriff did not like that. All the patches incidentally have also been disconnected from my account, part of the process where our company reserved all @nvidia google accounts. The patches are there, if one has the direct links. I have had hard time getting changes in that would further improve the system, so it's probably pointless for you to spend time on this. Maybe the Skia process thing will eventually solve the problem.
,
Nov 29 2016
> IIRC I implemented the telemetry changes but you never integrated. By this I mean the chalkboard telemetry changes were integrated, but the tiger head changes were not.
,
Nov 29 2016
I'm sorry if my negligence caused part of this work to go unused. Looking back, I think my feeling was that the number of devices affected didn't justify the extra complexity it added (which may or may not be true). Regardless, I could have articulated that more clearly. |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by kkinnu...@nvidia.com
, Mar 15 2016