nanobench and SKPBench are suboptimal for msaa
Project Member Reported by kkinnu...@nvidia.com, Jan 13 2015
This issue came up while inspecting MSAA (NVPR, ofc) performance with nanobench: With MSAA backends, using nanobench to benchmark non-trivial .skp files runs into performance problems related to gpu resource cache filling up of surfaces, even though in "real world", the cache might not or should not fill up for these. Consider nanobench drawing a skp that is 1000x1000: 1) nanobench creates a big surface of 1000x1000 2) SKPBench creates n tile surfaces 3) SKPBench replays the picture to the tile surfaces (loops x samples) times 4) SKPBench draws tiles to the big surface The problem is that if picture creates any layers, it will probably start trashing render targets. The cache fills up due to nanobench surface and skp tile surfaces. Any layer created by the picture causes a render target deletion. Deleting render targets is very slow, and it's not *that* representative of Chromium rasterization, as the nanobench surface is not present there. The issue details: The render target *texture* is created uncached, but MSAA render target storage is created cached. One screenful of MSAA16 needs at least 1000x1000x4x16 + 1000x1000x1x16 = 64M + 4M in cache. Twice this is >100M. Each nanobench frame will clear the nanobench surface and resolve the msaa, even though the skpbench does not draw anything there within the frame. This skews the preformance report a bit (unneccessary clear and resolve), but also forces re-allocation of the big surface render targets if cache was able to free it, for each frame. Would it be possible to change nanobench in following way: 1) in "per canvas draw" testcases, the big canvas would be allocated only after the per canvas draw timing test was executed. E.g. big canvas would maybe be used only for the per canvas post draw screenshot. 2) the clears would apply to the tiled canvases, not the big canvas How would this be implemented, and who would want to implement it? (I can, if it is ok) Would for example following design be ok: a) Make current Benchmark a baseclass that had Benchmark::time or similar function which would do the draw looop b) Make the surface and canvas allocation responsibility of the benchmark c) Make normal tests be instances of DrawBenchmark (which is-a Benchmark) d) Make per canvas tests be instances of PerCanvasDrawBenchmark (which is-a Benchmark) This way "::draw(Canvas* canvas)" methods would be only on benchmarks that actually did draw to the canvas. Also "::onPerCanvasPreDraw(SomeInfoOnHowToAllocateSurfaces*)" would be only on benchmarks that do tiling (the name might be referring to allocation instead of predraw, too)
Jan 13 2015,
mtklein@ would have a better idea of the implementation details but the proposal seems ok to me. I think the large canvas should only be needed for a screenshot and drawing to it shouldn't be timed.
Jan 14 2015,
Hmm. Yeah, the more I look at this the more SKPBench doesn't seem to fit. I agree: 1) The calls to canvas->clear(SK_ColorWHITE) and canvas->flush() are pointless if we're not drawing to that canvas. 2) Allocating the giant 1000x1000 surface is pointless/harmful if we don't draw into it. 3) We only need that giant surface if we're going to write out the images. 4) If we're just using it to write out images, we can just read-back each tile independently into CPU RAM for compositing instead of using GPU resources. 5) We shouldn't be timing the compositing, PNG encoding, or file IO. As far as I can tell, we're at least OK on 5) for now. The others need some thinking. I think probably the best way to fix this is admit SKPBench should not inherit from Benchmark, and just have its own path for timing.
Jan 15 2015,
>Hmm. Yeah, the more I look at this the more SKPBench doesn't seem to fit. I agree: Ok. What about the last question, eg. "who would want to implement it?"
Jan 15 2015,
From chatting with Mike yesterday, it sounded like he wanted to take a crack at it. I may have made it so that the MSAA surface does not count against the cache budget here: https://skia.googlesource.com/skia/+/5236cf480daf82b2f36e42795abdbbc915533a59 I think it was a mistake that the MSAA surface was counted against the cache budget when the texture was not supposed to count.
Jan 27 2015,
The following revision refers to this bug: https://skia.googlesource.com/skia.git/+/36c57dfb4fe9bbaca436942d5eaa75b142ba251d commit 36c57dfb4fe9bbaca436942d5eaa75b142ba251d Author: kkinnunen <email@example.com> Date: Tue Jan 27 00:30:18 2015 -0800 Make stencil buffers uncached for uncached render target textures Make new stencil buffers of uncached render target textures not affect the cache budgets. This is consistent with render buffer storage of uncached render target textures. Affects only newly created stencil buffers. An uncached render target might still receive a cached stencil buffer if such is available from cache. BUG=skia:3119 BUG=skia:3301 Review URL: https://codereview.chromium.org/859013002 [modify] http://crrev.com/36c57dfb4fe9bbaca436942d5eaa75b142ba251d/src/gpu/GrGpu.cpp [modify] http://crrev.com/36c57dfb4fe9bbaca436942d5eaa75b142ba251d/src/gpu/GrGpu.h [modify] http://crrev.com/36c57dfb4fe9bbaca436942d5eaa75b142ba251d/src/gpu/GrStencilBuffer.h [modify] http://crrev.com/36c57dfb4fe9bbaca436942d5eaa75b142ba251d/src/gpu/GrTest.cpp [modify] http://crrev.com/36c57dfb4fe9bbaca436942d5eaa75b142ba251d/src/gpu/gl/GrGLGpu.cpp [modify] http://crrev.com/36c57dfb4fe9bbaca436942d5eaa75b142ba251d/src/gpu/gl/GrGLGpu.h [modify] http://crrev.com/36c57dfb4fe9bbaca436942d5eaa75b142ba251d/src/gpu/gl/GrGLStencilBuffer.h
Nov 17 2016,
Any more work to do here or can we close this out? +Chris for perspective on newer skpbench work
Sign in to add a comment