Software compositing tab capture does not pool and reuse copy request backings
Issue description: See the link to graphs below.
Jan 11 2018
Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/14dfd3a7040000
Jan 11 2018
Found a significant difference after 1 commit. https://pinpoint-dot-chromeperf.appspot.com/job/14dfd3a7040000

Reduce fragmentation of gpu memory by rounding RenderPass textures.
By danakj@chromium.org · Wed Jan 03 18:25:14 2018
chromium @ 07fe0b6e0939a8aa223d8fb526a8040b39441858

Understanding performance regressions: http://g.co/ChromePerformanceRegressions
Jan 18 2018
Note: I just noticed the Y-axis label on the graphs is wrong. It should be milliseconds. Not sure why the perf infrastructure is ignoring the test stdout units... Anyway, the failed metrics are measuring the average time between capture events: we attempt 60 FPS, so the ideal would be ~16.7 ms. However, the measurements went from 36 ms to 38 ms (about 26 FPS). So, the execution of CopyOutputRequests has slowed down with this CL.

Also, notice that "_gpu" is not present in the metric name. That means these are testing the viz::SoftwareRenderer, not the viz::GLRenderer. I suspect the rounding-up of textures, which happens in viz::DirectRenderer, has increased the size of the surfaces in viz::SoftwareRenderer, so the CPU is now doing more work. Or, perhaps, with the increased stride of the larger bitmaps in memory, CPU memory caching has suffered?

Note that the old tab capture impl is still being used at this time. This means SoftwareRenderer::CopyFromDrawnRenderPass() is being called, but it is NOT being asked to do any scaling or I420 format conversion on the copy requests. Scaling and format conversion happen downstream, in DelegatedFrameHost::CopyFromCompositingSurfaceHasResultForVideo(): https://cs.chromium.org/chromium/src/content/browser/renderer_host/delegated_frame_host.cc?rcl=e886b675233053c768ef61a25d1f9e398bde0cc9&l=698
Jan 18 2018
I think it's because, in the software renderer case, we made no optimizations to prevent malloc/free of the render pass backings on every capture. If we draw any frames in between the capture events, the render pass backings get evicted from the SoftwareRenderer. In the GL case, we have the tab capture client allocate the backing and pass it in to optimize for this. So now it's just allocating more memory.

The way to fix this, though, is to stop allocating each frame, not just to shrink the render pass size. This is where we allocate/reuse resources for the gpu path: https://cs.chromium.org/chromium/src/content/browser/renderer_host/delegated_frame_host.cc?rcl=b165dc5211edda5957d621643749d22d71152afe&l=414 That `if` will fail for the software path, and it has no fallback.

miu@: all the work I saw for new-fangled tab capture was in GLRenderer. Does the software story improve there?
Jan 25 2018
Perf sheriff checking in: danakj, miu, any ideas on who could own this bug?
Jan 25 2018
My claim here is that software-based tab capture is already slow and unoptimized, so this change doesn't really matter there; but it does point at that fact, so now we have a bug. Since it was not a priority to make it fast in the past, it doesn't seem clear that it's a priority to do so now. miu@, do you have opinions?
Jan 25 2018
danakj: The "round-up-to-64" change wasn't meant to affect the main memory being allocated for the SoftwareRenderer, correct?

I agree with you, in comments 5 and 7, that we don't care about capture performance in the SoftwareRenderer. However, the problem we really need to attend to here is totally orthogonal to tab capture: the "round-up-to-64" change is increasing main memory usage whenever software compositing is being used. Even if we don't care about software compositing performance, the extra memory usage takes a precious resource away from the rest of the system. Thus, IMHO, this change should be revisited, and should probably only affect texture allocations in GLRenderer.

Reply to comments 5 and 7: For GLRenderer, the "backings" are being cached in GLRendererCopier to make GPU-accelerated tab capture efficient. Note that all the tab-capture-related code in DelegatedFrameHost will be deleted soon, as just yesterday it all became dead code paths. :) For software, we didn't bother to optimize. There's really no way to make it efficient enough to run full-framerate video capture, even with the caching; so we really aren't bothering there.
Jan 25 2018
So, this crbug started as a tab capture performance regression; and we have all agreed not to bother resolving that. Since the important issue is main memory usage when using the software renderer, perhaps a new crbug should be cut?
Jan 26 2018
> danakj: The "round-up-to-64" change wasn't meant to affect the main memory being allocated for the SoftwareRenderer, correct?

No, it did. Instead of allocating a 100x100 bitmap, we'd allocate a 128x128 one. See the linked bug for the motivation; basically, it was meant to help with memory fragmentation, as we're reusing surfaces more and they are similar sizes.

> The "round-up-to-64" change is increasing main memory usage whenever software compositing is being used. Even if we don't care about software compositing performance, the extra memory usage is taking up a precious resource from the rest of the system.

Sure... but the memory change is pretty small, render surfaces are very few, and the goal was to reduce fragmentation, which reduces memory pressure. See the linked bug.

> Thus, IMHO, this change should be revisited; and probably only affect texture allocations in GLRenderer.

The premise of that bug doesn't only apply to GL textures.
Jan 26 2018
Issue 801119 has been merged into this issue.
Jan 26 2018
Okay. It sounds like this is all a WontFix then. :)
Jan 26 2018
sg thanks!
Feb 6 2018
Comment 1 by 42576172...@developer.gserviceaccount.com, Jan 11 2018