Back Pressure for WebGL using CA compositor.
Issue description

On Mac, we currently don't have any good way of determining when the Window Server has consumed the content we have passed it and finished executing GL commands to display/transform that content. I've done some research into the topic, but haven't made much progress. We should probably just ask Apple engineers. https://bugs.chromium.org/p/chromium/issues/detail?id=603320

In the GL compositor, back pressure is created by waiting on the GPU process's main thread for a glFinish(). When the application is GPU bound, this will delay the SwapBuffer Ack, which will eventually delay RequestAnimationFrame from firing. The reason that glFinish() ends up waiting for WebGL content [which is in a separate GL context] is that on Mac, GPU drivers serialize all drawing commands. The GL compositor needs to do some blits, which cannot be performed until the WebGL commands have been executed.

In the CA compositor, the glFinish() still exists, but now there are no drawing commands being issued on the compositor's GL context, and so the glFinish() does nothing. This removes all back pressure, and causes GPU-bound WebGL content to choke.
,
May 13 2016
piman's suggestion of a fence wait doesn't seem great because of the CPU busy wait. I made the suggestion: "why not just do a no-op draw command in the CA compositor to force the glFinish to wait for all draw commands?" ccameron replied that it might work. He was worried that it might affect power usage in low-power-fullscreen. The last alternative I can think of is to do something similar to piman's suggestion, but use a glFinish instead of a fence.
,
May 13 2016
An alternative to the "no-op draw command" is to somehow track "is it necessary to wait on any GPU work" (sort of like "is there any fence at all") and only then do the no-op draw and glFinish.
,
May 13 2016
Also of note is that, if we have good work-tracking, we can skip going to the GPU process to do the Finish if we know that we have no work to finish. I'm curious whether that will help power or not.
,
May 13 2016
Since it seems most necessary to do this for WebGL, it would be possible to modify DrawingBuffer::prepareMailbox (src/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp) to drop in a server-side glFinish just before the ProduceTextureDirectCHROMIUM. We tried this with the existing client-side glFinish, and it worked to supply back-pressure, but also blocked the renderer process which was undesirable. The primitive that would be needed would be something like glAsyncFinishCHROMIUM, where the client-side returns immediately, but the service side executes the glFinish.
,
May 13 2016
I like #5 as well.
,
May 13 2016
I was under the impression that piman@ did not like #5 because he wanted the back pressure to be applied at a point where it directly delayed the swapbuffers ack [and hence RAF], rather than indirectly delaying the swapbuffers ack by clogging up the GPU process. That being said, I'll let him speak for himself.
,
May 16 2016
Right, I would prefer if the glFinish happened at presentation time rather than webgl time, to prevent offscreen content from slowing down everything else. Really, glFinish is a very big hammer. Unless we can do it on a separate thread?
,
May 16 2016
Separate thread on the client side or service side? Assume you meant service side. Would have to do a prototype to see if it would have the desired effect. If it did, that would be the simplest solution. For simple WebGL content rendering into an overlay, is there any good interposition point where we could issue that glFinish at presentation time? I thought everything was under Core Animation's control in that scenario.
,
May 16 2016
@#9: yes, I meant service side.

Re: presentation time, I mean the time we send it to CA. We can track the GLImages that WebGL rendered to (similar to how Erik's patch would do it for using fences instead), as well as the context. Before sending it to CA, we would make that context current and issue the glFinish.

Other question: on Mac, there are 2 APIs we can use for fences: GL_ARB_sync and GL_APPLE_fence. We default to GL_ARB_sync because it allows server-side waits, whereas GL_APPLE_fence only allows client-side waits. For this use case we only need client-side waits, so it might be worth checking whether GL_APPLE_fence also spins. See GLFence::Create for the logic to use one vs. another.
,
May 17 2016
GL_APPLE_fence is only available with the AppleGL implementation. WebGL 2 uses Core Profile. If we do choose to use GL_APPLE_fence, the function TestFenceAPPLE is non-blocking, so we could always implement a non-spin wait using sleep and TestFenceAPPLE.
,
May 17 2016
piman's suggestion of tracking the context on the GLImage, and finishing that context at the presentation layer is reasonably simple, and will make piman@ happy, so I suggest we move forward with that?
,
May 17 2016
@#11: FYI, you can also do that with GL_ARB_sync, see GLFenceARB::HasCompleted
,
May 17 2016
To clarify: GL_APPLE_fence is only available in the OpenGL compatibility profile on OS X. The Core Profile doesn't expose it any more since sync objects are built in to the core API.

Here are the extensions on my machine when Chrome's run with --enable-unsafe-es3-apis:

GL_ARB_blend_func_extended GL_ARB_draw_buffers_blend GL_ARB_draw_indirect GL_ARB_ES2_compatibility GL_ARB_explicit_attrib_location GL_ARB_gpu_shader_fp64 GL_ARB_gpu_shader5 GL_ARB_instanced_arrays GL_ARB_internalformat_query GL_ARB_occlusion_query2 GL_ARB_sample_shading GL_ARB_sampler_objects GL_ARB_separate_shader_objects GL_ARB_shader_bit_encoding GL_ARB_shader_subroutine GL_ARB_shading_language_include GL_ARB_tessellation_shader GL_ARB_texture_buffer_object_rgb32 GL_ARB_texture_cube_map_array GL_ARB_texture_gather GL_ARB_texture_query_lod GL_ARB_texture_rgb10_a2ui GL_ARB_texture_storage GL_ARB_texture_swizzle GL_ARB_timer_query GL_ARB_transform_feedback2 GL_ARB_transform_feedback3 GL_ARB_vertex_attrib_64bit GL_ARB_vertex_type_2_10_10_10_rev GL_ARB_viewport_array GL_EXT_debug_label GL_EXT_debug_marker GL_EXT_framebuffer_multisample_blit_scaled GL_EXT_texture_compression_s3tc GL_EXT_texture_filter_anisotropic GL_EXT_texture_sRGB_decode GL_APPLE_client_storage GL_APPLE_container_object_shareable GL_APPLE_flush_render GL_APPLE_object_purgeable GL_APPLE_rgb_422 GL_APPLE_row_bytes GL_APPLE_texture_range GL_ATI_texture_mirror_once GL_NV_texture_barrier
,
May 17 2016
Right, what I meant is: the way you suggested to poll using sleep+TestFenceAPPLE, you can also poll using sleep+glClientWaitSync(, 0) (or sleep+glGetSynciv, which should work now since we don't support 10.7 any more).
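For illustration only, here is a minimal sketch of the "sleep + non-blocking poll" wait discussed above. It is not the Chromium implementation: SleepingFenceWait and its parameters are hypothetical names, and the fence query is passed in as a callable so the sketch does not depend on a live GL context. In real code the callable would presumably wrap a zero-timeout glClientWaitSync or glTestFenceAPPLE.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not a Chromium API): polls `is_signaled` until it
// returns true or `timeout` elapses, sleeping `interval` between polls
// instead of busy-waiting. Returns true if the fence was seen signaled.
//
// Assumption: in real code `is_signaled` would wrap a non-blocking query,
// e.g. glClientWaitSync(sync, 0, 0) returning GL_ALREADY_SIGNALED or
// GL_CONDITION_SATISFIED, or glTestFenceAPPLE(fence) returning GL_TRUE.
inline bool SleepingFenceWait(const std::function<bool()>& is_signaled,
                              std::chrono::microseconds interval,
                              std::chrono::milliseconds timeout,
                              int* polls = nullptr) {
  assert(interval.count() > 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  int count = 0;
  bool done = false;
  while (true) {
    ++count;
    if (is_signaled()) {
      done = true;
      break;
    }
    if (std::chrono::steady_clock::now() >= deadline)
      break;
    // Yield the CPU; a longer interval means fewer idle wakeups but
    // coarser wake-up granularity.
    std::this_thread::sleep_for(interval);
  }
  if (polls)
    *polls = count;
  return done;
}
```

The interval is the knob the measurements in this thread are about: a shorter sleep reacts sooner but produces more idle wakeups, a longer one does the opposite.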
,
May 17 2016
I tested three mechanisms for backpressure:

1) glFence + client wait. - 100% CPU usage, 1 idle wakeup, "300" energy impact
2) glFence + sleeping client wait (100us sleep) - 25% CPU usage, 4000 idle wakeups, "300" energy impact
3) glFinish (using gl compositor) - 25% CPU usage, 200 idle wakeups, "300" energy impact

Both "idle wakeups" and "energy impact" are in one sense totally arbitrary, and not reflective of real performance. On the other hand, OS X uses these numbers to shame processes and calls these out to users.
,
May 17 2016
It sounds like 2 with 2ms sleep would be on-par with glFinish? (I may be willing to bet it's what glFinish does under the hood - that's also how often we poll fences when idle).
,
May 17 2016
Yup, a 2ms sleep does the trick.
,
May 17 2016
Excellent. Then I think the fence version is better, because you may get more cpu/gpu overlap, especially for things that are not rAF-throttled, and/or if there are non-webgl updates as well.
,
May 18 2016
2 ms sounds like a really long time to sleep. That's 1/8 of the frame time at 60 Hz. Can this be adjusted?
,
May 18 2016
Agree that a 2ms poll seems unacceptably long. I still advocate for kbr's suggestion in #5 (the service-side glFinish). We should start with the simple solution, and only add more complicated things if that appears insufficient.
,
May 18 2016
from #16 it's very likely that glFinish has the same 2ms granularity.
,
May 18 2016
Blocked a couple of bugs on this one. At this point, GPU-bound WebGL content is slower in Chrome on Mac than it was in previous releases which rendered into textures. How can we avoid this regression in M52? The branch point is tomorrow. Can or should we revert the default enabling of CHROMIUM_image for WebGL on the branch?
,
May 18 2016
CC'ing seththompson@; Seth, fixing this bug is very important for Unity's WebGL-exported content to run well on OS X.
,
May 19 2016
Today I talked with piman@, ccameron@, vmiura@, and sunnyps@ about this subject. At some point, I'm going to put together a doc that captures all the findings.

There is a medium-complexity solution which ccameron@, vmiura@ and sunnyps@ seem happy with. It doesn't block the GPU process main thread, so I expect piman@ to be happy. The details still need to be specced out.

I was hoping to up-sell piman@ on doing a service-side glFinish on the WebGL context as a short-term solution [that is a slightly better version of what the GLRenderer does today]. But given that branch point is tomorrow, I think it will be simpler to just turn off CHROMIUM image and fall back to the GLRenderer for M52.
,
Jun 10 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/7843491fb012d3d3a37f77355b92d999da732bca

commit 7843491fb012d3d3a37f77355b92d999da732bca
Author: erikchen <erikchen@chromium.org>
Date: Fri Jun 10 18:17:43 2016

WebGL: Three small fixes to Image CHROMIUM logic.

1. Use the newly added method DescheduleUntilFinishedCHROMIUM() to add back pressure.
2. Update the color mask appropriately before calling clearFramebuffers().
3. Plumb gpuMemoryBufferId through to the mailbox.

BUG=611805, 617249, 607130
CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2053983002
Cr-Commit-Position: refs/heads/master@{#399231}

[modify] https://crrev.com/7843491fb012d3d3a37f77355b92d999da732bca/third_party/WebKit/Source/modules/webgl/WebGLRenderingContextBase.cpp
[modify] https://crrev.com/7843491fb012d3d3a37f77355b92d999da732bca/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp
[modify] https://crrev.com/7843491fb012d3d3a37f77355b92d999da732bca/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.h
,
Jun 21 2016
This was fixed by: https://bugs.chromium.org/p/chromium/issues/detail?id=617249#c19
,
Jun 22 2016
It would appear that the fix in c#29 does not work for all devices.
,
Jun 22 2016
Do you have examples of devices on which it does and does not work? Perhaps about:gpu from each?
,
Jun 23 2016
It succeeds on the MBA with Intel HD 5000, and fails on my MBP with NVIDIA GeForce GT 750M. I've already tested that my fixes to DescheduleUntilFinishedCHROMIUM successfully fix the problem.
,
Jun 23 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4acb5991cddf0dd2a0efb392c9a5691dfd5af2d6

commit 4acb5991cddf0dd2a0efb392c9a5691dfd5af2d6
Author: erikchen <erikchen@chromium.org>
Date: Thu Jun 23 05:38:12 2016

Implement new behavior for DescheduleUntilFinishedCHROMIUM.

Previously, the command immediately descheduled the command executor until all previously issued work had completed. This prevents pipelining of CPU-bound decoding and GPU-bound drawing.

The new behavior deschedules the command executor until all work issued prior to the previous call to DescheduleUntilFinishedCHROMIUM has completed.

BUG=611805
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2096503002
Cr-Commit-Position: refs/heads/master@{#401544}

[modify] https://crrev.com/4acb5991cddf0dd2a0efb392c9a5691dfd5af2d6/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_deschedule.txt
[modify] https://crrev.com/4acb5991cddf0dd2a0efb392c9a5691dfd5af2d6/gpu/command_buffer/service/gles2_cmd_decoder.cc
[modify] https://crrev.com/4acb5991cddf0dd2a0efb392c9a5691dfd5af2d6/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc
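As a rough illustration of the "wait on the previous call's work" behavior described in this commit message, here is a toy model. It is not the code in gles2_cmd_decoder.cc: FakeFence, DeschedulePipeline, and MustDeschedule are hypothetical stand-ins, and a real implementation would use GPU fences rather than a boolean flag.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a GPU fence; `completed` would normally be
// answered by the driver rather than set by hand.
struct FakeFence {
  bool completed = false;
};

// Toy model of the pipelined behavior: each call inserts a new fence that
// covers all work issued so far, and hands back the fence inserted by the
// PREVIOUS call as the one to wait on. One batch of work can therefore
// stay in flight while the next batch is being decoded.
class DeschedulePipeline {
 public:
  // Models one DescheduleUntilFinishedCHROMIUM-style command. Returns the
  // fence to wait on, or nullptr on the first call (nothing to wait for).
  std::shared_ptr<FakeFence> OnDeschedule() {
    std::shared_ptr<FakeFence> wait_on = std::move(latest_);
    latest_ = std::make_shared<FakeFence>();
    return wait_on;
  }

  // The fence inserted by the most recent call (exposed for the example).
  std::shared_ptr<FakeFence> latest() const { return latest_; }

 private:
  std::shared_ptr<FakeFence> latest_;
};

// The executor only needs to be descheduled if there is an older fence and
// it has not completed yet.
inline bool MustDeschedule(const std::shared_ptr<FakeFence>& wait_on) {
  return wait_on != nullptr && !wait_on->completed;
}
```

Under this model the first call never blocks, and each later call blocks only if the work from one call earlier is still outstanding, which is what allows CPU-side decoding to overlap with GPU-side drawing.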
,
Jun 23 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/8007bef4b45fcf90c81b1300509acfbda42cae47

commit 8007bef4b45fcf90c81b1300509acfbda42cae47
Author: erikchen <erikchen@chromium.org>
Date: Thu Jun 23 18:35:39 2016

Add an Invalidate method to GLFence.

The destructor of GLFence subclasses assumes that the context is current. This is not necessarily the case during destruction of the GLES2DecoderImpl.

BUG=611805
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2087403003
Cr-Commit-Position: refs/heads/master@{#401657}

[modify] https://crrev.com/8007bef4b45fcf90c81b1300509acfbda42cae47/gpu/command_buffer/service/gles2_cmd_decoder.cc
[modify] https://crrev.com/8007bef4b45fcf90c81b1300509acfbda42cae47/ui/gl/gl_fence.cc
[modify] https://crrev.com/8007bef4b45fcf90c81b1300509acfbda42cae47/ui/gl/gl_fence.h
[modify] https://crrev.com/8007bef4b45fcf90c81b1300509acfbda42cae47/ui/gl/gl_fence_apple.cc
[modify] https://crrev.com/8007bef4b45fcf90c81b1300509acfbda42cae47/ui/gl/gl_fence_apple.h
[modify] https://crrev.com/8007bef4b45fcf90c81b1300509acfbda42cae47/ui/gl/gl_fence_arb.cc
[modify] https://crrev.com/8007bef4b45fcf90c81b1300509acfbda42cae47/ui/gl/gl_fence_arb.h
[modify] https://crrev.com/8007bef4b45fcf90c81b1300509acfbda42cae47/ui/gl/gl_fence_nv.cc
[modify] https://crrev.com/8007bef4b45fcf90c81b1300509acfbda42cae47/ui/gl/gl_fence_nv.h
,
Jun 23 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/eb44a4aee7dc6e03ba4670509b08d038fbad2d48

commit eb44a4aee7dc6e03ba4670509b08d038fbad2d48
Author: erikchen <erikchen@chromium.org>
Date: Thu Jun 23 19:45:17 2016

Add a call to DescheduleUntilFinishedCHROMIUM to WebGL.

It was recently removed in https://codereview.chromium.org/2062813003/ because I thought that the new implementation to ImageTransportSurfaceOverlayMac::ClientWait correctly waited for the WebGL context's work to finish. That is only the case on some macOS/gpu configurations.

BUG=611805
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2093533002
Cr-Commit-Position: refs/heads/master@{#401688}

[modify] https://crrev.com/eb44a4aee7dc6e03ba4670509b08d038fbad2d48/gpu/ipc/service/gpu_command_buffer_stub.cc
[modify] https://crrev.com/eb44a4aee7dc6e03ba4670509b08d038fbad2d48/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp
,
Jun 27 2016
Comment 1 by erikc...@chromium.org, May 13 2016
Attachments: 930 KB, 1.5 MB, 929 KB