Epic Zen Garden flickers badly with --disable-features=WebGLImageChromium |
||||||||||||||
Issue descriptionChrome Version: r504537 OS: macOS What steps will reproduce the problem? (1) Build Release with dcheck_always_on=true (2) out/Release/Chromium.app/Contents/MacOS/Chromium --disable-features=WebGLImageChromium (3) Go to Epic Zen Garden build from Issue 764824 (sorry, not public due to URL in bug report) What is the expected result? Expect demo to render smoothly. What happens instead? Demo jitters frequently -- as though frames were being rendered out of order. about:gpu from an affected machine (MacBook Pro, dual Intel Iris Pro and NVIDIA GeForce GT 750M GPUs) attached.
,
Sep 27 2017
,
Sep 27 2017
Since you're generating a verified sync token here, don't you need ShallowFlushCHROMIUM before the GenSyncTokenCHROMIUM call, instead of OrderingBarrierCHROMIUM?
,
Sep 28 2017
Good point sunnyps@. Based on our offline conversation and piman's feedback on the work-in-progress CL https://chromium-review.googlesource.com/687832 , it seems that OrderingBarrierCHROMIUM is working, and saves one synchronous IPC per frame. Per your suggestion, let's move to unverified sync tokens for this code as soon as your work in progress is done. Note also that based on feedback from piman the change in the CL is different than c#1, namely because of the change to assume there are implicit flushes when switching contexts on all platforms.
,
Sep 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/325141c137c6808376255aad1ff0b28f0335bca7 commit 325141c137c6808376255aad1ff0b28f0335bca7 Author: Kenneth Russell <kbr@chromium.org> Date: Thu Sep 28 21:10:18 2017 Enforce flush ordering between contexts in GPU process on macOS. There are assumptions in many places that OrderingBarrierCHROMIUM is sufficient to ensure command ordering between producers and consumers in Chromium's pipeline. However, unless clients were carefully written, it was easy to miss a necessary glFlush, or put one in the wrong place. On macOS there is no implicit flush when switching contexts, though one exists on some other platforms (e.g. GLX), and exists implicitly when using virtualized GL contexts. Fix this at the lowest level in the GPU process by flushing when switching contexts on macOS. Also fix bug in command ordering between WebGL's DrawingBuffer and consumers; i.e., the compositor. BUG= 769488 Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Change-Id: I46faa7537a3e2e12e3094a6aba3a85378a677c3f Reviewed-on: https://chromium-review.googlesource.com/687832 Commit-Queue: Kenneth Russell <kbr@chromium.org> Reviewed-by: Antoine Labour <piman@chromium.org> Cr-Commit-Position: refs/heads/master@{#505155} [modify] https://crrev.com/325141c137c6808376255aad1ff0b28f0335bca7/gpu/command_buffer/tests/gl_texture_mailbox_unittest.cc [modify] https://crrev.com/325141c137c6808376255aad1ff0b28f0335bca7/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp [modify] https://crrev.com/325141c137c6808376255aad1ff0b28f0335bca7/ui/gl/gl_context_cgl.cc
,
Sep 28 2017
,
Sep 28 2017
,
Sep 29 2017
,
Sep 29 2017
,
Oct 2 2017
FYI, r505155 fixed WebVR's issue 747159 "Frames rendered in fast/slow pattern" and increased the framerate by almost 10% in some tests. Specifically, the DrawingBuffer.cpp change seems to be what causes this improvement. Does changing from gl->Flush() to gl->OrderingBarrierCHROMIUM() effectively delay the underlying driver flush until a later time? This is on Android/Qualcomm which uses virtualized contexts. According to comment #1, it sounds as if the correct order of operations would be to flush before calling InsertFenceSyncCHROMIUM? https://cs.chromium.org/chromium/src/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_sync_point.txt seems to say that the fence sync needs to be flushed, so would it need two flushes, one before and one after?
,
Oct 2 2017
Whoa! That's good to know. I don't know all of the semantics of OrderingBarrierCHROMIUM, but it was designed to reduce IPCs between the renderer and GPU process by batching up flushes. From talking with sunnyps@ we can further eliminate a synchronous IPC in the DrawingBuffer code by using an untrusted, rather than trusted, sync token in DrawingBuffer. I'll work with him to make that improvement next.
,
Oct 2 2017
I recommend keeping the Flush instead of OrderingBarrier. Using unverified sync tokens is orthogonal to that, and that's also something I'd recommend.
,
Oct 2 2017
I thought removing the flush caused a regression. Sorry for the confusion.
,
Oct 2 2017
,
Oct 2 2017
While we're on the subject, should the glFlush() commands be added to gl_context_egl.cc also? Previously I had tried disabling virtual contexts for a WebVR optimization (see issue 691102 and linked blocking/blocked on), and this had broken non-VR WebGL with tearing or incomplete images. Not sure if it was the same issue, but it seems like a potential trap for the future to assume that virtualized contexts must be in use to ensure correct behavior. Would these flushes be unused while virtualized contexts are active since the underlying low-level context doesn't get switched?
,
Oct 2 2017
glFlush probably is neither necessary nor sufficient on EGL contexts in general. Only Mac ensures ordering via glFlush. To do correct ordering without virtual contexts on Android, you'd need a glFenceSync/glWaitSync (or the egl ones) across a context switch. Last time I tried (admittedly a while ago) on the ARM Mali driver (on a Chrome OS device), that option was slower than virtual contexts. YMMV, we could investigate that option, but it's not an obvious home run.
,
Oct 11 2017
,
Oct 11 2017
,
Oct 11 2017
Issue 773342 has been merged into this issue.
,
Oct 11 2017
At this point we've received a couple of reports from a customer (MapBox) that are definitely this issue, so I'm upgrading this to P1 and requesting to merge 325141c137c6808376255aad1ff0b28f0335bca7 back to the M62 branch. The fix is small, well understood, and should not cause any risks for e.g. WebVR in M62. There's no workaround from the application level and rendering of WebGL content is unpredictable on macOS without the fix.
,
Oct 11 2017
This bug requires manual review: We are only 5 days from stable. Please contact the milestone owner if you have questions. Owners: amineer@(Android), cmasso@(iOS), bhthompson@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Oct 12 2017
Thanks for the fix. Has the fix been verified and confirmed in Canary or Dev?
,
Oct 12 2017
Yes, the fix has been verified in Canary and Dev.
,
Oct 13 2017
Approving merge to M62. Branch:3202
,
Oct 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6af3ad09b53ca761c66a72b1a3b67fe788a1342e commit 6af3ad09b53ca761c66a72b1a3b67fe788a1342e Author: Kenneth Russell <kbr@chromium.org> Date: Fri Oct 13 18:55:58 2017 Enforce flush ordering between contexts in GPU process on macOS. There are assumptions in many places that OrderingBarrierCHROMIUM is sufficient to ensure command ordering between producers and consumers in Chromium's pipeline. However, unless clients were carefully written, it was easy to miss a necessary glFlush, or put one in the wrong place. On macOS there is no implicit flush when switching contexts, though one exists on some other platforms (e.g. GLX), and exists implicitly when using virtualized GL contexts. Fix this at the lowest level in the GPU process by flushing when switching contexts on macOS. Also fix bug in command ordering between WebGL's DrawingBuffer and consumers; i.e., the compositor. BUG= 769488 TBR=kbr@chromium.org (cherry picked from commit 325141c137c6808376255aad1ff0b28f0335bca7) Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Change-Id: I46faa7537a3e2e12e3094a6aba3a85378a677c3f Reviewed-on: https://chromium-review.googlesource.com/687832 Commit-Queue: Kenneth Russell <kbr@chromium.org> Reviewed-by: Antoine Labour <piman@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#505155} Reviewed-on: https://chromium-review.googlesource.com/719408 Reviewed-by: Kenneth Russell <kbr@chromium.org> Cr-Commit-Position: refs/branch-heads/3202@{#681} Cr-Branched-From: fa6a5d87adff761bc16afc5498c3f5944c1daa68-refs/heads/master@{#499098} [modify] https://crrev.com/6af3ad09b53ca761c66a72b1a3b67fe788a1342e/gpu/command_buffer/tests/gl_texture_mailbox_unittest.cc [modify] https://crrev.com/6af3ad09b53ca761c66a72b1a3b67fe788a1342e/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp [modify] https://crrev.com/6af3ad09b53ca761c66a72b1a3b67fe788a1342e/ui/gl/gl_context_cgl.cc
,
Oct 26 2017
Issue 776922 has been merged into this issue.
,
Oct 30 2017
,
Nov 11 2017
|
||||||||||||||
►
Sign in to add a comment |
||||||||||||||
Comment 1 by kbr@chromium.org
, Sep 27 2017piman@ and I have investigated this bug in detail offline. It's caused by incorrect synchronization between OpenGL contexts at the lowest level. macOS uses glFlush as an ordering barrier even between OpenGL contexts, where other platforms require a glFinish. In WebGL's DrawingBuffer, which we thought actually had correct synchronization, the following patch is needed in order to properly flush the drawing commands: diff --git a/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp b/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp index d43720833702..463920cf40a3 100644 --- a/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp +++ b/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp @@ -392,11 +392,12 @@ bool DrawingBuffer::FinishPrepareTextureMailboxGpu( color_buffer_for_mailbox->texture_id, color_buffer_for_mailbox->parameters.target, color_buffer_for_mailbox->mailbox.name); + gl_->Flush(); const GLuint64 fence_sync = gl_->InsertFenceSyncCHROMIUM(); #if defined(OS_MACOSX) gl_->DescheduleUntilFinishedCHROMIUM(); #endif - gl_->Flush(); + gl_->OrderingBarrierCHROMIUM(); gl_->GenSyncTokenCHROMIUM( fence_sync, color_buffer_for_mailbox->produce_sync_token.GetData()); } In other words, the call to gl_->InsertFenceSyncCHROMIUM() was allowing the sync token to be passed, and the resulting mailbox's texture to be consumed, *before* the OpenGL commands were actually flushed (at the real OpenGL context level) for that context. This could allow earlier frames' commands to be buffered up and executed later, out of order with respect to the compositor. I/we think it's likely that similar incorrect synchronization may be the cause of longstanding Issue 702369 , where video frames very occasionally and infrequently have the wrong contents. piman@ points out that there are many places in Chromium's code base where the client-side code deliberately avoids calling glFlush, preferring glOrderingBarrierCHROMIUM instead to avoid IPCs. We've discussed possible solutions and are leaning toward a fix at the lowest level, in gl_context_cgl.cc, enforcing flushes during context switches on the service side. This will implicitly fix all clients, including WebGL and video, and may fix Issue 702369 . The thought is that this will not have any worse performance than issuing glFlush at the needed points in the client-side code, since context switches will be avoided and batched because we're using glOrderingBarrierCHROMIUM, and we'll only be flushing once per context switch. Flushes are implicit on other platforms, for example in glXMakeCurrent.