New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 769488 link

Starred by 14 users

Issue metadata

Status: Fixed
Owner:
OOO until 2019-01-24
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug


Sign in to add a comment

Epic Zen Garden flickers badly with --disable-features=WebGLImageChromium

Project Member Reported by kbr@chromium.org, Sep 27 2017

Issue description

Chrome Version: r504537
OS: macOS

What steps will reproduce the problem?
(1) Build Release with dcheck_always_on=true
(2) out/Release/Chromium.app/Contents/MacOS/Chromium --disable-features=WebGLImageChromium
(3) Go to Epic Zen Garden build from Issue 764824 (sorry, not public due to URL in bug report)

What is the expected result?

Expect demo to render smoothly.


What happens instead?

Demo jitters frequently -- as though frames were being rendered out of order.

about:gpu from an affected machine (MacBook Pro, dual Intel Iris Pro and NVIDIA GeForce GT 750M GPUs) attached.

 
about-gpu.txt
7.9 KB View Download

Comment 1 by kbr@chromium.org, Sep 27 2017

Cc: sande...@chromium.org hubbe@chromium.org sunn...@chromium.org
piman@ and I have investigated this bug in detail offline. It's caused by incorrect synchronization between OpenGL contexts at the lowest level. macOS uses glFlush as an ordering barrier even between OpenGL contexts, where other platforms require a glFinish. In WebGL's DrawingBuffer, which we thought actually had correct synchronization, the following patch is needed in order to properly flush the drawing commands:

diff --git a/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp b/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp
index d43720833702..463920cf40a3 100644
--- a/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp
+++ b/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp
@@ -392,11 +392,12 @@ bool DrawingBuffer::FinishPrepareTextureMailboxGpu(
         color_buffer_for_mailbox->texture_id,
         color_buffer_for_mailbox->parameters.target,
         color_buffer_for_mailbox->mailbox.name);
+    gl_->Flush();
     const GLuint64 fence_sync = gl_->InsertFenceSyncCHROMIUM();
 #if defined(OS_MACOSX)
     gl_->DescheduleUntilFinishedCHROMIUM();
 #endif
-    gl_->Flush();
+    gl_->OrderingBarrierCHROMIUM();
     gl_->GenSyncTokenCHROMIUM(
         fence_sync, color_buffer_for_mailbox->produce_sync_token.GetData());
   }

In other words, the call to gl_->InsertFenceSyncCHROMIUM() was allowing the sync token to be passed, and the resulting mailbox's texture to be consumed, *before* the OpenGL commands were actually flushed (at the real OpenGL context level) for that context. This could allow earlier frames' commands to be buffered up and executed later, out of order with respect to the compositor.

I/we think it's likely that similar incorrect synchronization may be the cause of longstanding  Issue 702369 , where video frames very occasionally and infrequently have the wrong contents.

piman@ points out that there are many places in Chromium's code base where the client-side code deliberately avoids calling glFlush, preferring glOrderingBarrierCHROMIUM instead to avoid IPCs.

We've discussed possible solutions and are leaning toward a fix at the lowest level, in gl_context_cgl.cc, enforcing flushes during context switches on the service side. This will implicitly fix all clients, including WebGL and video, and may fix  Issue 702369 . The thought is that this will not have any worse performance than issuing glFlush at the needed points in the client-side code, since context switches will be avoided and batched because we're using glOrderingBarrierCHROMIUM, and we'll only be flushing once per context switch. Flushes are implicit on other platforms, for example in glXMakeCurrent.

Comment 2 by kbr@chromium.org, Sep 27 2017

Cc: leslie.n...@epicgames.com
Since you're generating a verified sync token here, don't you need ShallowFlushCHROMIUM before the GenSyncTokenCHROMIUM call, instead of OrderingBarrierCHROMIUM?

Comment 4 by kbr@chromium.org, Sep 28 2017

Good point sunnyps@. Based on our offline conversation and piman's feedback on the work-in-progress CL https://chromium-review.googlesource.com/687832 , it seems that OrderingBarrierCHROMIUM is working, and saves one synchronous IPC per frame. Per your suggestion, let's move to unverified sync tokens for this code as soon as your work in progress is done.

Note also that based on feedback from piman the change in the CL is different than c#1, namely because of the change to assume there are implicit flushes when switching contexts on all platforms.

Project Member

Comment 5 by bugdroid1@chromium.org, Sep 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/325141c137c6808376255aad1ff0b28f0335bca7

commit 325141c137c6808376255aad1ff0b28f0335bca7
Author: Kenneth Russell <kbr@chromium.org>
Date: Thu Sep 28 21:10:18 2017

Enforce flush ordering between contexts in GPU process on macOS.

There are assumptions in many places that OrderingBarrierCHROMIUM is
sufficient to ensure command ordering between producers and consumers
in Chromium's pipeline. However, unless clients were carefully
written, it was easy to miss a necessary glFlush, or put one in the
wrong place. On macOS there is no implicit flush when switching
contexts, though one exists on some other platforms (e.g. GLX), and
exists implicitly when using virtualized GL contexts.

Fix this at the lowest level in the GPU process by flushing when
switching contexts on macOS. Also fix bug in command ordering between
WebGL's DrawingBuffer and consumers; i.e., the compositor.

BUG= 769488 

Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: I46faa7537a3e2e12e3094a6aba3a85378a677c3f
Reviewed-on: https://chromium-review.googlesource.com/687832
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#505155}
[modify] https://crrev.com/325141c137c6808376255aad1ff0b28f0335bca7/gpu/command_buffer/tests/gl_texture_mailbox_unittest.cc
[modify] https://crrev.com/325141c137c6808376255aad1ff0b28f0335bca7/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp
[modify] https://crrev.com/325141c137c6808376255aad1ff0b28f0335bca7/ui/gl/gl_context_cgl.cc

Comment 6 by kbr@chromium.org, Sep 28 2017

Status: Fixed (was: Started)

Comment 7 by kbr@chromium.org, Sep 28 2017

Blocking: 769671

Comment 8 by kbr@chromium.org, Sep 29 2017

Blocking: 770214

Comment 9 by kbr@chromium.org, Sep 29 2017

Blocking: -770214
FYI, r505155 fixed WebVR's  issue 747159  "Frames rendered in fast/slow pattern" and increased the framerate by almost 10% in some tests. Specifically, the DrawingBuffer.cpp change seems to be what causes this improvement.

Does changing from gl->Flush() to gl->OrderingBarrierCHROMIUM() effectively delay the underlying driver flush until a later time? This is on Android/Qualcomm which uses virtualized contexts.

According to comment #1, it sounds as if the correct order of operations would be to flush before calling InsertFenceSyncCHROMIUM?
https://cs.chromium.org/chromium/src/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_sync_point.txt seems to say that the fence sync needs to be flushed, so would it need two flushes, one before and one after? 

Comment 11 by kbr@chromium.org, Oct 2 2017

Whoa! That's good to know. I don't know all of the semantics of OrderingBarrierCHROMIUM, but it was designed to reduce IPCs between the renderer and GPU process by batching up flushes. From talking with sunnyps@ we can further eliminate a synchronous IPC in the DrawingBuffer code by using an untrusted, rather than trusted, sync token in DrawingBuffer. I'll work with him to make that improvement next.

I recommend keeping the Flush instead of OrderingBarrier. Using unverified sync tokens is orthogonal to that, and that's also something I'd recommend.
I thought removing the flush caused a regression. Sorry for the confusion.

Comment 14 by kbr@chromium.org, Oct 2 2017

Blocking: 770924
While we're on the subject, should the glFlush() commands be added to gl_context_egl.cc also? Previously I had tried disabling virtual contexts for a WebVR optimization (see issue 691102 and linked blocking/blocked on), and this had broken non-VR WebGL with tearing or incomplete images. Not sure if it was the same issue, but it seems like a potential trap for the future to 
assume that virtualized contexts must be in use to ensure correct behavior.

Would these flushes be unused while virtualized contexts are active since the underlying low-level context doesn't get switched?
glFlush probably is neither necessary nor sufficient on EGL contexts in general. Only Mac ensures ordering via glFlush.

To do correct ordering without virtual contexts on Android, you'd need a glFenceSync/glWaitSync (or the egl ones) across a context switch. Last time I tried (admittedly a while ago) on the ARM Mali driver (on a Chrome OS device), that option was slower than virtual contexts. YMMV, we could investigate that option, but it's not an obvious home run.

Comment 17 by kbr@chromium.org, Oct 11 2017

Blocking: 773337

Comment 18 by kbr@chromium.org, Oct 11 2017

Cc: kbr@chromium.org vamshi.k...@techmahindra.com
 Issue 773337  has been merged into this issue.

Comment 19 by kbr@chromium.org, Oct 11 2017

 Issue 773342  has been merged into this issue.

Comment 20 by kbr@chromium.org, Oct 11 2017

Labels: -Pri-2 ReleaseBlock-Stable Merge-Request-62 M-62 Pri-1
At this point we've received a couple of reports from a customer (MapBox) that are definitely this issue, so I'm upgrading this to P1 and requesting to merge 325141c137c6808376255aad1ff0b28f0335bca7 back to the M62 branch. The fix is small, well understood, and should not cause any risks for e.g. WebVR in M62. There's no workaround from the application level and rendering of WebGL content is unpredictable on macOS without the fix.

Project Member

Comment 21 by sheriffbot@chromium.org, Oct 11 2017

Labels: -Merge-Request-62 Merge-Review-62 Hotlist-Merge-Review
This bug requires manual review: We are only 5 days from stable.
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), bhthompson@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Thanks for the fix. Has the fix been verified and confirmed in Canary or Dev?

Comment 23 by kbr@chromium.org, Oct 12 2017

Yes, the fix has been verified in Canary and Dev.


Labels: -Merge-Review-62 Merge-Approved-62
Approving merge to M62. Branch:3202
Project Member

Comment 25 by bugdroid1@chromium.org, Oct 13 2017

Labels: -merge-approved-62 merge-merged-3202
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6af3ad09b53ca761c66a72b1a3b67fe788a1342e

commit 6af3ad09b53ca761c66a72b1a3b67fe788a1342e
Author: Kenneth Russell <kbr@chromium.org>
Date: Fri Oct 13 18:55:58 2017

Enforce flush ordering between contexts in GPU process on macOS.

There are assumptions in many places that OrderingBarrierCHROMIUM is
sufficient to ensure command ordering between producers and consumers
in Chromium's pipeline. However, unless clients were carefully
written, it was easy to miss a necessary glFlush, or put one in the
wrong place. On macOS there is no implicit flush when switching
contexts, though one exists on some other platforms (e.g. GLX), and
exists implicitly when using virtualized GL contexts.

Fix this at the lowest level in the GPU process by flushing when
switching contexts on macOS. Also fix bug in command ordering between
WebGL's DrawingBuffer and consumers; i.e., the compositor.

BUG= 769488 
TBR=kbr@chromium.org

(cherry picked from commit 325141c137c6808376255aad1ff0b28f0335bca7)

Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: I46faa7537a3e2e12e3094a6aba3a85378a677c3f
Reviewed-on: https://chromium-review.googlesource.com/687832
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#505155}
Reviewed-on: https://chromium-review.googlesource.com/719408
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/branch-heads/3202@{#681}
Cr-Branched-From: fa6a5d87adff761bc16afc5498c3f5944c1daa68-refs/heads/master@{#499098}
[modify] https://crrev.com/6af3ad09b53ca761c66a72b1a3b67fe788a1342e/gpu/command_buffer/tests/gl_texture_mailbox_unittest.cc
[modify] https://crrev.com/6af3ad09b53ca761c66a72b1a3b67fe788a1342e/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp
[modify] https://crrev.com/6af3ad09b53ca761c66a72b1a3b67fe788a1342e/ui/gl/gl_context_cgl.cc

Comment 26 by kbr@chromium.org, Oct 26 2017

 Issue 776922  has been merged into this issue.

Comment 27 by kbr@chromium.org, Oct 30 2017

Blocking: 770635

Comment 28 by kbr@chromium.org, Nov 11 2017

Blocking: 784030

Sign in to add a comment