New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 770924 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
OOO until 2019-01-24
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 769488

Blocking:
issue 747159



Sign in to add a comment

Use unverified sync token in WebGL's DrawingBuffer

Project Member Reported by kbr@chromium.org, Oct 2 2017

Issue description

In  Issue 769488  the protocol of transferring WebGL's back buffer (DrawingBuffer) to the compositor was changed slightly, and this yielded a surprising speedup in some cases.

Per discussion with sunnyps@ we can eliminate one or two synchronous IPCs per frame by using an unverified rather than a verified sync token in WebGL's DrawingBuffer. This should yield even more improvements.

(Note that for WebVR it's not certain that this will yield a benefit in all cases. Since latency is crucial it's possible that flushing earlier may be better. If this turns out to be true then we'll work on detecting when WebVR is using the WebGL context and re-insert the flush.)

 

Comment 1 Deleted

(Deleted and reposted, the sequence examples in my earlier comment #1 were confused.)

It's a bit more complicated than that. We need GL commands to be flushed somewhat aggressively for low latency in VR, but the issue here was specifically how much two succeeding frames are allowed to overlap. Removing the DrawingBuffer flush has *improved* WebVR performance.

Specifically, it appears that flushing work for frame N+1 while frame N was still rendering seems to have triggered unhelpful scheduling for Qualcomm/Adreno with frames being rendered in pairs in a slow/fast pattern.

Bad sequence example:

Frame N drawing commands
Frame N flush
Frame N+1 drawing commands
Frame N+1 flush in DrawingBuffer
(Wait for frame N to complete rendering)
Frame N+1 completion fence for sync token
Frame N+1 flush

Good sequence example:

Frame N drawing commands
Frame N flush
Frame N+1 drawing commands
(Wait for frame N to complete rendering)
Frame N+1 completion fence for sync token
Frame N+1 flush

It seems to be ok if some drawing commands for frame N+1 get emitted to the driver while frame N is still rendering, as long as they don't include the commands that the driver interprets as a signal to start actual rendering work. (IIRC this includes unbinding an FBO's color buffer or switching the active framebuffer.)

Comment 3 by junov@chromium.org, Oct 3 2017

FWIW, I recently wrote a similar optimization for 2D canvas and ImageBitmap: https://chromium-review.googlesource.com/c/chromium/src/+/602679

Pointing it out in case it might give you some ideas.

That CL takes care of all cases that generate mailboxes by going through StaticBitmapImage::EnsureMailbox()/GetMailbox(). EnsureMailbox() now receives a MailboxSyncMode argument. The idea is that the call site requesting the mailbox must specify the level of synchronization that it requires.

Comment 4 by kbr@chromium.org, Oct 4 2017

Thanks klausw@ and junov@ for the feedback.

junov@ - I'm pretty sure that recycling not only of mailboxes but the underlying textures / GpuMemoryBuffers is handled by DrawingBuffer and its ColorBuffer inner class, and the viz::SingleReleaseCallback that's called by the compositor. But thanks for the pointer.

Project Member

Comment 5 by bugdroid1@chromium.org, Oct 4 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1f6e0695f16fb623455758c6fd483c9ee41cf183

commit 1f6e0695f16fb623455758c6fd483c9ee41cf183
Author: Kenneth Russell <kbr@chromium.org>
Date: Wed Oct 04 20:28:42 2017

Use unverified sync token in WebGL's DrawingBuffer.

This should eliminate a synchronous IPC.

Also change location of Mac-specific GPU back-pressure code.

BUG= 770924 

Change-Id: Id44286f3359662ec3122b1e421d22f860ea24463
Reviewed-on: https://chromium-review.googlesource.com/696666
Reviewed-by: Sunny Sachanandani <sunnyps@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#506501}
[modify] https://crrev.com/1f6e0695f16fb623455758c6fd483c9ee41cf183/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp

Project Member

Comment 6 by bugdroid1@chromium.org, Oct 5 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/293cf6820c52e30cc598ee794d16f44459b3e046

commit 293cf6820c52e30cc598ee794d16f44459b3e046
Author: Kenneth Russell <kbr@chromium.org>
Date: Thu Oct 05 00:43:48 2017

Rename gl variables in DrawingBuffer::CopyToPlatformTexture for clarity.

BUG= 770924 

Change-Id: Ic9ec7d08da3e907a006a5284ecfdeaadd5149ff4
Reviewed-on: https://chromium-review.googlesource.com/701402
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#506589}
[modify] https://crrev.com/293cf6820c52e30cc598ee794d16f44459b3e046/third_party/WebKit/Source/platform/graphics/gpu/DrawingBuffer.cpp

Comment 7 by kbr@chromium.org, Oct 5 2017

Status: Fixed (was: Assigned)
FYI, change r506501 from comment #5 / https://chromium-review.googlesource.com/696666 made WebVR slow again. We'll reopen  issue 747159  and see if we can stabilize things again.

Comment 9 by kbr@chromium.org, Oct 5 2017

Blocking: 747159
Argh. Sorry Klaus. Can you please try to insert a gl_->Flush() call inside DrawingBuffer::FinishPrepareTextureMailboxGpu, perhaps after the call to OrderingBarrierCHROMIUM, and see if it helps?

kbr@, no worries, overall I'm happy since these unintended effects were a good clue towards figuring out how to avoid the weird scheduling.

I'm not entirely sure why this got slow again due to less flushing. We had a static_image->EnsureMailbox(kVerifiedSyncToken) call which I think involves flushing to verify the token. Maybe the difference is what state the command stream is in at flush time and that determines if the GPU decides to start actual rendering or not?

I'm trying a slightly different approach in https://chromium-review.googlesource.com/c/chromium/src/+/703998 by moving the "wait for previous frame to render" mojo latch to just before GetStaticBitmapImage, this should make it more robust to changes in how the mailbox texture gets released.

Comment 11 by kbr@chromium.org, Oct 6 2017

I doubt it's related to the state the command stream was in when flush was called. Instead it seems more likely that the call to GenSyncTokenCHROMIUM, which implied a synchronous IPC, was both flushing the command stream and causing enough back-pressure on the renderer process to avoid swamping the GPU.

What if you put in the call to gl_->DescheduleUntilFinishedCHROMIUM() (in DrawingBuffer::FinishPrepareTextureMailboxGpu) which is currently only enabled on macOS. Does that help with the stuttering? That was necessary as a back-pressure mechanism on integrated GPUs in particular.

I can experiment more, but I suspect the current location for MacOS may be
too late since it looks as if the new texture is in a state where the GPU
can start rendering it. Assuming my hypothesis is right, we do want to wait
"in the middle" and not atomically. The point of "deschedule until
finished" would be to allow previous work to finish before bundling up a
new FBO texture for the GPU to work on. But I don't know how deep the
pipeline overlap is in the non-VR case.

Another problem is that part of the previous frame's work we want to wait
for in the VR case is in an unrelated context that the GPU process doesn't
know about, so it can't fence it right unless it has a cross process fence
(which we are looking into, was discussing this with reveman@ today).
One more thing, I think the issue isn't swamping the GPU as such, in the VR
case at least it's scheduling outstanding work in a bad order. I suspect
that the GPU driver treats a renderable FBO as high priority since it's
usually used as input for further rendering steps, not realizing that it's
for a future frame and is making the previous frame late. But that's
guesswork at this point.
For background, I'm assuming that integrated GPU scheduling is similar to
this Mali documentation:
https://community.arm.com/graphics/b/blog/posts/mali-performance-2-how-to-correctly-handle-framebuffers
. I think I saw similar notes for Adreno that unbinding an FBO is a signal
to start rendering. But this is not well documented and appears very
implementation dependent.
re comment #11, you're right that the performance-changing part of the patch is the GenSyncToken call, specifically https://chromium-review.googlesource.com/696666 for FinishPrepareTextureMailboxGpu at line 405:
-    gl_->GenSyncTokenCHROMIUM(
+    gl_->GenUnverifiedSyncTokenCHROMIUM(
         fence_sync, color_buffer_for_mailbox->produce_sync_token.GetData());

If I undo just this change, the slow/fast pattern goes away. (The CopyToPlatformTexture changes aren't relevant, the WebVR render path doesn't use that in the usual case where preserveDrawingBuffer is off.)

After spending a few hours on experimenting, I'm confused about what's going on.

Smooth render at ~45fps:
- GenSyncTokenCHROMIUM in FinishPrepareTextureMailboxGpu
- wait-after-mailbox in VRDisplay
  (This is basically the before-r506501 logic)

Smooth render at ~42fps"
- wait-before-mailbox in VRDisplay
  This is the proposed change from https://chromium-review.googlesource.com/c/chromium/src/+/703998,
  in this case changing Gen{,Unverified}SyncTokenCHROMIUM has little effect. (Maybe a 1fps drop for GenSyncTokenCHROMIUM.)

ToT: Irregular render at <40fps:
- GenUnverifiedSyncTokenCHROMIUM in FinishPrepareTextureMailboxGpu
- wait-after-mailbox in VRDisplay

For the "irregular render" case, I've tried various variations, none of which seemed to fix the irregular render pattern and some of them made performance worse:

- gl_->DescheduleUntilFinishedCHROMIUM() before OrderingBarrier: no effect
- gl_->DescheduleUntilFinishedCHROMIUM() in ToT MACOSX location: no effect
- gl_->DescheduleUntilFinishedCHROMIUM() in PrepareTextureMailboxInternal before ResolveIfNeeded: no effect

I've also tried putting gl_->ShallowFlushCHROMIUM and gl_->Flush in various places, including before and after GenUnverifiedSyncTokenCHROMIUM, none of those got the same smooth performance as GenSyncTokenCHROMIUM.

I think I found a likely cause for the performance difference, see the attached traces.

Roughly, the sequence in WebVR is:

  RefPtr<Image> image_ref = rendering_context_->GetStaticBitmapImage();

  StaticBitmapImage* static_image =
      static_cast<StaticBitmapImage*>(image_ref.get());

  static_image->EnsureMailbox(kVerifiedSyncToken);

If DrawingBuffer creates a verified token via GenSyncTokenCHROMIUM, the EnsureMailbox call doesn't need to do anything since it's already verified.

However, if DrawingBuffer uses GenUnverifiedSyncTokenCHROMIUM, we end up in a code path where EnsureMailbox ends up calling gl->Flush():

https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/graphics/MailboxTextureHolder.cpp?type=cs&l=111

We'd only need a ShallowFlushCHROMIUM here, the full Flush causes bad scheduling.

Assuming I'm reading it right, the observed performance changes happened something like this:

- before r505155, DrawingBuffer did a Flush, lower performance

- in r505155, Flush was replaced with an OrderingBarrierCHROMIUM, but still using GenSyncTokenCHROMIUM. The token was marked verified so MailboxTexture didn't Flush, resulting in good performance

- r506501 changed to GenUnverifiedSyncTokenCHROMIUM, so EnsureMailbox did a Flush

What's the best way to fix this?

Would it be ok to change MailboxTextureHolder::Sync to do ShallowFlushCHROMIUM? AFAIK that would be sufficient to ensure correct verification. Note there's already a related TODO in MailboxTextureHolder::Sync:

    // TODO(junov): Batch this verification in the case where there are multiple
    // offscreen canvases being committed.

If that's unsafe because other code somehow depends on there being a full flush here, we could add a new MailboxSyncMode such as kVerifiedShallowFlushedSyncToken that does ShallowFlushCHROMIUM instead of Flush.

Does this sound plausible?
trace_fastslow2-verified.html
3.7 MB View Download
trace_fastslow2-unverified.html
3.7 MB View Download
See https://chromium-review.googlesource.com/c/chromium/src/+/707354 , using ShallowFlushCHROMIUM in MailboxTextureHolder::Sync does fix the performance regression for me.
Project Member

Comment 18 by bugdroid1@chromium.org, Oct 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6867595a40855978c1bcbf7012ef75fc79626fc9

commit 6867595a40855978c1bcbf7012ef75fc79626fc9
Author: Klaus Weidner <klausw@chromium.org>
Date: Tue Oct 10 16:33:16 2017

Use minimal flushing in MailboxTextureHolder::Sync

On mobile GPUs, full glFlush() can be very expensive since it may force tiled memory
to be forced out to slow main memory, and it can also cause unhelpful scheduling
in virtualized contexts.

For verified sync tokens, according to the CHROMIUM_sync_point documentation,
"The use of ShallowFlushCHROMIUM is enough for the command to be flushed to the
server", so we should use that here, especially for cases where multiple
textures may be in use. If the application code expects a full driver level
glFlush, I think it should do this explicitly.

For unverified sync tokens, use OrderingBarrierCHROMIUM instead of
ShallowFlushCHROMIUM since that's documented to be sufficient for this purpose.

WARNING: there's a risk that this may break code that was assuming more
aggressive flushing as a side effect of updating mailboxes. If needed, consider
making the new flushing methods opt-in?

See:
https://cs.chromium.org/chromium/src/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_sync_point.txt

BUG= 770924 

Change-Id: I80e111ef60ac56f832f2e580ce737be99ee36d37
Reviewed-on: https://chromium-review.googlesource.com/707354
Reviewed-by: Justin Novosad <junov@chromium.org>
Commit-Queue: Klaus Weidner <klausw@chromium.org>
Cr-Commit-Position: refs/heads/master@{#507688}
[modify] https://crrev.com/6867595a40855978c1bcbf7012ef75fc79626fc9/third_party/WebKit/Source/platform/graphics/MailboxTextureHolder.cpp

Comment 19 by kbr@chromium.org, Oct 11 2017

Great investigation and work Klaus!

Sign in to add a comment