Simplify timing in ImageTransportSurfaceOverlayMac
Issue description
ImageTransportSurfaceOverlayMac attempts to coordinate when swaps occur on Mac, but does a bad job at it.
The current design is as follows. The lifetime of a SwapBuffers call is:
0. GPU work is issued (either compositor, WebGL, or potentially nothing if we're in CA mode)
1. SwapBuffers is called on ImageTransportSurfaceOverlayMac
2. Create and issue a GL fence object and glFlush
3. Package up the swap information into a PendingSwap
4. Post a task to draw the PendingSwap in 1 vsync
[ Wait until either (A) a subsequent SwapBuffers call comes in or (B) the posted task runs ]
5. Finish on the GL fence object
6. Update the CALayer tree and ack the swap to the browser
It's actually more complicated than that, but let's live in the simplified world.
These are pipelined, so if you have swaps A, B, C coming in at 60fps, you'll have the sequence of events:
A0,A1,A2,A3,A4,......A5,A6.
B0,B1,......B2,B3,B4,......B5,B6,
C0,C1,......C2,C3,C4.. (some time later) ..C5,C6
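The interleaving above can be reproduced with a toy model. This is only a sketch of the simplified steps 1-6 as described here, not the real class: `PipelinedSurfaceModel`, `PendingSwap`, and `OnVsyncTask` are hypothetical names, step 0 (issuing GPU work) is left to the caller, and the posted task is replaced by an explicit call.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model of the swap information packaged up in step 3.
struct PendingSwap {
  char name;
};

// Toy model of the pipelined design; it records step labels such as "A5"
// instead of touching GL or CoreAnimation.
class PipelinedSurfaceModel {
 public:
  void SwapBuffers(char name) {
    Log(name, 1);  // Step 1: SwapBuffers is called.
    // Condition (A): a new swap pushes earlier pending swaps through steps 5-6.
    DrainPendingSwaps();
    Log(name, 2);  // Step 2: create the fence and flush.
    pending_.push_back(PendingSwap{name});
    Log(name, 3);  // Step 3: package up the PendingSwap.
    Log(name, 4);  // Step 4: the real code posts a task for one vsync later.
  }

  // Condition (B): stands in for the posted task running.
  void OnVsyncTask() { DrainPendingSwaps(); }

  const std::vector<std::string>& events() const { return events_; }

 private:
  void DrainPendingSwaps() {
    while (!pending_.empty()) {
      const char name = pending_.front().name;
      pending_.pop_front();
      Log(name, 5);  // Step 5: finish on the fence.
      Log(name, 6);  // Step 6: update the CALayer tree and ack the swap.
    }
  }

  void Log(char name, int step) {
    assert(step >= 1 && step <= 6);
    events_.push_back(std::string(1, name) + std::to_string(step));
  }

  std::deque<PendingSwap> pending_;
  std::vector<std::string> events_;
};
```

Calling `SwapBuffers('A')`, then `SwapBuffers('B')`, then `OnVsyncTask()` records B1 before A5 and A6, which is the ordering behind the starvation issue: B's GPU work (B0) is already in flight before A's CALayer update happens.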
This complexity isn't buying us much. Issues we see are as follows:
* WindowServer starvation: Notice that the GPU work issued to B0 is sent before the GL fence and CALayer update at A5. This means that it is likely that this work will be already issued to the GPU before WindowServer can render the updated CALayer tree in A6. This manifests as the actual framerate being far below the framerate that Chrome thinks it has.
* Latency: This adds at least an extra vsync of latency before swaps are acknowledged.
* CPU usage: The GL fence finish in step 5 sometimes busy-waits on crappy drivers.
It appears that the following simplified pipeline works well:
0. GPU work is issued (either compositor, WebGL, or potentially nothing if we're in CA mode)
1. SwapBuffers is called on ImageTransportSurfaceOverlayMac
2. Do glFinish on the GL context
3. Update the CALayer tree and ack the swap to the browser
Note that we do the glFinish in step 2 because we have no other way of getting GPU back-pressure -- we get no signal from the WindowServer that our CALayer tree update has actually been displayed.
Also note that even though we are doing the glFinish in step 2, we do allow a pipeline of pending frames in the browser process, so the browser will queue up new GL commands for us to execute.
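For comparison, the simplified pipeline reduces each swap to three calls in a fixed order. The sketch below only illustrates that ordering: `FakeBackend` and `SimplifiedSurfaceModel` are hypothetical stand-ins that record calls, where real code would issue glFinish on the GL context, commit the CALayer tree, and send the ack to the browser.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical backend that records calls instead of touching GL,
// CoreAnimation, or the browser IPC channel.
struct FakeBackend {
  std::vector<std::string> calls;
  void GLFinish() { calls.push_back("glFinish"); }
  void CommitCALayerTree() { calls.push_back("CALayer commit"); }
  void AckSwapToBrowser() { calls.push_back("swap ack"); }
};

// Toy model of the simplified pipeline (steps 2 and 3 above).
class SimplifiedSurfaceModel {
 public:
  explicit SimplifiedSurfaceModel(FakeBackend* backend) : backend_(backend) {
    assert(backend_ != nullptr);
  }

  void SwapBuffers() {
    // Step 2: block until the GPU has drained; this is the only back-pressure.
    backend_->GLFinish();
    // Step 3: update the CALayer tree, then ack the swap to the browser.
    backend_->CommitCALayerTree();
    backend_->AckSwapToBrowser();
  }

 private:
  FakeBackend* backend_;
};
```

There is no PendingSwap queue and no posted task in this model, so no work for a later frame can run in the GPU process between one frame's glFinish and its CALayer update.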
The drawbacks to this are:
* GPU process pipelining: We can't start decoding GL commands for the next frame until the glFinish has completed. So, if we're limited by GL command decoding, this will make it harder to hit 60fps. That said, if we do allow GL command decoding, then we hit the starvation issue from before -- I think that this is the better risk to run.
* Multiple Windows: With multiple windows, we don't have an easy way to batch up the GL commands into a single glFinish. This has not been a substantial issue in practice.
* Adding jank: Updating a CALayer tree with vsync is tricky business -- we want to avoid the part of the vsync interval where the WindowServer picks up its new contents (because if we are updating the CALayer tree at the same time, we may end up with missed and duplicated frames because of timing artifacts). We already run this risk to some degree with the existing code (we have a glFinish in there) -- avoiding it completely would require having a very Mac-specific scheduler (maybe not a bad idea).
,
Apr 7 2016
This is a point where we should probably reiterate that "there is no limit to the complexity of the wrong solution".
,
Apr 8 2016
Thanks to @thespite for pointing out this problem.
,
Apr 8 2016
A few questions:
1. Why does step 6 (update CALayer tree) happen after glFinish or FinishFence? Can't we do that right after the flush?
2. Would it be better if we could periodically TestFence after the flush instead of FinishFence after an entire vsync interval?
3. Even with the glFinish, how do we know that all of our frames are being displayed by the WindowServer?
,
Apr 8 2016
To clarify my first question, it looks like all we need to do is ensure that we don't change the contents of the CALayer too often (which leads to dropped frames), and glFinish ensures that most of the time whereas a flush wouldn't. So it looks like all glFinish is doing for us is delaying the swap ack to the browser. So we need:
A: Flush -> CALayer commit -> (periodically TestFence) -> SwapAck -> Next Vsync
B: Flush -> ... after the vsync following SwapAck ... -> CALayer commit
This still doesn't guarantee that A's CALayer commit was displayed (I believe this is impossible with CALayer + setContents), but it should achieve the same result as a glFinish, right?
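A rough sketch of what sequence A might look like, purely to make the proposal concrete. Everything here is hypothetical: `FakeFence` stands in for a real fence object with a non-blocking test, the loop stands in for periodic polling (real code would presumably poll from a timer rather than spin), and acking anyway after `max_polls` is an arbitrary choice made for this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical fence that reports completion after a fixed number of polls.
class FakeFence {
 public:
  explicit FakeFence(int polls_until_done) : remaining_(polls_until_done) {}

  // Non-blocking completion check, standing in for a TestFence-style query.
  bool Test() {
    if (remaining_ > 0) --remaining_;
    return remaining_ == 0;
  }

 private:
  int remaining_;
};

// Returns the event order for one swap under sequence A:
// flush -> CALayer commit -> (poll the fence) -> swap ack.
std::vector<std::string> SwapWithPolledFence(FakeFence& fence, int max_polls) {
  assert(max_polls > 0);
  std::vector<std::string> events;
  events.push_back("flush");
  events.push_back("CALayer commit");

  int polls = 0;
  bool signaled = false;
  while (polls < max_polls && !signaled) {
    ++polls;
    signaled = fence.Test();
  }
  events.push_back(signaled
                       ? "fence signaled after " + std::to_string(polls) + " polls"
                       : "fence not signaled, giving up");

  // The ack is delayed until the fence result is known (or polling gives up).
  events.push_back("swap ack");
  return events;
}
```

The sketch only delays the ack; nothing in it prevents GL commands for the next frame from being submitted in the meantime.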
,
Apr 9 2016
To the question in #4:
First, to limit our attention, I'm only concerned here with GPU-bound or nearly-GPU-bound situations. For non-GPU-bound situations, an immediate glFlush+CACommit+Ack works fine. The problem is that setting up any kind of pipeline in the GPU process causes starvation of the WindowServer and dropped frames.
Q1: Just doing a glFlush and delaying the ack will not stop the renderer from submitting the GL commands (e.g., heavy WebGL calls) for the next frame (recall that we can have ~2 in-flight frames). The GL work for the next frame will starve the WindowServer, and result in only the second frame being shown (sometimes).
Q2: This would add a lot of complexity. What would this gain us?
Q3: We don't, at least not with any APIs that I've found so far. I suspect there is a mechanism out there -- after all, CAOpenGLLayer seems to do this. We should spend more time investigating this issue. If we do find a reliable signal that content has appeared on the screen, then we should re-investigate how to ensure maximum smoothness.
,
Apr 10 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/5a3c87b37a46ce1b0813f0530d981165daaa7cfe

commit 5a3c87b37a46ce1b0813f0530d981165daaa7cfe
Author: ccameron <ccameron@chromium.org>
Date: Sun Apr 10 05:34:35 2016

Mac: Clean up ImageTransportSurfaceOverlayMac timing

Change the processing of a frame to be the following:
0. GPU commands are decoded (compositor, WebGL, nor nothing)
1. SwapBuffers is called on ImageTransportSurfaceOverlayMac
2. Do glFinish on the GL context
3. Update the CALayer tree and ack the swap to the browser

This is much simpler to reason about, and appears to result in improved performance.

BUG= 601608
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel

Review URL: https://codereview.chromium.org/1867163002
Cr-Commit-Position: refs/heads/master@{#386316}

[modify] https://crrev.com/5a3c87b37a46ce1b0813f0530d981165daaa7cfe/gpu/ipc/service/image_transport_surface_overlay_mac.h
[modify] https://crrev.com/5a3c87b37a46ce1b0813f0530d981165daaa7cfe/gpu/ipc/service/image_transport_surface_overlay_mac.mm
,
Apr 11 2016
erikchen just added support to draw WebGL using the CoreAnimation renderer. It appears that, as a side-effect of this, we no longer glFinish the WebGL work, because we're only glFinish-ing the compositor context. Ideas on how to reach over to the WebGL CGLContextObj?
,
Apr 14 2016
We could instrument DrawingBuffer.cpp in Blink to do this when it produces its mailboxes, though doing it in exactly the right place and the right number of times per frame may be tricky. What exactly is the desired situation? glFinish right before WebGL produces its mailbox, for example? What about if there are 2 WebGL-rendered canvases on the page?
,
May 2 2016
,
May 12 2016
,
May 12 2016
,
May 18 2016
,
Jun 7 2016
ericrk has further improved this
,
Jun 7 2016
Comment 1 by kbr@chromium.org, Apr 7 2016