
Issue 467617 link

Starred by 45 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 3
Type: Feature

Blocked on:
issue 401331

Blocking:
issue 422000




How to dramatically improve Chrome's requestAnimationFrame VSYNC accuracy in Windows

Reported by jer...@duckware.com, Mar 16 2015

Issue description

There is a very easy way to achieve nearly perfect VSYNC synchronization in Chrome under Windows, that:

 - wakes up animation code within around 60 to 120 microseconds of true vsync
 - works on Vista, Win7, Win8, etc
 - does not spin wait, so is very CPU efficient
 - does not use the Windows Desktop Window Manager (DWM)
 - ...so it works even with DWM/Aero turned off
 - works even with the OS default 15.625ms timer (in)accuracy
 - not affected by notebooks running on battery power

Chrome attempts to wake up requestAnimationFrame() animations at vsync intervals, but the inter-frame timings for these animations (under Windows) still have significant jitter (around 1ms to 4ms) -- caused primarily by the inaccuracy of Windows WaitForXXX timing functions (that Chrome uses to wait for the vsync time).  This can be seen at www.vsynctester.com, especially with notebooks, and even more so for notebooks not on AC (on battery).  At 120Hz that jitter represents an unacceptably large portion of the 8.3ms frame budget. 

Windows' Sleep/WaitForXXX functions have an accuracy of plus or minus an entire time quantum, which on many Windows computers is 15.625ms.  Chrome uses timeBeginPeriod() to attempt to reduce that to 1ms while on AC power, and 4ms while on battery power (see time_win.cc), which increases power usage.
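
To put the figures above in perspective, here is a small C++ sketch that expresses a timing error (jitter or a timer quantum) as a fraction of one frame. It is only illustrative arithmetic, not Chrome code, and the function names are made up for this example.

```cpp
#include <cassert>

// Length of one frame in milliseconds at the given refresh rate.
double FrameBudgetMs(double refresh_hz) {
  assert(refresh_hz > 0.0);
  return 1000.0 / refresh_hz;
}

// Fraction of one frame that a timing error (jitter, timer quantum)
// represents at the given refresh rate.
double FractionOfFrame(double error_ms, double refresh_hz) {
  return error_ms / FrameBudgetMs(refresh_hz);
}
```

With these helpers, 4ms of jitter is roughly a quarter of a frame at 60Hz and roughly half a frame at 120Hz, while the default 15.625ms quantum is close to an entire 60Hz frame.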

THE SOLUTION: Use the same interface that Windows' own Desktop Window Manager already uses to implement an OS-level compositing manager -- the DirectX graphics kernel subsystem.  So, use D3DKMTWaitForVerticalBlankEvent():

  https://msdn.microsoft.com/en-us/library/windows/hardware/ff547265(v=vs.85).aspx

A successful proof of concept prototype used D3DKMTOpenAdapterFromHdc() to obtain handles from the application's hDC that were then passed into D3DKMTWaitForVerticalBlankEvent(), to create a vsync synchronization loop, that woke up an animation loop every vsync.

Hopefully this can be added into Chrome in a very timely manner...

 
Cc: briander...@chromium.org bajones@chromium.org
Blockedon: chromium:401331
Cc: mit...@mithis.com simonh...@chromium.org sunn...@chromium.org
Labels: Hotlist-Scheduling
Status: Available
Thanks Jerry for figuring this all out! It's going to help a lot once our code is ready to hook into this.

To summarize your findings:
* DirectDraw's WaitForVerticalBlank spin waits (you verified this on your machine.)
* IDXGIOutput::WaitForVBlank spin waits (you saw comments online indicating it does, but didn't try.)
* DwmFlush() doesn't spin wait, but relies on DWM being enabled (i.e. Aero).
* D3DKMTWaitForVerticalBlankEvent doesn't spin wait, doesn't need DWM enabled, and works on Windows machines since Vista.

D3DKMTWaitForVerticalBlankEvent definitely sounds like an optimal approach.
Looks like D3DKMTWaitForVerticalBlankEvent (https://msdn.microsoft.com/en-us/library/windows/hardware/ff547265%28v=vs.85%29.aspx) is a part of GDI. We should verify if it needs Aero/DWM to be disabled to be used.
As far as I can tell this is part of a set of functions intended to allow IHVs to write OpenGL drivers that interact with the DWM. It does mention on https://msdn.microsoft.com/en-us/library/windows/hardware/ff568606(v=vs.85).aspx that the drivers need to load Gdi32.dll, but that doesn't preclude the use of Aero/DWM.

Regardless, we should test it to see what situations it works with.
Thanks a lot for continuing to research this jerry. :) 

IYO, would this solve the multi-window/tooltip desync problem that affects the current approach? That issue is still a problem for games and apps that are mouse-controlled, as a moving cursor tends to 'bleed over' and raise tooltips from tabs, address bar, etc, causing stutters when vsync is handed back and forth from system level to Chrome.

I really hope this works out; it sounds like a much better approach.

Comment 6 by ashlaa...@gmail.com, Mar 16 2015

For Windows 8.1+ you may also want to investigate IDXGISwapChain2::GetFrameLatencyWaitableObject. See:
https://msdn.microsoft.com/en-us/library/dn268309.aspx
https://msdn.microsoft.com/en-us/library/windows/apps/dn448914.aspx

Comment 7 by jer...@duckware.com, Mar 16 2015

briander/2: RE findings, yes, correct.  Additionally:
* IDXGIOutput::WaitForVBlank: I also saw comments where people complained that
  using it blocks all other DXGI calls until it returns (all unconfirmed).
* DwmFlush: I just found a situation where it does not return for over
  3ms past vsync, which makes it no longer usable for the purpose of
  finding 'near vsync'.
* D3DKMTWaitForVerticalBlankEvent: I personally tested on Win7 and Win81.  The docs
  say it has been there since Vista, which I have no reason to doubt, since Vista
  is when DWM came to exist (and we know that DWM must vsync align efficiently).

sunn/3: I probably misunderstood your comment (about needing DWM to be disabled), but the beauty of D3DKMTWaitForVerticalBlankEvent is that it works *with* or *without* DWM, as D3DKMT is in the Windows kernel and sits even *below* DWM.

bajones/4: When you (or anyone @chromium.org) have serious cycles to work on this, just email me and I will turn over my prototype to jump start development.  Using D3DKMTWaitForVerticalBlankEvent is crazy easy.  The tough part is going to be how *best* to add this into Chrome.

TiAmIsT/5: I guess I am not 'close' enough to that to even know.


Comment 8 by jer...@duckware.com, Mar 17 2015

1. I just confirmed it works under Vista.

2. Attached you will find inter-frame times from my prototype running on a Win81 box (specs at http://www.vsynctester.com/manual.html#testsetup)

In the prototype, there is an animation loop that calculates the next vsync wake up time and uses WaitForXXX on an event object and the appropriate timeout.  As soon as it returns, it obtains a time (the frame time).

There is then a VSYNC synchronizer thread that loops calling D3DKMTWaitForVerticalBlankEvent and triggers the event object, which wakes up the animation loop.
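
The two-thread arrangement just described can be sketched in portable C++ as follows. This is not the prototype's source: `VsyncEvent`, `RunVsyncSynchronizer` and the `wait_for_vblank` callable are hypothetical names, and the callable stands in for a blocking call to D3DKMTWaitForVerticalBlankEvent so that the sketch does not depend on Windows headers.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>

// Counting event: the synchronizer signals it, the animation loop waits on it.
class VsyncEvent {
 public:
  // Called by the synchronizer thread after each vblank.
  void Signal() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++vblank_count_;
    }
    cv_.notify_all();
  }

  // Blocks until a vblank newer than `last_seen` has been signaled or
  // `timeout` expires. Returns true when woken by a signal and false on
  // timeout (the fallback to the estimated vsync time).
  bool WaitForNext(std::chrono::microseconds timeout, std::uint64_t& last_seen) {
    std::unique_lock<std::mutex> lock(mutex_);
    bool signaled = cv_.wait_for(
        lock, timeout, [&] { return vblank_count_ > last_seen; });
    last_seen = vblank_count_;
    return signaled;
  }

  std::uint64_t count() {
    std::lock_guard<std::mutex> lock(mutex_);
    return vblank_count_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::uint64_t vblank_count_ = 0;
};

// Body of the synchronizer thread. `wait_for_vblank` (hypothetical) blocks
// until the next vertical blank and returns false when the loop should stop.
void RunVsyncSynchronizer(VsyncEvent& event,
                          const std::function<bool()>& wait_for_vblank) {
  assert(wait_for_vblank != nullptr);
  while (wait_for_vblank()) {
    event.Signal();  // Wake the animation loop.
  }
}
```

In a real build, `RunVsyncSynchronizer` would run on its own thread with a callable that wraps the vblank wait, and the animation loop would call `WaitForNext` with the time remaining until its estimated next vsync as the timeout, taking its frame timestamp as soon as the call returns.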

win81-no-load is my prototype app running alone.  The Windows default timer accuracy is 15.625 ms.

win81-load is my prototype app running, but with Canary also up and running the www.vsynctester.com animation (where Canary is using timeBeginPeriod(1)).

The known precise inter-frame time for the Win81 box is 16.637ms.

not too bad?
win81-no-load.jpg (50.4 KB)
win81-load.jpg (55.9 KB)

Comment 9 by mit...@mithis.com, Mar 19 2015

Hi Jerry,

Just wanted to mention that you should be able to upload to Rietveld (our
code review tool) any patches you have for Chrome code. It would be great
if you could do this, even if these patches are 100% experimental and full
of hacks because it allows us to better see your ideas in code, replicate
your results and save round trips when we have questions. We can then work
on a real implementation if they prove to be useful.

If you have stand alone code, it would also be awesome if you could upload
the source to GitHub or similar code repository. This would importantly
allow us to understand if we can map your solution into the Chromium
codebase or if we need a very different approach.

Thank you very much for your work here, it is very much appreciated. Our
aim is to make Chrome the *best* browser out there but we have a long way
to go before this is the case here.
https://gist.github.com/anonymous/4397e4909c524c939bee shows example code using D3DKMTWaitForVerticalBlankEvent.

Chromium needs to figure out how to 'best' unblock the task (cause it to run immediately) that ultimately runs requestAnimationFrame(), based upon an (additional/optional) 'external' signal, like from 'vsyncSignalAllWaiters()' in the example code above, instead of solely based upon a time estimate (which has micro-jitter) of the next vsync time.
Thanks to TiAmIsTiAm for testing the prototype (and forcing me to investigate, and I think solve, multi-monitor issues).

What I found is that DWM is crazy buggy (on Win7) -- that DwmGetCompositionTimingInfo() can return a qpcRefreshPeriod that looks like "16.xxx, 16.xxx, 26.xxx" in a repeating pattern, or that DWM can sometimes return 16.667 (the 1/60 constant, which seems more like a fallback value when there is some internal DWM error), when that is not the actual period.

The great news is that using D3DKMTWaitForVerticalBlankEvent works around all of the DWM bugs, because DWM is thrown out as a source for vsync information.

What does DWM do when there are multiple monitors running at different vsync frequencies?  The Chromium source code suggests that DWM uses the frequency of the primary monitor.  Does anyone know why DWM would not just drive each adapter at its native frequency?  The prototype currently mimics the presumed 'sync to primary' behavior.

But with DWM off, the prototype syncs to the adapter that the prototype window is (mostly) running on.

When there are no tests running, the prototype will display in its main window the monitor and ms/Hz that it thinks it should be (and is) vsync'ing to.  So as DWM goes up/down, as the primary adapter changes, or as the window is dragged around (with dwm off), you can actively see it change.

Anyone willing to play around with the prototype can:

  http://www.duckware.com/test/chrome/dwm-vsync-tests.exe

(source code has been sent to briander..., bajones, sunn... and mit...)


I'm having a pretty terrible time finding any actual Microsoft-written documentation of DWM behavior in multi-monitor scenarios. My documentation in the code was based on personal testing. I've got a 60hz monitor and a 120hz monitor connected to my Windows machine at work, and a 60hz monitor and 75hz monitor (an Oculus Rift) connected to my machine at home. They both exhibit behavior of limiting refresh rates on the higher refresh monitor unless that monitor is set as the primary device.

It's encouraging to hear that D3DKMTWaitForVerticalBlankEvent seems more reliable than the DWM provided values.

Comment 13 by mit...@mithis.com, Mar 23 2015

UnifiedBeginFrame should allow us to send different VSync signals to
different Chrome windows on different monitors. Fixing a lot of this
properly depends on us getting that project finished.
Under Windows, Canary 43.0.2353.0 (r323184) (April 1) dramatically improved VSYNC accuracy from 1ms to nearly 100% spot on (only on AC; not on battery).  When I asked around, nobody responded, so today I took the time to track when the change took place -- it was introduced between r323177 and r323182:

  https://chromium.googlesource.com/chromium/src/+log/0bd2a738f107ad7021c70bfd36ae41e4565fe946..cdb7395d70f3f04fe91c75b39e67cca7abc8251f

What stands out is "Truncate the timeout of WaitableEvent::TimedWait on Windows" (waitable_event_win.cc):

  https://codereview.chromium.org/1040833002 

The side effect on VSYNC accuracy (when the PC is on AC power) is very positive -- but at the expense that there must now be "spin waiting" going on somewhere.  On average, there will now be a spin wait of 0.5 ms sixty times/sec (during an animation) -- meaning 30ms of spin wait every second (3% increased overhead for a single core).
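
The overhead estimate above is simple arithmetic, sketched here in C++. The 0.5 ms average presumably comes from assuming that the truncated fraction of a millisecond is spread evenly between 0 and 1 ms; that is an interpretation, and the function name is made up for this example.

```cpp
#include <cassert>

// Fraction of one core spent spinning, given the average spin time per
// frame (in milliseconds) and the number of frames per second.
double SpinOverheadFraction(double avg_spin_ms_per_frame,
                            double frames_per_second) {
  assert(avg_spin_ms_per_frame >= 0.0 && frames_per_second >= 0.0);
  double spin_ms_per_second = avg_spin_ms_per_frame * frames_per_second;
  return spin_ms_per_second / 1000.0;
}
```

Calling `SpinOverheadFraction(0.5, 60.0)` gives about 0.03, the 3% of a single core quoted above.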

Was this change 'known' -- an intentional change to improve VSYNC accuracy -- or a side effect?

Comment 15 by mit...@mithis.com, Apr 15 2015

The patch doesn't have a bug associated with it, so it is unclear why it was done. I believe it is unlikely to be an intentional change related to vsync. It is more likely related to other latency, power saving or general clean up changes.

I don't see any extra spin wait occurring here, the patch just changes the code so we wake up early rather than late. The Chrome rendering system shouldn't be dependent on *running* code at exactly a given time, just getting data to the video card *by* a given time. There is no reason to delay/wait that extra 0.5 ms when the wakeup happens early.
mit/15: This is Windows only.  See the attached performance charts for vsynctester.com running against r323177 and r323182.  The '1ms' precision (inaccuracy) of the OS has been eliminated.

The jitter seen in r323182.jpg is mostly due to the jitter in the metrics coming out of DwmGetCompositionTimingInfo -- because sometimes Canary starts in perfect 16.666 mode, ignoring DWM timing info, and then the line is virtually perfectly flat, as it is in the attached 60hz.jpg.

I say 'spin waiting' because that logic is ALL over the place in Chrome in the form of 'if a deadline has not passed, continue to issue the Sleep/WaitFor/etc, until the deadline has passed' (on return from waits, the deadline is checked *again* against current time).  On an early return from these wait functions, that only results in a slew of 0.xxx ms values being passed into WaitForSingleObject(), each of which is (now) truncated to 0ms.  The documentation says of that case: "If dwMilliseconds is zero, the function does not enter a wait state if the object is not signaled; it always returns immediately."
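
To make the mechanism concrete, the following C++ sketch simulates such a deadline loop on an idealized clock and counts how many zero-millisecond waits it issues. It is not Chrome's code: the function name, the `Rounding` enum and the fixed per-call cost are assumptions for illustration, and a real wait never lasts exactly its timeout.

```cpp
#include <cassert>
#include <cstdint>

enum class Rounding { kTruncate, kRoundUp };

// Simulates "wait until the deadline" using a wait call with millisecond
// granularity. All times are in microseconds on an idealized clock.
// Returns how many zero-millisecond waits were issued, i.e. how often
// the loop spun.
int CountZeroTimeoutWaits(std::int64_t deadline_us, Rounding rounding,
                          std::int64_t spin_cost_us) {
  assert(spin_cost_us > 0);
  std::int64_t now_us = 0;
  int zero_waits = 0;
  while (now_us < deadline_us) {
    std::int64_t remaining_us = deadline_us - now_us;
    std::int64_t timeout_ms = rounding == Rounding::kTruncate
                                  ? remaining_us / 1000
                                  : (remaining_us + 999) / 1000;
    if (timeout_ms == 0) {
      ++zero_waits;            // A 0 ms wait returns immediately.
      now_us += spin_cost_us;  // Only the cost of the call itself elapses.
    } else {
      now_us += timeout_ms * 1000;  // Idealized: sleeps exactly the timeout.
    }
  }
  return zero_waits;
}
```

For a deadline 16.5 ms away and an assumed cost of 10 microseconds per call, truncation produces one 16 ms wait followed by 50 immediate returns, while rounding up produces a single 17 ms wait that overshoots the deadline and no spinning.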

There is nothing wrong per se with this new spin wait behavior.  It is actually a very effective and cheap (logic wise) way to achieve nearly perfect VSYNC.  The only issues are (1) was it intended, and (2) it does not work well 'on battery' where timer precision is often greater than 1ms (so the line is no longer flat even with spin waiting), and it reduces battery life, which maybe is not an issue given the low priority given to battery issues (no progress on issue 439751, reporting that issue 153139 is broken).

r323182.jpg (29.1 KB)
r323177.jpg (37.5 KB)
60hz.jpg (26.4 KB)
The latest Canary has reverted back to the 1ms OS accuracy for VSYNC (see comment #14 above).  Range: r328168 to r328177: 

https://chromium.googlesource.com/chromium/src/+log/e3e4605ee5132f71e1a593b99626ce4d1460991a..ca9d6916c5fedf6f0ee73dfc397cb0f65ad326b9

with "Enable BeginFrame scheduling on aura" being the likely reason why.

I now regularly see VSYNC all over the place (see attached notgood.jpg)
r318177.jpg (42.3 KB)
notgood.jpg (80.3 KB)
r328168.jpg (28.9 KB)
Labels: -Pri-2 Pri-3
Lowering priority of this since there's a lot of other issues we are focusing on to improve performance of all platforms.  We also need to address  issue 401331  first.
Labels: -Hotlist-Scheduling Cr-Blink-Scheduler
Labels: -Cr-Blink-Scheduler Cr-Blink-Scheduling

Comment 21 by mit...@mithis.com, Feb 24 2016

Cc: -mit...@mithis.com

Comment 22 by mit...@mithis.com, Feb 24 2016

Cc: tansell@chromium.org
Cc: stanisc@chromium.org brucedaw...@chromium.org
Regarding the spinning mentioned in comment #16, that should now be avoided starting with crrev.com/2086123002.

I would love to see us syncing to the actual vblank. I believe that IE/Edge treat setInterval/setTimeout values of near 16-17 ms as if they are requesting vblank synchronization, and I think it would be appropriate if we did the same.

brucedaw/24: did that change land in 402027?

Because I noticed that the spinning behavior #16 went away in 401797 with https://chromium.googlesource.com/chromium/src/+/f2d7f5e1891703ec4384ededd80f896816921204 (and that in earlier Canary, --enable-begin-frame-scheduling would also do that).  See issue 422000#212 (comment #212)
Yes, crrev.com/2086123002 is #402027. I was not aware of --enable-begin-frame-scheduling, however I think the #402027 fix is more generic - it applies outside of frame scheduling.
Here is my proof of concept code (and lots of other debug timing tests) showing how to use D3DKMTWaitForVerticalBlankEvent():

    http://www.duckware.com/test/chrome/467617-source-code.zip

Hopefully someone there can use this for ideas and run with this...
Blocking: 422000
I am going to experiment with D3DKMTWaitForVerticalBlankEvent.
The idea for now is to invoke this on GPU side, on a dedicated thread, on demand from PassThroughImageTransportSurface::StartSwapBuffers and see what kind of timing I get from this.
stanisc/29: Thanks!

And since D3DKMTWaitForVerticalBlankEvent is Windows only, I have a question I hope someone else out in the community can answer.  For a long time, Chrome (Windows only) would not vsync properly until the Chrome app was resized (recently fixed; see issue 465356 and issue 632785).  But the curious thing is that Chrome (before the app resize) had one frame of input lag (but no vsync).  After resizing the Chrome window, Chrome had two frames of input lag (but vsync worked).

--> What changed in Chrome that caused the extra frame of input lag?  And more importantly, is it possible to have both: (1) keep vsync, and (2) revert back to one frame of input lag?

I bring this up now, only because I wonder if this is caused by vsync timing and when Chrome sends frames to the OS, and how the OS (Windows) then composites and sends those frames to the screen.  If Chrome swaps buffers based upon vsync, is that not 'too late' (under Windows only) since the Windows OS is compositing frames?  If not, then great.  But if it is too late, should Chrome 'swap buffers' be based upon a deadline that is maybe some split millisecond (0.5ms or something similar) *before* the next anticipated vsync event -- in hopes of getting the current Chrome buffer in the next Windows OS composited frame?

stanisc/29, please mark this issue as blocked on new  issue 658601 .  Using D3DKMTWaitForVerticalBlankEvent (or not) and resolving new  issue 658601  should go hand in hand.
 Issue 658601  has been merged into this issue.
Chrome tries to use vsync to trigger when we start to generate a frame. If the entire process completes in less than 16 ms then the frame will be ready and should be presented on the *next* vsync. There is a lot of pipelining in the process (GPUs in particular are highly pipelined and get additional throughput when they buffer one or more frames), and DWM (Desktop Window Manager) can also add some latency.

Even video games (my previous career) usually have a few frames of latency from input to photons. VR apps work particularly hard to reduce latency because the problem of latency is much more severe in that context. See this article for thoughts on that:

http://oculusrift-blog.com/john-carmacks-message-of-latency/682/

I'm not sure what Chrome's input-to-photon latency is. I would like to measure it. There are tradeoffs (increased power, reduced scene complexity, increased code complexity) for pushing latency to extremely low levels so I don't think Chrome will try to reduce input lag as aggressively as VR apps do, but keeping it "as low as reasonable" is a worthy goal.
brucedaw, the surprising find is that Chrome presenting a rendered frame "on the *next* vsync", with Aero ON under Windows -- actually itself adds one frame of input lag.

When vsync happens, DWM has already swapped buffers at an OS level (starts the NEXT frame).  Then Chrome acts on the vsync signal, and it is too late for Chrome's 'present' to make it to the CURRENT frame (it is now the NEXT frame, which does not make it to the screen until one frame later).

With Aero ON, presenting frames on vsync is the wrong present location.
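
The argument above can be illustrated with a toy model in C++. The model is an assumption made for this example, not documented DWM behavior: the compositor latches application frames at each vsync and only sees frames that were presented strictly before that vsync. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: vsync n occurs at n * interval_us, and the composition done
// at a vsync includes only frames presented strictly before it.
// Returns the index of the first vsync whose composition includes a frame
// presented at present_us.
std::int64_t FirstCompositionVsync(std::int64_t present_us,
                                   std::int64_t interval_us) {
  assert(present_us >= 0 && interval_us > 0);
  return present_us / interval_us + 1;
}
```

Under this model, with a 16667 microsecond interval, a frame presented exactly at vsync 3 (t = 50001) is first composited at vsync 4, while the same frame presented 0.5 ms earlier (t = 49501) is composited at vsync 3, one frame sooner.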
stanisc, after having several offline discussions, I am now convinced that Chrome can both (1) use D3DKMTWaitForVerticalBlankEvent now and (2) later solve  issue 658601  (there are several possible strategies).

If you have something you want tested regarding D3DKMTWaitForVerticalBlankEvent, let me know...

As a FYI, the attached graphs show why DwmFlush() is not suitable as a method to synchronize to VSYNC.  

DwmFlush() 'wakes up' late even on a system with no load, and when the system is under load (running FishIE Tank in IE), DwmFlush() wakes up really late.

[It is interesting to note that under the same load, Chrome today -- using timers -- performs *better* than DwmFlush]

Tested on a Dell Inspiron 15R notebook (Intel i7-4500U 1.80GHz with 2 cores (4 threads), Intel HD Graphics 4400, 12GB, Windows 8.1) running the "High Performance" power plan.
win81-noload.jpg (68.2 KB)
win81-underload.jpg (109 KB)
So far I've been able to confirm a few things with my prototype:
1) No problem using D3DKMTWaitForVerticalBlankEvent from a dedicated background thread. 

2) The latency of IPC calls from GPU process to Browser process seems reasonable on my dev workstation - 98.3% of calls are delivered within 1 ms so this has a good chance of being better than the timer based solution latency.

3) I don't want the background thread to just generate VSync signals continuously and send them over IPC to the browser process - that would result in at least 60 idle wakeups per second on the GPU process side and the same number on the Browser process side. So there should be a scheduling mechanism for waiting for VSync on the GPU process side.

4) I tried a naive implementation where VSync waiting is triggered by each frame swap very similarly to how VSyncProvider is pinged to refresh VSync parameters in the current implementation. However that didn't work well and I realized I need a new IPC call from Browser to GPU to enable/disable VSync production.

5) I think we could and should use the existing GpuCommandBufferMsg_UpdateVSyncParameters IPC to deliver each VSync signal to the compositor code.

6) On the browser side this could be implemented as a new type of BeginFrameSource which generates a new BeginFrame signal every time it gets delivered a VSync IPC from GPU side.
The same BeginFrameSource class would be responsible for making an IPC back to GPU side to enable/disable VSync production there.

I am figuring out how to integrate this with the existing Compositor architecture and whether some refactoring would be needed to accommodate this.

7) There are some parts of code that still have to know VSync parameters (recent VSync basetime and interval) that are regularly updated by the current implementation (see UpdateVSyncParameters). The basetime for each VSync signal can be generated on GPU side right after D3DKMTWaitForVerticalBlankEvent returns from wait. The interval will probably have to be calculated on GPU side from the last few timestamps.
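
As a rough illustration of item 7, the interval could be derived from the last few vblank timestamps along these lines. This is a sketch under assumptions, not the prototype's code: `VSyncIntervalEstimator` is a hypothetical class, timestamps are plain microsecond counts, and a simple average over the retained window is only one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Keeps the most recent vblank timestamps and derives the VSync
// parameters (basetime and interval) from them.
class VSyncIntervalEstimator {
 public:
  explicit VSyncIntervalEstimator(std::size_t max_samples)
      : max_samples_(max_samples) {
    assert(max_samples >= 2);
  }

  // Records the timestamp taken right after the vblank wait returns.
  void AddTimestamp(std::int64_t timestamp_us) {
    timestamps_us_.push_back(timestamp_us);
    if (timestamps_us_.size() > max_samples_) timestamps_us_.pop_front();
  }

  // Most recent vblank timestamp; 0 if none has been recorded yet.
  std::int64_t basetime_us() const {
    return timestamps_us_.empty() ? 0 : timestamps_us_.back();
  }

  // Average spacing of the retained timestamps; 0 until two are available.
  std::int64_t interval_us() const {
    if (timestamps_us_.size() < 2) return 0;
    std::int64_t span = timestamps_us_.back() - timestamps_us_.front();
    return span / static_cast<std::int64_t>(timestamps_us_.size() - 1);
  }

 private:
  std::size_t max_samples_;
  std::deque<std::int64_t> timestamps_us_;
};
```

Averaging over a short window smooths out the wake-up jitter of individual timestamps while still following a change of refresh rate after a few frames; a production version would probably also want to discard gaps caused by skipped vblanks.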
stanisc/37: Great news.  Thanks for your time for working on this.

Some tips on using vsynctester.com to help in comparing before/after implementations...

(1) There is a new "Use rAF time arg as frame time".  Under Chrome, the time argument to the rAF callback is the vsync time from Windows/DWM.  And because this is so tightly grouped (microseconds), it is hard to see variation, so there is now the ability to set the graph scale.  See attached for an example of 50 microsecond inter-frame jitter in Chrome today on a notebook computer.  You should also be able to replicate a similar tight grouping when using D3DKMTWaitForVerticalBlankEvent(), even under high system load (all cores/threads maxed out).  [tip: if not, review priority of background thread].  In my testing in a native Win32 app, timings from D3DKMTWaitForVerticalBlankEvent() wake up times mimic the tight grouping of times coming from DWM even under load (but the grouping range is computer specific).

(2) In Chrome, the 'late' line at vsynctester.com effectively graphs the time from vsync wakeup until rAF callback.  Chrome a while back (due to a bug) spin-waited for the vsync time and was spot on.  When I went back and tested against that version (r389148), I see a 200 microsecond delay for 'late' (attached) on one system, and a 100 microsecond delay on a second system.

(3) any 'power plan' in effect can greatly affect timings.  I tested under 'High Performance' for best results.
chromemicrojitter.jpg (38.0 KB)
chrome-late.jpg (29.5 KB)
I've got the prototype working end-to-end on my Windows 10 workstation and on a test Windows 10 laptop. The results look promising so far - see snapshots from vsynctester.com made with the prototype vs. Stable build of Chrome. These screenshots were captured on a workstation with a large number of cores. 

I'll share more details in the next couple of days.
prototype.PNG (39.7 KB)
stable.PNG (109 KB)
Owner: stanisc@chromium.org
Status: Started (was: Available)
Project Member

Comment 41 by bugdroid1@chromium.org, Dec 3 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3566f880e80d65a57b7e084ccd55030972585c98

commit 3566f880e80d65a57b7e084ccd55030972585c98
Author: stanisc <stanisc@chromium.org>
Date: Sat Dec 03 01:23:19 2016

Decouple BrowserCompositorOutputSurface from BeginFrameSource.

This change is a part of larger effort of propagating D3D VSync
signal to the compositor. Since the current implementation in
BrowserCompositorOutputSurface explicitly depends on a time based
SyntheticBeginFrameSource, enne@ suggested that a good first step
would be to try to decouple it from a specific BeginFrameSource
type.

Instead of passing SyntheticBeginFrameSource and CompositorVSyncManager
in every constructor of BrowserCompositorOutputSurface and classes
derived from it, this change replaces the SyntheticBFS / VSyncManager
pair with a callback to update VSync parameters. The callback is
handled at GpuProcessTransportFactory.

BUG=467617

Review-Url: https://codereview.chromium.org/2511273002
Cr-Commit-Position: refs/heads/master@{#436122}

[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/browser_compositor_output_surface.cc
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/browser_compositor_output_surface.h
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/gpu_browser_compositor_output_surface.cc
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/gpu_browser_compositor_output_surface.h
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/gpu_output_surface_mac.h
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/gpu_output_surface_mac.mm
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/gpu_process_transport_factory.cc
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/gpu_surfaceless_browser_compositor_output_surface.cc
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/gpu_surfaceless_browser_compositor_output_surface.h
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/mus_browser_compositor_output_surface.cc
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/mus_browser_compositor_output_surface.h
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/offscreen_browser_compositor_output_surface.cc
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/offscreen_browser_compositor_output_surface.h
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/reflector_impl_unittest.cc
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/software_browser_compositor_output_surface.cc
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/software_browser_compositor_output_surface.h
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/software_browser_compositor_output_surface_unittest.cc
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/vulkan_browser_compositor_output_surface.cc
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/content/browser/compositor/vulkan_browser_compositor_output_surface.h
[modify] https://crrev.com/3566f880e80d65a57b7e084ccd55030972585c98/ui/compositor/compositor.cc

Project Member

Comment 42 by bugdroid1@chromium.org, Dec 7 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/90f047410034268ef03377f12408395e10fc1140

commit 90f047410034268ef03377f12408395e10fc1140
Author: stanisc <stanisc@chromium.org>
Date: Wed Dec 07 02:44:43 2016

Remove begin_frame_source argument from VulkanBrowserCompositorOutputSurface ctor

This removes an unneeded begin_frame_source from VulkanBrowserCompositorOutputSurface ctor.
I've overlooked this in my previous change (https://codereview.chromium.org/2511273002/).
The code wouldn't actually compile with the argument but I guess this
class isn't built by either local build or trybots - that's why this
has been overlooked.

BUG=467617

Review-Url: https://codereview.chromium.org/2557913002
Cr-Commit-Position: refs/heads/master@{#436847}

[modify] https://crrev.com/90f047410034268ef03377f12408395e10fc1140/content/browser/compositor/vulkan_browser_compositor_output_surface.cc
[modify] https://crrev.com/90f047410034268ef03377f12408395e10fc1140/content/browser/compositor/vulkan_browser_compositor_output_surface.h

Here is the end-to-end prototype which gets VSync timing from D3DKMTWaitForVerticalBlankEvent running on GPU process on a separate thread.

This is still work in progress but should give an idea of what I am trying to achieve. For now this builds on Windows only.

https://codereview.chromium.org/2555173003/
stanisc/43: do you have a snapshot/zip of the prototype that I can play around with?
Project Member

Comment 45 by bugdroid1@chromium.org, Jan 9 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a4002d66074c748ac4630dd9dd5414aa4ce06ec4

commit a4002d66074c748ac4630dd9dd5414aa4ce06ec4
Author: stanisc <stanisc@chromium.org>
Date: Mon Jan 09 22:49:10 2017

GpuVSyncProvider with unit test

This introduces a class for waiting for GPU VSync signals on background thread.

For now this class isn't hooked anywhere but eventually it is going to be hooked
to either GpuCommandBufferStub or PassThroughImageTransportSurface and
replace the current VSyncProvider based mechanism on Windows.

I verified functionality of this code as a part of the prototype
(where this class is called VSyncThread):
https://codereview.chromium.org/2555173003/

BUG=467617

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2596123002
Cr-Commit-Position: refs/heads/master@{#442386}

[modify] https://crrev.com/a4002d66074c748ac4630dd9dd5414aa4ce06ec4/gpu/ipc/service/BUILD.gn
[add] https://crrev.com/a4002d66074c748ac4630dd9dd5414aa4ce06ec4/gpu/ipc/service/gpu_vsync_provider.h
[add] https://crrev.com/a4002d66074c748ac4630dd9dd5414aa4ce06ec4/gpu/ipc/service/gpu_vsync_provider_posix.cc
[add] https://crrev.com/a4002d66074c748ac4630dd9dd5414aa4ce06ec4/gpu/ipc/service/gpu_vsync_provider_unittest_win.cc
[add] https://crrev.com/a4002d66074c748ac4630dd9dd5414aa4ce06ec4/gpu/ipc/service/gpu_vsync_provider_win.cc

Sadly the GPU VSync solution with D3DKMTWaitForVerticalBlankEvent waiting for VBlank doesn't seem to work well on my Windows 7 workstation with NVidia GPU. 

D3DKMTWaitForVerticalBlankEvent itself works fine and returns from wait every 16.6 ms as expected. But all other graphics calls seem to freeze / run very slowly resulting in a super slow refresh rate - about 2-3 frames per second. 

Looking at a Chrome trace profile, it seems other GPU-related tasks align with D3DKMTWaitForVerticalBlankEvent finishing the wait. Also, I've tried replacing the D3DKMTWaitForVerticalBlankEvent call with a simple Sleep(16), and that resolves the freezing.

I don't know if this is specific to all Windows 7 clients or just the ones running NVidia driver. I might try to update my graphics driver to see if that resolves the issue. But this is very concerning. Too bad I didn't test this on a Windows 7 machine earlier.

The same code works very nicely on my other Windows 10 workstation with NVidia GPU and on a Windows 10 laptop with Intel GPU.

Cc: jbau...@chromium.org
cc jbauman@ in case he has any comments about c#46.
The reason for the locking described in c#46 is a dxgkrnl.sys!DXGFASTMUTEX::Acquire call. Both the GPU main thread and the VSync thread run into this, but the main thread is blocked far more.

Possible solutions for Windows 7 that come to my mind are:
1) Use DwmFlush which is reported in c#36 to have a higher latency
2) Use some sort of combined timed wait and D3DKMTWaitForVerticalBlankEvent so that D3DKMTWaitForVerticalBlankEvent waits only for the final 0-1 ms. This might work but would be less reliable. And this solution won't allow us to stop increasing the system timer frequency which is one of the main goals of using GPU VSync signal instead of the timer based one.
3) Don't use GPU VSync on Windows7 and just stick to the current timer based implementation.

stanisc, what is your best guess as to the ultimate cause -- is this a Windows issue or a driver issue?
I think this is Windows / DirectX issue. As I mentioned above the contention is on dxgkrnl.sys!DXGFASTMUTEX.
does it also happen on Intel integrated graphics?
Cc: -tansell@chromium.org
If you have a snapshot/zip, I can test on several Win 7 Intel laptops and Win 8 Intel laptops...
stanisc, On the Win 7 laptop with integrated Intel graphics, failure.  On the Win 8.1 laptop with integrated Intel graphics, it works.
As a Win 7 experiment I replaced D3DKMTWaitForVerticalBlankEvent with DwmFlush in my code - the rest of the code is pretty much the same. It works fine on Windows 7 and the vsynctester.com chart looks much the same as with D3DKMTWaitForVerticalBlankEvent on Win 10. But I tested this on a pretty beefy machine so I might not be seeing the latency issues mentioned in c#36.

I am now considering another small change to get actual frame timing right after finishing the wait using DwmGetCompositionTimingInfo. This would be similar to the approach taken by Mozilla developers (https://bugzilla.mozilla.org/show_bug.cgi?id=1127151). This is already implemented in VSyncProviderWin so basically this code just needs to call VSyncProviderWin after finishing the wait using D3DKMTWaitForVerticalBlankEvent on Win 8+ and DwmFlush on Win 7 and it should get back an accurate vsync timestamp and vsync interval.





Another great test for web browsers (Chrome) that pass the actual vsync time as the time argument to the rAF callback....

Visit vsynctester.com, check "Use rAF time arg as frame time", wait 20 seconds, check "locked", then uncheck "Use rAF time arg as frame time" -- and then the blue line effectively shows the delay from *true* vsync until the rAF callback -- so a great way to compare how well/fast timers / DwmFlush() / D3DKMTWaitForVerticalBlankEvent are working (or not).  Especially when you put the system under a load that maxes out all cores.


So this assumes rAF callback is called with the actual vsync timestamp, right?
I don't know if that is the case but I'll look into that.
I've been adding my own tracing events that measure the latency from the vsync time reported by D3D (DwmGetCompositionTimingInfo) to the moment vsync gets handled on GPU and Compositor. This should help me to compare the latency between different implementations.
Yes.  As of https://codereview.chromium.org/787763006 (Chrome 45.0.2415.0 and later), the rAF callback time has been the vsync time.  FF tries, but gets it wrong (they intentionally fake the time argument).  So being able to graph and see that difference is a great tool.

tip: the vsync times you get from DwmGetCompositionTimingInfo can be anywhere from 2 frames behind to two frames ahead of real time.  The easy way to deal with this is to simply conform the current time to the last vsync time (using the Dwm numbers), similar to the formula seen in section 4 of http://www.vsynctester.com/firefoxisbroken.html
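
A minimal sketch of what "conforming" the current time to the last vsync could look like. This is one plausible reading of the tip, not necessarily the exact formula on the linked page. The function name and the microsecond units are assumptions made for illustration; the only inputs are the current time plus a vsync timestamp and interval such as those reported by DwmGetCompositionTimingInfo.

```cpp
#include <cstdint>

// Hypothetical helper (not Chromium code). All times are microseconds on the
// same monotonic clock. `dwm_vblank_us` may be a few frames in the past or in
// the future relative to `now_us`; only its phase matters.
// Returns the most recent vsync time that is <= now_us.
std::int64_t ConformToLastVSync(std::int64_t now_us,
                                std::int64_t dwm_vblank_us,
                                std::int64_t interval_us) {
  if (interval_us <= 0) return now_us;  // No usable interval: fall back to now.
  std::int64_t phase = (now_us - dwm_vblank_us) % interval_us;
  if (phase < 0) phase += interval_us;  // C++ % keeps the sign of the dividend.
  return now_us - phase;
}
```

Because only the phase of the reported timestamp is used, a value two frames behind and a value two frames ahead conform to the same result.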

Looking at call stacks in ETW profile it appears on Windows 7 D3DKMTWaitForVerticalBlankEvent is essentially the same as IDXGIOutput::WaitForVBlank - both end up calling gdi32.dll!NtGdiDdDDIWaitForVerticalBlankEvent (see callstack here - https://bugzilla.mozilla.org/show_bug.cgi?id=1199468#c8).

As mentioned in comment #7 above:
* IDXGIOutput::WaitForVBlank: I also saw comments where people complained that
  using it blocks all other DXGI calls until it returns (all unconfirmed).

That doesn't seem to be an issue on Windows 8+. I am going to initially limit D3D VSync implementation to Windows 8+ and keep the timer based VSync on Windows 7 until we find a better solution.
Project Member

Comment 60 by bugdroid1@chromium.org, Feb 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/eed2187ba7e9acb812754997400e56ff9107d8a0

commit eed2187ba7e9acb812754997400e56ff9107d8a0
Author: stanisc <stanisc@chromium.org>
Date: Wed Feb 15 19:56:05 2017

Changed GpuVSyncProvider to implement gfx::VSyncProvider

GpuVSyncProvider was introduced recently to generate D3D VSync
signals in the GPU process. This change makes GpuVSyncProvider an actual VSync provider that derives from gfx::VSyncProvider.
The class name is also changed to GpuVSyncProviderWin.

This will make it easy to integrate GpuVSyncProviderWin into
existing code by replacing the current version of the provider with
this new one without having to change much of the existing code.

GpuVSyncProviderWin ignores the existing mechanism for requesting
VSync parameter updates and instead uses IPC::MessageFilter to
send and receive VSync related IPC messages directly. That bypasses
the main GPU thread and should allow us to achieve the best possible
latency.

BUG=467617
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2681033011
Cr-Commit-Position: refs/heads/master@{#450784}

[modify] https://crrev.com/eed2187ba7e9acb812754997400e56ff9107d8a0/gpu/ipc/common/gpu_messages.h
[modify] https://crrev.com/eed2187ba7e9acb812754997400e56ff9107d8a0/gpu/ipc/in_process_command_buffer.cc
[modify] https://crrev.com/eed2187ba7e9acb812754997400e56ff9107d8a0/gpu/ipc/in_process_command_buffer.h
[modify] https://crrev.com/eed2187ba7e9acb812754997400e56ff9107d8a0/gpu/ipc/service/BUILD.gn
[modify] https://crrev.com/eed2187ba7e9acb812754997400e56ff9107d8a0/gpu/ipc/service/gpu_command_buffer_stub.cc
[modify] https://crrev.com/eed2187ba7e9acb812754997400e56ff9107d8a0/gpu/ipc/service/gpu_command_buffer_stub.h
[delete] https://crrev.com/5a91761d5cf889ee6990230b3add84cc48f1022a/gpu/ipc/service/gpu_vsync_provider.h
[delete] https://crrev.com/5a91761d5cf889ee6990230b3add84cc48f1022a/gpu/ipc/service/gpu_vsync_provider_posix.cc
[modify] https://crrev.com/eed2187ba7e9acb812754997400e56ff9107d8a0/gpu/ipc/service/gpu_vsync_provider_unittest_win.cc
[modify] https://crrev.com/eed2187ba7e9acb812754997400e56ff9107d8a0/gpu/ipc/service/gpu_vsync_provider_win.cc
[add] https://crrev.com/eed2187ba7e9acb812754997400e56ff9107d8a0/gpu/ipc/service/gpu_vsync_provider_win.h
[modify] https://crrev.com/eed2187ba7e9acb812754997400e56ff9107d8a0/gpu/ipc/service/image_transport_surface_delegate.h
[modify] https://crrev.com/eed2187ba7e9acb812754997400e56ff9107d8a0/ui/gl/vsync_provider_win.h

Project Member

Comment 61 by bugdroid1@chromium.org, Feb 22 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4950173d4782941bd49f90be89c1d77422e8ca73

commit 4950173d4782941bd49f90be89c1d77422e8ca73
Author: stanisc <stanisc@chromium.org>
Date: Wed Feb 22 22:51:56 2017

Route D3D VSync signal to Compositor

This change introduces a new type of BeginFrameSource that listens to
accurate VSync signals generated on GPU side and delivered via
existing GpuCommandBufferMsg_UpdateVSyncParameters IPC.

This provides noticeably less VSync timing variation and ~3x less
latency than the existing delay based mechanism.

List of included changes:
1) GpuVSyncBeginFrameSource - new type of BFS derives from
   ExternalBeginFrameSource and triggers OnBeginFrame promptly when
   it receives an external signal.
   GpuVSyncBeginFrameSource also includes a VSync control
   mechanism to signal whether VSync production needs to be started or
   stopped on GPU side.

2) GpuBrowserCompositorOutputSurface and CommandBufferProxyImpl
   get a new method to start / stop VSync production on GPU side.

3) GpuProcessTransportFactory decides which type of BeginFrameSource
   to create based on a combination of type of output surface, OS,
   OS version and whether or not D3D VSync feature is enabled.

4) Based on previous feedback PerCompositorData now supports either
   Synthetic BFS or GPU VSync which are mutually exclusive.
   VSync parameters used by Synthetic BFS and VSync signal used by
   Gpu VSync BFS are delivered using exactly the same codepath and
   initiated from the same IPC call. The payload is also the same in
   both cases - VSync timestamp and interval.

5) On GPU side this feature is activated by instantiating
   GpuVSyncProviderWin instead of VSyncProviderWin. The new provider
   is hosted exactly the same way as other providers but it uses a
   different mechanism to deliver VSync signal.
   See https://codereview.chromium.org/2681033011 for more details.

This feature is disabled by default and might be turned on by
--enable-features=D3DVsync switch or by configuring an experiment
that enables D3DVsync feature.

Please note that this feature is currently disabled on Win7 where
the implementation of GpuVSyncProviderWin seems to be grabbing an
internal D3D lock while waiting for VSync and that significantly
slows down all other D3D calls. I plan to address this in a future
patch.

BUG=467617
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel

Review-Url: https://codereview.chromium.org/2626413002
Cr-Commit-Position: refs/heads/master@{#452249}

[modify] https://crrev.com/4950173d4782941bd49f90be89c1d77422e8ca73/content/browser/BUILD.gn
[modify] https://crrev.com/4950173d4782941bd49f90be89c1d77422e8ca73/content/browser/compositor/gpu_browser_compositor_output_surface.cc
[modify] https://crrev.com/4950173d4782941bd49f90be89c1d77422e8ca73/content/browser/compositor/gpu_browser_compositor_output_surface.h
[modify] https://crrev.com/4950173d4782941bd49f90be89c1d77422e8ca73/content/browser/compositor/gpu_process_transport_factory.cc
[add] https://crrev.com/4950173d4782941bd49f90be89c1d77422e8ca73/content/browser/compositor/gpu_vsync_begin_frame_source.cc
[add] https://crrev.com/4950173d4782941bd49f90be89c1d77422e8ca73/content/browser/compositor/gpu_vsync_begin_frame_source.h
[modify] https://crrev.com/4950173d4782941bd49f90be89c1d77422e8ca73/gpu/ipc/client/command_buffer_proxy_impl.cc
[modify] https://crrev.com/4950173d4782941bd49f90be89c1d77422e8ca73/gpu/ipc/client/command_buffer_proxy_impl.h
[modify] https://crrev.com/4950173d4782941bd49f90be89c1d77422e8ca73/gpu/ipc/service/image_transport_surface_win.cc
[modify] https://crrev.com/4950173d4782941bd49f90be89c1d77422e8ca73/ui/gl/gl_switches.cc
[modify] https://crrev.com/4950173d4782941bd49f90be89c1d77422e8ca73/ui/gl/gl_switches.h

stanisc, Thanks!  Tried on Win 8.1 but only got a blank screen with the feature enabled?
The problem mentioned in the comment above is due to Direct Composition being disabled as a driver workaround. That results in taking a codepath that doesn't turn the GPU VSync feature on in the GPU process (while it does get turned on in the browser process).

This will be fixed in the next patch (currently in code review).
Project Member

Comment 64 by bugdroid1@chromium.org, Feb 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f188ef999b74cbf90fee37f3dd7a4cc53a701334

commit f188ef999b74cbf90fee37f3dd7a4cc53a701334
Author: stanisc <stanisc@chromium.org>
Date: Sat Feb 25 02:39:11 2017

Support GPU VSync when DirectComposition isn't enabled

This fixes a few additional issues that came up in private testing.

1) Blank screen when DirectComposition isn't enabled. What happens is
   that GPU VSync ends up being activated on Compositor side but not
   on GPU side. I'd prototyped this solution when I tested GPU VSync
   on Win 7 but decided to leave it out because I mistakenly assumed
   that DirectComposition is always available on Win 8+. I verified
   this by running Chrome with --disable-direct-composition flag.

2) In gpu_vsync_provider_win.cc "GpuVSyncWorker::SendVSyncUpdate"
   trace event always logs adjustment = 0 due to incorrect variable
   scope.

3) It was pointed out to me that if D3DKMTWaitForVerticalBlankEvent
   keeps returning an error for whatever reason the worker thread
   would just spin. It might be a better idea to just crash in that
   unlikely case so that we can investigate the crash dump.

What would be a condition for GL Implementation being anything other
than kGLImplementationEGLGLES2? Should we care about that case?

BUG=467617

Review-Url: https://codereview.chromium.org/2710183004
Cr-Commit-Position: refs/heads/master@{#453050}

[modify] https://crrev.com/f188ef999b74cbf90fee37f3dd7a4cc53a701334/gpu/ipc/service/gpu_vsync_provider_win.cc
[modify] https://crrev.com/f188ef999b74cbf90fee37f3dd7a4cc53a701334/gpu/ipc/service/image_transport_surface_win.cc
[modify] https://crrev.com/f188ef999b74cbf90fee37f3dd7a4cc53a701334/ui/gl/init/gl_factory.h
[modify] https://crrev.com/f188ef999b74cbf90fee37f3dd7a4cc53a701334/ui/gl/init/gl_factory_win.cc

stanisc, Thanks!  It now runs on Win 8.1.  This just missed today's Canary, so once it hits tomorrows Canary, I will provide some performance comparisons.  Early testing on snapshot builds looks very good.
Using the test procedure in comment 56 above, attached are before and after test results (on 'ac' and on 'battery') for Canary 58.0.3024.0 on 'System Two' at vsynctester.com/manual.html#testsetup.
a3-ac-before.jpg (65.0 KB)
a3-ac-after.jpg (23.6 KB)
a3-battery-before.jpg (67.3 KB)
a3-battery-after.jpg (28.2 KB)
Using the test procedure in comment 56 above, attached are before and after test results (on 'ac' and on 'battery') for Canary 58.0.3024.0 on 'System Three' at vsynctester.com/manual.html#testsetup.
k1-ac-before.jpg (36.6 KB)
k1-ac-after.jpg (29.2 KB)
k1-battery-before.jpg (47.5 KB)
k1-battery-after.jpg (21.4 KB)
Project Member

Comment 68 by bugdroid1@chromium.org, Mar 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3fb710b34693d9e9917a3c60367bae9798845b1c

commit 3fb710b34693d9e9917a3c60367bae9798845b1c
Author: stanisc <stanisc@chromium.org>
Date: Wed Mar 01 23:41:27 2017

GPU VSync: add timer based v-sync as a backup mechanism for when display goes to sleep.

When display goes to sleep GPU v-sync is no longer available - D3D
calls that wait for v-blank return with the
STATUS_GRAPHICS_PRESENT_OCCLUDED error.
This change adds a backup mechanism which posts a delayed task to
wait for the next expected v-sync.

BUG=467617
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2722073002
Cr-Commit-Position: refs/heads/master@{#454095}

[modify] https://crrev.com/3fb710b34693d9e9917a3c60367bae9798845b1c/gpu/ipc/service/gpu_vsync_provider_win.cc
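
As a rough illustration of the backup mechanism described in the commit above, the delay for the posted task can be derived from the last known v-sync timestamp and interval. This is only a sketch under assumptions: the function name and microsecond units are hypothetical, and the actual code in gpu_vsync_provider_win.cc may compute the delay differently.

```cpp
#include <cstdint>

// Hypothetical helper (not Chromium code). Times are microseconds on one
// monotonic clock. Returns how long to wait from `now_us` until the next
// expected v-sync, extrapolating from the last v-sync that was observed.
std::int64_t DelayUntilNextVSync(std::int64_t now_us,
                                 std::int64_t last_vsync_us,
                                 std::int64_t interval_us) {
  if (interval_us <= 0) return 0;  // Nothing to extrapolate from.
  std::int64_t phase = (now_us - last_vsync_us) % interval_us;
  if (phase < 0) phase += interval_us;  // Keep the phase in [0, interval).
  return interval_us - phase;           // Exactly on a v-sync: wait a full interval.
}
```

The extrapolation keeps working however long the display has been asleep, because the elapsed time is reduced modulo the interval before the delay is computed.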

stanisc, Running Chrome with the new "--enable-features=D3DVsync" feature is looking very good.

One very minor issue I just noticed with multiple monitors in 'extend the desktop' mode is that the time argument passed to the rAF callback *drifts* (check 'late' box at vsynctester.com to see the drift; best seen when displays are different Hz) -- which I believe is the frame time Chrome uses internally.

Is this issue because the code is still using DWM timing information (which I believe is based upon whatever the 'primary' display is)?  Comments in the code imply this.

Would a solution be to simply use the wake up time from the 'wait for vsync' thread as the frame time?
In the early prototype of this feature the code used the current time at wake-up as the v-sync frame time, but later I replaced that with code that takes it from DwmGetCompositionTimingInfo because that seemed more accurate. The current time was typically about 30 microseconds behind the time reported by DwmGetCompositionTimingInfo.

I tested the early prototype on multi-monitor setup with one monitor running with custom resolution @ 50 Hz and it worked correctly with vsynctester.com.
But I can see the problem you are describing now. I guess DwmGetCompositionTimingInfo still returns v-blank timestamps for the main monitor.

I could go back to using the current time which would be slightly less accurate in a regular single-monitor case. I'll see what else could be done to address this.

Comment 71 Deleted

Comment 72 Deleted

The rules for multi-monitor with dwm are a bit weird - see https://msdn.microsoft.com/en-us/library/windows/desktop/hh437350(v=vs.85).aspx . DWM apparently times when it tries to draw based on the primary monitor, though when the flip happens is probably based on the actual monitor that's being presented on. We've seen this be a big problem when having a higher framerate (e.g. 120Hz) on the non-primary monitor.

Though it's possible they've fixed that in windows 10.

Comment 74 Deleted

Comment 75 Deleted

[This comment corrects/replaces comments 71/72/74/75 -- to document how Windows DWM works with multiple monitors in 'Extend' mode]

REFERENCES: Thanks to jbauman for pointing to https://msdn.microsoft.com/en-us/library/windows/desktop/hh437350(v=vs.85).aspx – which spells out the steps used by the "DirectComposition composition engine", and talks about dwm.exe and DWM -- so the presumption is the doc is talking about how DWM works.  Also, review https://youtu.be/E3wTajGZOsA, which discusses the presentation modes in Windows, by Jesse Natalie.

DWM RUNS AT HZ OF PRIMARY MONITOR: The Windows DWM compositor operates at the frequency/Hz of the primary monitor -- and that is the frequency that DWM then 'presents' to the secondary monitor (regardless of the Hz of the secondary monitor).

DWM PRESENTS TO MONITORS ON VSYNC: But there is no 'tearing' on the secondary monitor even when the Hz does not match the primary monitor Hz, which implies that DWM updates all monitors on the vsync of the individual monitor (this was confirmed by MS).  This means that on the primary monitor, after DWM composites, that the DWM 'present' has to wait for nearly an entire frame for the next vsync.

DWM IN SUMMARY: DWM wakes up the compositing loop on vsync of the primary monitor, composites all monitors, and then 'presents' to each monitor, which makes it to the monitor display on the NEXT vsync of the monitor (may be nearly an entire frame later).

DWMFLUSH: This DWM behavior now fully explains why DwmFlush() sometimes returns well after vsync – because it returns after DWM 'presents' (after composition).

WHY GAMES ARE NOT AFFECTED: Games take advantage of certain D3D modes (fullscreen) that bypass DWM.

CHROME+PRIMARY: When Chrome is run on the primary monitor (regardless of a dual monitor or not), Chrome can always successfully vsync to the primary monitor.  The corollary to this is that if you are running Chrome on a secondary monitor and vsync is not working, just change the secondary monitor to be the primary monitor to 'fix' vsync problems.  Annoying, but it works.

IE+SECONDARY: IE syncs to the primary monitor, even when run on the secondary monitor.  So when IE runs on a secondary monitor that operates at a different Hz than the primary, there is horrible jank.  I tested at 60Hz / 50Hz.  The reason for the jank is the interference pattern created by the two Hz – where a display frame is not receiving exactly one rendered frame.
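
The interference pattern can be made concrete with a small counting sketch. It assumes an idealized model (perfectly regular rendering and refresh, both starting at time zero, integer Hz); the function and struct names are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Number of display refreshes, over one second, that received 0, 1, or 2+
// rendered frames.
struct FramesPerRefresh {
  int zero = 0;
  int one = 0;
  int two_or_more = 0;
};

// Idealized model: rendered frame i is produced at time i / render_hz and is
// shown in display refresh floor(i * display_hz / render_hz).
FramesPerRefresh CountFramesPerRefresh(int render_hz, int display_hz) {
  FramesPerRefresh result;
  if (render_hz <= 0 || display_hz <= 0) return result;
  std::vector<int> per_refresh(static_cast<std::size_t>(display_hz), 0);
  for (int i = 0; i < render_hz; ++i) {
    long long slot = (static_cast<long long>(i) * display_hz) / render_hz;
    ++per_refresh[static_cast<std::size_t>(slot)];
  }
  for (int count : per_refresh) {
    if (count == 0) ++result.zero;
    else if (count == 1) ++result.one;
    else ++result.two_or_more;
  }
  return result;
}
```

Under this model, CountFramesPerRefresh(60, 50) reports that 10 of the 50 refreshes each second receive two rendered frames (so one of them is never seen), which is a 10 Hz beat. With the rates swapped, 10 of the 60 refreshes receive no new frame at all.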

CHROME+SECONDARY: When Chrome release is run on a secondary monitor in a primary=60 secondary=50 situation, vsynctester.com shows a very messy inter-frame graph, but the VSYNC indicator seems to work.  It shows that Chrome is attempting to vsync at 60Hz, but runs at (an average; inter-frame has large spikes) 50fps.  The presumption is that the Present(1) in ANGLE is syncing to 50Hz and not 60Hz?  Is this then back pressure from the 'GPU' syncing to the 50Hz monitor?

D3DVSYNC+SECONDARY: This 'seems' to work with vsynctester.com in a primary=60 secondary=50 situation.  Won’t know for sure until issue discussed in comment #69 is fixed.

ANGLE VSYNC + OTHER ISSUES: Testing the new "D3DVsync" feature is complicated by the fact that Chrome actually has a secondary vsync method which is *always* turned on (ANGLE vsync) -- it can not be turned off.  It would sure help to validate the new "D3DVsync" feature if ANGLE vsync could be selectively turned off ( issue 693761 ).  And resizing the Chrome app window still changes something regarding vsync in Chrome, so this still plays some role in things (see  issue 480361 )?

HOW I TESTED: Notebook computer with an internal display operating at 60Hz.  Notebook connected via HDMI to a Vizio HDTV running at 50Hz.  To set this up, right click on desktop; Screen Resolution; click on HDTV icon; click advanced settings; monitor tab; select 50Hz.

CAVEATS: When I use chrome://tracing to attempt to look at how ANGLE vsync was affecting things, sometimes it was clearly on and sometimes off.  I think the second tab was interfering with results.

DWM BEHAVIOR MAY BE CHANGING: When I discussed the issue with some MS people, they pointed out the DWM does not work well when primary/secondary Hz are *different* values and that "There is an RS3 feature to better enable DWM to handle this situation".

--> Does anyone have contacts within MS to investigate this and confirm / plan for this upcoming feature?



Guys, could you also test what happens after the laptop or tablet is undocked and the primary switches from the plugged-in monitor to the built-in monitor? I'm getting vsync and touch issues every time with Chrome in this scenario, and am forced to reset to resolve them. Win 10, Surface Pro 4, Chrome 56.0.2924.87.
Cheers.
Update: D3DVsync can now be activated via a flag on chrome://flags.

One outstanding issue that still needs to be addressed is that the v-sync timestamp is kind of broken when running on a secondary monitor. The implementation waits for the v-blank event for the monitor the Chrome window is on, but then it gets the timestamp and interval from DWM, which returns the timestamp synchronized with the primary monitor. So if the secondary monitor isn't exactly in sync with the primary one, the timestamp drifts relative to the timing of v-sync events. Not good!

The possible alternatives are:
1) Always wait for v-blank corresponding to the primary monitor. This would be more or less in sync with the currently existing timer based v-sync.

2) Ignore DWM and get the now() timestamp from the v-sync thread when it is awakened by the v-blank event. This will require this solution to also calculate the v-sync interval based on a recent history of v-sync timestamps. Based on my experiments this will make the accuracy of the RAF timestamp a bit worse compared to DWM.

3) A hybrid approach - use DWM timestamps when running on a primary monitor; otherwise use now() timestamp from the thread.

I am leaning towards #3.
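
A rough sketch of how alternatives 2 and 3 could fit together: estimate the interval from a short history of wake-up timestamps, and only use it when the window is not on the primary monitor. All names here are hypothetical, the averaging scheme is an assumption, and this is not the code that eventually landed.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Keeps the last few v-blank wake-up timestamps (microseconds) and derives an
// average interval from them. Hypothetical class, for illustration only.
class VSyncIntervalEstimator {
 public:
  explicit VSyncIntervalEstimator(std::size_t max_samples)
      : max_samples_(max_samples < 2 ? 2 : max_samples) {}

  void AddTimestamp(std::int64_t timestamp_us) {
    samples_.push_back(timestamp_us);
    if (samples_.size() > max_samples_) samples_.pop_front();
  }

  // Average spacing between the stored timestamps; 0 until two are available.
  std::int64_t IntervalUs() const {
    if (samples_.size() < 2) return 0;
    return (samples_.back() - samples_.front()) /
           static_cast<std::int64_t>(samples_.size() - 1);
  }

 private:
  const std::size_t max_samples_;
  std::deque<std::int64_t> samples_;
};

struct VSyncParams {
  std::int64_t timestamp_us;
  std::int64_t interval_us;
};

// Alternative 3: trust DWM on the primary monitor, otherwise use the wake-up
// time of the v-sync thread together with the estimated interval.
VSyncParams ChooseVSyncParams(bool on_primary_monitor,
                              VSyncParams dwm_params,
                              std::int64_t wake_time_us,
                              const VSyncIntervalEstimator& estimator) {
  if (on_primary_monitor) return dwm_params;
  return VSyncParams{wake_time_us, estimator.IntervalUs()};
}
```

Averaging over the whole window smooths out wake-up jitter on individual samples, at the cost of reacting more slowly if the refresh rate actually changes.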


Project Member

Comment 80 by bugdroid1@chromium.org, May 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/873b91f1c41220589c05e39a9424ffc99eab785a

commit 873b91f1c41220589c05e39a9424ffc99eab785a
Author: stanisc <stanisc@chromium.org>
Date: Wed May 17 01:08:31 2017

D3D V-sync: prevent timestamp drift on a secondary monitor

I got back some preliminary UMA data from the Canary experiment that
confirms the timestamp drift relative to the timing of the v-sync signal,
which makes the BeginImplFrameLatency2 UMA all over the place, with
a distribution that is spread evenly over the entire 0 - 16667 range.

This happens because D3D V-sync signal is generated based on v-blank
event for a display that contains the window (the current
display), but the timestamp is obtained from DWM which is based on
the most recent v-blank timing for the primary monitor. So if a
secondary monitor frequency is even slightly different that causes
v-sync / RAF timestamp drift that is clearly visible on some websites
like vsynctester.com.

One possible solution is to capture the timestamp when v-blank event
is received, but that seems to be a bit less smooth than the DWM
timestamp. So the compromise is to use DWM timing only when running on
a primary monitor; otherwise use the v-blank wake-up timestamp.
I've verified that this fixes BeginImplFrameLatency2 UMA distribution on
my setup where the secondary monitor refresh rate seems to differ from
the primary monitor by about 0.15 Hz.

BUG=467617, 680639 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2874833003
Cr-Commit-Position: refs/heads/master@{#472279}

[modify] https://crrev.com/873b91f1c41220589c05e39a9424ffc99eab785a/gpu/ipc/service/BUILD.gn
[modify] https://crrev.com/873b91f1c41220589c05e39a9424ffc99eab785a/gpu/ipc/service/gpu_vsync_provider_win.cc
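
A back-of-the-envelope calculation shows why even a 0.15 Hz mismatch spreads the latency evenly across the whole interval. The sketch below assumes a 60 Hz primary (consistent with the 0 - 16667 range mentioned in the commit) and a 60.15 Hz secondary; the function names are made up for illustration.

```cpp
#include <cmath>
#include <limits>

// How far the secondary monitor's v-blank slips relative to the primary's on
// every frame, in microseconds.
double DriftPerFrameUs(double primary_hz, double secondary_hz) {
  return std::fabs(1e6 / primary_hz - 1e6 / secondary_hz);
}

// Time for the phase offset to sweep through one whole frame interval, in
// seconds. This is the beat period of the two refresh rates.
double SecondsPerFullSweep(double primary_hz, double secondary_hz) {
  double diff_hz = std::fabs(primary_hz - secondary_hz);
  if (diff_hz == 0.0) return std::numeric_limits<double>::infinity();
  return 1.0 / diff_hz;
}
```

With these assumed rates the offset slips by roughly 41.6 microseconds per frame and sweeps the full interval about every 6.7 seconds, so samples collected over minutes of use land nearly uniformly between 0 and one frame.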

Another consideration is variable refresh rate.   

The brand new Apple iPad now supports custom refresh rates (24Hz, 48Hz) all the way up to 120Hz.  Also, on the PC, it's possible for refresh rate to vary while Chrome is running.  It is expected that (eventually) Apple will probably end up supporting variable refresh rate within the Safari web browser, at least to a limited extent (e.g. full-screen HTML5 video playback, playing back 24fps videos at 24Hz) and potentially full screen WebGL.

For example, a windowed videogame running in GSYNC/FreeSync (or HDMI 2.1 VRR / VESA Adaptive-Sync) next to a Chrome window.  requestAnimationFrame() rate varies with the framerate, but the framebuffer flipping is erratic.  One can obtain a GSYNC monitor (e.g. www.blurbusters.com/gsync/list), enable windowed GSYNC, run a game window alongside a Chrome window, and reproduce variable refresh rate stutter problems with Chrome.  However, variable refresh rate support should be natively baked-in.

As a W3C Web Platform Working Group Invited Expert, I'm collaborating on a standardization of support for variable refresh rates, see current ongoing work at https://github.com/w3c/html/issues/375 -- see the proposal at the bottom.
So far this effort has been more about the internal v-sync signal that is used for requestAnimationFrame and for kicking off BeginFrame events.

Frame buffer flipping is a related but separate issue that we need to address too. Chrome sets swap interval to 1, meaning that frame buffers should be swapped on the next v-blank. But the implementation relies on DWM to do that, which in the case of a multi-monitor setup seems to be tied to the primary monitor. At least that is my understanding.
Best solution regarding multiple monitor vsync would be to petition Microsoft to add multi monitor vsync support to the Windows DWM.

I think they said somewhere that they will maybe change the DWM to support it in redstone 3 or later.
A followup to comment 76 above, attached is a graph that shows how/why there is a 3+ frame delay in Chrome under Windows.
chrome-dwm-composition.jpg (52.4 KB)
Linking to related issues for reference:

  crbug.com/680639 - D3DVsync: multi-monitor work
  crbug.com/751340 - D3DVsync: improve UMA latency distribution

Comment 86 Deleted

Redstone 3 builds above build 16215 seem to break D3D V-sync when multiple monitors at 144 Hz and 60 Hz are used. FPS is constantly halved, as can be seen at vsynctester.com.

Curiously running process explorer seems to exacerbate the issue somewhat and it seems the same issue also affects Firefox's vsync method.

Running Chromium without D3D V-sync flag seems okay.
fxyydd, curious what results you get when you run the dwm-vsync-tests.exe from comment #11 above (just look at Hz displayed in the window of that app) and drag the app between primary/secondary displays?
fxyydd, please be specific.  What Hz is each display (primary/secondary) running at, or set to (display properties), and what Hz do you observe (vsynctester) while running Chromium on each display (with and without D3D V-sync)?
I can't reproduce it with the dwm-vsync-tests.exe, but it's behaving a bit better than on older builds.
My primary display is at 144hz with secondary at 75hz, Running chromium on both displays without D3D V-sync behavior is normal with expected results of 144 & 75 fps/hz.
Running it with D3D V-sync flag the refresh rate seems to display correctly now on latest builds but fps is bouncing between 115-125 on the 144hz display and constantly at 50fps on the 75hz one.

It also seems that fps is affected by other programs running in the foreground: running a fullscreen exclusive program like MPC-HC on the secondary monitor actually fixes the issue and makes it behave correctly with 144/75fps, while in windowed mode it actually throttles it to the monitor refresh rate. This behavior started appearing with the Redstone 16215 builds and it seems it is still not fixed on the RTM Candidate build 16299.15.
I'm actually unsure if the problem is with the implementation of vsync or if the fault lies with Microsoft.



Doh, I forgot to set the 144hz monitor as primary. With it set as primary, this happens:

D3D V-sync Enabled
https://www.youtube.com/watch?v=uGOBnGwZXiI&feature=youtu.be





Owner: ----
Status: Untriaged (was: Started)
I won't be able to work on this anymore.

Status: Available (was: Untriaged)
This is still valid.
Components: Internals>GPU>Scheduling
Components: Internals>Compositing
