New issue
Advanced search Search tips

Issue 848093 link

Starred by 5 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocked on: View detail
issue 911847
issue 917399
issue 869677



Sign in to add a comment

Enable zero-copy NV12 decoder swap chains on Windows wherever supported

Project Member Reported by zmo@chromium.org, May 30 2018

Issue description

It was tried before on a device with Intel Skylake GPU. Even though documentation says NV12 swapchains are supported, but we failed to create one on that device.

However, clearly it is supported and enabled on ChromeOS with Intel Kabylake GPUs (See https://chromium-review.googlesource.com/c/chromium/src/+/569144). So we should look into that for Windows platform also.

There are two tasks.

1) Use NV12 swapchains, so we don't have to copy from one format to another. This might give us a minor perf improvement.

2) Look into possibilities for video decoder to directly decode into swapchain's buffer, i.e., enabling zero-copy path. Hopefully this gives us some non minor perf improvement.
 
This is actually way simpler than I imagined at least for D3D11VideoDecoder which is the new mojo based VDA.

D3D11VideoDecoder creates the texture array backing for its picture buffers here:
https://cs.chromium.org/chromium/src/media/gpu/windows/d3d11_video_decoder_impl.cc?rcl=ac4647011531e18d597130dd57b313af514961ba&l=325

The GLImageDXGI we get in DirectCompositionSurfaceWin has the array index (called level there) and the texture array so it should be able to use CreateDecodeSwapChainForCompositionSurfaceHandle directly, and use the array index for Present calls. We'll need to recreate the swap chain when the backing texture changes, but that should be easy to track across frames.

In issue 858286, we're discussing resetting the swap chain when the crypto session is torn down. If we recreate the swap chain when the texture changes, we get this for free because the texture has to be recreated after crypto teardown anyway.

Note: this probably won't work with the old, deprecated, but in production :) VDA video decoder because it creates individual textures using command buffer AFAICT.
Just to add some information about the protected video. As I found out the swap chain was not promoted to HW overlay and worked with Intel on this issue. Intel suspected something wrong with the NV12 to YUY2 BLT due to no NV12 swapchian support. I did a quick hack to create NV12 swapchains and remove the BLT (the playback didn't show correct frames since this is just a hack). The NV12 swapchian was still not promoted to HW overlay. So we are sure the HW overlay issue is not blocked by the NV12 swapchain support unless the issue is hidden behind something else.
Project Member

Comment 3 by bugdroid1@chromium.org, Jul 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675

commit fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Tue Jul 10 18:08:29 2018

Collect NV12, YUY2, and BGRA overlay support information in GPUInfo

We intend to use zero-copy for video decoder to NV12 overlays so we want
to show if it's supported in chrome://gpu and record it in UMA.

This replaces GPU.DirectComposition.OverlaySupportFlags UMA histogram
with specific ones for each format.

R=piman
BUG=848093

Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I49530d699bcfd91f8f39498e7c6d45145d461f77
Reviewed-on: https://chromium-review.googlesource.com/1121622
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Reviewed-by: Robert Kaplow <rkaplow@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#573809}
[modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/content/browser/devtools/protocol/system_info_handler.cc
[modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/content/browser/gpu/gpu_internals_ui.cc
[modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/config/gpu_info.cc
[modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/config/gpu_info.h
[modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/common/gpu_info.mojom
[modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/common/gpu_info_struct_traits.cc
[modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/common/gpu_info_struct_traits.h
[modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/common/struct_traits_unittest.cc
[modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/service/direct_composition_surface_win.cc
[modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/service/direct_composition_surface_win.h
[modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/service/gpu_init.cc
[modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/tools/metrics/histograms/histograms.xml

Blockedon: 869677
Summary: Enable zero-copy NV12 decoder swap chains on Windows wherever supported (was: Enabling NV12 swapchains on Windows wherever supported)
This bug is for tracking zero-copy NV12. Issue 869677 is for enabling NV12 with the VideoProcessorBlt.
Project Member

Comment 5 by bugdroid1@chromium.org, Sep 1

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0ed13b743e11f5d3a13b04d8e35a5c6c4c61393a

commit 0ed13b743e11f5d3a13b04d8e35a5c6c4c61393a
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Sat Sep 01 02:55:50 2018

Split out VideoProcessorBlt from PresentToSwapChain

Zero-copy NV12 will have to skip VideoProcessorBlt so split it out, and
cleanup some of the color space calculation logic as well.

Bug: 848093
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I260d2f060a957383054b46bdffdd2105baf2a896
Reviewed-on: https://chromium-review.googlesource.com/1171842
Reviewed-by: ccameron <ccameron@chromium.org>
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#588254}
[modify] https://crrev.com/0ed13b743e11f5d3a13b04d8e35a5c6c4c61393a/gpu/ipc/service/direct_composition_surface_win.cc
[modify] https://crrev.com/0ed13b743e11f5d3a13b04d8e35a5c6c4c61393a/ui/gfx/color_space_win.cc
[modify] https://crrev.com/0ed13b743e11f5d3a13b04d8e35a5c6c4c61393a/ui/gfx/color_space_win.h

https://chromium-review.googlesource.com/c/chromium/src/+/1239729 contains a working-ish prototype (tested on Kaby Lake Lenovo Yoga 710).  Video seems to play ok, but the transform is off by a few pixels, and the colors look wrong (darker compared to GLRenderer).

It doesn't work on my desktop (CreateDecodeSwapChain... fails with E_INVALIDARG).

The initial version wasn't working on either desktop or test device.  That was because we're setting D3D11_RESOURCE_MISC_SHARED flag in D3D11VideoDecoder::CreatePictureBuffers().  That flag is needed for cross-device synchronization, but since we create and use the decoder buffers on the same D3D11 device (from ANGLE) we don't need it AFAICT.  In any case we don't use a keyed mutex in D3D11VideoDecoder.

Finding the above restriction wasn't easy.  There's nothing about this in documentation, and the returned DXGI_ERROR_INVALID_CALL isn't useful.  So I stepped through the assembly in CreateDecodeSwapChainForCompositionSurfaceHandle.  This is the interesting bit:

00007FFB89B23255  mov         rax,qword ptr [rsi]  
00007FFB89B23258  lea         r8,[rsp+40h]  
00007FFB89B2325D  and         qword ptr [rsp+40h],0  
00007FFB89B23263  lea         rdx,[_GUID_6f15aaf2_d208_4e89_9ab4_489535d34f9c (07FFB89B47180h)]  
00007FFB89B2326A  mov         rcx,rsi  
00007FFB89B2326D  mov         rax,qword ptr [rax]  
00007FFB89B23270  call        qword ptr [__guard_dispatch_icall_fptr (07FFB89B3D9C8h)]  
00007FFB89B23276  mov         ebx,eax  
00007FFB89B23278  test        eax,eax  
00007FFB89B2327A  js          CDXGIFactory::CreateDecodeSwapChainForCompositionSurfaceHandle+179h (07FFB89B23349h)  
00007FFB89B23280  mov         rcx,qword ptr [rsp+40h]  
00007FFB89B23285  lea         rdx,[rbp-20h]  
00007FFB89B23289  mov         rax,qword ptr [rcx]  
00007FFB89B2328C  mov         rax,qword ptr [rax+50h]  
00007FFB89B23290  call        qword ptr [__guard_dispatch_icall_fptr (07FFB89B3D9C8h)]  
00007FFB89B23296  test        dword ptr [rbp+8],902h  
00007FFB89B2329D  je          CDXGIFactory::CreateDecodeSwapChainForCompositionSurfaceHandle+0D9h (07FFB89B232A9h)  
00007FFB89B2329F  mov         ebx,887A0001h  

The test and jump at the end is essentially if ([rbp+8] & 0x902) return 0x887A0001 (DXGI_ERROR_INVALID_CALL).

_GUID_6f15aaf2_d208_4e89_9ab4_489535d34f9c corresponds to ID3D11Texture2D so this is checking something on the texture, probably from D3D11_TEXTURE2D_DESC.

0x902 looks like something that could be OR'd from 0x800, 0x100, and 0x2.  Searching for these constants in d3d11.h confirms that these are D3D11_RESOURCE_MISC_SHARED_NTHANDLE, D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX, and D3D11_RESOURCE_MISC_SHARED respectively. DWORD [rbp+8] is 0x2 (D3D11_RESOURCE_MISC_SHARED) which we set in D3D11VideoDecoder::CreatePictureBuffers().  Removing that makes this work on the test device, though not on my workstation (creating the swap chain instance fails probably because the driver doesn't support it).
Great working debugging and finding out the root cause of this failure!
Project Member

Comment 8 by bugdroid1@chromium.org, Oct 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/aa687ab65ba15a898429387a7a6d5ccd2fac9f6d

commit aa687ab65ba15a898429387a7a6d5ccd2fac9f6d
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Wed Oct 10 22:38:10 2018

Power measurement script fixes

Fix import path of ipg_utils in power measurement script since it's not
run via telemetry harness.  Misc other fixes to script's output.

Bug: 848093
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Id5eafc4bc825ccdae123eed09642d029ba8cf0cf
Reviewed-on: https://chromium-review.googlesource.com/c/1271986
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#598548}
[rename] https://crrev.com/aa687ab65ba15a898429387a7a6d5ccd2fac9f6d/content/test/gpu/measure_power_win_intel.py

Project Member

Comment 9 by bugdroid1@chromium.org, Oct 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4c462de2b2d5c0ce2ee3f8f0c3c390cf1cfadb56

commit 4c462de2b2d5c0ce2ee3f8f0c3c390cf1cfadb56
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Sat Oct 13 01:17:02 2018

Refactor direct composition swap chain and visual logic

The major changes in this CL are:
1) Move logic for updating visuals to SwapChainPresenter, and split out
   logic for updating backbuffer visual.
2) Rebuild the entire visual tree if any visual changes.
3) Reduce coupling between DCLayerTree and DirectCompositionSurfaceWin.
4) Add lots of comments for DCLayerTree and SwapChainPresenter.

This makes the code simpler to understand, and paves the way for
splitting this file, and adding new presentation modes such as zero-copy
decode swap chains.

It also has the side-effect of allowing reuse of swap chains when a
video switches from underlay to overlay in the common case of single
video because the backbuffer visual is managed separately now.

Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Icfe0fd7ff923b26781f3da877d905e20f665c2f9
Bug: 848093
Reviewed-on: https://chromium-review.googlesource.com/c/1256365
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#599459}
[modify] https://crrev.com/4c462de2b2d5c0ce2ee3f8f0c3c390cf1cfadb56/gpu/ipc/service/direct_composition_surface_win.cc
[modify] https://crrev.com/4c462de2b2d5c0ce2ee3f8f0c3c390cf1cfadb56/gpu/ipc/service/direct_composition_surface_win.h
[modify] https://crrev.com/4c462de2b2d5c0ce2ee3f8f0c3c390cf1cfadb56/gpu/ipc/service/direct_composition_surface_win_unittest.cc

Blockedon: 911847
Project Member

Comment 11 by bugdroid1@chromium.org, Dec 5

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8918dc67ac4d462d03c24c962038ff638066c46f

commit 8918dc67ac4d462d03c24c962038ff638066c46f
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Wed Dec 05 02:01:51 2018

Zero copy decode swap chain for presenting overlays on Windows

This CL adds support for using decode swap chains for presenting
overlays when the following conditions are met:
1) DirectCompositionUseNV12DecodeSwapChain feature flag is enabled,
2) NV12 hardware overlays are supported,
3) This code path hasn't failed before for a particular
   SwapChainPresenter instance, and
4) The video frame is created from a compatible resource which is only
   supported by D3D11VideoDecoder at the moment.

Histograms are added to track failures in creating and presenting to
decode swap chains.  In case of failure it falls back to blit.

This CL also removes the resource sharing flag from D3D11VideoDecoder's
texture because it's not supported for decode swap chains and not needed
in any case because it uses the same D3D device as ANGLE.  The resource
sharing flag was in the original VDA implementation and it was needed
then because of using a different D3D device (like DXVA VDA):
https://codereview.chromium.org/2534313004

To test run Chrome with:

--enable-direct-composition-layers --enable-features=D3D11VideoDecoder,DirectCompositionUseNV12DecodeSwapChain

Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I6803e5dc17508cf0db1dd99a6d19623580e4005e
Bug: 848093
Reviewed-on: https://chromium-review.googlesource.com/c/1239729
Reviewed-by: Ilya Sherman <isherman@chromium.org>
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Reviewed-by: Frank Liberato <liberato@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#613812}
[modify] https://crrev.com/8918dc67ac4d462d03c24c962038ff638066c46f/gpu/config/gpu_finch_features.cc
[modify] https://crrev.com/8918dc67ac4d462d03c24c962038ff638066c46f/gpu/config/gpu_finch_features.h
[modify] https://crrev.com/8918dc67ac4d462d03c24c962038ff638066c46f/gpu/ipc/service/direct_composition_surface_win.cc
[modify] https://crrev.com/8918dc67ac4d462d03c24c962038ff638066c46f/media/gpu/windows/d3d11_video_decoder.cc
[modify] https://crrev.com/8918dc67ac4d462d03c24c962038ff638066c46f/tools/metrics/histograms/histograms.xml

Project Member

Comment 12 by bugdroid1@chromium.org, Dec 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/61ee91417d2d26be7de7b1b9667a5f2a183743c4

commit 61ee91417d2d26be7de7b1b9667a5f2a183743c4
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Fri Dec 21 03:04:31 2018

Early out if decode swap chain presents same image second time

Presenting same image multiple times might be responsible for increased
memory consumption at high percentiles with decode swap chain enabled.
We skip presenting same image for the blitting code path so do the same
here, and hope that fixes the regression.

Bug: 848093
Change-Id: Ic1aa1718062bdc8fcdddea35638364705fe6430f
Reviewed-on: https://chromium-review.googlesource.com/c/1382509
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#618442}
[modify] https://crrev.com/61ee91417d2d26be7de7b1b9667a5f2a183743c4/gpu/ipc/service/direct_composition_surface_win.cc

Project Member

Comment 13 by bugdroid1@chromium.org, Dec 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e1f01a1580387ffd655ad2504e9336d79b4cb5c1

commit e1f01a1580387ffd655ad2504e9336d79b4cb5c1
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Fri Dec 21 03:10:31 2018

Enable decode swap chain feature on waterfall

Enables DirectCompositionUseNV12DecodeSwapChain and D3D11VideoDecoder
features on waterfall.

Bug: 848093
Change-Id: I561f5910a32cbe2adb1c47bd9c35df355bdce1a4
Reviewed-on: https://chromium-review.googlesource.com/c/1364341
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Reviewed-by: Ilya Sherman <isherman@chromium.org>
Reviewed-by: Frank Liberato <liberato@chromium.org>
Cr-Commit-Position: refs/heads/master@{#618447}
[modify] https://crrev.com/e1f01a1580387ffd655ad2504e9336d79b4cb5c1/testing/variations/fieldtrial_testing_config.json

Blockedon: 917399
Project Member

Comment 15 by bugdroid1@chromium.org, Dec 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/311d14e9566d4f9f082d6b5c2fd309ed4c450e8e

commit 311d14e9566d4f9f082d6b5c2fd309ed4c450e8e
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Fri Dec 21 17:26:26 2018

Revert "Enable decode swap chain feature on waterfall"

This reverts commit e1f01a1580387ffd655ad2504e9336d79b4cb5c1.

Reason for revert: Makes video pixel tests flaky: crbug.com/917399

Original change's description:
> Enable decode swap chain feature on waterfall
> 
> Enables DirectCompositionUseNV12DecodeSwapChain and D3D11VideoDecoder
> features on waterfall.
> 
> Bug: 848093
> Change-Id: I561f5910a32cbe2adb1c47bd9c35df355bdce1a4
> Reviewed-on: https://chromium-review.googlesource.com/c/1364341
> Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
> Reviewed-by: Ilya Sherman <isherman@chromium.org>
> Reviewed-by: Frank Liberato <liberato@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#618447}

TBR=isherman@chromium.org,sunnyps@chromium.org,liberato@chromium.org

Change-Id: I13cb41b59140475eb46e789e8c94fc4e610ab614
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 848093, 917399
Reviewed-on: https://chromium-review.googlesource.com/c/1388645
Reviewed-by: Sunny Sachanandani <sunnyps@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#618549}
[modify] https://crrev.com/311d14e9566d4f9f082d6b5c2fd309ed4c450e8e/testing/variations/fieldtrial_testing_config.json

Project Member

Comment 16 by bugdroid1@chromium.org, Dec 22

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/dbf4142cb21869051c3a7c7e3301abf3138485e0

commit dbf4142cb21869051c3a7c7e3301abf3138485e0
Author: Sunny Sachanandani <sunnyps@chromium.org>
Date: Sat Dec 22 03:08:57 2018

Set color space on decode swap chain

Use BT 709 primaries by default, use nominal (limited) range if color
space is Rec 709 or 601, and use xvYCC for full range encoding.

Bug: 848093, 917399
Change-Id: Iab54c9523826853265cf92cdd5b806fd82cd03e1
Reviewed-on: https://chromium-review.googlesource.com/c/1389504
Reviewed-by: Zhenyao Mo <zmo@chromium.org>
Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org>
Cr-Commit-Position: refs/heads/master@{#618719}
[modify] https://crrev.com/dbf4142cb21869051c3a7c7e3301abf3138485e0/gpu/ipc/service/direct_composition_surface_win.cc

Labels: -Pri-2 Pri-1

Sign in to add a comment