Enable zero-copy NV12 decoder swap chains on Windows wherever supported |
||||||||
Issue descriptionIt was tried before on a device with Intel Skylake GPU. Even though documentation says NV12 swapchains are supported, but we failed to create one on that device. However, clearly it is supported and enabled on ChromeOS with Intel Kabylake GPUs (See https://chromium-review.googlesource.com/c/chromium/src/+/569144). So we should look into that for Windows platform also. There are two tasks. 1) Use NV12 swapchains, so we don't have to copy from one format to another. This might give us a minor perf improvement. 2) Look into possibilities for video decoder to directly decode into swapchain's buffer, i.e., enabling zero-copy path. Hopefully this gives us some non minor perf improvement. ⛆ |
|
|
,
Jul 2
Just to add some information about the protected video. As I found out the swap chain was not promoted to HW overlay and worked with Intel on this issue. Intel suspected something wrong with the NV12 to YUY2 BLT due to no NV12 swapchian support. I did a quick hack to create NV12 swapchains and remove the BLT (the playback didn't show correct frames since this is just a hack). The NV12 swapchian was still not promoted to HW overlay. So we are sure the HW overlay issue is not blocked by the NV12 swapchain support unless the issue is hidden behind something else.
,
Jul 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675 commit fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675 Author: Sunny Sachanandani <sunnyps@chromium.org> Date: Tue Jul 10 18:08:29 2018 Collect NV12, YUY2, and BGRA overlay support information in GPUInfo We intend to use zero-copy for video decoder to NV12 overlays so we want to show if it's supported in chrome://gpu and record it in UMA. This replaces GPU.DirectComposition.OverlaySupportFlags UMA histogram with specific ones for each format. R=piman BUG=848093 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel Change-Id: I49530d699bcfd91f8f39498e7c6d45145d461f77 Reviewed-on: https://chromium-review.googlesource.com/1121622 Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> Reviewed-by: Robert Kaplow <rkaplow@chromium.org> Reviewed-by: Daniel Cheng <dcheng@chromium.org> Reviewed-by: Antoine Labour <piman@chromium.org> Cr-Commit-Position: refs/heads/master@{#573809} [modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/content/browser/devtools/protocol/system_info_handler.cc [modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/content/browser/gpu/gpu_internals_ui.cc [modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/config/gpu_info.cc [modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/config/gpu_info.h [modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/common/gpu_info.mojom [modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/common/gpu_info_struct_traits.cc [modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/common/gpu_info_struct_traits.h [modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/common/struct_traits_unittest.cc [modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/service/direct_composition_surface_win.cc [modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/service/direct_composition_surface_win.h [modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/gpu/ipc/service/gpu_init.cc [modify] https://crrev.com/fc66e9a78ae1b5a9cc6934a2c01e81de79dd0675/tools/metrics/histograms/histograms.xml
,
Aug 10
This bug is for tracking zero-copy NV12. Issue 869677 is for enabling NV12 with the VideoProcessorBlt.
,
Sep 1
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0ed13b743e11f5d3a13b04d8e35a5c6c4c61393a commit 0ed13b743e11f5d3a13b04d8e35a5c6c4c61393a Author: Sunny Sachanandani <sunnyps@chromium.org> Date: Sat Sep 01 02:55:50 2018 Split out VideoProcessorBlt from PresentToSwapChain Zero-copy NV12 will have to skip VideoProcessorBlt so split it out, and cleanup some of the color space calculation logic as well. Bug: 848093 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel Change-Id: I260d2f060a957383054b46bdffdd2105baf2a896 Reviewed-on: https://chromium-review.googlesource.com/1171842 Reviewed-by: ccameron <ccameron@chromium.org> Reviewed-by: Zhenyao Mo <zmo@chromium.org> Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> Cr-Commit-Position: refs/heads/master@{#588254} [modify] https://crrev.com/0ed13b743e11f5d3a13b04d8e35a5c6c4c61393a/gpu/ipc/service/direct_composition_surface_win.cc [modify] https://crrev.com/0ed13b743e11f5d3a13b04d8e35a5c6c4c61393a/ui/gfx/color_space_win.cc [modify] https://crrev.com/0ed13b743e11f5d3a13b04d8e35a5c6c4c61393a/ui/gfx/color_space_win.h
,
Sep 27
https://chromium-review.googlesource.com/c/chromium/src/+/1239729 contains a working-ish prototype (tested on Kaby Lake Lenovo Yoga 710). Video seems to play ok, but the transform is off by a few pixels, and the colors look wrong (darker compared to GLRenderer). It doesn't work on my desktop (CreateDecodeSwapChain... fails with E_INVALIDARG). The initial version wasn't working on either desktop or test device. That was because we're setting D3D11_RESOURCE_MISC_SHARED flag in D3D11VideoDecoder::CreatePictureBuffers(). That flag is needed for cross-device synchronization, but since we create and use the decoder buffers on the same D3D11 device (from ANGLE) we don't need it AFAICT. In any case we don't use a keyed mutex in D3D11VideoDecoder. Finding the above restriction wasn't easy. There's nothing about this in documentation, and the returned DXGI_ERROR_INVALID_CALL isn't useful. So I stepped through the assembly in CreateDecodeSwapChainForCompositionSurfaceHandle. This is the interesting bit: 00007FFB89B23255 mov rax,qword ptr [rsi] 00007FFB89B23258 lea r8,[rsp+40h] 00007FFB89B2325D and qword ptr [rsp+40h],0 00007FFB89B23263 lea rdx,[_GUID_6f15aaf2_d208_4e89_9ab4_489535d34f9c (07FFB89B47180h)] 00007FFB89B2326A mov rcx,rsi 00007FFB89B2326D mov rax,qword ptr [rax] 00007FFB89B23270 call qword ptr [__guard_dispatch_icall_fptr (07FFB89B3D9C8h)] 00007FFB89B23276 mov ebx,eax 00007FFB89B23278 test eax,eax 00007FFB89B2327A js CDXGIFactory::CreateDecodeSwapChainForCompositionSurfaceHandle+179h (07FFB89B23349h) 00007FFB89B23280 mov rcx,qword ptr [rsp+40h] 00007FFB89B23285 lea rdx,[rbp-20h] 00007FFB89B23289 mov rax,qword ptr [rcx] 00007FFB89B2328C mov rax,qword ptr [rax+50h] 00007FFB89B23290 call qword ptr [__guard_dispatch_icall_fptr (07FFB89B3D9C8h)] 00007FFB89B23296 test dword ptr [rbp+8],902h 00007FFB89B2329D je CDXGIFactory::CreateDecodeSwapChainForCompositionSurfaceHandle+0D9h (07FFB89B232A9h) 00007FFB89B2329F mov ebx,887A0001h The test and jump at the end is essentially if ([rbp+8] & 0x902) return 0x887A0001 (DXGI_ERROR_INVALID_CALL). _GUID_6f15aaf2_d208_4e89_9ab4_489535d34f9c corresponds to ID3D11Texture2D so this is checking something on the texture, probably from D3D11_TEXTURE2D_DESC. 0x902 looks like something that could be OR'd from 0x800, 0x100, and 0x2. Searching for these constants in d3d11.h confirms that these are D3D11_RESOURCE_MISC_SHARED_NTHANDLE, D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX, and D3D11_RESOURCE_MISC_SHARED respectively. DWORD [rbp+8] is 0x2 (D3D11_RESOURCE_MISC_SHARED) which we set in D3D11VideoDecoder::CreatePictureBuffers(). Removing that makes this work on the test device, though not on my workstation (creating the swap chain instance fails probably because the driver doesn't support it).
,
Sep 27
Great working debugging and finding out the root cause of this failure!
,
Oct 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/aa687ab65ba15a898429387a7a6d5ccd2fac9f6d commit aa687ab65ba15a898429387a7a6d5ccd2fac9f6d Author: Sunny Sachanandani <sunnyps@chromium.org> Date: Wed Oct 10 22:38:10 2018 Power measurement script fixes Fix import path of ipg_utils in power measurement script since it's not run via telemetry harness. Misc other fixes to script's output. Bug: 848093 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel Change-Id: Id5eafc4bc825ccdae123eed09642d029ba8cf0cf Reviewed-on: https://chromium-review.googlesource.com/c/1271986 Reviewed-by: Zhenyao Mo <zmo@chromium.org> Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> Cr-Commit-Position: refs/heads/master@{#598548} [rename] https://crrev.com/aa687ab65ba15a898429387a7a6d5ccd2fac9f6d/content/test/gpu/measure_power_win_intel.py
,
Oct 13
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4c462de2b2d5c0ce2ee3f8f0c3c390cf1cfadb56 commit 4c462de2b2d5c0ce2ee3f8f0c3c390cf1cfadb56 Author: Sunny Sachanandani <sunnyps@chromium.org> Date: Sat Oct 13 01:17:02 2018 Refactor direct composition swap chain and visual logic The major changes in this CL are: 1) Move logic for updating visuals to SwapChainPresenter, and split out logic for updating backbuffer visual. 2) Rebuild the entire visual tree if any visual changes. 3) Reduce coupling between DCLayerTree and DirectCompositionSurfaceWin. 4) Add lots of comments for DCLayerTree and SwapChainPresenter. This makes the code simpler to understand, and paves the way for splitting this file, and adding new presentation modes such as zero-copy decode swap chains. It also has the side-effect of allowing reuse of swap chains when a video switches from underlay to overlay in the common case of single video because the backbuffer visual is managed separately now. Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel Change-Id: Icfe0fd7ff923b26781f3da877d905e20f665c2f9 Bug: 848093 Reviewed-on: https://chromium-review.googlesource.com/c/1256365 Reviewed-by: Zhenyao Mo <zmo@chromium.org> Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> Cr-Commit-Position: refs/heads/master@{#599459} [modify] https://crrev.com/4c462de2b2d5c0ce2ee3f8f0c3c390cf1cfadb56/gpu/ipc/service/direct_composition_surface_win.cc [modify] https://crrev.com/4c462de2b2d5c0ce2ee3f8f0c3c390cf1cfadb56/gpu/ipc/service/direct_composition_surface_win.h [modify] https://crrev.com/4c462de2b2d5c0ce2ee3f8f0c3c390cf1cfadb56/gpu/ipc/service/direct_composition_surface_win_unittest.cc
,
Dec 4
,
Dec 5
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8918dc67ac4d462d03c24c962038ff638066c46f commit 8918dc67ac4d462d03c24c962038ff638066c46f Author: Sunny Sachanandani <sunnyps@chromium.org> Date: Wed Dec 05 02:01:51 2018 Zero copy decode swap chain for presenting overlays on Windows This CL adds support for using decode swap chains for presenting overlays when the following conditions are met: 1) DirectCompositionUseNV12DecodeSwapChain feature flag is enabled, 2) NV12 hardware overlays are supported, 3) This code path hasn't failed before for a particular SwapChainPresenter instance, and 4) The video frame is created from a compatible resource which is only supported by D3D11VideoDecoder at the moment. Histograms are added to track failures in creating and presenting to decode swap chains. In case of failure it falls back to blit. This CL also removes the resource sharing flag from D3D11VideoDecoder's texture because it's not supported for decode swap chains and not needed in any case because it uses the same D3D device as ANGLE. The resource sharing flag was in the original VDA implementation and it was needed then because of using a different D3D device (like DXVA VDA): https://codereview.chromium.org/2534313004 To test run Chrome with: --enable-direct-composition-layers --enable-features=D3D11VideoDecoder,DirectCompositionUseNV12DecodeSwapChain Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel Change-Id: I6803e5dc17508cf0db1dd99a6d19623580e4005e Bug: 848093 Reviewed-on: https://chromium-review.googlesource.com/c/1239729 Reviewed-by: Ilya Sherman <isherman@chromium.org> Reviewed-by: Zhenyao Mo <zmo@chromium.org> Reviewed-by: Frank Liberato <liberato@chromium.org> Reviewed-by: Antoine Labour <piman@chromium.org> Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> Cr-Commit-Position: refs/heads/master@{#613812} [modify] https://crrev.com/8918dc67ac4d462d03c24c962038ff638066c46f/gpu/config/gpu_finch_features.cc [modify] https://crrev.com/8918dc67ac4d462d03c24c962038ff638066c46f/gpu/config/gpu_finch_features.h [modify] https://crrev.com/8918dc67ac4d462d03c24c962038ff638066c46f/gpu/ipc/service/direct_composition_surface_win.cc [modify] https://crrev.com/8918dc67ac4d462d03c24c962038ff638066c46f/media/gpu/windows/d3d11_video_decoder.cc [modify] https://crrev.com/8918dc67ac4d462d03c24c962038ff638066c46f/tools/metrics/histograms/histograms.xml
,
Dec 21
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/61ee91417d2d26be7de7b1b9667a5f2a183743c4 commit 61ee91417d2d26be7de7b1b9667a5f2a183743c4 Author: Sunny Sachanandani <sunnyps@chromium.org> Date: Fri Dec 21 03:04:31 2018 Early out if decode swap chain presents same image second time Presenting same image multiple times might be responsible for increased memory consumption at high percentiles with decode swap chain enabled. We skip presenting same image for the blitting code path so do the same here, and hope that fixes the regression. Bug: 848093 Change-Id: Ic1aa1718062bdc8fcdddea35638364705fe6430f Reviewed-on: https://chromium-review.googlesource.com/c/1382509 Reviewed-by: Zhenyao Mo <zmo@chromium.org> Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> Cr-Commit-Position: refs/heads/master@{#618442} [modify] https://crrev.com/61ee91417d2d26be7de7b1b9667a5f2a183743c4/gpu/ipc/service/direct_composition_surface_win.cc
,
Dec 21
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e1f01a1580387ffd655ad2504e9336d79b4cb5c1 commit e1f01a1580387ffd655ad2504e9336d79b4cb5c1 Author: Sunny Sachanandani <sunnyps@chromium.org> Date: Fri Dec 21 03:10:31 2018 Enable decode swap chain feature on waterfall Enables DirectCompositionUseNV12DecodeSwapChain and D3D11VideoDecoder features on waterfall. Bug: 848093 Change-Id: I561f5910a32cbe2adb1c47bd9c35df355bdce1a4 Reviewed-on: https://chromium-review.googlesource.com/c/1364341 Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> Reviewed-by: Ilya Sherman <isherman@chromium.org> Reviewed-by: Frank Liberato <liberato@chromium.org> Cr-Commit-Position: refs/heads/master@{#618447} [modify] https://crrev.com/e1f01a1580387ffd655ad2504e9336d79b4cb5c1/testing/variations/fieldtrial_testing_config.json
,
Dec 21
,
Dec 21
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/311d14e9566d4f9f082d6b5c2fd309ed4c450e8e commit 311d14e9566d4f9f082d6b5c2fd309ed4c450e8e Author: Sunny Sachanandani <sunnyps@chromium.org> Date: Fri Dec 21 17:26:26 2018 Revert "Enable decode swap chain feature on waterfall" This reverts commit e1f01a1580387ffd655ad2504e9336d79b4cb5c1. Reason for revert: Makes video pixel tests flaky: crbug.com/917399 Original change's description: > Enable decode swap chain feature on waterfall > > Enables DirectCompositionUseNV12DecodeSwapChain and D3D11VideoDecoder > features on waterfall. > > Bug: 848093 > Change-Id: I561f5910a32cbe2adb1c47bd9c35df355bdce1a4 > Reviewed-on: https://chromium-review.googlesource.com/c/1364341 > Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> > Reviewed-by: Ilya Sherman <isherman@chromium.org> > Reviewed-by: Frank Liberato <liberato@chromium.org> > Cr-Commit-Position: refs/heads/master@{#618447} TBR=isherman@chromium.org,sunnyps@chromium.org,liberato@chromium.org Change-Id: I13cb41b59140475eb46e789e8c94fc4e610ab614 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 848093, 917399 Reviewed-on: https://chromium-review.googlesource.com/c/1388645 Reviewed-by: Sunny Sachanandani <sunnyps@chromium.org> Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> Cr-Commit-Position: refs/heads/master@{#618549} [modify] https://crrev.com/311d14e9566d4f9f082d6b5c2fd309ed4c450e8e/testing/variations/fieldtrial_testing_config.json
,
Dec 22
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/dbf4142cb21869051c3a7c7e3301abf3138485e0 commit dbf4142cb21869051c3a7c7e3301abf3138485e0 Author: Sunny Sachanandani <sunnyps@chromium.org> Date: Sat Dec 22 03:08:57 2018 Set color space on decode swap chain Use BT 709 primaries by default, use nominal (limited) range if color space is Rec 709 or 601, and use xvYCC for full range encoding. Bug: 848093, 917399 Change-Id: Iab54c9523826853265cf92cdd5b806fd82cd03e1 Reviewed-on: https://chromium-review.googlesource.com/c/1389504 Reviewed-by: Zhenyao Mo <zmo@chromium.org> Commit-Queue: Sunny Sachanandani <sunnyps@chromium.org> Cr-Commit-Position: refs/heads/master@{#618719} [modify] https://crrev.com/dbf4142cb21869051c3a7c7e3301abf3138485e0/gpu/ipc/service/direct_composition_surface_win.cc
,
Jan 15
|
|||||
►
Sign in to add a comment |
||||||||
Comment 1 by sunn...@chromium.org
, Jun 30 2018