New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 912295 link

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug
72

Blocked on:
issue 920510

Blocking:
issue 909926



Sign in to add a comment

Reduce amount of VaVDA requested/allocated PictureBuffers/VASurfaces

Project Member Reported by mcasas@chromium.org, Dec 5

Issue description

VaVDA queries the decoder's needed number of PictureBuffers [1];
the returned number GetRequiredNumOfPictures() [2] collates:
a- the maximum number of reference frames needed/desired by the codec:
 VP8 = 4, VP9 = 8, H264= varies with the DPB somewhere 4-12
b- the maximum number of VideoFrames in the pipeline [3]
 kMaxVideoFrames = 4
c- a bit of grease in the transmission = 2

This is, in general, correct when the hw accelerated decoders 
decode directly onto PictureBuffers, but the VaVDA does not work
like this (unless under a series of circumstances, see 
|decode_using_client_picture_buffers_| [4] (essentially, modern
CPU architectures and VP9, also see crbug.com/822346).

When we _don't_ decode onto client buffers, the value of 
GetRequiredNumOfPictures() is used for both:
- the client allocation [5a] (done by client, |requested_num_pics_|)
- the internal VaAPI VASurface allocation [5b]

These two numbers are too high, which is reflected by the fact
that the associated TRACE_COUNTERs are never reaching high values.

Instead, when not |decode_using_client_picture_buffers_|:
- the client should allocate the aforementioned b- (kMaxVideoFrames)
- VaAPI should only allocate the aforementioned a- (max num ref frames)


[1] https://cs.chromium.org/chromium/src/media/gpu/accelerated_video_decoder.h?type=cs&q=getrequirednumofpictures&sq=package:chromium&g=0&l=73
[2] https://cs.chromium.org/search/?q=getrequirednumofpictures+file:decoder%5C.cc&sq=package:chromium&type=cs
[3] https://cs.chromium.org/chromium/src/media/base/limits.h?type=cs&q=kMaxVideoFrames&sq=package:chromium&g=0&l=24
[4] https://cs.chromium.org/chromium/src/media/gpu/vaapi/vaapi_video_decode_accelerator.h?g=0&l=242
[5a] https://cs.chromium.org/chromium/src/media/gpu/vaapi/vaapi_video_decode_accelerator.cc?g=0&l=584
[5b] https://cs.chromium.org/chromium/src/media/gpu/vaapi/vaapi_video_decode_accelerator.cc?g=0&l=640
 
Cc: marc...@chromium.org hiroh@chromium.org dcasta...@chromium.org sreerenj...@intel.com
I have a playground CL "splitting" the amount of buffers among 
VaAPI internals and media/cc PictureBuffers and seems to work 
fine.

Of course, ideally we would _not need_ this splitting, and should 
instead avoid the Video Processor step altogether, as tracked in
crbug.com/822346, but we see power consumption regressions preventing
us from landing it crbug.com/910986 altogether, so I think this bug
is worth working on.
Blocking: 909926
Project Member

Comment 3 by bugdroid1@chromium.org, Dec 5

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e58ff43be711ddf8132cdb298942a87ecf1f4e79

commit e58ff43be711ddf8132cdb298942a87ecf1f4e79
Author: Miguel Casas <mcasas@chromium.org>
Date: Wed Dec 05 20:45:23 2018

Vaapi decode: extract ShouldDecodeOnclientPictureBuffers()

Cleanup: extract ShouldDecodeOnclientPictureBuffers(), which
is used in e.g. relanding crrev.com/c/1356019, crrev.com/c/1348869.
No new code intended, just extracting that method, that was
already reviewed and landed.

TBR=hiroh@chromium.org

Bug: 909926, 910986, 912295
Change-Id: I24112e737b882484802e5236235164490360ee6f
Reviewed-on: https://chromium-review.googlesource.com/c/1363796
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Cr-Commit-Position: refs/heads/master@{#614089}
[modify] https://crrev.com/e58ff43be711ddf8132cdb298942a87ecf1f4e79/media/gpu/vaapi/vaapi_video_decode_accelerator.cc


IIUC, the idea is, when not using the |decode_using_client_picture_buffer| (explicit vpp), internally created pool must allocate the codec spec mandated number of surfaces and the gfx compositor allocate less number of frames(how many?) . right? 
Description: Show this description
#4: exactly. 

The compositor-demanded amount of frames is part of the 
GetRequiredNumOfPictures():
https://cs.chromium.org/search/?q=getrequirednumofpictures+file:decoder%5C.cc&sq=package:chromium&type=cs

concretely in the {h264,vp8,vp9}_decoder.{cc,h} files is usually 
called kPicsInPipeline, and is equal to  kMaxVideoFrames (4) plus one

https://cs.chromium.org/chromium/src/media/base/limits.h?type=cs&q=kMaxVideoFrames&sq=package:chromium&g=0&l=24

Comment 7 Deleted

crrev.com/c/1363807 causes ARC++ video playback breakage.
Let me revert it first.
Project Member

Comment 9 by bugdroid1@chromium.org, Dec 14

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d4dd0edbdb408cb45002531d00fc96b318fd34e9

commit d4dd0edbdb408cb45002531d00fc96b318fd34e9
Author: Hirokazu Honda <hiroh@chromium.org>
Date: Fri Dec 14 03:02:01 2018

Revert "Vaapi decode: split |decoder_|s GetRequiredNumOfPictures()"

This reverts commit 30fbb99c379e3aa5cd9db30649fdf370681ae334.

Reason for revert: Breaks ARC++ video playback with YouTube app.

Original change's description:
> Vaapi decode: split |decoder_|s GetRequiredNumOfPictures()
>
> This CL reduces the amount of PictureBuffers requested to be allocated
> by the |client_| when we are not |decode_using_client_picture_buffers_|.
> Instead, it "splits" the requested allocations into
> - the actual needed PictureBuffers (A)
> - the codec's requested reference frames (B) (a new method
>  GetNumReferenceFrames() is added to AcceleratedVideoDecoder for this).
>
> This splitting saves a lot of memory, since we allocate A+B buffers
> instead of 2*(A+B). (B is 5 and A is 4-VP8, 4-12 H264/VP9)
>
> Test: crosvideo changing resolutions for each codec, v_d_a_unittest
> on nocturne (KBL) and caroline (SKL).
>
> Bug: 909926
> Change-Id: I1fce92863beddce0e8b221861cf156b9c3ad0c0f
> Reviewed-on: https://chromium-review.googlesource.com/c/1363807
> Reviewed-by: Hirokazu Honda <hiroh@chromium.org>
> Commit-Queue: Miguel Casas <mcasas@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#615571}

TBR=mcasas@chromium.org,dcastagna@chromium.org,hiroh@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: 909926, 912295
Change-Id: Iac9941bdcaef1a7638518f835329ffb8be82233e
Reviewed-on: https://chromium-review.googlesource.com/c/1377480
Commit-Queue: Hirokazu Honda <hiroh@chromium.org>
Reviewed-by: Hirokazu Honda <hiroh@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616569}
[modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/accelerated_video_decoder.h
[modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/h264_decoder.cc
[modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/h264_decoder.h
[modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vaapi/vaapi_video_decode_accelerator.cc
[modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vaapi/vaapi_video_decode_accelerator.h
[modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vaapi/vaapi_video_decode_accelerator_unittest.cc
[modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vp8_decoder.cc
[modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vp8_decoder.h
[modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vp9_decoder.cc
[modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vp9_decoder.h

VDA unittest for vp9 with import mode is also broken due to this.
I ran VDA unittest for vp8 with import mode. The test passed.
That is, the problem is located in VDA side.

YouTube app also uses vp9 now on ARC++ and VDA works with import mode.
Hence, the problem in crrev.com/c/1363807 happens if NOT |decode_using_client_picture_buffers| and the codec is VP9.
#10: I could repro by forcing |decode_using_client_picture_buffers_| [1]
to false and running v_d_a_ut with VP9:
  video_decode_accelerator_unittest --test_video_data=test-25fps.vp9:320:240:250:250:35:150:12

the problem is that the landed (and then reverted) CL called
 ShouldDecodeOnclientPictureBuffers(profile_)
from InitiateSurfaceSetChange() and read back true, but later on, when the 
client called AssignPictureBuffers(), then
 ShouldDecodeOnclientPictureBuffers(profile_, !va_surface_ids.empty())
returned false and we ended up calling CreateContextAndSurfaces
with |requested_num_reference_frames_| == 0;

It's a bit unfortunate that when we call InitiateSurfaceSetChange()
we still haven't executed AssignPictureBuffers() so we cannot know
that the |client_| wants to work in |output_mode_| IMPORT.

Anyway I'm working on a CL that fixes it in crrev.com/c/1379274.


[1] https://cs.chromium.org/chromium/src/media/gpu/vaapi/vaapi_video_decode_accelerator.cc?sq=package:chromium&dr=C&g=0&l=633
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f3a251c4ae81ebb23a95131e040c217d022b3a93

commit f3a251c4ae81ebb23a95131e040c217d022b3a93
Author: Miguel Casas <mcasas@chromium.org>
Date: Tue Dec 18 03:37:19 2018

RELAND: Vaapi decode: split |decoder_|s GetRequiredNumOfPictures()

The problem is that the landed (and then reverted) CL called
  ShouldDecodeOnclientPictureBuffers(profile_)
from InitiateSurfaceSetChange() and read back true, but later on, when
the client called AssignPictureBuffers(), then
  ShouldDecodeOnclientPictureBuffers(profile_, !va_surface_ids.empty())
returned false and we ended up calling CreateContextAndSurfaces
with |requested_num_reference_frames_| == 0;

This CL fixes that: crrev.com/c/1379274/1..3

Original CL description -----------------------------------------------
Vaapi decode: split |decoder_|s GetRequiredNumOfPictures()

This CL reduces the amount of PictureBuffers requested to be allocated
by the |client_| when we are not |decode_using_client_picture_buffers_|.
Instead, it "splits" the requested allocations into
- the actual needed PictureBuffers (A)
- the codec's requested reference frames (B) (a new method
 GetNumReferenceFrames() is added to AcceleratedVideoDecoder for this).

This splitting saves a lot of memory, since we allocate A+B buffers
instead of 2*(A+B). (B is 5 and A is 4-VP8, 4-12 H264/VP9)

Test: crosvideo changing resolutions for each codec, v_d_a_unittest
on nocturne (KBL) and caroline (SKL).

Bug: 909926
Change-Id: I5f6bda45b748f3b7a82663ba70f0cd4e0d20e37f
Reviewed-on: https://chromium-review.googlesource.com/c/1363807
Reviewed-by: Hirokazu Honda <hiroh@chromium.org>
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#615571}
Reviewed-on: https://chromium-review.googlesource.com/c/1379274
Cr-Commit-Position: refs/heads/master@{#617366}
[modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/accelerated_video_decoder.h
[modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/h264_decoder.cc
[modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/h264_decoder.h
[modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vaapi/vaapi_video_decode_accelerator.cc
[modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vaapi/vaapi_video_decode_accelerator.h
[modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vaapi/vaapi_video_decode_accelerator_unittest.cc
[modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vp8_decoder.cc
[modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vp8_decoder.h
[modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vp9_decoder.cc
[modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vp9_decoder.h
Project Member

Comment 13 by bugdroid1@chromium.org, Dec 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e575c98500572ad043834280ba3ef207fbc48d27

commit e575c98500572ad043834280ba3ef207fbc48d27
Author: Miguel Casas <mcasas@chromium.org>
Date: Thu Dec 20 17:41:15 2018

docs/gpu: add entry for VaAPI

This CL adds a vaapi.md entry to docs/gpu, explaining how to trace its
cpu/memory/power consumption, and to verify/debug its installation.

Bug: 912295, 514914
Change-Id: Ieb7ad033e20f1211a9ae170af9378296a58c0689
Reviewed-on: https://chromium-review.googlesource.com/c/1383147
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Andres Calderon Jaramillo <andrescj@chromium.org>
Cr-Commit-Position: refs/heads/master@{#618265}
[add] https://crrev.com/e575c98500572ad043834280ba3ef207fbc48d27/docs/gpu/vaapi.md

Project Member

Comment 16 by bugdroid1@chromium.org, Jan 4

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/477a706a9026c91d86a056880d29abc43c7eaf77

commit 477a706a9026c91d86a056880d29abc43c7eaf77
Author: Miguel Casas <mcasas@chromium.org>
Date: Fri Jan 04 19:13:45 2019

RELAND2: Vaapi decode: split |decoder_|s GetRequiredNumOfPictures()

This CL is a smart-rebase-and-fix of the CLs below: it introduces
a new parameter |use_reduced_number_of_allocations_| to allow for the
new working mode described below and to temporarily circumvent the
GtsExoPlayerTestCases failures (b/121169667 and b/121003733); this
new flag is false when |output_mode_| is IMPORT, so all ARC++ cases
should work bc left untouched.

OTOH a mem savings overview can be found at https://goo.gl/3PaMiA.

Original CL description -----------------------------------------------
Vaapi decode: split |decoder_|s GetRequiredNumOfPictures()

This CL reduces the amount of PictureBuffers requested to be allocated
by the |client_| when we are not |decode_using_client_picture_buffers_|.
Instead, it "splits" the requested allocations into
- the actual needed PictureBuffers (A)
- the codec's requested reference frames (B) (a new method
 GetNumReferenceFrames() is added to AcceleratedVideoDecoder for this).

This splitting saves a lot of memory, since we allocate A+B buffers
instead of 2*(A+B). (B is 5 and A is 4-VP8, 4-12 H264/VP9)

Test: crosvideo changing resolutions for each codec, v_d_a_unittest
on nocturne (KBL) and caroline (SKL).

Bug: 912295

Reviewed-on: https://chromium-review.googlesource.com/c/1363807
Reviewed-by: Hirokazu Honda <hiroh@chromium.org>
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Cr-Original-Original-Commit-Position: refs/heads/master@{#615571}
Reviewed-on: https://chromium-review.googlesource.com/c/1379274
Cr-Original-Commit-Position: refs/heads/master@{#617366}
Change-Id: Ibf9a1455f8df4d52b77aee8e01f15c02878947ae
Reviewed-on: https://chromium-review.googlesource.com/c/1387391
Cr-Commit-Position: refs/heads/master@{#620025}
[modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/accelerated_video_decoder.h
[modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/h264_decoder.cc
[modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/h264_decoder.h
[modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vaapi/vaapi_video_decode_accelerator.cc
[modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vaapi/vaapi_video_decode_accelerator.h
[modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vaapi/vaapi_video_decode_accelerator_unittest.cc
[modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vp8_decoder.cc
[modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vp8_decoder.h
[modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vp9_decoder.cc
[modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vp9_decoder.h

Blockedon: 920510

Sign in to add a comment