Reduce amount of VaVDA requested/allocated PictureBuffers/VASurfaces |
||||
Issue descriptionVaVDA queries the decoder's needed number of PictureBuffers [1]; the returned number GetRequiredNumOfPictures() [2] collates: a- the maximum number of reference frames needed/desired by the codec: VP8 = 4, VP9 = 8, H264= varies with the DPB somewhere 4-12 b- the maximum number of VideoFrames in the pipeline [3] kMaxVideoFrames = 4 c- a bit of grease in the transmission = 2 This is, in general, correct when the hw accelerated decoders decode directly onto PictureBuffers, but the VaVDA does not work like this (unless under a series of circumstances, see |decode_using_client_picture_buffers_| [4] (essentially, modern CPU architectures and VP9, also see crbug.com/822346). When we _don't_ decode onto client buffers, the value of GetRequiredNumOfPictures() is used for both: - the client allocation [5a] (done by client, |requested_num_pics_|) - the internal VaAPI VASurface allocation [5b] These two numbers are too high, which is reflected by the fact that the associated TRACE_COUNTERs are never reaching high values. Instead, when not |decode_using_client_picture_buffers_|: - the client should allocate the aforementioned b- (kMaxVideoFrames) - VaAPI should only allocate the aforementioned a- (max num ref frames) [1] https://cs.chromium.org/chromium/src/media/gpu/accelerated_video_decoder.h?type=cs&q=getrequirednumofpictures&sq=package:chromium&g=0&l=73 [2] https://cs.chromium.org/search/?q=getrequirednumofpictures+file:decoder%5C.cc&sq=package:chromium&type=cs [3] https://cs.chromium.org/chromium/src/media/base/limits.h?type=cs&q=kMaxVideoFrames&sq=package:chromium&g=0&l=24 [4] https://cs.chromium.org/chromium/src/media/gpu/vaapi/vaapi_video_decode_accelerator.h?g=0&l=242 [5a] https://cs.chromium.org/chromium/src/media/gpu/vaapi/vaapi_video_decode_accelerator.cc?g=0&l=584 [5b] https://cs.chromium.org/chromium/src/media/gpu/vaapi/vaapi_video_decode_accelerator.cc?g=0&l=640
,
Dec 5
,
Dec 5
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e58ff43be711ddf8132cdb298942a87ecf1f4e79 commit e58ff43be711ddf8132cdb298942a87ecf1f4e79 Author: Miguel Casas <mcasas@chromium.org> Date: Wed Dec 05 20:45:23 2018 Vaapi decode: extract ShouldDecodeOnclientPictureBuffers() Cleanup: extract ShouldDecodeOnclientPictureBuffers(), which is used in e.g. relanding crrev.com/c/1356019, crrev.com/c/1348869. No new code intended, just extracting that method, that was already reviewed and landed. TBR=hiroh@chromium.org Bug: 909926, 910986, 912295 Change-Id: I24112e737b882484802e5236235164490360ee6f Reviewed-on: https://chromium-review.googlesource.com/c/1363796 Commit-Queue: Miguel Casas <mcasas@chromium.org> Reviewed-by: Miguel Casas <mcasas@chromium.org> Cr-Commit-Position: refs/heads/master@{#614089} [modify] https://crrev.com/e58ff43be711ddf8132cdb298942a87ecf1f4e79/media/gpu/vaapi/vaapi_video_decode_accelerator.cc
,
Dec 6
IIUC, the idea is, when not using the |decode_using_client_picture_buffer| (explicit vpp), internally created pool must allocate the codec spec mandated number of surfaces and the gfx compositor allocate less number of frames(how many?) . right?
,
Dec 6
,
Dec 6
#4: exactly. The compositor-demanded amount of frames is part of the GetRequiredNumOfPictures(): https://cs.chromium.org/search/?q=getrequirednumofpictures+file:decoder%5C.cc&sq=package:chromium&type=cs concretely in the {h264,vp8,vp9}_decoder.{cc,h} files is usually called kPicsInPipeline, and is equal to kMaxVideoFrames (4) plus one https://cs.chromium.org/chromium/src/media/base/limits.h?type=cs&q=kMaxVideoFrames&sq=package:chromium&g=0&l=24
,
Dec 14
crrev.com/c/1363807 causes ARC++ video playback breakage. Let me revert it first.
,
Dec 14
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d4dd0edbdb408cb45002531d00fc96b318fd34e9 commit d4dd0edbdb408cb45002531d00fc96b318fd34e9 Author: Hirokazu Honda <hiroh@chromium.org> Date: Fri Dec 14 03:02:01 2018 Revert "Vaapi decode: split |decoder_|s GetRequiredNumOfPictures()" This reverts commit 30fbb99c379e3aa5cd9db30649fdf370681ae334. Reason for revert: Breaks ARC++ video playback with YouTube app. Original change's description: > Vaapi decode: split |decoder_|s GetRequiredNumOfPictures() > > This CL reduces the amount of PictureBuffers requested to be allocated > by the |client_| when we are not |decode_using_client_picture_buffers_|. > Instead, it "splits" the requested allocations into > - the actual needed PictureBuffers (A) > - the codec's requested reference frames (B) (a new method > GetNumReferenceFrames() is added to AcceleratedVideoDecoder for this). > > This splitting saves a lot of memory, since we allocate A+B buffers > instead of 2*(A+B). (B is 5 and A is 4-VP8, 4-12 H264/VP9) > > Test: crosvideo changing resolutions for each codec, v_d_a_unittest > on nocturne (KBL) and caroline (SKL). > > Bug: 909926 > Change-Id: I1fce92863beddce0e8b221861cf156b9c3ad0c0f > Reviewed-on: https://chromium-review.googlesource.com/c/1363807 > Reviewed-by: Hirokazu Honda <hiroh@chromium.org> > Commit-Queue: Miguel Casas <mcasas@chromium.org> > Cr-Commit-Position: refs/heads/master@{#615571} TBR=mcasas@chromium.org,dcastagna@chromium.org,hiroh@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 909926, 912295 Change-Id: Iac9941bdcaef1a7638518f835329ffb8be82233e Reviewed-on: https://chromium-review.googlesource.com/c/1377480 Commit-Queue: Hirokazu Honda <hiroh@chromium.org> Reviewed-by: Hirokazu Honda <hiroh@chromium.org> Cr-Commit-Position: refs/heads/master@{#616569} [modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/accelerated_video_decoder.h [modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/h264_decoder.cc [modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/h264_decoder.h [modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vaapi/vaapi_video_decode_accelerator.cc [modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vaapi/vaapi_video_decode_accelerator.h [modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vaapi/vaapi_video_decode_accelerator_unittest.cc [modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vp8_decoder.cc [modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vp8_decoder.h [modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vp9_decoder.cc [modify] https://crrev.com/d4dd0edbdb408cb45002531d00fc96b318fd34e9/media/gpu/vp9_decoder.h
,
Dec 14
VDA unittest for vp9 with import mode is also broken due to this. I ran VDA unittest for vp8 with import mode. The test passed. That is, the problem is located in VDA side. YouTube app also uses vp9 now on ARC++ and VDA works with import mode. Hence, the problem in crrev.com/c/1363807 happens if NOT |decode_using_client_picture_buffers| and the codec is VP9.
,
Dec 17
#10: I could repro by forcing |decode_using_client_picture_buffers_| [1] to false and running v_d_a_ut with VP9: video_decode_accelerator_unittest --test_video_data=test-25fps.vp9:320:240:250:250:35:150:12 the problem is that the landed (and then reverted) CL called ShouldDecodeOnclientPictureBuffers(profile_) from InitiateSurfaceSetChange() and read back true, but later on, when the client called AssignPictureBuffers(), then ShouldDecodeOnclientPictureBuffers(profile_, !va_surface_ids.empty()) returned false and we ended up calling CreateContextAndSurfaces with |requested_num_reference_frames_| == 0; It's a bit unfortunate that when we call InitiateSurfaceSetChange() we still haven't executed AssignPictureBuffers() so we cannot know that the |client_| wants to work in |output_mode_| IMPORT. Anyway I'm working on a CL that fixes it in crrev.com/c/1379274. [1] https://cs.chromium.org/chromium/src/media/gpu/vaapi/vaapi_video_decode_accelerator.cc?sq=package:chromium&dr=C&g=0&l=633
,
Dec 18
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f3a251c4ae81ebb23a95131e040c217d022b3a93 commit f3a251c4ae81ebb23a95131e040c217d022b3a93 Author: Miguel Casas <mcasas@chromium.org> Date: Tue Dec 18 03:37:19 2018 RELAND: Vaapi decode: split |decoder_|s GetRequiredNumOfPictures() The problem is that the landed (and then reverted) CL called ShouldDecodeOnclientPictureBuffers(profile_) from InitiateSurfaceSetChange() and read back true, but later on, when the client called AssignPictureBuffers(), then ShouldDecodeOnclientPictureBuffers(profile_, !va_surface_ids.empty()) returned false and we ended up calling CreateContextAndSurfaces with |requested_num_reference_frames_| == 0; This CL fixes that: crrev.com/c/1379274/1..3 Original CL description ----------------------------------------------- Vaapi decode: split |decoder_|s GetRequiredNumOfPictures() This CL reduces the amount of PictureBuffers requested to be allocated by the |client_| when we are not |decode_using_client_picture_buffers_|. Instead, it "splits" the requested allocations into - the actual needed PictureBuffers (A) - the codec's requested reference frames (B) (a new method GetNumReferenceFrames() is added to AcceleratedVideoDecoder for this). This splitting saves a lot of memory, since we allocate A+B buffers instead of 2*(A+B). (B is 5 and A is 4-VP8, 4-12 H264/VP9) Test: crosvideo changing resolutions for each codec, v_d_a_unittest on nocturne (KBL) and caroline (SKL). Bug: 909926 Change-Id: I5f6bda45b748f3b7a82663ba70f0cd4e0d20e37f Reviewed-on: https://chromium-review.googlesource.com/c/1363807 Reviewed-by: Hirokazu Honda <hiroh@chromium.org> Commit-Queue: Miguel Casas <mcasas@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#615571} Reviewed-on: https://chromium-review.googlesource.com/c/1379274 Cr-Commit-Position: refs/heads/master@{#617366} [modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/accelerated_video_decoder.h [modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/h264_decoder.cc [modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/h264_decoder.h [modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vaapi/vaapi_video_decode_accelerator.cc [modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vaapi/vaapi_video_decode_accelerator.h [modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vaapi/vaapi_video_decode_accelerator_unittest.cc [modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vp8_decoder.cc [modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vp8_decoder.h [modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vp9_decoder.cc [modify] https://crrev.com/f3a251c4ae81ebb23a95131e040c217d022b3a93/media/gpu/vp9_decoder.h
,
Dec 20
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e575c98500572ad043834280ba3ef207fbc48d27 commit e575c98500572ad043834280ba3ef207fbc48d27 Author: Miguel Casas <mcasas@chromium.org> Date: Thu Dec 20 17:41:15 2018 docs/gpu: add entry for VaAPI This CL adds a vaapi.md entry to docs/gpu, explaining how to trace its cpu/memory/power consumption, and to verify/debug its installation. Bug: 912295, 514914 Change-Id: Ieb7ad033e20f1211a9ae170af9378296a58c0689 Reviewed-on: https://chromium-review.googlesource.com/c/1383147 Commit-Queue: Miguel Casas <mcasas@chromium.org> Reviewed-by: Andres Calderon Jaramillo <andrescj@chromium.org> Cr-Commit-Position: refs/heads/master@{#618265} [add] https://crrev.com/e575c98500572ad043834280ba3ef207fbc48d27/docs/gpu/vaapi.md
,
Dec 26
--info-- #12 was reverted in crrev.com/c/1385692, which landed on 73.0.3647.0; currently all eve builds are still red, which makes sense since Canary is now at 11469.0.0 - 73.0.3644.0, so it hasn't caught up. Some results on Eve: R73-11460.0.0 vp9 failed R73-11460.0.0 vp9 failed R73-11393.0.0 vp9 failed Caroline: vp9 never failed, but other issues keep it red. eve runs: https://stainless.corp.google.com/search?view=matrix&row=build&col=board&first_date=2018-11-29&last_date=2018-12-26&test=GtsExoPlayerTestCases&board=%5Eeve%24&model=%5Eeve%24&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false caroline runs: https://stainless.corp.google.com/search?view=matrix&row=build&col=board&first_date=2018-11-29&last_date=2018-12-26&test=GtsExoPlayerTestCases&board=%5Ecaroline%24&model=%5Ecaroline%24&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
,
Jan 3
Finally today most platforms reached 11516.0.0 - 73.0.3654.0 in Canary, leaving 73.0.3644.0 behind; the bots are indeed turning green, yay! hiroh@: This means we can land crrev.com/c/1387391; I would only expect dropped-frames regressions on old devices only, if any. eve: https://stainless.corp.google.com/search?view=matrix&row=build&col=board&first_date=2018-12-28&last_date=2019-01-03&test=GtsExoPlayerTestCases&board=%5Eeve%24&model=%5Eeve%24&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false caroline: https://stainless.corp.google.com/search?view=matrix&row=build&col=board&first_date=2018-12-28&last_date=2019-01-03&test=GtsExoPlayerTestCases&board=%5Ecaroline%24&model=%5Ecaroline%24&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
,
Jan 4
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/477a706a9026c91d86a056880d29abc43c7eaf77 commit 477a706a9026c91d86a056880d29abc43c7eaf77 Author: Miguel Casas <mcasas@chromium.org> Date: Fri Jan 04 19:13:45 2019 RELAND2: Vaapi decode: split |decoder_|s GetRequiredNumOfPictures() This CL is a smart-rebase-and-fix of the CLs below: it introduces a new parameter |use_reduced_number_of_allocations_| to allow for the new working mode described below and to temporarily circumvent the GtsExoPlayerTestCases failures (b/121169667 and b/121003733); this new flag is false when |output_mode_| is IMPORT, so all ARC++ cases should work bc left untouched. OTOH a mem savings overview can be found at https://goo.gl/3PaMiA. Original CL description ----------------------------------------------- Vaapi decode: split |decoder_|s GetRequiredNumOfPictures() This CL reduces the amount of PictureBuffers requested to be allocated by the |client_| when we are not |decode_using_client_picture_buffers_|. Instead, it "splits" the requested allocations into - the actual needed PictureBuffers (A) - the codec's requested reference frames (B) (a new method GetNumReferenceFrames() is added to AcceleratedVideoDecoder for this). This splitting saves a lot of memory, since we allocate A+B buffers instead of 2*(A+B). (B is 5 and A is 4-VP8, 4-12 H264/VP9) Test: crosvideo changing resolutions for each codec, v_d_a_unittest on nocturne (KBL) and caroline (SKL). Bug: 912295 Reviewed-on: https://chromium-review.googlesource.com/c/1363807 Reviewed-by: Hirokazu Honda <hiroh@chromium.org> Commit-Queue: Miguel Casas <mcasas@chromium.org> Cr-Original-Original-Commit-Position: refs/heads/master@{#615571} Reviewed-on: https://chromium-review.googlesource.com/c/1379274 Cr-Original-Commit-Position: refs/heads/master@{#617366} Change-Id: Ibf9a1455f8df4d52b77aee8e01f15c02878947ae Reviewed-on: https://chromium-review.googlesource.com/c/1387391 Cr-Commit-Position: refs/heads/master@{#620025} [modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/accelerated_video_decoder.h [modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/h264_decoder.cc [modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/h264_decoder.h [modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vaapi/vaapi_video_decode_accelerator.cc [modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vaapi/vaapi_video_decode_accelerator.h [modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vaapi/vaapi_video_decode_accelerator_unittest.cc [modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vp8_decoder.cc [modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vp8_decoder.h [modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vp9_decoder.cc [modify] https://crrev.com/477a706a9026c91d86a056880d29abc43c7eaf77/media/gpu/vp9_decoder.h
,
Jan 14
|
||||
►
Sign in to add a comment |
||||
Comment 1 by mcasas@chromium.org
, Dec 5