
Issue 684792

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug

Blocked on:
issue 695595
issue 877673

Blocking:
issue 698749




Up minimum hardware decode resolution to 360p.

Project Member Reported by dalecur...@chromium.org, Jan 24 2017

Issue description

Data for Android suggested that anything below this resolution was not worth hardware decoding. If I recall the data correctly, we could likely go as high as 720p for desktop.

We see lots of sites spinning up hardware decoders for 16x16 or smaller clips; this feels excessive.

At least on Android we have data suggesting it's power inefficient to use the hardware decoder for smaller resolutions. Do we have similar analysis on other platforms?
 
I've only seen metrics showing that 720p *can* be similar to software decode.

It is notable that VideoToolbox always uses software decoding under 480p.

The main reason we have not already done this is that MSE streams are likely to start at a low resolution, then ramp up. Without a path to upgrade to hardware decode, we could really hurt power.

GpuVideoDecoder also uses a hard cutoff today, and raising that limit would break decoding for streams that fall below it, but that is a much easier problem to solve.

On Android we would be using MediaCodec either way, so the logic belongs in AVDA.
Good point about MSE-based streams adapting upwards. We do have a way for demuxers to signal that they expect config changes, so we could apply this only to src= playback. We'd need to either reorganize how we select decoders or include the config-change bit in the decoder config.
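
For illustration, a minimal C++ sketch of that check, assuming a helper alongside decoder selection; the threshold constants, the |demuxer_expects_config_changes| bit, and the function name are illustrative, not existing Chromium code:

  // Sketch only: skip the platform decoder for small streams, but only
  // when the demuxer does not expect config changes, i.e. src= playback
  // rather than adaptive MSE streams.
  constexpr int kMinHardwareWidth = 640;   // 360p.
  constexpr int kMinHardwareHeight = 360;

  bool ShouldSkipHardwareDecoder(const media::VideoDecoderConfig& config,
                                 bool demuxer_expects_config_changes) {
    // MSE streams commonly start small and adapt upwards; keep the
    // hardware decoder available for them.
    if (demuxer_expects_config_changes)
      return false;
    return config.coded_size().width() < kMinHardwareWidth &&
           config.coded_size().height() < kMinHardwareHeight;
  }
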
There is also the scenario where we have many (possibly small-resolution) decoders to consider: multiple videos on one page, or multiple pages with videos. Also, for Hangouts and VC in general, there are small streams for every person connected to the meeting; even at 180p, if we have many of these, we'd probably still prefer HW decode. These streams also move up and down in resolution as they get focused. All of that goes through RTCVD, though.

We could also set a larger min resolution in VDA::SupportedProfiles per platform/VDA.
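
As a rough sketch of that, with the SupportedProfile fields written from memory (treat the exact field names and values as illustrative), a platform VDA could advertise something like:

  media::VideoDecodeAccelerator::SupportedProfiles GetSupportedProfiles() {
    media::VideoDecodeAccelerator::SupportedProfiles profiles;
    media::VideoDecodeAccelerator::SupportedProfile profile;
    profile.profile = media::H264PROFILE_MAIN;
    profile.min_resolution = gfx::Size(640, 360);    // Refuse anything below 360p.
    profile.max_resolution = gfx::Size(3840, 2160);
    profiles.push_back(profile);
    return profiles;
  }
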
Blocking: 698749
Blockedon: 695595
posciak@: are you sure you'd want to hardware decode those at 180p? Even on Android we've found that the lower bound for power/CPU efficiency is around 360p; anything below that used more power with the hardware decoder.
Probably not. However, what would happen when/if the stream switched up to higher resolutions (and back) for both VC and HTML5 video playback scenarios?  The overhead of switching back and forth between HW/SW decode could then be a factor.

Would this be happening in any cases here?
For HTML5, we'd just switch back and forth as necessary; so long as the switch takes less than ~4 frames' worth of time or happens while not playing, it won't be noticeable to the user.

For RTC, I think it already handles switching from software to hardware:
https://cs.chromium.org/chromium/src/content/renderer/media/gpu/rtc_video_decoder.cc?l=221
https://cs.chromium.org/chromium/src/third_party/webrtc/media/engine/videodecodersoftwarefallbackwrapper.cc

Project Member

Comment 9 by bugdroid1@chromium.org, Jul 17

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fbf1006e21d5ec242a5c0cc18d546aa12af214e3

commit fbf1006e21d5ec242a5c0cc18d546aa12af214e3
Author: Dan Sanders <sandersd@chromium.org>
Date: Tue Jul 17 19:56:48 2018

[media] Add media::*Decoder::IsPlatformDecoder().

Add IsPlatformDecoder() API to media::VideoDecoder, and also to
media::AudioDecoder for symmetry. This value will be used by
media::DecoderStream to decide when to fall back to software decode for
low-resolution video.

Bug: 684792
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I778ebdcaa84a9cabfc74be9729969db53b2aad7a
Reviewed-on: https://chromium-review.googlesource.com/1139097
Commit-Queue: Dan Sanders <sandersd@chromium.org>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#575749}
[modify] https://crrev.com/fbf1006e21d5ec242a5c0cc18d546aa12af214e3/content/browser/media/media_internals.cc
[modify] https://crrev.com/fbf1006e21d5ec242a5c0cc18d546aa12af214e3/media/base/audio_decoder.cc
[modify] https://crrev.com/fbf1006e21d5ec242a5c0cc18d546aa12af214e3/media/base/audio_decoder.h
[modify] https://crrev.com/fbf1006e21d5ec242a5c0cc18d546aa12af214e3/media/base/video_decoder.cc
[modify] https://crrev.com/fbf1006e21d5ec242a5c0cc18d546aa12af214e3/media/base/video_decoder.h
[modify] https://crrev.com/fbf1006e21d5ec242a5c0cc18d546aa12af214e3/media/filters/decoder_stream.cc
[modify] https://crrev.com/fbf1006e21d5ec242a5c0cc18d546aa12af214e3/media/filters/gpu_video_decoder.cc
[modify] https://crrev.com/fbf1006e21d5ec242a5c0cc18d546aa12af214e3/media/filters/gpu_video_decoder.h
[modify] https://crrev.com/fbf1006e21d5ec242a5c0cc18d546aa12af214e3/media/mojo/clients/mojo_audio_decoder.cc
[modify] https://crrev.com/fbf1006e21d5ec242a5c0cc18d546aa12af214e3/media/mojo/clients/mojo_audio_decoder.h
[modify] https://crrev.com/fbf1006e21d5ec242a5c0cc18d546aa12af214e3/media/mojo/clients/mojo_video_decoder.cc
[modify] https://crrev.com/fbf1006e21d5ec242a5c0cc18d546aa12af214e3/media/mojo/clients/mojo_video_decoder.h
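
Roughly, the intent is that decoder selection can use this bit for a check like the following; this is only a sketch (the 360p threshold and the helper are assumptions, not the actual DecoderStream code):

  // Sketch only: prefer a non-platform (software) decoder for very
  // small streams.
  bool ShouldPreferSoftwareDecoder(const media::VideoDecoder& decoder,
                                   const media::VideoDecoderConfig& config) {
    constexpr int kMinPlatformDecoderHeight = 360;  // Assumed threshold.
    return decoder.IsPlatformDecoder() &&
           config.coded_size().height() < kMinPlatformDecoderHeight;
  }
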

Cc: tfiga@chromium.org
Switching from a SW to a HW decoder mid-stream is usually a costly operation, which may include opening the device nodes and setting up the drivers, negotiating formats and parameters with the drivers, setting up the hardware codec, powering it on, loading and initializing firmware, setting up the working buffer set, programming IOMMUs, clearing caches, etc.

In addition, depending on the resolution threshold and the particular platform, we may still be able to save power/have better smoothness even at low resolutions (e.g. 360p) with HW decoders.

Unless we can somehow infer that a stream will never switch to higher resolutions mid-playback (or, worse, change resolutions repeatedly, causing multiple SW-HW decoder switches), which to my knowledge may not be possible in most cases, I think we need to be especially careful before enabling this optimization, particularly on non-desktop.

Would it perhaps be an option to let the platform decoder inform the client (e.g. via a static capability call) whether SW fallback should be done at all, and if so, for which resolutions?
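
Something like the following could express that; the struct and the call are entirely hypothetical and do not exist in Chromium today:

  // Hypothetical capability a platform decoder could report.
  struct SoftwareFallbackPolicy {
    // Whether the client should ever prefer a software decoder.
    bool allow_software_fallback = true;
    // Below this coded size, software decode is expected to be more
    // power efficient; an empty size means "no threshold".
    gfx::Size software_fallback_threshold;
  };

  // Hypothetical static capability call, queried once per decoder type.
  SoftwareFallbackPolicy GetSoftwareFallbackPolicy();
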
Generally agreed with what Pawel said before.

I'd also suggest that we don't over-generalize based on some Android testing, because there is a big variance between different hardware and software platforms:
 - different codec driver interfaces - VAAPI, V4L2, Stateless V4L2, with potentially different overheads and initialization times,
 - different CPU performance/power characteristics - not only ARM vs x86, but also variance between particular ARM implementations,
 - integration with graphics stack - is it always possible to composite from software-decoded videos directly, without texture uploads, as we do with hardware-decoded videos?

and probably more. If we decide that this is a worthwhile optimization on non-desktop (specifically Chrome OS) systems, we would definitely want some knobs to tune the behavior for particular hardware platforms.
Cc: mcasas@chromium.org
How many cases of switching resolution mid-stream up from a thumbnail do we see that are not WebRTC use cases? In other words, I think the majority of non-RTC video playback cases would start at 240p or higher and move up from there (think YT). Only RTC would progress from postage-stamp size up to 720p.

The GPU could provide the minimum desired resolution for a given HW decoder as a hint; RTC would ignore it and "normal" video playback would honour it. Does that make sense?

It's currently the case that if a stream dips below a hardware decoder's minimum resolution, we fall back to software decode and never transition back to hardware decode even if the resolution increases.

We know that on Mac OS X, we don't want to use the hardware decoder below 480p, but actually specifying that in SupportedProfiles would trigger this issue constantly. (Basically all YouTube playbacks.)

While we intend to experiment to determine whether we can set higher thresholds to get power savings without introducing unacceptable latency, the goal of the current work is specifically to implement the 'fall forward' feature.
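
A rough sketch of the 'fall forward' idea, with illustrative member and method names (the real DecoderStream interface differs):

  // Sketch only: when a config change raises the resolution back above
  // the platform threshold, re-run decoder selection so we can move
  // from the software decoder back to a hardware decoder.
  void OnConfigChange(const media::VideoDecoderConfig& new_config) {
    constexpr int kMinPlatformDecoderHeight = 360;  // Assumed threshold.
    const bool wants_platform_decoder =
        new_config.coded_size().height() >= kMinPlatformDecoderHeight;
    if (wants_platform_decoder && !current_decoder_->IsPlatformDecoder()) {
      // "Fall forward": don't stay on software decode forever.
      ReselectDecoder(new_config);  // Illustrative, not the real API.
    }
  }
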
On the latest-gen Intel CrOS devices (soraka, eve, etc.), for 720p VP9 we see that the power consumption of the SW decoder (VpxVideoDecoder) is lower than that of the HW decoder (both with overlays engaged); see the _pkg_pwr metrics in:

https://chromeperf.appspot.com/report?sid=578d94b340109e518b217c266f04fde1867ebbf4d8df659f572f67b5e487a527


It looks like HW decode bumps gfx power significantly, which I guess is expected, since the decoder on Intel platforms is part of the GPU. On the other hand, it might be interesting to check whether there are any inefficiencies (or bugs) in GPU runtime PM under light (or non-gfx) loads (decode + HW overlays).

I tried looking at some ARM boards in that test, but couldn't find any that included power consumption results. I wouldn't be surprised if it was much more efficient there, because codecs on ARM platforms are normally separate IP blocks that don't require the GPU to be running to operate, and they use V4L2 drivers that are normally designed with proper runtime PM in mind.
Blocking: 873871
Do you want to use this bug for that block? If so, we need to make this Pri-1 M70.
Labels: -Pri-3 M-70 Pri-1
I should probably split this into a new bug, but I didn't want to lose the context in this thread.
Blockedon: 877673
Blocking: -873871
Split to issue 877673.
