
Issue 601191

Starred by 2 users

Issue metadata

Status: WontFix
Owner: ----
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 3
Type: Bug




Significant performance degradation in VideoDecode NaCl sample

Reported by aicomman...@gmail.com, Apr 6 2016

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2701.0 Safari/537.36

Example URL:

Steps to reproduce the problem:
1. Run the VideoDecode sample NaCl plugin
2. Observe the frame rate of the video

What is the expected behavior?
The video playback should be smooth

What went wrong?
The video playback is extremely choppy

Did this work before? Yes Definitely at 34fb6bacfdd34c536f75504073ce1211898f413b

Is it a problem with Flash or HTML5? N/A

Does this work in other browsers? N/A 

Chrome version: 51.0.2701.0  Channel: canary
OS Version: OS X 10.11.4
Flash Version: Shockwave Flash 21.0 r0

YouTube video playback seems to be unaffected (with H.264 forced on). 

MacBook Pro 13-inch Mid-2010
NVIDIA GeForce 320M
 
Components: Platform>NaCl
Cc: sande...@chromium.org ccameron@chromium.org
I think this might be due to the new OS X power improvements, which require some extra copies in less common use cases.
Status: Available (was: Unconfirmed)
Can anybody take a look?
Labels: Needs-Bisect
Please provide a bisect (lots of video stuff has changed lately, no sense in idle speculation).
Labels: Needs-Feedback
aicommander@ Could you please provide the sample URL and a screencast of the actual and expected behavior, so that we can triage the issue further?
The sample is not hosted anywhere (to my knowledge). It is present within the Chromium tree's NaCl SDK samples.

https://chromium.googlesource.com/chromium/src/+/master/ppapi/examples/video_decode

I will try to get a video of the issue later today.


Video is here: https://youtu.be/LOt2w4WllJ4
Reproduced the issue again on 52.0.2705.0.

I only had Chrome 50.0.2661.37 available as a working reference. If there is a place where I can find archives of Canary builds, I'd give one of those a shot too.

I realized I forgot to explicitly mention in my initial report that there is one modification required in video_decode to make it use H.264 rather than VP8: the USE_VP8_TESTDATA_INSTEAD_OF_H264 define must be commented out. The performance issue does not reproduce using VP8, only H.264.

Also to clarify, it's not just the video_decode sample that shows a performance regression. I filed this bug primarily because I also see similar video decoding performance degradation in my NaCl extension which uses code based on the video_decode sample, among many other things. In that code, I'm receiving a 720p 60 FPS H.264 High Profile stream and also doing hardware accelerated decode. I figure video_decode is a better repro case for you all since it's much easier to test.
There are lots of problems with H.264 performance on ToT; see issue 598388 and issue 599314. They'll result in poor performance and incorrect results. Hopefully they will be patched in the next day or so.
Owner: ccameron@chromium.org
Status: Assigned (was: Available)
It is still reproducing with a r386832 Chromium build. I'm starting to think it might be the same issue that the automated performance tests caught as issue 601824. Performance on that canvas test hasn't bounced back to baseline since https://codereview.chromium.org/1870323002 landed.

I did some bisecting based on the Chromium snapshot builds. This bug doesn't reproduce on r385091 but it does on r385096 (the next available snapshot build). So it looks like https://codereview.chromium.org/1851293004 is to blame for this performance bug too.
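
For anyone repeating this kind of bisect by hand, here is a minimal Python sketch of a binary search over the available snapshot revisions. Only r385091 (good) and r385096 (bad) come from this thread; the other revision numbers and the `choppy` predicate are hypothetical placeholders standing in for "download that snapshot, run the sample, and watch the frame rate".

```python
from typing import Callable, Sequence


def first_bad_revision(revisions: Sequence[int],
                       is_bad: Callable[[int], bool]) -> int:
    """Return the earliest bad revision in an ascending list of snapshots.

    Assumes revisions[0] is known good, revisions[-1] is known bad,
    and that every revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return revisions[hi]


# Hypothetical snapshot list; only 385091 and 385096 are from the thread.
snapshots = [385050, 385071, 385091, 385096, 385120, 385150]


def choppy(rev: int) -> bool:
    # Stand-in for a manual playback test of the given snapshot build.
    return rev >= 385096


print(first_bad_revision(snapshots, choppy))
```

Because snapshot builds are not produced for every commit, the result only narrows the regression to the range between the last good and first bad snapshot.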
Labels: -Needs-Bisect
Removing Needs-Bisect label as per comment #8. Please add if required.
I'm surprised that https://codereview.chromium.org/1870323002 didn't improve the situation.

There is a chance that we are still re-compiling the YUV->RGB shader at every frame -- I'll try moving the converter into the GLContextCGL. If that doesn't fix the issue, then we will have to absorb this regression -- using 4:2:0 instead of 4:2:2 makes a ~2x difference in fullscreen video power consumption.
After more testing, it looks like the CPU usage of the GPU process during playback of the VideoDecode sample rises significantly after r385092.

The GPU process's CPU usage jumps from 40% when playing back smoothly on r385091 to 70% on r385096. Something must be hitting a pathological code path during decode using NaCl's pp::VideoDecoder. Hopefully it's the shader compilation as you suspect. 
I realized I should also test a revision after https://codereview.chromium.org/1870323002 landed. I did so and noted similar CPU usage in the GPU process (around 70%).
This patch https://codereview.chromium.org/1882953006/ may help a bit. After that I don't think there will be much room for improvement.
I can confirm a large CPU usage increase in the GPU process from r385091 to a build from today (r387814) when running the video_decode PPAPI sample. My GPU process's CPU usage jumps from 20% to 80%. I have a tough time believing this is a net energy savings (at least in this workload) with such a large CPU usage rise.

My frame rate wasn't as bad as the video depicted, but that's probably down to the significantly higher performance CPU in this MacBook compared to the initial reporter's.

I'm on a MacBook Pro (Retina, Mid 2012) with GeForce GT 650M graphics running 10.11.4.

ccameron, have you been able to reproduce this CPU usage regression on any of your test machines? Do you think it may be specific to PPAPI?
Okay, now things are looking up. r388072 does not display this performance degradation in the VideoDecode sample anymore. I'm also seeing a CPU usage drop on the GPU process from 40% on r385091 to 25% on r388072.

Video decoding performance in my NaCl app with 720p 60 FPS H.264 video is still poor on this build however. I'm still seeing a jump in CPU usage and a drop in decoding performance. On r385091, I'm hovering around 20-30% CPU usage in the GPU process. On r388072, I'm up to 45-50% CPU usage.

The difference may be for NaCl content that hits the VideoToolbox hardware accelerated codepath rather than the software fallback. The VideoDecode sample fails to decode in hardware, so VideoToolbox decodes it in software. However, my NaCl app is definitely hitting the hardware accelerated path in VT. Maybe there's something different about the IOSurfaces between the two?
I just tested forcing myself into the VT software decode mode by modifying my H.264 SPS to be incompatible with VT hardware decoding. Performance is still worse on r388072 than on r385091. Performance _may_ be a bit better, but I'm just eyeballing it.

Do you think when you re-land the patch to switch to AVSampleBufferDisplayLayer, we may see further improvements?

> Do you think when you re-land the patch to switch to AVSampleBufferDisplayLayer, we may see further improvements?

I suspect not. If you're seeing this performance degradation, you are going through a path that requires converting the video from YUV to RGB, which is the slow/power-inefficient path.

This path is only hit when someone is manually compositing the video frame (e.g., YouTube 360, which turns it into a texture).

I'm surprised that there is still such high CPU usage after r387986. It may be that just creating the OpenGL texture and binding it to the planes' IOSurfaces is expensive.
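
To illustrate the kind of per-pixel work a YUV to RGB conversion involves, here is a rough Python sketch. It assumes full-range BT.601 (JFIF-style) coefficients purely for illustration; the coefficients and range used by Chromium's actual shader are not stated in this thread, and the real conversion runs on the GPU rather than in a loop like this.

```python
def clamp8(x: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def yuv_to_rgb(y: int, u: int, v: int) -> tuple:
    """Convert one 8-bit YUV sample to RGB.

    Assumption: full-range BT.601 coefficients, chosen for illustration only.
    """
    d = u - 128  # blue-difference chroma, centred on zero
    e = v - 128  # red-difference chroma, centred on zero
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp8(r), clamp8(g), clamp8(b)


# Pixels converted per second for the 720p 60 FPS stream mentioned above.
PIXELS_PER_SECOND = 1280 * 720 * 60

print(yuv_to_rgb(128, 128, 128))
print(PIXELS_PER_SECOND)
```

The arithmetic per pixel is cheap, which is consistent with the suspicion above that the cost lies in per-frame setup (shader compilation, texture creation, IOSurface binding) rather than in the conversion itself.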

ccameron, is there a way to avoid this nasty composition path for NaCl plugins? I'm not doing anything that requires an OGL texture, but it seems that pp::VideoDecoder can't just render directly. Maybe that's a better question for the NaCl team?
Can you update the repro instructions with steps that someone unfamiliar with NaCl can run? I can take a look locally to see if anything jumps out at me.
Sure thing.

To run the video_decode sample:

1) Download and extract the NaCl SDK from here https://developer.chrome.com/native-client/sdk/download

2) From the nacl_sdk directory, run './naclsdk update' to get the latest Pepper platform files

3) cd into 'pepper_49/examples/api/video_decode'

4) Open video_decode.cc and comment out '#define USE_VP8_TESTDATA_INSTEAD_OF_H264' on line 29, then run 'make'

5) In Chrome's extensions page (in developer mode), hit 'Load unpacked extension' and point it at your 'nacl_sdk/pepper_49/examples/api/video_decode' folder

6) You should be able to run it from the extensions page or any other method of running Chrome apps. Click the video to start it from the beginning.
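
Step 4 can also be scripted rather than edited by hand. The following is a small Python sketch, not part of the SDK; the helper name `comment_out_define` is made up for this example, and it matches the macro by name so it does not depend on the define being on line 29.

```python
import re


def comment_out_define(source: str, macro: str) -> str:
    """Prefix every active `#define <macro>` line with `// `.

    Lines that are already commented out do not match, so running
    this twice leaves the text unchanged.
    """
    pattern = re.compile(
        r"^([ \t]*)(#[ \t]*define[ \t]+" + re.escape(macro) + r"\b.*)$",
        re.MULTILINE,
    )
    return pattern.sub(r"\1// \2", source)


# Tiny stand-in for the relevant part of video_decode.cc.
sample = (
    "#include <stdio.h>\n"
    "#define USE_VP8_TESTDATA_INSTEAD_OF_H264\n"
    "int x;\n"
)
patched = comment_out_define(sample, "USE_VP8_TESTDATA_INSTEAD_OF_H264")
print(patched)
```

To apply it for real, read video_decode.cc into a string, pass it through the function, write the result back, and then run 'make' as in step 4.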
Labels: -Pri-2 Pri-3
Owner: ----
Status: Available (was: Assigned)
Project Member

Comment 24 by sheriffbot@chromium.org, May 3 2018

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Status: WontFix (was: Untriaged)
stale bug for > 1 year.
