Significant performance degradation in VideoDecode NaCl sample
Reported by
aicomman...@gmail.com,
Apr 6 2016
Issue description
UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2701.0 Safari/537.36
Example URL:
Steps to reproduce the problem:
1. Run the VideoDecode sample NaCl plugin
2. Observe the frame rate of the video
What is the expected behavior? The video playback should be smooth
What went wrong? The video playback is extremely choppy
Did this work before? Yes. Definitely at 34fb6bacfdd34c536f75504073ce1211898f413b
Is it a problem with Flash or HTML5? N/A
Does this work in other browsers? N/A
Chrome version: 51.0.2701.0 Channel: canary
OS Version: OS X 10.11.4
Flash Version: Shockwave Flash 21.0 r0
YouTube video playback seems to be unaffected (with H.264 forced on).
MacBook Pro 13-inch Mid-2010, NVIDIA GeForce 320M
,
Apr 6 2016
I think this might be due to the new OS X power improvements, which require some extra copies in less frequent use cases.
,
Apr 7 2016
Can anybody take a look?
,
Apr 7 2016
Please provide a bisect (lots of video stuff has changed lately, no sense in idle speculation).
,
Apr 11 2016
aicommander@: Could you please provide the sample URL and a screencast of the actual and expected behavior so we can triage the issue further?
,
Apr 11 2016
The sample is not hosted anywhere (to my knowledge). It is present within the Chromium tree's NaCl SDK samples. https://chromium.googlesource.com/chromium/src/+/master/ppapi/examples/video_decode I will try to get a video of the issue later today.
,
Apr 12 2016
Video is here: https://youtu.be/LOt2w4WllJ4
Reproduced the issue again on 52.0.2705.0. I only had Chrome 50.0.2661.37 available as a working reference. If there is a place where I can find archives of Canary builds, I'd give one of those a shot too.
I realized I forgot to explicitly mention in my initial report that one modification to video_decode is required to make it use H.264 rather than VP8: the USE_VP8_TESTDATA_INSTEAD_OF_H264 define must be commented out. The performance issue does not reproduce using VP8, only H.264.
Also, to clarify, it's not just the video_decode sample that shows a performance regression. I filed this bug primarily because I see similar video decoding performance degradation in my NaCl extension, which uses code based on the video_decode sample, among many other things. In that code, I'm receiving a 720p 60 FPS H.264 High Profile stream and also doing hardware accelerated decode. I figure video_decode is a better repro case for you all since it's much easier to test.
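For readers unfamiliar with the sample, the codec switch mentioned above is a compile-time toggle. The following is only a minimal sketch of that pattern, not the actual contents of video_decode.cc: kCodecName and SelectedCodec are hypothetical names introduced for illustration, and only the define name comes from this thread.

```cpp
#include <string>

// Sketch only: with the define commented out (as described in this thread),
// the H.264 branch below is the one that gets compiled.
// #define USE_VP8_TESTDATA_INSTEAD_OF_H264

#ifdef USE_VP8_TESTDATA_INSTEAD_OF_H264
static const char kCodecName[] = "vp8";   // hypothetical label for the VP8 path
#else
static const char kCodecName[] = "h264";  // hypothetical label for the H.264 path
#endif

// Returns the name of the codec path chosen at compile time.
std::string SelectedCodec() {
  return kCodecName;
}
```

Because the choice is made by the preprocessor, the sample has to be rebuilt after changing the define.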
,
Apr 12 2016
There are lots of problems WRT H.264 performance on ToT; see issue 598388 and issue 599314. They'll result in poor performance and incorrect results. Hopefully they will be patched in the next day or so.
,
Apr 12 2016
,
Apr 12 2016
It is still reproducing with a r386832 Chromium build. I'm starting to think it might be the same issue that the automated performance tests caught as issue 601824. Performance on that canvas test hasn't bounced back to baseline since https://codereview.chromium.org/1870323002 landed.
I did some bisecting based on the Chromium snapshot builds. This bug doesn't reproduce on r385091 but it does on r385096 (the next available snapshot build). So it looks like https://codereview.chromium.org/1851293004 is to blame for this performance bug too.
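The snapshot bisect described above amounts to a binary search for the first bad build. Here is a rough sketch of that search, assuming the list starts with a known-good build, ends with a known-bad one, and stays bad after the first bad build. ReproducesOn is a hypothetical stand-in for running the sample by hand on a given snapshot, and every revision number other than r385091 and r385096 is made up for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for manually testing a snapshot build. It encodes
// the result reported in this thread: good at r385091, bad at r385096.
bool ReproducesOn(int revision) {
  return revision > 385091;
}

// Returns the index of the first bad revision. Assumes at least two entries,
// revisions.front() is good, revisions.back() is bad, and once a revision is
// bad every later one is bad as well.
std::size_t FirstBadIndex(const std::vector<int>& revisions,
                          const std::function<bool(int)>& is_bad) {
  std::size_t good = 0;
  std::size_t bad = revisions.size() - 1;
  while (bad - good > 1) {
    const std::size_t mid = good + (bad - good) / 2;
    if (is_bad(revisions[mid])) {
      bad = mid;   // regression is at mid or earlier
    } else {
      good = mid;  // regression is after mid
    }
  }
  return bad;
}
```

Each step halves the remaining range, so even a few hundred snapshots need fewer than ten manual runs.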
,
Apr 13 2016
Removing Needs-bisect label as per comment #8. Please add it back if required.
,
Apr 13 2016
I'm surprised that https://codereview.chromium.org/1870323002 didn't improve the situation. There is a chance that we are still re-compiling the YUV->RGB shader at every frame -- I'll try moving the converter into the GLContextCGL. If that doesn't fix the issue, then we will have to absorb this regression -- using 4:2:0 instead of 4:2:2 makes a ~2x difference in fullscreen video power consumption.
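To illustrate the suspicion above (a converter rebuilt on every frame versus one owned by the context), here is a toy sketch. It is not Chromium code: FakeContext, YuvToRgbConverter, and CompilesAfterFrames are hypothetical names, and the constructor's counter increment merely stands in for an expensive shader compile.

```cpp
#include <memory>

// Hypothetical converter; constructing it stands in for compiling the shader.
struct YuvToRgbConverter {
  explicit YuvToRgbConverter(int* compile_count) { ++*compile_count; }
  void Convert() const {}  // per-frame work would go here
};

// Hypothetical context that owns the converter and creates it lazily, so the
// expensive setup happens once per context rather than once per frame.
class FakeContext {
 public:
  const YuvToRgbConverter& GetConverter() {
    if (!converter_) {
      converter_.reset(new YuvToRgbConverter(&compile_count_));
    }
    return *converter_;
  }
  int compile_count() const { return compile_count_; }

 private:
  int compile_count_ = 0;
  std::unique_ptr<YuvToRgbConverter> converter_;
};

// Simulates drawing `frames` frames and reports how many compiles occurred.
int CompilesAfterFrames(int frames) {
  FakeContext context;
  for (int i = 0; i < frames; ++i) {
    context.GetConverter().Convert();
  }
  return context.compile_count();
}
```

With this ownership, the compile count stays at one no matter how many frames are converted, which is the behavior the proposed move is aiming for.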
,
Apr 14 2016
After more testing, it looks like the CPU usage of the GPU process during playback of the VideoDecode sample rises significantly after r385092. The GPU process's CPU usage jumps from 40% when playing back smoothly on r385091 to 70% on r385096. Something must be hitting a pathological code path during decode using NaCl's pp::VideoDecoder. Hopefully it's the shader compilation as you suspect.
,
Apr 15 2016
I realized I should also test a revision after https://codereview.chromium.org/1870323002 landed. I did so and noted similar CPU usage in the GPU process (around 70%).
,
Apr 15 2016
This patch https://codereview.chromium.org/1882953006/ may help a bit. After that I don't think there will be much room for improvement.
,
Apr 16 2016
I can confirm a large CPU usage increase in the GPU process from r385091 to a build from today (r387814) when running the video_decode PPAPI sample. My GPU process's CPU usage jumps from 20% to 80%. I have a tough time believing this is a net energy savings (at least in this workload) with such a large CPU usage rise. My frame rate wasn't as bad as the video depicted, but that's probably down to the significantly higher performance CPU in this MacBook compared to the initial reporter's. I'm on a MacBook Pro (Retina, Mid 2012) with GeForce GT 650M graphics running 10.11.4. ccameron, have you been able to reproduce this CPU usage regression on any of your test machines? Do you think it may be specific to PPAPI?
,
Apr 19 2016
Okay, now things are looking up. r388072 does not display this performance degradation in the VideoDecode sample anymore. I'm also seeing a CPU usage drop on the GPU process from 40% on r385091 to 25% on r388072. Video decoding performance in my NaCl app with 720p 60 FPS H.264 video is still poor on this build however. I'm still seeing a jump in CPU usage and a drop in decoding performance. On r385091, I'm hovering around 20-30% CPU usage in the GPU process. On r388072, I'm up to 45-50% CPU usage. The difference may be for NaCl content that hits the VideoToolbox hardware accelerated codepath rather than the software fallback. The VideoDecode sample fails to decode in hardware, so VideoToolbox decodes it in software. However, my NaCl app is definitely hitting the hardware accelerated path in VT. Maybe there's something different about the IOSurfaces between the two?
,
Apr 19 2016
I just tested forcing myself into the VT software decode mode by modifying my H.264 SPS to be incompatible with VT hardware decoding. Performance is still worse on r388072 than on r385091. Performance _may_ be a bit better, but I'm just eyeballing it. Do you think when you re-land the patch to switch to AVSampleBufferDisplayLayer, we may see further improvements?
,
Apr 19 2016
> Do you think when you re-land the patch to switch to AVSampleBufferDisplayLayer, we may see further improvements?
I suspect not. If you're seeing this performance degradation, you are going through a path that requires converting the video from YUV to RGB, which is the slow/power-inefficient path. This path is only hit when someone is manually compositing the video frame (e.g., YouTube 360, which turns it into a texture). I'm surprised that there is still such high CPU usage after r387986. It may be that just creating the OpenGL texture and binding it to the planes' IOSurfaces is expensive.
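For context on what the YUV to RGB step involves per pixel, here is a small CPU-side sketch. It assumes 8-bit limited-range BT.601 coefficients, which may not match the color space or the GPU shader Chrome actually uses; YuvToRgb and ClampToByte are names made up for this example.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Clamps to [0, 255] and rounds to the nearest integer.
static std::uint8_t ClampToByte(double value) {
  return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, value)) + 0.5);
}

// Converts one limited-range (Y in 16..235) BT.601 YUV sample to RGB.
// The coefficients are the commonly quoted BT.601 approximations; this is an
// assumption, not taken from Chromium's converter.
std::array<std::uint8_t, 3> YuvToRgb(std::uint8_t y, std::uint8_t u,
                                     std::uint8_t v) {
  const double c = 1.164 * (y - 16);  // luma, rescaled to full range
  const double d = u - 128;           // blue-difference chroma
  const double e = v - 128;           // red-difference chroma
  return {{ClampToByte(c + 1.596 * e),              // R
           ClampToByte(c - 0.392 * d - 0.813 * e),  // G
           ClampToByte(c + 2.017 * d)}};            // B
}
```

A 720p frame has 921,600 pixels, so at 60 FPS this arithmetic runs roughly 55 million times per second, which is why it is normally done on the GPU and why any extra per-frame setup around it gets noticed.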
,
Apr 20 2016
ccameron, is there a way to avoid this nasty composition path for NaCl plugins? I'm not doing anything that requires an OGL texture, but it seems that pp::VideoDecoder can't just render directly. Maybe that's a better question for the NaCl team?
,
Apr 20 2016
Can you update the repro instructions with steps that someone unfamiliar with NaCl can run? I can take a look locally to see if anything jumps out at me.
,
Apr 21 2016
Sure thing. To run the video_decode sample:
1) Download and extract the NaCl SDK from here: https://developer.chrome.com/native-client/sdk/download
2) From the nacl_sdk directory, run './naclsdk update' to get the latest Pepper platform files
3) cd into 'pepper_49/examples/api/video_decode'
4) Open video_decode.cc and comment out '#define USE_VP8_TESTDATA_INSTEAD_OF_H264' on line 29, then run 'make'
5) In Chrome's extensions page (in developer mode), hit 'Load unpacked extension' and point it at your 'nacl_sdk/pepper_49/examples/api/video_decode' folder
6) You should be able to run it from the extensions page or any other method of running Chrome apps. Click the video to start it from the beginning.
,
May 3 2017
,
May 3 2018
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
May 9 2018
stale bug for > 1 year.
Comment 1 by rsesek@chromium.org
, Apr 6 2016