Issue metadata
Sign in to add a comment
|
v4l2 video decode accelerator sometimes deadlocks after switching video resolution
Reported by
kiagiada...@gmail.com,
Nov 3 2017
|
||||||||||||||||||||
Issue descriptionUserAgent: Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0 Steps to reproduce the problem: On a platform that operates a hardware decoder through the v4l2 video decode accelerator, try the following: 1. Open a video in youtube, with quality set to Auto 2. Wait... 3. At some point, although the video has buffered a lot, you will notice that it pauses and the youtube player's spinning circles appear. At this point, it is not even possible to seek. The only way to unblock it is to reload the page. What is the expected behavior? Video should play uninterrupted What went wrong? There is a deadlock in V4L2VideoDecodeAccelerator [media/gpu/v4l2_video_decode_accelerator.cc] that causes this. The root cause of the deadlock is this: ServiceDeviceTask() and DevicePollTask() are meant to run in sequence and activate one another. But, they run from different threads. Normally, StartDevicePoll() schedules a call to DevicePollTask(), which then in turn schedules ServiceDeviceTask(), which schedules DevicePollTask() again and they continue like this. When switching resolutions, there is an interesting chain of events: 1) ServiceDeviceTask() schedules DevicePollTask() 2) DevicePollTask() now runs in the other thread in parallel to ServiceDeviceTask() 3) DevicePollTask() schedules ServiceDeviceTask() 4) the previous ServiceDeviceTask() at the same time calls StartResolutionChange() before returning 5) StartResolutionChange() causes the thread that runs DevicePollTask() to get restarted by calling StopDevicePoll() & StartDevicePoll() 6) StartDevicePoll() schedules DevicePollTask() 7) ServiceDeviceTask() returns and runs again because it was scheduled earlier in (3) 8) ServiceDeviceTask() schedules DevicePollTask() -> At this point we've ended up with 2 scheduled calls to DevicePollTask(), which are going to schedule 2 calls to ServiceDeviceTask() and so forth, which leads to problems. The deadlock situation happens when at some point ServiceDeviceTask() schedules DevicePollTask() with poll_device=false because there are no input or output buffers available, but then the next ServiceDeviceTask() (which runs immediately because there are always 2 of them scheduled) gets an output buffer and schedules DevicePollTask with poll_device=true. At this point, the state of the VDA is such that it is waiting for an event from the v4l device, but DevicePollTask() has called poll() without the device fd, and therefore the event is never received. I am attaching a patch that I implemented to fix this. The solution that is implemented is basically to break the chain of calls when it is detected that device_poll_thread_ has been restarted. With this additional check, at event (7) in the above example, ServiceDeviceTask() will return early and do nothing, because it will detect that it was scheduled by the previous device_poll_thread_ and not the current one. Did this work before? N/A Chrome version: 61.0.3163.79 (Developer Build) Channel: n/a OS Version: Endless 3.4.0 Flash Version: N/A
,
Nov 3 2017
,
Nov 8 2017
Requesting someone from the dev team for help in triaging this further.
,
Nov 8 2017
kiagiadakis.george@ thanks for the report and the patch; before anything else, could you give details about the chromebook where this fails? It should be ARM based, right? Then, could you in any case please follow the instructions of [1] to submit a patch, that should be verified with a simplechrome [2]? Thanks! [1] https://www.chromium.org/developers/contributing-code [2] https://chromium.googlesource.com/chromiumos/docs/+/master/simple_chrome_workflow.md
,
Nov 10 2017
I'm sorry I wasn't clear about that. This problem was spotted on an Endless Mini computer running Endless OS, not a chromebook. The SoC is an Amlogic S805 (meson8b) and the OS is ubuntu/debian based. Chromium is built for a linux desktop target there, with a few tweaks. Despite this, I reported the issue here because it looks like a logic error in chromium's v4l VDA code and doesn't seem like it would affect only this board/platform. Being a race condition, it might be harder to reproduce elsewhere, but still, it could affect chromium running on any ARM board that uses the v4l VDA. I am going to open a review ticket on gerrit asap.
,
Nov 10 2017
Thank you for providing more feedback. Adding requester "mcasas@chromium.org" to the cc list and removing "Needs-Feedback" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Nov 10 2017
+mcasas@ who seems to be looking into this.
,
Nov 15 2017
The NextAction date has arrived: 2017-11-15 |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by ligim...@chromium.org
, Nov 3 2017