Super subtle race causes DCHECK in VideoCaptureImpl::OnBufferDestroyed() to fail. |
||
Issue descriptionWhile I was working on the new tab capture impl, and integrating it with the video capture stack, I came across a very subtle race. The problem manifests as a DCHECK() failure in VideoCaptureImpl::OnBufferDestroyed() because the ref-count on the ClientBuffer is not exactly 1, as expected. This leads the DCHECK() to believe a buffer is being destroyed while in-use. Digging deeper, I believe the root-cause is that VideoCaptureImpl::DidFinishConsumingFrame(), which runs on a different thread, is implicitly holding a reference to the ClientBuffer via the callback it is running. The following very unlikely, but obviously possible sequence of steps occurs: 1. DidFinishConsumingFrame() is called and runs the "callback_to_io_thread". It has NOT YET returned from the Run() call. The thread sleeps throughout the remaining steps, and so there is a reference being held on the ClientBuffer from here the entire time. 2. The callback itself trampolines to the IO thread and it calls OnClientBufferFinished(). 3. OnClientBufferFinished() calls VideoCaptureHost::ReleaseBuffer(), which sends a mojo message to the browser process. 4. The browser process then sends a mojo message back, causing OnBufferDestroyed() to be called. Since the ref-count on the ClientBuffer is one too high, the DCHECK() fails. 5. Finally, the thread from step 1 resumes and exits, auto-releasing its reference to the ClientBuffer. Unfortunately, it's too late. So, there's actually nothing catastrophic going on here other than the DCHECK() thinking the buffer is still in use. Nevertheless, we can correct the problem so that the DCHECK() is not "false alarming" by making sure the scoped_refptr<ClientBuffer> is always moved and never copied, and that the thread in step 1 can't possibly hold a reference. When I make this modification locally, my "frequently crashes" use case stops crashing, showing the problem is solved.
,
Jan 3 2018
Thanks miu@ for finding this and posting this detailed and clear analysis. I believe this is probably also the cause for issue 784269 , which I will mark as duplicate.
,
Jan 3 2018
No problem! I was lucky (or unlucky?) to have written code to drive a usage pattern that reliably triggered the problem. :)
,
Jan 3 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e34817a44dbc20a2f2d2f165a0b8275cc6ce137c commit e34817a44dbc20a2f2d2f165a0b8275cc6ce137c Author: Yuri Wiitala <miu@chromium.org> Date: Wed Jan 03 20:52:07 2018 Fix subtle buffer-finished race in VideoCaptureImpl::OnBufferDestroyed() Ensures the ref-counted VideoCaptureImpl::ClientBuffer object has its ref-count adjusted with non-racy timing as the object is passed between threads. This prevents a DCHECK() in VideoCaptuerImpl::OnBufferDestroyed() from false-alarming on what it thinks is a "free while in-use" situation. More-complete discussion on this in the crbug. Testing: Confirmed fix by running reliable repro case for 5 days straight without it tripping the DCHECK(). Bug: 797851 Change-Id: Ia9f031493bec32f978c03f64d25d42611d6518b5 Reviewed-on: https://chromium-review.googlesource.com/848323 Reviewed-by: Christian Fremerey <chfremer@chromium.org> Commit-Queue: Yuri Wiitala <miu@chromium.org> Cr-Commit-Position: refs/heads/master@{#526799} [modify] https://crrev.com/e34817a44dbc20a2f2d2f165a0b8275cc6ce137c/content/renderer/media/video_capture_impl.cc [modify] https://crrev.com/e34817a44dbc20a2f2d2f165a0b8275cc6ce137c/content/renderer/media/video_capture_impl.h
,
Jan 3 2018
|
||
►
Sign in to add a comment |
||
Comment 1 by chfremer@chromium.org
, Jan 3 2018