Use after free in gpu::GpuChannelHost::Send() |
||||
Issue descriptionNew test cases about to be added as part of https://codereview.chromium.org/2867213004/ revealed a use-after-free during MSan runs. This happens during the browser shutdown while gpu-accelerated video capture frame decoding is running. ==121014==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x259ed4d in gpu::GpuChannelHost::Send(IPC::Message*) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../gpu/ipc/client/gpu_channel_host.cc:111:17 #1 0x27ce48b in Send /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../media/gpu/ipc/client/gpu_jpeg_decode_accelerator_host.cc:207:18 #2 0x27ce48b in media::GpuJpegDecodeAcceleratorHost::Decode(media::BitstreamBuffer const&, scoped_refptr<media::VideoFrame> const&) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../media/gpu/ipc/client/gpu_jpeg_decode_accelerator_host.cc:197:0 #3 0x6e4e3b7 in content::VideoCaptureGpuJpegDecoder::DecodeCapturedData(unsigned char const*, unsigned long, media::VideoCaptureFormat const&, base::TimeTicks, base::TimeDelta, media::VideoCaptureDevice::Client::Buffer) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/browser/renderer_host/media/video_capture_gpu_jpeg_decoder.cc:176:13 #4 0xd11c784 in media::VideoCaptureDeviceClient::OnIncomingCapturedData(unsigned char const*, int, media::VideoCaptureFormat const&, int, base::TimeTicks, base::TimeDelta, int) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../media/capture/video/video_capture_device_client.cc:291:31 #5 0xd13ffbf in media::JpegEncodingFrameDeliverer::PaintAndDeliverNextFrame(base::TimeDelta) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../media/capture/video/fake_video_capture_device.cc:612:13 #6 0xd1401fc in media::FakeVideoCaptureDevice::OnNextFrameDue(base::TimeTicks, int) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../media/capture/video/fake_video_capture_device.cc:655:21 #7 0x9048ba8 in Run /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/callback.h:91:12 #8 0x9048ba8 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/debug/task_annotator.cc:59:0 #9 0x8df18b3 in base::MessageLoop::RunTask(base::PendingTask*) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/message_loop/message_loop.cc:409:19 #10 0x8df3497 in base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/message_loop/message_loop.cc:420:5 #11 0x8df3f4e in base::MessageLoop::DoWork() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/message_loop/message_loop.cc:508:13 #12 0x8e002fa in base::MessagePumpDefault::Run(base::MessagePump::Delegate*) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/message_loop/message_pump_default.cc:33:31 #13 0x8e91450 in base::RunLoop::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/run_loop.cc:111:14 #14 0x8f46b85 in base::Thread::ThreadMain() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/threading/thread.cc:338:3 #15 0x8f2f835 in base::(anonymous namespace)::ThreadFunc(void*) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/threading/platform_thread_posix.cc:71:13 #16 0x7fa2a040f183 in start_thread /build/eglibc-MjiXCM/eglibc-2.19/nptl/pthread_create.c:312:0 #17 0x7fa29ff26bec in clone /build/eglibc-MjiXCM/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:111:0 Uninitialized value was created by a heap deallocation #0 0x599ce9 in operator delete(void*) ??:0:0 #1 0x67dfd86 in content::BrowserGpuChannelHostFactory::Terminate() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/browser/gpu/browser_gpu_channel_host_factory.cc:199:3 #2 0x61b8f94 in content::BrowserMainLoop::ShutdownThreadsAndCleanUp() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/browser/browser_main_loop.cc:1352:7 #3 0x61c3fb9 in content::BrowserMainRunnerImpl::Shutdown() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/browser/browser_main_runner.cc:202:19 #4 0x7e75949 in ShellBrowserMain(content::MainFunctionParams const&, std::__1::unique_ptr<content::BrowserMainRunner, std::__1::default_delete<content::BrowserMainRunner> > const&) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/shell/browser/shell_browser_main.cc:33:16 #5 0x7e4f2ed in content::ShellMainDelegate::RunProcess(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, content::MainFunctionParams const&) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/shell/app/shell_main_delegate.cc:311:16 #6 0x5df3bf2 in content::RunNamedProcessTypeMain(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, content::MainFunctionParams const&, content::ContentMainDelegate*) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/app/content_main_runner.cc:399:35 #7 0x5df6a5d in content::ContentMainRunnerImpl::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/app/content_main_runner.cc:705:12 #8 0xd2186d5 in service_manager::Main(service_manager::MainParams const&) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../services/service_manager/embedder/main.cc:469:29 #9 0x2a619fe in content::ContentMain(content::ContentMainParams const&) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/app/content_main.cc:19:10 #10 0x7d114de in content::BrowserTestBase::SetUp() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/public/test/browser_test_base.cc:266:3 #11 0x7ce3e39 in content::ContentBrowserTest::SetUp() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/public/test/content_browser_test.cc:89:20 #12 0x7982227 in HandleExceptionsInMethodIfSupported<testing::Test, void> /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:2458:12 #13 0x7982227 in testing::Test::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:2470:0 #14 0x7984daa in testing::TestInfo::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:2656:11 #15 0x7986359 in testing::TestCase::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:2774:28 #16 0x79a5f4d in testing::internal::UnitTestImpl::RunAllTests() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:4650:43 #17 0x79a4de1 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:2458:12 #18 0x79a4de1 in testing::UnitTest::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:4258:0 #19 0x7dbcae4 in RUN_ALL_TESTS /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/include/gtest/gtest.h:2237:46 #20 0x7dbcae4 in base::TestSuite::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/test/test_suite.cc:271:0 #21 0x7cf6659 in content::ContentTestLauncherDelegate::RunTestSuite(int, char**) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/test/content_test_launcher.cc:105:48 #22 0x7d77e90 in content::LaunchTests(content::TestLauncherDelegate*, int, int, char**) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/public/test/test_launcher.cc:520:31
,
May 22 2017
It's illegal to access the GpuChannelHost after threads have been cleaned up. It can't work anyway. The problem is that your thread that runs FakeVideoCaptureDevice outlives the browser threads (main, IO). You need to join it before we destroy the browser threads.
,
May 23 2017
Thanks, for the input. Yes, that makes sense. Is joining the thread that runs the video capture device the only option we have? Or would it also be possible to somehow use a WeakPtr<> to stop it from calling into GpuChannelHost() without having to wait for the thread to exit?
,
May 23 2017
WeakPtrs aren't safe across threads (must only be dereferenced on the same thread that invalidates it). Joining is the right thing to do. We already join all browser threads. A random thread running amok during destruction is a large cause for problems.
,
May 23 2017
After a little bit of digging, I am afraid it may be more complicated than it first looked. For the fake video capture device used in the tests that revealed the issue (see #1), the thread that it runs on is (in the context of the video caputre stack) referred to as the "device thread". On most OSes (including Linux, where the logs from #1 came from), this thread happens to be the same as the AudioManager's thread. For this case, we could probably mitigate the issue by making sure this thread is joined before the BrowserGpuChannelHostFactory is destroyed, i.e. by moving the code block at [1] a few lines down to after the block at [2]. However: 1.) On Windows, the "device thread" is a dedicated thread owned by MediaStreamManager, see [3]. To join it, we would have to add a blocking shutdown API to MediaStreamManager. 2.) Non-fake capture devices often come with their own internal thread. The VideoCaptureDevice API [4] does not currently offer methods for joining such threads. It appears to me that the video capture device API was designed with the intent of shutting down the devices asynchronously from the rest of the browser. And that seems like a reasonable approach to me. Instead of blocking the BrowserMainLoop shutdown sequence to wait for the shutdown of all video capture devices, it should be sufficient to communicate to the video capture device threads that certain things can no longer be used. This could be done via an atomic flag to avoid inter-thread round trips. My proposal would be to add an atomic bool to GpuChannelHost and set it to false when DestroyChannel() is called. After that, it would still be legal for clients to call methods on the GpuChannelHost instance, but they would be either no-ops or return failure codes. What do you think about this proposal? [1] https://cs.chromium.org/chromium/src/content/browser/browser_main_loop.cc?dr=CSs&l=1352 [2] https://cs.chromium.org/chromium/src/content/browser/browser_main_loop.cc?dr=CSs&l=1368 [3] https://cs.chromium.org/chromium/src/content/browser/renderer_host/media/media_stream_manager.h?q=MediaStreamManager&dr=CSs&l=415 [4] https://cs.chromium.org/chromium/src/media/capture/video/video_capture_device.h?l=71
,
May 23 2017
Please, no. Just join the threads please.
,
May 23 2017
More details: 1- GpuChannelHost is in the critical path for many rendering/display/UI workloads, and adding overhead (atomic variable tests) is not good. 2- there's probably a lot more than GpuChannelHost that is not safe to use any more after the browser threads are joined. The main thread which assumes it can freely destroy its data structures without synchronization, and having threads outlive it yet still access its data structures is unsafe. We shouldn't have to play whack-a-mole on these things. Joining the threads is the correct thing to do.
,
May 23 2017
Thanks for the input on this. Looks like we have to modify the design of the video capture stack and add APIs for joining threads on shutdown. Since this will be a good piece of work, I will open a new issue to track the progress on this and set this one as blocked.
,
May 23 2017
,
Mar 20 2018
I believe the particular issue tracked in this report can no longer happen since the accelerated jpeg decoding has been converted from legacy IPC to Mojo in [1]. The decode call coming from a thread that might still run after the Browser::IO threads have been shut down now posts to the Browser::IO thread which should at that point lead to the posted task being discarded. => marking as fixed [1] https://chromium-review.googlesource.com/c/chromium/src/+/525672 |
||||
►
Sign in to add a comment |
||||
Comment 1 by chfremer@chromium.org
, May 22 2017Labels: -Pri-3 Pri-2
Owner: piman@chromium.org