New issue
Advanced search Search tips

Issue 725271 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 725627

Blocking:
issue 584797



Sign in to add a comment

Use after free in gpu::GpuChannelHost::Send()

Project Member Reported by chfremer@chromium.org, May 22 2017

Issue description

New test cases about to be added as part of https://codereview.chromium.org/2867213004/ revealed a use-after-free during MSan runs. This happens during the browser shutdown while gpu-accelerated video capture frame decoding is running.

==121014==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x259ed4d in gpu::GpuChannelHost::Send(IPC::Message*) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../gpu/ipc/client/gpu_channel_host.cc:111:17
    #1 0x27ce48b in Send /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../media/gpu/ipc/client/gpu_jpeg_decode_accelerator_host.cc:207:18
    #2 0x27ce48b in media::GpuJpegDecodeAcceleratorHost::Decode(media::BitstreamBuffer const&, scoped_refptr<media::VideoFrame> const&) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../media/gpu/ipc/client/gpu_jpeg_decode_accelerator_host.cc:197:0
    #3 0x6e4e3b7 in content::VideoCaptureGpuJpegDecoder::DecodeCapturedData(unsigned char const*, unsigned long, media::VideoCaptureFormat const&, base::TimeTicks, base::TimeDelta, media::VideoCaptureDevice::Client::Buffer) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/browser/renderer_host/media/video_capture_gpu_jpeg_decoder.cc:176:13
    #4 0xd11c784 in media::VideoCaptureDeviceClient::OnIncomingCapturedData(unsigned char const*, int, media::VideoCaptureFormat const&, int, base::TimeTicks, base::TimeDelta, int) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../media/capture/video/video_capture_device_client.cc:291:31
    #5 0xd13ffbf in media::JpegEncodingFrameDeliverer::PaintAndDeliverNextFrame(base::TimeDelta) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../media/capture/video/fake_video_capture_device.cc:612:13
    #6 0xd1401fc in media::FakeVideoCaptureDevice::OnNextFrameDue(base::TimeTicks, int) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../media/capture/video/fake_video_capture_device.cc:655:21
    #7 0x9048ba8 in Run /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/callback.h:91:12
    #8 0x9048ba8 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/debug/task_annotator.cc:59:0
    #9 0x8df18b3 in base::MessageLoop::RunTask(base::PendingTask*) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/message_loop/message_loop.cc:409:19
    #10 0x8df3497 in base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/message_loop/message_loop.cc:420:5
    #11 0x8df3f4e in base::MessageLoop::DoWork() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/message_loop/message_loop.cc:508:13
    #12 0x8e002fa in base::MessagePumpDefault::Run(base::MessagePump::Delegate*) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/message_loop/message_pump_default.cc:33:31
    #13 0x8e91450 in base::RunLoop::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/run_loop.cc:111:14
    #14 0x8f46b85 in base::Thread::ThreadMain() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/threading/thread.cc:338:3
    #15 0x8f2f835 in base::(anonymous namespace)::ThreadFunc(void*) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/threading/platform_thread_posix.cc:71:13
    #16 0x7fa2a040f183 in start_thread /build/eglibc-MjiXCM/eglibc-2.19/nptl/pthread_create.c:312:0
    #17 0x7fa29ff26bec in clone /build/eglibc-MjiXCM/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:111:0

  Uninitialized value was created by a heap deallocation
    #0 0x599ce9 in operator delete(void*) ??:0:0
    #1 0x67dfd86 in content::BrowserGpuChannelHostFactory::Terminate() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/browser/gpu/browser_gpu_channel_host_factory.cc:199:3
    #2 0x61b8f94 in content::BrowserMainLoop::ShutdownThreadsAndCleanUp() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/browser/browser_main_loop.cc:1352:7
    #3 0x61c3fb9 in content::BrowserMainRunnerImpl::Shutdown() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/browser/browser_main_runner.cc:202:19
    #4 0x7e75949 in ShellBrowserMain(content::MainFunctionParams const&, std::__1::unique_ptr<content::BrowserMainRunner, std::__1::default_delete<content::BrowserMainRunner> > const&) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/shell/browser/shell_browser_main.cc:33:16
    #5 0x7e4f2ed in content::ShellMainDelegate::RunProcess(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, content::MainFunctionParams const&) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/shell/app/shell_main_delegate.cc:311:16
    #6 0x5df3bf2 in content::RunNamedProcessTypeMain(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, content::MainFunctionParams const&, content::ContentMainDelegate*) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/app/content_main_runner.cc:399:35
    #7 0x5df6a5d in content::ContentMainRunnerImpl::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/app/content_main_runner.cc:705:12
    #8 0xd2186d5 in service_manager::Main(service_manager::MainParams const&) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../services/service_manager/embedder/main.cc:469:29
    #9 0x2a619fe in content::ContentMain(content::ContentMainParams const&) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/app/content_main.cc:19:10
    #10 0x7d114de in content::BrowserTestBase::SetUp() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/public/test/browser_test_base.cc:266:3
    #11 0x7ce3e39 in content::ContentBrowserTest::SetUp() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/public/test/content_browser_test.cc:89:20
    #12 0x7982227 in HandleExceptionsInMethodIfSupported<testing::Test, void> /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:2458:12
    #13 0x7982227 in testing::Test::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:2470:0
    #14 0x7984daa in testing::TestInfo::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:2656:11
    #15 0x7986359 in testing::TestCase::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:2774:28
    #16 0x79a5f4d in testing::internal::UnitTestImpl::RunAllTests() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:4650:43
    #17 0x79a4de1 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:2458:12
    #18 0x79a4de1 in testing::UnitTest::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/src/gtest.cc:4258:0
    #19 0x7dbcae4 in RUN_ALL_TESTS /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../third_party/googletest/src/googletest/include/gtest/gtest.h:2237:46
    #20 0x7dbcae4 in base::TestSuite::Run() /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../base/test/test_suite.cc:271:0
    #21 0x7cf6659 in content::ContentTestLauncherDelegate::RunTestSuite(int, char**) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/test/content_test_launcher.cc:105:48
    #22 0x7d77e90 in content::LaunchTests(content::TestLauncherDelegate*, int, int, char**) /usr/local/google/home/chfremer/code/chromium/src/out/msan/../../content/public/test/test_launcher.cc:520:31
 
Cc: -piman@chromium.org chfremer@chromium.org
Labels: -Pri-3 Pri-2
Owner: piman@chromium.org
piman@: According to the logs, you may be the most familiar with this code. Could you please take a look?

Since this is probably not a new issue, I am temporarily disabling the test cases that can trigger this.

Comment 2 by piman@chromium.org, May 22 2017

Cc: -chfremer@chromium.org piman@chromium.org
Owner: chfremer@chromium.org
It's illegal to access the GpuChannelHost after threads have been cleaned up. It can't work anyway.
The problem is that your thread that runs FakeVideoCaptureDevice outlives the browser threads (main, IO). You need to join it before we destroy the browser threads.
Thanks, for the input. Yes, that makes sense.

Is joining the thread that runs the video capture device the only option we have? Or would it also be possible to somehow use a WeakPtr<> to stop it from calling into GpuChannelHost() without having to wait for the thread to exit?

Comment 4 by piman@chromium.org, May 23 2017

WeakPtrs aren't safe across threads (must only be dereferenced on the same thread that invalidates it).

Joining is the right thing to do. We already join all browser threads. A random thread running amok during destruction is a large cause for problems.
After a little bit of digging, I am afraid it may be more complicated than it 
first looked.

For the fake video capture device used in the tests that revealed the issue 
(see #1), the thread that it runs on is (in the context of the video caputre 
stack) referred to as the "device thread". On most OSes (including Linux, where 
the logs from #1 came from), this thread happens to be the same as the 
AudioManager's thread. For this case, we could probably mitigate the issue by 
making sure this thread is joined before the BrowserGpuChannelHostFactory is 
destroyed, i.e. by moving the code block at [1] a few lines down to after the 
block at [2].

However:
1.) On Windows, the "device thread" is a dedicated thread owned by 
MediaStreamManager, see [3]. To join it, we would have to add a blocking 
shutdown API to MediaStreamManager.
2.) Non-fake capture devices often come with their own internal thread. The 
VideoCaptureDevice API [4] does not currently offer methods for joining such 
threads.

It appears to me that the video capture device API was designed with the intent
of shutting down the devices asynchronously from the rest of the browser. And 
that seems like a reasonable approach to me. Instead of blocking the 
BrowserMainLoop shutdown sequence to wait for the shutdown of all video capture 
devices, it should be sufficient to communicate to the video capture device 
threads that certain things can no longer be used. This could be done via an 
atomic flag to avoid inter-thread round trips.

My proposal would be to add an atomic bool to GpuChannelHost and set it to false
when DestroyChannel() is called. After that, it would still be legal for clients
to call methods on the GpuChannelHost instance, but they would be either no-ops 
or return failure codes.

What do you think about this proposal?

[1] https://cs.chromium.org/chromium/src/content/browser/browser_main_loop.cc?dr=CSs&l=1352
[2] https://cs.chromium.org/chromium/src/content/browser/browser_main_loop.cc?dr=CSs&l=1368
[3] https://cs.chromium.org/chromium/src/content/browser/renderer_host/media/media_stream_manager.h?q=MediaStreamManager&dr=CSs&l=415
[4] https://cs.chromium.org/chromium/src/media/capture/video/video_capture_device.h?l=71

Comment 6 by piman@chromium.org, May 23 2017

Please, no. Just join the threads please.

Comment 7 by piman@chromium.org, May 23 2017

More details:
1- GpuChannelHost is in the critical path for many rendering/display/UI workloads, and adding overhead (atomic variable tests) is not good.
2- there's probably a lot more than GpuChannelHost that is not safe to use any more after the browser threads are joined. The main thread which assumes it can freely destroy its data structures without synchronization, and having threads outlive it yet still access its data structures is unsafe. We shouldn't have to play whack-a-mole on these things. Joining the threads is the correct thing to do.
Thanks for the input on this.

Looks like we have to modify the design of the video capture stack and add APIs for joining threads on shutdown. Since this will be a good piece of work, I will open a new issue to track the progress on this and set this one as blocked.
Blockedon: 725627
Status: Fixed (was: Assigned)
I believe the particular issue tracked in this report can no longer happen since the accelerated jpeg decoding has been converted from legacy IPC to Mojo in [1]. The decode call coming from a thread that might still run after the Browser::IO threads have been shut down now posts to the Browser::IO thread which should at that point lead to the posted task being discarded.
=> marking as fixed

[1] https://chromium-review.googlesource.com/c/chromium/src/+/525672

Sign in to add a comment