New issue
Advanced search Search tips

Issue 854154 link

Starred by 1 user

Issue metadata

Status: Started
Owner:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

SimpleThread::Start may stall indefinitely

Project Member Reported by ossu@chromium.org, Jun 19 2018

Issue description

Upon investigating possible reasons why the browser's audio thread may stall I came upon a couple of crashes during shutdown, where the main thread stalls waiting for the AudioManager to shut down, which can't happen because the audio thread is busy waiting for a WaitableEvent.

Example: crash/8e59bc7e2d7a6b37
In this case, WASAPIAudioOutputStream running on thread 8 is trying to start a new thread for audio device I/O. Since the crash isn't due to that failing, but caught as a shutdown hang, it's impossible to tell for how long the audio thread has been waiting.

It's waiting here: https://cs.chromium.org/chromium/src/base/threading/simple_thread.cc?l=33&rcl=d52474c3202ace50d3a849ec2b19f7ae4191a3d5

The event it's waiting for is triggered on the new thread, in SimpleThread::ThreadMain. If PlatformThread::Create(NonJoinable)WithPriority fails in SimpleThread::StartAsync, that event will never be triggered and the calling thread will stall indefinitely.

As this doesn't cause a crash directly, it's difficult to say how often this happens. Searching through crashes for the two stack frames of base::WaitableEvent::Wait() called by media::WASAPIAudioOutputStream::Start() turned up 600 instances, but even that is not conclusive, since something unrelated may be crashing at the same time as an output stream is started. The hang could also affect other users of SimpleThread.

I suggest changing the DCHECK in SimpleThread::StartAsync to a CHECK, to get proper, actionable crashes from this failure mode. 

 
You may also be interested in issue 795425 (we haven't looked closely at it).
Project Member

Comment 2 by bugdroid1@chromium.org, Jun 19 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ad50951821ee00c338994a11542f372825af9b4d

commit ad50951821ee00c338994a11542f372825af9b4d
Author: Oskar Sundbom <ossu@chromium.org>
Date: Tue Jun 19 15:02:09 2018

CHECK that SimpleThread is able to create a thread

I stumbled on a couple of crashes where it looks like
the audio thread is deadlocked on the event_.Wait()
call. From what I can tell, there's no way to recover
from a fail thread creation, so any thread that calls
SimpleThread::Start() has a chance of freezing
indefinitely.

This was changed[1] from a CHECK to a DCHECK way back in
2011. There shouldn't be any valid cases where thread
creation fails, so it's probably better to get a clean
crash as soon as this happens, than possibly getting a
crash dump much later, provided a watchdog kicks in.

[1] https://codereview.chromium.org/8368009

Bug: 854154
Change-Id: I72f7318642a39b0e1b23964f0841423bc9e87496
Reviewed-on: https://chromium-review.googlesource.com/1104679
Reviewed-by: Gabriel Charette <gab@chromium.org>
Commit-Queue: Oskar Sundbom <ossu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#568449}
[modify] https://crrev.com/ad50951821ee00c338994a11542f372825af9b4d/base/threading/simple_thread.cc

Comment 3 by ossu@chromium.org, Jun 19 2018

Thanks, maxmorin@! Had a quick look and thread creation really do fails quite a lot in that case, so it's not impossible to think this bug is actually hiding some real problems.

A quick look makes me think this might be happening predominantly on 32 bit systems/Chrome. Should have a look at those stats once the change in #2 starts turning up crashes.  
Some crashes have started trickling in, the biggest group of which is currently in CategorizedWorkerThread (see bug 881099).

Overall crashes can be tracked at go/simplethreadstartcrashes - depending on the platform and optimizations, the crash may be attributed to StartAsync or Start.

After cursory analysis, it looks like thread creation fails also outside of low memory conditions, at least if I'm reading the information correctly.

Sign in to add a comment