ServiceManager Deadlock |
|||||
Issue descriptionCurrently it is possible for services to deadlock with ServiceManager. This occurs if ServiceManager is tearing down services, and a child process is currently blocked waiting on a mojo call. ServiceManager::Instance will delete its ServiceProcessLauncher which blocks on shutting down the child process. mojo::Binding::WaitForIncomingMethodCall, when called from a child process will block indefinitely until an appropriate signal is called. If the child service is waiting on a service which never starts before ServiceManager shutdown, then the wait method never exits. Normally it is expected that the pipe closure would interrupt the wait. However when this occurs MessagePipeDispatcher::OnPortStatusChanged is never called. An example of this can be seen in mash_browser_tests. quick_launch blocks on UI service ServiceManager tries to shutdown quick_launch before UI service has started ServiceManager blocks on trying to shutdown the child process quick_launch never gets OnPortStatusChanged, and stays in WaitForIncomingMethodCall A similar timeout appears to be occurring with local_state, but I haven't debugged what local state is blocking on. I have a proposed change which introduces timeouts on both side of the boundary: https://chromium-review.googlesource.com/c/chromium/src/+/624528 However I haven't debugged the root cause, of why the OnPortStatusChanged is never received. With the patch, after the timeout, the child process can poll the pipe state to be told that the peer has closed. Without ever getting the notification.
,
Aug 29 2017
And actually, in this case specifically, we only need to look for instances of peer_closed changing a port without a corresponding OnPortStatusChanged call. But this can only happen on kReceiving ports (i.e. ports which can be wrapped by a MessagePipeDispatcher) in OnObserveClosure and DestroyAllPortsWithPeer, and both of those methods seem to do the right thing. Curious.
,
Nov 8 2017
Bulk applying component Internals>Services>ServiceManager to issues referencing the text ServiceManager. This may not be 100% accurate, so please feel free to pull the component as needed.
,
Aug 1
,
Oct 17
,
Jan 4
There has been a lot of work done in ServiceManager, and the multi-process mash structure since I filed this. Looking at ServiceProcessLauncher::ProcessState::StopInBackground, where the closure is now processed. The WaitForExit (no timeout) is still there. Though this is now a posted task, and not blocking the join from occurring. The call site that reproduced this, in screen_mus.cc, is gone now anyways. I'm marking this as WontFix, since the root cause is all gone. The potential may still be there, if so, feel free to re-open and triage accordingly. |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by roc...@chromium.org
, Aug 29 2017