New issue
Advanced search Search tips

Issue 760257 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Jan 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 678687



Sign in to add a comment

ServiceManager Deadlock

Project Member Reported by jonr...@chromium.org, Aug 29 2017

Issue description

Currently it is possible for services to deadlock with ServiceManager. This occurs if ServiceManager is tearing down services, and a child process is currently blocked waiting on a mojo call.

ServiceManager::Instance will delete its ServiceProcessLauncher which blocks on shutting down the child process.

mojo::Binding::WaitForIncomingMethodCall, when called from a child process will block indefinitely until an appropriate signal is called.

If the child service is waiting on a service which never starts before ServiceManager shutdown, then the wait method never exits.

Normally it is expected that the pipe closure would interrupt the wait. However when this occurs MessagePipeDispatcher::OnPortStatusChanged is never called.

An example of this can be seen in mash_browser_tests.
  quick_launch blocks on UI service
  ServiceManager tries to shutdown quick_launch before UI service has started
  ServiceManager blocks on trying to shutdown the child process
  quick_launch never gets OnPortStatusChanged, and stays in WaitForIncomingMethodCall

A similar timeout appears to be occurring with local_state, but I haven't debugged what local state is blocking on.

I have a proposed change which introduces timeouts on both side of the boundary: https://chromium-review.googlesource.com/c/chromium/src/+/624528

However I haven't debugged the root cause, of why the OnPortStatusChanged is never received.

With the patch, after the timeout, the child process can poll the pipe state to be told that the peer has closed. Without ever getting the notification.
 

Comment 1 by roc...@chromium.org, Aug 29 2017

I unfortunately don't have time to look at this at the moment, but I can at
least clarify that if this is really happening, it must be a result of a
port changing state somewhere in node.cc
<https://cs.chromium.org/chromium/src/mojo/edk/system/ports/node.cc?q=node.cc&sq=package:chromium&l=1>,
and
delegate_->PortStatusChanged not being called afterwards for that port.
Should be easy enough to look for instances of such an oversight.

Comment 2 by roc...@chromium.org, Aug 29 2017

And actually, in this case specifically, we only need to look for instances
of peer_closed changing a port without a corresponding OnPortStatusChanged
call. But this can only happen on kReceiving ports (i.e. ports which can be
wrapped by a MessagePipeDispatcher) in OnObserveClosure and
DestroyAllPortsWithPeer, and both of those methods seem to do the right
thing.

Curious.

Comment 3 by laforge@google.com, Nov 8 2017

Components: Internals>Services>ServiceManager
Bulk applying component Internals>Services>ServiceManager to issues referencing the text ServiceManager.  This may not be 100% accurate, so please feel free to pull the component as needed.
Status: Assigned (was: Untriaged)
Cc: -roc...@chromium.org rockot@google.com
Status: WontFix (was: Assigned)
There has been a lot of work done in ServiceManager, and the multi-process mash structure since I filed this.

Looking at ServiceProcessLauncher::ProcessState::StopInBackground, where the closure is now processed. The WaitForExit (no timeout) is still there. Though this is now a posted task, and not blocking the join from occurring. 

The call site that reproduced this, in screen_mus.cc, is gone now anyways.

I'm marking this as WontFix, since the root cause is all gone. The potential may still be there, if so, feel free to re-open and triage accordingly.

Sign in to add a comment