
Issue 816620

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux, Android, Windows, Chrome, Mac, Fuchsia
Pri: 1
Type: Bug

Blocking:
issue 738275




Mojo/IPC messages sent immediately before a child process exits sometimes get "lost"

Project Member Reported by w...@chromium.org, Feb 26 2018

Issue description

On the Fuchsia FYI bots we see infrequent flakes of IPCChannel and Mojo tests, which fall into two observed behaviours:

- MOJO_RESULT_FAILED_PRECONDITION result from Mojo WaitForSignals() on a connected channel (e.g. issue 810448).
- Test "hangs" due to the IPC "quit" message never being received from the child process (e.g. issue 764015).

All of these tests launch a child process that sends a message back to the main test process before exiting. The failures are therefore consistent with Mojo messages occasionally being "dropped" if they are sent immediately before process exit (or possibly immediately before Channel teardown?).
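
For illustration (not from the bug itself): at the OS level, bytes written just before the writer exits are not lost; a reader still receives them before seeing end-of-stream, as this standalone POSIX sketch shows. Any "dropping" therefore has to happen in the higher-level Channel handling of close/readable notifications.

#include <cassert>
#include <cstdio>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
  int fds[2];
  int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
  assert(rc == 0);
  (void)rc;
  pid_t pid = fork();
  if (pid == 0) {
    // Child: write one last message, then exit immediately.
    const char msg[] = "QUIT";
    write(fds[1], msg, sizeof(msg));
    _exit(0);
  }
  close(fds[1]);
  waitpid(pid, nullptr, 0);  // The child has fully exited by now.
  // The kernel still delivers the buffered message before end-of-stream.
  char buf[16] = {};
  ssize_t n = read(fds[0], buf, sizeof(buf));
  printf("first read: %zd bytes (\"%s\")\n", n, buf);
  n = read(fds[0], buf, sizeof(buf));
  printf("second read: %zd (0 == end-of-stream)\n", n);
  return 0;
}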

Existing flake reports matching one of the two symptoms described above will be merged into this bug. New reports of flakes in these tests can be added as comments to this bug if they match; if in doubt, please file a new bug for us to review and de-dupe if appropriate.
 

Comment 1 by w...@chromium.org, Feb 26 2018

Cc: w...@chromium.org
Issue 764015 has been merged into this issue.

Comment 2 by w...@chromium.org, Feb 26 2018

Issue 810448 has been merged into this issue.

Comment 3 by w...@chromium.org, Feb 26 2018

Issue 755848 has been merged into this issue.

Comment 4 by w...@chromium.org, Feb 26 2018

Issue 816606 has been merged into this issue.

Comment 5 by w...@chromium.org, Feb 26 2018

Issue 764018 has been merged into this issue.

Comment 6 by bugdroid1@chromium.org, Feb 26 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d95d4804ffb0644c2bdc32a1b890a9473b45be1e

commit d95d4804ffb0644c2bdc32a1b890a9473b45be1e
Author: Wez <wez@chromium.org>
Date: Mon Feb 26 20:50:47 2018

Add OnChannelError() handler to QuitListener rather than hang on close.

Without an OnChannelError() handler on the QuitListener, the test can
hang if the IPC channel disconnects before the Quit message is received.
Adding an explicit handler will make it easier to diagnose this failure
mode.

Also migrates DCHECK()s in some existing OnChannelError() handlers in
tests to use CHECK(), to avoid those tests potentially hanging in
Release builds.

Bug:  816606 ,  816620 ,  764015 
Change-Id: Ibe6ffc7ecd02f6029c74a7b3c1190030b656056f
Reviewed-on: https://chromium-review.googlesource.com/936427
Reviewed-by: Ken Rockot <rockot@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#539262}
[modify] https://crrev.com/d95d4804ffb0644c2bdc32a1b890a9473b45be1e/ipc/ipc_channel_mojo_unittest.cc
[modify] https://crrev.com/d95d4804ffb0644c2bdc32a1b890a9473b45be1e/ipc/ipc_channel_proxy_unittest.cc
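
A minimal sketch of the shape of this change; the names follow the crash stack quoted in comment 13 below, but the fixture wiring is assumed rather than quoted:

#include "base/logging.h"
#include "ipc/ipc_listener.h"
#include "ipc/ipc_message.h"

// Sketch only: the real QuitListener lives in ipc/ipc_channel_proxy_unittest.cc.
// The substance of the CL is the addition of an OnChannelError() override,
// using CHECK() rather than DCHECK() so that a Release build crashes with a
// useful stack instead of hanging when the channel errors out before the
// Quit message arrives.
class QuitListener : public IPC::Listener {
 public:
  bool OnMessageReceived(const IPC::Message& message) override {
    // (Existing handling, elided: sets quit_message_received_ = true when
    // the Quit message is dispatched.)
    return true;
  }

  void OnChannelError() override {
    CHECK(quit_message_received_);
  }

  bool quit_message_received_ = false;
};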

Comment 7 by w...@chromium.org, Feb 26 2018

Issue 815314 has been merged into this issue.

Comment 8 by w...@chromium.org, Feb 26 2018

Blocking: 738275

Comment 9 by w...@chromium.org, Feb 26 2018

Cc: jbudorick@chromium.org
Issue 761401 has been merged into this issue.

Comment 10 by w...@chromium.org, Mar 7 2018

IPCMojoBootstrapTest.ReceiveEmptyMessage appears to have hit this in https://build.chromium.org/p/chromium.fyi/builders/Fuchsia/builds/14550

Comment 11, Mar 7 2018

Hmm. We don't actually block shutdown on channels flushing their outgoing queue, but we probably should. It's so rare that a channel queues anything, though, that I'm surprised any tests would hit this. That is, unless something about the Fuchsia implementation can cause writes to not complete immediately?

Comment 12 by w...@chromium.org, Mar 7 2018

The Channel implementation for Fuchsia actually doesn't implement queuing
at all, since the zx_channel_write() API doesn't yet provide a flow-control
signal.
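
As a sketch of what that means at the syscall level (this helper is illustrative, not from the Channel implementation):

#include <zircon/syscalls.h>

// Illustrative helper: a Zircon channel write either succeeds outright or
// fails with a terminal status such as ZX_ERR_PEER_CLOSED. There is no
// ZX_ERR_SHOULD_WAIT-style "try again later" result for channels, so the
// Channel implementation has no flow-control signal on which to base an
// outgoing-message queue.
zx_status_t WriteOrFail(zx_handle_t channel, const void* bytes,
                        uint32_t num_bytes) {
  return zx_channel_write(channel, 0, bytes, num_bytes,
                          /*handles=*/nullptr, /*num_handles=*/0);
}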

Potential sources of these message drops seem to be:
1. Incorrect handling of channel-readable vs channel-closed signals in the
Channel impl for Fuchsia.
2. Lack of blocking to wait for IO-thread work to complete when exiting.
3. Raciness in QuitWhenIdle vs PostTask(WriteMessage) ordering during exit
(i.e. this would most likely be highly specific to the tests, which do a
lot of RunLoop+QuitWhenIdle stuff).

Comment 13 by w...@chromium.org, Mar 7 2018

Crash stacks from failures in IPCChannelBadMessageTest look a little more helpful than IPCChannelProxy failures (which lose context due to use of PostTask):

[00001.999] 02166.02205> [3:1682880091:0307/212758.430939:1999143:FATAL:ipc_channel_proxy_unittest.cc(75)] Check failed: quit_message_received_.
#00: base::debug::StackTrace::StackTrace(unsigned long) at ??:?
#01: base::debug::StackTrace::StackTrace() at ??:?
#02: logging::LogMessage::~LogMessage() at ??:?
#03: (anonymous namespace)::QuitListener::OnChannelError() at ??:?
#04: IPC::ChannelMojo::OnPipeError() at ??:?
#05: IPC::internal::MessagePipeReader::OnPipeError(unsigned int) at ??:?
... base::Callback glue ...
#09: base::internal::Invoker<base::internal::BindState<void (IPC::internal::MessagePipeReader::*)(unsigned int), base::internal::UnretainedWrapper<IPC::internal::MessagePipeReader>, unsigned int>, void ()>::Run(base::internal::BindStateBase*) at ??:?
#10: base::OnceCallback<void ()>::Run() && at ??:?
#11: mojo::InterfaceEndpointClient::NotifyError(base::Optional<mojo::DisconnectReason> const&) at ??:?
#12: IPC::(anonymous namespace)::ChannelAssociatedGroupController::NotifyEndpointOfError(IPC::(anonymous namespace)::ChannelAssociatedGroupController::Endpoint*, bool) at ??:?
#13: IPC::(anonymous namespace)::ChannelAssociatedGroupController::OnPipeError() at ??:?
... base::Callback glue ...
#17: base::internal::Invoker<base::internal::BindState<void (IPC::(anonymous namespace)::ChannelAssociatedGroupController::*)(), base::internal::UnretainedWrapper<IPC::(anonymous namespace)::ChannelAssociatedGroupController> >, void ()>::Run(base::internal::BindStateBase*) at ??:?
#18: base::OnceCallback<void ()>::Run() && at ??:?
#19: mojo::Connector::HandleError(bool, bool) at ??:?
#20: mojo::Connector::OnHandleReadyInternal(unsigned int) at ??:?
#21: mojo::Connector::OnWatcherHandleReady(unsigned int) at ??:?
... base::Callback glue ...
#27: mojo::SimpleWatcher::DiscardReadyState(base::RepeatingCallback<void (unsigned int)> const&, unsigned int, mojo::HandleSignalsState const&) at ??:?
... base::Callback glue ...
#33: mojo::SimpleWatcher::OnHandleReady(int, unsigned int, mojo::HandleSignalsState const&) at ??:?
... base::Callback glue ...
#37: base::internal::Invoker<base::internal::BindState<void (mojo::SimpleWatcher::*)(int, unsigned int, mojo::HandleSignalsState const&), base::WeakPtr<mojo::SimpleWatcher>, int, unsigned int, mojo::HandleSignalsState>, void ()>::Run(base::internal::BindStateBase*) at ??:?
#38: base::OnceCallback<void ()>::Run() && at ??:?
#39: base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) at ??:?
... base::RunLoop stuff ...
#46: base::RunLoop::Run() at ??:?
#47: (anonymous namespace)::CreateRunLoopAndRun(base::RunLoop**) at ??:?
#48: (anonymous namespace)::IPCChannelBadMessageTest::SendQuitMessageAndWaitForIdle() at ??:?
#49: (anonymous namespace)::IPCChannelBadMessageTest_BadMessage_Test::TestBody() at ??:?
...


Comment 14 by w...@chromium.org, Mar 8 2018

Summary: Mojo/IPC messages sent immediately before a child process exits sometimes get "lost" (was: Mojo/IPC messages send immediately before a child process exits sometimes get "lost")
It appears that this is a race-condition between delivery of a Node update message to the peer, and the peer tearing down the underlying platform channel.

I am able to repro a failure in the IPCChannelBadMessageTest.BadMessage test under e.g. Linux by introducing an artificial delay (see https://chromium-review.googlesource.com/c/chromium/src/+/954443) sufficient to allow the child process to have fully terminated, and closed the native IPC channel, before we call into InitiateProxyRemoval(), which results in the Node update being sent.
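
The delay amounts to something like the following (a hypothetical sketch; see the linked CL for the real patch). The LOG(ERROR) << "SLEEP" marker matches the traces quoted in later comments.

#include "base/logging.h"
#include "base/threading/platform_thread.h"
#include "base/time/time.h"

// Hypothetical helper, called just before the Node update message would be
// written to the peer, to widen the race window deterministically.
void SleepToWidenRaceWindow() {
  LOG(ERROR) << "SLEEP";
  base::PlatformThread::Sleep(base::TimeDelta::FromSeconds(1));
}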


Comment 15 by w...@chromium.org, Mar 8 2018

Running IPCMojoBootstrapTest.ReceiveEmptyMessage with SetMinLogLevel(-2), and the repro patch from #14 applied, I get:

[175678:175678:0308/112356.146953:2503994187262:VERBOSE2:node.cc(786)] Created port 5CDC66076F901318.6B946D787CEDD930@DA638EC8A6563EE9.2F27A2D8F98FAB00
[175678:175678:0308/112356.147048:2503994187345:VERBOSE2:node.cc(786)] Created port E877BA60F7B70C94.C976F798C584191@DA638EC8A6563EE9.2F27A2D8F98FAB00
[175678:175678:0308/112356.147097:2503994187394:VERBOSE2:node_controller.cc(721)] Ignoring status change for 5CDC66076F901318.6B946D787CEDD930 because it doesn't have an observer.
[175678:175678:0308/112356.147140:2503994187438:VERBOSE2:node_controller.cc(721)] Ignoring status change for E877BA60F7B70C94.C976F798C584191 because it doesn't have an observer.
[175678:175678:0308/112356.147184:2503994187481:VERBOSE2:message_pipe_dispatcher.cc(96)] Creating new MessagePipeDispatcher for port 5CDC66076F901318.6B946D787CEDD930 [pipe_id=9187201950435737471; endpoint=0]
[175678:175678:0308/112356.148592:2503994188894:VERBOSE2:node_controller.cc(721)] Ignoring status change for E877BA60F7B70C94.C976F798C584191 because it doesn't have an observer.
[175678:175679:0308/112356.170809:2503994211117:VERBOSE1:node_controller.cc(789)] Node DA638EC8A6563EE9.2F27A2D8F98FAB00 accepted invitee 74138D154D2AC45F.C47AB9B677782C4
[175678:175679:0308/112356.170943:2503994211246:VERBOSE2:node_controller.cc(497)] Accepting new peer 74138D154D2AC45F.C47AB9B677782C4 on node DA638EC8A6563EE9.2F27A2D8F98FAB00
[175678:175679:0308/112356.171249:2503994211556:VERBOSE2:node_controller.cc(989)] Node DA638EC8A6563EE9.2F27A2D8F98FAB00 received RequestPortMerge for name test_pipe and port 5BEB17DD2D5E3109.35E5C9433F859D1C@74138D154D2AC45F.C47AB9B677782C4
[175678:175679:0308/112356.171393:2503994211694:VERBOSE1:node.cc(398)] Sending MergePort from E877BA60F7B70C94.C976F798C584191@DA638EC8A6563EE9.2F27A2D8F98FAB00 to 5BEB17DD2D5E3109.35E5C9433F859D1C@74138D154D2AC45F.C47AB9B677782C4
[175678:175679:0308/112356.171664:2503994211966:VERBOSE2:node.cc(540)] PortAccepted at E877BA60F7B70C94.C976F798C584191@DA638EC8A6563EE9.2F27A2D8F98FAB00 pointing to 647F1FDDEB862E84.D124A1FCF4A5375B@74138D154D2AC45F.C47AB9B677782C4
^---- up to this point is identical to the trace without the patch (below).
[175678:175679:0308/112357.172290:2503995212600:VERBOSE1:node_controller.cc(527)] Dropped peer 74138D154D2AC45F.C47AB9B677782C4
[175678:175679:0308/112357.172446:2503995212750:VERBOSE1:node.cc(435)] Observing lost connection from node DA638EC8A6563EE9.2F27A2D8F98FAB00 to node 74138D154D2AC45F.C47AB9B677782C4
[175678:175679:0308/112357.172577:2503995212880:VERBOSE2:node.cc(809)] Deleted port E877BA60F7B70C94.C976F798C584191@DA638EC8A6563EE9.2F27A2D8F98FAB00
[175678:175679:0308/112357.172650:2503995212952:VERBOSE2:node.cc(1339)] Forcibly deleted port E877BA60F7B70C94.C976F798C584191@DA638EC8A6563EE9.2F27A2D8F98FAB00
[175678:175679:0308/112357.172772:2503995213075:VERBOSE2:message_pipe_dispatcher.cc(360)] Peer closure detected on message pipe 9187201950435737471 endpoint 0 [port=5CDC66076F901318.6B946D787CEDD930]

whereas without the patch I see:

[176083:176083:0308/112657.927113:2504175967420:VERBOSE2:node.cc(786)] Created port 6C541D14264C2297.4F79786702084FE1@AE1D8068B8EFCF22.18128A3E19699EEC
[176083:176083:0308/112657.927201:2504175967498:VERBOSE2:node.cc(786)] Created port 1B4C89D5513F6C30.7B09BB60BD03A131@AE1D8068B8EFCF22.18128A3E19699EEC
[176083:176083:0308/112657.927240:2504175967537:VERBOSE2:node_controller.cc(721)] Ignoring status change for 6C541D14264C2297.4F79786702084FE1 because it doesn't have an observer.
[176083:176083:0308/112657.927281:2504175967577:VERBOSE2:node_controller.cc(721)] Ignoring status change for 1B4C89D5513F6C30.7B09BB60BD03A131 because it doesn't have an observer.
[176083:176083:0308/112657.927319:2504175967615:VERBOSE2:message_pipe_dispatcher.cc(96)] Creating new MessagePipeDispatcher for port 6C541D14264C2297.4F79786702084FE1 [pipe_id=9187201950435737471; endpoint=0]
[176083:176083:0308/112657.928904:2504175969214:VERBOSE2:node_controller.cc(721)] Ignoring status change for 1B4C89D5513F6C30.7B09BB60BD03A131 because it doesn't have an observer.
[176083:176084:0308/112657.943205:2504175983512:VERBOSE1:node_controller.cc(789)] Node AE1D8068B8EFCF22.18128A3E19699EEC accepted invitee BAE5C86F7BEB9CE8.B9521AC353BD4DDD
[176083:176084:0308/112657.943349:2504175983651:VERBOSE2:node_controller.cc(497)] Accepting new peer BAE5C86F7BEB9CE8.B9521AC353BD4DDD on node AE1D8068B8EFCF22.18128A3E19699EEC
[176083:176084:0308/112657.943570:2504175983876:VERBOSE2:node_controller.cc(989)] Node AE1D8068B8EFCF22.18128A3E19699EEC received RequestPortMerge for name test_pipe and port 94B7C9E4A6988CCE.8ACFEF97F7611FDF@BAE5C86F7BEB9CE8.B9521AC353BD4DDD
[176083:176084:0308/112657.943695:2504175983996:VERBOSE1:node.cc(398)] Sending MergePort from 1B4C89D5513F6C30.7B09BB60BD03A131@AE1D8068B8EFCF22.18128A3E19699EEC to 94B7C9E4A6988CCE.8ACFEF97F7611FDF@BAE5C86F7BEB9CE8.B9521AC353BD4DDD
[176083:176084:0308/112657.944048:2504175984354:VERBOSE2:node.cc(540)] PortAccepted at 1B4C89D5513F6C30.7B09BB60BD03A131@AE1D8068B8EFCF22.18128A3E19699EEC pointing to BCDC4A893346D7A7.4A21D133EBDDA6CC@BAE5C86F7BEB9CE8.B9521AC353BD4DDD
^---- up to this point is identical to the trace with the patch (above).
[176083:176084:0308/112657.944445:2504175984749:VERBOSE2:node.cc(574)] ObserveProxy at 6C541D14264C2297.4F79786702084FE1@AE1D8068B8EFCF22.18128A3E19699EEC, proxy at BCDC4A893346D7A7.4A21D133EBDDA6CC@BAE5C86F7BEB9CE8.B9521AC353BD4DDD pointing to F79C58FEDF15F13E.92CCFFC8BA033DDD@BAE5C86F7BEB9CE8.B9521AC353BD4DDD
[176083:176084:0308/112657.944561:2504175984866:VERBOSE2:node.cc(574)] ObserveProxy at 1B4C89D5513F6C30.7B09BB60BD03A131@AE1D8068B8EFCF22.18128A3E19699EEC, proxy at BCDC4A893346D7A7.4A21D133EBDDA6CC@BAE5C86F7BEB9CE8.B9521AC353BD4DDD pointing to F79C58FEDF15F13E.92CCFFC8BA033DDD@BAE5C86F7BEB9CE8.B9521AC353BD4DDD
[176083:176084:0308/112657.944649:2504175984949:VERBOSE2:node.cc(610)] Delaying ObserveProxyAck to BCDC4A893346D7A7.4A21D133EBDDA6CC@BAE5C86F7BEB9CE8.B9521AC353BD4DDD
[176083:176084:0308/112657.944773:2504175985074:VERBOSE2:node.cc(574)] ObserveProxy at 6C541D14264C2297.4F79786702084FE1@AE1D8068B8EFCF22.18128A3E19699EEC, proxy at 94B7C9E4A6988CCE.8ACFEF97F7611FDF@BAE5C86F7BEB9CE8.B9521AC353BD4DDD pointing to 6C541D14264C2297.4F79786702084FE1@AE1D8068B8EFCF22.18128A3E19699EEC
[176083:176084:0308/112657.944860:2504175985160:VERBOSE2:node.cc(574)] ObserveProxy at 1B4C89D5513F6C30.7B09BB60BD03A131@AE1D8068B8EFCF22.18128A3E19699EEC, proxy at 94B7C9E4A6988CCE.8ACFEF97F7611FDF@BAE5C86F7BEB9CE8.B9521AC353BD4DDD pointing to 6C541D14264C2297.4F79786702084FE1@AE1D8068B8EFCF22.18128A3E19699EEC
[176083:176084:0308/112657.945009:2504175985310:VERBOSE2:node.cc(574)] ObserveProxy at 6C541D14264C2297.4F79786702084FE1@AE1D8068B8EFCF22.18128A3E19699EEC, proxy at 1B4C89D5513F6C30.7B09BB60BD03A131@AE1D8068B8EFCF22.18128A3E19699EEC pointing to BCDC4A893346D7A7.4A21D133EBDDA6CC@BAE5C86F7BEB9CE8.B9521AC353BD4DDD
[176083:176084:0308/112657.945107:2504175985407:VERBOSE2:node.cc(596)] Forwarding ObserveProxyAck from 6C541D14264C2297.4F79786702084FE1@AE1D8068B8EFCF22.18128A3E19699EEC to 1B4C89D5513F6C30.7B09BB60BD03A131@AE1D8068B8EFCF22.18128A3E19699EEC
[176083:176084:0308/112657.945182:2504175985482:VERBOSE2:node.cc(637)] ObserveProxyAck at 1B4C89D5513F6C30.7B09BB60BD03A131@AE1D8068B8EFCF22.18128A3E19699EEC (last_sequence_num=1)
[176083:176084:0308/112657.945283:2504175985583:VERBOSE2:node.cc(809)] Deleted port 1B4C89D5513F6C30.7B09BB60BD03A131@AE1D8068B8EFCF22.18128A3E19699EEC
[176083:176084:0308/112657.945489:2504175985793:VERBOSE2:node.cc(694)] ObserveClosure at 6C541D14264C2297.4F79786702084FE1@AE1D8068B8EFCF22.18128A3E19699EEC (state=1) pointing to BCDC4A893346D7A7.4A21D133EBDDA6CC@BAE5C86F7BEB9CE8.B9521AC353BD4DDD (last_sequence_num=3)
[176083:176084:0308/112657.945608:2504175985909:VERBOSE2:node.cc(724)] Forwarding ObserveClosure from 6C541D14264C2297.4F79786702084FE1@AE1D8068B8EFCF22.18128A3E19699EEC to peer BCDC4A893346D7A7.4A21D133EBDDA6CC@BAE5C86F7BEB9CE8.B9521AC353BD4DDD (last_sequence_num=1)
[176083:176084:0308/112657.945714:2504175986014:VERBOSE2:message_pipe_dispatcher.cc(360)] Peer closure detected on message pipe 9187201950435737471 endpoint 0 [port=6C541D14264C2297.4F79786702084FE1]
[176083:176084:0308/112657.946991:2504175987295:VERBOSE1:node_controller.cc(527)] Dropped peer BAE5C86F7BEB9CE8.B9521AC353BD4DDD
[176083:176084:0308/112657.947085:2504175987386:VERBOSE1:node.cc(435)] Observing lost connection from node AE1D8068B8EFCF22.18128A3E19699EEC to node BAE5C86F7BEB9CE8.B9521AC353BD4DDD
[176083:176083:0308/112657.948776:2504175989083:VERBOSE2:message_pipe_dispatcher.cc(134)] Closing message pipe 9187201950435737471 endpoint 0 [port=6C541D14264C2297.4F79786702084FE1]
[176083:176083:0308/112657.948890:2504175989192:VERBOSE2:node.cc(809)] Deleted port 6C541D14264C2297.4F79786702084FE1@AE1D8068B8EFCF22.18128A3E19699EEC
[176083:176083:0308/112657.948960:2504175989261:VERBOSE2:node.cc(282)] Sending ObserveClosure from 6C541D14264C2297.4F79786702084FE1@AE1D8068B8EFCF22.18128A3E19699EEC to BCDC4A893346D7A7.4A21D133EBDDA6CC@BAE5C86F7BEB9CE8.B9521AC353BD4DDD
[176083:176083:0308/112657.949050:2504175989351:VERBOSE1:node_controller.cc(629)] Dropping message for unknown peer: BAE5C86F7BEB9CE8.B9521AC353BD4DDD

Comment 16 by w...@chromium.org, Mar 8 2018

Running IPCChannelBadMessageTest.BadMessage with the repro patch:

[177468:177468:0308/113218.435439:2504496475739:VERBOSE2:node.cc(786)] Created port E26DA1EA7C3DA8E9.7B963D5C7503563C@24AA90917416AFAB.116F15595B43FD50
[177468:177468:0308/113218.435519:2504496475816:VERBOSE2:node.cc(786)] Created port 27AC433286CE3F9C.6603A1213D410ACD@24AA90917416AFAB.116F15595B43FD50
[177468:177468:0308/113218.435562:2504496475868:VERBOSE2:node_controller.cc(721)] Ignoring status change for E26DA1EA7C3DA8E9.7B963D5C7503563C because it doesn't have an observer.
[177468:177468:0308/113218.435611:2504496475908:VERBOSE2:node_controller.cc(721)] Ignoring status change for 27AC433286CE3F9C.6603A1213D410ACD because it doesn't have an observer.
[177468:177468:0308/113218.435646:2504496475942:VERBOSE2:message_pipe_dispatcher.cc(96)] Creating new MessagePipeDispatcher for port E26DA1EA7C3DA8E9.7B963D5C7503563C [pipe_id=9187201950435737471; endpoint=0]
[177468:177468:0308/113218.437060:2504496477374:VERBOSE2:node_controller.cc(721)] Ignoring status change for 27AC433286CE3F9C.6603A1213D410ACD because it doesn't have an observer.
[177468:177468:0308/113218.437421:2504496477725:VERBOSE2:node_controller.cc(721)] Ignoring status change for 27AC433286CE3F9C.6603A1213D410ACD because it doesn't have an observer.
[177468:177468:0308/113218.437621:2504496477924:VERBOSE2:node_controller.cc(721)] Ignoring status change for 27AC433286CE3F9C.6603A1213D410ACD because it doesn't have an observer.
[177468:177469:0308/113218.465120:2504496505427:VERBOSE1:node_controller.cc(789)] Node 24AA90917416AFAB.116F15595B43FD50 accepted invitee 8266D204C4892860.724DBB3F25A0A42F
[177468:177469:0308/113218.465261:2504496505563:VERBOSE2:node_controller.cc(497)] Accepting new peer 8266D204C4892860.724DBB3F25A0A42F on node 24AA90917416AFAB.116F15595B43FD50
[177468:177469:0308/113218.465551:2504496505858:VERBOSE2:node_controller.cc(989)] Node 24AA90917416AFAB.116F15595B43FD50 received RequestPortMerge for name test_pipe and port 1FCBF9AA65820BF8.C4DDB91D0B00A3C@8266D204C4892860.724DBB3F25A0A42F
[177468:177469:0308/113218.465667:2504496505968:VERBOSE1:node.cc(398)] Sending MergePort from 27AC433286CE3F9C.6603A1213D410ACD@24AA90917416AFAB.116F15595B43FD50 to 1FCBF9AA65820BF8.C4DDB91D0B00A3C@8266D204C4892860.724DBB3F25A0A42F
[177468:177469:0308/113218.466019:2504496506326:VERBOSE2:node.cc(540)] PortAccepted at 27AC433286CE3F9C.6603A1213D410ACD@24AA90917416AFAB.116F15595B43FD50 pointing to 8B9A723C8576DDCC.E2907E872E4F5761@8266D204C4892860.724DBB3F25A0A42F
^-- appears identical up to here.

[177468:177469:0308/113219.466603:2504497506914:VERBOSE1:node_controller.cc(527)] Dropped peer 8266D204C4892860.724DBB3F25A0A42F
[177468:177469:0308/113219.466772:2504497507075:VERBOSE1:node.cc(435)] Observing lost connection from node 24AA90917416AFAB.116F15595B43FD50 to node 8266D204C4892860.724DBB3F25A0A42F
[177468:177469:0308/113219.466889:2504497507191:VERBOSE2:node.cc(809)] Deleted port 27AC433286CE3F9C.6603A1213D410ACD@24AA90917416AFAB.116F15595B43FD50
[177468:177469:0308/113219.466965:2504497507267:VERBOSE2:node.cc(1339)] Forcibly deleted port 27AC433286CE3F9C.6603A1213D410ACD@24AA90917416AFAB.116F15595B43FD50
[177468:177469:0308/113219.467091:2504497507404:VERBOSE2:message_pipe_dispatcher.cc(360)] Peer closure detected on message pipe 9187201950435737471 endpoint 0 [port=E26DA1EA7C3DA8E9.7B963D5C7503563C]
[177468:177468:0308/113219.467952:2504497508260:FATAL:ipc_channel_proxy_unittest.cc(75)] Check failed: quit_message_received_. 

versus without:

[177257:177257:0308/113117.312615:2504435352914:VERBOSE2:node.cc(786)] Created port 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2
[177257:177257:0308/113117.312697:2504435352995:VERBOSE2:node.cc(786)] Created port 884D339BF13E5F9C.F0C2E2093E24FA8E@F4EB2300055074AC.C8A6EFC63ACE2BA2
[177257:177257:0308/113117.312746:2504435353052:VERBOSE2:node_controller.cc(721)] Ignoring status change for 7EE2166142A147AE.ABE50F4FB87870DC because it doesn't have an observer.
[177257:177257:0308/113117.312798:2504435353095:VERBOSE2:node_controller.cc(721)] Ignoring status change for 884D339BF13E5F9C.F0C2E2093E24FA8E because it doesn't have an observer.
[177257:177257:0308/113117.312839:2504435353136:VERBOSE2:message_pipe_dispatcher.cc(96)] Creating new MessagePipeDispatcher for port 7EE2166142A147AE.ABE50F4FB87870DC [pipe_id=9187201950435737471; endpoint=0]
[177257:177257:0308/113117.314096:2504435354398:VERBOSE2:node_controller.cc(721)] Ignoring status change for 884D339BF13E5F9C.F0C2E2093E24FA8E because it doesn't have an observer.
[177257:177257:0308/113117.314292:2504435354590:VERBOSE2:node_controller.cc(721)] Ignoring status change for 884D339BF13E5F9C.F0C2E2093E24FA8E because it doesn't have an observer.
[177257:177257:0308/113117.314406:2504435354703:VERBOSE2:node_controller.cc(721)] Ignoring status change for 884D339BF13E5F9C.F0C2E2093E24FA8E because it doesn't have an observer.
[177257:177258:0308/113117.339097:2504435379405:VERBOSE1:node_controller.cc(789)] Node F4EB2300055074AC.C8A6EFC63ACE2BA2 accepted invitee C0392218FC033CAC.3372B36A6CBB2A21
[177257:177258:0308/113117.339251:2504435379553:VERBOSE2:node_controller.cc(497)] Accepting new peer C0392218FC033CAC.3372B36A6CBB2A21 on node F4EB2300055074AC.C8A6EFC63ACE2BA2
[177257:177258:0308/113117.339539:2504435379846:VERBOSE2:node_controller.cc(989)] Node F4EB2300055074AC.C8A6EFC63ACE2BA2 received RequestPortMerge for name test_pipe and port 77290FBC5C0496.B59621FE5E84D775@C0392218FC033CAC.3372B36A6CBB2A21
[177257:177258:0308/113117.339656:2504435379957:VERBOSE1:node.cc(398)] Sending MergePort from 884D339BF13E5F9C.F0C2E2093E24FA8E@F4EB2300055074AC.C8A6EFC63ACE2BA2 to 77290FBC5C0496.B59621FE5E84D775@C0392218FC033CAC.3372B36A6CBB2A21
[177257:177258:0308/113117.339974:2504435380275:VERBOSE2:node.cc(540)] PortAccepted at 884D339BF13E5F9C.F0C2E2093E24FA8E@F4EB2300055074AC.C8A6EFC63ACE2BA2 pointing to EE374D430B7BCB7E.FC60FCC822D5C890@C0392218FC033CAC.3372B36A6CBB2A21
^-- appears identical up to here.

[177257:177258:0308/113117.340342:2504435380644:VERBOSE2:node.cc(574)] ObserveProxy at 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2, proxy at EE374D430B7BCB7E.FC60FCC822D5C890@C0392218FC033CAC.3372B36A6CBB2A21 pointing to 577DB812978E3413.25621097804E2E7C@C0392218FC033CAC.3372B36A6CBB2A21
[177257:177258:0308/113117.340467:2504435380773:VERBOSE2:node.cc(574)] ObserveProxy at 884D339BF13E5F9C.F0C2E2093E24FA8E@F4EB2300055074AC.C8A6EFC63ACE2BA2, proxy at EE374D430B7BCB7E.FC60FCC822D5C890@C0392218FC033CAC.3372B36A6CBB2A21 pointing to 577DB812978E3413.25621097804E2E7C@C0392218FC033CAC.3372B36A6CBB2A21
[177257:177258:0308/113117.340576:2504435380877:VERBOSE2:node.cc(610)] Delaying ObserveProxyAck to EE374D430B7BCB7E.FC60FCC822D5C890@C0392218FC033CAC.3372B36A6CBB2A21
[177257:177258:0308/113117.340681:2504435380982:VERBOSE2:node.cc(574)] ObserveProxy at 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2, proxy at 77290FBC5C0496.B59621FE5E84D775@C0392218FC033CAC.3372B36A6CBB2A21 pointing to 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2
[177257:177258:0308/113117.340764:2504435381064:VERBOSE2:node.cc(574)] ObserveProxy at 884D339BF13E5F9C.F0C2E2093E24FA8E@F4EB2300055074AC.C8A6EFC63ACE2BA2, proxy at 77290FBC5C0496.B59621FE5E84D775@C0392218FC033CAC.3372B36A6CBB2A21 pointing to 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2
[177257:177258:0308/113117.340925:2504435381226:VERBOSE2:node.cc(574)] ObserveProxy at 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2, proxy at 884D339BF13E5F9C.F0C2E2093E24FA8E@F4EB2300055074AC.C8A6EFC63ACE2BA2 pointing to EE374D430B7BCB7E.FC60FCC822D5C890@C0392218FC033CAC.3372B36A6CBB2A21
[177257:177258:0308/113117.341038:2504435381339:VERBOSE2:node.cc(596)] Forwarding ObserveProxyAck from 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2 to 884D339BF13E5F9C.F0C2E2093E24FA8E@F4EB2300055074AC.C8A6EFC63ACE2BA2
[177257:177258:0308/113117.341115:2504435381415:VERBOSE2:node.cc(637)] ObserveProxyAck at 884D339BF13E5F9C.F0C2E2093E24FA8E@F4EB2300055074AC.C8A6EFC63ACE2BA2 (last_sequence_num=3)
[177257:177258:0308/113117.341207:2504435381508:VERBOSE2:node.cc(809)] Deleted port 884D339BF13E5F9C.F0C2E2093E24FA8E@F4EB2300055074AC.C8A6EFC63ACE2BA2
[177257:177258:0308/113117.341532:2504435381833:VERBOSE2:node.cc(574)] ObserveProxy at 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2, proxy at EE374D430B7BCB7E.FC60FCC822D5C890@C0392218FC033CAC.3372B36A6CBB2A21 pointing to 577DB812978E3413.25621097804E2E7C@C0392218FC033CAC.3372B36A6CBB2A21
[177257:177258:0308/113117.341646:2504435381947:VERBOSE2:node.cc(596)] Forwarding ObserveProxyAck from 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2 to EE374D430B7BCB7E.FC60FCC822D5C890@C0392218FC033CAC.3372B36A6CBB2A21
[177257:177258:0308/113117.341804:2504435382105:VERBOSE2:node.cc(694)] ObserveClosure at 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2 (state=1) pointing to 577DB812978E3413.25621097804E2E7C@C0392218FC033CAC.3372B36A6CBB2A21 (last_sequence_num=5)
[177257:177258:0308/113117.341915:2504435382216:VERBOSE2:node.cc(724)] Forwarding ObserveClosure from 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2 to peer 577DB812978E3413.25621097804E2E7C@C0392218FC033CAC.3372B36A6CBB2A21 (last_sequence_num=3)
[177257:177258:0308/113117.342047:2504435382348:VERBOSE2:message_pipe_dispatcher.cc(360)] Peer closure detected on message pipe 9187201950435737471 endpoint 0 [port=7EE2166142A147AE.ABE50F4FB87870DC]
[177257:177258:0308/113117.345928:2504435386232:VERBOSE1:node_controller.cc(527)] Dropped peer C0392218FC033CAC.3372B36A6CBB2A21
[177257:177258:0308/113117.346018:2504435386319:VERBOSE1:node.cc(435)] Observing lost connection from node F4EB2300055074AC.C8A6EFC63ACE2BA2 to node C0392218FC033CAC.3372B36A6CBB2A21
[177257:177257:0308/113117.346401:2504435386708:VERBOSE2:message_pipe_dispatcher.cc(134)] Closing message pipe 9187201950435737471 endpoint 0 [port=7EE2166142A147AE.ABE50F4FB87870DC]
[177257:177257:0308/113117.346544:2504435386846:VERBOSE2:node.cc(809)] Deleted port 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2
[177257:177257:0308/113117.346614:2504435386915:VERBOSE2:node.cc(282)] Sending ObserveClosure from 7EE2166142A147AE.ABE50F4FB87870DC@F4EB2300055074AC.C8A6EFC63ACE2BA2 to 577DB812978E3413.25621097804E2E7C@C0392218FC033CAC.3372B36A6CBB2A21
[177257:177257:0308/113117.346731:2504435387032:VERBOSE1:node_controller.cc(629)] Dropping message for unknown peer: C0392218FC033CAC.3372B36A6CBB2A21

Comment 17 by w...@chromium.org, Mar 8 2018

Reducing the repro sleep to 200ms, I get IPCMojoBootstrapTest.ReceiveEmptyMessage runs that look identical up to the sleep:

[179949:179949:0308/114407.814065:2505205854364:VERBOSE2:node.cc(786)] Created port CE38CF2893AC91A9.B7B984E6370CA228@B6E4120287BBD1E3.882349BCB43F2A41
[179949:179949:0308/114407.814144:2505205854447:VERBOSE2:node.cc(786)] Created port 8302860777A211D8.4CC287C61808DF60@B6E4120287BBD1E3.882349BCB43F2A41
[179949:179949:0308/114407.814187:2505205854483:VERBOSE2:node_controller.cc(721)] Ignoring status change for CE38CF2893AC91A9.B7B984E6370CA228 because it doesn't have an observer.
[179949:179949:0308/114407.814241:2505205854537:VERBOSE2:node_controller.cc(721)] Ignoring status change for 8302860777A211D8.4CC287C61808DF60 because it doesn't have an observer.
[179949:179949:0308/114407.814276:2505205854572:VERBOSE2:message_pipe_dispatcher.cc(96)] Creating new MessagePipeDispatcher for port CE38CF2893AC91A9.B7B984E6370CA228 [pipe_id=9187201950435737471; endpoint=0]
[179949:179949:0308/114407.816000:2505205856309:VERBOSE2:node_controller.cc(721)] Ignoring status change for 8302860777A211D8.4CC287C61808DF60 because it doesn't have an observer.
[179951:179952:0308/114407.846083:2505205886390:VERBOSE1:node_controller.cc(754)] Broker client A3C11982ADEE03AE.AFEC689BC1602D64 accepting invitation from B6E4120287BBD1E3.882349BCB43F2A41
[179949:179950:0308/114407.846164:2505205886474:VERBOSE1:node_controller.cc(789)] Node B6E4120287BBD1E3.882349BCB43F2A41 accepted invitee A3C11982ADEE03AE.AFEC689BC1602D64
[179949:179950:0308/114407.846331:2505205886637:VERBOSE2:node_controller.cc(497)] Accepting new peer A3C11982ADEE03AE.AFEC689BC1602D64 on node B6E4120287BBD1E3.882349BCB43F2A41
[179951:179952:0308/114407.846542:2505205886852:VERBOSE2:node_controller.cc(497)] Accepting new peer B6E4120287BBD1E3.882349BCB43F2A41 on node A3C11982ADEE03AE.AFEC689BC1602D64
[179951:179951:0308/114407.846664:2505205886969:VERBOSE2:node_controller.cc(721)] Ignoring status change for 1E168D8E76A3FA03.DC139092D1D6CDF6 because it doesn't have an observer.
[179951:179952:0308/114407.846690:2505205886992:VERBOSE1:node_controller.cc(963)] Client A3C11982ADEE03AE.AFEC689BC1602D64 accepted by broker B6E4120287BBD1E3.882349BCB43F2A41
[179949:179950:0308/114407.846728:2505205887035:VERBOSE2:node_controller.cc(989)] Node B6E4120287BBD1E3.882349BCB43F2A41 received RequestPortMerge for name test_pipe and port 1E168D8E76A3FA03.DC139092D1D6CDF6@A3C11982ADEE03AE.AFEC689BC1602D64
[179951:179951:0308/114407.846892:2505205887196:VERBOSE2:node_controller.cc(721)] Ignoring status change for 1E168D8E76A3FA03.DC139092D1D6CDF6 because it doesn't have an observer.
[179949:179950:0308/114407.846892:2505205887200:VERBOSE1:node.cc(398)] Sending MergePort from 8302860777A211D8.4CC287C61808DF60@B6E4120287BBD1E3.882349BCB43F2A41 to 1E168D8E76A3FA03.DC139092D1D6CDF6@A3C11982ADEE03AE.AFEC689BC1602D64
[179951:179952:0308/114407.847176:2505205887480:VERBOSE1:node.cc(749)] MergePort at 1E168D8E76A3FA03.DC139092D1D6CDF6@A3C11982ADEE03AE.AFEC689BC1602D64 merging with proxy FA8ACC5E58E74636.BAD40D0925F1BA97@A3C11982ADEE03AE.AFEC689BC1602D64 pointing to CE38CF2893AC91A9.B7B984E6370CA228@B6E4120287BBD1E3.882349BCB43F2A41 referred by 8302860777A211D8.4CC287C61808DF60@B6E4120287BBD1E3.882349BCB43F2A41
[179951:179952:0308/114407.847323:2505205887640:VERBOSE2:node.cc(1006)] Accepting port FA8ACC5E58E74636.BAD40D0925F1BA97 [peer_closed=0; last_sequence_num_to_receive=0]
[179951:179952:0308/114407.847442:2505205887744:VERBOSE2:node.cc(786)] Created port FA8ACC5E58E74636.BAD40D0925F1BA97@A3C11982ADEE03AE.AFEC689BC1602D64
[179951:179952:0308/114407.847639:2505205887946:VERBOSE2:node.cc(574)] ObserveProxy at 5998C8A4C8A2204D.BD4E5B002547B3BB@A3C11982ADEE03AE.AFEC689BC1602D64, proxy at FA8ACC5E58E74636.BAD40D0925F1BA97@A3C11982ADEE03AE.AFEC689BC1602D64 pointing to 5998C8A4C8A2204D.BD4E5B002547B3BB@A3C11982ADEE03AE.AFEC689BC1602D64
[179949:179950:0308/114407.847650:2505205887951:VERBOSE2:node.cc(540)] PortAccepted at 8302860777A211D8.4CC287C61808DF60@B6E4120287BBD1E3.882349BCB43F2A41 pointing to FA8ACC5E58E74636.BAD40D0925F1BA97@A3C11982ADEE03AE.AFEC689BC1602D64

and then either succeed:

[179951:179952:0308/114407.847774:2505205888076:VERBOSE2:node.cc(574)] ObserveProxy at 1E168D8E76A3FA03.DC139092D1D6CDF6@A3C11982ADEE03AE.AFEC689BC1602D64, proxy at FA8ACC5E58E74636.BAD40D0925F1BA97@A3C11982ADEE03AE.AFEC689BC1602D64 pointing to 5998C8A4C8A2204D.BD4E5B002547B3BB@A3C11982ADEE03AE.AFEC689BC1602D64
[179949:179950:0308/114407.847829:2505205888131:ERROR:node.cc(1196)] SLEEP
[179951:179951:0308/114407.848524:2505205888825:VERBOSE2:message_pipe_dispatcher.cc(134)] Closing message pipe 9187201950435737471 endpoint 1 [port=5998C8A4C8A2204D.BD4E5B002547B3BB]
[179951:179951:0308/114407.848613:2505205888912:VERBOSE2:node.cc(809)] Deleted port 5998C8A4C8A2204D.BD4E5B002547B3BB@A3C11982ADEE03AE.AFEC689BC1602D64
[179951:179951:0308/114407.848672:2505205888970:VERBOSE2:node.cc(282)] Sending ObserveClosure from 5998C8A4C8A2204D.BD4E5B002547B3BB@A3C11982ADEE03AE.AFEC689BC1602D64 to 1E168D8E76A3FA03.DC139092D1D6CDF6@A3C11982ADEE03AE.AFEC689BC1602D64
[179951:179951:0308/114407.848735:2505205889034:VERBOSE2:node.cc(694)] ObserveClosure at 1E168D8E76A3FA03.DC139092D1D6CDF6@A3C11982ADEE03AE.AFEC689BC1602D64 (state=3) pointing to CE38CF2893AC91A9.B7B984E6370CA228@B6E4120287BBD1E3.882349BCB43F2A41 (last_sequence_num=3)
[179951:179951:0308/114407.848796:2505205889094:VERBOSE2:node.cc(724)] Forwarding ObserveClosure from 1E168D8E76A3FA03.DC139092D1D6CDF6@A3C11982ADEE03AE.AFEC689BC1602D64 to peer CE38CF2893AC91A9.B7B984E6370CA228@B6E4120287BBD1E3.882349BCB43F2A41 (last_sequence_num=3)
[179951:179951:0308/114407.848858:2505205889157:VERBOSE2:node.cc(809)] Deleted port 1E168D8E76A3FA03.DC139092D1D6CDF6@A3C11982ADEE03AE.AFEC689BC1602D64
[179949:179950:0308/114408.048471:2505206088783:VERBOSE1:node_controller.cc(527)] Dropped peer A3C11982ADEE03AE.AFEC689BC1602D64
[179949:179950:0308/114408.048622:2505206088929:VERBOSE1:node.cc(435)] Observing lost connection from node B6E4120287BBD1E3.882349BCB43F2A41 to node A3C11982ADEE03AE.AFEC689BC1602D64
[179949:179950:0308/114408.048759:2505206089062:VERBOSE2:node.cc(809)] Deleted port 8302860777A211D8.4CC287C61808DF60@B6E4120287BBD1E3.882349BCB43F2A41
[179949:179950:0308/114408.048831:2505206089133:VERBOSE2:node.cc(1340)] Forcibly deleted port 8302860777A211D8.4CC287C61808DF60@B6E4120287BBD1E3.882349BCB43F2A41
[179949:179950:0308/114408.048975:2505206089277:VERBOSE2:message_pipe_dispatcher.cc(360)] Peer closure detected on message pipe 9187201950435737471 endpoint 0 [port=CE38CF2893AC91A9.B7B984E6370CA228]
[179949:179949:0308/114408.049673:2505206089977:VERBOSE2:message_pipe_dispatcher.cc(134)] Closing message pipe 9187201950435737471 endpoint 0 [port=CE38CF2893AC91A9.B7B984E6370CA228]
[179949:179949:0308/114408.049823:2505206090125:VERBOSE2:node.cc(809)] Deleted port CE38CF2893AC91A9.B7B984E6370CA228@B6E4120287BBD1E3.882349BCB43F2A41
[179949:179949:0308/114408.049891:2505206090191:VERBOSE2:node.cc(282)] Sending ObserveClosure from CE38CF2893AC91A9.B7B984E6370CA228@B6E4120287BBD1E3.882349BCB43F2A41 to 8302860777A211D8.4CC287C61808DF60@B6E4120287BBD1E3.882349BCB43F2A41

or fail:

[179953:179954:0308/114410.503341:2505208543643:ERROR:node.cc(1196)] SLEEP
[179955:179956:0308/114410.503313:2505208543615:VERBOSE2:node.cc(574)] ObserveProxy at 5B8EC058D8B5B1E2.C1BF3238901CB1C4@7BE30C0747773B49.5A03F6C01E2DC0B, proxy at 9554886FA5839852.F4CE2FBE8FDBD6B1@7BE30C0747773B49.5A03F6C01E2DC0B pointing to C2169F90BCFD8AE0.FD366184862FD931@7BE30C0747773B49.5A03F6C01E2DC0B
[179955:179955:0308/114410.504347:2505208544652:VERBOSE2:message_pipe_dispatcher.cc(134)] Closing message pipe 9187201950435737471 endpoint 1 [port=C2169F90BCFD8AE0.FD366184862FD931]
[179955:179955:0308/114410.504462:2505208544764:VERBOSE2:node.cc(809)] Deleted port C2169F90BCFD8AE0.FD366184862FD931@7BE30C0747773B49.5A03F6C01E2DC0B
[179955:179955:0308/114410.504539:2505208544839:VERBOSE2:node.cc(282)] Sending ObserveClosure from C2169F90BCFD8AE0.FD366184862FD931@7BE30C0747773B49.5A03F6C01E2DC0B to 5B8EC058D8B5B1E2.C1BF3238901CB1C4@7BE30C0747773B49.5A03F6C01E2DC0B
[179955:179955:0308/114410.504619:2505208544920:VERBOSE2:node.cc(694)] ObserveClosure at 5B8EC058D8B5B1E2.C1BF3238901CB1C4@7BE30C0747773B49.5A03F6C01E2DC0B (state=3) pointing to 6C726A7CCD7F17D4.C4D4D72A373B92E0@C5DEB589D7D41E6.997A05BAB1EF094F (last_sequence_num=3)
[179955:179955:0308/114410.504694:2505208544994:VERBOSE2:node.cc(724)] Forwarding ObserveClosure from 5B8EC058D8B5B1E2.C1BF3238901CB1C4@7BE30C0747773B49.5A03F6C01E2DC0B to peer 6C726A7CCD7F17D4.C4D4D72A373B92E0@C5DEB589D7D41E6.997A05BAB1EF094F (last_sequence_num=3)
[179955:179955:0308/114410.504773:2505208545073:VERBOSE2:node.cc(809)] Deleted port 5B8EC058D8B5B1E2.C1BF3238901CB1C4@7BE30C0747773B49.5A03F6C01E2DC0B
[179953:179954:0308/114410.703785:2505208744094:VERBOSE1:node_controller.cc(527)] Dropped peer 7BE30C0747773B49.5A03F6C01E2DC0B
[179953:179954:0308/114410.703959:2505208744264:VERBOSE1:node.cc(435)] Observing lost connection from node C5DEB589D7D41E6.997A05BAB1EF094F to node 7BE30C0747773B49.5A03F6C01E2DC0B
[179953:179954:0308/114410.704079:2505208744380:VERBOSE2:node.cc(809)] Deleted port 559D13F18E5057BD.40A72522534FED27@C5DEB589D7D41E6.997A05BAB1EF094F
[179953:179954:0308/114410.704169:2505208744471:VERBOSE2:node.cc(1340)] Forcibly deleted port 559D13F18E5057BD.40A72522534FED27@C5DEB589D7D41E6.997A05BAB1EF094F
[179953:179954:0308/114410.704286:2505208744587:VERBOSE2:message_pipe_dispatcher.cc(360)] Peer closure detected on message pipe 9187201950435737471 endpoint 0 [port=6C726A7CCD7F17D4.C4D4D72A373B92E0]

Comment 18 by w...@chromium.org, Mar 9 2018

More log output, Now With Added IPC::Channel::Send()!

...
[201790:201791:0308/133114.247796:2511632288097:ERROR:node.cc(1196)] SLEEP
[201792:201792:0308/133114.248261:2511632288572:VERBOSE2:ipc_channel_mojo.cc(220)] sending message @0x18cd9428c660 on channel @0x18cd9429eb60 with type 327717
[201792:201792:0308/133114.248621:2511632288925:VERBOSE2:ipc_channel_mojo.cc(220)] sending message @0x18cd942b6020 on channel @0x18cd9429eb60 with type 393262
[201792:201792:0308/133114.249510:2511632289814:VERBOSE2:message_pipe_dispatcher.cc(134)] Closing message pipe 9187201950435737471 endpoint 1 [port=28D6C7FE28A16C6A.505E99342AA2211D]
[201792:201792:0308/133114.249666:2511632289968:VERBOSE2:node.cc(809)] Deleted port 28D6C7FE28A16C6A.505E99342AA2211D@B51677E3BC9F0D30.24B7399638C04E69
[201792:201792:0308/133114.249738:2511632290040:VERBOSE2:node.cc(282)] Sending ObserveClosure from 28D6C7FE28A16C6A.505E99342AA2211D@B51677E3BC9F0D30.24B7399638C04E69 to 5F152440B36B8249.BC06E816C5AB9BA@B51677E3BC9F0D30.24B7399638C04E69
[201792:201792:0308/133114.249828:2511632290129:VERBOSE2:node.cc(694)] ObserveClosure at 5F152440B36B8249.BC06E816C5AB9BA@B51677E3BC9F0D30.24B7399638C04E69 (state=3) pointing to A5E76F0305FA2ADF.862147BE4EA48F5A@916BCEDC7F635E74.29085D65AD5FB54D (last_sequence_num=5)
[201792:201792:0308/133114.249913:2511632290214:VERBOSE2:node.cc(724)] Forwarding ObserveClosure from 5F152440B36B8249.BC06E816C5AB9BA@B51677E3BC9F0D30.24B7399638C04E69 to peer A5E76F0305FA2ADF.862147BE4EA48F5A@916BCEDC7F635E74.29085D65AD5FB54D (last_sequence_num=5)
[201792:201792:0308/133114.250001:2511632290302:VERBOSE2:node.cc(809)] Deleted port 5F152440B36B8249.BC06E816C5AB9BA@B51677E3BC9F0D30.24B7399638C04E69
[201790:201791:0308/133114.448320:2511632488628:VERBOSE1:node_controller.cc(527)] Dropped peer B51677E3BC9F0D30.24B7399638C04E69

It looks like the receive side of the IPC::ChannelMojo has become connected, enabling the client to receive and process incoming Chrome IPCs (TestMsg_SendBadMessage and then WorkerMsg_Quit). Each message provokes a Chrome IPC to be sent back to the parent, but those replies are queued rather than sent immediately (perhaps the send side of the channel isn't initialized yet?); then the process exits, closing the underlying platform channel. The test(s) fail because the expected messages were never actually sent by the client.

As you'd expect, introducing a delay in, e.g., the ChannelProxyClient before Close() (in addition to the repro Sleep in mojo::edk::Node) causes the IPCChannelBadMessageTest to start passing again.

The failures in IPCMojoBootstrapTest are harder to repro, even with the Sleep() introduced in Node, but this turns out to be because the tests that expect a message from the child process don't actually verify that one was received.

Comment 19 by w...@chromium.org, Mar 9 2018

Labels: -Pri-2 Pri-1
OK, so this problem occurs when:

1. Sending process writes messages, then exits.
2. Receiving process tries to write to the channel from its IO thread; in the IPCChannelBadMessageTest case this happens within a channel-readable callback.
3. The write operation fails because the channel is closed, so we PostTask(OnError) back to the IO thread.
4. When the IO thread returns to the loop, it processes the posted task ahead of channel-readable notifications.
5. We invoke OnError while data is still in the channel, waiting to be read.

I've drafted a CL at https://chromium-review.googlesource.com/c/chromium/src/+/956932 which, on apparent disconnection, defers invoking Channel::OnError(), and thereby notifying the delegate, until we observe end-of-stream.
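
A minimal, self-contained model of that deferral (names are illustrative, not the real mojo::edk::Channel API):

#include <functional>
#include <utility>

// Model of the fix: a write failure caused by peer disconnection stops
// further writes but does not notify the delegate; the error is only
// surfaced once the read side reaches end-of-stream, i.e. after every
// message still buffered in the channel has been dispatched.
class ChannelModel {
 public:
  enum class Error { kDisconnected, kReceivedEOF };

  explicit ChannelModel(std::function<void(Error)> on_error)
      : on_error_(std::move(on_error)) {}

  void OnWriteFailedPeerClosed() {
    reject_writes_ = true;  // Fail future writes, but keep reading.
  }

  bool Write() { return !reject_writes_; }

  void OnReadEndOfStream() {
    // Nothing is dropped: buffered inbound messages were all dispatched
    // before EOF, and only now do we report the disconnect.
    on_error_(reject_writes_ ? Error::kDisconnected : Error::kReceivedEOF);
  }

 private:
  std::function<void(Error)> on_error_;
  bool reject_writes_ = false;
};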

Comment 20 by bugdroid1@chromium.org, Mar 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c7fcdeaa38cc77b31d40b19f026c2af1b433855c

commit c7fcdeaa38cc77b31d40b19f026c2af1b433855c
Author: Wez <wez@chromium.org>
Date: Fri Mar 09 06:20:00 2018

Reintroduce verbose logging in IPC::Channel::Send().

Platform-specific IPC::Channel implementations included Chrome IPC
message logging, but this was not carried into the Mojo-based Channel
implementation.

Chrome IPC message logging is higher-level than Mojo IPC message logs,
which can be useful for certain kinds of diagnosis.

Bug:  816620 
Change-Id: I467857fd07b66c3d468039d6fb7c2950f931c8bb
Reviewed-on: https://chromium-review.googlesource.com/956347
Reviewed-by: Ken Rockot <rockot@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#542050}
[modify] https://crrev.com/c7fcdeaa38cc77b31d40b19f026c2af1b433855c/ipc/ipc_channel_mojo.cc


Comment 21 by bugdroid1@chromium.org, Mar 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/cdcb02624c340873b4d7d6e2564666e60f3aa856

commit cdcb02624c340873b4d7d6e2564666e60f3aa856
Author: Wez <wez@chromium.org>
Date: Sat Mar 10 00:49:30 2018

Add missing message-received expectation to IPCMojoBootstrapTest.

Some of these tests expected a message from the child process, and would
verify that it was valid/invalid as expected, but would still pass if
no message were received at all.

Bug:  816620 
Change-Id: Ie8acf85086f4416c7023118673c40d00a2e190d7
Reviewed-on: https://chromium-review.googlesource.com/955554
Reviewed-by: Ken Rockot <rockot@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#542296}
[modify] https://crrev.com/cdcb02624c340873b4d7d6e2564666e60f3aa856/ipc/ipc_mojo_bootstrap_unittest.cc
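
A sketch of the kind of expectation this CL adds; the fixture here is assumed, while the check itself mirrors the EXPECT_EQ(expected_message, received_message_) failure later quoted from ipc_mojo_bootstrap_unittest.cc:71 in comment 23:

#include "testing/gtest/include/gtest/gtest.h"

// Sketch only (hypothetical fixture): previously the tests validated a
// message only *if* one arrived; this makes them fail outright when no
// message is received at all.
class BootstrapTestClient {
 public:
  void VerifyMessageReceived(bool expected_message) {
    EXPECT_EQ(expected_message, received_message_);
  }

  bool received_message_ = false;
};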

Comment 22 by w...@chromium.org, Mar 12 2018

SharedBufferTest.CreateAndPassReadOnlyBuffer failed in https://build.chromium.org/p/chromium.fyi/builders/Fuchsia%20%28dbg%29/builds/17032 with a Mojo precondition error from wait.

Comment 23, Mar 12 2018

Cc: gsennton@chromium.org
IPCMojoBootstrapTest.ReceiveEmptyMessage has been failing flakily on the Android KitKat Phone Tester (dbg):

https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=ipc_tests&tests=IPCMojoBootstrapTest.ReceiveEmptyMessage

The failure is:
I   49.118s run_tests_on_device(06b9f8df00622ff4)  [ RUN      ] IPCMojoBootstrapTest.ReceiveEmptyMessage
I   49.118s run_tests_on_device(06b9f8df00622ff4)  ../../ipc/ipc_mojo_bootstrap_unittest.cc:71: Failure
I   49.118s run_tests_on_device(06b9f8df00622ff4)  Expected equality of these values:
I   49.118s run_tests_on_device(06b9f8df00622ff4)    expected_message
I   49.119s run_tests_on_device(06b9f8df00622ff4)      Which is: true
I   49.119s run_tests_on_device(06b9f8df00622ff4)    received_message_
I   49.119s run_tests_on_device(06b9f8df00622ff4)      Which is: false
I   49.119s run_tests_on_device(06b9f8df00622ff4)  [  FAILED  ] IPCMojoBootstrapTest.ReceiveEmptyMessage (79 ms)


Will that be fixed in this bug?

Comment 24 by w...@chromium.org, Mar 12 2018

Re #23: Yes, I believe so. I added that expectation in https://chromium-review.googlesource.com/955554; I'll revert that, to re-land after the fix is in place.

Comment 25 by bugdroid1@chromium.org, Mar 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6d1fd84a5ea8e6f47a83cfd5cac5f39221836370

commit 6d1fd84a5ea8e6f47a83cfd5cac5f39221836370
Author: Wez <wez@chromium.org>
Date: Mon Mar 12 17:29:56 2018

Revert "Add missing message-received expectation to IPCMojoBootstrapTest."

This reverts commit cdcb02624c340873b4d7d6e2564666e60f3aa856.

Reason for revert: We believe that this causes the tests to flake on various platforms, due to  issue 816620 . We'll re-land it when the mojo::edk::Channel fix has landed.


Original change's description:
> Add missing message-received expectation to IPCMojoBootstrapTest.
> 
> Some of these tests expected a message from the child process, and would
> verify that it was valid/invalid as expected, but would still pass if
> no message were received at all.
> 
> Bug:  816620 
> Change-Id: Ie8acf85086f4416c7023118673c40d00a2e190d7
> Reviewed-on: https://chromium-review.googlesource.com/955554
> Reviewed-by: Ken Rockot <rockot@chromium.org>
> Commit-Queue: Wez <wez@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#542296}

TBR=wez@chromium.org,rockot@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug:  816620 
Change-Id: I534fa23c12072cf08507ed4c8073e7505edaf262
Reviewed-on: https://chromium-review.googlesource.com/959241
Reviewed-by: Wez <wez@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#542526}
[modify] https://crrev.com/6d1fd84a5ea8e6f47a83cfd5cac5f39221836370/ipc/ipc_mojo_bootstrap_unittest.cc


Comment 26 by bugdroid1@chromium.org, Mar 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e960518251867845ed23bf2ab8f042a77afe1885

commit e960518251867845ed23bf2ab8f042a77afe1885
Author: Wez <wez@chromium.org>
Date: Mon Mar 12 22:40:29 2018

Delay Channel::OnError() in case of kDisconnected during Write().

Write() operations to a Channel can fail due to the peer having closed
it, while there are still messages waiting to be read from it. We must
therefore defer notifying the caller of the Channel::Error until we
observe end-of-stream via a readable notification, otherwise those
messages may be dropped (depending on whether the posted OnError task
is processed before or after a pending Channel-readable event).

Bug:  816620 
Change-Id: I75bd34a48edf4022809d27ce49f9cfba7a5d4daf
Reviewed-on: https://chromium-review.googlesource.com/956932
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Ken Rockot <rockot@chromium.org>
Cr-Commit-Position: refs/heads/master@{#542634}
[modify] https://crrev.com/e960518251867845ed23bf2ab8f042a77afe1885/mojo/edk/system/channel_fuchsia.cc
[modify] https://crrev.com/e960518251867845ed23bf2ab8f042a77afe1885/mojo/edk/system/channel_posix.cc
[modify] https://crrev.com/e960518251867845ed23bf2ab8f042a77afe1885/mojo/edk/system/channel_unittest.cc
[modify] https://crrev.com/e960518251867845ed23bf2ab8f042a77afe1885/mojo/edk/system/channel_win.cc


Comment 27 by bugdroid1@chromium.org, Mar 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3fe9c08982644207abfeae5e02ffb90a74309d46

commit 3fe9c08982644207abfeae5e02ffb90a74309d46
Author: Wez <wez@chromium.org>
Date: Mon Mar 12 23:19:23 2018

Reland "Add missing message-received expectation to IPCMojoBootstrapTest."

This is a reland of cdcb02624c340873b4d7d6e2564666e60f3aa856, which was
reverted pending a fix for an underlying Mojo message-loss issue on
channel shutdown.

Original change's description:
> Add missing message-received expectation to IPCMojoBootstrapTest.
>
> Some of these tests expected a message from the child process, and would
> verify that it was valid/invalid as expected, but would still pass if
> no message were received at all.
>
> Bug:  816620 
> Change-Id: Ie8acf85086f4416c7023118673c40d00a2e190d7
> Reviewed-on: https://chromium-review.googlesource.com/955554
> Reviewed-by: Ken Rockot <rockot@chromium.org>
> Commit-Queue: Wez <wez@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#542296}

TBR: rockot
Bug:  816620 
Change-Id: Ifdf10581091be31a6c9d1aa53542f2ed6c2ccbfa
Reviewed-on: https://chromium-review.googlesource.com/958016
Reviewed-by: Wez <wez@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#542650}
[modify] https://crrev.com/3fe9c08982644207abfeae5e02ffb90a74309d46/ipc/ipc_mojo_bootstrap_unittest.cc

Comment 28 by w...@chromium.org, Mar 12 2018

Labels: OS-Android OS-Chrome OS-Linux OS-Mac OS-Windows
Remaining work is to un-filter tests under Fuchsia, and to re-enable Mojo/IPC tests that were previously disabled as flaky due to this failure mode.
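
For reference, the buildbot filter files use gtest filter syntax, one pattern per line, with a leading "-" excluding matching tests; un-filtering a test means deleting its exclusion line. A hypothetical entry:

# testing/buildbot/filters/fuchsia.ipc_tests.filter (hypothetical entry)
# Deleting this line re-enables the matching tests under Fuchsia:
-IPCChannelProxyTest.*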

Comment 29 by bugdroid1@chromium.org, Mar 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6a84b3ed1c97d1fad1ebf401dc937e806cc895b7

commit 6a84b3ed1c97d1fad1ebf401dc937e806cc895b7
Author: Wez <wez@chromium.org>
Date: Tue Mar 13 00:16:23 2018

Un-filter some Mojo and IPC tests under Fuchsia.

These tests were flaky due to messages being dropped by the receiver
occasionally when the underlying platform-specific Channel was closed.

Bug:  764015 ,  764018 ,  756112 ,  810448 ,  815314 ,  816620 
Change-Id: I2b6cfb105660a570c09b29555da7811541d49ec9
Reviewed-on: https://chromium-review.googlesource.com/959674
Commit-Queue: Wez <wez@chromium.org>
Commit-Queue: John Budorick <jbudorick@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#542661}
[modify] https://crrev.com/6a84b3ed1c97d1fad1ebf401dc937e806cc895b7/testing/buildbot/filters/fuchsia.ipc_tests.filter
[modify] https://crrev.com/6a84b3ed1c97d1fad1ebf401dc937e806cc895b7/testing/buildbot/filters/fuchsia.mojo_unittests.filter


Comment 30 by bugdroid1@chromium.org, Mar 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5d5fef1259d675889ed0767ed1002f4695520870

commit 5d5fef1259d675889ed0767ed1002f4695520870
Author: Findit <findit-for-me@appspot.gserviceaccount.com>
Date: Tue Mar 13 02:29:08 2018

Revert "Delay Channel::OnError() in case of kDisconnected during Write()."

This reverts commit e960518251867845ed23bf2ab8f042a77afe1885.

Reason for revert:

Findit (https://goo.gl/kROfz5) identified CL at revision 542634 as the
culprit for failures in the build cycles as shown on:
https://findit-for-me.appspot.com/waterfall/culprit?key=ag9zfmZpbmRpdC1mb3ItbWVyRAsSDVdmU3VzcGVjdGVkQ0wiMWNocm9taXVtL2U5NjA1MTgyNTE4Njc4NDVlZDIzYmYyYWI4ZjA0MmE3N2FmZTE4ODUM

Sample Failed Build: https://ci.chromium.org/buildbot/chromium.memory/Linux%20MSan%20Tests/8479

Sample Failed Step: mojo_unittests

Original change's description:
> Delay Channel::OnError() in case of kDisconnected during Write().
> 
> Write() operations to a Channel can fail due to the peer having closed
> it, while there are still messages waiting to be read from it. We must
> therefore defer notifying the caller of the Channel::Error until we
> observe end-of-stream via a readable notification, otherwise those
> messages may be dropped (depending on whether the posted OnError task
> is processed before or after a pending Channel-readable event).
> 
> Bug:  816620 
> Change-Id: I75bd34a48edf4022809d27ce49f9cfba7a5d4daf
> Reviewed-on: https://chromium-review.googlesource.com/956932
> Commit-Queue: Wez <wez@chromium.org>
> Reviewed-by: Ken Rockot <rockot@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#542634}

Change-Id: I9d212ecb8dde1e90422b60e950547e823357f0d4
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  816620 
Reviewed-on: https://chromium-review.googlesource.com/959761
Cr-Commit-Position: refs/heads/master@{#542698}
[modify] https://crrev.com/5d5fef1259d675889ed0767ed1002f4695520870/mojo/edk/system/channel_fuchsia.cc
[modify] https://crrev.com/5d5fef1259d675889ed0767ed1002f4695520870/mojo/edk/system/channel_posix.cc
[modify] https://crrev.com/5d5fef1259d675889ed0767ed1002f4695520870/mojo/edk/system/channel_unittest.cc
[modify] https://crrev.com/5d5fef1259d675889ed0767ed1002f4695520870/mojo/edk/system/channel_win.cc


Comment 31 by bugdroid1@chromium.org, Mar 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3e64a8a06daa346ccdd75dac3befaa16cee24def

commit 3e64a8a06daa346ccdd75dac3befaa16cee24def
Author: Wez <wez@chromium.org>
Date: Tue Mar 13 05:36:06 2018

Reland "Delay Channel::OnError() in case of kDisconnected during Write()."

This is a reland of e960518251867845ed23bf2ab8f042a77afe1885

The original CL added a unit-test which created a single-byte
Channel::Message, without actually initializing that single-byte. This
caused the MSAN bots to (correctly) spot that uninitialized data was
being read during serialization.

Original change's description:
> Delay Channel::OnError() in case of kDisconnected during Write().
>
> Write() operations to a Channel can fail due to the peer having closed
> it, while there are still messages waiting to be read from it. We must
> therefore defer notifying the caller of the Channel::Error until we
> observe end-of-stream via a readable notification, otherwise those
> messages may be dropped (depending on whether the posted OnError task
> is processed before or after a pending Channel-readable event).
>
> Bug:  816620 
> Change-Id: I75bd34a48edf4022809d27ce49f9cfba7a5d4daf
> Reviewed-on: https://chromium-review.googlesource.com/956932
> Commit-Queue: Wez <wez@chromium.org>
> Reviewed-by: Ken Rockot <rockot@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#542634}

TBR: rockot
Bug:  816620 
Change-Id: I1a1d6eb7fa712e50b3d9c86591878900f0aeb388
Reviewed-on: https://chromium-review.googlesource.com/959762
Reviewed-by: Wez <wez@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#542739}
[modify] https://crrev.com/3e64a8a06daa346ccdd75dac3befaa16cee24def/mojo/edk/system/channel_fuchsia.cc
[modify] https://crrev.com/3e64a8a06daa346ccdd75dac3befaa16cee24def/mojo/edk/system/channel_posix.cc
[modify] https://crrev.com/3e64a8a06daa346ccdd75dac3befaa16cee24def/mojo/edk/system/channel_unittest.cc
[modify] https://crrev.com/3e64a8a06daa346ccdd75dac3befaa16cee24def/mojo/edk/system/channel_win.cc
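
A sketch of the kind of fix described above (the Channel::Message API names are assumed from the mojo::edk of the time; treat this as illustrative, not as the actual CL):

#include <cstring>
#include <memory>

#include "mojo/edk/system/channel.h"

// Hypothetical sketch: create the single-byte test message with its payload
// explicitly zeroed, so MSAN sees initialized data during serialization.
mojo::edk::Channel::MessagePtr MakeSingleByteMessage() {
  auto message = std::make_unique<mojo::edk::Channel::Message>(
      /*payload_size=*/1, /*max_handles=*/0);
  std::memset(message->mutable_payload(), 0, 1);
  return message;
}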


Comment 32 by bugdroid1@chromium.org, Mar 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0021badf45558cc13b25dfb9da7b3d671d464c5f

commit 0021badf45558cc13b25dfb9da7b3d671d464c5f
Author: Wez <wez@chromium.org>
Date: Wed Mar 14 19:45:42 2018

Revert to manual ref-counting in mojo::edk::ChannelWin.

Changes to manual ref-counting in a recent change to the timing of
write-error notifications in ChannelWin missed that multiple write
operations may be in progress at the same time, and so introduced
issues with the lifetime of the ChannelWin object if ShutDown() was
called while write operations were in progress.

We also add some DCHECKs to sanity-check the pending-IO flags.

Bug: 821658,  816620 
Change-Id: I69cea0a1ed080ca2920738f5212a25d5f5e63c42
Reviewed-on: https://chromium-review.googlesource.com/961941
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Ken Rockot <rockot@chromium.org>
Cr-Commit-Position: refs/heads/master@{#543162}
[modify] https://crrev.com/0021badf45558cc13b25dfb9da7b3d671d464c5f/mojo/edk/system/channel_win.cc


Comment 33 by bugdroid1@chromium.org, Mar 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/cd568190cfdef377b434f31c56a7ac102e66d0cc

commit cd568190cfdef377b434f31c56a7ac102e66d0cc
Author: Wez <wez@chromium.org>
Date: Thu Mar 15 16:12:32 2018

Re-enable a test that we suspect flaked due to message-loss.

Bug:  816620 
Change-Id: I25d8b9d627578946a6c4adb2c220c9401f21c59b
Reviewed-on: https://chromium-review.googlesource.com/959677
Reviewed-by: Ken Rockot <rockot@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#543386}
[modify] https://crrev.com/cd568190cfdef377b434f31c56a7ac102e66d0cc/ipc/ipc_channel_proxy_unittest.cc

Comment 34 by w...@chromium.org, Mar 16 2018

Status: Fixed (was: Started)
Not aware of any further formerly-flaky tests that are candidates for re-enabling following this fix, so closing out.

Comment 35 by w...@chromium.org, May 30 2018

Issue 819358 has been merged into this issue.
