New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 721689 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
please use my google.com address
Closed: May 2017
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

Race condition in mojo::WaitSet::State class triggers DCHECK

Reported by land...@opera.com, May 12 2017

Issue description

We sometimes hits a DCHECK in mojo WaitSet implementation with the following log:

[19751:19753:0511/153249.723283:8574564291282:FATAL:wait_set.cc(283)] Check failed: MOJO_RESULT_CANCELLED == result (1 vs. 9)
#0 0x7f3aa0e692f7 base::debug::StackTrace::StackTrace()
#1 0x7f3aa0e89197 logging::LogMessage::~LogMessage()
#2 0x7f3aa14fdabd mojo::WaitSet::State::Notify()
#3 0x7f3aa14fdca2 mojo::WaitSet::State::Context::OnNotification()
#4 0x7f3aa1ed585f mojo::edk::Watch::InvokeCallback()
#5 0x7f3aa1ed3ac1 mojo::edk::RequestContext::~RequestContext()
#6 0x7f3aa1ec47d0 mojo::edk::NodeChannel::OnChannelMessage()
#7 0x7f3aa1eb2c40 mojo::edk::Channel::OnReadComplete()
#8 0x7f3aa1eb7086 mojo::edk::(anonymous namespace)::ChannelPosix::OnFileCanReadWithoutBlocking()
#9 0x7f3aa0e9da68 base::MessagePumpLibevent::OnLibeventNotification()
#10 0x7f3aa0f3ff6e event_base_loop
#11 0x7f3aa0e9d211 base::MessagePumpLibevent::Run()
#12 0x7f3aa0e9c74f base::MessageLoop::RunHandler()
#13 0x7f3aa0ec420b base::RunLoop::Run()
#14 0x7f3aa0ef74b1 base::Thread::Run()
#15 0x7f3aa0ef8830 base::Thread::ThreadMain()
#16 0x7f3aa0ef06c3 base::(anonymous namespace)::ThreadFunc()
#17 0x7f3aa6e84184 start_thread
#18 0x7f3a9dff237d clone

I have identified this to be a race because of insufficient synchronization between WaitSet::State::RemoveHandle() and WaitSet::State::Notify(). The reason this happens is that the handle is removed (with lock held) from handle_to_context_ in RemoveHandle() and then the lock is released before calling MojoCancelWatch() which will (always?) trigger a notification with result == MOJO_RESULT_CANCELLED. There could however be another notification happening on another thread that arrives at WaitSet::State::Notify() before the MOJO_RESULT_CANCELLED arrive and the DCHECK is triggered.

So, with the current implementation the DCHECK doesn't hold. A comment states that it's not possible to extend the time the lock is held because that would cause reentrancy issues.
Question is if the solution is to simply remove the DCHECK or if a more elaborate solution is needed?

The problem was introduced by: https://codereview.chromium.org/2744943002

 

Comment 1 by roc...@chromium.org, May 12 2017

Cc: -roc...@chromium.org
Labels: -Pri-3 Pri-1
Owner: roc...@chromium.org
Status: Assigned (was: Untriaged)
Thanks for the report, I'll look at this.
Project Member

Comment 2 by bugdroid1@chromium.org, May 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6dbb86d4a6bd9aacd50573b3473afaa857bca4c3

commit 6dbb86d4a6bd9aacd50573b3473afaa857bca4c3
Author: Ken Rockot <rockot@chromium.org>
Date: Mon May 15 18:18:18 2017

Fix notification expectation in mojo::WaitSet

This removes an errant DCHECK in mojo::WaitSet::State::Notify.

BUG= 721689 
R=yzshen@chromium.org

Change-Id: Ifc9e91874e43ce33db7b94c76596435c812f60c0
Reviewed-on: https://chromium-review.googlesource.com/505417
Reviewed-by: Yuzhu Shen <yzshen@chromium.org>
Commit-Queue: Ken Rockot <rockot@chromium.org>
Cr-Commit-Position: refs/heads/master@{#471833}
[modify] https://crrev.com/6dbb86d4a6bd9aacd50573b3473afaa857bca4c3/mojo/public/cpp/system/wait_set.cc

Comment 3 by roc...@chromium.org, May 15 2017

Status: Fixed (was: Assigned)
Thanks again for reporting this. I agree with your analysis; the DCHECK was simply incorrect.

Comment 4 by land...@opera.com, May 16 2017

Thanks for the quick resolution.

Sign in to add a comment