
Issue 620904


Issue metadata

Status: Fixed
Owner:
not on Chrome anymore
Closed: Aug 2016
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 609252




Verify whether hang in message pump's "WaitForWork" is due to a recycled event handle

Project Member Reported by stanisc@chromium.org, Jun 16 2016

Issue description

The purpose of this issue is to investigate hangs in MessagePumpForGpu and MessagePumpDefault. There is a large number of hangs in both the GPU process and the Renderer where the wait call (MsgWaitForMultipleObjectsEx, WaitForSingleObject) appears to be stuck for a very long time despite evidence that the event object these calls are waiting on is signaled.

While it is possible for a thread that is woken up to not be immediately scheduled by the OS, it is hard to imagine that going on for 15+ seconds. So the theory is that the event handle might be recycled and there might be some other code that closes its handle but doesn't null out the old value. Since the event is auto-reset, when there are multiple waiters only one of them would be woken up and reset the event, and the other one would just continue waiting. So if the other code is somehow still waiting on its closed and now recycled handle, that would explain the hang.
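
To make the multiple-waiter theory above concrete, here is a minimal portable sketch. It does not use the Windows API: AutoResetEventModel and CountWokenWaiters are hypothetical names, and the class only imitates auto-reset behaviour with a mutex and a condition variable. It assumes two threads end up waiting on what is effectively the same event (for example because one of them holds a stale, recycled handle value) and that a single signal arrives.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Portable model of an auto-reset event. Hypothetical; not the Windows API.
class AutoResetEventModel {
 public:
  // Signals the event and wakes every blocked thread so they can race for it.
  void Set() {
    std::lock_guard<std::mutex> lock(mutex_);
    signaled_ = true;
    cv_.notify_all();
  }

  // Returns true if this caller consumed the signal, false on timeout.
  bool WaitFor(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!cv_.wait_for(lock, timeout, [this] { return signaled_; }))
      return false;
    signaled_ = false;  // Auto-reset: the winning waiter clears the signal.
    return true;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Two threads share one event and a single Set() call is made.
// Returns how many of the two waiters were released.
int CountWokenWaiters() {
  AutoResetEventModel event;
  std::mutex count_mutex;
  int woken = 0;
  auto waiter = [&] {
    // The timeout stands in for the "hang"; a real waiter would block forever.
    if (event.WaitFor(std::chrono::milliseconds(200))) {
      std::lock_guard<std::mutex> lock(count_mutex);
      ++woken;
    }
  };
  std::thread first(waiter);
  std::thread second(waiter);
  event.Set();
  first.join();
  second.join();
  return woken;
}
```

In this model CountWokenWaiters() returns 1: whichever thread consumes the signal resets it, and the other one times out. If the thread that loses the race is the message pump, that would look like the hang described here.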
 
Owner: stanisc@chromium.org
Status: Started (was: Untriaged)
Project Member

Comment 2 by bugdroid1@chromium.org, Jun 21 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/702c0f481843035dd46c6a6a256cbe65dda8629c

commit 702c0f481843035dd46c6a6a256cbe65dda8629c
Author: stanisc <stanisc@chromium.org>
Date: Tue Jun 21 01:34:05 2016

Verify if GPU message pump is signaled when it hangs in WaitForWork

This is a diagnostic change. The code introduced in this
change runs only when GPU process is about to terminate
with a deliberate crash.

We are getting a number of crashes triggered by a GPU hang in
MessagePumpForGpu::WaitForWork. There is already some
instrumentation that indicates that:
a) MessagePumpForGpu::WaitForWork is sitting in
MsgWaitForMultipleObjectsEx for longer than 15 seconds
b) MessagePumpForGpu::ScheduleWork is called after
WaitForWork enters the wait, sometimes several seconds
after, and SetEvent must be called (at least we grab the
timestamp right before calling SetEvent method)
c) The event is set but it doesn't wake up the wait

While it is possible for a thread that is awakened to not
be immediately scheduled by the OS, it is hard to imagine
that going on for 15+ seconds. So the theory is that the
event handle might be recycled and there might be some
other code that has closed but hasn't nulled out the
old handle. Since the event is auto-reset, when there are
multiple waiters only one of them would be awakened and
reset the event, and the other one would
just continue waiting. So if the other code is somehow
still waiting on its closed and now recycled handle, that
would explain the hang.

This code would allow the GPU watchdog to check whether
the event was set at the time of the crash. This would
give us a clue as to whether the situation described above
is actually happening.

Extra bonus: the investigation would also explain if
Renderer hangs in MessagePumpDefault::Run are caused by
the same issue.

BUG= 620904 

Review-Url: https://codereview.chromium.org/2077613002
Cr-Commit-Position: refs/heads/master@{#400871}

[modify] https://crrev.com/702c0f481843035dd46c6a6a256cbe65dda8629c/base/message_loop/message_loop.cc
[modify] https://crrev.com/702c0f481843035dd46c6a6a256cbe65dda8629c/base/message_loop/message_loop.h
[modify] https://crrev.com/702c0f481843035dd46c6a6a256cbe65dda8629c/base/message_loop/message_pump.cc
[modify] https://crrev.com/702c0f481843035dd46c6a6a256cbe65dda8629c/base/message_loop/message_pump.h
[modify] https://crrev.com/702c0f481843035dd46c6a6a256cbe65dda8629c/base/message_loop/message_pump_win.cc
[modify] https://crrev.com/702c0f481843035dd46c6a6a256cbe65dda8629c/base/message_loop/message_pump_win.h
[modify] https://crrev.com/702c0f481843035dd46c6a6a256cbe65dda8629c/content/gpu/gpu_watchdog_thread.cc

Comment 3 by kbr@chromium.org, Jun 29 2016

Blocking: 609252
Project Member

Comment 4 by bugdroid1@chromium.org, Jun 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2660facba19f2101c6c9c67e2739511baf4cb6b8

commit 2660facba19f2101c6c9c67e2739511baf4cb6b8
Author: stanisc <stanisc@chromium.org>
Date: Thu Jun 30 03:47:47 2016

Change MessagePumpForGpu and SharedMemory to ScopedHandle.

One of the suspected reasons for GPU hangs is handle
recycling with subsequent usage of the old closed handle - either
closing it again or waiting on it.

Changing these two classes from plain HANDLE to ScopedHandle
should help to verify at least the double close case.
It should help to detect handle leaks too.

BUG= 620904 

Review-Url: https://codereview.chromium.org/2102923002
Cr-Commit-Position: refs/heads/master@{#403082}

[modify] https://crrev.com/2660facba19f2101c6c9c67e2739511baf4cb6b8/base/memory/shared_memory.h
[modify] https://crrev.com/2660facba19f2101c6c9c67e2739511baf4cb6b8/base/memory/shared_memory_unittest.cc
[modify] https://crrev.com/2660facba19f2101c6c9c67e2739511baf4cb6b8/base/memory/shared_memory_win.cc
[modify] https://crrev.com/2660facba19f2101c6c9c67e2739511baf4cb6b8/base/message_loop/message_pump_win.cc
[modify] https://crrev.com/2660facba19f2101c6c9c67e2739511baf4cb6b8/base/message_loop/message_pump_win.h
[modify] https://crrev.com/2660facba19f2101c6c9c67e2739511baf4cb6b8/base/metrics/persistent_memory_allocator_unittest.cc
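
The commit above moves from a plain HANDLE to ScopedHandle. As a rough illustration of why an owning wrapper helps with the double-close case, here is a minimal sketch. It is not the implementation of base::win::ScopedHandle: FakeHandle, FakeClose, ScopedFakeHandle and CloseTwiceAndCount are hypothetical stand-ins so that the example builds on any platform.

```cpp
#include <cassert>

// Hypothetical stand-ins for HANDLE and CloseHandle, so the sketch is portable.
using FakeHandle = int;
constexpr FakeHandle kInvalidFakeHandle = 0;
static int g_close_calls = 0;
static void FakeClose(FakeHandle) { ++g_close_calls; }

// Move-only owner that nulls out its value as soon as the handle is closed.
class ScopedFakeHandle {
 public:
  ScopedFakeHandle() = default;
  explicit ScopedFakeHandle(FakeHandle handle) : handle_(handle) {}
  ScopedFakeHandle(const ScopedFakeHandle&) = delete;
  ScopedFakeHandle& operator=(const ScopedFakeHandle&) = delete;
  ScopedFakeHandle(ScopedFakeHandle&& other) noexcept
      : handle_(other.Take()) {}
  ScopedFakeHandle& operator=(ScopedFakeHandle&& other) noexcept {
    if (this != &other) {
      Close();
      handle_ = other.Take();
    }
    return *this;
  }
  ~ScopedFakeHandle() { Close(); }

  bool IsValid() const { return handle_ != kInvalidFakeHandle; }
  FakeHandle Get() const { return handle_; }

  // Gives up ownership and leaves this object holding the invalid value.
  FakeHandle Take() {
    FakeHandle taken = handle_;
    handle_ = kInvalidFakeHandle;
    return taken;
  }

  // Closes at most once; later calls see the nulled-out value and do nothing.
  void Close() {
    if (IsValid())
      FakeClose(Take());
  }

 private:
  FakeHandle handle_ = kInvalidFakeHandle;
};

// Closes the same wrapper twice, then lets the destructor run.
// Returns how many times the underlying close function was called.
int CloseTwiceAndCount() {
  const int before = g_close_calls;
  {
    ScopedFakeHandle handle(42);
    handle.Close();
    handle.Close();  // No second FakeClose: the value was already nulled out.
  }                  // The destructor also finds nothing left to close.
  return g_close_calls - before;
}
```

CloseTwiceAndCount() returns 1 here. With a raw handle value, the second close would hit whatever object the OS had recycled that value for, which is the scenario this bug is trying to rule out.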

Status: Fixed (was: Started)
Project Member

Comment 6 by bugdroid1@chromium.org, Oct 8 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9fc5e16ec88edd066a0725c80918f6001dd8620f

commit 9fc5e16ec88edd066a0725c80918f6001dd8620f
Author: stanisc <stanisc@chromium.org>
Date: Sat Oct 08 01:57:10 2016

Remove code that checks if MessagePumpForGpu was signaled.

This reverts https://codereview.chromium.org/2077613002/.

The check was supposed to tell us whether GPU process main
thread's message pump was signaled at the time of the hang.
In practice 100% of crash dumps had the negative result.
After some additional research I realized that that was
a false negative and that this check doesn't work as
expected with auto-reset events. I confirmed that an
auto-reset event gets promptly reset back to non-signaled
when it gets signaled as long as there is at least one
thread already waiting on it. That is the case even when
the target thread is never scheduled to run. The
check would work with a manual-reset event but apparently
it is useless in the case of an auto-reset event.

BUG= 620904 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2393333002
Cr-Commit-Position: refs/heads/master@{#424040}

[modify] https://crrev.com/9fc5e16ec88edd066a0725c80918f6001dd8620f/base/message_loop/message_loop.cc
[modify] https://crrev.com/9fc5e16ec88edd066a0725c80918f6001dd8620f/base/message_loop/message_loop.h
[modify] https://crrev.com/9fc5e16ec88edd066a0725c80918f6001dd8620f/base/message_loop/message_pump.cc
[modify] https://crrev.com/9fc5e16ec88edd066a0725c80918f6001dd8620f/base/message_loop/message_pump.h
[modify] https://crrev.com/9fc5e16ec88edd066a0725c80918f6001dd8620f/base/message_loop/message_pump_win.cc
[modify] https://crrev.com/9fc5e16ec88edd066a0725c80918f6001dd8620f/base/message_loop/message_pump_win.h
[modify] https://crrev.com/9fc5e16ec88edd066a0725c80918f6001dd8620f/gpu/ipc/service/gpu_watchdog_thread.cc
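
The false negative described in the commit above can be illustrated with a small portable model. This is only a sketch under the assumption stated in that commit message, namely that signaling an auto-reset event hands the signal to an already-waiting thread immediately and leaves the event non-signaled. HandOffEventModel and both Probe functions are hypothetical names, and no Windows API is used.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Portable model of the hand-off behaviour described above (an assumption).
class HandOffEventModel {
 public:
  void Set() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (waiters_ > pending_wakeups_) {
      ++pending_wakeups_;  // Handed to a waiter; the event stays non-signaled.
      cv_.notify_all();
    } else {
      signaled_ = true;    // Nobody is waiting, so the signal sticks.
    }
  }

  void Wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (signaled_) {
      signaled_ = false;   // Auto-reset on the fast path.
      return;
    }
    ++waiters_;
    cv_.wait(lock, [this] { return pending_wakeups_ > 0; });
    --pending_wakeups_;
    --waiters_;
  }

  // What a diagnostic probe of the event state would observe.
  bool IsSignaled() {
    std::lock_guard<std::mutex> lock(mutex_);
    return signaled_;
  }

  int WaiterCount() {
    std::lock_guard<std::mutex> lock(mutex_);
    return waiters_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool signaled_ = false;
  int waiters_ = 0;
  int pending_wakeups_ = 0;
};

// A thread is already waiting when Set() is called; returns the probe result.
bool ProbeAfterSetWithWaiter() {
  HandOffEventModel event;
  std::thread waiter([&] { event.Wait(); });
  while (event.WaiterCount() == 0)
    std::this_thread::yield();  // Make sure the waiter is registered first.
  event.Set();
  const bool probe = event.IsSignaled();  // Taken whether or not waiter ran.
  waiter.join();
  return probe;
}

// No thread is waiting when Set() is called; returns the probe result.
bool ProbeAfterSetWithoutWaiter() {
  HandOffEventModel event;
  event.Set();
  return event.IsSignaled();
}
```

In this model ProbeAfterSetWithWaiter() returns false and ProbeAfterSetWithoutWaiter() returns true. The first case matches the crash dumps: the probe sees a non-signaled event even though Set() was called, so it cannot tell a missed wakeup apart from an event that was never set.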
