New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 609252 link

Starred by 5 users

Issue metadata

Status: Fixed
Owner:
not on Chrome anymore
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows , Mac
Pri: 2
Type: Bug

Blocked on:
issue 628464
issue 352807
issue 563716
issue 596622
issue 611390
issue 612219
issue 619264
issue 620904
issue 644575
issue 644920
issue 646538
issue 659081
issue 662802
issue 689350

Blocking:
issue 648318
issue 622813
issue 642735



Sign in to add a comment

Random GPU process hangs on Mac and Windows causing random webgl_conformance_tests failures

Project Member Reported by kbr@chromium.org, May 4 2016

Issue description

Random GPU process hangs are being observed on the Mac OS trybots in the Swarming pool. They are happening both on the Mac Minis with Intel GPUs and MacBook Pro Retinas with AMD GPUs, so are unlikely to be a bug in the driver, but rather a bug in Chrome's logic.

Here are a couple of tryjobs which had random tests fail because the GPU process's watchdog terminated them:

https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/221910
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/221792

The log excerpt is:

[22046:16387:0503/184522:ERROR:gpu_watchdog_thread.cc(368)] The GPU process hung. Terminating after 10000 ms.
Received signal 11 SEGV_MAPERR 000000000000
 [0x00010621a866]
 [0x7fff89464f1a]
...

It's doubtful that the stack trace of this thread would be useful, but as a first priority we should figure out how to get the GPU process to produce a minidump when this happens so that Telemetry can symbolize all the threads' stacks.

 
stdout1.txt
4.6 MB View Download
stdout2.txt
4.4 MB View Download

Comment 1 by kbr@chromium.org, May 4 2016

Summary: Random GPU process hangs on Mac OS causing random webgl_conformance_tests failures (was: Random GPU process hangs on Mac OS failing random webgl_conformance_tests)

Comment 2 by kbr@chromium.org, May 7 2016

Cc: jbau...@chromium.org
Labels: -Pri-2 OS-Windows Pri-1
Owner: vmi...@chromium.org
Status: Assigned (was: Available)
Summary: Random GPU process hangs on Mac and Windows causing random webgl_conformance_tests failures (was: Random GPU process hangs on Mac OS causing random webgl_conformance_tests failures)
Actually, these are happening on multiple platforms, not just Mac. Here's a failure of WebglConformance.conformance_more_conformance_quickCheckAPI_G_I caused by the watchdog firing:

https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/217774

Excerpt:

[ RUN      ] WebglConformance.conformance_more_conformance_quickCheckAPI_G_I
...
[2492:2176:0506/153704:INFO:CONSOLE(0)] "WebGL: INVALID_VALUE: getAttribLocation: string not ASCII", source: http://127.0.0.1:59072/third_party/webgl/src/sdk/tests/conformance/more/conformance/quickCheckAPI-G_I.html (0)
	content::GpuWatchdogThread::DeliberatelyTerminateToRecoverFromHang [0x6137437E+590]
	base::internal::Invoker<base::IndexSequence<0>,base::internal::BindState<base::internal::RunnableAdapter<void (__thiscall content::WebMediaPlayerMS::*)(void)>,void __cdecl(content::WebMediaPlayerMS *),base::WeakPtr<content::WebMediaPlayerMS> >,base::inter [0x60D179A9+73]
	base::debug::TaskAnnotator::RunTask [0x5F6190A7+247]
	base::MessageLoop::RunTask [0x5F5E685B+1211]
	base::MessageLoop::DoDelayedWork [0x5F5E590A+218]
[2836:2780:0506/153704:ERROR:gles2_cmd_decoder.cc(6240)] [.Offscreen-For-WebGL-051989D8]GL ERROR :GL_INVALID_VALUE : glBindAttribLocation: Invalid character
	base::MessagePumpDefault::Run [0x5F61B41A+138]
	base::MessageLoop::RunHandler [0x5F5E6397+103]
	base::RunLoop::Run [0x5F61B629+41]
	base::MessageLoop::Run [0x5F5E6322+98]
[2492:2176:0506/153704:INFO:CONSOLE(0)] "WebGL: INVALID_VALUE: getAttribLocation: string not ASCII", source: http://127.0.0.1:59072/third_party/webgl/src/sdk/tests/conformance/more/conformance/quickCheckAPI-G_I.html (0)
	base::Thread::Run [0x5F5FBC1B+11]
	base::Thread::ThreadMain [0x5F5FC297+471]
[2836:2780:0506/153704:ERROR:gles2_cmd_decoder.cc(6240)] [.Offscreen-For-WebGL-051989D8]GL ERROR :GL_INVALID_VALUE : glBindAttribLocation: Invalid character
	base::PlatformThread::Sleep [0x5F603972+290]
	BaseThreadInitThunk [0x767F337A+18]
	RtlInitializeExceptionChain [0x774392B2+99]
	RtlInitializeExceptionChain [0x77439285+54]
[0506/153704:ERROR:process_info.cc(608)] range at 0xfffde000, size 0x230 fully unreadable
...
[2492:2176:0506/153704:INFO:CONSOLE(0)] "WebGL: CONTEXT_LOST_WEBGL: loseContext: context lost", source: http://127.0.0.1:59072/third_party/webgl/src/sdk/tests/conformance/more/conformance/quickCheckAPI-G_I.html (0)
Traceback (most recent call last):
  File "c:\users\chrome~1\appdata\local\temp\runslirap\third_party\catapult\telemetry\telemetry\internal\story_runner.py", line 84, in _RunStoryAndProcessErrorIfNeeded
    state.RunStory(results)
  File "c:\users\chrome~1\appdata\local\temp\runslirap\content\test\gpu\gpu_tests\gpu_test_base.py", line 122, in RunStory
    RunStoryWithRetries(DesktopGpuSharedPageState, self, results)
  File "c:\users\chrome~1\appdata\local\temp\runslirap\content\test\gpu\gpu_tests\gpu_test_base.py", line 72, in RunStoryWithRetries
    super(cls, shared_page_state).RunStory(results)
  File "c:\users\chrome~1\appdata\local\temp\runslirap\third_party\catapult\telemetry\telemetry\page\shared_page_state.py", line 326, in RunStory
    self._current_page, self._current_tab, results)
  File "c:\users\chrome~1\appdata\local\temp\runslirap\content\test\gpu\gpu_tests\webgl_conformance.py", line 87, in ValidateAndMeasurePage
    raise page_test.Failure(_WebGLTestMessages(tab))
Failure: testValidArgs


[  FAILED  ] WebglConformance.conformance_more_conformance_quickCheckAPI_G_I (31781 ms)

Again, it's crucial for diagnosis that we (1) ensure that a minidump is created for the GPU process in this case and (2) that we get a symbolized stack for the main thread which is hung at this point.

To that end, we should instrument the GPU process to introduce a random hang, run the WebGL conformance tests locally (say, on Windows), and make sure that the watchdog causes a minidump to be generated and that it's symbolized by the harness.

Victor, could we please find someone to own that task in particular? Once that's done, diagnosis should be easy the next time this happens on the bots.

stdout.txt
3.9 MB View Download

Comment 3 by kbr@chromium.org, May 12 2016

Blockedon: 611390

Comment 4 by kbr@chromium.org, May 24 2016

Labels: Hotlist-PixelWrangler

Comment 5 by kbr@chromium.org, May 26 2016

This continues to affect commit queue jobs randomly. https://codereview.chromium.org/2009073002/ is one example where the GPU process hung at some point during test execution. It's urgent that someone step up and make the harness more robust so that a minidump is generated and symbolized when this happens, which will provide stack traces for all threads, including the hung one.

Comment 6 by kbr@chromium.org, Jun 15 2016

These random GPU process hangs continue to be a cause of flakiness of the WebGL conformance tests on the commit queue. A few example builds:

https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/241986
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/242912
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/242958
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/237686
https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/239161

The tests affected by the hang are random:

WebglConformance.conformance_attribs_gl_vertexattribpointer
WebglConformance.conformance_uniforms_uniform_samplers_test
WebglConformance.conformance_more_conformance_quickCheckAPI_G_I
WebglConformance.conformance_textures_video_tex_2d_rgba_rgba_unsigned_short_5_5_5_1
WebglConformance.conformance_uniforms_uniform_default_values

All flakes:

https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiR3ZWJnbF9jb25mb3JtYW5jZV90ZXN0cyAod2l0aCBwYXRjaCkM

There have been recent reports of a spike in GPU process crashes and it's entirely plausible that they are in fact GPU process hangs and have the same root cause as these test failures.

Comment 7 by vmi...@chromium.org, Jun 16 2016

Status: Started (was: Assigned)
Working on getting the stack traces.
Project Member

Comment 8 by bugdroid1@chromium.org, Jun 16 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0cb2844464155e533e83b10facd8523884a406be

commit 0cb2844464155e533e83b10facd8523884a406be
Author: vmiura <vmiura@chromium.org>
Date: Thu Jun 16 22:35:44 2016

Output stacktraces on WebGL conformance failure.

If a minidump stack trace is available, output it along with the
page test failure.

R=kbr@chromium.org
BUG= 609252 
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2070233002
Cr-Commit-Position: refs/heads/master@{#400282}

[modify] https://crrev.com/0cb2844464155e533e83b10facd8523884a406be/content/test/gpu/gpu_tests/webgl_conformance.py

Comment 9 by kbr@chromium.org, Jun 17 2016

Blocking: 352807
This job flaked on Windows but we only got the call stack of the crasher thread: https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/240988/steps/webgl_conformance_tests%20on%20NVIDIA%20GPU%20on%20Windows%20%28with%20patch%29%20on%20Windows-2008ServerR2-SP1

On Mac I was able to get a call stack from all threads in the GPU process.
Some callstacks from recent Mac hangs.  It's unclear if the thread is hung in one of these stacks handlers, or there is a MessageLoop problem.

mac_chromium_rel_ng : build 246392 : WebglConformance.conformance_extensions_webgl_compressed_texture_s3tc

Thread 0
 0  Chromium Framework!(anonymous namespace)::ZombieDealloc(objc_object*, objc_selector*) + 0x22
    rax = 0x00007fff91b110b4   rdx = 0x00000000000dc870
    rcx = 0x00000000000fc080   rbx = 0x00007fd2d280a400
    rsi = 0x00007fff91b110b4   rdi = 0x00007fd2d280a400
    rbp = 0x00007fff558d95b0   rsp = 0x00007fff558d92e0
     r8 = 0x0000000000000001    r9 = 0x00007fd2d2800000
    r10 = 0x00007fd2ca60d580   r11 = 0x00007fff76738050
    r12 = 0x0000000000000000   r13 = 0x00007fd2d283c320
    r14 = 0x0000000000000001   r15 = 0x00007fd2d280a400
    rip = 0x000000010abc14b2
    Found by: given as instruction pointer in context
 1  CoreFoundation + 0x2d448
    rbx = 0x00007fd2d280a400   rbp = 0x00007fff558d9600
    rsp = 0x00007fff558d95c0   r12 = 0x0000000000000000
    r13 = 0x00007fd2d283c320   r14 = 0x0000000000000001
    r15 = 0x00007fff76739ed0   rip = 0x00007fff91f65448
    Found by: call frame info
 2  libobjc.A.dylib + 0x2189c
    rbp = 0x00007fff558d9670   rsp = 0x00007fff558d9610
    rip = 0x00007fff8854489c
    Found by: previous frame's frame pointer
 3  CoreFoundation + 0x18db0
    rbp = 0x00007fff558d96c0   rsp = 0x00007fff558d9680
    rip = 0x00007fff91f50db0
    Found by: previous frame's frame pointer
 4  CoreFoundation + 0x12a566
    rbp = 0x00007fff558d9710   rsp = 0x00007fff558d96d0
    rip = 0x00007fff92062566
    Found by: previous frame's frame pointer
 5  CoreFoundation + 0x72608
    rbp = 0x00007fff558da3f0   rsp = 0x00007fff558d9720
    rip = 0x00007fff91faa608
    Found by: previous frame's frame pointer
 6  CoreFoundation + 0x71bd8
    rbp = 0x00007fff558da450   rsp = 0x00007fff558da400
    rip = 0x00007fff91fa9bd8
    Found by: previous frame's frame pointer
 7  Chromium Framework!base::MessagePumpCFRunLoop::DoRun(base::MessagePump::Delegate*) + 0x4f
    rbp = 0x00007fff558da490   rsp = 0x00007fff558da460
    rip = 0x000000010a9f7e0f
    Found by: previous frame's frame pointer
 8  Chromium Framework!base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*) + 0x7f
    rbx = 0x00007fd2ca536340   rbp = 0x00007fff558da4c0
    rsp = 0x00007fff558da4a0   r12 = 0x0000000000000000
    r13 = 0x000000011336be20   r14 = 0x00007fff558dac90
    r15 = 0x0000000000000000   rip = 0x000000010a9f781f
    Found by: call frame info
 9  Chromium Framework!base::MessageLoop::RunHandler() + 0xd7
    rbx = 0x00007fff558dac90   rbp = 0x00007fff558da620
    rsp = 0x00007fff558da4d0   r12 = 0x00007fd2ca537bc0
    r13 = 0x000000011336be20   r14 = 0x00007fff558da630
    r15 = 0x000000011336d847   rip = 0x000000010aa34fd7
    Found by: call frame info
10  Chromium Framework!base::RunLoop::Run() + 0x33
    rbx = 0x00007fff558da7b0   rbp = 0x00007fff558da670
    rsp = 0x00007fff558da630   r12 = 0x00007fd2ca537bc0
    r13 = 0x000000011336be20   r14 = 0x00007fff558da630
    r15 = 0x000000011336d847   rip = 0x000000010aa553d3
    Found by: call frame info
11  Chromium Framework!base::MessageLoop::Run() + 0x67
    rbx = 0x00007fff558da7b0   rbp = 0x00007fff558da7e0
    rsp = 0x00007fff558da680   r12 = 0x00007fd2ca537bc0
    r13 = 0x000000011336be20   r14 = 0x000000010a9a1c70
    r15 = 0x000000011336d847   rip = 0x000000010aa34187
    Found by: call frame info
12  Chromium Framework!content::GpuMain(content::MainFunctionParams const&) + 0xee7
    rbx = 0x00007fd2ca813800   rbp = 0x00007fff558db510
    rsp = 0x00007fff558da7f0   r12 = 0x00007fd2ca537bc0
    r13 = 0x000000011336be20   r14 = 0x000000010a9a1c70
    r15 = 0x000000011336d847   rip = 0x000000010a9a1817


========================================================================

mac_chromium_rel_ng : build 246795 : WebglConformance.conformance_uniforms_uniform_samplers_test

Thread 0
 0  libsystem_kernel.dylib + 0x11622
    rax = 0x0000000000000000   rdx = 0x0000000000000002
    rcx = 0x00007fff5f382598   rbx = 0x0701ddafe9383b41
    rsi = 0x0701ddafe9383b41   rdi = 0x000000000000360b
    rbp = 0x00007fff5f3825e0   rsp = 0x00007fff5f382598
     r8 = 0x00007fee1b61a820    r9 = 0x0000000000000000
    r10 = 0x0701ddafe9383c00   r11 = 0x0000000000000202
    r12 = 0x0701ddafe9383b81   r13 = 0x0701ddafe9383b41
    r14 = 0x00007fee1b61aa90   r15 = 0x0000000000000000
    rip = 0x00007fff88626622
    Found by: given as instruction pointer in context
 1  CoreFoundation + 0x79c4f
    rbp = 0x00007fff5f382620   rsp = 0x00007fff5f3825f0
    rip = 0x00007fff8e518c4f
    Found by: previous frame's frame pointer
 2  CoreFoundation + 0xb7217
    rbp = 0x00007fff5f3836c0   rsp = 0x00007fff5f382630
    rip = 0x00007fff8e556217
    Found by: previous frame's frame pointer
 3  CoreFoundation + 0x12a53d
    rbp = 0x00007fff5f383710   rsp = 0x00007fff5f3836d0
    rip = 0x00007fff8e5c953d
    Found by: previous frame's frame pointer
 4  CoreFoundation + 0x72608
    rbp = 0x00007fff5f3843f0   rsp = 0x00007fff5f383720
    rip = 0x00007fff8e511608
    Found by: previous frame's frame pointer
 5  CoreFoundation + 0x71bd8
    rbp = 0x00007fff5f384450   rsp = 0x00007fff5f384400
    rip = 0x00007fff8e510bd8
    Found by: previous frame's frame pointer
 6  Chromium Framework!base::MessagePumpCFRunLoop::DoRun(base::MessagePump::Delegate*) + 0x4f
    rbp = 0x00007fff5f384490   rsp = 0x00007fff5f384460
    rip = 0x0000000100f4d73f
    Found by: previous frame's frame pointer
 7  Chromium Framework!base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*) + 0x7f
    rbx = 0x00007fee1b61a080   rbp = 0x00007fff5f3844c0
    rsp = 0x00007fff5f3844a0   r12 = 0x0000000000000000
    r13 = 0x00000001098c1480   r14 = 0x00007fff5f384c90
    r15 = 0x0000000000000000   rip = 0x0000000100f4d14f
    Found by: call frame info
 8  Chromium Framework!base::MessageLoop::RunHandler() + 0xd7
    rbx = 0x00007fff5f384c90   rbp = 0x00007fff5f384620
    rsp = 0x00007fff5f3844d0   r12 = 0x00007fee1b61ba70
    r13 = 0x00000001098c1480   r14 = 0x00007fff5f384630
    r15 = 0x00000001098c2ea7   rip = 0x0000000100f8a907
    Found by: call frame info
 9  Chromium Framework!base::RunLoop::Run() + 0x33
    rbx = 0x00007fff5f3847b0   rbp = 0x00007fff5f384670
    rsp = 0x00007fff5f384630   r12 = 0x00007fee1b61ba70
    r13 = 0x00000001098c1480   r14 = 0x00007fff5f384630
    r15 = 0x00000001098c2ea7   rip = 0x0000000100faad03
    Found by: call frame info
10  Chromium Framework!base::MessageLoop::Run() + 0x67
    rbx = 0x00007fff5f3847b0   rbp = 0x00007fff5f3847e0
    rsp = 0x00007fff5f384680   r12 = 0x00007fee1b61ba70
    r13 = 0x00000001098c1480   r14 = 0x0000000100ef75a0
    r15 = 0x00000001098c2ea7   rip = 0x0000000100f89ab7
    Found by: call frame info
11  Chromium Framework!content::GpuMain(content::MainFunctionParams const&) + 0xee7
    rbx = 0x00007fee1c002400   rbp = 0x00007fff5f385510
    rsp = 0x00007fff5f3847f0   r12 = 0x00007fee1b61ba70
    r13 = 0x00000001098c1480   r14 = 0x0000000100ef75a0
    r15 = 0x00000001098c2ea7   rip = 0x0000000100ef7147

Comment 12 by kbr@chromium.org, Jun 22 2016

Cc: stanisc@chromium.org jamescook@chromium.org amistry@chromium.org
Components: Internals>Core
Thanks for gathering these stacks Victor. Let's please continue to gather more.

A "git log" in src/base/message_loop/ turns up some recent changes; CC'ing a few people.

Could there be a bug where the GpuWatchdogTaskObserver isn't getting called for the do-nothing task posted by GpuWatchdogThread::OnCheck? Could a recent code change have introduced such a race condition? We should be looking for changes prior to May 4, but there are quite a few changes to the core message loop code recently, like:

05e1178680e47f4263f9c7087dad1d8e099a5027
aacdfd0e17cb1597dd22745f73b172a0e2797359
7fa6701bc5183bd5a73203d4fe1309f75ccfd5b4

I don't think my change https://chromium.googlesource.com/chromium/src/+/aacdfd0e17cb1597dd22745f73b172a0e2797359 affects anything other than the experimental win/chromeos "mus" build, which is tested by views_mus_unittests and similar suites. In normal builds/test suites, no MessageLoop::NestingObservers will be added, so I doubt I changed the behavior. Which doesn't really help you, I know. :-P

The first change mentioned in comment #12 applied to Windows only so that wouldn't explain GPU hangs on Mac. That change was actually intended to reduce the number of GPU hang crashes.

When did this issue started? There was a change to GPU watchdog in mid march that changed the hang detection logic that resulted in a surge of GPU hang crashes triggered by the watchdog. 


Comment 15 by kbr@chromium.org, Jun 22 2016

As reported at the top of this bug, it was first noticed on May 4. It's possible the flakiness was present earlier than that, for example starting in mid-March.

Comment 16 by kbr@chromium.org, Jun 22 2016

Cc: -dyen@chromium.org cblume@chromium.org ynovikov@chromium.org
For the record: the change in https://codereview.chromium.org/2070233002 seems to cause test failures on Android, possibly because it's attempting to fetch stack traces for expected failures and that doesn't work on Android yet. Conditionalizing this new code.

Comment 17 by kbr@chromium.org, Jun 22 2016

Example failure log excerpt from https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/332/steps/webgl_conformance_tests%20on%20Android/logs/stdio :


[ RUN      ] WebglConformance.conformance_textures_misc_tex_image_and_sub_image_2d_with_array_buffer_view
(INFO) 2016-06-22 15:07:18,668 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell '( rm -f /data/local/chrome-trace-config.json );echo %$?'
(INFO) 2016-06-22 15:07:18,739 cache_temperature.EnsurePageCacheTemperature:55  PageCacheTemperature: any
(INFO) 2016-06-22 15:07:32,421 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell '( uiautomator dump /data/local/tmp/temp_file-c624139d4bc4c );echo %$?'
(INFO) 2016-06-22 15:07:39,163 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 pull /data/local/tmp/temp_file-c624139d4bc4c /tmp/isolated_tmpJ_SBTo/tmpzT4_t2/tmp_ReadFileWithPull
(INFO) 2016-06-22 15:07:39,258 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell 'rm -f /data/local/tmp/temp_file-c624139d4bc4c'
(CRITICAL) 2016-06-22 15:07:39,271 timeout_retry.Run:172  (TimeoutThread-1-for-MainThread) Exception on _GetRootUiNode(<devil.android.app_ui.AppUi object at 0x7f897f3e7dd0>, retries=3, timeout=10), attempt 1 of 4: AdbCommandFailedError("(device: 00c74b7f7941caf7) adb pull /data/local/tmp/temp_file-c624139d4bc4c /tmp/isolated_tmpJ_SBTo/tmpzT4_t2/tmp_ReadFileWithPull: failed with exit status 1 and output:\n- remote object '/data/local/tmp/temp_file-c624139d4bc4c' does not exist\n",)
(INFO) 2016-06-22 15:07:39,273 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell '( uiautomator dump /data/local/tmp/temp_file-117a89efd55ca );echo %$?'
(INFO) 2016-06-22 15:07:46,037 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 pull /data/local/tmp/temp_file-117a89efd55ca /tmp/isolated_tmpJ_SBTo/tmp6P5uZ4/tmp_ReadFileWithPull
(INFO) 2016-06-22 15:07:46,129 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell 'rm -f /data/local/tmp/temp_file-117a89efd55ca'
(CRITICAL) 2016-06-22 15:07:46,156 timeout_retry.Run:172  (TimeoutThread-2-for-MainThread) Exception on _GetRootUiNode(<devil.android.app_ui.AppUi object at 0x7f897f3e7dd0>, retries=3, timeout=10), attempt 2 of 4: AdbCommandFailedError("(device: 00c74b7f7941caf7) adb pull /data/local/tmp/temp_file-117a89efd55ca /tmp/isolated_tmpJ_SBTo/tmp6P5uZ4/tmp_ReadFileWithPull: failed with exit status 1 and output:\n- remote object '/data/local/tmp/temp_file-117a89efd55ca' does not exist\n",)
(INFO) 2016-06-22 15:07:46,157 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell '( uiautomator dump /data/local/tmp/temp_file-b233f89f865b0 );echo %$?'
(INFO) 2016-06-22 15:07:52,906 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 pull /data/local/tmp/temp_file-b233f89f865b0 /tmp/isolated_tmpJ_SBTo/tmp1w7Zuy/tmp_ReadFileWithPull
(INFO) 2016-06-22 15:07:52,998 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell 'rm -f /data/local/tmp/temp_file-b233f89f865b0'
(CRITICAL) 2016-06-22 15:07:53,008 timeout_retry.Run:172  (TimeoutThread-3-for-MainThread) Exception on _GetRootUiNode(<devil.android.app_ui.AppUi object at 0x7f897f3e7dd0>, retries=3, timeout=10), attempt 3 of 4: AdbCommandFailedError("(device: 00c74b7f7941caf7) adb pull /data/local/tmp/temp_file-b233f89f865b0 /tmp/isolated_tmpJ_SBTo/tmp1w7Zuy/tmp_ReadFileWithPull: failed with exit status 1 and output:\n- remote object '/data/local/tmp/temp_file-b233f89f865b0' does not exist\n",)
(INFO) 2016-06-22 15:07:53,010 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell '( uiautomator dump /data/local/tmp/temp_file-b99cd7b4dced3 );echo %$?'
(INFO) 2016-06-22 15:07:59,748 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 pull /data/local/tmp/temp_file-b99cd7b4dced3 /tmp/isolated_tmpJ_SBTo/tmpdDCtbJ/tmp_ReadFileWithPull
(INFO) 2016-06-22 15:07:59,841 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell 'rm -f /data/local/tmp/temp_file-b99cd7b4dced3'

Expected exception while running WebglConformance.conformance_textures_misc_tex_image_and_sub_image_2d_with_array_buffer_view

Traceback (most recent call last):
  RunStoryWithRetries at content/test/gpu/gpu_tests/gpu_test_base.py:72
    super(cls, shared_page_state).RunStory(results)
  RunStory at third_party/catapult/telemetry/telemetry/page/shared_page_state.py:321
    self._current_page, self._current_tab, results)
  ValidateAndMeasurePage at content/test/gpu/gpu_tests/webgl_conformance.py:87
    is_valid_dump, trace_output = tab.browser.GetStackTrace()
  GetStackTrace at third_party/catapult/telemetry/telemetry/internal/browser/browser.py:279
    return self._browser_backend.GetStackTrace()
  GetStackTrace at third_party/catapult/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py:230
    return self.platform_backend.GetStackTrace()
  GetStackTrace at third_party/catapult/telemetry/telemetry/internal/platform/android_platform_backend.py:667
    ret = Decorate('UI dump', '\n'.join(self.GetSystemUi().ScreenDump()))
  ScreenDump at third_party/catapult/devil/devil/android/app_ui.py:212
    return self._GetRootUiNode().Dump()
  timeout_retry_wrapper at third_party/catapult/devil/devil/android/decorators.py:57
    retry_if_func=retry_if_func)
  Run at third_party/catapult/devil/devil/utils/timeout_retry.py:157
    error_log_func=error_log_func)
  JoinAll at third_party/catapult/devil/devil/utils/reraiser_thread.py:186
    self._JoinAll(watcher, timeout)
  _JoinAll at third_party/catapult/devil/devil/utils/reraiser_thread.py:158
    thread.ReraiseIfException()
  run at third_party/catapult/devil/devil/utils/reraiser_thread.py:81
    self._ret = self._func(*self._args, **self._kwargs)
  <lambda> at third_party/catapult/devil/devil/utils/timeout_retry.py:150
    child_thread = reraiser_thread.ReraiserThread(lambda: func(*args, **kwargs),
  impl at third_party/catapult/devil/devil/android/decorators.py:47
    return f(*args, **kwargs)
  _GetRootUiNode at third_party/catapult/devil/devil/android/app_ui.py:203
    self._device.ReadFile(dtemp.name, force_pull=True))
  timeout_retry_wrapper at third_party/catapult/devil/devil/android/decorators.py:51
    return impl()
  impl at third_party/catapult/devil/devil/android/decorators.py:47
    return f(*args, **kwargs)
  ReadFile at third_party/catapult/devil/devil/android/device_utils.py:1541
    return self._ReadFileWithPull(device_path)
  _ReadFileWithPull at third_party/catapult/devil/devil/android/device_utils.py:1493
    self.adb.Pull(device_path, host_temp_path)
  Pull at third_party/catapult/devil/devil/android/sdk/adb_wrapper.py:393
    self._RunDeviceAdbCmd(cmd, timeout, retries)
  _RunDeviceAdbCmd at third_party/catapult/devil/devil/android/sdk/adb_wrapper.py:262
    check_error=check_error)
  timeout_retry_wrapper at third_party/catapult/devil/devil/android/decorators.py:51
    return impl()
  impl at third_party/catapult/devil/devil/android/decorators.py:47
    return f(*args, **kwargs)
  _RunAdbCmd at third_party/catapult/devil/devil/android/sdk/adb_wrapper.py:238
    args, output, status, device_serial)
AdbCommandFailedError: (device: 00c74b7f7941caf7) adb pull /data/local/tmp/temp_file-b99cd7b4dced3 /tmp/isolated_tmpJ_SBTo/tmpdDCtbJ/tmp_ReadFileWithPull: failed with exit status 1 and output:
- remote object '/data/local/tmp/temp_file-b99cd7b4dced3' does not exist

Locals:
  args          : ['pull', '/data/local/tmp/temp_file-b99cd7b4dced3', '/tmp/isolated_tmpJ_SBTo/tmpdDCtbJ/tmp_ReadFileWithPull']
  check_error   : True
  cls           : <class 'devil.android.sdk.adb_wrapper.AdbWrapper'>
  cpu_affinity  : None
  device_serial : '00c74b7f7941caf7'
  output        : "remote object '/data/local/tmp/temp_file-b99cd7b4dced3' does not exist\n"
  retries       : 2
  status        : 1
  timeout       : 300

[       OK ] WebglConformance.conformance_textures_misc_tex_image_and_sub_image_2d_with_array_buffer_view (71215 ms)
[ RUN      ] WebglConformance.conformance_textures_misc_tex_image_and_uniform_binding_bugs
(INFO) 2016-06-22 15:08:29,884 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell '( rm -f /data/local/chrome-trace-config.json );echo %$?'
(INFO) 2016-06-22 15:09:00,013 browser.DumpStateUponFailure:326  *************** BROWSER STANDARD OUTPUT ***************
(INFO) 2016-06-22 15:09:00,014 browser.DumpStateUponFailure:328  None
(INFO) 2016-06-22 15:09:00,014 browser.DumpStateUponFailure:331  *********** END OF BROWSER STANDARD OUTPUT ************
(INFO) 2016-06-22 15:09:00,014 browser.DumpStateUponFailure:333  ********************* BROWSER LOG *********************
(ERROR) 2016-06-22 15:09:00,014 browser.DumpStateUponFailure:337  Failed to get browser log:
Traceback (most recent call last):
  File "/tmp/isolated_run1bYPqQ/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 335, in DumpStateUponFailure
    logging.info(self.GetLogFileContents())
  File "/tmp/isolated_run1bYPqQ/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 276, in GetLogFileContents
    return self._browser_backend.GetLogFileContents()
  File "/tmp/isolated_run1bYPqQ/third_party/catapult/telemetry/telemetry/internal/backends/browser_backend.py", line 43, in GetLogFileContents
    with file(self.log_file_path) as f:
TypeError: coercing to Unicode: need string or buffer, NoneType found
(INFO) 2016-06-22 15:09:00,015 browser.DumpStateUponFailure:338  ***************** END OF BROWSER LOG ******************
(WARNING) 2016-06-22 15:09:00,015 shared_page_state.DumpStateUponFailure:144  Taking screenshots upon failures disabled.
Traceback (most recent call last):
  File "/tmp/isolated_run1bYPqQ/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 78, in _RunStoryAndProcessErrorIfNeeded
    state.WillRunStory(story)
  File "/tmp/isolated_run1bYPqQ/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 245, in WillRunStory
    if len(self.browser.tabs) == 0:
  File "/tmp/isolated_run1bYPqQ/third_party/catapult/telemetry/telemetry/internal/browser/tab_list.py", line 15, in __len__
    return self._tab_list_backend.__len__()
  File "/tmp/isolated_run1bYPqQ/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend_list.py", line 95, in __len__
    self._Update()
  File "/tmp/isolated_run1bYPqQ/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend_list.py", line 99, in _Update
    backends_map = self._devtools_client.GetUpdatedInspectableContexts()
  File "/tmp/isolated_run1bYPqQ/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py", line 296, in GetUpdatedInspectableContexts
    contexts = self._ListInspectableContexts()
  File "/tmp/isolated_run1bYPqQ/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/devtools_client_backend.py", line 221, in _ListInspectableContexts
    return self._devtools_http.RequestJson('')
  File "/tmp/isolated_run1bYPqQ/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/devtools_http.py", line 107, in RequestJson
    return json.loads(self.Request(path, timeout))
  File "/tmp/isolated_run1bYPqQ/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/devtools_http.py", line 88, in Request
    response = self._conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1051, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 415, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 371, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
DevToolsClientConnectionError: timed out

(INFO) 2016-06-22 15:09:00,032 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell '( ps | grep -F org.chromium.chrome; echo "PIPESTATUS: ${PIPESTATUS[@]}" );echo %$?'
(INFO) 2016-06-22 15:09:00,267 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell '( ls /root );echo %$?'
(INFO) 2016-06-22 15:09:00,370 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell '( cd /sys/devices/system/cpu && for CPU in cpu0 cpu1 cpu2 cpu3 cpu4 cpu5; do test -e "$CPU/cpufreq/scaling_governor" && echo ondemand > "$CPU/cpufreq/scaling_governor"; echo -n "%~%$?%~%"; done );echo %$?'
(INFO) 2016-06-22 15:09:00,437 perf_control._WriteEachCpuFile:114  Successfully set cpufreq/scaling_governor to 'ondemand' on: cpu0 cpu1 cpu2 cpu3 cpu4 cpu5
(INFO) 2016-06-22 15:09:00,438 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell '( cd /sys/devices/system/cpu && for CPU in cpu0 cpu1 cpu2 cpu3 cpu4 cpu5; do cat "$CPU/online"; echo -n "%~%$?%~%"; done );echo %$?'
(INFO) 2016-06-22 15:09:00,549 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 shell '( p=org.chromium.chrome;if [[ "$(ps)" = *$p* ]]; then am force-stop $p; fi );echo %$?'
(INFO) 2016-06-22 15:09:01,519 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 forward --list
(INFO) 2016-06-22 15:09:01,523 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 00c74b7f7941caf7 forward --remove tcp:54768
(INFO) 2016-06-22 15:09:01,529 forwarder._UnmapDevicePortLocked:232  Undo forwarding using command: [u'--adb=/tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb', '--serial-id=00c74b7f7941caf7', '--unmap', '60466']
(INFO) 2016-06-22 15:09:01,529 cmd_helper._ValidateAndLogCommand:153  [host]> /tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/forwarder_host --adb=/tmp/isolated_run1bYPqQ/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb --serial-id=00c74b7f7941caf7 --unmap 60466
[  FAILED  ] WebglConformance.conformance_textures_misc_tex_image_and_uniform_binding_bugs (31657 ms)

Project Member

Comment 18 by bugdroid1@chromium.org, Jun 23 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6ce7da1ad477f7e562961a155c69381219ac6835

commit 6ce7da1ad477f7e562961a155c69381219ac6835
Author: kbr <kbr@chromium.org>
Date: Thu Jun 23 01:43:25 2016

Conditionalize new code which fetches stack traces upon failure.

BUG= 609252 
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel
NOTRY=true

Review-Url: https://codereview.chromium.org/2085213003
Cr-Commit-Position: refs/heads/master@{#401509}

[modify] https://crrev.com/6ce7da1ad477f7e562961a155c69381219ac6835/content/test/gpu/gpu_tests/webgl_conformance.py

Comment 19 by kbr@chromium.org, Jun 23 2016

Another incidence (reported on  Issue 596622 ):

https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/249027

WebglConformance.conformance_attribs_gl_vertexattribpointer failed because the GPU process hung. The symbolized stack of the main thread is attached and excerpted here:

	Thread 0
	 0  libsystem_kernel.dylib + 0x11622
	 1  CoreFoundation + 0x79c4f
	 2  CoreFoundation + 0xb7217
	 3  CoreFoundation + 0x12a53d
	 4  CoreFoundation + 0x72608
	 5  CoreFoundation + 0x71bd8
	 6  Chromium Framework!__ZN4base20MessagePumpCFRunLoop5DoRunEPNS_11MessagePump8DelegateE + 0x4f
	 7  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE + 0x7f
	 8  Chromium Framework!__ZN4base11MessageLoop10RunHandlerEv + 0xd7
	 9  Chromium Framework!__ZN4base7RunLoop3RunEv + 0x33
	10  Chromium Framework!__ZN4base11MessageLoop3RunEv + 0x67
	11  Chromium Framework!__ZN7content7GpuMainERKNS_18MainFunctionParamsE + 0xee7
	12  Chromium Framework!__ZN7content21ContentMainRunnerImpl3RunEv + 0x322
	13  Chromium Framework!__ZN7content11ContentMainERKNS_17ContentMainParamsE + 0x36
	14  Chromium Framework!_ChromeMain + 0x3a
	15  Chromium Helper!_main + 0x212
	16  Chromium Helper!start + 0x34

This doesn't look like anything suspicious is happening on the main thread; it looks like it's just waiting for the next event. There's no reason the watchdog should consider the main thread as being hung.

stdout.txt
191 KB View Download
Project Member

Comment 20 by bugdroid1@chromium.org, Jun 27 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f9319d64b21f372bddd073b46c6bd0a089dc657f

commit f9319d64b21f372bddd073b46c6bd0a089dc657f
Author: kbr <kbr@chromium.org>
Date: Mon Jun 27 21:39:14 2016

Copy more helper DLLs for cdb.exe.

These are required to get access to the !uniqstack command, to be able
to dump all threads' stacks from a minidump.

BUG= 609252 

Review-Url: https://codereview.chromium.org/2092083004
Cr-Commit-Position: refs/heads/master@{#402295}

[modify] https://crrev.com/f9319d64b21f372bddd073b46c6bd0a089dc657f/build/win/copy_cdb_to_output.py
[modify] https://crrev.com/f9319d64b21f372bddd073b46c6bd0a089dc657f/chrome/chrome_tests.gypi
[modify] https://crrev.com/f9319d64b21f372bddd073b46c6bd0a089dc657f/tools/perf/chrome_telemetry_build/BUILD.gn

Project Member

Comment 21 by bugdroid1@chromium.org, Jun 28 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6df91ef69945c39e4a080762973503c4bc9ba5af

commit 6df91ef69945c39e4a080762973503c4bc9ba5af
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Tue Jun 28 21:39:28 2016

Roll src/third_party/catapult/ d44648720..f49c20888 (4 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/d44648720513..f49c20888bb8

$ git log d44648720..f49c20888 --date=short --no-merges --format='%ad %ae %s'

BUG= 609252 , 623963 , 609252 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2110613002
Cr-Commit-Position: refs/heads/master@{#402553}

[modify] https://crrev.com/6df91ef69945c39e4a080762973503c4bc9ba5af/DEPS

Project Member

Comment 22 by bugdroid1@chromium.org, Jun 29 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2f793dcc427d8345550acb237aa54c144e68ccce

commit 2f793dcc427d8345550acb237aa54c144e68ccce
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Wed Jun 29 18:12:37 2016

Roll src/third_party/catapult/ f49c20888..a45087135 (9 commits).

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/f49c20888bb8..a4508713566b

$ git log f49c20888..a45087135 --date=short --no-merges --format='%ad %ae %s'

BUG= 619045 , 609252 , 605842 

TBR=catapult-sheriff@chromium.org

Review-Url: https://codereview.chromium.org/2111593002
Cr-Commit-Position: refs/heads/master@{#402870}

[modify] https://crrev.com/2f793dcc427d8345550acb237aa54c144e68ccce/DEPS

Comment 23 by kbr@chromium.org, Jun 29 2016

Blockedon: 620904
In  Issue 596622  another flake was reported in https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/250753 . The stack trace of the main thread after the GPU watchdog fired looked like this:

Thread 0
 0  libobjc.A.dylib + 0x6c42
 1  Foundation + 0x20832
 2  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase7RunWorkEv + 0xa7
 3  Chromium Framework!__ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE + 0xa
 4  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase13RunWorkSourceEPv + 0x44
 5  CoreFoundation + 0x80a01
 6  CoreFoundation + 0x72b8d
 7  CoreFoundation + 0x721bf
 8  CoreFoundation + 0x71bd8
 9  Chromium Framework!__ZN4base20MessagePumpCFRunLoop5DoRunEPNS_11MessagePump8DelegateE + 0x4f
10  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE + 0x7f
11  Chromium Framework!__ZN4base11MessageLoop10RunHandlerEv + 0xd7
12  Chromium Framework!__ZN4base7RunLoop3RunEv + 0x33
13  Chromium Framework!__ZN4base11MessageLoop3RunEv + 0x67
14  Chromium Framework!__ZN7content7GpuMainERKNS_18MainFunctionParamsE + 0xee7
15  Chromium Framework!__ZN7content21ContentMainRunnerImpl3RunEv + 0x322
16  Chromium Framework!__ZN7content11ContentMainERKNS_17ContentMainParamsE + 0x36
17  Chromium Framework!_ChromeMain + 0x3a
18  Chromium Helper!_main + 0x212
19  Chromium Helper!start + 0x34

Blocking this on  Issue 620904 . These hangs are happening on multiple platforms and it seems likely there is a cross-platform race in the MessagePump implementation.

Comment 24 by kbr@chromium.org, Jun 29 2016

By the way, a list of all recent flakes of webgl_conformance_tests:
https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyLwsSBUZsYWtlIiR3ZWJnbF9jb25mb3JtYW5jZV90ZXN0cyAod2l0aCBwYXRjaCkM&show_all=1

chromium-try-flakes is currently reporting new flakes on  Issue 596622 .

Comment 25 by kbr@chromium.org, Jun 29 2016

As of https://codereview.chromium.org/2105063003/ , if the GPU watchdog fires on Windows, Telemetry will dump the stacks for all threads, not just the watchdog thread. We should carefully watch for new flakes on Windows and see if the main thread seems to be hung in similar fashion to on the Mac.

Comment 26 by kbr@chromium.org, Jul 13 2016

Cc: kbr@chromium.org nek...@chromium.org
 Issue 627915  has been merged into this issue.

Comment 27 by kbr@chromium.org, Jul 20 2016

Another instance: https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/2079

WebglConformance_deqp_functional_gles3_negativeshaderapi failed on the AMD MacBook Pro Retinas because of a GPU process hang.

For some reason, either no minidump was generated, or it didn't get symbolized. stdout excerpt attached.

stdout.txt
82.4 KB View Download

Comment 28 by kbr@chromium.org, Jul 20 2016

...actually, I don't think Victor's code which tried to symbolize minidumps upon test failure has been ported to the new WebGL test harness that's now run on the bots.

Comment 29 by kbr@chromium.org, Jul 21 2016

Cc: eyaich@chromium.org
Per https://bugs.chromium.org/p/chromium/issues/detail?id=596622#c113 , all of the current flakes of the WebGL 1.0 conformance suite on the CQ are these random GPU process hangs. Victor, any chance you could port your code which symbolizes minidumps upon test failure to the new test harness (webgl_conformance_integration_test.py, gpu_integration_test.py)?

Comment 30 by kbr@chromium.org, Aug 17 2016

Blocking: 619264

Comment 31 by kbr@chromium.org, Aug 31 2016

Another instance: WebglConformance_deqp_functional_gles3_negativetextureapi failed on the Mac Minis in this try job:
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/3071

of this CL:
https://codereview.chromium.org/2273683003

stdout of this single test attached.

stdout.txt
191 KB View Download
As an update, I'm making some progress by provoking these GPU hangs by repeatedly running WebGL 2 conformance test shards.

This is a list of shards that provoked a failure:

https://chromium-swarm.appspot.com/user/task/30fb1b7db5a33910
https://chromium-swarm.appspot.com/user/task/30f9324f2b743a10
https://chromium-swarm.appspot.com/user/task/30f901ac903feb10
https://chromium-swarm.appspot.com/user/task/30f778804f711110
https://chromium-swarm.appspot.com/user/task/30f76b5757a3c910
https://chromium-swarm.appspot.com/user/task/30f733ba19ca7b10

I'm adding increasing diagnostic information to narrow things down, including stack traces and side channel information about task execution times.

An interesting observation is the GPU Main thread stack trace on https://chromium-swarm.appspot.com/user/task/30f76b5757a3c910.  The thread was iterating TaskObservers either before or after running a pending task.  This indicates that the thread did not (permanently) hang in a task.

The final PC is in a LockImpl::Unlock()... which shouldn't really hang unless there's memory corruption.

Possible theories:
 - A) A task took a long time to run which triggered the watchdog timeout, and the main thread happened to have just finished the task.
 - B) The task didn't take a long time but the watchdog timed out too soon.
 - C) There's some other race condition / memory corruption issue around the watchdog's TaskObserver implementation.

 0  libsystem_pthread.dylib + 0x1c28
 1  libsystem_pthread.dylib + 0x19ea
 2  Chromium Framework!__ZN4base8internal8LockImpl6UnlockEv + 0x15
 3  Chromium Framework!__ZNK4base17ThreadCheckerImpl19CalledOnValidThreadEv + 0xc6
 4  Chromium Framework!__ZNK4base19SequenceCheckerImpl21CalledOnValidSequenceEv + 0x76
 5  Chromium Framework!__ZNK4base8internal13WeakReference4Flag7IsValidEv + 0x1a
 6  Chromium Framework!__ZNK4base8internal13WeakReference8is_validEv + 0x11
 7  Chromium Framework!__ZN4base16ObserverListBaseINS_11MessageLoop12TaskObserverEE8IteratorD2Ev + 0x13
 8  Chromium Framework!__ZN4base11MessageLoop7RunTaskERKNS_11PendingTaskE + 0x222
 9  Chromium Framework!__ZN4base11MessageLoop21DeferOrRunPendingTaskENS_11PendingTaskE + 0x2c
10  Chromium Framework!__ZN4base11MessageLoop6DoWorkEv + 0x173

=====

In another run, it looks like we did hang and time out inside a task.

https://chromium-swarm.appspot.com/user/task/30fb1b7db5a33910

[33659:17923:0831/133912:ERROR:gpu_watchdog_thread.cc(444)]  TASK_START time 27.175ms
[33659:17923:0831/133912:ERROR:gpu_watchdog_thread.cc(444)]  TASK_END time 27.195ms
[33659:17923:0831/133912:ERROR:gpu_watchdog_thread.cc(444)]  TASK_START time 33.736ms   <---- main thread is running a task
[33659:17923:0831/133912:ERROR:gpu_watchdog_thread.cc(444)]  ARM time 6026.98ms         <---- watchdog arms 6 seconds later
[33659:17923:0831/133912:ERROR:gpu_watchdog_thread.cc(444)]  TIMEOUT time 17591.6ms     <---- watchdog times out 11 seconds later.

However the stack trace doesn't look like we're in a task...

 0  libsystem_kernel.dylib + 0x1148a
 1  CoreFoundation + 0x72445
 2  CoreFoundation + 0x71bd8
 3  Chromium Framework!__ZN4base20MessagePumpCFRunLoop5DoRunEPNS_11MessagePump8DelegateE + 0x4f
 4  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE + 0x7f
 5  Chromium Framework!__ZN4base11MessageLoop10RunHandlerEv + 0xd7
 6  Chromium Framework!__ZN4base7RunLoop3RunEv + 0x33

Finally, as I'm adding more diagnostic code it's getting harder to reproduce the hangs, which again points to raciness.

For https://chromium-swarm.appspot.com/user/task/30fb1b7db5a33910 could there be a C++ exception or NSException that CFRunLoop is swallowing? Though r338332 seems to say those should all be fatal.
^ Good point.  Don't we compile C++ with no exceptions, which would make passing exceptions through C++ stacks 'a bad thing'?
Answering my own questions yes and yes, we compile without C++ exceptions, and if ObjC swallows an exception through C++ stacks then that would be pretty bad.

I'm not sure if that's happening, but I worry about the comment in r338332: "Using try/catch is now unrestricted".

Comment 36 by kbr@chromium.org, Sep 1 2016

Cc: rsesek@chromium.org
+rsesek for any comments about the above conversation as it relates to r338332 / https://chromium.googlesource.com/chromium/src/+/73dc3d5a6345ddd43a1a54f06dd4058be8a9842b .

Comment 37 by kbr@chromium.org, Sep 1 2016

Cc: u...@chromium.org
It looks to me like if an exception were being thrown by a task invoked by the RunLoop that base::mac::CallWithEHFrame would always see it, and crash (from MessagePumpCFRunLoopBase::RunWorkSource, src/base/message_loop/message_pump_mac.mm), so I suspect that mechanism introduced in r338332 is working as expected.

The hang in base::LockImpl::Unlock is strange. It reminds me of  Issue 609249 , but that was Linux-only.

Yes I agree the mechanism in r338332 is working as expected.  I tested it out and a termination with full stack dump from the thrower.

However if any code inside a base::mac::CallWithEHFrame uses a "@try @catch" frame I think we must still take care about calling into C++ from that frame.
Blocking: 642735
I'm having a hard time getting the GPU hangs when adding any diagnostic code to the Main thread.

The most recent flaky crash happened in the Renderer process.  See: https://chromium-swarm.appspot.com/user/task/3101207f37443b10

Stack Trace:
********************************************************************************
	Operating system: Mac OS X
	                  10.10.5 14F1021
	CPU: amd64
	     family 6 model 70 stepping 1
	     8 CPUs
	
	GPU: UNKNOWN
	
	Crash reason:  EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
	Crash address: 0x7fff50fdccec
	Process uptime: 27 seconds
	
	Thread 0 (crashed)
	 0  0x7fff50fdccec
	    rax = 0x0000000000000000   rdx = 0x00007fd45871c6e0
	    rcx = 0x0000000000000000   rbx = 0x00007fd45871c6c0
	    rsi = 0x009d3800009d3800   rdi = 0x009d3700009d3803
	    rbp = 0x00007fff50fdcd10   rsp = 0x00007fff50fdcce0
	     r8 = 0x00007fff50fdccf0    r9 = 0x00007fff790e3300
	    r10 = 0x00007fd45c819590   r11 = 0x00007fff79afcfb0
	    r12 = 0x00007fff50fdce20   r13 = 0x0000000000002503
	    r14 = 0x00007fd45871c6b0   r15 = 0x00007fd45871c5f0
	    rip = 0x00007fff50fdccec
	    Found by: given as instruction pointer in context
	 1  CoreFoundation + 0x72b03
	 2  CoreFoundation + 0x721bf
	 3  CoreFoundation + 0x71bd8
	 4  Foundation + 0x90b29
	 5  Chromium Framework!__ZN4base20MessagePumpNSRunLoop5DoRunEPNS_11MessagePump8DelegateE + 0x7e
	 6  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE + 0x7f
	 7  Chromium Framework!__ZN4base11MessageLoop10RunHandlerEv + 0xd7
	 8  Chromium Framework!__ZN4base7RunLoop3RunEv + 0x33
	 9  Chromium Framework!__ZN7content12RendererMainERKNS_18MainFunctionParamsE + 0x2ca

What could make NSRunLoop jump to a stack address?  Corrupted stack frame?
ObjC and C++ exceptions are the same thing under the hood, so yes, an ObjC exception handler can swallow C++ exceptions. The goal of r338332 is to:

1) Let any code (either in third_party or in a binary library) use try/catch/@try/@catch as it wants.
2) If any exception makes it to CallWithEHFrame, crash at the original point of throw.

From the stack in #40 above, the three frames in CoreFoundation 1-3 are:

0x00072b03 [CoreFoundation +	 0x72b03] __CFRunLoopDoSources0
0x000721bf [CoreFoundation +	 0x721bf] __CFRunLoopRun
0x00071bd8 [CoreFoundation +	 0x71bd8] CFRunLoopRunSpecific

But frame 0 corresponds to an %rip that doesn't appear to be in any specific module. Looking at the "loaded module" section at the bottom of the crash report, no address range contains 0x00007fff50fdccec. That actually looks more like a stack pointer address.
Got the following failure running with ASAN on WebGL 2.0 Conformance tests.  Looks like use after free for a TaskObserverAdapter on the Renderer thread.  It occurs usually occurs within a minute, on a random test.

This failure reason looks very similar to what we're looking for on the GPU thread (i.e. something's up with Watchdog TaskObserver).  Either way, it's a cause of flaky failures on the tests.

To reproduce run with ASAN Mac build, with '--enable-unsafe-es3-apis' and run tests in: https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html?version=2.0.0

==30774==ERROR: AddressSanitizer: use-after-poison on address 0x7ee4598f5a00 at pc 0x000264c18abb bp 0x7fff57de6af0 sp 0x7fff57de6ae8
READ of size 8 at 0x7ee4598f5a00 thread T0
==30774==WARNING: invalid path to external symbolizer!
==30774==WARNING: Failed to use and restart external symbolizer!
    #0 0x264c18aba in blink::scheduler::WebThreadBase::TaskObserverAdapter::DidProcessTask(base::PendingTask const&) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x11318aba)
    #1 0x264bf09ef in blink::scheduler::TaskQueueManager::ProcessTaskFromWorkQueue(blink::scheduler::internal::WorkQueue*) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x112f09ef)
    #2 0x264beb49e in blink::scheduler::TaskQueueManager::DoWork(base::TimeTicks, bool) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x112eb49e)
    #3 0x25a7972e5 in base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask const&) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6e972e5)
    #4 0x25a82dbb4 in base::MessageLoop::RunTask(base::PendingTask const&) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f2dbb4)
    #5 0x25a82e8a6 in base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f2e8a6)
    #6 0x25a82fcee in base::MessageLoop::DoWork() (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f2fcee)
    #7 0x25a83a649 in base::MessagePumpCFRunLoopBase::RunWork() (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f3a649)
    #8 0x25a804d19 in base::mac::CallWithEHFrame(void () block_pointer) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f04d19)
    #9 0x25a83831f in base::MessagePumpCFRunLoopBase::RunWorkSource(void*) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f3831f)
    #10 0x7fff85873880 in __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation+0xaa880)
    #11 0x7fff85852fbb in __CFRunLoopDoSources0 (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation+0x89fbb)
    #12 0x7fff858524de in __CFRunLoopRun (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation+0x894de)
    #13 0x7fff85851ed7 in CFRunLoopRunSpecific (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation+0x88ed7)
    #14 0x7fff819c8ed8 in -[NSRunLoop(NSRunLoop) runMode:beforeDate:] (/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation+0x24ed8)
    #15 0x25a83c5ae in base::MessagePumpNSRunLoop::DoRun(base::MessagePump::Delegate*) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f3c5ae)
    #16 0x25a839c8b in base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6f39c8b)
    #17 0x25a8b8bbb in base::RunLoop::Run() (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x6fb8bbb)
    #18 0x26a9e8bc1 in content::RendererMain(content::MainFunctionParams const&) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x170e8bc1)
    #19 0x259582d4c in content::ContentMainRunnerImpl::Run() (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x5c82d4c)
    #20 0x25957f333 in content::ContentMain(content::ContentMainParams const&) (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x5c7f333)
    #21 0x2539054af in ChromeMain (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x54af)
    #22 0x107e16bd7 in main (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Helper.app/Contents/MacOS/Chromium Helper+0x100001bd7)
    #23 0x107e167c3 in start (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Helper.app/Contents/MacOS/Chromium Helper+0x1000017c3)

AddressSanitizer can not describe address in more detail (wild memory access suspected).
SUMMARY: AddressSanitizer: use-after-poison (/asan-mac-release-416628/Chromium.app/Contents/Versions/55.0.2852.0/Chromium Framework.framework/Chromium Framework+0x11318aba) in blink::scheduler::WebThreadBase::TaskObserverAdapter::DidProcessTask(base::PendingTask const&)
Shadow bytes around the buggy address:
  0x1fdc8b31eaf0: 00 00 00 00 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x1fdc8b31eb00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 00 00 00 00 00
  0x1fdc8b31eb10: 00 00 00 f7 00 00 00 00 00 00 00 00 00 00 00 00
  0x1fdc8b31eb20: 00 00 00 f7 00 00 00 00 00 00 00 00 00 f7 00 00
  0x1fdc8b31eb30: 00 00 00 00 00 00 00 00 00 00 00 f7 f7 f7 f7 f7
=>0x1fdc8b31eb40:[f7]f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 00 00
  0x1fdc8b31eb50: 00 00 00 00 00 00 f7 00 00 00 00 00 00 00 00 f7
  0x1fdc8b31eb60: 00 00 00 00 00 00 00 00 f7 f7 f7 f7 f7 f7 f7 f7
  0x1fdc8b31eb70: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x1fdc8b31eb80: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x1fdc8b31eb90: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==30774==ABORTING
Received signal 6
 [0x00025a7957ea]
 [0x7fff87b2c52a]
 [0x000108b5c79b]
 [0x7fff949686df]
 [0x000107e8bac6]
 [0x000107e7217b]
 [0x000264c18abb]
 [0x000264bf09f0]
 [0x000264beb49f]
 [0x00025a7972e6]
 [0x00025a82dbb5]
 [0x00025a82e8a7]
 [0x00025a82fcef]
 [0x00025a83a64a]
 [0x00025a804d1a]
 [0x00025a838320]
 [0x7fff85873881]
 [0x7fff85852fbc]
 [0x7fff858524df]
 [0x7fff85851ed8]
 [0x7fff819c8ed9]
 [0x00025a83c5af]
 [0x00025a839c8c]
 [0x00025a8b8bbc]
 [0x00026a9e8bc2]
 [0x000259582d4d]
 [0x00025957f334]
 [0x0002539054b0]
 [0x000107e16bd8]
 [0x000107e167c4]
[end of stack trace]
Blockedon: 644575
@rsesek: Thanks for the explanations.  Do we always have accurate stack unwind data up to CallWithEHFrame for C++ code, which is built without exceptions enabled?  Or is it simply not a concern, since the uncaught exception is fatal?
On Mac we always emit unwind tables, even with C++ exceptions disabled, so it's not an issue.

Comment 46 by kbr@chromium.org, Sep 7 2016

Blockedon: 644920

Comment 47 by kbr@chromium.org, Sep 13 2016

Blockedon: 646538

Comment 48 by kbr@chromium.org, Sep 13 2016

Attached is stdout from https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/292381 in which WebglConformance_conformance_uniforms_uniform_samplers_test failed on Windows due to a GPU process hang.

Since this is on Windows it definitely isn't the exception issue described on macOS.

It looks to me like  Issue 644575  only affects the renderer process, correct? Is there a hypothesis that TaskObservers may be being mismanaged in the GPU process as well?

This is still a major source of flakiness on the tryservers. I'm getting daily pings from chromium-try-flakes on  Issue 596622 . How will we make forward progress on this?

stdout.txt
3.2 MB View Download
kbr@: Adding additional code for diagnosis has prevented the issue from repeating over thousands of runs.  It strongly points to races & data corruption.

 Issue 644575  is likely unrelated, but I'd like to see what that turns out to be.  Until that is fixed it's also blocking us from running extended ASAN tests to try to catch the GPU side race.  I plan to return to this.

Comment 50 by kbr@chromium.org, Sep 19 2016

Blocking: 648318
Cc: -jamescook@chromium.org

Comment 53 by kbr@chromium.org, Sep 22 2016

I think so but am not 100% sure. We need to fix the new gpu_integration_test.py harness to symbolize all unsymbolized stack traces upon test failure.

There are symbols later on in the log, though I'm not sure if from the same stack.
#52: It may be related, but I'm not seeing a GPU watchdog timeout in this log.
Labels: M-55
Blockedon: 612219
Issue 612219 is tracking a couple of changes to the GPU Watchdog that may affect the flakiness here.  Let's keep an eye on any changes.

 1) Changes the Ack-ing logic from the watched thread.
 2) Adds more watchdog ping signals to ContextGroup destruction code known to sometimes time out in the wild.

Comment 58 by kbr@chromium.org, Oct 10 2016

Have these fixes landed yet? Per https://bugs.chromium.org/p/chromium/issues/detail?id=619264#c78 , all of the most recent WebGL 2.0 conformance test flakes are due to the GPU process being reported as hanging.

Project Member

Comment 59 by bugdroid1@chromium.org, Oct 25 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/51e832338dfaeedf418778541a10003fd1feef8b

commit 51e832338dfaeedf418778541a10003fd1feef8b
Author: kbr <kbr@chromium.org>
Date: Tue Oct 25 02:34:54 2016

Try disabling GPU watchdog in WebGL conformance tests.

It's believed that the flaky GPU process hangs seen on the waterfall
are spurious, not real. Try disabling the GPU watchdog. If the hangs
are real, this will result in test timeouts rather than GPU process
crashes, context loss, and test failures.

BUG= 596622 ,  609252 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2446033002
Cr-Commit-Position: refs/heads/master@{#427247}

[modify] https://crrev.com/51e832338dfaeedf418778541a10003fd1feef8b/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py

Comment 60 by enne@chromium.org, Nov 9 2016

vmiura: ping.  Is this still an issue? It looks like conformance tests are still flaky, but hard to tell what's this and what's 596622 and if this should be closed or not.

Comment 61 by kbr@chromium.org, Nov 9 2016

Labels: -Pri-1 Pri-2
The majority of the remaining WebGL conformance flakes were worked around by disabling the GPU watchdog timer. It's still not clear to me whether there is a real bug in the watchdog that's being observed in the wild.

If we're going to close this then we should somehow resolve the two outstanding issues it's blocked on (Issue 611390 and Issue 646538).

Let me downgrade this to P2 on Victor's behalf since it's not affecting the WebGL conformance tests on the waterfall any more.

Comment 62 by kbr@chromium.org, Nov 22 2016

Symbolization of unsymbolized minidumps has been added to the gpu_integration_test.py harness, and it's working (modulo a recent problem,  Issue 667475 ), so removing the Catapult folks from the CC: list.

Comment 63 by kbr@chromium.org, Nov 22 2016

Cc: -eyaich@chromium.org

Comment 64 by kbr@chromium.org, Nov 29 2016

Blockedon: 662802

Comment 65 by kbr@chromium.org, Dec 19 2016

Blocking: -596622

Comment 66 by kbr@chromium.org, Dec 19 2016

Blockedon: 596622

Comment 67 by kbr@chromium.org, Dec 30 2016

Blockedon: 659081

Comment 68 by kbr@chromium.org, Jan 9 2017

Blocking: -352807

Comment 69 by kbr@chromium.org, Jan 9 2017

Blockedon: 352807

Comment 70 by kbr@chromium.org, Jan 12 2017

Blocking: -619264

Comment 71 by kbr@chromium.org, Jan 12 2017

Blockedon: 619264

Comment 72 by kbr@chromium.org, Mar 23 2017

Blocking: 622813

Comment 73 by kbr@chromium.org, May 3 2017

Blockedon: 689350
Project Member

Comment 74 by bugdroid1@chromium.org, May 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8f97866e8f0c1bae205fe0f3a4107d41ba8bae28

commit 8f97866e8f0c1bae205fe0f3a4107d41ba8bae28
Author: kbr <kbr@chromium.org>
Date: Wed May 17 08:03:24 2017

Re-enable GPU watchdog for WebGL conformance tests.

See whether this is working any more reliably now.

BUG= 609252 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2887983002
Cr-Commit-Position: refs/heads/master@{#472378}

[modify] https://crrev.com/8f97866e8f0c1bae205fe0f3a4107d41ba8bae28/content/test/gpu/gpu_tests/webgl_conformance_integration_test.py

Comment 75 by kbr@chromium.org, May 18 2017

Cc: ericrk@chromium.org
This issue is still occurring after re-enabling the GPU watchdog for these tests. Here are two failures:

https://luci-milo.appspot.com/buildbot/chromium.gpu/Mac%20Release%20%28Intel%29/64401

WebglConformance_conformance_textures_misc_texture_upload_size failed in this run.

https://luci-milo.appspot.com/buildbot/chromium.gpu/Mac%20Retina%20Release%20%28AMD%29/6737

WebglConformance_conformance_more_conformance_quickCheckAPI_B1 failed in this run.

In both cases the tests failed because the GPU watchdog was triggered. Here's the symbolized minidump from one of the runs.

  Symbolized minidump:
  Operating system: Mac OS X
                    10.12.4 16E195
  CPU: amd64
       family 6 model 69 stepping 1
       4 CPUs
  
  GPU: UNKNOWN
  
  Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
  Crash address: 0x0
  Process uptime: 29 seconds
  
  Thread 3 (crashed)
   0  Chromium Framework!__ZN3gpu17GpuWatchdogThread38DeliberatelyTerminateToRecoverFromHangEv + 0x11f
      rax = 0x200c6932b5420050   rdx = 0x00000000000a9940
      rcx = 0x0000000000000000   rbx = 0x00007fe29fc17c88
      rsi = 0x00007fe29ff00000   rdi = 0x0000000000000000
      rbp = 0x00007000085c9250   rsp = 0x00007000085c90f0
       r8 = 0x0000000000000007    r9 = 0x00007fe29ff3bd20
      r10 = 0x0000000000000001   r11 = 0x0000000000000008
      r12 = 0x0000000000000000   r13 = 0x000000011da3de00
      r14 = 0x00007000085c90f8   r15 = 0x00007000085c9100
      rip = 0x000000011788a7bf
      Found by: given as instruction pointer in context
   1  Chromium Framework!__ZN3gpu17GpuWatchdogThread14OnCheckTimeoutEv + 0x99
      rbp = 0x00007000085c9410   rsp = 0x00007000085c9260
      rip = 0x000000011788a449
      Found by: previous frame's frame pointer
   2  Chromium Framework!__ZN4base8internal7InvokerINS0_9BindStateIMN3gpu17GpuWatchdogThreadEFvvEJNS_7WeakPtrIS4_EEEEEFvvEE3RunEPNS0_13BindStateBaseE + 0xb4
      rbp = 0x00007000085c9570   rsp = 0x00007000085c9420
      rip = 0x000000011788ac84
      Found by: previous frame's frame pointer
   3  Chromium Framework!__ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + 0x136
      rbp = 0x00007000085c9710   rsp = 0x00007000085c9580
      rip = 0x000000011648d4a6
      Found by: previous frame's frame pointer
   4  Chromium Framework!__ZN4base11MessageLoop7RunTaskEPNS_11PendingTaskE + 0x1b9
      rbp = 0x00007000085c98e0   rsp = 0x00007000085c9720
      rip = 0x00000001164cb8c9
      Found by: previous frame's frame pointer
   5  Chromium Framework!__ZN4base11MessageLoop21DeferOrRunPendingTaskENS_11PendingTaskE + 0x1f
      rbp = 0x00007000085c9900   rsp = 0x00007000085c98f0
      rip = 0x00000001164cbc7f
      Found by: previous frame's frame pointer
   6  Chromium Framework!__ZN4base11MessageLoop13DoDelayedWorkEPNS_9TimeTicksE + 0x137
      rbp = 0x00007000085c9a70   rsp = 0x00007000085c9910
      rip = 0x00000001164cc117
      Found by: previous frame's frame pointer
   7  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase7RunWorkEv + 0x4c
      rbp = 0x00007000085c9aa0   rsp = 0x00007000085c9a80
      rip = 0x00000001164d07cc
      Found by: previous frame's frame pointer
...


I'm going to turn the watchdog back off on the bots.

We really should get to the bottom of this. It may be happening in the field and may be the cause of increased incidences of the "Rats, WebGL hit a snag" infobar. Any ideas about what might be going wrong with the TaskObserver list or how to continue to track this down would be appreciated.

Comment 76 by kbr@chromium.org, May 19 2017

Note: another failure showing this is happening on Windows too:
https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_optional_gpu_tests_rel/10429

WebglConformance_deqp_functional_gles3_shaderoperator_binary_operator_06 failed because the GPU watchdog triggered.

  Symbolized minidump:
  Last event: f08.d3c: Access violation - code c0000005 (first/second chance not available)
    debugger time: Thu May 18 14:43:49.208 2017 (UTC - 7:00)
  ChildEBP RetAddr  Args to Child              
  0429efd0 688947c8 6889461c 00000000 0429f838 chrome_child!gpu::GpuWatchdogThread::DeliberatelyTerminateToRecoverFromHang+0x27f
  0429f0e4 69bf919f 0429f838 6a6f9dcc 0429f104 chrome_child!gpu::GpuWatchdogThread::OnCheckTimeout+0x1ac
  0429f0f4 69bfa69a 06c1eb60 06c1eb68 0429f1ec chrome_child!??$MakeItSo@ABQ8AsyncReadPixelsCompletedQuery@gles2@gpu@@AEXXZABV?$WeakPtr@VAsyncReadPixelsCompletedQuery@gles2@gpu@@@base@@$$V@?$InvokeHelper@$00X@internal@base@@SAXABQ8AsyncReadPixelsCompletedQuery@gles2@gpu@@AEXXZABV?$WeakPtr@VAsyncReadPixelsCompletedQuery@gles2@gpu@@@2@@Z+0x26
  0429f104 681815d9 06c1eb48 00000000 688945df chrome_child!base::internal::Invoker<base::internal::BindState<void (__thiscall gpu::gles2::AsyncReadPixelsCompletedQuery::*)(void),base::WeakPtr<gpu::gles2::AsyncReadPixelsCompletedQuery> >,void __cdecl(void)>::Run+0x13
  0429f1ec 68137471 6a4c51dc 0429f838 00facaf8 chrome_child!base::debug::TaskAnnotator::RunTask+0x199
  0429f81c 68136018 0429f838 00facaf8 a4627e48 chrome_child!base::MessageLoop::RunTask+0x4d1
  0429f830 681363d0 00fa0e20 0277ad77 a4627e48 chrome_child!base::MessageLoop::DeferOrRunPendingTask+0x48
  0429f8d0 681834c1 00f9feb8 00000000 00facafc chrome_child!base::MessageLoop::DoDelayedWork+0xc0
  0429f99c 68136f8b 00facaf8 00000000 0429fd30 chrome_child!base::MessagePumpDefault::Run+0x81
  0429fa68 68150f03 00000000 00faca1c 00000000 chrome_child!base::MessageLoop::Run+0x6b
  0429fbac 68133fed 00000000 0429fbe8 681220f4 chrome_child!base::RunLoop::Run+0x93
  0429fc68 681348ce 0429fd30 00f92d50 76291420 chrome_child!base::Thread::Run+0xad
  0429fd74 6811acc2 00000000 00000000 00f92d50 chrome_child!base::Thread::ThreadMain+0x26e
  *** WARNING: Unable to verify checksum for kernel32.dll

...

Comment 77 by kbr@chromium.org, May 19 2017

Victor pointed out something suspicious about the Mac watchdog timeouts so here's the entire symbolized minidump from the Windows failure.

  Symbolized minidump:
  Last event: f08.d3c: Access violation - code c0000005 (first/second chance not available)
    debugger time: Thu May 18 14:43:49.208 2017 (UTC - 7:00)
  ChildEBP RetAddr  Args to Child              
  0429efd0 688947c8 6889461c 00000000 0429f838 chrome_child!gpu::GpuWatchdogThread::DeliberatelyTerminateToRecoverFromHang+0x27f
  0429f0e4 69bf919f 0429f838 6a6f9dcc 0429f104 chrome_child!gpu::GpuWatchdogThread::OnCheckTimeout+0x1ac
  0429f0f4 69bfa69a 06c1eb60 06c1eb68 0429f1ec chrome_child!??$MakeItSo@ABQ8AsyncReadPixelsCompletedQuery@gles2@gpu@@AEXXZABV?$WeakPtr@VAsyncReadPixelsCompletedQuery@gles2@gpu@@@base@@$$V@?$InvokeHelper@$00X@internal@base@@SAXABQ8AsyncReadPixelsCompletedQuery@gles2@gpu@@AEXXZABV?$WeakPtr@VAsyncReadPixelsCompletedQuery@gles2@gpu@@@2@@Z+0x26
  0429f104 681815d9 06c1eb48 00000000 688945df chrome_child!base::internal::Invoker<base::internal::BindState<void (__thiscall gpu::gles2::AsyncReadPixelsCompletedQuery::*)(void),base::WeakPtr<gpu::gles2::AsyncReadPixelsCompletedQuery> >,void __cdecl(void)>::Run+0x13
  0429f1ec 68137471 6a4c51dc 0429f838 00facaf8 chrome_child!base::debug::TaskAnnotator::RunTask+0x199
  0429f81c 68136018 0429f838 00facaf8 a4627e48 chrome_child!base::MessageLoop::RunTask+0x4d1
  0429f830 681363d0 00fa0e20 0277ad77 a4627e48 chrome_child!base::MessageLoop::DeferOrRunPendingTask+0x48
  0429f8d0 681834c1 00f9feb8 00000000 00facafc chrome_child!base::MessageLoop::DoDelayedWork+0xc0
  0429f99c 68136f8b 00facaf8 00000000 0429fd30 chrome_child!base::MessagePumpDefault::Run+0x81
  0429fa68 68150f03 00000000 00faca1c 00000000 chrome_child!base::MessageLoop::Run+0x6b
  0429fbac 68133fed 00000000 0429fbe8 681220f4 chrome_child!base::RunLoop::Run+0x93
  0429fc68 681348ce 0429fd30 00f92d50 76291420 chrome_child!base::Thread::Run+0xad
  0429fd74 6811acc2 00000000 00000000 00f92d50 chrome_child!base::Thread::ThreadMain+0x26e
  *** WARNING: Unable to verify checksum for kernel32.dll
  *** ERROR: Symbol file could not be found.  Defaulted to export symbols for kernel32.dll - 
  0429fd90 7629336a 0000018c 0429fddc 772c92b2 chrome_child!base::PlatformThread::Sleep+0x122
  WARNING: Stack unwind information not available. Following frames may be wrong.
  0429fd9c 772c92b2 00f92d50 7163c889 00000000 kernel32!BaseThreadInitThunk+0x12
  0429fddc 772c9285 6811ac30 00f92d50 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  0429fdf4 00000000 6811ac30 00f92d50 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
     0  Id: f08.ccc Suspend: 0 Teb: fffdd000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  001ef020 76291194 00000174 000074f8 00000000 ntdll!ZwWaitForSingleObject+0x15
  001ef038 76291148 00000174 000074f8 00000000 kernel32!WaitForSingleObjectEx+0x43
  001ef04c 68133aa7 00000174 000074f8 00f9fe24 kernel32!WaitForSingleObject+0x12
  001ef130 68133748 00000174 001ef15c 00f9fe28 chrome_child!base::WaitableEvent::WaitMany+0x2b7
  001ef164 68183507 00f9fe28 00000000 00fabf6c chrome_child!base::WaitableEvent::TimedWaitUntil+0x68
  001ef230 68136f8b 00fabf68 00000000 001ef454 chrome_child!base::MessagePumpDefault::Run+0xc7
  001ef2fc 68150f03 00000000 6aeef204 00000000 chrome_child!base::MessageLoop::Run+0x6b
  001ef440 691207b6 00f9a4b0 00000004 00f92748 chrome_child!base::RunLoop::Run+0x93
  001ef79c 680f2324 001ef934 02e88180 00000003 chrome_child!content::GpuMain+0x2ba
  001ef860 680f2243 001ef97c 001ef934 001efcb0 chrome_child!content::RunNamedProcessTypeMain+0xb0
  001ef998 680f59cd 00dca3e8 6e4f9576 00000000 chrome_child!content::ContentMainRunnerImpl::Run+0x19d
  001efbbc 680f190c 001efbe8 6e4f9576 6a4bef40 chrome_child!service_manager::Main+0x20c
  001efbec 6770bf5d 001efcd0 0000000b 00dc5e88 chrome_child!content::ContentMain+0x27
  *** WARNING: Unable to verify checksum for chrome.exe
  001efce4 00ff4f33 00ff0000 001efd08 9188af3b chrome_child!ChromeMain+0xdd
  001efd6c 00ff2cfd 00ff0000 9188af3b 00000009 chrome!MainDllLoader::Launch+0x1bc
  001efe90 01045cd8 00ff0000 00000000 01252964 chrome!wWinMain+0x17f
  (Inline) -------- -------- -------- -------- chrome!invoke_main+0x1a
  001efedc 7629336a fffde000 001eff28 772c92b2 chrome!__scrt_common_main_seh+0xf6
  001efee8 772c92b2 fffde000 7554ca7d 00000000 kernel32!BaseThreadInitThunk+0x12
  001eff28 772c9285 01045d50 fffde000 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  001eff40 00000000 01045d50 fffde000 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
  #  1  Id: f08.d3c Suspend: 0 Teb: fffda000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  0429e5ac 768744a5 0000ea60 00000000 0429e684 ntdll!NtDelayExecution+0x15
  *** WARNING: Unable to verify checksum for chrome_elf.dll
  0429e5bc 6e4fb076 0000ea60 0429e848 0429e848 KERNELBASE!Sleep+0xf
  0429e684 6816da57 0429e848 7636030c 688940eb chrome_elf!crashpad::CrashpadClient::StartHandler+0xc0c
  0429e790 762d0313 0429e848 c490c92f 00000000 chrome_child!base::debug::StackTrace::Print+0x47
  0429e818 77301ca7 0429e848 77301b84 00000000 kernel32!GetProfileStringW+0x12d77
  0429fddc 772c9285 6811ac30 00f92d50 ffffffff ntdll!RtlKnownExceptionFilter+0xb7
  0429fdf4 00000000 6811ac30 00f92d50 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
     2  Id: f08.bdc Suspend: 0 Teb: fffaf000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  03e1f33c 76291194 0000026c ffffffff 00000000 ntdll!ZwWaitForSingleObject+0x15
  03e1f354 76291148 0000026c ffffffff 00000000 kernel32!WaitForSingleObjectEx+0x43
  *** WARNING: Unable to verify checksum for nvwgf2um.dll
  *** ERROR: Symbol file could not be found.  Defaulted to export symbols for nvwgf2um.dll - 
  03e1f368 65704c48 0000026c ffffffff 0470a32c kernel32!WaitForSingleObject+0x12
  03e1f8ac 65703f03 03e1f8d4 65703ef5 03e1f8c8 nvwgf2um!OpenAdapter12+0x10b2d8
  03e1f8d4 65c24e25 05d66050 03e1f918 65f45f02 nvwgf2um!OpenAdapter12+0x10a593
  03e1f8e0 65f45f02 0470a414 c5d1057e 00000000 nvwgf2um!NVAPI_Thunk+0x86285
  03e1f918 65f4602a 00000000 03e1f930 7629336a nvwgf2um!NVAPI_Thunk+0x3a7362
  03e1f924 7629336a 05d53c98 03e1f970 772c92b2 nvwgf2um!NVAPI_Thunk+0x3a748a
  03e1f930 772c92b2 05d53c98 76abcc25 00000000 kernel32!BaseThreadInitThunk+0x12
  03e1f970 772c9285 65f45fae 05d53c98 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  03e1f988 00000000 65f45fae 05d53c98 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
     3  Id: f08.ed4 Suspend: 0 Teb: fffac000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  0636f4bc 76291194 00000298 ffffffff 00000000 ntdll!ZwWaitForSingleObject+0x15
  0636f4d4 76291148 00000298 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0x43
  0636f4e8 6813378a 00000298 ffffffff 00fa0198 kernel32!WaitForSingleObject+0x12
  0636f5c0 681834ff 00000000 00fb5fcc 00fb5fc8 chrome_child!base::WaitableEvent::Wait+0x2a
  0636f688 68136f8b 00fb5fc8 00000000 0636fa1c chrome_child!base::MessagePumpDefault::Run+0xbf
  0636f754 68150f03 00000000 00fb57e0 00000000 chrome_child!base::MessageLoop::Run+0x6b
  0636f898 68133fed 00000000 0636f8d4 6812215f chrome_child!base::RunLoop::Run+0x93
  0636f954 681348ce 0636fa1c 00fb13b8 76291420 chrome_child!base::Thread::Run+0xad
  0636fa60 6811acc2 00000000 00000000 00fb13b8 chrome_child!base::Thread::ThreadMain+0x26e
  0636fa7c 7629336a 00000294 0636fac8 772c92b2 chrome_child!base::PlatformThread::Sleep+0x122
  0636fa88 772c92b2 00fb13b8 737ccf9d 00000000 kernel32!BaseThreadInitThunk+0x12
  0636fac8 772c9285 6811ac30 00fb13b8 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  0636fae0 00000000 6811ac30 00fb13b8 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
     4  Id: f08.9f0 Suspend: 0 Teb: fffa9000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  064cf774 68182915 000002a4 064cf86c 064cf798 ntdll!NtRemoveIoCompletion+0x15
  064cf79c 68183243 ffffffff 00000000 06b32f98 chrome_child!base::MessagePumpForIO::GetIOItem+0x35
  064cf87c 681826b0 00000000 00fae8d8 00000000 chrome_child!base::MessagePumpForIO::WaitForWork+0xa3
  064cf95c 68182f7a 00fb643c 00fb6438 00fb6438 chrome_child!base::MessagePumpForIO::DoRunLoop+0xb0
  064cf984 68136f8b 00fb6438 00000000 064cfd18 chrome_child!base::MessagePumpWin::Run+0x4a
  064cfa50 68150f03 00000000 001ef70c 00000000 chrome_child!base::MessageLoop::Run+0x6b
  064cfb94 68133fed 00000000 064cfbd0 68122148 chrome_child!base::RunLoop::Run+0x93
  064cfc50 681348ce 064cfd18 00fb5ab0 76291420 chrome_child!base::Thread::Run+0xad
  064cfd5c 6811acc2 00000000 00000000 00fb5ab0 chrome_child!base::Thread::ThreadMain+0x26e
  064cfd78 7629336a 0000029c 064cfdc4 772c92b2 chrome_child!base::PlatformThread::Sleep+0x122
  064cfd84 772c92b2 00fb5ab0 7306c891 00000000 kernel32!BaseThreadInitThunk+0x12
  064cfdc4 772c9285 6811ac30 00fb5ab0 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  064cfddc 00000000 6811ac30 00fb5ab0 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
     5  Id: f08.d50 Suspend: 0 Teb: fffa6000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  0674fa58 7629336a 00000000 0674faa4 772c92b2 ntdll!ZwWaitForMultipleObjects+0x15
  0674fa64 772c92b2 05d9e9d8 733ecff1 00000000 kernel32!BaseThreadInitThunk+0x12
  0674faa4 772c9285 772d13b5 05d9e9d8 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  0674fabc 00000000 772d13b5 05d9e9d8 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
     6  Id: f08.76c Suspend: 0 Teb: fffa3000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  0690f8c8 7629336a 05d9f358 0690f914 772c92b2 ntdll!NtWaitForWorkViaWorkerFactory+0x12
  0690f8d4 772c92b2 05d9f358 73dacc41 00000000 kernel32!BaseThreadInitThunk+0x12
  0690f914 772c9285 772e27e5 05d9f358 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  0690f92c 00000000 772e27e5 05d9f358 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
     7  Id: f08.97c Suspend: 0 Teb: fffd7000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  0722fb30 7629336a 05eaac78 0722fb7c 772c92b2 ntdll!NtWaitForWorkViaWorkerFactory+0x12
  0722fb3c 772c92b2 05eaac78 7268ce29 00000000 kernel32!BaseThreadInitThunk+0x12
  0722fb7c 772c9285 772e27e5 05eaac78 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  0722fb94 00000000 772e27e5 05eaac78 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
     8  Id: f08.ef8 Suspend: 0 Teb: fffa0000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  0741feb0 7629336a 05eaac78 0741fefc 772c92b2 ntdll!NtWaitForWorkViaWorkerFactory+0x12
  0741febc 772c92b2 05eaac78 720bcba9 00000000 kernel32!BaseThreadInitThunk+0x12
  0741fefc 772c9285 772e27e5 05eaac78 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  0741ff14 00000000 772e27e5 05eaac78 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
     9  Id: f08.45c Suspend: 0 Teb: fff9d000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  0759fd38 7629336a 05eaac78 0759fd84 772c92b2 ntdll!NtWaitForWorkViaWorkerFactory+0x12
  0759fd44 772c92b2 05eaac78 7213c8d1 00000000 kernel32!BaseThreadInitThunk+0x12
  0759fd84 772c9285 772e27e5 05eaac78 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  0759fd9c 00000000 772e27e5 05eaac78 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
    10  Id: f08.6a4 Suspend: 0 Teb: fff9a000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  0680fe84 76291194 00000394 ffffffff 00000000 ntdll!ZwWaitForSingleObject+0x15
  0680fe9c 76291148 00000394 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0x43
  0680feb0 65764d9a 00000394 ffffffff 00000000 kernel32!WaitForSingleObject+0x12
  0680fecc 65764cc0 00000000 0680ff10 65f45f02 nvwgf2um!OpenAdapter12+0x16b42a
  0680fed8 65f45f02 04730238 c0b00376 00000000 nvwgf2um!OpenAdapter12+0x16b350
  0680ff10 65f4602a 00000000 0680ff28 7629336a nvwgf2um!NVAPI_Thunk+0x3a7362
  0680ff1c 7629336a 05eb7570 0680ff68 772c92b2 nvwgf2um!NVAPI_Thunk+0x3a748a
  0680ff28 772c92b2 05eb7570 73caca3d 00000000 kernel32!BaseThreadInitThunk+0x12
  0680ff68 772c9285 65f45fae 05eb7570 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  0680ff80 00000000 65f45fae 05eb7570 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
    11  Id: f08.d44 Suspend: 0 Teb: fff97000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  03f1fba0 76291194 00000394 ffffffff 00000000 ntdll!ZwWaitForSingleObject+0x15
  03f1fbb8 76291148 00000394 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0x43
  03f1fbcc 65764d9a 00000394 ffffffff 00000000 kernel32!WaitForSingleObject+0x12
  03f1fbe8 65764cc0 00000001 03f1fc2c 65f45f02 nvwgf2um!OpenAdapter12+0x16b42a
  03f1fbf4 65f45f02 04730240 c5c1004a 00000000 nvwgf2um!OpenAdapter12+0x16b350
  03f1fc2c 65f4602a 00000000 03f1fc44 7629336a nvwgf2um!NVAPI_Thunk+0x3a7362
  03f1fc38 7629336a 05eb7950 03f1fc84 772c92b2 nvwgf2um!NVAPI_Thunk+0x3a748a
  03f1fc44 772c92b2 05eb7950 76bbc9d1 00000000 kernel32!BaseThreadInitThunk+0x12
  03f1fc84 772c9285 65f45fae 05eb7950 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  03f1fc9c 00000000 65f45fae 05eb7950 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
    12  Id: f08.f50 Suspend: 0 Teb: fff94000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  0625fb50 76291194 00000394 ffffffff 00000000 ntdll!ZwWaitForSingleObject+0x15
  0625fb68 76291148 00000394 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0x43
  0625fb7c 65764d9a 00000394 ffffffff 00000000 kernel32!WaitForSingleObject+0x12
  0625fb98 65764cc0 00000002 0625fbdc 65f45f02 nvwgf2um!OpenAdapter12+0x16b42a
  0625fba4 65f45f02 04730248 c01507ba 00000000 nvwgf2um!OpenAdapter12+0x16b350
  0625fbdc 65f4602a 00000000 0625fbf4 7629336a nvwgf2um!NVAPI_Thunk+0x3a7362
  0625fbe8 7629336a 05eb7d18 0625fc34 772c92b2 nvwgf2um!NVAPI_Thunk+0x3a748a
  0625fbf4 772c92b2 05eb7d18 736fc961 00000000 kernel32!BaseThreadInitThunk+0x12
  0625fc34 772c9285 65f45fae 05eb7d18 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  0625fc4c 00000000 65f45fae 05eb7d18 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
    13  Id: f08.730 Suspend: 0 Teb: fff91000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  06fdf838 76291194 00000394 ffffffff 00000000 ntdll!ZwWaitForSingleObject+0x15
  06fdf850 76291148 00000394 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0x43
  06fdf864 65764d9a 00000394 ffffffff 00000000 kernel32!WaitForSingleObject+0x12
  06fdf880 65764cc0 00000003 06fdf8c4 65f45f02 nvwgf2um!OpenAdapter12+0x16b42a
  06fdf88c 65f45f02 04730250 c0cd04a2 00000000 nvwgf2um!OpenAdapter12+0x16b350
  06fdf8c4 65f4602a 00000000 06fdf8dc 7629336a nvwgf2um!NVAPI_Thunk+0x3a7362
  06fdf8d0 7629336a 05eb8870 06fdf91c 772c92b2 nvwgf2um!NVAPI_Thunk+0x3a748a
  06fdf8dc 772c92b2 05eb8870 73b7cc49 00000000 kernel32!BaseThreadInitThunk+0x12
  06fdf91c 772c9285 65f45fae 05eb8870 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  06fdf934 00000000 65f45fae 05eb8870 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
    14  Id: f08.e78 Suspend: 0 Teb: fff8e000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  063bfde8 762919fc 063bfd9c 063bfe10 00000000 ntdll!ZwWaitForMultipleObjects+0x15
  063bfe30 762941d8 00000003 fffde000 00000000 kernel32!WaitForMultipleObjectsEx+0x8e
  063bfe4c 65c0a722 00000003 05eb65f8 00000000 kernel32!WaitForMultipleObjects+0x18
  063bfe90 65c24e25 05eb9b58 063bfed4 65f45f02 nvwgf2um!NVAPI_Thunk+0x6bb82
  063bfe9c 65f45f02 05eb65e8 c00b02b2 00000000 nvwgf2um!NVAPI_Thunk+0x86285
  063bfed4 65f4602a 00000000 063bfeec 7629336a nvwgf2um!NVAPI_Thunk+0x3a7362
  063bfee0 7629336a 05eb8870 063bff2c 772c92b2 nvwgf2um!NVAPI_Thunk+0x3a748a
  063bfeec 772c92b2 05eb8870 7371ca79 00000000 kernel32!BaseThreadInitThunk+0x12
  063bff2c 772c9285 65f45fae 05eb8870 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  063bff44 00000000 65f45fae 05eb8870 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
    15  Id: f08.d64 Suspend: 0 Teb: fff8b000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  047af840 76291194 0000036c ffffffff 00000000 ntdll!ZwWaitForSingleObject+0x15
  047af858 76291148 0000036c ffffffff 00000000 kernel32!WaitForSingleObjectEx+0x43
  047af86c 65764d9a 0000036c ffffffff 00000000 kernel32!WaitForSingleObject+0x12
  047af888 65764cc0 00000000 047af8cc 65f45f02 nvwgf2um!OpenAdapter12+0x16b42a
  047af894 65f45f02 047304c8 c24a04aa 00000000 nvwgf2um!OpenAdapter12+0x16b350
  047af8cc 65f4602a 00000000 047af8e4 7629336a nvwgf2um!NVAPI_Thunk+0x3a7362
  047af8d8 7629336a 05eb8870 047af924 772c92b2 nvwgf2um!NVAPI_Thunk+0x3a748a
  047af8e4 772c92b2 05eb8870 7130cc71 00000000 kernel32!BaseThreadInitThunk+0x12
  047af924 772c9285 65f45fae 05eb8870 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  047af93c 00000000 65f45fae 05eb8870 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
    16  Id: f08.29c Suspend: 0 Teb: fff88000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  03f0f8f0 76291194 0000036c ffffffff 00000000 ntdll!ZwWaitForSingleObject+0x15
  03f0f908 76291148 0000036c ffffffff 00000000 kernel32!WaitForSingleObjectEx+0x43
  03f0f91c 65764d9a 0000036c ffffffff 00000000 kernel32!WaitForSingleObject+0x12
  03f0f938 65764cc0 00000001 03f0f97c 65f45f02 nvwgf2um!OpenAdapter12+0x16b42a
  03f0f944 65f45f02 047304d0 c5c0051a 00000000 nvwgf2um!OpenAdapter12+0x16b350
  03f0f97c 65f4602a 00000000 03f0f994 7629336a nvwgf2um!NVAPI_Thunk+0x3a7362
  03f0f988 7629336a 05eba2e8 03f0f9d4 772c92b2 nvwgf2um!NVAPI_Thunk+0x3a748a
  03f0f994 772c92b2 05eba2e8 76bacc81 00000000 kernel32!BaseThreadInitThunk+0x12
  03f0f9d4 772c9285 65f45fae 05eba2e8 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  03f0f9ec 00000000 65f45fae 05eba2e8 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
    17  Id: f08.9c0 Suspend: 0 Teb: fff85000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  03dff8f8 65781771 00000005 00000000 0d2c0558 nvwgf2um!OpenAdapter12+0x187f08
  03dff910 658a3f75 00000014 0d186be0 00000000 nvwgf2um!OpenAdapter12+0x187e01
  03dff928 65a689d1 0d185828 00000027 00000001 nvwgf2um!OpenAdapter12+0x2aa605
  03dff958 6590720d 0d185828 03dff9e0 659094f8 nvwgf2um!OpenAdapter12+0x46f061
  03dff964 659094f8 0d185828 0d186708 0d185828 nvwgf2um!OpenAdapter12+0x30d89d
  03dff9e0 657c4a7e 0d185828 03dffb64 00000003 nvwgf2um!OpenAdapter12+0x30fb88
  03dff9f4 657815dc 0d185828 0d17aeb8 00000001 nvwgf2um!OpenAdapter12+0x1cb10e
  03dffa6c 65781176 00000001 03dffb64 03dffb6c nvwgf2um!OpenAdapter12+0x187c6c
  03dffbbc 65763e24 6578141a 079d8268 65763e14 nvwgf2um!OpenAdapter12+0x187806
  03dffbdc 65763eaa 00000001 079d8258 0009001f nvwgf2um!OpenAdapter12+0x16a4b4
  03dffbf4 65759177 079d8268 0009001f 030b027f nvwgf2um!OpenAdapter12+0x16a53a
  03dffc20 65d0fa1d 09d4e098 01340140 09d4ddc8 nvwgf2um!OpenAdapter12+0x15f807
  03dffc40 65cef1cd 03dffc70 0470ab80 09f53e58 nvwgf2um!NVAPI_Thunk+0x170e7d
  03dffd10 656ed0e0 0cd2a9f8 00000000 09d4ddc8 nvwgf2um!NVAPI_Thunk+0x15062d
  03dffd24 65604359 09f53e58 0cd2a9f8 6b38cffc nvwgf2um!OpenAdapter12+0xf3770
  03dffd88 6575b7d3 09d28ef0 6b38cffc 09d27ca8 nvwgf2um!OpenAdapter12+0xa9e9
  03dffde8 6575b70b 09d27c50 03dffe10 65764d60 nvwgf2um!OpenAdapter12+0x161e63
  03dffdf4 65764d60 013416c0 09d27c50 00000000 nvwgf2um!OpenAdapter12+0x161d9b
  03dffe10 65764cc0 00000002 03dffe54 65f45f02 nvwgf2um!OpenAdapter12+0x16b3f0
  03dffe1c 65f45f02 047304d8 c5ef0232 00000000 nvwgf2um!OpenAdapter12+0x16b350
  03dffe54 65f4602a 00000000 03dffe6c 7629336a nvwgf2um!NVAPI_Thunk+0x3a7362
  03dffe60 7629336a 05eba6b0 03dffeac 772c92b2 nvwgf2um!NVAPI_Thunk+0x3a748a
  03dffe6c 772c92b2 05eba6b0 7695cbf9 00000000 kernel32!BaseThreadInitThunk+0x12
  03dffeac 772c9285 65f45fae 05eba6b0 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  03dffec4 00000000 65f45fae 05eba6b0 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
    18  Id: f08.a24 Suspend: 0 Teb: fff82000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  03e9f840 6590c1ec 0d175898 08c0b850 00000000 nvwgf2um!OpenAdapter12+0x311e01
  03e9f85c 6590984f 00000620 0d16af20 0d175898 nvwgf2um!OpenAdapter12+0x31287c
  03e9f8dc 657bec90 0d175898 03e9fa64 00000001 nvwgf2um!OpenAdapter12+0x30fedf
  03e9f8f4 657815dc 0d175898 0d16af20 00000001 nvwgf2um!OpenAdapter12+0x1c5320
  03e9f96c 65781176 00000001 03e9fa64 03e9fa6c nvwgf2um!OpenAdapter12+0x187c6c
  03e9fabc 65763e24 6578141a 0a5e9cf0 65763e14 nvwgf2um!OpenAdapter12+0x187806
  03e9fadc 65763eaa 00000001 0a5e9ce0 0009001f nvwgf2um!OpenAdapter12+0x16a4b4
  03e9faf4 65759177 0a5e9cf0 0009001f 030b027f nvwgf2um!OpenAdapter12+0x16a53a
  03e9fb20 65d0fa1d 00000000 0ceeb780 0ceeb780 nvwgf2um!OpenAdapter12+0x15f807
  03e9fb40 65cf0acb 03e9fcb0 0470ab80 08b48918 nvwgf2um!NVAPI_Thunk+0x170e7d
  03e9fd50 65cf1876 088cff90 0ceeb780 0470ab80 nvwgf2um!NVAPI_Thunk+0x151f2b
  03e9fd68 656ed0e0 088cff90 00000000 0ceeb780 nvwgf2um!NVAPI_Thunk+0x152cd6
  03e9fd7c 65604359 08b48918 088cff90 6b38d03c nvwgf2um!OpenAdapter12+0xf3770
  03e9fde0 6575b7d3 09d2a1f8 6b38d03c 09d28fb0 nvwgf2um!OpenAdapter12+0xa9e9
  03e9fe44 6575b70b 09d28f58 03e9fe6c 65764d60 nvwgf2um!OpenAdapter12+0x161e63
  03e9fe50 65764d60 013416c0 09d28f58 00000000 nvwgf2um!OpenAdapter12+0x161d9b
  03e9fe6c 65764cc0 00000003 03e9feb0 65f45f02 nvwgf2um!OpenAdapter12+0x16b3f0
  03e9fe78 65f45f02 047304e0 c5d902d6 00000000 nvwgf2um!OpenAdapter12+0x16b350
  03e9feb0 65f4602a 00000000 03e9fec8 7629336a nvwgf2um!NVAPI_Thunk+0x3a7362
  03e9febc 7629336a 05eb8870 03e9ff08 772c92b2 nvwgf2um!NVAPI_Thunk+0x3a748a
  03e9fec8 772c92b2 05eb8870 76a3ca5d 00000000 kernel32!BaseThreadInitThunk+0x12
  03e9ff08 772c9285 65f45fae 05eb8870 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  03e9ff20 00000000 65f45fae 05eb8870 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
    19  Id: f08.74c Suspend: 0 Teb: fff7f000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  030cfc80 76291194 0000033c ffffffff 00000000 ntdll!ZwWaitForSingleObject+0x15
  030cfc98 76291148 0000033c ffffffff 00000000 kernel32!WaitForSingleObjectEx+0x43
  030cfcac 65764d9a 0000033c ffffffff 00000000 kernel32!WaitForSingleObject+0x12
  030cfcc8 65764cc0 00000000 030cfd0c 65f45f02 nvwgf2um!OpenAdapter12+0x16b42a
  030cfcd4 65f45f02 01347808 c53c016a 00000000 nvwgf2um!OpenAdapter12+0x16b350
  030cfd0c 65f4602a 00000000 030cfd24 7629336a nvwgf2um!NVAPI_Thunk+0x3a7362
  030cfd18 7629336a 05eb8870 030cfd64 772c92b2 nvwgf2um!NVAPI_Thunk+0x3a748a
  030cfd24 772c92b2 05eb8870 7646c831 00000000 kernel32!BaseThreadInitThunk+0x12
  030cfd64 772c9285 65f45fae 05eb8870 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  030cfd7c 00000000 65f45fae 05eb8870 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
    20  Id: f08.ba0 Suspend: 0 Teb: fff7c000 Unfrozen
  *** WARNING: Unable to verify checksum for MFPlat.DLL
  *** ERROR: Symbol file could not be found.  Defaulted to export symbols for MFPlat.DLL - 
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  0a98fac8 675e3978 00000474 0a98fb00 0a98faf4 ntdll!NtRemoveIoCompletion+0x15
  0a98faf8 675e1b11 0a98fbfc 00000000 00000000 MFPlat!MFLockPlatform+0x28d
  0a98fc00 675e8ce7 0a98fc40 76a21287 08b753c8 MFPlat+0x1b11
  *** WARNING: Unable to verify checksum for msvcrt.dll
  *** ERROR: Symbol file could not be found.  Defaulted to export symbols for msvcrt.dll - 
  0a98fc08 76a21287 08b753c8 ca21a4ba 00000000 MFPlat!FormatTagFromWfx+0xf3
  0a98fc40 76a21328 0a98fc54 7629336a 02de2ab0 msvcrt!itow_s+0x4c
  0a98fc48 7629336a 02de2ab0 0a98fc94 772c92b2 msvcrt!endthreadex+0x6c
  0a98fc54 772c92b2 02de2ab0 7fd2c9c1 00000000 kernel32!BaseThreadInitThunk+0x12
  0a98fc94 772c9285 76a212e5 02de2ab0 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  0a98fcac 00000000 76a212e5 02de2ab0 00000000 ntdll!RtlInitializeExceptionChain+0x36
  
    21  Id: f08.5dc Suspend: 0 Teb: fff79000 Unfrozen
  ChildEBP RetAddr  Args to Child              
  WARNING: Stack unwind information not available. Following frames may be wrong.
  0923fd5c 76291194 00000490 ffffffff 00000000 ntdll!ZwWaitForSingleObject+0x15
  0923fd74 675e2094 00000490 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0x43
  0923fd98 675e1fe1 08fdd4e0 0923fdd8 00000000 MFPlat!MFHeapFree+0x1d4
  0923fde0 675e1b11 0923fee4 00000000 00000000 MFPlat!MFHeapFree+0x121
  0923fee8 675e8ce7 0923ff28 76a21287 08b75428 MFPlat+0x1b11
  0923fef0 76a21287 08b75428 c99aa7d2 00000000 MFPlat!FormatTagFromWfx+0xf3
  0923ff28 76a21328 0923ff3c 7629336a 02de2ab0 msvcrt!itow_s+0x4c
  0923ff30 7629336a 02de2ab0 0923ff7c 772c92b2 msvcrt!endthreadex+0x6c
  0923ff3c 772c92b2 02de2ab0 7c69ca29 00000000 kernel32!BaseThreadInitThunk+0x12
  0923ff7c 772c9285 76a212e5 02de2ab0 ffffffff ntdll!RtlInitializeExceptionChain+0x63
  0923ff94 00000000 76a212e5 02de2ab0 00000000 ntdll!RtlInitializeExceptionChain+0x36
 
Cc: a...@chromium.org
What looked suspicious was seeing the NativeStackSamplerMac was sampling at the time that the watchdog triggered.  However this doesn't look consistent across the hangs.

https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.gpu%2FMac_Release__Intel_%2F64401%2F%2B%2Frecipes%2Fsteps%2Fwebgl_conformance_tests_on_Intel_GPU_on_Mac_on_Mac-10.12%2F0%2Fstdout

Thread 3 (crashed)
 0  Chromium Framework!__ZN3gpu17GpuWatchdogThread38DeliberatelyTerminateToRecoverFromHangEv + 0x11f
 1  Chromium Framework!__ZN3gpu17GpuWatchdogThread14OnCheckTimeoutEv + 0x99
 2  Chromium Framework!__ZN4base8internal7InvokerINS0_9BindStateIMN3gpu17GpuWatchdogThreadEFvvEJNS_7WeakPtrIS4_EEEEEFvvEE3RunEPNS0_13BindStateBaseE + 0xb4
 3  Chromium Framework!__ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + 0x136

Thread 0
 0  libsystem_kernel.dylib + 0x1234a
 1  CoreFoundation + 0x884e4
 2  CoreFoundation + 0x87971
 3  CoreFoundation + 0x871c4
 4  Chromium Framework!__ZN4base20MessagePumpCFRunLoop5DoRunEPNS_11MessagePump8DelegateE + 0x4f
 5  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE + 0x77
 6  Chromium Framework!__ZN4base11MessageLoop10RunHandlerEv + 0x122
 7  Chromium Framework!__ZN4base7RunLoop3RunEv + 0x97
 8  Chromium Framework!__ZN7content7GpuMainERKNS_18MainFunctionParamsE + 0x417
 9  Chromium Framework!__ZN7content21ContentMainRunnerImpl3RunEv + 0x415

Thread 2
 0  dyld + 0x168ea
 1  dyld + 0xd6e0
 2  libdyld.dylib + 0x2790
 3  Chromium Framework!__ZN4base12_GLOBAL__N_120WalkStackFromContextIZNS0_21NativeStackSamplerMac27SuspendThreadAndRecordStackEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleEE3$_0EEbP13unw_context_tPmRKT_ + 0xeb
 4  Chromium Framework!__ZN4base12_GLOBAL__N_121NativeStackSamplerMac17RecordStackSampleEPNS_18NativeStackSampler11StackBufferEPNS_21StackSamplingProfiler6SampleE + 0x77a
 5  Chromium Framework!__ZN4base21StackSamplingProfiler14SamplingThread12RecordSampleEPNS1_17CollectionContextE + 0x43f

Comment 79 by a...@chromium.org, May 19 2017

OK, that's very bad. Let me look into it; the thread suspension was written to be careful to not use anything that isn't already linked in but we may have failed.

Comment 80 by a...@chromium.org, May 19 2017

Cc: wittman@chromium.org

Comment 81 by a...@chromium.org, May 19 2017

What's super concerning is this:

 1  dyld + 0xd6e0
 2  libdyld.dylib + 0x2790
 3  Chromium Framework!base::WalkStackFromContext() + 0xeb

Which makes it look like we're making a call to a function that isn't linked in, and then deadlocking when hitting a suspended main thread. But what call?

The offset is 0xeb. This is the release bot so I did a release build of base/, Hoppered it, and looked at that offset:

0000000000073979 4C8B7808                        mov        r15, qword [ds:rax+8]
000000000007397d 488B5810                        mov        rbx, qword [ds:rax+0x10]
0000000000073981 48897D80                        mov        qword [ss:rbp+var_80], rdi
0000000000073985 4C89E6                          mov        rsi, r12            ; argument "info" for method imp___stubs__dladdr
0000000000073988 E84F860800                      call       imp___stubs__dladdr  // ***
000000000007398d 85C0                            test       eax, eax
000000000007398f 745F                            je         0x739f0

The offset isn't perfect, but it's likely dladdr. Which makes sense from the forensic point-of-view. The stack trace, again, was:

 1  dyld + 0xd6e0
 2  libdyld.dylib + 0x2790
 3  Chromium Framework!base::WalkStackFromContext() + 0xeb

So WalkStackFromContext() calls dladdr, which is libdyld.dylib, which isn't linked, so we call into dyld, the linker.

What doesn't make sense here is the flow. I'm pretty sure that we aren't calling that inside of the thread suspension. I have some thoughts here but nothing solid; I'm still looking at this.
I logged into the bot that produced the data from #78 to symbolize those dyld symbols:

Thread 2:
ImageLoaderMachO::findClosestSymbol(mach_header const*, void const*, void const**) (in dyld) + 386
dladdr (in dyld) + 133
dladdr (in libdyld.dylib) + 72

Thread 0:
mach_msg_trap (in libsystem_kernel.dylib) + 10
__CFRunLoopServiceMachPort (in CoreFoundation) + 212
__CFRunLoopRun (in CoreFoundation) + 1361
CFRunLoopRunSpecific (in CoreFoundation) + 420

All other threads are also waiting in their proper idle state.

Comment 83 by a...@chromium.org, May 19 2017

The main thread doesn't look like it's suspended holding a lock. It looks like it's waiting for the kernel.

What's weird here is that:
1. Thread 2, the stack sampler thread, is exploding deep in dyld, the linker.
2. Thread 0 doesn't look like it's suspended, but is still failing the watchdog thread.

Comment 84 by a...@chromium.org, May 19 2017

Two things before I finish for today.

First, in comment 78:
"What looked suspicious was seeing the NativeStackSamplerMac was sampling at the time that the watchdog triggered.  However this doesn't look consistent across the hangs."

Note that the sampler is running periodically all the time, so it's possible that it can end up in crash dumps simply by chance. I'm not claiming that it's not the stack sampler; I'm still going to investigate, but I would recommend that other people also keep investigating.

A second note: The stack sampler is going to be turned off (https://chromium-review.googlesource.com/c/508414/) because of suspicions that it is regressing the runtime of some test suites. This is an opportunity to watch these hangs. If the hangs disappear, then that's evidence for me to dig harder to figure out what's up with the stack sampler. If the hangs continue, though, then it's strong evidence the other way.

Comment 85 by kbr@chromium.org, May 24 2017

Note: the minidump from this failure:
https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win_optional_gpu_tests_rel/10429

is available here:
gs://chrome-telemetry-output/minidump-2017-05-18_14-43-45-122205.dmp

Comment 86 by a...@chromium.org, May 24 2017

Note that the stack sampler on the Mac is turned off. If you're still getting gpu hangs, it's not the stack sampler.
+maruel for question on Swarming.  Please see below.

I've been trying to debug these Gpu watchdog timeouts by mass re-running one of the tasks that kbr@ listed in comment #75, triggering up to 100 simultaneous tasks.

Here are 1396 runs of the job: https://chromium-swarm.appspot.com/tasklist?c=name&c=state&c=created_ts&c=user&c=bot&c=completed_ts&et=1495584420000&f=reason%3Abisect&l=50&n=true&s=completed_ts%3Adesc&st=1495143540000

What's interesting is there were 1200+ successful job runs, and then 22 all failed at once within 2 minutes of each-other, on different bots.  This observation would point to something like a network effect, affecting multiple bots.

Talking with kbr@, logging output from the GPU process comes to mind, as the failing test spams a lot of logs.  Is it possible the Swarming log service could put back-pressure on the GPU process?


All failures were in test:

 third_party/webgl/src/sdk/tests/conformance/attribs/gl-vertexattribpointer.html


Here are some of the failing runs:

 https://chromium-swarm.appspot.com/task?id=365065a590df4e10&refresh=10&show_raw=1
 https://chromium-swarm.appspot.com/task?id=3650662580940510&refresh=10&show_raw=1
 https://chromium-swarm.appspot.com/task?id=365065f11f709c10&refresh=10&show_raw=1

Tail of system log:

  May 23 16:15:32 build614-m4 diagnosticd[213]: Unexpected xpc type [<double: 0x7ffacac02970>: 9223372036855179264.000000]
  May 23 16:15:32 build614-m4 diagnosticd[213]: Unexpected xpc type [<double: 0x7ffacae04d70>: 9223372036855179264.000000]
  May 23 16:15:32 build614-m4 diagnosticd[213]: Unexpected xpc type [<double: 0x7ffacac02990>: 9223372036855179264.000000]
  May 23 16:15:34 build614-m4 xcodebuild[40698]: BUG in libdispatch client: kevent[EVFILT_READ] monitored resource vanished before the source cancel handler was invoked
  May 23 16:15:34 build614-m4 com.apple.xpc.launchd[1] (com.apple.iphonesimulator.968[40700]): Service exited due to signal: Terminated: 15 sent by killall[40955]
  May 23 16:16:04 build614-m4 com.apple.usbmuxd[39644]: stopping.
  May 23 16:16:04 build614-m4 xpcproxy[40993]: libcoreservices: _dirhelper_userdir: 523: bootstrap_look_up returned 268435459
  May 23 16:16:04 build614-m4 com.apple.usbmuxd[40993]: log filter changed from 4 to 6
  May 23 16:16:04 build614-m4 com.apple.usbmuxd[40993]: usbmuxd-374.70 on Aug  1 2016 at 12:50:26, running 64 bit
  May 23 16:16:13 build614-m4 xpcproxy[41015]: libcoreservices: _dirhelper_userdir: 523: bootstrap_look_up returned 268435459
  May 23 16:16:34 build614-m4 com.apple.xpc.launchd[1] (com.apple.quicklook[41028]): Endpoint has been activated through legacy launch(3) APIs. Please switch to XPC or bootstrap_check_in(): com.apple.quicklook


Also avi@, all of these had the StackSamplingProfiler running at time of hang.  This task still had the stack sampler enabled.  However due to the coincidental timing on multiple bots, this may just be a symptom rather than a cause of the timeouts.
Cc: mar...@chromium.org
+maruel, please see comment #87.
Cc: friedman@chromium.org vadimsh@chromium.org
This is very interesting.

Swarming doesn't do anything in the background.

But puppet does. +Elliot and Vadim to know if they can correlate with their puppet schedule. Keep in mind this is likely totally red herring.
Puppet run schedule is not correlated between bots. It is deliberately randomized. I've confirmed that a bunch of bots that fail tests in #87 ran puppet at different times: http://shortn/_9wLjTv6hvb

Comment 91 by kbr@chromium.org, May 24 2017

Blockedon: 628464
The failing test (gl-vertexattribpointer) has about 100KB of GPU ERROR output to STDOUT.  My theory is that if STDOUT buffers fill up, the GPU threads can stall and cause watchdog timeouts.

I wrote a quick test script that adds long delays to reading the STDOUT of testing/scripts/run_gpu_integration_test_as_googletest.py and this causes the 'gl-vertexattribpointer' test to stall mid-way.

maruel@ could you give some pointers to how stdout logging is happening on the bots?  Swarming task pages show live log updates, so I'm wondering where are we reading the PIPE and if that could stall due to slow network?

Discussing last night with kbr@, he is thinking to change our logging so that we buffer browser log output in the test harness, and isolate browser execution from possible downstream PIPE back-pressure.



Possibly more red herrings, but in past experiments we noticed that adding or removing logging would make these timeout issues go away or become harder to reproduce.  Also enabling ASAN made the issue go away.  When running with ASAN, STDOUT gets piped through sanitizer's symbolizer which could affect buffering.
The Swarming bot reads the pipes as fast as it can and streams packets over HTTPS on a specific basis as decided at:
https://github.com/luci/luci-py/blob/master/appengine/swarming/swarming_bot/bot_code/task_runner.py#L323

So yes there's a magic 100kb number there, which may be related with what you are seeing.
https://github.com/luci/luci-py/blob/master/appengine/swarming/swarming_bot/bot_code/task_runner.py#L41

When this happens, stdout is effectively blocked for a small amount of time if it is not buffered. The kernel may buffer a bit, like 4Kb or so, I forget the details.

And yes, intermediary python scripts will likely affect buffering.

Comment 94 by piman@chromium.org, May 24 2017

@#92: that is interesting and a bit unexpected. GPU process logging is sent via IPCs, either via GpuService::RecordLogMessage for LOG, or via GpuCommandBufferStub::SendConsoleMessage for console messages - e.g. GL errors or perf/correctness warnings - either of which would hop to the IO thread. If the reader (browser or renderer) is stalled, I could see causing IPC queuing delays on the GPU IO thread, which is certainly bad in itself, however it shouldn't block the GPU main thread and trigger the watchdog - we shouldn't be blocking on the IO thread since we don't send sync messages from the GPU process.

I'm curious to see if we could get a trace of this behavior.
#94: That's interesting.

Setting up our webgl conformance tests to neglect reading the browser's pipe entirely with subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=env) and then never reading from the pipe, I get a stall exactly in test [10/859] gl-vertexattribpointer.  It just happens to be the first test that fills the output buffers.

The browser stalls somewhere, however I haven't seen that turn into a GPU watchdog timeout.

Comment 96 by piman@chromium.org, May 24 2017

So local debugging and investigation showed that #94 is incorrect, we do log from the GPU process for every message that is sent to the console (which is also logged from the browser process on the UI thread). So if the stdout/stderr pipe is full, we may block the UI thread as well as the GPU main thread (or any other thread that logs), possibly triggering the watchdog.

So the solution would be to either:
1- log less (e.g. it's probably not necessary to log GL errors twice, and maybe not at all if the test passes)
2- add buffering in the harness to isolate the tests from this.

or por qué no los dos.
We also log right before the GPU watchdog crashes, so that may explain the callstacks - it only crashes after the pipe is unblocked, so at that point the main thread could finish logging and return to the message loop before the crash actually happens.
Project Member

Comment 98 by bugdroid1@chromium.org, May 26 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d5e5040512747a4654b34c6ebaf0a301fb755f43

commit d5e5040512747a4654b34c6ebaf0a301fb755f43
Author: jbauman <jbauman@chromium.org>
Date: Fri May 26 02:06:48 2017

Don't log to stderr during GPU process watchdog timeout

It's possible stderr's buffer is full and that's causing the main thread
to hang, so just attempt to send the error message to the browser
process instead of possibly blocking as well.

BUG= 609252 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel

Review-Url: https://codereview.chromium.org/2902383003
Cr-Commit-Position: refs/heads/master@{#474878}

[modify] https://crrev.com/d5e5040512747a4654b34c6ebaf0a301fb755f43/gpu/ipc/service/gpu_watchdog_thread.cc

Comment 99 by kbr@chromium.org, Sep 20 2017

Owner: jbau...@chromium.org
Status: Fixed (was: Started)
Thanks to the excellent analysis above, this issue seems to be fixed. There are no flaky failures in the last 200 builds on these Windows and Mac machines, which are representative of the configurations where the problem used to be seen:

https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Win7%20Release%20%28NVIDIA%29/?limit=200
https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Mac%20Release%20%28Intel%29/?limit=200

Since John landed the fix to the logging, giving this to him. Thanks John.

Sign in to add a comment