
Issue 663601

Starred by 4 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug

Blocked on:
issue 619264
issue 671049
issue 679058

Blocking:
issue 648318




Intermittent GPU process crashes in QuartzCore / CoreAnimation

Project Member Reported by kbr@chromium.org, Nov 9 2016

Issue description

Intermittent GPU process crashes inside QuartzCore / CoreAnimation are impacting the stability of the WebGL conformance tests tracked in Issue 619264. One is documented here:

https://bugs.chromium.org/p/chromium/issues/detail?id=619264#c115

Another was seen here:
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/4638
https://chromium-swarm.appspot.com/user/task/324a3fcc3934be10

The failing test was:
WebglConformance_deqp_functional_gles3_texturefiltering_3d_combinations_03

The log excerpt is attached; here is the stack trace of the crashing thread (in this case, the main thread of the GPU process):

Thread 0 (crashed)
 0  QuartzCore + 0x25172
 1  QuartzCore + 0x2501c
 2  QuartzCore + 0x1a1a8
 3  QuartzCore + 0x10177
 4  Chromium Framework!__ZN2ui22CALayerTreeCoordinator22CommitPendingTreesToCAERKN3gfx4RectEPb + 0x2c6
 5  Chromium Framework!__ZN3gpu31ImageTransportSurfaceOverlayMac19SwapBuffersInternalERKN3gfx4RectE + 0x2af
 6  Chromium Framework!__ZN3gpu31ImageTransportSurfaceOverlayMac13PostSubBufferEiiii + 0x65
 7  Chromium Framework!__ZN3gpu5gles216GLES2DecoderImpl27HandlePostSubBufferCHROMIUMEjPVKv + 0x3e6
 8  Chromium Framework!__ZN3gpu5gles216GLES2DecoderImpl14DoCommandsImplILb0EEENS_5error5ErrorEjPVKviPi + 0xf0
 9  Chromium Framework!__ZN3gpu13CommandParser15ProcessCommandsEi + 0x35
10  Chromium Framework!__ZN3gpu15CommandExecutor10PutChangedEv + 0x119
11  Chromium Framework!__ZN3gpu20CommandBufferService5FlushEi + 0x7f
12  Chromium Framework!__ZN3gpu20GpuCommandBufferStub12OnAsyncFlushEijRKNSt3__16vectorIN2ui11LatencyInfoENS1_9allocatorIS4_EEEE + 0x143
13  Chromium Framework!__ZN3IPC8MessageTI35GpuCommandBufferMsg_AsyncFlush_MetaNSt3__15tupleIJijNS2_6vectorIN2ui11LatencyInfoENS2_9allocatorIS6_EEEEEEEvE8DispatchIN3gpu20GpuCommandBufferStubESE_vMSE_FvijRKS9_EEEbPKNS_7MessageEPT_PT0_PT1_T2_ + 0x7b
14  Chromium Framework!__ZN3gpu20GpuCommandBufferStub17OnMessageReceivedERKN3IPC7MessageE + 0x550
15  Chromium Framework!__ZN3IPC13MessageRouter12RouteMessageERKNS_7MessageE + 0x92
16  Chromium Framework!__ZN3gpu10GpuChannel19HandleMessageHelperERKN3IPC7MessageE + 0xa1
17  Chromium Framework!__ZN3gpu10GpuChannel13HandleMessageERK13scoped_refptrINS_22GpuChannelMessageQueueEE + 0x2cb
18  Chromium Framework!__ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + 0xd9
19  Chromium Framework!__ZN4base11MessageLoop7RunTaskEPNS_11PendingTaskE + 0x22b
20  Chromium Framework!__ZN4base11MessageLoop21DeferOrRunPendingTaskENS_11PendingTaskE + 0x2c
21  Chromium Framework!__ZN4base11MessageLoop6DoWorkEv + 0x143
22  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase7RunWorkEv + 0x37
23  Chromium Framework!__ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE + 0xa
24  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase13RunWorkSourceEPv + 0x44
25  CoreFoundation + 0xaa881
26  CoreFoundation + 0x89fbc
27  CoreFoundation + 0x894df
28  CoreFoundation + 0x88ed8
29  Chromium Framework!__ZN4base20MessagePumpCFRunLoop5DoRunEPNS_11MessagePump8DelegateE + 0x4f
30  Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE + 0x77
31  Chromium Framework!__ZN4base11MessageLoop10RunHandlerEv + 0x162
32  Chromium Framework!__ZN4base7RunLoop3RunEv + 0x33
33  Chromium Framework!__ZN7content7GpuMainERKNS_18MainFunctionParamsE + 0x4df
34  Chromium Framework!__ZN7content21ContentMainRunnerImpl3RunEv + 0x25f
35  Chromium Framework!__ZN7content11ContentMainERKNS_17ContentMainParamsE + 0x36
36  Chromium Framework!_ChromeMain + 0x3c
37  Chromium Helper!_main + 0x20a
38  libdyld.dylib + 0x35ad
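
When comparing several of these crash logs, it helps to find the first frame that carries a symbol, since the QuartzCore frames above it are unsymbolized. The following is a minimal sketch that assumes every frame line has the `index  module[!symbol] + 0xoffset` shape shown above; `parse_frame` and `first_symbolized_frame` are hypothetical helper names, not part of any Chromium tooling. The `__ZN…` names are mangled C++ symbols, and the sketch leaves them mangled.

```python
import re

# Frame lines look like " 4  Chromium Framework!__ZN2ui...EPb + 0x2c6" or,
# for modules without symbols, " 0  QuartzCore + 0x25172".
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<module>[^!+]+?)"
    r"(?:!(?P<symbol>\S+))?\s+\+\s+(?P<offset>0x[0-9a-fA-F]+)\s*$"
)


def parse_frame(line):
    """Split one frame line into index, module, symbol, and offset."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    return {
        "index": int(m.group("index")),
        "module": m.group("module").strip(),
        "symbol": m.group("symbol"),  # None when the frame is unsymbolized
        "offset": int(m.group("offset"), 16),
    }


def first_symbolized_frame(lines):
    """Return the topmost frame that has a symbol name, or None."""
    for line in lines:
        frame = parse_frame(line)
        if frame and frame["symbol"]:
            return frame
    return None


# The top five frames of the crashing thread, copied from the trace above.
trace = [
    " 0  QuartzCore + 0x25172",
    " 1  QuartzCore + 0x2501c",
    " 2  QuartzCore + 0x1a1a8",
    " 3  QuartzCore + 0x10177",
    " 4  Chromium Framework!__ZN2ui22CALayerTreeCoordinator22CommitPendingTreesToCAERKN3gfx4RectEPb + 0x2c6",
]

frame = first_symbolized_frame(trace)
print(frame["index"], frame["module"], hex(frame["offset"]))
```

For this trace the first symbolized frame is frame 4, the call from `ui::CALayerTreeCoordinator` into Core Animation.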

Is it possible that Chromium is doing something thread-unsafe with the Core Animation library? The crash is quite intermittent, but is there anything actionable here that we could file a Radar about with Apple? Unfortunately, these machines aren't running the latest OS (only 10.11.6), but we could consider an upgrade.

The flakiness is really unfortunate because it randomly affects all of the tests. Marking them all flaky on macOS would allow worse flakiness to be introduced into the product.

 
stack.txt (83.9 KB)
Seeing this on the Mac 10.10 Retina Release (AMD) GPU.FYI bot, in webgl2_conformance_tests.

https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.10%20Retina%20Release%20%28AMD%29/builds/11132 WebglConformance_deqp_functional_gles3_textureshadow_2d_array_nearest_mipmap_nearest_greater
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.10%20Retina%20Release%20%28AMD%29/builds/11130 WebglConformance_deqp_functional_gles3_textureshadow_cube_linear_never
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.10%20Retina%20Release%20%28AMD%29/builds/11105 WebglConformance_deqp_functional_gles3_texturefiltering_2d_formats_04
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.10%20Retina%20Release%20%28AMD%29/builds/11036 WebglConformance_conformance_ogles_GL_radians_radians_001_to_006
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.10%20Retina%20Release%20%28AMD%29/builds/11026 WebglConformance_deqp_functional_gles3_textureshadow_2d_linear_greater
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.10%20Retina%20Release%20%28AMD%29/builds/11000 WebglConformance_deqp_functional_gles3_textureshadow_cube_nearest_less
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.10%20Retina%20Release%20%28AMD%29/builds/10956 WebglConformance_deqp_functional_gles3_texturefiltering_3d_combinations_06

The failures seem to be concentrated in the textureshadow / texturefiltering groups. Perhaps we should skip them on Mac AMD?
WebglConformance_conformance_ogles_GL_radians_radians_001_to_006 triggers a DCHECK(produceSyncToken.HasData()), which is probably a separate issue, but it might hint at why textureshadow / texturefiltering fail: perhaps some other crash, missing from the logs, is causing the failure in QuartzCore in another process?

Logs attached.
10956.log (56.7 KB)
11000.log (82.1 KB)
11026.log (46.8 KB)
11036.log (52.9 KB)
11105.log (45.4 KB)
11130.log (87.0 KB)
11132.log (48.6 KB)
Also happens on Mac 10.11 Retina Release (AMD)
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.11%20Retina%20Release%20%28AMD%29/builds/998 WebglConformance_deqp_data_gles3_shaders_constants
https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.11%20Retina%20Release%20%28AMD%29/builds/1004 WebglConformance_deqp_functional_gles3_framebufferblit_conversion_26
998.log (82.5 KB)
1004.log (48.0 KB)

Comment 3 by kbr@chromium.org, Dec 2 2016

Thanks for triaging these Yuly.

It would be better to mark the textureshadow and texturefiltering tests as flaky. Skipping them on a particular configuration (and Mac AMD is a major one for WebGL 2.0) will allow major regressions to slip in.

The DCHECK(produceSyncToken.HasData()) is happening because the GPU process is crashing. A GPU process crash honestly shouldn't trigger a DCHECK, but that needs to be filed separately with a good stack trace.

The relationship with the WebglConformance_conformance_ogles_GL_radians_radians_001_to_006 failure is interesting. Maybe that one should be marked flaky too. Basically, marking only the minimal set of tests as flaky would preserve good test coverage while minimizing the chance of new flakiness being introduced.
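
To illustrate the trade-off between skipping and marking flaky, here is a hedged sketch of narrowly scoped flaky expectations. The `Expectations` class, its `flaky` and `is_flaky` methods, the tag names, and the `.html` test paths are all hypothetical stand-ins chosen for illustration; they are not Chromium's actual expectations API. The idea is only that a rule applies when both the test-path glob and every platform tag match, so other tests and other configurations keep full coverage.

```python
from fnmatch import fnmatchcase


class Expectations:
    """Hypothetical, minimal stand-in for a GPU test expectations table."""

    def __init__(self):
        # Each rule: (glob pattern, frozenset of required tags, bug number).
        self._rules = []

    def flaky(self, pattern, conditions, bug=None):
        """Record that tests matching `pattern` may flake under `conditions`."""
        self._rules.append((pattern, frozenset(conditions), bug))

    def is_flaky(self, test_path, platform_tags):
        """True if some rule matches the path and all of its tags are present."""
        tags = set(platform_tags)
        return any(
            fnmatchcase(test_path, pattern) and required <= tags
            for pattern, required, _bug in self._rules
        )


exp = Expectations()
exp.flaky("deqp/functional/gles3/texturefiltering/*", ["mac", "amd"], bug=663601)
exp.flaky("deqp/functional/gles3/textureshadow/*", ["mac", "amd"], bug=663601)

# Hypothetical test paths, used only to exercise the matching logic.
shadow = "deqp/functional/gles3/textureshadow/cube_linear_never.html"
blit = "deqp/functional/gles3/framebufferblit/conversion_26.html"

print(exp.is_flaky(shadow, ["mac", "amd"]))     # matching glob and tags
print(exp.is_flaky(shadow, ["mac", "nvidia"]))  # same test, other GPU vendor
print(exp.is_flaky(blit, ["mac", "amd"]))       # same config, other test group
```

Only the first query matches a rule; the same test on another GPU vendor, and other test groups on Mac AMD, are still treated as required to pass.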


Comment 5 by bugdroid1@chromium.org, Dec 20 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/08396cf149fecb64a53dc07bf6f9380e66886d93

commit 08396cf149fecb64a53dc07bf6f9380e66886d93
Author: kbr <kbr@chromium.org>
Date: Tue Dec 20 08:11:57 2016

Mark dEQP texturefiltering and textureshadow tests flaky on Mac AMD.

BUG=663601
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel
NOTRY=true
TBR=zmo@chromium.org

Review-Url: https://codereview.chromium.org/2594583002
Cr-Commit-Position: refs/heads/master@{#439746}

[modify] https://crrev.com/08396cf149fecb64a53dc07bf6f9380e66886d93/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py

Comment 6 by kbr@chromium.org, Jan 6 2017

Blockedon: 679058

Comment 7 by kbr@chromium.org, Jan 12 2017

Blocking: -619264

Comment 8 by kbr@chromium.org, Jan 12 2017

Blockedon: 619264
Status: Available (was: Untriaged)

Comment 10 by sheriffbot@chromium.org, Feb 21 2018

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 11 by kbr@chromium.org, Feb 21 2018

Labels: -Hotlist-Recharge-Cold
Status: Available (was: Untriaged)
We should try un-marking these as flaky now that the machines are running much newer OS versions.

Blocking: 648318
