Intermittent GPU process crashes in QuartzCore / CoreAnimation
Issue description

Intermittent GPU process crashes are being observed inside QuartzCore / CoreAnimation that are impacting the stability of the WebGL conformance tests in Issue 619264.

One is documented here:
https://bugs.chromium.org/p/chromium/issues/detail?id=619264#c115

Another was seen here:
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/4638
https://chromium-swarm.appspot.com/user/task/324a3fcc3934be10

The failing test was:
WebglConformance_deqp_functional_gles3_texturefiltering_3d_combinations_03

The log excerpt is attached, but here's the stack trace of the crashing thread (the main thread in the GPU process, in this case):

Thread 0 (crashed)
0 QuartzCore + 0x25172
1 QuartzCore + 0x2501c
2 QuartzCore + 0x1a1a8
3 QuartzCore + 0x10177
4 Chromium Framework!__ZN2ui22CALayerTreeCoordinator22CommitPendingTreesToCAERKN3gfx4RectEPb + 0x2c6
5 Chromium Framework!__ZN3gpu31ImageTransportSurfaceOverlayMac19SwapBuffersInternalERKN3gfx4RectE + 0x2af
6 Chromium Framework!__ZN3gpu31ImageTransportSurfaceOverlayMac13PostSubBufferEiiii + 0x65
7 Chromium Framework!__ZN3gpu5gles216GLES2DecoderImpl27HandlePostSubBufferCHROMIUMEjPVKv + 0x3e6
8 Chromium Framework!__ZN3gpu5gles216GLES2DecoderImpl14DoCommandsImplILb0EEENS_5error5ErrorEjPVKviPi + 0xf0
9 Chromium Framework!__ZN3gpu13CommandParser15ProcessCommandsEi + 0x35
10 Chromium Framework!__ZN3gpu15CommandExecutor10PutChangedEv + 0x119
11 Chromium Framework!__ZN3gpu20CommandBufferService5FlushEi + 0x7f
12 Chromium Framework!__ZN3gpu20GpuCommandBufferStub12OnAsyncFlushEijRKNSt3__16vectorIN2ui11LatencyInfoENS1_9allocatorIS4_EEEE + 0x143
13 Chromium Framework!__ZN3IPC8MessageTI35GpuCommandBufferMsg_AsyncFlush_MetaNSt3__15tupleIJijNS2_6vectorIN2ui11LatencyInfoENS2_9allocatorIS6_EEEEEEEvE8DispatchIN3gpu20GpuCommandBufferStubESE_vMSE_FvijRKS9_EEEbPKNS_7MessageEPT_PT0_PT1_T2_ + 0x7b
14 Chromium Framework!__ZN3gpu20GpuCommandBufferStub17OnMessageReceivedERKN3IPC7MessageE + 0x550
15 Chromium Framework!__ZN3IPC13MessageRouter12RouteMessageERKNS_7MessageE + 0x92
16 Chromium Framework!__ZN3gpu10GpuChannel19HandleMessageHelperERKN3IPC7MessageE + 0xa1
17 Chromium Framework!__ZN3gpu10GpuChannel13HandleMessageERK13scoped_refptrINS_22GpuChannelMessageQueueEE + 0x2cb
18 Chromium Framework!__ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + 0xd9
19 Chromium Framework!__ZN4base11MessageLoop7RunTaskEPNS_11PendingTaskE + 0x22b
20 Chromium Framework!__ZN4base11MessageLoop21DeferOrRunPendingTaskENS_11PendingTaskE + 0x2c
21 Chromium Framework!__ZN4base11MessageLoop6DoWorkEv + 0x143
22 Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase7RunWorkEv + 0x37
23 Chromium Framework!__ZN4base3mac15CallWithEHFrameEU13block_pointerFvvE + 0xa
24 Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase13RunWorkSourceEPv + 0x44
25 CoreFoundation + 0xaa881
26 CoreFoundation + 0x89fbc
27 CoreFoundation + 0x894df
28 CoreFoundation + 0x88ed8
29 Chromium Framework!__ZN4base20MessagePumpCFRunLoop5DoRunEPNS_11MessagePump8DelegateE + 0x4f
30 Chromium Framework!__ZN4base24MessagePumpCFRunLoopBase3RunEPNS_11MessagePump8DelegateE + 0x77
31 Chromium Framework!__ZN4base11MessageLoop10RunHandlerEv + 0x162
32 Chromium Framework!__ZN4base7RunLoop3RunEv + 0x33
33 Chromium Framework!__ZN7content7GpuMainERKNS_18MainFunctionParamsE + 0x4df
34 Chromium Framework!__ZN7content21ContentMainRunnerImpl3RunEv + 0x25f
35 Chromium Framework!__ZN7content11ContentMainERKNS_17ContentMainParamsE + 0x36
36 Chromium Framework!_ChromeMain + 0x3c
37 Chromium Helper!_main + 0x20a
38 libdyld.dylib + 0x35ad

Is there any possibility Chromium's doing something thread-unsafe with the Core Animation library? This is pretty intermittent, but is there anything actionable we could file a Radar about with Apple? Unfortunately these machines aren't running the latest and greatest OS (only 10.11.6), but we could consider an upgrade.

The flakiness is really unfortunate because it randomly affects all of the tests. Marking them all flaky on macOS would allow worse flakiness to be introduced into the product.
Dec 2 2016
Also happens on Mac 10.11 Retina Release (AMD):

https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.11%20Retina%20Release%20%28AMD%29/builds/998
WebglConformance_deqp_data_gles3_shaders_constants

https://build.chromium.org/p/chromium.gpu.fyi/builders/Mac%2010.11%20Retina%20Release%20%28AMD%29/builds/1004
WebglConformance_deqp_functional_gles3_framebufferblit_conversion_26
Dec 2 2016
Thanks for triaging these, Yuly. It would be better to mark the textureshadow and texturefiltering tests as flaky. Skipping them on a particular configuration (and Mac AMD is a major one for WebGL 2.0) will allow major regressions to slip in.

The DCHECK(produceSyncToken.HasData()) is happening because the GPU process is crashing. That honestly shouldn't trigger a DCHECK, but that needs to be filed separately with a good stack trace.

The relationship with the WebglConformance_conformance_ogles_GL_radians_radians_001_to_006 failure is interesting. Maybe that one should be marked flaky too. Basically, marking only the minimal set of tests as flaky would still allow good test coverage while minimizing the possibility of new flakiness being introduced.
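To illustrate the difference between skipping and marking flaky, here is a minimal sketch in Python. It is not Chromium's actual expectations API: the Expectations class, its skip/flaky methods, the tag names, and the retry count are all hypothetical stand-ins chosen for illustration. The point it shows is that a skipped test never runs, while a flaky test is retried, so an intermittent crash is tolerated but a consistent failure still shows up.

```python
import fnmatch


class Expectations:
    """Hypothetical, simplified expectations table (not Chromium's real API)."""

    def __init__(self):
        self._rules = []  # list of (pattern, required_tags, kind, retries)

    def skip(self, pattern, tags):
        self._rules.append((pattern, frozenset(tags), "skip", 0))

    def flaky(self, pattern, tags, retries=2):
        self._rules.append((pattern, frozenset(tags), "flaky", retries))

    def lookup(self, test_name, platform_tags):
        """Return (kind, retries) for the first matching rule, else ("pass", 0)."""
        for pattern, tags, kind, retries in self._rules:
            if tags <= set(platform_tags) and fnmatch.fnmatchcase(test_name, pattern):
                return kind, retries
        return "pass", 0


def run_with_expectations(test_name, platform_tags, expectations, run_test):
    """Run one test under the expectations; return "skipped", "pass" or "fail"."""
    kind, retries = expectations.lookup(test_name, platform_tags)
    if kind == "skip":
        return "skipped"  # never executed, so a real regression goes unseen
    for _ in range(1 + retries):
        if run_test(test_name):
            return "pass"  # an intermittent crash is absorbed by a retry
    return "fail"  # a consistent failure still surfaces
```

With a flaky rule for WebglConformance_deqp_functional_gles3_texturefiltering_* on the assumed tags ["mac", "amd"], a run that crashes once and then succeeds is reported as a pass, while a test that fails on every attempt is still reported as a failure. Under a skip rule the same regression would not be noticed at all.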
Dec 20 2016
Another occurrence of this:
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel/builds/5859
https://chromium-swarm.appspot.com/task?id=3333f8b5b8f1cf10&refresh=10&show_raw=1

The stack trace clearly points to this bug. See https://bugs.chromium.org/p/chromium/issues/detail?id=671049#c7.
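For anyone triaging further occurrences, the check "does this trace point to this bug" could be sketched roughly as follows. This is not an existing Chromium tool. The frame regular expression, the Frame type, and the choice of signature (crashing frames in QuartzCore, with the first non-QuartzCore caller being CALayerTreeCoordinator::CommitPendingTreesToCA) are assumptions taken from the trace format in the issue description.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int
    module: str
    symbol: Optional[str]  # mangled symbol, or None when only an offset is known
    offset: int


# Assumed frame shape: "<index> <module>[!<symbol>] + 0x<offset>".
_FRAME_RE = re.compile(r"(\d+) ([A-Za-z][\w. ]*?)(?:!(\S+))? \+ (0x[0-9a-fA-F]+)")


def parse_frames(text: str) -> List[Frame]:
    """Extract frames from a trace, even when it is flattened onto one line."""
    return [
        Frame(int(m.group(1)), m.group(2), m.group(3), int(m.group(4), 16))
        for m in _FRAME_RE.finditer(text)
    ]


def looks_like_this_crash(frames: List[Frame]) -> bool:
    """True if the crash is in QuartzCore, called from CommitPendingTreesToCA."""
    if not frames or frames[0].module != "QuartzCore":
        return False
    for frame in frames:
        if frame.module != "QuartzCore":
            # First caller outside QuartzCore decides the match.
            symbol = frame.symbol or ""
            return ("CALayerTreeCoordinator" in symbol
                    and "CommitPendingTreesToCA" in symbol)
    return False
```

Matching on substrings of the mangled symbol avoids needing a demangler, at the cost of being a loose match. A stricter check could demangle the symbol first.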
Dec 20 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/08396cf149fecb64a53dc07bf6f9380e66886d93

commit 08396cf149fecb64a53dc07bf6f9380e66886d93
Author: kbr <kbr@chromium.org>
Date: Tue Dec 20 08:11:57 2016

Mark dEQP texturefiltering and textureshadow tests flaky on Mac AMD.

BUG=663601
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel
NOTRY=true
TBR=zmo@chromium.org

Review-Url: https://codereview.chromium.org/2594583002
Cr-Commit-Position: refs/heads/master@{#439746}

[modify] https://crrev.com/08396cf149fecb64a53dc07bf6f9380e66886d93/content/test/gpu/gpu_tests/webgl2_conformance_expectations.py
Jan 6 2017
Jan 12 2017
Jan 12 2017
Feb 4 2017
Feb 21 2018
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Feb 21 2018
We should try un-marking these as flaky now that the machines are running much newer OSs.
Oct 1
Comment 1 by ynovikov@chromium.org, Dec 1 2016
Attachments: 56.7 KB, 82.1 KB, 46.8 KB, 52.9 KB, 45.4 KB, 87.0 KB, 48.6 KB