"gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash" is flaky
Issue description
"gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash" is flaky. This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label. We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyigELEgVGbGFrZSJ_Z3B1X3Rlc3RzLmNvbnRleHRfbG9zdF9pbnRlZ3JhdGlvbl90ZXN0LkNvbnRleHRMb3N0SW50ZWdyYXRpb25UZXN0LkdwdUNyYXNoX0dQVVByb2Nlc3NDcmFzaGVzRXhhY3RseU9uY2VQZXJWaXNpdFRvQWJvdXRHcHVDcmFzaAw. Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs This flaky test/step was previously tracked in issue 861956 .
,
Aug 28
Looking at https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng?limit=200 , this is rejecting about 1 in 5 tryjobs, which is awful. This needs a resolution. (I'm checking in at 9pm because a dry run of one of my own CLs was affected.) Skimming issue 861956 , these tests seem to have a unique mechanism for being disabled for flakiness. markusheintz, maybe you want to give it a go?
,
Aug 28
Issue 878504 has been merged into this issue.
,
Aug 28
vikassoni@ noticed this also in Issue 878504 . Suppressing the flake in https://chromium-review.googlesource.com/1194334 .
From Vikas' update on the other bug:
most recent log - https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Release%20%28Intel%29/96596
log when test first failed - https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Release%20%28Intel%29/96517
snapshot of stack trace:
FATAL:compositor.cc(605)] Check failed: false.
0 Chromium Framework 0x000000011de777dc base::debug::StackTrace::StackTrace(unsigned long) + 28
1 Chromium Framework 0x000000011dd7baef logging::LogMessage::~LogMessage() + 223
2 Chromium Framework 0x000000012046c32c ui::Compositor::DidFailToInitializeLayerTreeFrameSink() + 76
3 Chromium Framework 0x000000011f84325f cc::LayerTreeHost::DidFailToInitializeLayerTreeFrameSink() + 159
4 Chromium Framework 0x000000011f8a9910 cc::SingleThreadProxy::SetLayerTreeFrameSink(cc::LayerTreeFrameSink*) + 576
5 Chromium Framework 0x000000011f842de3 cc::LayerTreeHost::SetLayerTreeFrameSink(std::__1::unique_ptr<cc::LayerTreeFrameSink, std::__1::default_delete<cc::LayerTreeFrameSink> >) + 195
6 Chromium Framework 0x000000012046a2d3 ui::Compositor::SetLayerTreeFrameSink(std::__1::unique_ptr<cc::LayerTreeFrameSink, std::__1::default_delete<cc::LayerTreeFrameSink> >) + 51
7 Chromium Framework 0x000000011bc9277b ui::HostContextFactoryPrivate::ConfigureCompositor(base::WeakPtr<ui::Compositor>, scoped_refptr<viz::ContextProvider>, scoped_refptr<viz::RasterContextProvider>) + 1515
It's also flaking on the Mac AMD bots:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Retina%20Release%20%28AMD%29/38755
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Retina%20Release%20%28AMD%29/38754
There has been some regression in the compositor causing context loss to not be handled gracefully.
,
Aug 28
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4f1b8df318b0b58d7f2dd6d29f4a4f019191cc35 commit 4f1b8df318b0b58d7f2dd6d29f4a4f019191cc35 Author: Vikas Soni <vikassoni@chromium.org> Date: Tue Aug 28 20:59:40 2018 Mark a context_lost test flaky on Mac. GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash occasionally crashes in browser process due to some checks failing in compositor.cc No-Try: True Bug: 878258 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel Change-Id: Ideb74153dbf7772489042e5cf8ed0ae3d5dc2641 Reviewed-on: https://chromium-review.googlesource.com/1194334 Reviewed-by: Kenneth Russell <kbr@chromium.org> Commit-Queue: Kenneth Russell <kbr@chromium.org> Cr-Commit-Position: refs/heads/master@{#586847} [modify] https://crrev.com/4f1b8df318b0b58d7f2dd6d29f4a4f019191cc35/content/test/gpu/gpu_tests/context_lost_expectations.py
,
Aug 28
https://chromium-review.googlesource.com/1194334 has been merged so these flakes should be suppressed. Downgrading to P1 now, but the regression must still be tracked down and the root cause fixed ASAP (and the flaky test un-suppressed). flackr@, could you please triage and dispatch this bug as appropriate?
,
Aug 28
To clarify: something in this code changed in the past couple of days and made this test flaky. These are the first failing builds on the two bots: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Release%20%28Intel%29/96517 and https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Retina%20Release%20%28AMD%29/38670 . This test was 100% reliable before that.
,
Aug 29
Detected 24 new flakes for test/step "gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyigELEgVGbGFrZSJ_Z3B1X3Rlc3RzLmNvbnRleHRfbG9zdF9pbnRlZ3JhdGlvbl90ZXN0LkNvbnRleHRMb3N0SW50ZWdyYXRpb25UZXN0LkdwdUNyYXNoX0dQVVByb2Nlc3NDcmFzaGVzRXhhY3RseU9uY2VQZXJWaXNpdFRvQWJvdXRHcHVDcmFzaAw. This message was posted automatically by the chromium-try-flakes app.
,
Aug 29
Sheriff ping: can we disable this test? It has been flaky for a while now, and the guideline says to disable flaky tests within 30 minutes.
,
Aug 29
Reassigning to ccameron@ for Mac. Looks like this flake is caused by Mac OOP-D.
,
Aug 29
The test hasn't flaked since https://chromium-review.googlesource.com/1194334 landed at r586847. The last reported flake according to chromium-try-flakes was https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/128812 which was run at r586841.
,
Sep 3
Removing from the Sheriff queue.
,
Sep 14
Saw this flake here today: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Debug%20(AMD)/8050
,
Sep 14
This sounds a lot like an issue danakj@ fixed in https://chromium-review.googlesource.com/1219992 . Examine the stack trace from the crash in the failing shard: Operating system: Mac OS X 10.13.6 17G65 CPU: amd64 family 6 model 70 stepping 1 8 CPUs GPU: UNKNOWN Crash reason: EXC_BREAKPOINT / EXC_I386_BPT Crash address: 0x103c2e014 Process uptime: 9 seconds Thread 0 (crashed) 0 libbase.dylib!__ZN4base5debug13BreakDebuggerEv + 0x14 rax = 0x0000000103ccba1c rdx = 0x00007ffc8d92da38 rcx = 0x0000000000000015 rbx = 0x00000001038dbd30 rsi = 0x000000000000025d rdi = 0x0000000103ccba1c rbp = 0x00007ffeeeab5dc0 rsp = 0x00007ffeeeab5dc0 r8 = 0x00007ffc8d92da4d r9 = 0x00000000000034b5 r10 = 0x00007ffc8b500000 r11 = 0x0000000103c2e000 r12 = 0x0000000800002e88 r13 = 0x0000000000000001 r14 = 0x00007ffc8b6233a0 r15 = 0x0000000000000000 rip = 0x0000000103c2e014 Found by: given as instruction pointer in context 1 libchrome_dll.dylib!__ZN7logging12_GLOBAL__N_126SilentRuntimeAssertHandlerEPKciN4base16BasicStringPieceINSt3__112basic_stringIcNS5_11char_traitsIcEENS5_9allocatorIcEEEEEESC_ + 0x24 rbp = 0x00007ffeeeab5e00 rsp = 0x00007ffeeeab5dd0 rip = 0x000000010e665c04 Found by: previous frame's frame pointer 2 libchrome_dll.dylib!__ZN4base8internal13FunctorTraitsIPFvPKciNS_16BasicStringPieceINSt3__112basic_stringIcNS5_11char_traitsIcEENS5_9allocatorIcEEEEEESC_EvE6InvokeIRKSE_JS3_iSC_SC_EEEvOT_DpOT0_ + 0xa1 rbp = 0x00007ffeeeab5e90 rsp = 0x00007ffeeeab5e10 rip = 0x000000010e665f41 Found by: previous frame's frame pointer 3 libchrome_dll.dylib!__ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKPFvPKciNS_16BasicStringPieceINSt3__112basic_stringIcNS7_11char_traitsIcEENS7_9allocatorIcEEEEEESE_EJS5_iSE_SE_EEEvOT_DpOT0_ + 0x5d rbp = 0x00007ffeeeab5ef0 rsp = 0x00007ffeeeab5ea0 rip = 0x000000010e665e8d Found by: previous frame's frame pointer 4 
libchrome_dll.dylib!__ZN4base8internal7InvokerINS0_9BindStateIPFvPKciNS_16BasicStringPieceINSt3__112basic_stringIcNS6_11char_traitsIcEENS6_9allocatorIcEEEEEESD_EJEEESE_E7RunImplIRKSF_RKNS6_5tupleIJEEEJEEEvOT_OT0_NS6_16integer_sequenceImJXspT1_EEEEOS4_OiOSD_SX_ + 0x61 rbp = 0x00007ffeeeab5f60 rsp = 0x00007ffeeeab5f00 rip = 0x000000010e665e21 Found by: previous frame's frame pointer 5 libchrome_dll.dylib!__ZN4base8internal7InvokerINS0_9BindStateIPFvPKciNS_16BasicStringPieceINSt3__112basic_stringIcNS6_11char_traitsIcEENS6_9allocatorIcEEEEEESD_EJEEESE_E3RunEPNS0_13BindStateBaseES4_iOSD_SK_ + 0x84 rbp = 0x00007ffeeeab5fe0 rsp = 0x00007ffeeeab5f70 rip = 0x000000010e665d14 Found by: previous frame's frame pointer 6 libbase.dylib!__ZNKR4base17RepeatingCallbackIFvPKciNS_16BasicStringPieceINSt3__112basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEEEESB_EE3RunES2_iSB_SB_ + 0x9a rbp = 0x00007ffeeeab6060 rsp = 0x00007ffeeeab5ff0 rip = 0x000000010387c9ea Found by: previous frame's frame pointer 7 libbase.dylib!__ZN7logging10LogMessageD2Ev + 0x152b rbp = 0x00007ffeeeab6fa0 rsp = 0x00007ffeeeab6070 rip = 0x000000010387c44b Found by: previous frame's frame pointer 8 libbase.dylib!__ZN7logging10LogMessageD1Ev + 0x15 rbp = 0x00007ffeeeab6fc0 rsp = 0x00007ffeeeab6fb0 rip = 0x0000000103878ce5 Found by: previous frame's frame pointer 9 libcompositor.dylib!__ZN2ui10Compositor37DidFailToInitializeLayerTreeFrameSinkEv + 0x73 rbp = 0x00007ffeeeab7110 rsp = 0x00007ffeeeab6fd0 rip = 0x0000000143465e03 Found by: previous frame's frame pointer 10 libcc.dylib!__ZN2cc13LayerTreeHost37DidFailToInitializeLayerTreeFrameSinkEv + 0x305 rbp = 0x00007ffeeeab7380 rsp = 0x00007ffeeeab7120 rip = 0x0000000133f0e095 Found by: previous frame's frame pointer 11 libcc.dylib!__ZN2cc17SingleThreadProxy21SetLayerTreeFrameSinkEPNS_18LayerTreeFrameSinkE + 0x324 rbp = 0x00007ffeeeab76a0 rsp = 0x00007ffeeeab7390 rip = 0x0000000134092664 Found by: previous frame's frame pointer 12 
libcc.dylib!__ZN2cc13LayerTreeHost21SetLayerTreeFrameSinkENSt3__110unique_ptrINS_18LayerTreeFrameSinkENS1_14default_deleteIS3_EEEE + 0x4ef rbp = 0x00007ffeeeab7ad0 rsp = 0x00007ffeeeab76b0 rip = 0x0000000133f0d4ef Found by: previous frame's frame pointer 13 libcompositor.dylib!__ZN2ui10Compositor21SetLayerTreeFrameSinkENSt3__110unique_ptrIN2cc18LayerTreeFrameSinkENS1_14default_deleteIS4_EEEE + 0x222 rbp = 0x00007ffeeeab7cc0 rsp = 0x00007ffeeeab7ae0 rip = 0x000000014345fbe2 Found by: previous frame's frame pointer 14 libcontent.dylib!__ZN2ui25HostContextFactoryPrivate19ConfigureCompositorEN4base7WeakPtrINS_10CompositorEEE13scoped_refptrIN3viz15ContextProviderEES5_INS6_21RasterContextProviderEE + 0x19ea rbp = 0x00007ffeeeab8a30 rsp = 0x00007ffeeeab7cd0 rip = 0x0000000126b6f37a Found by: previous frame's frame pointer 15 libcontent.dylib!__ZN7content26VizProcessTransportFactory23OnEstablishedGpuChannelEN4base7WeakPtrIN2ui10CompositorEEE13scoped_refptrIN3gpu14GpuChannelHostEE + 0x2a8 rbp = 0x00007ffeeeab8b30 rsp = 0x00007ffeeeab8a40 rip = 0x00000001260ae398 Found by: previous frame's frame pointer 16 libcontent.dylib!__ZN4base8internal13FunctorTraitsIMN7content26VizProcessTransportFactoryEFvNS_7WeakPtrIN2ui10CompositorEEE13scoped_refptrIN3gpu14GpuChannelHostEEEvE6InvokeISD_NS4_IS3_EEJS7_SB_EEEvT_OT0_DpOT1_ + 0xd8 rbp = 0x00007ffeeeab8bd0 rsp = 0x00007ffeeeab8b40 rip = 0x00000001260b25d8 Found by: previous frame's frame pointer 17 libcontent.dylib!__ZN4base8internal12InvokeHelperILb1EvE8MakeItSoIMN7content26VizProcessTransportFactoryEFvNS_7WeakPtrIN2ui10CompositorEEE13scoped_refptrIN3gpu14GpuChannelHostEEENS6_IS5_EEJS9_SD_EEEvOT_OT0_DpOT1_ + 0x85 rbp = 0x00007ffeeeab8c40 rsp = 0x00007ffeeeab8be0 rip = 0x00000001260b24c5 Found by: previous frame's frame pointer 18 
libcontent.dylib!__ZN4base8internal7InvokerINS0_9BindStateIMN7content26VizProcessTransportFactoryEFvNS_7WeakPtrIN2ui10CompositorEEE13scoped_refptrIN3gpu14GpuChannelHostEEEJNS5_IS4_EES8_EEEFvSC_EE7RunImplISE_NSt3__15tupleIJSF_S8_EEEJLm0ELm1EEEEvOT_OT0_NSK_16integer_sequenceImJXspT1_EEEEOSC_ + 0x8d rbp = 0x00007ffeeeab8cc0 rsp = 0x00007ffeeeab8c50 rip = 0x00000001260b242d Found by: previous frame's frame pointer 19 libcontent.dylib!__ZN4base8internal7InvokerINS0_9BindStateIMN7content26VizProcessTransportFactoryEFvNS_7WeakPtrIN2ui10CompositorEEE13scoped_refptrIN3gpu14GpuChannelHostEEEJNS5_IS4_EES8_EEEFvSC_EE7RunOnceEPNS0_13BindStateBaseEOSC_ + 0x49 rbp = 0x00007ffeeeab8d10 rsp = 0x00007ffeeeab8cd0 rip = 0x00000001260b2329 Found by: previous frame's frame pointer 20 libcontent.dylib!__ZNO4base12OnceCallbackIFv13scoped_refptrIN3gpu14GpuChannelHostEEEE3RunES4_ + 0x6f rbp = 0x00007ffeeeab8d60 rsp = 0x00007ffeeeab8d20 rip = 0x0000000124f44c4f Found by: previous frame's frame pointer 21 libcontent.dylib!__ZN7content28BrowserGpuChannelHostFactory19EstablishGpuChannelEN4base12OnceCallbackIFv13scoped_refptrIN3gpu14GpuChannelHostEEEEE + 0x2e3 rbp = 0x00007ffeeeab9170 rsp = 0x00007ffeeeab8d70 rip = 0x0000000124f46363 Found by: previous frame's frame pointer 22 libcontent.dylib!__ZN7content26VizProcessTransportFactory24CreateLayerTreeFrameSinkEN4base7WeakPtrIN2ui10CompositorEEE + 0xf1 rbp = 0x00007ffeeeab91f0 rsp = 0x00007ffeeeab9180 rip = 0x00000001260ae091 Found by: previous frame's frame pointer 23 libcompositor.dylib!__ZN2ui10Compositor28RequestNewLayerTreeFrameSinkEv + 0x10e rbp = 0x00007ffeeeab9370 rsp = 0x00007ffeeeab9200 rip = 0x0000000143465d4e Found by: previous frame's frame pointer 24 libcc.dylib!__ZN2cc13LayerTreeHost28RequestNewLayerTreeFrameSinkEv + 0x1a rbp = 0x00007ffeeeab9390 rsp = 0x00007ffeeeab9380 rip = 0x0000000133f0da8a Found by: previous frame's frame pointer 25 libcc.dylib!__ZN2cc17SingleThreadProxy28RequestNewLayerTreeFrameSinkEv + 0xf9 rbp = 
0x00007ffeeeab94f0 rsp = 0x00007ffeeeab93a0 rip = 0x0000000134092179 Found by: previous frame's frame pointer 26 libcc.dylib!__ZN4base8internal13FunctorTraitsIMN2cc17SingleThreadProxyEFvvEvE6InvokeIS5_RKNS_7WeakPtrIS3_EEJEEEvT_OT0_DpOT1_ + 0x7f rbp = 0x00007ffeeeab9540 rsp = 0x00007ffeeeab9500 rip = 0x000000013409ad7f Found by: previous frame's frame pointer 27 libcc.dylib!__ZN4base8internal12InvokeHelperILb1EvE8MakeItSoIRKMN2cc17SingleThreadProxyEFvvERKNS_7WeakPtrIS5_EEJEEEvOT_OT0_DpOT1_ + 0x5a rbp = 0x00007ffeeeab9580 rsp = 0x00007ffeeeab9550 rip = 0x000000013409ac9a Found by: previous frame's frame pointer 28 libcc.dylib!__ZN4base8internal7InvokerINS0_9BindStateIMN2cc17SingleThreadProxyEFvvEJNS_7WeakPtrIS4_EEEEEFvvEE7RunImplIRKS6_RKNSt3__15tupleIJS8_EEEJLm0EEEEvOT_OT0_NSF_16integer_sequenceImJXspT1_EEEE + 0x50 rbp = 0x00007ffeeeab95d0 rsp = 0x00007ffeeeab9590 rip = 0x000000013409ac30 Found by: previous frame's frame pointer 29 libcc.dylib!__ZN4base8internal7InvokerINS0_9BindStateIMN2cc17SingleThreadProxyEFvvEJNS_7WeakPtrIS4_EEEEEFvvEE3RunEPNS0_13BindStateBaseE + 0x2c rbp = 0x00007ffeeeab9600 rsp = 0x00007ffeeeab95e0 rip = 0x000000013409ab6c Found by: previous frame's frame pointer 30 libcc.dylib!__ZNKR4base17RepeatingCallbackIFvvEE3RunEv + 0x3d rbp = 0x00007ffeeeab9630 rsp = 0x00007ffeeeab9610 rip = 0x0000000133c98d2d Found by: previous frame's frame pointer 31 libcc.dylib!__ZN4base8internal22CancelableCallbackImplINS_17RepeatingCallbackIFvvEEEE16ForwardRepeatingIJEEEvDpT_ + 0x15 rbp = 0x00007ffeeeab9650 rsp = 0x00007ffeeeab9640 rip = 0x0000000133c98bc5 Found by: previous frame's frame pointer 32 libcc.dylib!__ZN4base8internal13FunctorTraitsIMNS0_22CancelableCallbackImplINS_17RepeatingCallbackIFvvEEEEEFvvEvE6InvokeIS8_RKNS_7WeakPtrIS6_EEJEEEvT_OT0_DpOT1_ + 0x7f rbp = 0x00007ffeeeab96a0 rsp = 0x00007ffeeeab9660 rip = 0x0000000133c98f7f Found by: previous frame's frame pointer 33 
libcc.dylib!__ZN4base8internal12InvokeHelperILb1EvE8MakeItSoIRKMNS0_22CancelableCallbackImplINS_17RepeatingCallbackIFvvEEEEEFvvERKNS_7WeakPtrIS8_EEJEEEvOT_OT0_DpOT1_ + 0x5a rbp = 0x00007ffeeeab96e0 rsp = 0x00007ffeeeab96b0 rip = 0x0000000133c98e9a Found by: previous frame's frame pointer 34 libcc.dylib!__ZN4base8internal7InvokerINS0_9BindStateIMNS0_22CancelableCallbackImplINS_17RepeatingCallbackIFvvEEEEEFvvEJNS_7WeakPtrIS7_EEEEES5_E7RunImplIRKS9_RKNSt3__15tupleIJSB_EEEJLm0EEEEvOT_OT0_NSH_16integer_sequenceImJXspT1_EEEE + 0x50 rbp = 0x00007ffeeeab9730 rsp = 0x00007ffeeeab96f0 rip = 0x0000000133c98e30 Found by: previous frame's frame pointer 35 libcc.dylib!__ZN4base8internal7InvokerINS0_9BindStateIMNS0_22CancelableCallbackImplINS_17RepeatingCallbackIFvvEEEEEFvvEJNS_7WeakPtrIS7_EEEEES5_E3RunEPNS0_13BindStateBaseE + 0x2c rbp = 0x00007ffeeeab9760 rsp = 0x00007ffeeeab9740 rip = 0x0000000133c98d6c Found by: previous frame's frame pointer 36 libaccelerated_widget_mac.dylib!__ZNO4base12OnceCallbackIFvvEE3RunEv + 0x5c rbp = 0x00007ffeeeab97a0 rsp = 0x00007ffeeeab9770 rip = 0x0000000147328dbc Found by: previous frame's frame pointer 37 libaccelerated_widget_mac.dylib!__ZN2ui12_GLOBAL__N_111WrappedTask3RunEv + 0x41 rbp = 0x00007ffeeeab97d0 rsp = 0x00007ffeeeab97b0 rip = 0x0000000147327601 Found by: previous frame's frame pointer 38 libaccelerated_widget_mac.dylib!__ZN4base8internal13FunctorTraitsIMN2ui12_GLOBAL__N_111WrappedTaskEFvvEvE6InvokeIS6_PS4_JEEEvT_OT0_DpOT1_ + 0x7d rbp = 0x00007ffeeeab9820 rsp = 0x00007ffeeeab97e0 rip = 0x000000014732869d Found by: previous frame's frame pointer 39 libaccelerated_widget_mac.dylib!__ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKMN2ui12_GLOBAL__N_111WrappedTaskEFvvEJPS6_EEEvOT_DpOT0_ + 0x44 rbp = 0x00007ffeeeab9860 rsp = 0x00007ffeeeab9830 rip = 0x00000001473285e4 Found by: previous frame's frame pointer 40 
libaccelerated_widget_mac.dylib!__ZN4base8internal7InvokerINS0_9BindStateIMN2ui12_GLOBAL__N_111WrappedTaskEFvvEJNS0_12OwnedWrapperIS5_EEEEEFvvEE7RunImplIRKS7_RKNSt3__15tupleIJS9_EEEJLm0EEEEvOT_OT0_NSG_16integer_sequenceImJXspT1_EEEE + 0x63 rbp = 0x00007ffeeeab98c0 rsp = 0x00007ffeeeab9870 rip = 0x0000000147328573 Found by: previous frame's frame pointer 41 libaccelerated_widget_mac.dylib!__ZN4base8internal7InvokerINS0_9BindStateIMN2ui12_GLOBAL__N_111WrappedTaskEFvvEJNS0_12OwnedWrapperIS5_EEEEEFvvEE3RunEPNS0_13BindStateBaseE + 0x2c rbp = 0x00007ffeeeab98f0 rsp = 0x00007ffeeeab98d0 rip = 0x000000014732846c Found by: previous frame's frame pointer 42 libbase.dylib!__ZNO4base12OnceCallbackIFvvEE3RunEv + 0x5c rbp = 0x00007ffeeeab9930 rsp = 0x00007ffeeeab9900 rip = 0x00000001037b3f5c Found by: previous frame's frame pointer 43 libbase.dylib!__ZN4base5debug13TaskAnnotator7RunTaskEPKcPNS_11PendingTaskE + 0x409 rbp = 0x00007ffeeeab9b10 rsp = 0x00007ffeeeab9940 rip = 0x0000000103810339 Found by: previous frame's frame pointer
,
Oct 1
,
Oct 1
,
Oct 11
gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash is flaky. Findit has detected 3 new flake occurrences of this test. List of all flake occurrences can be found at: https://findit-for-me.appspot.com/flake/occurrences?key=ag9zfmZpbmRpdC1mb3ItbWVyswELEgVGbGFrZSKnAWNocm9taXVtQHRlbGVtZXRyeV9ncHVfaW50ZWdyYXRpb25fdGVzdEBncHVfdGVzdHMuY29udGV4dF9sb3N0X2ludGVncmF0aW9uX3Rlc3QuQ29udGV4dExvc3RJbnRlZ3JhdGlvblRlc3QuR3B1Q3Jhc2hfR1BVUHJvY2Vzc0NyYXNoZXNFeGFjdGx5T25jZVBlclZpc2l0VG9BYm91dEdwdUNyYXNoDA. Since this test is still flaky, this issue has been moved back onto the Sheriff Bug Queue if it's not already there. If the result above is wrong, please file a bug using this link: https://bugs.chromium.org/p/chromium/issues/entry?status=Unconfirmed&labels=Pri-1,Test-Findit-Wrong&components=Tools%3ETest%3EFindit%3EFlakiness&summary=%5BFindit%5D%20Flake%20Detection%20-%20Wrong%20result%20for%20gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash&comment=Link%20to%20flake%20occurrences%3A%20https://findit-for-me.appspot.com/flake/occurrences?key=ag9zfmZpbmRpdC1mb3ItbWVyswELEgVGbGFrZSKnAWNocm9taXVtQHRlbGVtZXRyeV9ncHVfaW50ZWdyYXRpb25fdGVzdEBncHVfdGVzdHMuY29udGV4dF9sb3N0X2ludGVncmF0aW9uX3Rlc3QuQ29udGV4dExvc3RJbnRlZ3JhdGlvblRlc3QuR3B1Q3Jhc2hfR1BVUHJvY2Vzc0NyYXNoZXNFeGFjdGx5T25jZVBlclZpc2l0VG9BYm91dEdwdUNyYXNoDA Automatically posted by the findit-for-me app (https://goo.gl/Ot9f7N).
,
Oct 11
It's now failing on Win. Should it also be marked flaky there?
,
Oct 12
gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash is flaky. Findit has detected 5 new flake occurrences of this test. List of all flake occurrences can be found at: https://findit-for-me.appspot.com/flake/occurrences?key=ag9zfmZpbmRpdC1mb3ItbWVyswELEgVGbGFrZSKnAWNocm9taXVtQHRlbGVtZXRyeV9ncHVfaW50ZWdyYXRpb25fdGVzdEBncHVfdGVzdHMuY29udGV4dF9sb3N0X2ludGVncmF0aW9uX3Rlc3QuQ29udGV4dExvc3RJbnRlZ3JhdGlvblRlc3QuR3B1Q3Jhc2hfR1BVUHJvY2Vzc0NyYXNoZXNFeGFjdGx5T25jZVBlclZpc2l0VG9BYm91dEdwdUNyYXNoDA. Since this test is still flaky, this issue has been moved back onto the Sheriff Bug Queue if it's not already there. If the result above is wrong, please file a bug using this link: https://bugs.chromium.org/p/chromium/issues/entry?status=Unconfirmed&labels=Pri-1,Test-Findit-Wrong&components=Tools%3ETest%3EFindit%3EFlakiness&summary=%5BFindit%5D%20Flake%20Detection%20-%20Wrong%20result%20for%20gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash&comment=Link%20to%20flake%20occurrences%3A%20https://findit-for-me.appspot.com/flake/occurrences?key=ag9zfmZpbmRpdC1mb3ItbWVyswELEgVGbGFrZSKnAWNocm9taXVtQHRlbGVtZXRyeV9ncHVfaW50ZWdyYXRpb25fdGVzdEBncHVfdGVzdHMuY29udGV4dF9sb3N0X2ludGVncmF0aW9uX3Rlc3QuQ29udGV4dExvc3RJbnRlZ3JhdGlvblRlc3QuR3B1Q3Jhc2hfR1BVUHJvY2Vzc0NyYXNoZXNFeGFjdGx5T25jZVBlclZpc2l0VG9BYm91dEdwdUNyYXNoDA Automatically posted by the findit-for-me app (https://goo.gl/Ot9f7N).
,
Oct 12
Sheriff here: should we mark the test as flaky on Win?
,
Oct 12
The failures on Windows have a different root cause than the ones on Mac for which this bug was originally filed.
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/106046
https://chromium-swarm.appspot.com/task?id=4081918c5fdf9c10&refresh=10&show_raw=1
Last event: 588.18e8: Break instruction exception - code 80000003 (first/second chance not available)
debugger time: Fri Oct 12 07:51:05.914 2018 (UTC - 7:00)
ChildEBP RetAddr Args to Child
085fde14 6b053e0d 6dd7a789 00000230 05268a93 chrome_child!base::debug::BreakDebugger+0xc
085fde34 6ab42693 051764a0 6dd7a789 00000230 chrome_child!?Run@?$Invoker@U?$BindState@P6AXPBDHV?$BasicStringPiece@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@base@@1@Z$$V@internal@base@@$$A6AXPBDHV?$BasicStringPiece@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@3@1@Z@internal@base@@SAXPAVBindStateBase@23@PBDH$$QAV?$BasicStringPiece@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@3@2@Z+0x1f
085fe358 6aef006e 051f4d50 051f4d50 00000003 chrome_child!logging::LogMessage::~LogMessage+0x483
085fe438 6969c4f8 00000000 08029348 080187ec chrome_child!gpu::gles2::GLES2Implementation::DeleteShader+0x10e
085fe450 6b3c5ce6 080187ec 00000000 051f4d50 chrome_child!gpu::gles2::GLES2Interface::`vcall'{1400}'+0xc2
085fe534 6b3bd0f5 00c101d0 00000016 00c10000 chrome_child!GrGLGpu::createClearColorProgram+0x736
085fe578 6b3bcc7a 0805db64 3f800000 3f800000 chrome_child!GrGLGpu::clearColorAsDraw+0x25
085fe5bc 6b3c73d9 0805db64 ffffffff 051eccd0 chrome_child!GrGLGpu::clear+0x11a
085fe5d4 6b3cd721 0805db64 ffffffff 0805db40 chrome_child!GrGLGpuRTCommandBuffer::onClear+0x19
085fe5ec 6b3cb3d6 085fe6f8 085fe648 085fe620 chrome_child!GrClearOp::onExecute+0x51
085fe62c 6b3cb278 085fe6f8 00000000 0805d8e8 chrome_child!GrOp::execute+0xa6
085fe698 6b3ad0c6 085fe6f8 07f1b028 07f1b008 chrome_child!GrRenderTargetOpList::onExecute+0x3b8
085fe6bc 6b3ac778 00000000 00000001 085fe6f8 chrome_child!GrDrawingManager::executeOpLists+0x3a6
085ff498 6b3ad2b9 0804a8e0 00000000 00000000 chrome_child!GrDrawingManager::flush+0x768
085ff4bc 6adf8368 0804a8e0 00000000 00000000 chrome_child!GrDrawingManager::prepareSurfaceForExternalIO+0x99
085ff4fc 6ae2981a 00000000 00000000 085ff5a8 chrome_child!GrRenderTargetContext::prepareForExternalIO+0xf8
085ff518 692aef3c 00000000 00000000 085ff538 chrome_child!SkGpuDevice::flushAndSignalSemaphores+0x2a
085ff528 6aec0b22 051e1744 085ff6e8 085ff67c chrome_child!SkSurface::flush+0xc
085ff538 6c71a9bf 6abad8af 051e1744 08062440 chrome_child!viz::ClientResourceProvider::ScopedSkSurface::~ScopedSkSurface+0x12
085ff67c 6c71a27e 085ff6e8 07fc62ed 00000de1 chrome_child!cc::GpuRasterBufferProvider::PlaybackOnWorkerThread+0x63f
085ff720 6c6fe349 07fcdcb0 07fe2454 07fe2464 chrome_child!cc::GpuRasterBufferProvider::RasterBufferImpl::Playback+0xde
085ff878 6c4a53d9 051d7000 051d7024 07fe2400 chrome_child!std::list<std::pair<unsigned __int64 const ,std::vector<cc::DrawImage,std::allocator<cc::DrawImage> > >,std::allocator<std::pair<unsigned __int64 const ,std::vector<cc::DrawImage,std::allocator<cc::DrawImage> > > > >::erase+0x269
Brian, could you please look into this regression? Ganesh has to handle context loss gracefully. Thanks.
James, could you add a temporary flaky expectation for this test on Windows, to be removed when the fix for this in Skia is rolled forward into Chromium? Thanks.
,
Oct 12
,
Oct 12
I don't really understand how this is a Skia issue. Ganesh requires that GrContext::abandonContext be called if the context is lost, in which case we stop making GL calls. It looks like that didn't happen here, unless I'm missing something.
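The sketch below illustrates this contract with a toy model. ToyGrContext and RecordingGL are hypothetical stand-ins; they are not Skia's GrContext or Chromium's GLES2Interface. The only point it makes is that once the context has been abandoned, flushes stop reaching GL.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a GL interface: it only records the calls it receives.
struct RecordingGL {
  std::vector<std::string> calls;
  void Call(const std::string& name) { calls.push_back(name); }
};

// Hypothetical stand-in for a GrContext-like object (not Skia's real API).
// After AbandonContext(), no further work is forwarded to GL.
class ToyGrContext {
 public:
  explicit ToyGrContext(RecordingGL* gl) : gl_(gl) {}

  void AbandonContext() { abandoned_ = true; }
  bool Abandoned() const { return abandoned_; }

  // Issues GL work only while the context has not been abandoned.
  void Flush() {
    if (abandoned_) return;
    gl_->Call("Flush");
  }

 private:
  RecordingGL* gl_;
  bool abandoned_ = false;
};
```

In this model, a flush issued after a context loss that has not yet been followed by AbandonContext() still reaches GL. That is the window the next comments are concerned with.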
,
Oct 12
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0a8d18e3a8a26a37ce6bc0bda923838b1ac3e74b commit 0a8d18e3a8a26a37ce6bc0bda923838b1ac3e74b Author: James Darpinian <jdarpinian@chromium.org> Date: Fri Oct 12 21:26:20 2018 Temporarily mark test flaky until Ganesh fix is made. GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash TBR: kbr@chromium.org Bug: 878258 Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel Change-Id: I5ffe0fc31010f48838288a09b434235aedccb49b Reviewed-on: https://chromium-review.googlesource.com/c/1278812 Reviewed-by: James Darpinian <jdarpinian@chromium.org> Commit-Queue: James Darpinian <jdarpinian@chromium.org> Cr-Commit-Position: refs/heads/master@{#599359} [modify] https://crrev.com/0a8d18e3a8a26a37ce6bc0bda923838b1ac3e74b/content/test/gpu/gpu_tests/context_lost_expectations.py
,
Oct 12
Skia is flushing because a ScopedSkSurface is going out of scope. For this not to have crashed before, I think one of these conditions must have been true:
1) ~ScopedSkSurface ran before the context was lost;
2) ~ScopedSkSurface ran after the context was lost and after GrContext::abandonContext was called; or
3) there was no work queued on the GrContext when ~ScopedSkSurface ran, because something else had flushed the GrContext before the context was lost.
If one of these conditions changed and made this flaky, that points to a Chrome change.
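The three conditions can be modelled with a toy RAII helper that flushes when it goes out of scope. ToyContext and ScopedToySurface are hypothetical. They only mimic the flush-on-destruction pattern and are not the real viz::ClientResourceProvider::ScopedSkSurface. Under these assumptions, GL work reaches a lost context only if work is pending and the context has not been abandoned when the destructor runs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the context state relevant to the three conditions.
struct ToyContext {
  bool lost = false;          // the GPU process has gone away
  bool abandoned = false;     // the rasterizer was told to stop issuing GL
  bool pending_work = false;  // something has been recorded but not flushed
  std::vector<std::string> gl_calls_after_loss;  // calls issued against a lost context

  void Flush() {
    // Condition 2 (abandoned) and condition 3 (nothing queued): no GL calls at all.
    if (abandoned || !pending_work) return;
    // Condition 1 (context still alive) is harmless; only record calls after loss.
    if (lost) gl_calls_after_loss.push_back("Flush");
    pending_work = false;
  }
};

// Hypothetical RAII helper: the destructor flushes whatever was recorded
// while the helper was alive.
class ScopedToySurface {
 public:
  explicit ScopedToySurface(ToyContext* ctx) : ctx_(ctx) {}
  ~ScopedToySurface() { ctx_->Flush(); }
  void Draw() { ctx_->pending_work = true; }

 private:
  ToyContext* ctx_;
};
```

Only the ordering "work recorded, context lost, destructor runs, abandon not yet called" produces a GL call against the lost context.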
,
Oct 12
Chrome's command buffer is supposed to guarantee that even if the context is lost, GL calls made in the renderer process won't crash; they just become no-ops and perhaps generate a CONTEXT_LOST GL error. Unfortunately, since context loss happens asynchronously, it can happen between any two GL calls, though this isn't supposed to break anything: the loss is supposed to be detected eventually, and things are supposed to recover later. I'm having a hard time finding where GLES2Implementation::DeleteShader and its callees log a message that causes the renderer process to crash. Can anyone see where that is happening? Does it actually look like this work is being done after the GLES2Implementation has been torn down? enne, may I assign this to you? It sounds like it's more related to GPU rasterization than to a problem in Skia.
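The sketch below gives a rough picture of that guarantee under stated assumptions. ToyCommandBufferClient is hypothetical, and the real client (GLES2Implementation) is far more involved. Loss is flagged asynchronously, for example from an IPC error handler. After that, calls are dropped and GL_CONTEXT_LOST (0x0507) is reported instead of crashing.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kNoError = 0;           // GL_NO_ERROR
constexpr std::uint32_t kContextLost = 0x0507;  // GL_CONTEXT_LOST

// Hypothetical client-side proxy: GL calls are queued for another process to
// execute, and loss may be signalled at any point between two calls.
class ToyCommandBufferClient {
 public:
  void MarkLost() { lost_.store(true); }  // e.g. called when the GPU channel dies

  void DeleteShader(std::uint32_t id) {
    if (lost_.load()) {
      error_ = kContextLost;  // becomes a no-op instead of crashing
      return;
    }
    commands_.push_back(id);
  }

  // Returns and clears the pending error, like glGetError.
  std::uint32_t GetError() {
    std::uint32_t e = error_;
    error_ = kNoError;
    return e;
  }

  std::size_t NumQueuedCommands() const { return commands_.size(); }

 private:
  std::atomic<bool> lost_{false};
  std::uint32_t error_ = kNoError;
  std::vector<std::uint32_t> commands_;
};
```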
,
Oct 13
gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash is flaky. Findit has detected 3 new flake occurrences of this test. List of all flake occurrences can be found at: https://findit-for-me.appspot.com/flake/occurrences?key=ag9zfmZpbmRpdC1mb3ItbWVyswELEgVGbGFrZSKnAWNocm9taXVtQHRlbGVtZXRyeV9ncHVfaW50ZWdyYXRpb25fdGVzdEBncHVfdGVzdHMuY29udGV4dF9sb3N0X2ludGVncmF0aW9uX3Rlc3QuQ29udGV4dExvc3RJbnRlZ3JhdGlvblRlc3QuR3B1Q3Jhc2hfR1BVUHJvY2Vzc0NyYXNoZXNFeGFjdGx5T25jZVBlclZpc2l0VG9BYm91dEdwdUNyYXNoDA. Since this test is still flaky, this issue has been moved back onto the Sheriff Bug Queue if it's not already there. If the result above is wrong, please file a bug using this link: https://bugs.chromium.org/p/chromium/issues/entry?status=Unconfirmed&labels=Pri-1,Test-Findit-Wrong&components=Tools%3ETest%3EFindit%3EFlakiness&summary=%5BFindit%5D%20Flake%20Detection%20-%20Wrong%20result%20for%20gpu_tests.context_lost_integration_test.ContextLostIntegrationTest.GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash&comment=Link%20to%20flake%20occurrences%3A%20https://findit-for-me.appspot.com/flake/occurrences?key=ag9zfmZpbmRpdC1mb3ItbWVyswELEgVGbGFrZSKnAWNocm9taXVtQHRlbGVtZXRyeV9ncHVfaW50ZWdyYXRpb25fdGVzdEBncHVfdGVzdHMuY29udGV4dF9sb3N0X2ludGVncmF0aW9uX3Rlc3QuQ29udGV4dExvc3RJbnRlZ3JhdGlvblRlc3QuR3B1Q3Jhc2hfR1BVUHJvY2Vzc0NyYXNoZXNFeGFjdGx5T25jZVBlclZpc2l0VG9BYm91dEdwdUNyYXNoDA Automatically posted by the findit-for-me app (https://goo.gl/Ot9f7N).
,
Oct 13
The newly found flakes predate James's suppression.
,
Oct 13
,
Oct 15
Sure, I'll try to take a look when I can.
,
Oct 16
Naively looking at the code, the only obvious crash here (the logged message is probably a DCHECK failure) would be if the shader id were 0 during DeleteShader. The command buffer never generates a zero id (because these are all client-side ids); however, https://cs.chromium.org/chromium/src/third_party/skia/src/gpu/gl/builders/GrGLShaderStringBuilder.cpp?type=cs&sq=package:chromium&g=0&l=161 suspiciously looks like if program compilation fails due to context loss, GrGLCompileAndAttachShader will return a shader id of zero. The GL spec says that DeleteShader(0) just causes a GL error. It seems to me that a DCHECK is overblown here, and the GLES2Implementation::DeleteShaderHelper function will already generate a GL error when it can't find a zero id. I wasn't able to repro this locally on a Win NVIDIA machine, so this is just from reading the code.
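The suspected sequence can be sketched as follows. ToyCompileAndAttachShader, ToyStrictClient and ToyCreateProgram are hypothetical names. Like the comment above, the sketch assumes that the compile helper returns 0 on failure and that the caller deletes the returned id unconditionally. A counter stands in for the DCHECK, which would abort a debug build.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compile helper: returns 0 when compilation fails,
// e.g. because the context was lost.
std::uint32_t ToyCompileAndAttachShader(bool context_lost, std::uint32_t next_id) {
  return context_lost ? 0 : next_id;
}

// Counts how often a DCHECK(shader != 0)-style check would have fired,
// instead of aborting the process as a real debug-build DCHECK would.
struct ToyStrictClient {
  int dcheck_failures = 0;
  void DeleteShader(std::uint32_t shader) {
    if (shader == 0) ++dcheck_failures;  // the real DCHECK would crash here
  }
};

// Caller pattern under suspicion: cleanup deletes the returned id without checking it.
void ToyCreateProgram(ToyStrictClient& gl, bool context_lost) {
  std::uint32_t shader = ToyCompileAndAttachShader(context_lost, /*next_id=*/7);
  // ... attach and link would happen here ...
  gl.DeleteShader(shader);
}
```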
,
Oct 16
Agreed that this DCHECK doesn't belong. The ES spec even says "DeleteShader will silently ignore the value zero.", so it shouldn't even raise a GL error.
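For reference, here is a sketch of a client that follows that rule, assuming a simplified single error flag. Zero is a no-op, and deleting an unknown non-zero name records GL_INVALID_VALUE (0x0501). ToySpecClient is hypothetical and is not the actual GLES2Implementation code.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

constexpr std::uint32_t kNoError = 0;            // GL_NO_ERROR
constexpr std::uint32_t kInvalidValue = 0x0501;  // GL_INVALID_VALUE

// Hypothetical client following the ES rule for DeleteShader.
class ToySpecClient {
 public:
  std::uint32_t CreateShader() {
    std::uint32_t id = next_id_++;
    shaders_.insert(id);
    return id;
  }

  void DeleteShader(std::uint32_t shader) {
    if (shader == 0) return;  // "DeleteShader will silently ignore the value zero."
    // Unknown non-zero names are an error; keep the first unread error only.
    if (shaders_.erase(shader) == 0 && error_ == kNoError) error_ = kInvalidValue;
  }

  // Returns and clears the pending error, like glGetError.
  std::uint32_t GetError() {
    std::uint32_t e = error_;
    error_ = kNoError;
    return e;
  }

 private:
  std::uint32_t next_id_ = 1;  // client-side ids never start at 0
  std::uint32_t error_ = kNoError;
  std::set<std::uint32_t> shaders_;
};
```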
,
Oct 17
This flaky test is making the ANGLE CQ unstable. We should suppress it temporarily. Also, the long-term fix is probably to remove more Chrome-specific tests from the ANGLE CQ.
,
Oct 17
My best guess is that the DCHECK was an attempt to catch potentially incorrect, Chrome-internal code. Which path should we take: update https://cs.chromium.org/chromium/src/third_party/skia/src/gpu/gl/builders/GrGLShaderStringBuilder.cpp?type=cs&sq=package:chromium&g=0&l=160 to check the shader id and not try to delete 0, or update https://cs.chromium.org/chromium/src/gpu/command_buffer/client/gles2_implementation_impl_autogen.h?type=cs&q=GLES2Implementation::DeleteShader&sq=package:chromium&g=0&l=560 and remove the DCHECK? Given the intent of the DCHECK, I have a slight preference for updating Ganesh.
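The first option, fixing the caller in Ganesh, amounts to a guard like the one below. ToyGL and ToyCleanupShaders are hypothetical. This is not the actual GrGLShaderStringBuilder.cpp code, only the shape of the check.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal GL facade used only to observe which deletes are issued.
struct ToyGL {
  std::vector<std::uint32_t> deleted;
  void DeleteShader(std::uint32_t id) { deleted.push_back(id); }
};

// The caller checks each id returned by its compile step and only deletes
// shaders that were actually created, so 0 never reaches the GL client.
void ToyCleanupShaders(ToyGL& gl, const std::vector<std::uint32_t>& shader_ids) {
  for (std::uint32_t id : shader_ids) {
    if (id != 0) gl.DeleteShader(id);
  }
}
```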
,
Oct 17
We can update Ganesh, sure, but we should follow the spec. https://chromium-review.googlesource.com/c/chromium/src/+/1285317 removes the DCHECK and makes shaders consistent with other ids (e.g. textures, buffers). If there is interest, we could replace it with a DLOG, but I don't see much value in that. Client code can DCHECK before calling glDeleteShader if it wants to catch this case.
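A minimal sketch of the behavior this CL moves to, assuming a simplified client that tracks live shader names in a set. `ShaderIdClient` and the `kGl*` constants are hypothetical, not the autogenerated GLES2Implementation code; only the zero rule comes from the ES spec quoted in the thread, and reporting `GL_INVALID_VALUE` for an unknown non-zero name follows the `glDeleteShader` reference page.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

using GLuint = std::uint32_t;
using GLenum = std::uint32_t;

// Same numeric values as GL_NO_ERROR and GL_INVALID_VALUE.
constexpr GLenum kGlNoError = 0;
constexpr GLenum kGlInvalidValue = 0x0501;

// Hypothetical client-side id tracker, not the real GLES2Implementation.
class ShaderIdClient {
 public:
  GLuint CreateShader() {
    GLuint id = next_id_++;
    live_ids_.insert(id);
    return id;
  }

  void DeleteShader(GLuint id) {
    // The ES spec says DeleteShader silently ignores zero: no DCHECK, no error.
    if (id == 0) return;
    // A non-zero name that was never created (or was already deleted)
    // is still an application error.
    if (live_ids_.erase(id) == 0) SetError(kGlInvalidValue);
  }

  // Like glGetError(): returns the recorded error and clears it.
  GLenum GetError() {
    GLenum error = error_;
    error_ = kGlNoError;
    return error;
  }

 private:
  void SetError(GLenum error) {
    if (error_ == kGlNoError) error_ = error;  // keep the first error
  }

  GLuint next_id_ = 1;
  std::unordered_set<GLuint> live_ids_;
  GLenum error_ = kGlNoError;
};
```

A caller that wants the old strictness back can still add its own `DCHECK(id != 0)` before calling `DeleteShader`, which keeps the check in Chrome-internal code rather than in the spec-facing client.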
,
Oct 17
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/382ffc533cb64b0946adfbc617858a6203207d03

commit 382ffc533cb64b0946adfbc617858a6203207d03
Author: Adrienne Walker <enne@chromium.org>
Date: Wed Oct 17 17:54:55 2018

gpu: silently ignore deleting program, shader, sync 0

This is according to the gpu spec, and should fix a crash during context lost.

Bug: 878258
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: Icabb95055de11c7a743144288898428ecd36bc90
Reviewed-on: https://chromium-review.googlesource.com/c/1285317
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: enne <enne@chromium.org>
Cr-Commit-Position: refs/heads/master@{#600468}
[modify] https://crrev.com/382ffc533cb64b0946adfbc617858a6203207d03/gpu/command_buffer/build_cmd_buffer_lib.py
[modify] https://crrev.com/382ffc533cb64b0946adfbc617858a6203207d03/gpu/command_buffer/client/gles2_implementation_impl_autogen.h
[modify] https://crrev.com/382ffc533cb64b0946adfbc617858a6203207d03/gpu/command_buffer/client/gles2_implementation_unittest.cc
,
Oct 17
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f16c4808cd102369384466b1ee4c73193dbf86b7

commit f16c4808cd102369384466b1ee4c73193dbf86b7
Author: Jamie Madill <jmadill@chromium.org>
Date: Wed Oct 17 18:35:19 2018

Upgrade context lost expectation to fail.

GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash

This test was so flaky it was failing on the ANGLE CQ. Also affects Intel and possibly AMD.

Tbr: kbr@chromium.org
Bug: 878258
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I74685596e4162d1d8244ac48f60fcbd12319159d
Reviewed-on: https://chromium-review.googlesource.com/c/1286892
Reviewed-by: Jamie Madill <jmadill@chromium.org>
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Cr-Commit-Position: refs/heads/master@{#600489}
[modify] https://crrev.com/f16c4808cd102369384466b1ee4c73193dbf86b7/content/test/gpu/gpu_tests/context_lost_expectations.py
,
Oct 18
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8bd7e58308cc50ef63e533e699c35fe11cc49582

commit 8bd7e58308cc50ef63e533e699c35fe11cc49582
Author: Adrienne Walker <enne@chromium.org>
Date: Thu Oct 18 17:36:36 2018

Reenable GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash

This was flaky, but should be fixed.

Bug: 878258
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Change-Id: I3e037e41339ddff9d46ec0a7ca7679064a3322e0
Reviewed-on: https://chromium-review.googlesource.com/c/1285526
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: enne <enne@chromium.org>
Cr-Commit-Position: refs/heads/master@{#600813}
[modify] https://crrev.com/8bd7e58308cc50ef63e533e699c35fe11cc49582/content/test/gpu/gpu_tests/context_lost_expectations.py
,
Oct 19
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a337f08fa698a27fd20c104f61e1bfee81ba37fe

commit a337f08fa698a27fd20c104f61e1bfee81ba37fe
Author: Christian Dullweber <dullweber@chromium.org>
Date: Fri Oct 19 10:05:50 2018

Revert "Reenable GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash"

This reverts commit 8bd7e58308cc50ef63e533e699c35fe11cc49582.

Reason for revert: Still flaky :( https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/166516

Original change's description:
> Reenable GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash
>
> This was flaky, but should be fixed.
>
> Bug: 878258
> Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
> Change-Id: I3e037e41339ddff9d46ec0a7ca7679064a3322e0
> Reviewed-on: https://chromium-review.googlesource.com/c/1285526
> Reviewed-by: Kenneth Russell <kbr@chromium.org>
> Commit-Queue: enne <enne@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#600813}

TBR=kbr@chromium.org,enne@chromium.org

Change-Id: I0dd172bd7db86982f9b1bf49e2a57c059958d348
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 878258
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Reviewed-on: https://chromium-review.googlesource.com/c/1290950
Reviewed-by: Christian Dullweber <dullweber@chromium.org>
Commit-Queue: Christian Dullweber <dullweber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#601098}
[modify] https://crrev.com/a337f08fa698a27fd20c104f61e1bfee81ba37fe/content/test/gpu/gpu_tests/context_lost_expectations.py
,
Oct 19
The revert of that re-enable wouldn't have affected this try job. The try job failed on macOS, but the re-enable only affected the Windows platform. There was a previous flaky expectation for this test on macOS.

From the failing shard: https://chromium-swarm.appspot.com/task?id=40a456d46556fe10&refresh=10&show_raw=1

It looks like this DCHECK is the reason the test failed:

[16090:775:1019/015314.549645:FATAL:compositor.cc(618)] Check failed: false.
0 Chromium Framework 0x000000011001ab9f base::debug::StackTrace::StackTrace(unsigned long) + 31
1 Chromium Framework 0x000000010ff1b7af logging::LogMessage::~LogMessage() + 223
2 Chromium Framework 0x000000011267556c ui::Compositor::DidFailToInitializeLayerTreeFrameSink() + 76
3 Chromium Framework 0x0000000111a2bc9f cc::LayerTreeHost::DidFailToInitializeLayerTreeFrameSink() + 159
4 Chromium Framework 0x0000000111a934b0 cc::SingleThreadProxy::SetLayerTreeFrameSink(cc::LayerTreeFrameSink*) + 576
5 Chromium Framework 0x0000000111a2b89b cc::LayerTreeHost::SetLayerTreeFrameSink(std::__1::unique_ptr<cc::LayerTreeFrameSink, std::__1::default_delete<cc::LayerTreeFrameSink> >) + 251
6 Chromium Framework 0x0000000112673503 ui::Compositor::SetLayerTreeFrameSink(std::__1::unique_ptr<cc::LayerTreeFrameSink, std::__1::default_delete<cc::LayerTreeFrameSink> >) + 51
7 Chromium Framework 0x000000010d930dd4 ui::HostContextFactoryPrivate::ConfigureCompositor(ui::Compositor*, scoped_refptr<viz::ContextProvider>, scoped_refptr<viz::RasterContextProvider>) + 1508
8 Chromium Framework 0x000000010d7bd6e4 content::VizProcessTransportFactory::OnEstablishedGpuChannel(base::WeakPtr<ui::Compositor>, scoped_refptr<gpu::GpuChannelHost>) + 100
9 Chromium Framework 0x000000010d7bed8e void base::internal::FunctorTraits<void (content::VizProcessTransportFactory::*)(base::WeakPtr<ui::Compositor>, scoped_refptr<gpu::GpuChannelHost>), void>::Invoke<void (content::VizProcessTransportFactory::*)(base::WeakPtr<ui::Compositor>, scoped_refptr<gpu::GpuChannelHost>), base::WeakPtr<content::VizProcessTransportFactory>, base::WeakPtr<ui::Compositor>, scoped_refptr<gpu::GpuChannelHost> >(void (content::VizProcessTransportFactory::*)(base::WeakPtr<ui::Compositor>, scoped_refptr<gpu::GpuChannelHost>), base::WeakPtr<content::VizProcessTransportFactory>&&, base::WeakPtr<ui::Compositor>&&, scoped_refptr<gpu::GpuChannelHost>&&) + 206
10 Chromium Framework 0x000000010d2ddbed content::BrowserGpuChannelHostFactory::EstablishGpuChannel(base::OnceCallback<void (scoped_refptr<gpu::GpuChannelHost>)>) + 701
11 Chromium Framework 0x000000010d7bd666 content::VizProcessTransportFactory::CreateLayerTreeFrameSink(base::WeakPtr<ui::Compositor>) + 422
12 Chromium Framework 0x0000000112675500 ui::Compositor::RequestNewLayerTreeFrameSink() + 176
13 Chromium Framework 0x0000000111a931cd cc::SingleThreadProxy::RequestNewLayerTreeFrameSink() + 205
14 Chromium Framework 0x0000000111a975f7 base::internal::Invoker<base::internal::BindState<void (cc::SingleThreadProxy::*)(), base::WeakPtr<cc::SingleThreadProxy> >, void ()>::Run(base::internal::BindStateBase*) + 183
15 Chromium Framework 0x000000010c4a626f void base::internal::CancelableCallbackImpl<base::RepeatingCallback<void ()> >::ForwardRepeating<>() + 95
16 Chromium Framework 0x000000010c4a6337 base::internal::Invoker<base::internal::BindState<void (base::internal::CancelableCallbackImpl<base::RepeatingCallback<void ()> >::*)(), base::WeakPtr<base::internal::CancelableCallbackImpl<base::RepeatingCallback<void ()> > > >, void ()>::Run(base::internal::BindStateBase*) + 183
17 Chromium Framework 0x000000010d8d14ba base::OnceCallback<void ()>::Run() && + 106

Again, this sounds like the same issue that danakj@ fixed in https://chromium-review.googlesource.com/1219992 and Issue 882103. Do we know why this is still crashing?

It should be safe to re-land the re-enabling of this test on Windows. I'll try to do that.
,
Oct 19
,
Oct 19
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0d08508e73f47cde5cfb592e16b644a5bc7e9d44

commit 0d08508e73f47cde5cfb592e16b644a5bc7e9d44
Author: Kenneth Russell <kbr@chromium.org>
Date: Fri Oct 19 20:00:26 2018

Reland "Reenable GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash"

This reverts commit a337f08fa698a27fd20c104f61e1bfee81ba37fe.

Reason for revert: this CL only re-enables the test on Windows; the failure seen was on macOS, which is a different platform and had a preexisting flaky expectation for this test. Investigation will continue on the bug into the flakiness on that platform.

Original change's description:
> Revert "Reenable GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash"
>
> This reverts commit 8bd7e58308cc50ef63e533e699c35fe11cc49582.
>
> Reason for revert: Still flaky :( https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/166516
>
> Original change's description:
> > Reenable GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash
> >
> > This was flaky, but should be fixed.
> >
> > Bug: 878258
> > Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
> > Change-Id: I3e037e41339ddff9d46ec0a7ca7679064a3322e0
> > Reviewed-on: https://chromium-review.googlesource.com/c/1285526
> > Reviewed-by: Kenneth Russell <kbr@chromium.org>
> > Commit-Queue: enne <enne@chromium.org>
> > Cr-Commit-Position: refs/heads/master@{#600813}
>
> TBR=kbr@chromium.org,enne@chromium.org
>
> Change-Id: I0dd172bd7db86982f9b1bf49e2a57c059958d348
> No-Presubmit: true
> No-Tree-Checks: true
> No-Try: true
> Bug: 878258
> Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
> Reviewed-on: https://chromium-review.googlesource.com/c/1290950
> Reviewed-by: Christian Dullweber <dullweber@chromium.org>
> Commit-Queue: Christian Dullweber <dullweber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#601098}

TBR=kbr@chromium.org,enne@chromium.org,dullweber@chromium.org

Change-Id: If1ca9d7ae9ef2122150edfa9d965e0a9cd5c442a
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 878258
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Reviewed-on: https://chromium-review.googlesource.com/c/1292431
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#601268}
[modify] https://crrev.com/0d08508e73f47cde5cfb592e16b644a5bc7e9d44/content/test/gpu/gpu_tests/context_lost_expectations.py
,
Oct 19
I think this stack means that we gave the compositor a FrameSink that could not be initialized. It's supposed to be initialized already, since it's all one thread for the UI. Probably an OOP-D issue?
,
Oct 19
OOP-D was enabled via fieldtrial_testing_config.json for Mac on August 27th (see https://crrev.com/c/1191122), so the timeline is right.
,
Oct 19
Thanks Dana and Kyle for figuring out the proximate root cause. Fady, could you please take this since that change seems to have caused this instability? Or reassign to a more appropriate engineer? Thanks.
,
Nov 15
I think kylechar@ or backer@ are more knowledgeable here.
,
Dec 11
kylechar@ could you please look into this and make some progress? This test stresses context loss handling and it's crucial to make it reliable again. Thanks.
,
Dec 11
Another recent trybot failure:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/205513
https://chromium-swarm.appspot.com/task?id=41b7718ea7fcbb10&refresh=10&show_raw=1
Same stack trace as above. This is affecting other Chromium developers' productivity. The flaky suppression hasn't worked; it looks like, due to timing issues, the test fails three times in a row.
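A small sketch of why retry-based suppression can stop working, assuming (as the comment implies) that the harness gives a flaky-marked test up to three attempts. `RunWithRetries` is hypothetical, not the GPU test harness API. If each attempt failed independently with probability p, all three would fail with probability p^3 (for an illustrative p = 0.2, that is 0.008); a timing condition that persists across attempts breaks that independence.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model of a "flaky" expectation: the harness reruns a failing
// test up to max_attempts times and reports success if any attempt passes.
// `attempt` receives the 0-based attempt index and returns true on a pass.
bool RunWithRetries(const std::function<bool(int)>& attempt, int max_attempts) {
  for (int i = 0; i < max_attempts; ++i) {
    if (attempt(i)) return true;
  }
  return false;
}
```

An independent one-off failure (`[](int i) { return i > 0; }`) is absorbed by the retries, while a failure tied to a condition that holds for every attempt still reports a failure after three tries.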
,
Dec 12
Sorry, I didn't see this was reassigned to me. I'll try to reproduce it, but I had a conversation with danakj@ a while ago. If I remember correctly, the issue is that ui::Compositor doesn't correctly handle LayerTreeFrameSink::BindToClient() failing. Either it shouldn't fail in the browser, or ui::Compositor needs to handle the failure gracefully.
,
Dec 12
Right, it shouldn't fail. The GL context is bound before being given to the compositor. Anything else that can fail should also be done ahead of time.
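A minimal sketch of the "do the fallible work before the handoff" idea from the two comments above. `Context`, `FrameSink`, and `CreateFrameSink` are hypothetical stand-ins for the real viz/cc classes: binding happens in the factory, so `BindToClient()` has nothing left that can fail.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for a GL context and a frame sink; not cc/viz classes.
struct Context {
  bool bind_succeeds = true;
  bool BindToCurrentThread() const { return bind_succeeds; }
};

// A frame sink whose BindToClient() has nothing left that can fail:
// the fallible step already happened before construction.
class FrameSink {
 public:
  explicit FrameSink(std::shared_ptr<Context> context)
      : context_(std::move(context)) {}
  void BindToClient() { bound_ = true; }
  bool bound() const { return bound_; }
  const Context& context() const { return *context_; }

 private:
  std::shared_ptr<Context> context_;
  bool bound_ = false;
};

// The factory binds the context up front and returns nullptr on failure, so
// the caller can request a new sink instead of hitting a CHECK later.
std::unique_ptr<FrameSink> CreateFrameSink(std::shared_ptr<Context> context) {
  if (!context->BindToCurrentThread()) return nullptr;
  return std::make_unique<FrameSink>(std::move(context));
}
```

The limitation, which the following comment works through, is that a check performed before the handoff cannot stop the context from being lost afterwards.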
,
Dec 14
I wasn't able to reproduce it locally, but I think I know what's happening after looking at the code.

We check that the worker context hasn't been lost at [1], create the AsyncLayerTreeFrameSink and pass it to ui::Compositor. The ui::Compositor LayerTreeHost calls LayerTreeFrameSink::BindToClient(), which checks whether the worker context is lost again at [2]. If it is lost by the second check, BindToClient() fails.

This is the actual problem: we can't guarantee that the context isn't lost between those two checks. The reason it happens frequently is that with OOP-D the AsyncLayerTreeFrameSink has two different message pipes to the GPU process, mojom::CompositorFrameSink and the GPU channel. When the GPU process dies, the mojom::CompositorFrameSink sees the connection error to the GPU process first and triggers the AsyncLayerTreeFrameSink context loss code. GpuChannelHost hasn't seen its connection error yet, so new context providers get created using the existing GPU channel (for a dead GPU process). A new AsyncLayerTreeFrameSink is created with those context providers, it is given to ui::Compositor, and LTFS::BindToClient() gets called. If GpuChannelHost sees the connection error at the right moment, the worker context will be lost at the second check.

We can handle context loss / GPU process restart in a smarter way with OOP-D to avoid this situation. Right now we do a bunch of wasted work creating new LTFSs with dead context providers.

[1] https://cs.chromium.org/chromium/src/content/browser/compositor/viz_process_transport_factory.cc?l=411&rcl=163958795fc8e798f0959fd41628709f263898c7
[2] https://cs.chromium.org/chromium/src/cc/trees/layer_tree_frame_sink.cc?l=85&rcl=2975f65cb50278f165ad56bcf15afbaf77c15dcd
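A deterministic, single-threaded sketch of the time-of-check/time-of-use window described above. `WorkerContext`, `Outcome`, and `CreateAndBindFrameSink` are hypothetical, and `between_checks` is a hook standing in for the moment the second message pipe (the GPU channel) finally reports the GPU process death.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, single-threaded model of checks [1] and [2].
struct WorkerContext {
  bool lost = false;
};

enum class Outcome { kRejectedByFactory, kBindToClientFailed, kBound };

// `between_checks` stands in for whatever runs between check [1] and
// check [2], e.g. GpuChannelHost finally seeing its connection error.
Outcome CreateAndBindFrameSink(WorkerContext& worker,
                               const std::function<void()>& between_checks) {
  // Check [1]: the factory won't hand over a context that is already lost.
  if (worker.lost) return Outcome::kRejectedByFactory;
  // The frame sink is created and passed to the compositor here.
  between_checks();
  // Check [2]: BindToClient() looks at the same context again; a loss in the
  // window is reported as a bind failure, which ui::Compositor treats as fatal.
  if (worker.lost) return Outcome::kBindToClientFailed;
  return Outcome::kBound;
}
```

Passing `[&] { worker.lost = true; }` yields `kBindToClientFailed` even though check [1] passed, which is why checking earlier can only narrow the window, never close it.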
,
Dec 14
Looks legit. There's no reason for LTFS to check the worker context there as long as it will hear about the loss and respond to it from the observer in another call stack, the same way it would for the compositor context.
,
Jan 8
Issue 919987 has been merged into this issue.
,
Jan 8
Could this bug please be prioritized? Flakes in these tests continue to be noticed by pixel wranglers and sheriffs; see Issue 919987 for an example.
,
Jan 9
Note that in issue 919987 I've seen ui::Compositor::DidFailToInitializeLayerTreeFrameSink() in GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash in builds https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Debug%20%28NVIDIA%29/5113 and https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Debug%20%28NVIDIA%29/5116 However, in https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Debug%20%28NVIDIA%29/5097 it happened either in the first test ContextLost_WebGLBlockedAfterJSNavigation or even before any test is run.
,
Jan 9
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Debug%20%28NVIDIA%29/5123
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Debug%20%28NVIDIA%29/5125
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Retina%20Debug%20%28NVIDIA%29/5128
In 5123 this happened in the 8th test, ContextLost_WorkerRAFAfterGPUCrash.
,
Jan 9
I've got https://crrev.com/c/1403323, which implements the solution suggested by danakj@. I've never actually been able to reproduce the failure locally, so I'll try running the optional Mac GPU trybot to see if it is still flaky. I was unable to come up with a good way to solve the race around recreating browser AsyncLayerTreeFrameSinks. If https://crrev.com/c/1403323 works, then it's probably not necessary.
,
Jan 17
,
Jan 22
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f7f20da9aff457b4ca6e42f670e0e30adc2db38d

commit f7f20da9aff457b4ca6e42f670e0e30adc2db38d
Author: kylechar <kylechar@chromium.org>
Date: Tue Jan 22 20:21:47 2019

Fix race binding LayerTreeFrameSink to client.

ui::Compositor expects that calling LayerTreeFrameSink::BindToClient() will always be successful. However, BindToClient() can fail if the worker context provider has encountered a GL error. Even if we check the worker context provider hasn't encountered an error before passing it to ui::Compositor, it's possible the error happens after the check but before BindToClient() is called.

GpuCrash_GPUProcessCrashesExactlyOncePerVisitToAboutGpuCrash is failing flakily on mac due to this. With OOP-D there are multiple message pipes between the browser and GPU process which all get notified of the GPU process crashing. This sets up the perfect conditions for the race to occur.

Stop checking if the worker context provider has been lost in BindToCurrentThread(). Instead, ensure that observers will always get the OnContextLost() call even if AddObserver() was called after context is lost. We make OnContextLost() call happens in a new callstack to avoid re-entrancy. This should be safe because the posted task has a reference to context provider and we check that the observer is still observing in the posted task.

Bug: 878258
Change-Id: If0db2fead55f86d86892db7a5dc257154590fe98
Reviewed-on: https://chromium-review.googlesource.com/c/1403323
Reviewed-by: Eric Karl <ericrk@chromium.org>
Reviewed-by: Sunny Sachanandani <sunnyps@chromium.org>
Reviewed-by: danakj <danakj@chromium.org>
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: kylechar <kylechar@chromium.org>
Cr-Commit-Position: refs/heads/master@{#624899}
[modify] https://crrev.com/f7f20da9aff457b4ca6e42f670e0e30adc2db38d/cc/trees/layer_tree_frame_sink.cc
[modify] https://crrev.com/f7f20da9aff457b4ca6e42f670e0e30adc2db38d/content/browser/renderer_host/compositor_impl_android.cc
[modify] https://crrev.com/f7f20da9aff457b4ca6e42f670e0e30adc2db38d/content/test/gpu/gpu_tests/context_lost_expectations.py
[modify] https://crrev.com/f7f20da9aff457b4ca6e42f670e0e30adc2db38d/ui/compositor/compositor.cc
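A minimal, single-threaded sketch of the observer guarantee described in this commit message. `TaskQueue`, `ContextProvider`, `MarkContextLost`, and `CountingObserver` are hypothetical stand-ins (the real code posts to a task runner and uses Chromium's context provider classes); the sketch only shows the shape: late observers are notified from a posted task rather than re-entrantly, the task keeps the provider alive, and it skips observers that were removed in the meantime.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <set>
#include <utility>

// Hypothetical stand-in for posting a task to the current thread.
class TaskQueue {
 public:
  void Post(std::function<void()> task) { tasks_.push_back(std::move(task)); }
  void RunAll() {
    while (!tasks_.empty()) {
      auto task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

class ContextLostObserver {
 public:
  virtual ~ContextLostObserver() = default;
  virtual void OnContextLost() = 0;
};

class ContextProvider : public std::enable_shared_from_this<ContextProvider> {
 public:
  explicit ContextProvider(TaskQueue& queue) : queue_(queue) {}

  void AddObserver(ContextLostObserver* obs) {
    observers_.insert(obs);
    if (lost_) {
      // Already lost: notify in a new call stack. The task holds a reference
      // to the provider and re-checks that `obs` is still observing.
      auto self = shared_from_this();
      queue_.Post([self, obs] {
        if (self->observers_.count(obs)) obs->OnContextLost();
      });
    }
  }

  void RemoveObserver(ContextLostObserver* obs) { observers_.erase(obs); }

  void MarkContextLost() {
    if (lost_) return;
    lost_ = true;
    // Iterate over a copy: observers may remove themselves while notified.
    auto snapshot = observers_;
    for (auto* obs : snapshot) {
      if (observers_.count(obs)) obs->OnContextLost();
    }
  }

 private:
  TaskQueue& queue_;
  std::set<ContextLostObserver*> observers_;
  bool lost_ = false;
};

struct CountingObserver : ContextLostObserver {
  int calls = 0;
  void OnContextLost() override { ++calls; }
};
```

As in the description, the posted task cannot protect an observer that is destroyed without calling `RemoveObserver()` first; the sketch assumes observers unregister themselves.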
Comment 1 by tapted@chromium.org, Aug 28
Components: Internals>Services>Viz
Labels: OS-Mac
Owner: kbr@chromium.org
Status: Assigned (was: Untriaged)