Compositor NOTREACHED Crashing Tests |
||||||||
Issue descriptionAbout a third of all mash_browser_test runs on the CQ are failing with this error: [0714/071528.181897:FATAL:compositor.cc(498)] Check failed: false. #0 0x0000037b321c base::debug::StackTrace::StackTrace() #1 0x0000037cb3e1 logging::LogMessage::~LogMessage() #2 0x0000055e6fac ui::Compositor::DidFailToInitializeLayerTreeFrameSink() #3 0x000004ee618f cc::LayerTreeHost::DidFailToInitializeLayerTreeFrameSink() #4 0x000004ef16e8 cc::SingleThreadProxy::SetLayerTreeFrameSink() #5 0x000004ee5cf1 cc::LayerTreeHost::SetLayerTreeFrameSink() #6 0x0000055e6242 ui::Compositor::SetLayerTreeFrameSink() #7 0x0000055e370e aura::MusContextFactory::OnEstablishedGpuChannel() #8 0x0000055e3a9e _ZN4base8internal13FunctorTraitsIMN4aura17MusContextFactoryEFvNS_7WeakPtrIN2ui10CompositorEEE13scoped_refptrIN3gpu14GpuChannelHostEEEvE6InvokeIRKNS4_IS3_EEJRKS7_SB_EEEvSD_OT_DpOT0_ #9 0x00000216a08f ui::Gpu::OnEstablishedGpuChannel() #10 0x00000216a7a9 _ZN4base8internal7InvokerINS0_9BindStateIMN2ui3GpuEFviN4mojo16ScopedHandleBaseINS5_17MessagePipeHandleEEERKN3gpu7GPUInfoEEJNS0_17UnretainedWrapperIS4_EEEEEFviS8_SC_EE3RunEPNS0_13BindStateBaseEOiOS8_SC_ #11 0x0000020ab002 ui::mojom::Gpu_EstablishGpuChannel_ForwardToCallback::Accept() #12 0x000004d64835 mojo::InterfaceEndpointClient::HandleValidatedMessage() #13 0x000004d64416 mojo::FilterChain::Accept() #14 0x000004d65915 mojo::InterfaceEndpointClient::HandleIncomingMessage() #15 0x000004d6c923 mojo::internal::MultiplexRouter::ProcessIncomingMessage() #16 0x000004d6c16f mojo::internal::MultiplexRouter::Accept() #17 0x000004d64416 mojo::FilterChain::Accept() #18 0x000004d63705 mojo::Connector::ReadSingleMessage() #19 0x000004d64001 mojo::Connector::ReadAllAvailableMessages() #20 0x000004d63eb9 mojo::Connector::OnHandleReadyInternal() #21 0x0000019f4185 chromeos::file_system_provider::Int64ToIntCompletionCallback() #22 0x000004d7a904 mojo::SimpleWatcher::OnHandleReady() #23 0x000004d7ad93 _ZN4base8internal7InvokerINS0_9BindStateIMN4mojo13SimpleWatcherEFvijRKNS3_18HandleSignalsStateEEJNS_7WeakPtrIS4_EEijS5_EEEFvvEE7RunImplIRKS9_RKSt5tupleIJSB_ijS5_EEJLm0ELm1ELm2ELm3EEEEvOT_OT0_NS_13IndexSequenceIJXspT1_EEEE #24 0x00000385fc70 base::debug::TaskAnnotator::RunTask() #25 0x0000037d2709 base::MessageLoop::RunTask() #26 0x0000037d29bb base::MessageLoop::DeferOrRunPendingTask() #27 0x0000037d2dc4 base::MessageLoop::DoWork() #28 0x0000037d5289 base::MessagePumpLibevent::Run() #29 0x0000037d233b base::MessageLoop::Run() #30 0x0000037fc66a base::RunLoop::Run() #31 0x0000037a4102 (anonymous namespace)::StartEmbeddedService() #32 0x0000037a493d _ZN4base8internal7InvokerINS0_9BindStateIPFvN4mojo16InterfaceRequestIN15service_manager5mojom7ServiceEEEEJEEES9_E3RunEPNS0_13BindStateBaseEOS8_ #33 0x0000021b4a02 service_manager::RunStandaloneService() #34 0x0000037a3d2e RunMashBrowserTests() #35 0x0000037a3b87 main #36 0x7fcd0d28af45 __libc_start_main #37 0x000000627b3e <unknown> Received signal 6 #0 0x0000037b321c base::debug::StackTrace::StackTrace() #1 0x0000037b2d91 base::debug::(anonymous namespace)::StackDumpSignalHandler() #2 0x7fcd10973330 <unknown> #3 0x7fcd0d29fc37 gsignal #4 0x7fcd0d2a3028 abort #5 0x0000037b1a75 [0714/071528.556034:ERROR:layer_tree_host_impl.cc(2383)] Forcing zero-copy tile initialization as worker context is missing base::debug::BreakDebugger() #6 0x0000037cb7a7 logging::LogMessage::~LogMessage() #7 0x0000055e6fac [0714/071528.570651:ERROR:shell_delegate_mus.cc(53)] Not implemented reached in virtual bool ash::ShellDelegateMus::CanShowWindowForUser(aura::Window *) const ui::Compositor::DidFailToInitializeLayerTreeFrameSink() #8 0x000004ee618f [0714/071528.579527:ERROR:layer_tree_host_impl.cc(2383)] Forcing zero-copy tile initialization as worker context is missing cc::LayerTreeHost::DidFailToInitializeLayerTreeFrameSink() #9 0x000004ef16e8 cc::SingleThreadProxy::SetLayerTreeFrameSink() #10 0x000004ee5cf1 cc::LayerTreeHost::SetLayerTreeFrameSink() #11 0x0000055e6242 ui::Compositor::SetLayerTreeFrameSink() #12 0x0000055e370e aura::MusContextFactory::OnEstablishedGpuChannel() #13 0x0000055e3a9e _ZN4base8internal13FunctorTraitsIMN4aura17MusContextFactoryEFvNS_7WeakPtrIN2ui10CompositorEEE13scoped_refptrIN3gpu14GpuChannelHostEEEvE6InvokeIRKNS4_IS3_EEJRKS7_SB_EEEvSD_OT_DpOT0_ #14 0x00000216a08f ui::Gpu::OnEstablishedGpuChannel() #15 0x00000216a7a9 _ZN4base8internal7InvokerINS0_9BindStateIMN2ui3GpuEFviN4mojo16ScopedHandleBaseINS5_17MessagePipeHandleEEERKN3gpu7GPUInfoEEJNS0_17UnretainedWrapperIS4_EEEEEFviS8_SC_EE3RunEPNS0_13BindStateBaseEOiOS8_SC_ #16 0x0000020ab002 ui::mojom::Gpu_EstablishGpuChannel_ForwardToCallback::Accept() #17 0x000004d64835 mojo::InterfaceEndpointClient::HandleValidatedMessage() #18 0x000004d64416 mojo::FilterChain::Accept() #19 0x000004d65915 mojo::InterfaceEndpointClient::HandleIncomingMessage() #20 0x000004d6c923 mojo::internal::MultiplexRouter::ProcessIncomingMessage() #21 0x000004d6c16f mojo::internal::MultiplexRouter::Accept() #22 0x000004d64416 mojo::FilterChain::Accept() #23 0x000004d63705 mojo::Connector::ReadSingleMessage() #24 0x000004d64001 mojo::Connector::ReadAllAvailableMessages() #25 0x000004d63eb9 mojo::Connector::OnHandleReadyInternal() #26 0x0000019f4185 chromeos::file_system_provider::Int64ToIntCompletionCallback() #27 0x000004d7a904 mojo::SimpleWatcher::OnHandleReady() #28 0x000004d7ad93 _ZN4base8internal7InvokerINS0_9BindStateIMN4mojo13SimpleWatcherEFvijRKNS3_18HandleSignalsStateEEJNS_7WeakPtrIS4_EEijS5_EEEFvvEE7RunImplIRKS9_RKSt5tupleIJSB_ijS5_EEJLm0ELm1ELm2ELm3EEEEvOT_OT0_NS_13IndexSequenceIJXspT1_EEEE #29 0x00000385fc70 base::debug::TaskAnnotator::RunTask() #30 0x0000037d2709 base::MessageLoop::RunTask() #31 0x0000037d29bb base::MessageLoop::DeferOrRunPendingTask() #32 0x0000037d2dc4 base::MessageLoop::DoWork() #33 0x0000037d5289 base::MessagePumpLibevent::Run() #34 0x0000037d233b base::MessageLoop::Run() #35 0x0000037fc66a base::RunLoop::Run() #36 0x0000037a4102 (anonymous namespace)::StartEmbeddedService() #37 0x0000037a493d _ZN4base8internal7InvokerINS0_9BindStateIPFvN4mojo16InterfaceRequestIN15service_manager5mojom7ServiceEEEEJEEES9_E3RunEPNS0_13BindStateBaseEOS8_ #38 0x0000021b4a02 service_manager::RunStandaloneService() #39 0x0000037a3d2e RunMashBrowserTests() #40 0x0000037a3b87 main #41 0x7fcd0d28af45 __libc_start_main #42 0x000000627b3e <unknown> r8: ffffbfb759ef43c8 r9: ffffbfb759ef43b8 r10: 0000000000000008 r11: 0000000000000202 r12: 00007fcd0db4a940 r13: 00007ffc921bf510 r14: 0000000000000043 r15: 00007ffc921bf508 di: 0000000000001fa8 si: 0000000000001fa8 bp: 00007ffc921bf060 bx: 00007ffc921bf070 dx: 0000000000000006 ax: 0000000000000000 cx: ffffffffffffffff sp: 00007ffc921bef28 ip: 00007fcd0d29fc37 efl: 0000000000000202 cgf: 0000000000000033 erf: 0000000000000000 trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
,
Jul 14 2017
,
Jul 14 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b18491dc7383250a16e2256ba451fa70930cb501 commit b18491dc7383250a16e2256ba451fa70930cb501 Author: Jonathan <jonross@chromium.org> Date: Fri Jul 14 16:29:19 2017 Disable Compositor:DidFailToInitializeLayerTreeFrameSink NOTREACHED One third of CQ runs of mash_browser_tests are failing when hitting the NOTREACHED in Compositor:DidFailToInitializeLayerTreeFrameSink so I'm disabling it while the cause is researched. TBR=sadrul@chromium.org TEST: mash_browser_tests Bug: 742966 , 743028 Change-Id: I9363ebfbdeac594d26696b5426004cbafcec9c9b Reviewed-on: https://chromium-review.googlesource.com/571274 Reviewed-by: Jonathan Ross <jonross@chromium.org> Commit-Queue: Jonathan Ross <jonross@chromium.org> Cr-Commit-Position: refs/heads/master@{#486771} [modify] https://crrev.com/b18491dc7383250a16e2256ba451fa70930cb501/ui/compositor/compositor.cc
,
Jul 14 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1482be49c36f85a5de865899a5de759489112c7c commit 1482be49c36f85a5de865899a5de759489112c7c Author: Jonathan Ross <jonross@chromium.org> Date: Fri Jul 14 18:08:49 2017 Revert "Disable Compositor:DidFailToInitializeLayerTreeFrameSink NOTREACHED" This reverts commit b18491dc7383250a16e2256ba451fa70930cb501. Reason for revert: Viz currently does not support the mash configuration. Due to this the compositor cannot be used in mash_browser_tests. The tests have been disabled on the waterfall. See crbug.com/742966 Original change's description: > Disable Compositor:DidFailToInitializeLayerTreeFrameSink NOTREACHED > > One third of CQ runs of mash_browser_tests are failing when hitting the > NOTREACHED in Compositor:DidFailToInitializeLayerTreeFrameSink so I'm disabling > it while the cause is researched. > > TBR=sadrul@chromium.org > > TEST: mash_browser_tests > Bug: 742966 , 743028 > Change-Id: I9363ebfbdeac594d26696b5426004cbafcec9c9b > Reviewed-on: https://chromium-review.googlesource.com/571274 > Reviewed-by: Jonathan Ross <jonross@chromium.org> > Commit-Queue: Jonathan Ross <jonross@chromium.org> > Cr-Commit-Position: refs/heads/master@{#486771} TBR=sadrul@chromium.org,jonross@chromium.org Change-Id: Ic7eb38c23b6330dc27d5fddb0e546b83ec08b6d9 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 742966 , 743028 Reviewed-on: https://chromium-review.googlesource.com/572003 Reviewed-by: Jonathan Ross <jonross@chromium.org> Commit-Queue: Jonathan Ross <jonross@chromium.org> Cr-Commit-Position: refs/heads/master@{#486808} [modify] https://crrev.com/1482be49c36f85a5de865899a5de759489112c7c/ui/compositor/compositor.cc
,
Jul 14 2017
+danakj@ & piman@ FYI So the compositor init has derived from that used in --mash with fsamuel@ noting that aura::MusContextFactory::OnEstablishedGpuChannel is no longer supported. With mash_browser_tests disabled on the CQ this is no longer blocking the landing of unrelated work, so I'm reducing the priority. I'm taking this to do an investigation into the causes. It appears to be that InProcessContextProvider::BindToCurrentThread() is failing. Particularly: InProcessContextProvider::BindToCurrentThread() LayerTreeFrameSink::BindToClient LayerTreeHostImpl::InitializeRenderer SingleThreadProxy::SetLayerTreeFrameSink Where once this fails SingleThreadProxy calls layer_tree_host_->DidFailToInitializeLayerTreeFrameSink(); I'll update this with findings. inb4 another sync mojom somewhere causing a race.
,
Jul 17 2017
Trying to repro with logs in: https://chromium-review.googlesource.com/c/575387/ A repro in this run: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.linux%2Flinux_chromium_chromeos_ozone_rel_ng%2F429475%2F%2B%2Frecipes%2Fsteps%2Fmash_browser_tests__with_patch_%2F0%2Fstdout Showing an error at initializing the GLES2CmdHelper: https://cs.chromium.org/chromium/src/services/ui/public/cpp/gpu/context_provider_command_buffer.cc?rcl=4b0a0252e2b47fa1a23a190184249af643a03a9f&l=270 Will debug more tomorrow
,
Jul 18 2017
Also cases of the GLES2Impl not initializing. https://cs.chromium.org/chromium/src/services/ui/public/cpp/gpu/context_provider_command_buffer.cc?rcl=cbb5082ca4707658874e42bb45e0327a0187c0ab&l=291 I'll look into finding where they each fail
,
Jul 18 2017
This is occurring because the compositors connection to the GPU process has closed. The closure is occurring while the compositor is still initializing. This is an expected potential outcome if the service manager is tearing down all connections, and the compositor has not completed its initialization yet. Once the compositor resumes code execution it will attempt to complete init before it has received any OnConnectionError In the particular case of the GLES2Impl the TransferBuffer cannot be initialized. This does signify that the compositor service is in a fatal state, and cannot continue. However it should be attempting to closed down cleanly. Rather than being a NOTREACHED, ui::Compositor::DidFailToInitializeLayerTreeFrameSink() should be attempting to close the ServiceContext::QuitNow, to cleanly close out. mash_browser_tests which as very short running have been known to expose these startup/shutdown conflicts.
,
Jul 18 2017
Initialization should happen when ContextProvider::BindToCurrentThread happens, which is done by GpuProcessTransportFactory before giving the context to compositor. That's why it is currently a NOTREACHED, cuz we already did init. Are you saying that more initialization is happening later? After init, if it fails for some reason, we should treat it as context lost instead. Is it being lost before cc gets it and it treats it as fail to init rather than loss?
,
Jul 18 2017
I have not looked into GpuProcessTransportFactory performing init.
The particular error I'm seeing is within ContextProvider::BindToCurrentThread, where the TransferBuffer is failing to init. I haven't dove deeper into that.
TransferBufferInterface::Initialize
GLES2Implementation::Initialize
ContextProviderCommandBuffer::BindToCurrentThread
LayerTreeFrameSink::BindToClient
LayerTreeHostImpl::InitializeRenderer
SingleThreadProxy::SetLayerTreeFrameSink
cc::SingleThreadProxy::SetLayerTreeFrameSink
cc::LayerTreeHost::SetLayerTreeFrameSink
ui::Compositor::SetLayerTreeFrameSink
aura::MusContextFactory::OnEstablishedGpuChannel
-Mojo interface stuff-
-MessageLoop stuff-
The BindToCurrentThread failure does lead to the ui::Compositor::DidFailToInitializeLayerTreeFrameSink being called.
As for my comments on initialization happening later. I don't mean that there is a second running of init. When looking at other services I have seen that initialization can be scheduled in such a way that service manager shutdown can occur. For example:
- Service::OnStart called
- Service begins to run its init code
- Service is interrupted from its run by the scheduler
- other process code runs
- ServiceManager is closed
- ServiceManager sends onConnectionError messages
- ServiceManager closes mojo pipes
- Service gets rescheduled resuming init
- but now the mojo peers are gone
- the onConnectionError is waiting to be processed later in the message loop
I can begin looking into what GpuProcessTransportFactory is doing when this occurs
,
Jul 18 2017
The context is supposed to already be bound before being given to ui::Compositor, since it is on the same thread. That happens here: https://cs.chromium.org/chromium/src/content/browser/compositor/gpu_process_transport_factory.cc?rcl=4696ce581cf84d109c1cea8e91e976ada38997cc&l=464 So cc's call to it should just be an early out with true.
,
Jul 18 2017
Maybe MusContextFactory or ClientLayerTreeFrameSink should BindToCurrentThread() on the context provider first?
,
Jul 18 2017
I'll look into updating MusContextFactory::OnEstablishedGpuChannel to perform ContextProvider::BindToCurrentThread earlier. As well as to port in the error handling which GpuProcessTransportFactory has for binding failures.
,
Jul 18 2017
yes, sounds like that's the issue. Thanks!
,
Jul 18 2017
RenderThreadImpl also creates ui::ContextProviderCommandBuffer [1]. It would be nice to have this code use ui::Gpu::CreateContextProvider() instead, have CreateContextProvider() do BindToCurrentThread() (and return nullptr on failure).
,
Jul 18 2017
That sounds like a good idea. What about the GpuProcessTransportFactory usage of CreateContextCommon? Similar pattern there, or do we think that is safe? https://cs.chromium.org/chromium/src/content/browser/compositor/gpu_process_transport_factory.cc?rcl=4696ce581cf84d109c1cea8e91e976ada38997cc&l=449
,
Jul 18 2017
RenderThreadImpl is different, as the context it gets is not bound on the same thread.
,
Jul 18 2017
GpuProcessTransportFactory does not use ui::Gpu, and we do not use GPTF when running with mus (--mus or --mash). So I am not sure we can easily share error checking code from there.
,
Jul 18 2017
In RenderThreadImpl, all three callsites for CreateContextCommon() seem to immediately BindToCurrentThread() after creation (and are called on main-thread). [1] https://cs.chromium.org/chromium/src/content/renderer/render_thread_impl.cc?l=1446 [2] https://cs.chromium.org/chromium/src/content/renderer/render_thread_impl.cc?l=1500 [3] https://cs.chromium.org/chromium/src/content/renderer/render_thread_impl.cc?l=2419
,
Jul 18 2017
The shared context is bound, the media context is bound, and the worker context is bound there. They are either used in blink or are shared between compositors and thats how we make it work. Just watch out for the compositor context. https://cs.chromium.org/chromium/src/content/renderer/render_thread_impl.cc?rcl=eb728dc805a10d6a216b740bc9ece7b9443feded&l=2057
,
Jul 18 2017
So those contexts can be left. We'd want both ContextProviders created in RenderThreadImpl::RequestNewLayerTreeFrameSink to bound then? The one mentioned in #21 And the one passed to the RendererWindowTreeClient: https://cs.chromium.org/chromium/src/content/renderer/render_thread_impl.cc?rcl=eb728dc805a10d6a216b740bc9ece7b9443feded&l=1963
,
Jul 19 2017
The one mentioned in #21 should /not/ be bound on the main thread. The others mentioned by #20 all are. The one passed to RendererWindowTreeClient is the compositor context (it's passed as it here https://cs.chromium.org/chromium/src/content/renderer/mus/renderer_window_tree_client.cc?rcl=352ac7aa0c0eebcce1ec493a6716f210a4f7d27b&l=88), so same rule as #21.
,
Jul 20 2017
Thanks for those clarifications. I've worked to address the MusContextFactory in: https://chromium-review.googlesource.com/c/577771 I'll follow up on this bug once I look into testing this more directly.
,
Jul 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1902f8987f618e2f32801ee0c87d5875c2d62978 commit 1902f8987f618e2f32801ee0c87d5875c2d62978 Author: Jonathan <jonross@chromium.org> Date: Thu Jul 20 15:05:08 2017 Update MusContextFactory ContextProvider Binding Update MusContextFactory to force the new ContextProvider to BindToCurrentThread immediately. This way we can detect errors in the mojo pipe and exit cleanly. Rather than attempting to bind later on in the initialization process of the LayerTreeFrameSink. TEST=mash_browser_tests Bug: 743028 Change-Id: I3806f779360d70df5b8e974e9f49c0c5f634b465 Reviewed-on: https://chromium-review.googlesource.com/577771 Commit-Queue: Jonathan Ross <jonross@chromium.org> Reviewed-by: Sadrul Chowdhury <sadrul@chromium.org> Cr-Commit-Position: refs/heads/master@{#488248} [modify] https://crrev.com/1902f8987f618e2f32801ee0c87d5875c2d62978/ui/aura/mus/mus_context_factory.cc
,
Jul 31 2017
sadrul@ mentioned taking a look at how GpuProcessTransportFactory tests similar functionality. However there are no direct tests of the class. I'd be open to thoughts on creating proper unittests for MusContextFactory, or if we are satisfied with the mash_browser_tests exercising it
,
Aug 30 2017
,
Feb 26 2018
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by jonr...@chromium.org
, Jul 14 2017