New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 743028 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 742966



Sign in to add a comment

Compositor NOTREACHED Crashing Tests

Project Member Reported by jonr...@chromium.org, Jul 14 2017

Issue description

About a third of all mash_browser_test runs on the CQ are failing with this error:

[0714/071528.181897:FATAL:compositor.cc(498)] Check failed: false. 
#0 0x0000037b321c base::debug::StackTrace::StackTrace()
#1 0x0000037cb3e1 logging::LogMessage::~LogMessage()
#2 0x0000055e6fac ui::Compositor::DidFailToInitializeLayerTreeFrameSink()
#3 0x000004ee618f cc::LayerTreeHost::DidFailToInitializeLayerTreeFrameSink()
#4 0x000004ef16e8 cc::SingleThreadProxy::SetLayerTreeFrameSink()
#5 0x000004ee5cf1 cc::LayerTreeHost::SetLayerTreeFrameSink()
#6 0x0000055e6242 ui::Compositor::SetLayerTreeFrameSink()
#7 0x0000055e370e aura::MusContextFactory::OnEstablishedGpuChannel()
#8 0x0000055e3a9e _ZN4base8internal13FunctorTraitsIMN4aura17MusContextFactoryEFvNS_7WeakPtrIN2ui10CompositorEEE13scoped_refptrIN3gpu14GpuChannelHostEEEvE6InvokeIRKNS4_IS3_EEJRKS7_SB_EEEvSD_OT_DpOT0_
#9 0x00000216a08f ui::Gpu::OnEstablishedGpuChannel()
#10 0x00000216a7a9 _ZN4base8internal7InvokerINS0_9BindStateIMN2ui3GpuEFviN4mojo16ScopedHandleBaseINS5_17MessagePipeHandleEEERKN3gpu7GPUInfoEEJNS0_17UnretainedWrapperIS4_EEEEEFviS8_SC_EE3RunEPNS0_13BindStateBaseEOiOS8_SC_
#11 0x0000020ab002 ui::mojom::Gpu_EstablishGpuChannel_ForwardToCallback::Accept()
#12 0x000004d64835 mojo::InterfaceEndpointClient::HandleValidatedMessage()
#13 0x000004d64416 mojo::FilterChain::Accept()
#14 0x000004d65915 mojo::InterfaceEndpointClient::HandleIncomingMessage()
#15 0x000004d6c923 mojo::internal::MultiplexRouter::ProcessIncomingMessage()
#16 0x000004d6c16f mojo::internal::MultiplexRouter::Accept()
#17 0x000004d64416 mojo::FilterChain::Accept()
#18 0x000004d63705 mojo::Connector::ReadSingleMessage()
#19 0x000004d64001 mojo::Connector::ReadAllAvailableMessages()
#20 0x000004d63eb9 mojo::Connector::OnHandleReadyInternal()
#21 0x0000019f4185 chromeos::file_system_provider::Int64ToIntCompletionCallback()
#22 0x000004d7a904 mojo::SimpleWatcher::OnHandleReady()
#23 0x000004d7ad93 _ZN4base8internal7InvokerINS0_9BindStateIMN4mojo13SimpleWatcherEFvijRKNS3_18HandleSignalsStateEEJNS_7WeakPtrIS4_EEijS5_EEEFvvEE7RunImplIRKS9_RKSt5tupleIJSB_ijS5_EEJLm0ELm1ELm2ELm3EEEEvOT_OT0_NS_13IndexSequenceIJXspT1_EEEE
#24 0x00000385fc70 base::debug::TaskAnnotator::RunTask()
#25 0x0000037d2709 base::MessageLoop::RunTask()
#26 0x0000037d29bb base::MessageLoop::DeferOrRunPendingTask()
#27 0x0000037d2dc4 base::MessageLoop::DoWork()
#28 0x0000037d5289 base::MessagePumpLibevent::Run()
#29 0x0000037d233b base::MessageLoop::Run()
#30 0x0000037fc66a base::RunLoop::Run()
#31 0x0000037a4102 (anonymous namespace)::StartEmbeddedService()
#32 0x0000037a493d _ZN4base8internal7InvokerINS0_9BindStateIPFvN4mojo16InterfaceRequestIN15service_manager5mojom7ServiceEEEEJEEES9_E3RunEPNS0_13BindStateBaseEOS8_
#33 0x0000021b4a02 service_manager::RunStandaloneService()
#34 0x0000037a3d2e RunMashBrowserTests()
#35 0x0000037a3b87 main
#36 0x7fcd0d28af45 __libc_start_main
#37 0x000000627b3e <unknown>

Received signal 6
#0 0x0000037b321c base::debug::StackTrace::StackTrace()
#1 0x0000037b2d91 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#2 0x7fcd10973330 <unknown>
#3 0x7fcd0d29fc37 gsignal
#4 0x7fcd0d2a3028 abort
#5 0x0000037b1a75 [0714/071528.556034:ERROR:layer_tree_host_impl.cc(2383)] Forcing zero-copy tile initialization as worker context is missing
base::debug::BreakDebugger()
#6 0x0000037cb7a7 logging::LogMessage::~LogMessage()
#7 0x0000055e6fac [0714/071528.570651:ERROR:shell_delegate_mus.cc(53)] Not implemented reached in virtual bool ash::ShellDelegateMus::CanShowWindowForUser(aura::Window *) const
ui::Compositor::DidFailToInitializeLayerTreeFrameSink()
#8 0x000004ee618f [0714/071528.579527:ERROR:layer_tree_host_impl.cc(2383)] Forcing zero-copy tile initialization as worker context is missing
cc::LayerTreeHost::DidFailToInitializeLayerTreeFrameSink()
#9 0x000004ef16e8 cc::SingleThreadProxy::SetLayerTreeFrameSink()
#10 0x000004ee5cf1 cc::LayerTreeHost::SetLayerTreeFrameSink()
#11 0x0000055e6242 ui::Compositor::SetLayerTreeFrameSink()
#12 0x0000055e370e aura::MusContextFactory::OnEstablishedGpuChannel()
#13 0x0000055e3a9e _ZN4base8internal13FunctorTraitsIMN4aura17MusContextFactoryEFvNS_7WeakPtrIN2ui10CompositorEEE13scoped_refptrIN3gpu14GpuChannelHostEEEvE6InvokeIRKNS4_IS3_EEJRKS7_SB_EEEvSD_OT_DpOT0_
#14 0x00000216a08f ui::Gpu::OnEstablishedGpuChannel()
#15 0x00000216a7a9 _ZN4base8internal7InvokerINS0_9BindStateIMN2ui3GpuEFviN4mojo16ScopedHandleBaseINS5_17MessagePipeHandleEEERKN3gpu7GPUInfoEEJNS0_17UnretainedWrapperIS4_EEEEEFviS8_SC_EE3RunEPNS0_13BindStateBaseEOiOS8_SC_
#16 0x0000020ab002 ui::mojom::Gpu_EstablishGpuChannel_ForwardToCallback::Accept()
#17 0x000004d64835 mojo::InterfaceEndpointClient::HandleValidatedMessage()
#18 0x000004d64416 mojo::FilterChain::Accept()
#19 0x000004d65915 mojo::InterfaceEndpointClient::HandleIncomingMessage()
#20 0x000004d6c923 mojo::internal::MultiplexRouter::ProcessIncomingMessage()
#21 0x000004d6c16f mojo::internal::MultiplexRouter::Accept()
#22 0x000004d64416 mojo::FilterChain::Accept()
#23 0x000004d63705 mojo::Connector::ReadSingleMessage()
#24 0x000004d64001 mojo::Connector::ReadAllAvailableMessages()
#25 0x000004d63eb9 mojo::Connector::OnHandleReadyInternal()
#26 0x0000019f4185 chromeos::file_system_provider::Int64ToIntCompletionCallback()
#27 0x000004d7a904 mojo::SimpleWatcher::OnHandleReady()
#28 0x000004d7ad93 _ZN4base8internal7InvokerINS0_9BindStateIMN4mojo13SimpleWatcherEFvijRKNS3_18HandleSignalsStateEEJNS_7WeakPtrIS4_EEijS5_EEEFvvEE7RunImplIRKS9_RKSt5tupleIJSB_ijS5_EEJLm0ELm1ELm2ELm3EEEEvOT_OT0_NS_13IndexSequenceIJXspT1_EEEE
#29 0x00000385fc70 base::debug::TaskAnnotator::RunTask()
#30 0x0000037d2709 base::MessageLoop::RunTask()
#31 0x0000037d29bb base::MessageLoop::DeferOrRunPendingTask()
#32 0x0000037d2dc4 base::MessageLoop::DoWork()
#33 0x0000037d5289 base::MessagePumpLibevent::Run()
#34 0x0000037d233b base::MessageLoop::Run()
#35 0x0000037fc66a base::RunLoop::Run()
#36 0x0000037a4102 (anonymous namespace)::StartEmbeddedService()
#37 0x0000037a493d _ZN4base8internal7InvokerINS0_9BindStateIPFvN4mojo16InterfaceRequestIN15service_manager5mojom7ServiceEEEEJEEES9_E3RunEPNS0_13BindStateBaseEOS8_
#38 0x0000021b4a02 service_manager::RunStandaloneService()
#39 0x0000037a3d2e RunMashBrowserTests()
#40 0x0000037a3b87 main
#41 0x7fcd0d28af45 __libc_start_main
#42 0x000000627b3e <unknown>
  r8: ffffbfb759ef43c8  r9: ffffbfb759ef43b8 r10: 0000000000000008 r11: 0000000000000202
 r12: 00007fcd0db4a940 r13: 00007ffc921bf510 r14: 0000000000000043 r15: 00007ffc921bf508
  di: 0000000000001fa8  si: 0000000000001fa8  bp: 00007ffc921bf060  bx: 00007ffc921bf070
  dx: 0000000000000006  ax: 0000000000000000  cx: ffffffffffffffff  sp: 00007ffc921bef28
  ip: 00007fcd0d29fc37 efl: 0000000000000202 cgf: 0000000000000033 erf: 0000000000000000
 trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
 
Blocking: 742966
Cc: kylec...@chromium.org
Project Member

Comment 3 by bugdroid1@chromium.org, Jul 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b18491dc7383250a16e2256ba451fa70930cb501

commit b18491dc7383250a16e2256ba451fa70930cb501
Author: Jonathan <jonross@chromium.org>
Date: Fri Jul 14 16:29:19 2017

Disable Compositor:DidFailToInitializeLayerTreeFrameSink NOTREACHED

One third of CQ runs of mash_browser_tests are failing when hitting the
NOTREACHED in Compositor:DidFailToInitializeLayerTreeFrameSink so I'm disabling
it while the cause is researched.

TBR=sadrul@chromium.org

TEST: mash_browser_tests
Bug:  742966 ,  743028 
Change-Id: I9363ebfbdeac594d26696b5426004cbafcec9c9b
Reviewed-on: https://chromium-review.googlesource.com/571274
Reviewed-by: Jonathan Ross <jonross@chromium.org>
Commit-Queue: Jonathan Ross <jonross@chromium.org>
Cr-Commit-Position: refs/heads/master@{#486771}
[modify] https://crrev.com/b18491dc7383250a16e2256ba451fa70930cb501/ui/compositor/compositor.cc

Project Member

Comment 4 by bugdroid1@chromium.org, Jul 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1482be49c36f85a5de865899a5de759489112c7c

commit 1482be49c36f85a5de865899a5de759489112c7c
Author: Jonathan Ross <jonross@chromium.org>
Date: Fri Jul 14 18:08:49 2017

Revert "Disable Compositor:DidFailToInitializeLayerTreeFrameSink NOTREACHED"

This reverts commit b18491dc7383250a16e2256ba451fa70930cb501.

Reason for revert:  Viz currently does not support the mash configuration. Due to this the compositor cannot be used in mash_browser_tests. The tests have been disabled on the waterfall. See  crbug.com/742966 

Original change's description:
> Disable Compositor:DidFailToInitializeLayerTreeFrameSink NOTREACHED
> 
> One third of CQ runs of mash_browser_tests are failing when hitting the
> NOTREACHED in Compositor:DidFailToInitializeLayerTreeFrameSink so I'm disabling
> it while the cause is researched.
> 
> TBR=sadrul@chromium.org
> 
> TEST: mash_browser_tests
> Bug:  742966 ,  743028 
> Change-Id: I9363ebfbdeac594d26696b5426004cbafcec9c9b
> Reviewed-on: https://chromium-review.googlesource.com/571274
> Reviewed-by: Jonathan Ross <jonross@chromium.org>
> Commit-Queue: Jonathan Ross <jonross@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#486771}

TBR=sadrul@chromium.org,jonross@chromium.org

Change-Id: Ic7eb38c23b6330dc27d5fddb0e546b83ec08b6d9
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  742966 ,  743028 
Reviewed-on: https://chromium-review.googlesource.com/572003
Reviewed-by: Jonathan Ross <jonross@chromium.org>
Commit-Queue: Jonathan Ross <jonross@chromium.org>
Cr-Commit-Position: refs/heads/master@{#486808}
[modify] https://crrev.com/1482be49c36f85a5de865899a5de759489112c7c/ui/compositor/compositor.cc

Cc: piman@chromium.org danakj@chromium.org fsam...@chromium.org
Labels: -Pri-0 Pri-2
Owner: jonr...@chromium.org
+danakj@ & piman@ FYI

So the compositor init has derived from that used in --mash with fsamuel@ noting that aura::MusContextFactory::OnEstablishedGpuChannel is no longer supported.

With mash_browser_tests disabled on the CQ this is no longer blocking the landing of unrelated work, so I'm reducing the priority. I'm taking this to do an investigation into the causes.

It appears to be that InProcessContextProvider::BindToCurrentThread() is failing. Particularly:

InProcessContextProvider::BindToCurrentThread()
LayerTreeFrameSink::BindToClient
LayerTreeHostImpl::InitializeRenderer
SingleThreadProxy::SetLayerTreeFrameSink

Where once this fails SingleThreadProxy calls layer_tree_host_->DidFailToInitializeLayerTreeFrameSink();

I'll update this with findings.

inb4 another sync mojom somewhere causing a race.
Also cases of the GLES2Impl not initializing.

https://cs.chromium.org/chromium/src/services/ui/public/cpp/gpu/context_provider_command_buffer.cc?rcl=cbb5082ca4707658874e42bb45e0327a0187c0ab&l=291

I'll look into finding where they each fail
Status: Available (was: Assigned)
This is occurring because the compositors connection to the GPU process has closed. The closure is occurring while the compositor is still initializing.

This is an expected potential outcome if the service manager is tearing down all connections, and the compositor has not completed its initialization yet. Once the compositor resumes code execution it will attempt to complete init before it has received any OnConnectionError

In the particular case of the GLES2Impl the TransferBuffer cannot be initialized.

This does signify that the compositor service is in a fatal state, and cannot continue. However it should be attempting to closed down cleanly.

Rather than being a NOTREACHED, ui::Compositor::DidFailToInitializeLayerTreeFrameSink() should be attempting to close the ServiceContext::QuitNow, to cleanly close out.

mash_browser_tests which as very short running have been known to expose these startup/shutdown conflicts.

Comment 9 by danakj@chromium.org, Jul 18 2017

Initialization should happen when ContextProvider::BindToCurrentThread happens, which is done by GpuProcessTransportFactory before giving the context to compositor. That's why it is currently a NOTREACHED, cuz we already did init. Are you saying that more initialization is happening later?

After init, if it fails for some reason, we should treat it as context lost instead. Is it being lost before cc gets it and it treats it as fail to init rather than loss?
I have not looked into GpuProcessTransportFactory performing init.

The particular error I'm seeing is within ContextProvider::BindToCurrentThread, where the TransferBuffer is failing to init. I haven't dove deeper into that.

TransferBufferInterface::Initialize
GLES2Implementation::Initialize
ContextProviderCommandBuffer::BindToCurrentThread
LayerTreeFrameSink::BindToClient
LayerTreeHostImpl::InitializeRenderer
SingleThreadProxy::SetLayerTreeFrameSink
cc::SingleThreadProxy::SetLayerTreeFrameSink
cc::LayerTreeHost::SetLayerTreeFrameSink
ui::Compositor::SetLayerTreeFrameSink
aura::MusContextFactory::OnEstablishedGpuChannel
-Mojo interface stuff-
-MessageLoop stuff-

The BindToCurrentThread failure does lead to the ui::Compositor::DidFailToInitializeLayerTreeFrameSink being called.

As for my comments on initialization happening later. I don't mean that there is a second running of init. When looking at other services I have seen that initialization can be scheduled in such a way that service manager shutdown can occur. For example:
      - Service::OnStart called
      - Service begins to run its init code
      - Service is interrupted from its run by the scheduler
      - other process code runs
      - ServiceManager is closed
      - ServiceManager sends onConnectionError messages
      - ServiceManager closes mojo pipes
      - Service gets rescheduled resuming init
             - but now the mojo peers are gone
             - the onConnectionError is waiting to be processed later in the message loop

I can begin looking into what GpuProcessTransportFactory is doing when this occurs
The context is supposed to already be bound before being given to ui::Compositor, since it is on the same thread. That happens here: https://cs.chromium.org/chromium/src/content/browser/compositor/gpu_process_transport_factory.cc?rcl=4696ce581cf84d109c1cea8e91e976ada38997cc&l=464

So cc's call to it should just be an early out with true.
Maybe MusContextFactory or ClientLayerTreeFrameSink should BindToCurrentThread() on the context provider first?
Status: Assigned (was: Available)
I'll look into updating MusContextFactory::OnEstablishedGpuChannel to perform ContextProvider::BindToCurrentThread earlier.

As well as to port in the error handling which GpuProcessTransportFactory has for binding failures.
yes, sounds like that's the issue. Thanks!
RenderThreadImpl also creates ui::ContextProviderCommandBuffer [1]. It would be nice to have this code use ui::Gpu::CreateContextProvider() instead, have CreateContextProvider() do BindToCurrentThread() (and return nullptr on failure).
That sounds like a good idea. What about the GpuProcessTransportFactory usage of CreateContextCommon? Similar pattern there, or do we think that is safe?

https://cs.chromium.org/chromium/src/content/browser/compositor/gpu_process_transport_factory.cc?rcl=4696ce581cf84d109c1cea8e91e976ada38997cc&l=449


RenderThreadImpl is different, as the context it gets is not bound on the same thread.
GpuProcessTransportFactory does not use ui::Gpu, and we do not use GPTF when running with mus (--mus or --mash). So I am not sure we can easily share error checking code from there.
In RenderThreadImpl, all three callsites for CreateContextCommon() seem to immediately BindToCurrentThread() after creation (and are called on main-thread).

[1] https://cs.chromium.org/chromium/src/content/renderer/render_thread_impl.cc?l=1446
[2] https://cs.chromium.org/chromium/src/content/renderer/render_thread_impl.cc?l=1500
[3] https://cs.chromium.org/chromium/src/content/renderer/render_thread_impl.cc?l=2419
The shared context is bound, the media context is bound, and the worker context is bound there. They are either used in blink or are shared between compositors and thats how we make it work.

Just watch out for the compositor context.
https://cs.chromium.org/chromium/src/content/renderer/render_thread_impl.cc?rcl=eb728dc805a10d6a216b740bc9ece7b9443feded&l=2057
So those contexts can be left.

We'd want both ContextProviders created in RenderThreadImpl::RequestNewLayerTreeFrameSink to bound then?

The one mentioned in #21

And the one passed to the RendererWindowTreeClient: https://cs.chromium.org/chromium/src/content/renderer/render_thread_impl.cc?rcl=eb728dc805a10d6a216b740bc9ece7b9443feded&l=1963
The one mentioned in #21 should /not/ be bound on the main thread. The others mentioned by #20 all are.

The one passed to RendererWindowTreeClient is the compositor context (it's passed as it here https://cs.chromium.org/chromium/src/content/renderer/mus/renderer_window_tree_client.cc?rcl=352ac7aa0c0eebcce1ec493a6716f210a4f7d27b&l=88), so same rule as #21.
Status: Started (was: Assigned)
Thanks for those clarifications.

I've worked to address the MusContextFactory in: https://chromium-review.googlesource.com/c/577771

I'll follow up on this bug once I look into testing this more directly.
Project Member

Comment 25 by bugdroid1@chromium.org, Jul 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1902f8987f618e2f32801ee0c87d5875c2d62978

commit 1902f8987f618e2f32801ee0c87d5875c2d62978
Author: Jonathan <jonross@chromium.org>
Date: Thu Jul 20 15:05:08 2017

Update MusContextFactory ContextProvider Binding

Update MusContextFactory to force the new ContextProvider to BindToCurrentThread
immediately. This way we can detect errors in the mojo pipe and exit cleanly.

Rather than attempting to bind later on in the initialization process of the
LayerTreeFrameSink.

TEST=mash_browser_tests

Bug:  743028 
Change-Id: I3806f779360d70df5b8e974e9f49c0c5f634b465
Reviewed-on: https://chromium-review.googlesource.com/577771
Commit-Queue: Jonathan Ross <jonross@chromium.org>
Reviewed-by: Sadrul Chowdhury <sadrul@chromium.org>
Cr-Commit-Position: refs/heads/master@{#488248}
[modify] https://crrev.com/1902f8987f618e2f32801ee0c87d5875c2d62978/ui/aura/mus/mus_context_factory.cc

sadrul@ mentioned taking a look at how GpuProcessTransportFactory tests similar functionality.

However there are no direct tests of the class. I'd be open to thoughts on creating proper unittests for MusContextFactory, or if we are satisfied with the mash_browser_tests exercising it
Status: Fixed (was: Started)
Components: -MUS Internals>Services>WindowService

Sign in to add a comment