Flaky startup crashes in MusThreadProxy::GpuRefreshNativeDisplays when running with --mash |
|||||||||
Issue descriptionChrome ToT r427689 OS ToT 8935 (developer build from today) On edgar, x86_64 braswell Run mash a few times, killing it with Ctrl-C on the terminal. It usually works the first time, but eventually you'll get into a state where it crashes on startup: localhost chrome # sudo -u chronos ./chrome --mash --user-data-dir=/home/chronos --homedir=/ --no-sandbox [17950:17953:1026/142448:700710153:INFO:child_process_host.cc(202)] Launched child process pid=17955, instance=tracing, name=service:tracing, user_id=d0c3bc2e-a776-4e6c-8b5f-1450c461c7dd [17950:17954:1026/142448:700716471:INFO:child_process_host.cc(202)] Launched child process pid=17956, instance=mash_session, name=service:chrome_mash, user_id=505C0EE9-3013-43C0-82B0-A84F50CF8D84 [17955:17955:1026/142448:700754733:ERROR:interface_registry.cc(107)] Failed to locate a binder for interface: tracing::mojom::Factory requested by: service:service_manager exposed by: service:tracing [17950:17954:1026/142448:700761313:INFO:child_process_host.cc(202)] Launched child process pid=17959, instance=app_driver, name=service:chrome_mash, user_id=505C0EE9-3013-43C0-82B0-A84F50CF8D84 [17950:17954:1026/142448:700766084:INFO:child_process_host.cc(202)] Launched child process pid=17960, instance=ash, name=service:chrome_mash, user_id=505C0EE9-3013-43C0-82B0-A84F50CF8D84 [17950:17953:1026/142448:700768623:INFO:child_process_host.cc(202)] Launched child process pid=17961, instance=quick_launch, name=service:chrome_mash, user_id=505C0EE9-3013-43C0-82B0-A84F50CF8D84 [17950:17954:1026/142448:700771215:INFO:child_process_host.cc(202)] Launched child process pid=17962, instance=browser, name=service:content_browser, user_id=505C0EE9-3013-43C0-82B0-A84F50CF8D84 [17955:17955:1026/142448:700858477:ERROR:interface_registry.cc(107)] Failed to locate a binder for interface: tracing::mojom::Factory requested by: service:quick_launch exposed by: service:tracing [17950:17953:1026/142448:700910376:INFO:child_process_host.cc(202)] Launched child process pid=17969, instance=font_service, name=service:chrome_mash, user_id=505C0EE9-3013-43C0-82B0-A84F50CF8D84 [17955:17955:1026/142448:700965514:ERROR:interface_registry.cc(107)] Failed to locate a binder for interface: tracing::mojom::Factory requested by: service:font_service exposed by: service:tracing [17955:17955:1026/142448:700985807:ERROR:interface_registry.cc(107)] Failed to locate a binder for interface: tracing::mojom::Factory requested by: service:ash exposed by: service:tracing [17950:17954:1026/142448:700995878:INFO:child_process_host.cc(202)] Launched child process pid=17978, instance=ui, name=service:chrome_mash, user_id=94e293f9-eff4-4b36-87a2-0fabb2dec838 [17960:17960:1026/142448:701000971:ERROR:wm_shell_mus.cc(330)] Not implemented reached in virtual void ash::mus::WmShellMus::RecordUserMetricsAction(ash::UserMetricsAction) [17960:17960:1026/142448:701003045:ERROR:shell_delegate_mus.cc(208)] Not implemented reached in virtual ash::MediaDelegate* ash::ShellDelegateMus::CreateMediaDelegate() [17960:17960:1026/142448:701003095:ERROR:shell_delegate_mus.cc(214)] Not implemented reached in virtual std::unique_ptr<ash::PaletteDelegate> ash::ShellDelegateMus::CreatePaletteDelegate() [17960:17960:1026/142448:701003158:ERROR:wm_shell_mus.cc(406)] Not implemented reached in virtual void ash::mus::WmShellMus::AddDisplayObserver(ash::WmDisplayObserver*) [17955:17955:1026/142449:701084519:ERROR:interface_registry.cc(107)] Failed to locate a binder for interface: tracing::mojom::Factory requested by: service:ui exposed by: service:tracing [17950:17953:1026/142449:701976853:INFO:child_process_host.cc(202)] Launched child process pid=18004, instance=test_ime_driver, name=service:chrome_mash, user_id=94e293f9-eff4-4b36-87a2-0fabb2dec838 Received signal 11 SEGV_MAPERR 000000000088 #0 0x7fd5c3ca3a76 base::debug::StackTrace::StackTrace() #1 0x7fd5c3ca3e70 [17978:18002:1026/142449:702046066:ERROR:vaapi_wrapper.cc(1217)] Could not get a valid VA display [17978:18002:1026/142449:702046464:ERROR:generic_v4l2_device.cc(109)] SetDevicePollInterrupt(): write() failed: Bad file descriptor base::debug::(anonymous namespace)::StackDumpSignalHandler() #2 0x7fd5c0582180 <unknown> #3 0x7fd5c3d2f8aa base::Thread::IsRunning() #4 0x7fd5c1a8b6e3 ui::MusThreadProxy::GpuRefreshNativeDisplays() #5 0x7fd5c1a7f029 ui::DrmDisplayHostManager::UpdateDisplays() #6 0x7fd5c49a72d2 ui::UpdateDisplayConfigurationTask::Run() #7 0x7fd5c499fc5c ui::DisplayConfigurator::ForceInitialConfigure() #8 0x7fd5c3769090 display::PlatformScreenOzone::Init() #9 0x7fd5c8e375c1 ui::ws::WindowManagerWindowTreeFactorySet::OnWindowManagerWindowTreeFactoryReady() #10 0x7fd5c8e5ff47 ui::ws::WindowManagerWindowTreeFactory::SetWindowTree() #11 0x7fd5c8e6003c ui::ws::WindowManagerWindowTreeFactory::CreateWindowTree() #12 0x7fd5c1b7aafc ui::mojom::WindowManagerWindowTreeFactoryStubDispatch::Accept() #13 0x7fd5c439d7fd mojo::InterfaceEndpointClient::HandleValidatedMessage() #14 0x7fd5c439b005 mojo::FilterChain::Accept() #15 0x7fd5c439d40c mojo::InterfaceEndpointClient::HandleIncomingMessage() #16 0x7fd5c43a47f3 mojo::internal::MultiplexRouter::ProcessIncomingMessage() #17 0x7fd5c43a6b85 mojo::internal::MultiplexRouter::Accept() #18 0x7fd5c439b005 mojo::FilterChain::Accept() #19 0x7fd5c43979c7 mojo::Connector::ReadSingleMessage() #20 0x7fd5c4397ae3 mojo::Connector::ReadAllAvailableMessages() #21 0x7fd5c4397d53 mojo::Connector::OnHandleReadyInternal() #22 0x7fd5c43af39b mojo::Watcher::OnHandleReady() #23 0x7fd5c3d654e6 base::debug::TaskAnnotator::RunTask() #24 0x7fd5c3cd0d82 base::MessageLoop::RunTask() #25 0x7fd5c3cd127e base::MessageLoop::DeferOrRunPendingTask() #26 0x7fd5c3cd5018 base::MessageLoop::DoWork() #27 0x7fd5c3cd5c49 base::MessagePumpLibevent::Run() #28 0x7fd5c3cd21ee base::MessageLoop::RunHandler() #29 0x7fd5c3d01f58 base::RunLoop::Run() #30 0x7fd5c17d4e7e MashRunner::StartChildApp() #31 0x7fd5c17d2725 _ZN4base8internal7InvokerINS0_9BindStateIM10MashRunnerFvN4mojo16InterfaceRequestIN15service_manager5mojom7ServiceEEEEJNS0_17UnretainedWrapperIS3_EEEEEFvS9_EE3RunEPNS0_13BindStateBaseEOS9_ #32 0x7fd5c2e11ecf service_manager::ChildProcessMainWithCallback() #33 0x7fd5c17d4b7a MashRunner::RunChild() #34 0x7fd5c17d4d27 MashMain() #35 0x7fd5c17cf2dd ChromeMain #36 0x7fd5bf1d1fb6 __libc_start_main #37 0x7fd5c17cf0d4 <unknown> r8: 0000000000000000 r9: 00000d61634155f0 r10: fffffffc0151f40c r11: fffff29d620762cc r12: 00007fff8cdba800 r13: 00000d61634c2b98 r14: 00000d6163558750 r15: 00007fd5c1a85360 di: 0000000000000000 si: 00007fff8cdba800 bp: 00007fff8cdba7c0 bx: 00000d61634e0a80 dx: 0000000000000000 ax: 0000000000000000 cx: 00000d616355b640 sp: 00007fff8cdba570 ip: 00007fd5c3d2f8aa efl: 0000000000010202 cgf: 0000000000000033 erf: 0000000000000004 trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000088 [end of stack trace] Rebooting seems to fix the issue. To rjkroege for triage.
,
Nov 15 2016
+a few other people I'm now blocked on this crash. It happens whenever I try to start chrome either via "start ui" or via "sudo -u chronos ./chrome --mash --user-data-dir=/home/chronos --homedir=/ --no-sandbox". I'm on edgar with a ToT r431874 build of chrome. Platform is a test image with M56 dev channel 8953.0.0 from a couple weeks ago (the most recent dev channel for this board). Linux localhost 3.18.0-13425-g95a3f1d #1 SMP PREEMPT Tue Nov 1 03:02:55 PDT 2016 x86_64 Intel(R) Celeron(R) CPU N3060 @ 1.60GHz GenuineIntel GNU/Linux #0 0x7f2e36e7adf6 base::debug::StackTrace::StackTrace() #1 0x7f2e36e7b1f0 base::debug::(anonymous namespace)::StackDumpSignalHandler() #2 0x7f2e33e68180 <unknown> #3 0x7f2e352482a5 ui::MusThreadProxy::GpuRefreshNativeDisplays() #4 0x7f2e3523ea39 [19194:19214:1114/161551:1078672322:ERROR:vaapi_wrapper.cc(1217)] Could not get a valid VA display ui::DrmDisplayHostManager::UpdateDisplays() #5 0x7f2e3788b249 ui::UpdateDisplayConfigurationTask::Run() #6 0x7f2e37885135 ui::DisplayConfigurator::ForceInitialConfigure() #7 0x7f2e36a5daec display::PlatformScreenOzone::Init() #8 0x7f2e36a50d8b ui::Service::OnStart() #9 0x7f2e37b79b1f service_manager::ServiceContext::OnStart() #10 0x7f2e35253f65 service_manager::mojom::ServiceStubDispatch::AcceptWithResponder() #11 0x7f2e370c0fef mojo::InterfaceEndpointClient::HandleValidatedMessage() #12 0x7f2e370c5331 mojo::internal::MultiplexRouter::ProcessIncomingMessage() #13 0x7f2e370c69d5 mojo::internal::MultiplexRouter::Accept() #14 0x7f2e370be017 mojo::Connector::ReadSingleMessage() #15 0x7f2e370be133 mojo::Connector::ReadAllAvailableMessages() #16 0x7f2e370cbb58 mojo::Watcher::OnHandleReady() #17 0x7f2e36f0e0c5 base::debug::TaskAnnotator::RunTask() #18 0x7f2e36e9bbef base::MessageLoop::RunTask() #19 0x7f2e36e9c07e base::MessageLoop::DeferOrRunPendingTask() #20 0x7f2e36e9e528 base::MessageLoop::DoWork() #21 0x7f2e36e9eaf9 base::MessagePumpLibevent::Run() #22 0x7f2e36e9b96a base::MessageLoop::RunHandler() #23 0x7f2e36ebe8fa base::RunLoop::Run() #24 0x7f2e34ff63bf MashRunner::StartChildApp() #25 0x7f2e34ff4cc4 _ZN4base8internal7InvokerINS0_9BindStateIM10MashRunnerFvN4mojo16InterfaceRequestIN15service_manager5mojom7ServiceEEEEJNS0_17UnretainedWrapperIS3_EEEEEFvS9_EE3RunEPNS0_13BindStateBaseEOS9_ #26 0x7f2e362c5dd4 service_manager::ChildProcessMainWithCallback() #27 0x7f2e34ff6097 MashRunner::RunChild() #28 0x7f2e34ff6227 MashMain() #29 0x7f2e34ff317d ChromeMain #30 0x7f2e32ad9fb6 __libc_start_main #31 0x7f2e34ff2f74 <unknown>
,
Nov 15 2016
When you press ctrl+c to terminate chrome, it does not always kill all the processes. Probably something like that is happening here? If an older mus/gpu process is up and running, it probably retains some lock on some system resource, and the new mus/gpu process is unable to talk to it, causing this failure? Some potentially interesting logs in the first backtrace: [17978:18002:1026/142449:702046066:ERROR:vaapi_wrapper.cc(1217)] Could not get a valid VA display [17978:18002:1026/142449:702046464:ERROR:generic_v4l2_device.cc(109)] SetDevicePollInterrupt(): write() failed: Bad file descriptor Could those be relevant?
,
Nov 15 2016
I think this could be my fault from https://codereview.chromium.org/2484223006/. Taking a look now.
,
Nov 15 2016
Reverting that CL does fix the crash. The CL doesn't revert cleanly anymore though. I'm still not getting any display output on device after reverting. It doesn't crash but DrmNativeDisplayDelegate is returning 0 displays now. I'm trying to figure out what is going on with that now.
,
Nov 15 2016
I *think* I understand the original problem and crash. We always had a race a between gpu service initialization and display initialization. Normally gpu service init would win and it wasn't a problem. Sometimes display initialization would win and MusThreadProxy::StartDrmThread() wouldn't have happened yet. Maybe the delay in gpu service initialization was caused by a lock on system resources that wasn't cleared properly, as sadrul suggested. I moved display initialization to happen earlier, causing the crash to happen every time. The right solution is to wait for GPU initialization to finish before starting display initialization, I think?
,
Nov 15 2016
kylechar@ I'm not sure that I agree. But that's probably because I'm not understanding correctly. Maybe we need some kind of design note that clarifies the start up ordering of the different threads and subsystems in musws and musgpu?
,
Nov 16 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/3e7dcd6974ae2895ba6cc2f38141c90338d20fc7 commit 3e7dcd6974ae2895ba6cc2f38141c90338d20fc7 Author: kylechar <kylechar@chromium.org> Date: Wed Nov 16 06:24:18 2016 Fix GPU service and display initialization race. When running mustash on device with Ozone, display initialization was happening before GPU service initialization was finished. The DRM thread wasn't ready when DrmNativeDisplayDelegate went to use it. The problem was made much worse by crrev.com/2484223006 when display initialization was moved so it started earlier. With this patch, we start display initialization when it's known to be safe. BUG= 659804 Review-Url: https://codereview.chromium.org/2498073003 Cr-Commit-Position: refs/heads/master@{#432388} [modify] https://crrev.com/3e7dcd6974ae2895ba6cc2f38141c90338d20fc7/services/ui/service.cc [modify] https://crrev.com/3e7dcd6974ae2895ba6cc2f38141c90338d20fc7/services/ui/service.h [modify] https://crrev.com/3e7dcd6974ae2895ba6cc2f38141c90338d20fc7/services/ui/ws/test_utils.cc [modify] https://crrev.com/3e7dcd6974ae2895ba6cc2f38141c90338d20fc7/services/ui/ws/test_utils.h [modify] https://crrev.com/3e7dcd6974ae2895ba6cc2f38141c90338d20fc7/services/ui/ws/window_server.cc [modify] https://crrev.com/3e7dcd6974ae2895ba6cc2f38141c90338d20fc7/services/ui/ws/window_server_delegate.h
,
Nov 16 2016
This particular problem is fixed. Unfortunately mustash running with Ozone DRM still doesn't work, as DRM finds 0 displays. See crbug.com/665544 .
,
Mar 4 2017
,
Apr 17 2017
,
May 30 2017
,
Aug 1 2017
,
Oct 14 2017
,
Feb 26 2018
|
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by jamescook@chromium.org
, Oct 26 2016