Crash in sandbox::syscall_broker::BrokerSimpleMessage::RecvMsgWithFlags
Issue description
We are seeing crashes in 68; about 4% are attributed to this:
sandbox::syscall_broker::BrokerSimpleMessage::RecvMsgWithFlags
https://crash.corp.google.com/browse?q=product_name%3D%27Chrome_ChromeOS%27+AND+product.Version%3D%2768.0.3440.4%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27sandbox%3A%3Asyscall_broker%3A%3ABrokerSimpleMessage%3A%3ARecvMsgWithFlags%27

0x00007bb15a4d088d (libpthread-2.23.so + 0x0001088d ) __recvmsg_nocancel
0x00005e220863699d (chrome -./../../../../../../../home/chrome-bot/chrome_root/src/sandbox/linux/syscall_broker/broker_simple_message.cc:123 ) sandbox::syscall_broker::BrokerSimpleMessage::RecvMsgWithFlags(int, int, base::ScopedGeneric<int, base::internal::ScopedFDCloseTraits>*)
0x00005e22086350b6 (chrome -./../../../../../../../home/chrome-bot/chrome_root/src/sandbox/linux/syscall_broker/broker_host.cc:359 ) sandbox::syscall_broker::BrokerHost::HandleRequest() const
0x00005e2208631a07 (chrome -./../../../../../../../home/chrome-bot/chrome_root/src/sandbox/linux/syscall_broker/broker_process.cc:93 ) sandbox::syscall_broker::BrokerProcess::Init(base::RepeatingCallback<bool ()> const&)
0x00005e220862ec4e (chrome -./../../../../../../../home/chrome-bot/chrome_root/src/services/service_manager/sandbox/linux/sandbox_linux.cc:491 ) service_manager::SandboxLinux::StartBrokerProcess(std::__1::bitset<10ul> const&, std::__1::vector<sandbox::syscall_broker::BrokerFilePermission, std::__1::allocator<sandbox::syscall_broker::BrokerFilePermission> >, base::OnceCallback<bool (service_manager::SandboxLinux::Options)>, service_manager::SandboxLinux::Options const&)
0x00005e220a9245f7 (chrome -./../../../../../../../home/chrome-bot/chrome_root/src/content/gpu/gpu_sandbox_hook_linux.cc:287 ) content::GpuProcessPreSandboxHook(service_manager::SandboxLinux::Options)
0x00005e220862e335 (chrome -./../../../../../../../home/chrome-bot/chrome_root/src/base/callback.h:96 ) service_manager::SandboxLinux::InitializeSandbox(service_manager::SandboxType, base::OnceCallback<bool (service_manager::SandboxLinux::Options)>, service_manager::SandboxLinux::Options const&)
0x00005e220a922d06 (chrome -./../../../../../../../home/chrome-bot/chrome_root/src/content/gpu/gpu_main.cc:385 ) content::(anonymous namespace)::ContentSandboxHelper::EnsureSandboxInitialized(gpu::GpuWatchdogThread*, gpu::GPUInfo const*, gpu::GpuPreferences const&)
0x00005e220836a710 (chrome -./../../../../../../../home/chrome-bot/chrome_root/src/gpu/ipc/service/gpu_init.cc:297 ) gpu::GpuInit::InitializeAndStartSandbox(base::CommandLine*, gpu::GpuPreferences const&)
0x00005e220a921bbe (chrome -./../../../../../../../home/chrome-bot/chrome_root/src/content/gpu/gpu_main.cc:310 ) content::GpuMain(content::MainFunctionParams const&)
0x00005e2206992de5 (chrome -./../../../../../../../home/chrome-bot/chrome_root/src/content/app/content_main_runner_impl.cc:648 ) content::ContentMainRunnerImpl::Run()
0x00005e220699d7e4 (chrome -./../../../../../../../home/chrome-bot/chrome_root/src/services/service_manager/embedder/main.cc:459 ) service_manager::Main(service_manager::MainParams const&)
0x00005e22042dfe47 (chrome -./../../../../../../../home/chrome-bot/chrome_root/src/content/app/content_main.cc:19 ) ChromeMain
0x00007bb1598ad735 (libc-2.23.so -libc-start.c:289 ) __libc_start_main
0x00005e22042cecc8 (chrome + 0x0038ecc8 ) _start
0x00007fff8e813357

It looks like this file recently got added in https://chromium-review.googlesource.com/553400
Would you be the right person to dig into this, Greg?
,
Jun 6 2018
+jww@, did you see any of these crashes with the original CL?
,
Jun 6 2018
Is it possible it's hanging in recvmsg and some watchdog process on Chrome OS is sending it SIGABRT?
,
Jun 7 2018
+vapier@, this could just be the session_manager sending the browser process SIGABRT on logout:
https://bugs.chromium.org/p/chromium/issues/detail?id=284601#c23
https://chromium.googlesource.com/chromiumos/platform2/+/master/login_manager/README.md#logout
,
Jun 7 2018
there's a bunch more details here to help with debugging: https://chromium.googlesource.com/chromiumos/platform2/+/master/login_manager/docs/chrome_hang_detection.md
,
Jun 7 2018
Does that mean it's possible that it's just hung in recvmsg(), and so this isn't session_manager cleaning up on logout?
,
Jun 7 2018
something needs to have sent the SIGABRT. generally speaking, a syscall, even with bad arguments, doesn't trigger a SIGABRT -- the kernel would simply return an errno value.
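To illustrate the point about errno, here is a minimal sketch. The helper name RecvOnInvalidFd is hypothetical and this is not Chromium code. It calls recvmsg() with a descriptor that cannot be valid and returns the errno the kernel reports; no signal is involved.

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

// Hypothetical helper for illustration only.
// Calls recvmsg() on a deliberately invalid descriptor and returns the errno
// value reported by the kernel. The call fails; the process is not signalled.
int RecvOnInvalidFd() {
  char data[16];
  struct iovec iov = {data, sizeof(data)};
  struct msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;

  errno = 0;
  const ssize_t r = recvmsg(-1, &msg, 0);  // -1 is never a valid descriptor
  return r == -1 ? errno : 0;
}
```

On Linux this should report EBADF, which is consistent with the SIGABRT having come from another process rather than from the syscall itself.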
,
Jun 7 2018
The thing I don't understand, after reading the Chrome hang detection doc, is that, "Chrome handles D-Bus messages on its UI thread, so a failure to reply indicates that the UI thread is blocked." However, this is the GPU process receiving SIGABRT, not the browser process. Doesn't the hang detector send SIGABRT to the browser process?
,
Jun 7 2018
This is from the session manager log in a sample report:
session_manager[905]: [INFO:session_manager_impl.cc(547)] Starting user session
session_manager[905]: [INFO:browser_job.cc(165)] Terminating process: exiting cleanly
session_manager[905]: [INFO:system_utils_impl.cc(94)] Sending 15 to 993 as 1000
session_manager[905]: [WARNING:browser_job.cc(173)] Aborting child process 993's process group 3 seconds after sending signal
session_manager[905]: [INFO:browser_job.cc(157)] Terminating process group: Browser took more than 3 seconds to exit after signal.
session_manager[905]: [INFO:system_utils_impl.cc(94)] Sending 6 to -993 as 1000
session_manager[905]: [INFO:session_manager_service.cc(480)] SessionManagerService quitting run loop
session_manager[905]: [INFO:session_manager_service.cc(201)] SessionManagerService exiting
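The sequence in that log (signal 15 to the pid, then signal 6 to the negative pid, i.e. the process group) can be reproduced in miniature. The sketch below is an assumption-laden illustration, not session_manager's code: the function name TerminateThenAbortGroup and the grace_ms parameter are hypothetical, and the child is a stand-in that deliberately ignores SIGTERM.

```cpp
#include <csignal>
#include <ctime>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper for illustration only.
// Forks a child that ignores SIGTERM, sends it SIGTERM, waits grace_ms, then
// sends SIGABRT to the child's process group. Returns the signal that ended
// the child, or -1 on error.
int TerminateThenAbortGroup(int grace_ms) {
  int ready[2];
  if (pipe(ready) == -1) return -1;

  const pid_t child = fork();
  if (child == -1) {
    close(ready[0]);
    close(ready[1]);
    return -1;
  }
  if (child == 0) {
    close(ready[0]);
    setpgid(0, 0);             // become leader of a new process group
    signal(SIGTERM, SIG_IGN);  // simulate a process that will not exit
    const char byte = 1;
    if (write(ready[1], &byte, 1) != 1) _exit(1);  // tell parent we are set up
    close(ready[1]);
    for (;;) pause();
  }

  close(ready[1]);
  char byte = 0;
  const bool child_ready = read(ready[0], &byte, 1) == 1;
  close(ready[0]);
  int status = 0;
  if (!child_ready) {
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return -1;
  }

  kill(child, SIGTERM);  // like "Sending 15 to 993"
  struct timespec ts = {grace_ms / 1000, (grace_ms % 1000) * 1000000L};
  nanosleep(&ts, nullptr);

  if (waitpid(child, &status, WNOHANG) == 0) {
    kill(-child, SIGABRT);  // negative pid targets the group, like "Sending 6 to -993"
    waitpid(child, &status, 0);
  }
  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With a short grace period, the child survives SIGTERM and is then terminated by SIGABRT, which is what a crash reporter would record as the cause of death for every surviving member of the group.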
,
Jun 7 2018
What I'm not sure of is if the problem is that the browser is stuck on the syscall broker, or if these crash reports are just a red herring as a result of something else being hung.
,
Jun 15 2018
The crash report is definitely session_manager sending SIGABRT to the process. The question is why is it hung there?
,
Aug 6
Any update on this? This is one of our bigger crashers in 68.
,
Aug 28
Matt, do you have any cycles to help us look into this?
,
Aug 28
Yeah I'll take a look.
,
Aug 28
Thanks rsesek@ for helping with this. What we know from the crash reports and logs is that the system is sending SIGABRT because it's ending the session, and the GPU process is trying to initialize itself when it receives SIGABRT. We don't have enough information yet to know why the GPU process is trying to initialize itself at this point. Was the GPU process launching logic changed at all recently? Does it use mojo now perhaps? derat@, can you take a look and see if you have an idea?
,
Aug 28
I'm (mostly?) on vacation this week. I also don't know anything about GPU process initialization, though.
,
Aug 28
I don't have anything helpful to add here, sorry.
,
Aug 28
Well, it's not the GPU process' backtrace we're looking at, it's the GPU process' broker process, correct? It's obviously supposed to sit there waiting for messages. I think Chrome OS sends a SIGABRT to the entire chrome process group, and for some reason this broker hasn't died. It _should_ die if the GPU process dies, but wouldn't we get a GPU process backtrace if it were still alive and it received a SIGABRT? So, it seems to me that the GPU process died but this broker didn't. Apparently if the broker receives an "error" that isn't ECONNRESET from the recvmsg flag, it just continues instead of dying. That's all I can deduce so far, and maybe I'm completely wrong.
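As a rough sketch of the distinction being drawn here, a receive loop can separate "peer went away" from other failures before deciding whether to keep looping. The names below (RecvOutcome, ClassifyRecv, RecvAfterPeerClose) are hypothetical, and this is not the broker's actual code. It only shows how recvmsg() results might be classified, and what a receiver sees on a Unix socket pair once the other end has been closed.

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical classification for illustration only.
enum class RecvOutcome { kMessage, kPeerClosed, kConnectionReset, kOtherError };

// Maps a recvmsg() return value and the errno captured right after the call
// to an outcome a receive loop could act on.
RecvOutcome ClassifyRecv(ssize_t r, int err) {
  if (r > 0) return RecvOutcome::kMessage;
  if (r == 0) return RecvOutcome::kPeerClosed;  // orderly shutdown by the peer
  return err == ECONNRESET ? RecvOutcome::kConnectionReset
                           : RecvOutcome::kOtherError;
}

// Creates a socket pair, closes one end, and reports what recvmsg() observes
// on the remaining end.
RecvOutcome RecvAfterPeerClose() {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) == -1)
    return RecvOutcome::kOtherError;
  close(fds[1]);  // the "client" side goes away

  char data[16];
  struct iovec iov = {data, sizeof(data)};
  struct msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;

  const ssize_t r = recvmsg(fds[0], &msg, 0);
  const int err = errno;
  close(fds[0]);
  return ClassifyRecv(r, err);
}
```

A loop that exits only on kConnectionReset would keep running on the other outcomes, which is the kind of behavior this comment speculates about.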
,
Aug 28
Yeah, it seems to me that this is just a problem where the GPU process' broker fails to exit when everything else does, and the session manager sees that the browser's group ID hasn't completely disappeared from the system, and then just sends a SIGABRT to the group to finish it off, giving us a "crash". Perhaps in some cases this is due to an actual crash, and in other cases it's not--it's hard to tell.
,
Sep 18
Hi, is there any update on this and whether it should be considered a crash signature to look out for? Thanks!
,
Sep 19
Hi, this is not an important crash itself--but it would indicate that the GPU process is hanging and would exhibit a similar crash at the exact same time, which might give us more information. Do you have any advice on how to find some stack traces for the actual GPU process?
,
Oct 19
Issue 897213 has been merged into this issue.
,
Nov 9
Issue 903866 has been merged into this issue.
,
Nov 12
Issue 904243 has been merged into this issue.
,
Dec 22
Still seeing a lot of reports in version 70:
https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27sandbox%3A%3Asyscall_broker%3A%3ABrokerSimpleMessage%3A%3ARecvMsgWithFlags%27+AND+product.Version%3D%2770.0.3538.110%27&stbtiq=&reportid=&index=0
And a bit less in 71.
We have an enterprise customer in support case 14813438 who experiences this crash after just watching YouTube, when the device freezes for a while:
https://crash.corp.google.com/browse?stbtiq=24243d4eec4c1387
,
Jan 16
Issue 921692 has been merged into this issue.
Comment 1 by kerrnel@chromium.org, Jun 6 2018
Yes, it appears to be crashing in here:

  char control_buffer[kControlBufferSize];
  msg.msg_control = control_buffer;
  msg.msg_controllen = sizeof(control_buffer);
  const ssize_t r = HANDLE_EINTR(recvmsg(fd, &msg, flags));
  if (r == -1)
    return -1;
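For readers unfamiliar with the pattern in that snippet, the following is a self-contained sketch of a blocking recvmsg() with a control buffer, retried while it fails with EINTR. It is an approximation under stated assumptions, not the Chromium source: RecvMsgRetryingOnEintr stands in for the effect of HANDLE_EINTR, RoundTrip is a hypothetical demo function, and the 256-byte control buffer is an arbitrary size, since the value of kControlBufferSize is not given here.

```cpp
#include <cerrno>
#include <cstring>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical stand-in for the retry-on-EINTR behavior.
// Repeats recvmsg() for as long as it is interrupted by a signal handler.
ssize_t RecvMsgRetryingOnEintr(int fd, struct msghdr* msg, int flags) {
  ssize_t r;
  do {
    r = recvmsg(fd, msg, flags);
  } while (r == -1 && errno == EINTR);
  return r;
}

// Hypothetical demo: sends one short message (at most 64 bytes) over a socket
// pair and receives it. Returns the number of bytes received, or -1 on error.
ssize_t RoundTrip(const char* text) {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) == -1) return -1;

  const size_t len = strlen(text);
  if (send(fds[1], text, len, 0) != static_cast<ssize_t>(len)) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  char data[64];
  char control_buffer[256];  // arbitrary size chosen for this sketch
  struct iovec iov = {data, sizeof(data)};
  struct msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control_buffer;
  msg.msg_controllen = sizeof(control_buffer);

  // With flags == 0 this call blocks until a message arrives or the peer
  // closes, so a process with nothing to receive sits inside recvmsg().
  const ssize_t r = RecvMsgRetryingOnEintr(fds[0], &msg, 0);
  close(fds[0]);
  close(fds[1]);
  return r;
}
```

Because the call blocks by default, a stack that ends in recvmsg() is what an idle receiver looks like at the moment an external signal arrives.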