Issue 850264

Starred by 5 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Crash in sandbox::syscall_broker::BrokerSimpleMessage::RecvMsgWithFlags

Project Member Reported by bhthompson@google.com, Jun 6 2018

Issue description

We are seeing crashes in 68, about 4% are attributed to this:

sandbox::syscall_broker::BrokerSimpleMessage::RecvMsgWithFlags

https://crash.corp.google.com/browse?q=product_name%3D%27Chrome_ChromeOS%27+AND+product.Version%3D%2768.0.3440.4%27+AND+expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27sandbox%3A%3Asyscall_broker%3A%3ABrokerSimpleMessage%3A%3ARecvMsgWithFlags%27


0x00007bb15a4d088d	(libpthread-2.23.so + 0x0001088d )	__recvmsg_nocancel
0x00005e220863699d	(chrome -./../../../../../../../home/chrome-bot/chrome_root/src/sandbox/linux/syscall_broker/broker_simple_message.cc:123 )	sandbox::syscall_broker::BrokerSimpleMessage::RecvMsgWithFlags(int, int, base::ScopedGeneric<int, base::internal::ScopedFDCloseTraits>*)
0x00005e22086350b6	(chrome -./../../../../../../../home/chrome-bot/chrome_root/src/sandbox/linux/syscall_broker/broker_host.cc:359 )	sandbox::syscall_broker::BrokerHost::HandleRequest() const
0x00005e2208631a07	(chrome -./../../../../../../../home/chrome-bot/chrome_root/src/sandbox/linux/syscall_broker/broker_process.cc:93 )	sandbox::syscall_broker::BrokerProcess::Init(base::RepeatingCallback<bool ()> const&)
0x00005e220862ec4e	(chrome -./../../../../../../../home/chrome-bot/chrome_root/src/services/service_manager/sandbox/linux/sandbox_linux.cc:491 )	service_manager::SandboxLinux::StartBrokerProcess(std::__1::bitset<10ul> const&, std::__1::vector<sandbox::syscall_broker::BrokerFilePermission, std::__1::allocator<sandbox::syscall_broker::BrokerFilePermission> >, base::OnceCallback<bool (service_manager::SandboxLinux::Options)>, service_manager::SandboxLinux::Options const&)
0x00005e220a9245f7	(chrome -./../../../../../../../home/chrome-bot/chrome_root/src/content/gpu/gpu_sandbox_hook_linux.cc:287 )	content::GpuProcessPreSandboxHook(service_manager::SandboxLinux::Options)
0x00005e220862e335	(chrome -./../../../../../../../home/chrome-bot/chrome_root/src/base/callback.h:96 )	service_manager::SandboxLinux::InitializeSandbox(service_manager::SandboxType, base::OnceCallback<bool (service_manager::SandboxLinux::Options)>, service_manager::SandboxLinux::Options const&)
0x00005e220a922d06	(chrome -./../../../../../../../home/chrome-bot/chrome_root/src/content/gpu/gpu_main.cc:385 )	content::(anonymous namespace)::ContentSandboxHelper::EnsureSandboxInitialized(gpu::GpuWatchdogThread*, gpu::GPUInfo const*, gpu::GpuPreferences const&)
0x00005e220836a710	(chrome -./../../../../../../../home/chrome-bot/chrome_root/src/gpu/ipc/service/gpu_init.cc:297 )	gpu::GpuInit::InitializeAndStartSandbox(base::CommandLine*, gpu::GpuPreferences const&)
0x00005e220a921bbe	(chrome -./../../../../../../../home/chrome-bot/chrome_root/src/content/gpu/gpu_main.cc:310 )	content::GpuMain(content::MainFunctionParams const&)
0x00005e2206992de5	(chrome -./../../../../../../../home/chrome-bot/chrome_root/src/content/app/content_main_runner_impl.cc:648 )	content::ContentMainRunnerImpl::Run()
0x00005e220699d7e4	(chrome -./../../../../../../../home/chrome-bot/chrome_root/src/services/service_manager/embedder/main.cc:459 )	service_manager::Main(service_manager::MainParams const&)
0x00005e22042dfe47	(chrome -./../../../../../../../home/chrome-bot/chrome_root/src/content/app/content_main.cc:19 )	ChromeMain
0x00007bb1598ad735	(libc-2.23.so -libc-start.c:289 )	__libc_start_main
0x00005e22042cecc8	(chrome + 0x0038ecc8 )	_start
0x00007fff8e813357	

It looks like this file recently got added in https://chromium-review.googlesource.com/553400

Would you be the right person to dig into this Greg?
 
Yes, it appears to be crashing in here:

  char control_buffer[kControlBufferSize];
  msg.msg_control = control_buffer;
  msg.msg_controllen = sizeof(control_buffer);

  const ssize_t r = HANDLE_EINTR(recvmsg(fd, &msg, flags));
  if (r == -1)
    return -1;

Comment 2 by dgagnon@google.com, Jun 6 2018

Labels: XAct
Cc: jww@chromium.org
+jww@, did you see any of these crashes with the original CL?
Components: Internals>GPU
Is it possible it's hanging in recvmsg and some watchdog process on Chrome OS is sending it SIGABRT?
Does that mean it's possibly just hung in recvmsg() and so this isn't session_manager cleaning up on logout?
Something must have sent the SIGABRT. Generally speaking, a syscall, even with bad arguments, doesn't trigger a SIGABRT -- the kernel would simply return an errno value.
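To illustrate the errno point, here is a minimal sketch. It is not Chromium code, and the function name RecvMsgErrnoOnBadFd is made up for the example. It calls recvmsg() on a deliberately invalid descriptor and reports what the kernel returns. On Linux the call should simply fail with an errno value, and no signal is raised.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <cassert>
#include <cerrno>
#include <cstring>

// Hypothetical helper: calls recvmsg() on an invalid descriptor and returns
// the errno the kernel reports (0 if the call unexpectedly succeeds).
// The process is not signalled; the syscall just returns -1.
int RecvMsgErrnoOnBadFd() {
  char data[16];
  struct iovec iov = {data, sizeof(data)};
  struct msghdr msg;
  std::memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;

  errno = 0;
  const ssize_t r = recvmsg(-1, &msg, 0);  // -1 is never a valid fd.
  return r == -1 ? errno : 0;
}
```

Calling RecvMsgErrnoOnBadFd() and comparing the result against EBADF is enough to confirm the behavior on a given system.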
The thing I don't understand, after reading the Chrome hang detection doc, is that, "Chrome handles D-Bus messages on its UI thread, so a failure to reply indicates that the UI thread is blocked." However, this is the GPU process receiving SIGABRT, not the browser process. Doesn't the hang detector send SIGABRT to the browser process?
This is from the session manager log in a sample report:

session_manager[905]: [INFO:session_manager_impl.cc(547)] Starting user session
session_manager[905]: [INFO:browser_job.cc(165)] Terminating process: exiting cleanly
session_manager[905]: [INFO:system_utils_impl.cc(94)] Sending 15 to 993 as 1000
session_manager[905]: [WARNING:browser_job.cc(173)] Aborting child process 993's process group 3 seconds after sending signal
session_manager[905]: [INFO:browser_job.cc(157)] Terminating process group: Browser took more than 3 seconds to exit after signal.
session_manager[905]: [INFO:system_utils_impl.cc(94)] Sending 6 to -993 as 1000
session_manager[905]: [INFO:session_manager_service.cc(480)] SessionManagerService quitting run loop
session_manager[905]: [INFO:session_manager_service.cc(201)] SessionManagerService exiting
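For readers decoding the log: on Linux, signal 15 is SIGTERM and signal 6 is SIGABRT, and a negative pid passed to kill() targets a whole process group, so "Sending 6 to -993" aborts every process still in group 993. The sketch below reproduces that mechanism in isolation. It is an illustration only, not session_manager's code, and the function name AbortProcessGroupAndReport is an assumption made for the example.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: forks a child into its own process group, sends
// SIGABRT to that group, and returns the signal that terminated the child
// (or -1 on any failure).
int AbortProcessGroupAndReport() {
  const pid_t child = fork();
  if (child == -1)
    return -1;
  if (child == 0) {
    setpgid(0, 0);  // Child becomes leader of a new process group.
    for (;;)
      pause();  // Sleep until a signal arrives.
  }
  // Also set the group from the parent so it exists before we signal it.
  setpgid(child, child);
  if (kill(-child, SIGABRT) == -1) {  // Negative pid targets the group.
    kill(child, SIGKILL);             // Fall back so the child is not leaked.
    waitpid(child, nullptr, 0);
    return -1;
  }
  int status = 0;
  if (waitpid(child, &status, 0) != child)
    return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The child here was doing nothing wrong when it was aborted, which matches the suspicion in this thread that the reported stack only shows where a process happened to be when the group was signalled.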
What I'm not sure of is if the problem is that the browser is stuck on the syscall broker, or if these crash reports are just a red herring as a result of something else being hung.
Status: Assigned (was: Untriaged)
The crash report is definitely session_manager sending SIGABRT to the process. The question is why is it hung there?
Any update on this? This is one of our bigger crashers in 68.
Labels: ReleaseBlock-Beta M-69
Labels: -ReleaseBlock-Beta
Labels: CrOSCodeYellow-Stability
Owner: mpdenton@chromium.org
Matt, do you have any cycles to help us look into this?
Yeah I'll take a look.
Cc: r...@chromium.org derat@chromium.org
Thanks rsesek@ for helping with this. What we know from the crash reports and logs is that the system is sending SIGABRT because it's ending the session, and the GPU process is trying to initialize itself when it receives SIGABRT.

We don't have enough information yet to know why the GPU process is trying to initialize itself at this point. Was the GPU process launching logic changed at all recently? Does it use mojo now perhaps?

derat@, can you take a look and see if you have an idea?
Cc: jamescook@chromium.org dcasta...@chromium.org spang@chromium.org
I'm (mostly?) on vacation this week. I also don't know anything about GPU process initialization, though.
I don't have anything helpful to add here, sorry.
Well, it's not the GPU process' backtrace we're looking at, it's the GPU process' broker process, correct? It's obviously supposed to sit there waiting for messages. I think Chrome OS sends a SIGABRT to the entire chrome process group, and for some reason this broker hasn't died. It _should_ die if the GPU process dies, but wouldn't we get a GPU process backtrace if it were still alive and it received a SIGABRT?

So, it seems to me that the GPU process died but this broker didn't. Apparently if the broker receives an "error" that isn't ECONNRESET from the recvmsg call, it just continues instead of dying. That's all I can deduce so far, and maybe I'm completely wrong.
Yeah, it seems to me that this is just a problem where the GPU process' broker fails to exit when everything else does, and the session manager sees that the browser's group ID hasn't completely disappeared from the system, and then just sends a SIGABRT to the group to finish it off, giving us a "crash". Perhaps in some cases this is due to an actual crash, and in other cases it's not--it's hard to tell.
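A sketch of the idea, under stated assumptions: this is not the BrokerHost implementation and not a proposed patch. The names RecvOutcome, Classify, and OutcomeAfterClientExit are invented for the example, and it uses a SOCK_STREAM socketpair for simplicity, which may differ from the socket type the broker actually uses. It shows one way a receive loop could tell "peer is gone" apart from other errors: treat a recvmsg() return of 0 (end of file) the same as ECONNRESET.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstring>

// Hypothetical classification of a single recvmsg() result.
enum class RecvOutcome { kMessage, kPeerGone, kRetryableError };

// A return of 0 means the peer closed its end; ECONNRESET is treated the
// same way. Any other error is left for the caller to retry or log.
RecvOutcome Classify(ssize_t r, int err) {
  if (r > 0)
    return RecvOutcome::kMessage;
  if (r == 0 || err == ECONNRESET)
    return RecvOutcome::kPeerGone;
  return RecvOutcome::kRetryableError;
}

// Simulates the client exiting by closing its end of a socketpair, then
// classifies what the host side observes on its next recvmsg().
RecvOutcome OutcomeAfterClientExit() {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
    return RecvOutcome::kRetryableError;
  close(fds[1]);  // Stand-in for the sandboxed client dying.

  char data[16];
  struct iovec iov = {data, sizeof(data)};
  struct msghdr msg;
  std::memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;

  errno = 0;
  const ssize_t r = recvmsg(fds[0], &msg, 0);
  const int err = errno;
  close(fds[0]);
  return Classify(r, err);
}
```

A host loop built on this would exit when Classify() reports kPeerGone instead of calling recvmsg() again. Whether the real broker sees EOF, ECONNRESET, or simply blocks forever in this scenario is exactly what is still unknown in this bug.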
Cc: stevehuang@chromium.org krk@google.com
Hi, is there any update on this and whether it should be considered a crash signature to look out for? Thanks!
Hi, this is not an important crash itself--but it would indicate that the GPU process is hanging and would exhibit a similar crash at the exact same time, which might give us more information. Do you have any advice on how to find some stack traces for the actual GPU process?
Issue 897213 has been merged into this issue.
Issue 903866 has been merged into this issue.
Issue 904243 has been merged into this issue.
Cc: marchuk@google.com
Labels: Hotlist-Enterprise
Still seeing a lot of reports in version 70:

https://crash.corp.google.com/browse?q=expanded_custom_data.ChromeCrashProto.magic_signature_1.name%3D%27sandbox%3A%3Asyscall_broker%3A%3ABrokerSimpleMessage%3A%3ARecvMsgWithFlags%27+AND+product.Version%3D%2770.0.3538.110%27&stbtiq=&reportid=&index=0

And a bit less in 71

We have an enterprise customer (support case 14813438) who experiences this crash after just watching YouTube and when the device freezes for a while:
https://crash.corp.google.com/browse?stbtiq=24243d4eec4c1387

Comment 31 by dtapu...@chromium.org, Jan 16 (6 days ago)

Issue 921692 has been merged into this issue.
