session_manager race in EnableChromeTesting |
|||
Issue description
I've noticed what appears to be a race in session_manager's EnableChromeTesting D-Bus call. If it's called immediately after session_manager starts and launches Chrome, session_manager appears to sometimes try to send SIGKILL to Chrome, but then fail to restart it. Here's what it looks like in the logs:
----
2017-07-12T01:56:59.009552+00:00 INFO session_manager[467]: [INFO:session_manager_service.cc(142)] SessionManagerService starting
2017-07-12T01:56:59.014306+00:00 INFO session_manager[467]: [INFO:browser_job.cc(140)] Running child /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=26.0.0.126 --ui-prioritize-in-gpu-process --use-gl=egl --gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --max-unused-resource-memory-usage-percentage=5 --system-developer-mode --login-profile=user --has-chromeos-keyboard --ash-animate-from-boot-splash-screen --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/default_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/default_small.jpg --child-wallpaper-large=/usr/share/chromeos-assets/wallpaper/child_large.jpg --child-wallpaper-small=/usr/share/chromeos-assets/wallpaper/child_small.jpg --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --enable-prefixed-encrypted-media --enable-consumer-kiosk --enterprise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --login-manager --vmodule=*chromeos/login/*=1,auto_enrollment_controller=1,*plugin*=2,*zygote*=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2
2017-07-12T01:56:59.018600+00:00 INFO session_manager[467]: [INFO:browser_job.cc(149)] Terminating process group: Restarting browser on-demand.
2017-07-12T01:56:59.018683+00:00 INFO session_manager[467]: [INFO:system_utils_impl.cc(110)] Sending 9 to -486 as 1000
2017-07-12T01:56:59.215108+00:00 INFO session_manager[467]: [INFO:child_exit_handler.cc(77)] Handling 484 exit.
...
2017-07-12T01:57:00.235367+00:00 INFO session_manager[467]: [INFO:server_backed_state_key_generator.cc(131)] Stable device secret missing!
[nothing more logged]
----
In contrast, here's what normally happens:
----
2017-07-12T01:56:49.087831+00:00 INFO session_manager[31528]: [INFO:session_manager_service.cc(142)] SessionManagerService starting
2017-07-12T01:56:49.092972+00:00 INFO session_manager[31528]: [INFO:browser_job.cc(140)] Running child /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=26.0.0.126 --ui-prioritize-in-gpu-process --use-gl=egl -
-gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --max-unused-resource-memory-usage-percentage=5 --system-developer-mode --login-profile=user --has-chromeos-keyboard --ash-animate-from-boot-spla
sh-screen --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/default_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/default_small.jpg --child-wallpaper-large=/usr/share/chromeos-assets/wallpaper/child_large.jpg --child-wallpaper-small=
/usr/share/chromeos-assets/wallpaper/child_small.jpg --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --enable-prefixed-encrypted-media --enable-consumer-kiosk --enterp
rise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --login-manager --vmodule=*chromeos/login/*=1,auto_enrollment_controller=1,*plugin*=2,*zygote*=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,power_button_observer=2,webui_login_view=2,lock_sta
te_controller=2,webui_screen_locker=2,screen_locker=2
2017-07-12T01:56:49.101119+00:00 INFO session_manager[31528]: [INFO:browser_job.cc(149)] Terminating process group: Restarting browser on-demand.
2017-07-12T01:56:49.101327+00:00 INFO session_manager[31528]: [INFO:system_utils_impl.cc(110)] Sending 9 to -31547 as 1000
2017-07-12T01:56:49.101951+00:00 INFO session_manager[31528]: [INFO:child_exit_handler.cc(77)] Handling 31547 exit.
2017-07-12T01:56:49.102224+00:00 ERR session_manager[31528]: [ERROR:child_exit_handler.cc(85)] Exited with signal 9
2017-07-12T01:56:49.102374+00:00 INFO session_manager[31528]: [INFO:session_manager_service.cc(280)] Exiting process is chrome.
2017-07-12T01:56:49.102474+00:00 INFO session_manager[31528]: [INFO:browser_job.cc(149)] Terminating process group: Ensuring browser processes are gone.
2017-07-12T01:56:49.102569+00:00 INFO session_manager[31528]: [INFO:system_utils_impl.cc(110)] Sending 9 to -31547 as 1000
2017-07-12T01:56:49.102981+00:00 INFO session_manager[31528]: [INFO:browser_job.cc(140)] Running child /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=26.0.0.126 --ui-prioritize-in-gpu-process --use-gl=egl -
-gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --max-unused-resource-memory-usage-percentage=5 --system-developer-mode --login-profile=user --has-chromeos-keyboard --ash-animate-from-boot-spla
sh-screen --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/default_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/default_small.jpg --child-wallpaper-large=/usr/share/chromeos-assets/wallpaper/child_large.jpg --child-wallpaper-small=
/usr/share/chromeos-assets/wallpaper/child_small.jpg --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --enable-prefixed-encrypted-media --enable-consumer-kiosk --enterp
rise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --login-manager --remote-debugging-port=9222 --ash-disable-system-sounds --oobe-skip-postlogin --disable-gaia-services --testing-channel=NamedTestingInterface:/tmp/.org.chromium.Chromium.USGNjm/.or
g.chromium.Chromium.Hjo9Tp --vmodule=*chromeos/login/*=1,auto_enrollment_controller=1,*plugin*=2,*zygote*=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2ยท
2017-07-12T01:56:49.300415+00:00 INFO session_manager[31528]: [INFO:child_exit_handler.cc(77)] Handling 31545 exit.
----
Note that in the non-broken case, the "Handling ___ exit." line is followed by "Exited with signal 9", after which session_manager restarts Chrome. Here's the relevant code:
void ChildExitHandler::Dispatch(const siginfo_t& info) {
// Find the manager whose child has exited.
std::vector<JobManagerInterface*>::iterator job_manager = managers_.begin();
while (job_manager != managers_.end()) {
if ((*job_manager)->IsManagedJob(info.si_pid))
break;
++job_manager;
}
if (job_manager == managers_.end()) {
DLOG(INFO) << info.si_pid << " is not a managed job.";
return;
}
LOG(INFO) << "Handling " << info.si_pid << " exit.";
if (info.si_code == CLD_EXITED) {
LOG_IF(ERROR, info.si_status != 0) << " Exited with exit code "
<< info.si_status;
CHECK(info.si_status != ChildJobInterface::kCantSetUid);
CHECK(info.si_status != ChildJobInterface::kCantSetEnv);
CHECK(info.si_status != ChildJobInterface::kCantExec);
} else {
LOG(ERROR) << " Exited with signal " << info.si_status;
}
(*job_manager)->HandleExit(info);
}
So when it fails, we're apparently getting CLD_EXITED with exit code 0, and when it works, we're getting SIGKILL. It seems like we're apparently also using some job manager other than SessionManagerService in the former case, since the "Exiting process is chrome" line would be getting logged otherwise:
void SessionManagerService::HandleExit(const siginfo_t& ignored) {
LOG(INFO) << "Exiting process is " << browser_->GetName() << ".";
,
Jul 12 2017
That old CL from Chris doesn't really say what the concern was with killing as root. So what you're saying is that killing the pgroup as the child's UID is OK, but killing the individual process EPERMs? I can't think of a reason why killing the process as root if killing the pgroup fails would be a problem, but I wonder why it was changed back then.
,
Jul 12 2017
Mattias saw the cause of the EPERMs in the review: the child calls setuid() before setsid(). When I was trying to kill the process as its desired UID, the setuid() hadn't happened yet, so it was still running as root, hence EPERM. Trying to kill the pgroup as the desired UID never results in EPERM, because if the setuid() hasn't happened yet, then the setsid() hasn't happened either, so the call fails with ESRCH instead.
,
Jul 12 2017
Ah I see, the missing piece is that the call *was failing* but with a different errno. Gotcha.
,
Jul 12 2017
Naive question: Then, how about waiting in parent process for setsid completion in child process? Will it take longer time, and block session_manager, which may not be acceptable?
,
Jul 12 2017
That was mentioned in the bug. I'm not sure if there's any way to do it without adding additional IPC or spinning in session_manager. Let me know if I'm mistaken, though.
,
Jul 12 2017
I meant "change" instead of "bug" in #6. It's at https://chromium-review.googlesource.com/c/567862/.
,
Jul 12 2017
#6, thank you for sharing it. What I imagined for the communication between the processes was something like pipe() + CLO_EXEC etc. Though, I found that's not an option in the CL.
,
Jul 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/aa62e2bf90a153132323b7a17b9c02e5c8e56b28 commit aa62e2bf90a153132323b7a17b9c02e5c8e56b28 Author: Daniel Erat <derat@chromium.org> Date: Thu Jul 13 18:44:44 2017 login: Avoid race when killing newly-forked processes. ChildJobInterface::Subprocess::KillEverything() can apparently send a signal to a forked process's process group before it's had a chance to call setsid() to create its session and process group. This is easy to trigger by calling EnableChromeTesting immediately after session_manager starts, and results in Chrome not exiting or getting restarted. Try to avoid the race by killing the child process directly if killing its process group fails. Also remove unnecessary BrowserJob null checks in SessionManagerService. BUG= chromium:741242 TEST=manual: unable to trigger the race Change-Id: I59b1d0e5e2060c17df1661847661b6e4004af9b7 Reviewed-on: https://chromium-review.googlesource.com/567862 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Jorge Lucangeli Obes <jorgelo@chromium.org> [modify] https://crrev.com/aa62e2bf90a153132323b7a17b9c02e5c8e56b28/login_manager/child_job.cc [modify] https://crrev.com/aa62e2bf90a153132323b7a17b9c02e5c8e56b28/login_manager/session_manager_service.cc [modify] https://crrev.com/aa62e2bf90a153132323b7a17b9c02e5c8e56b28/login_manager/session_manager_process_unittest.cc
,
Jul 24 2017
Closing this, but feel free to reopen if there's any interest in an IPC-based solution (as discussed in the review, IIRC).
,
Jan 22 2018
|
|||
►
Sign in to add a comment |
|||
Comment 1 by derat@chromium.org
, Jul 12 2017Status: Started (was: Assigned)