New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 741242 link

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

session_manager race in EnableChromeTesting

Project Member Reported by derat@chromium.org, Jul 12 2017

Issue description

I've noticed what appears to be a race in session_manager's EnableChromeTesting D-Bus call. If it's called immediately after session_manager starts and launches Chrome, session_manager appears to sometimes try to send SIGKILL to Chrome, but then fail to restart it. Here's what it looks like in the logs:

----

2017-07-12T01:56:59.009552+00:00 INFO session_manager[467]: [INFO:session_manager_service.cc(142)] SessionManagerService starting
2017-07-12T01:56:59.014306+00:00 INFO session_manager[467]: [INFO:browser_job.cc(140)] Running child /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=26.0.0.126 --ui-prioritize-in-gpu-process --use-gl=egl --gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --max-unused-resource-memory-usage-percentage=5 --system-developer-mode --login-profile=user --has-chromeos-keyboard --ash-animate-from-boot-splash-screen --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/default_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/default_small.jpg --child-wallpaper-large=/usr/share/chromeos-assets/wallpaper/child_large.jpg --child-wallpaper-small=/usr/share/chromeos-assets/wallpaper/child_small.jpg --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --enable-prefixed-encrypted-media --enable-consumer-kiosk --enterprise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --login-manager --vmodule=*chromeos/login/*=1,auto_enrollment_controller=1,*plugin*=2,*zygote*=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2
2017-07-12T01:56:59.018600+00:00 INFO session_manager[467]: [INFO:browser_job.cc(149)] Terminating process group: Restarting browser on-demand.
2017-07-12T01:56:59.018683+00:00 INFO session_manager[467]: [INFO:system_utils_impl.cc(110)] Sending 9 to -486 as 1000
2017-07-12T01:56:59.215108+00:00 INFO session_manager[467]: [INFO:child_exit_handler.cc(77)] Handling 484 exit.
...
2017-07-12T01:57:00.235367+00:00 INFO session_manager[467]: [INFO:server_backed_state_key_generator.cc(131)] Stable device secret missing!
[nothing more logged]

----

In contrast, here's what normally happens:

----

2017-07-12T01:56:49.087831+00:00 INFO session_manager[31528]: [INFO:session_manager_service.cc(142)] SessionManagerService starting
2017-07-12T01:56:49.092972+00:00 INFO session_manager[31528]: [INFO:browser_job.cc(140)] Running child /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=26.0.0.126 --ui-prioritize-in-gpu-process --use-gl=egl -
-gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --max-unused-resource-memory-usage-percentage=5 --system-developer-mode --login-profile=user --has-chromeos-keyboard --ash-animate-from-boot-spla
sh-screen --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/default_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/default_small.jpg --child-wallpaper-large=/usr/share/chromeos-assets/wallpaper/child_large.jpg --child-wallpaper-small=
/usr/share/chromeos-assets/wallpaper/child_small.jpg --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --enable-prefixed-encrypted-media --enable-consumer-kiosk --enterp
rise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --login-manager --vmodule=*chromeos/login/*=1,auto_enrollment_controller=1,*plugin*=2,*zygote*=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,power_button_observer=2,webui_login_view=2,lock_sta
te_controller=2,webui_screen_locker=2,screen_locker=2
2017-07-12T01:56:49.101119+00:00 INFO session_manager[31528]: [INFO:browser_job.cc(149)] Terminating process group: Restarting browser on-demand.
2017-07-12T01:56:49.101327+00:00 INFO session_manager[31528]: [INFO:system_utils_impl.cc(110)] Sending 9 to -31547 as 1000 
2017-07-12T01:56:49.101951+00:00 INFO session_manager[31528]: [INFO:child_exit_handler.cc(77)] Handling 31547 exit.
2017-07-12T01:56:49.102224+00:00 ERR session_manager[31528]: [ERROR:child_exit_handler.cc(85)]   Exited with signal 9
2017-07-12T01:56:49.102374+00:00 INFO session_manager[31528]: [INFO:session_manager_service.cc(280)] Exiting process is chrome.
2017-07-12T01:56:49.102474+00:00 INFO session_manager[31528]: [INFO:browser_job.cc(149)] Terminating process group: Ensuring browser processes are gone.
2017-07-12T01:56:49.102569+00:00 INFO session_manager[31528]: [INFO:system_utils_impl.cc(110)] Sending 9 to -31547 as 1000
2017-07-12T01:56:49.102981+00:00 INFO session_manager[31528]: [INFO:browser_job.cc(140)] Running child /opt/google/chrome/chrome --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=26.0.0.126 --ui-prioritize-in-gpu-process --use-gl=egl -
-gpu-sandbox-failures-fatal=yes --enable-logging --log-level=1 --use-cras --enable-wayland-server --user-data-dir=/home/chronos --max-unused-resource-memory-usage-percentage=5 --system-developer-mode --login-profile=user --has-chromeos-keyboard --ash-animate-from-boot-spla
sh-screen --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/default_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/default_small.jpg --child-wallpaper-large=/usr/share/chromeos-assets/wallpaper/child_large.jpg --child-wallpaper-small=
/usr/share/chromeos-assets/wallpaper/child_small.jpg --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --enable-prefixed-encrypted-media --enable-consumer-kiosk --enterp
rise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --login-manager --remote-debugging-port=9222 --ash-disable-system-sounds --oobe-skip-postlogin --disable-gaia-services --testing-channel=NamedTestingInterface:/tmp/.org.chromium.Chromium.USGNjm/.or
g.chromium.Chromium.Hjo9Tp --vmodule=*chromeos/login/*=1,auto_enrollment_controller=1,*plugin*=2,*zygote*=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2ยท
2017-07-12T01:56:49.300415+00:00 INFO session_manager[31528]: [INFO:child_exit_handler.cc(77)] Handling 31545 exit.

----

Note that in the non-broken case, the "Handling ___ exit." line is followed by "Exited with signal 9", after which session_manager restarts Chrome. Here's the relevant code:

void ChildExitHandler::Dispatch(const siginfo_t& info) {
  // Find the manager whose child has exited.
  std::vector<JobManagerInterface*>::iterator job_manager = managers_.begin();
  while (job_manager != managers_.end()) {
    if ((*job_manager)->IsManagedJob(info.si_pid))
      break;
    ++job_manager;
  }
  if (job_manager == managers_.end()) {
    DLOG(INFO) << info.si_pid <<  " is not a managed job.";
    return;
  }

  LOG(INFO) << "Handling " << info.si_pid << " exit.";
  if (info.si_code == CLD_EXITED) {
    LOG_IF(ERROR, info.si_status != 0) << "  Exited with exit code "
                                       << info.si_status;
    CHECK(info.si_status != ChildJobInterface::kCantSetUid);
    CHECK(info.si_status != ChildJobInterface::kCantSetEnv);
    CHECK(info.si_status != ChildJobInterface::kCantExec);
  } else {
    LOG(ERROR) << "  Exited with signal " << info.si_status;
  }
  (*job_manager)->HandleExit(info);
}

So when it fails, we're apparently getting CLD_EXITED with exit code 0, and when it works, we're getting SIGKILL. It seems like we're apparently also using some job manager other than SessionManagerService in the former case, since the "Exiting process is chrome" line would be getting logged otherwise:

void SessionManagerService::HandleExit(const siginfo_t& ignored) {
  LOG(INFO) << "Exiting process is " << browser_->GetName() << ".";
 

Comment 1 by derat@chromium.org, Jul 12 2017

Cc: wad@chromium.org jorgelo@chromium.org
Status: Started (was: Assigned)
I think I found the problem. I believe that ChildJobInterface::Subprocess::KillEverything is trying to kill the forked Chrome process's process group before the process has had a chance to call setsid() to create a new session and pgroup. kill() errors are being swallowed, but when I log them, I can see that the kill to the pgroup is failing with "No such process".

I'm not sure what the right way to fix this is. I could make KillEverything loop a few times in the hopes that the pgroup will be created eventually.

If that still fails, maybe we should send the signal to the process itself. I apparently need to send the signal as root; when I try to do it as the child's UID (as we do usually), I get EPERM. Are there concerns about doing that?

The code that originally did this was https://codereview.chromium.org/3493012, but the underlying bug is gone because we hate the past... uh, because the chromium-os issue tracker got dumped. Maybe someone with more security privileges than me can find it in the chromium tracker by searching for the CL title.
That old CL from Chris doesn't really say what the concern was with killing as root.

So what you're saying is that killing the pgroup as the child's UID is OK, but killing the individual process EPERMs?

I can't think of a reason why killing the process as root if killing the pgroup fails would be a problem, but I wonder why it was changed back then.

Comment 3 by derat@chromium.org, Jul 12 2017

Mattias saw the cause of the EPERMs in the review: the child calls setuid() before setsid(). When I was trying to kill the process as its desired UID, the setuid() hadn't happened yet, so it was still running as root, hence EPERM. Trying to kill the pgroup as the desired UID never results in EPERM, because if the setuid() hasn't happened yet, then the setsid() hasn't happened either, so the call fails with ESRCH instead.
Ah I see, the missing piece is that the call *was failing* but with a different errno. Gotcha.
Naive question: Then, how about waiting in parent process for setsid completion in child process?
Will it take longer time, and block session_manager, which may not be acceptable?

Comment 6 by derat@chromium.org, Jul 12 2017

That was mentioned in the bug. I'm not sure if there's any way to do it without adding additional IPC or spinning in session_manager. Let me know if I'm mistaken, though.

Comment 7 by derat@chromium.org, Jul 12 2017

I meant "change" instead of "bug" in #6. It's at https://chromium-review.googlesource.com/c/567862/.
#6, thank you for sharing it.
What I imagined for the communication between the processes was something like pipe() + CLO_EXEC etc.
Though, I found that's not an option in the CL.
Project Member

Comment 9 by bugdroid1@chromium.org, Jul 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/aa62e2bf90a153132323b7a17b9c02e5c8e56b28

commit aa62e2bf90a153132323b7a17b9c02e5c8e56b28
Author: Daniel Erat <derat@chromium.org>
Date: Thu Jul 13 18:44:44 2017

login: Avoid race when killing newly-forked processes.

ChildJobInterface::Subprocess::KillEverything() can
apparently send a signal to a forked process's process group
before it's had a chance to call setsid() to create its
session and process group. This is easy to trigger by
calling EnableChromeTesting immediately after
session_manager starts, and results in Chrome not exiting or
getting restarted.

Try to avoid the race by killing the child process directly
if killing its process group fails. Also remove unnecessary
BrowserJob null checks in SessionManagerService.

BUG= chromium:741242 
TEST=manual: unable to trigger the race

Change-Id: I59b1d0e5e2060c17df1661847661b6e4004af9b7
Reviewed-on: https://chromium-review.googlesource.com/567862
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Jorge Lucangeli Obes <jorgelo@chromium.org>

[modify] https://crrev.com/aa62e2bf90a153132323b7a17b9c02e5c8e56b28/login_manager/child_job.cc
[modify] https://crrev.com/aa62e2bf90a153132323b7a17b9c02e5c8e56b28/login_manager/session_manager_service.cc
[modify] https://crrev.com/aa62e2bf90a153132323b7a17b9c02e5c8e56b28/login_manager/session_manager_process_unittest.cc

Comment 10 by derat@chromium.org, Jul 24 2017

Status: Fixed (was: Started)
Closing this, but feel free to reopen if there's any interest in an IPC-based solution (as discussed in the review, IIRC).

Comment 11 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Sign in to add a comment