New issue
Advanced search Search tips

Issue 787550 link

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

imageloader doesn't always reap its helper process

Project Member Reported by kerrnel@chromium.org, Nov 21 2017

Issue description

When imageloader launches, it forks a helper process that runs as root. When imageloader goes into daemon mode, it uses a brillo::ProcessReaper to kill the helper process when it exists. When it's invoked in single command mode (imageloader --unmount_all, etc.), it never reaps the helper process, and leaks the process. It needs to always reap the process.
 
One way to reproduce this is to call a dbus API (e.g. LoadComponent) and then there should be a crash after imageloader_wrapper exits (after Daemon.Run).
I think that's the other issue. This isn't about a crush, this about the helper process never terminating when imageloader is not run as a daemon.
my device is kefka. sadly I can't reproduce it any more... And I'm not seeing crashes any more in autotest. 

localhost ~ # ps aux | grep imageloader                                                                                                                                                                           
root      3974  0.0  0.3  30372  7376 ?        Ss   Nov20   0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root      3991  0.0  0.3  30300  7420 ?        Ss   Nov20   0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root      4067  0.0  0.3  30300  7336 ?        Ss   Nov20   0:00 /usr/sbin/imageloader --unmount --mount_point=/run/imageloader/CorruptTestFlashComponent1/23.0.0.207 --mount_helper_fd=2
root      4077  0.0  0.3  30300  7376 ?        Ss   Nov20   0:00 /usr/sbin/imageloader --unmount --mount_point=/run/imageloader/FlashLoadedAtPath/23.0.0.207 --mount_helper_fd=2
root      4086  0.0  0.3  30300  7344 ?        Ss   Nov20   0:00 /usr/sbin/imageloader --unmount --mount_point=/run/imageloader/TestFlashComponent/24.0.0.186 --mount_helper_fd=2
root      4096  0.0  0.3  30300  7532 ?        Ss   Nov20   0:00 /usr/sbin/imageloader --unmount --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root      7423  0.0  0.3  30372  7668 ?        Ss   Nov20   0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root      7438  0.0  0.3  30300  7392 ?        Ss   Nov20   0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root      8489  0.0  0.3  30372  7792 ?        Ss   Nov20   0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root      8504  0.0  0.3  30300  7380 ?        Ss   Nov20   0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root     10227  0.0  0.3  30372  7824 ?        Ss   Nov20   0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root     10243  0.0  0.3  30300  7336 ?        Ss   Nov20   0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root     13024  0.0  0.3  30372  7968 ?        Ss   11:47   0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root     13040  0.0  0.3  30300  7364 ?        Ss   11:47   0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2
root     13563  0.0  0.0   4368   820 pts/0    S+   11:48   0:00 grep --colour=auto imageloader

This a result of process not exiting cleanly because I used subprocess.call().
Cc: vapier@chromium.org
vapier@, it appears that when the main process does not shutdown cleanly, the helper process never exits. Do you know why that would happen? See the following code from the helper process waiting on the socket. Doesn't recvmsg(2) return an error (< 0) if the other process crashes?

  ssize_t bytes = recvmsg(fd, &msg, 0);
  if (bytes < 0) PLOG(FATAL) << "recvmsg failed";
  // Per recvmsg(2), the return value will be 0 when the peer has performed an
  // orderly shutdown.
  LOG(ERROR) << "recvmsg got bytes: " << bytes;
  if (bytes == 0) _exit(0);

Comment 6 by vapier@chromium.org, Nov 28 2017

what kind of fd is this ?  is it a UNIX socket ?  i'm not super familiar with the semantics here, but i don't think it has the same semantics as say a pipe ... if all the writers disappear, the readers don't get killed.

also, isn't it a bit weird that you're actively introducing a blocking read call to a func explicitly named "OnFileCanReadWithoutBlocking" ?
It is a UNIX socket, yes. I think the behavior here has to do with whether or not the socket undergoes an orderly shutdown.

The read shouldn't block, though...? The whole point of this function is that it gets called when that won't happen.
Project Member

Comment 8 by bugdroid1@chromium.org, Nov 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/13011038b6be1f49bad6a03beb64fffde57531fb

commit 13011038b6be1f49bad6a03beb64fffde57531fb
Author: Greg Kerr <kerrnel@chromium.org>
Date: Tue Nov 28 21:34:27 2017

platform_ImageLoader: Use utils.system() to call binary.

This fixes two crashes caused by using subprocess.call().

BUG= chromium:787550 
TEST=test_that -b ${BOARD} ${DUT_IP} platform_ImageLoaderServer

Change-Id: Idf65d5a94afa913a855d3999cbf328132f4ba256
Reviewed-on: https://chromium-review.googlesource.com/792339
Commit-Ready: Greg Kerr <kerrnel@chromium.org>
Tested-by: Greg Kerr <kerrnel@chromium.org>
Reviewed-by: Xiaochu Liu <xiaochu@chromium.org>

[modify] https://crrev.com/13011038b6be1f49bad6a03beb64fffde57531fb/client/site_tests/platform_ImageLoader/platform_ImageLoader.py

Status: Fixed (was: Started)
are we going to worry about orphaned helpers still ?  i think that autotest CL is an OK fix by itself, but doesn't exactly address this case ?
Well what else can be done with this? The helper process already terminates if recvmsg() returns an error or 0.
Status: Archived (was: Fixed)

Sign in to add a comment