imageloader doesn't always reap its helper process |
||||
Issue descriptionWhen imageloader launches, it forks a helper process that runs as root. When imageloader goes into daemon mode, it uses a brillo::ProcessReaper to kill the helper process when it exists. When it's invoked in single command mode (imageloader --unmount_all, etc.), it never reaps the helper process, and leaks the process. It needs to always reap the process.
,
Nov 21 2017
I think that's the other issue. This isn't about a crush, this about the helper process never terminating when imageloader is not run as a daemon.
,
Nov 21 2017
my device is kefka. sadly I can't reproduce it any more... And I'm not seeing crashes any more in autotest. localhost ~ # ps aux | grep imageloader root 3974 0.0 0.3 30372 7376 ? Ss Nov20 0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2 root 3991 0.0 0.3 30300 7420 ? Ss Nov20 0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2 root 4067 0.0 0.3 30300 7336 ? Ss Nov20 0:00 /usr/sbin/imageloader --unmount --mount_point=/run/imageloader/CorruptTestFlashComponent1/23.0.0.207 --mount_helper_fd=2 root 4077 0.0 0.3 30300 7376 ? Ss Nov20 0:00 /usr/sbin/imageloader --unmount --mount_point=/run/imageloader/FlashLoadedAtPath/23.0.0.207 --mount_helper_fd=2 root 4086 0.0 0.3 30300 7344 ? Ss Nov20 0:00 /usr/sbin/imageloader --unmount --mount_point=/run/imageloader/TestFlashComponent/24.0.0.186 --mount_helper_fd=2 root 4096 0.0 0.3 30300 7532 ? Ss Nov20 0:00 /usr/sbin/imageloader --unmount --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2 root 7423 0.0 0.3 30372 7668 ? Ss Nov20 0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2 root 7438 0.0 0.3 30300 7392 ? Ss Nov20 0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2 root 8489 0.0 0.3 30372 7792 ? Ss Nov20 0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2 root 8504 0.0 0.3 30300 7380 ? Ss Nov20 0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2 root 10227 0.0 0.3 30372 7824 ? Ss Nov20 0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2 root 10243 0.0 0.3 30300 7336 ? Ss Nov20 0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2 root 13024 0.0 0.3 30372 7968 ? Ss 11:47 0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2 root 13040 0.0 0.3 30300 7364 ? Ss 11:47 0:00 /usr/sbin/imageloader --mount --mount_component=TestFlashComponent --mount_point=/run/imageloader/TestFlashComponent_testing --mount_helper_fd=2 root 13563 0.0 0.0 4368 820 pts/0 S+ 11:48 0:00 grep --colour=auto imageloader
,
Nov 28 2017
This a result of process not exiting cleanly because I used subprocess.call().
,
Nov 28 2017
vapier@, it appears that when the main process does not shutdown cleanly, the helper process never exits. Do you know why that would happen? See the following code from the helper process waiting on the socket. Doesn't recvmsg(2) return an error (< 0) if the other process crashes? ssize_t bytes = recvmsg(fd, &msg, 0); if (bytes < 0) PLOG(FATAL) << "recvmsg failed"; // Per recvmsg(2), the return value will be 0 when the peer has performed an // orderly shutdown. LOG(ERROR) << "recvmsg got bytes: " << bytes; if (bytes == 0) _exit(0);
,
Nov 28 2017
what kind of fd is this ? is it a UNIX socket ? i'm not super familiar with the semantics here, but i don't think it has the same semantics as say a pipe ... if all the writers disappear, the readers don't get killed. also, isn't it a bit weird that you're actively introducing a blocking read call to a func explicitly named "OnFileCanReadWithoutBlocking" ?
,
Nov 28 2017
It is a UNIX socket, yes. I think the behavior here has to do with whether or not the socket undergoes an orderly shutdown. The read shouldn't block, though...? The whole point of this function is that it gets called when that won't happen.
,
Nov 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/13011038b6be1f49bad6a03beb64fffde57531fb commit 13011038b6be1f49bad6a03beb64fffde57531fb Author: Greg Kerr <kerrnel@chromium.org> Date: Tue Nov 28 21:34:27 2017 platform_ImageLoader: Use utils.system() to call binary. This fixes two crashes caused by using subprocess.call(). BUG= chromium:787550 TEST=test_that -b ${BOARD} ${DUT_IP} platform_ImageLoaderServer Change-Id: Idf65d5a94afa913a855d3999cbf328132f4ba256 Reviewed-on: https://chromium-review.googlesource.com/792339 Commit-Ready: Greg Kerr <kerrnel@chromium.org> Tested-by: Greg Kerr <kerrnel@chromium.org> Reviewed-by: Xiaochu Liu <xiaochu@chromium.org> [modify] https://crrev.com/13011038b6be1f49bad6a03beb64fffde57531fb/client/site_tests/platform_ImageLoader/platform_ImageLoader.py
,
Nov 30 2017
,
Nov 30 2017
are we going to worry about orphaned helpers still ? i think that autotest CL is an OK fix by itself, but doesn't exactly address this case ?
,
Nov 30 2017
Well what else can be done with this? The helper process already terminates if recvmsg() returns an error or 0.
,
Jul 30
|
||||
►
Sign in to add a comment |
||||
Comment 1 by xiaochu@chromium.org
, Nov 21 2017