moblab-vm: test_that continues even after timeout |
||||
Issue descriptionSee example build: https://uberchromegw.corp.google.com/i/chromiumos/builders/moblab-generic-vm-paladin/builds/458 The test timed out here for some reason (irrelevant for this bug) So MoblabVMTest timed out and the RunCommand that was waiting for test_that was aborted. MoblabVMTest then went on to stop the KVM instance running the DUT. But the test_that invocation continued to run, and was stuck for ~3 minutes because it was trying to SSH into the non existent KVM DUT (brought down by MoblabVMTest already). When RunCommand aborts, test_that should have aborted.
,
Jan 3 2018
The bug is in the implementation of lib/namespaces.py:CreatePidNs() This forks twice to contain a process tree inside a pid namespace, but the forking processes fail to forward SIGTERM (and SIGINT) to their children.
,
Jan 3 2018
,
Jan 3 2018
Note that the behaviour wrt SIGINT is even weirder: If you SIGINT the launched process, it is simply ignored. CreatePidNs explicitly ignores SIGINT claiming that we trust the child process to handle SIGINT. This is fine if SIGINT is sent to the whole process group (say when you hit Ctrl-C from the controlling terminal), but if SIGINT is sent to the parent process, it'll be a no-op
pprabhu@pprabhu:trace$ pstree -a -p 38304
sudo,38304 SSH_AGENT_PID=10941 SSH_AUTH_SOCK=/tmp/ssh-faa0hbwJYzrK/agent.10847 TMUX=/tmp/tmux-193821/default,26303,0 CROS_CACHEDIR=/work/chromiumos/.cache DEPOT_TOOLS=/work/chromiumos/chromium/tools/depot_tools -- /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done
└─python2,38307 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done
└─python2,38318 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done
└─python2,38323 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done
└─bash,38327 -c sleep 30; echo done
└─sleep,38590 30
pprabhu@pprabhu:trace$ sudo kill -SIGINT 38304
pprabhu@pprabhu:trace$
pprabhu@pprabhu:trace$
pprabhu@pprabhu:trace$ jobs -p
38304
pprabhu@pprabhu:trace$ pstree -a -p 38304
sudo,38304 SSH_AGENT_PID=10941 SSH_AUTH_SOCK=/tmp/ssh-faa0hbwJYzrK/agent.10847 TMUX=/tmp/tmux-193821/default,26303,0 CROS_CACHEDIR=/work/chromiumos/.cache DEPOT_TOOLS=/work/chromiumos/chromium/tools/depot_tools -- /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done
└─python2,38307 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done
└─python2,38318 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done
└─python2,38323 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done
└─bash,38327 -c sleep 30; echo done
└─sleep,38590 30
----------------
This doesn't affect things in practice since RunCommand doesn't use SIGINT (SIGTERM, followed by SIGKILL).
,
Jan 3 2018
Lots of history about why this is the way it is: issue 412484 My take is that ignoring SIGINT only went partway -- Ctrl-C now works because the terminal sends SIGINT to the whole process group, but SIGINT/SIGTERM to the parent process still does the wrong thing. This is especially a problem for cros_sdk as I noted above when run via RunCommand, which SIGTERM's the parent processes. I'll have CreatePidNs forward SIGINT/SIGTERM to the grandchild. The behaviour is still wrong for SIGKILL, where all bets are off.
,
Jan 3 2018
With https://chromium-review.googlesource.com/c/chromiumos/chromite/+/849535 child exits on SIGTERM as expected: pprabhu@pprabhu:trace$ cros_sdk bash -c "sleep 30; echo done" & [1] 74235 pprabhu@pprabhu:trace$ pstree -a -p 74235 sudo,74235 SSH_AGENT_PID=10941 SSH_AUTH_SOCK=/tmp/ssh-faa0hbwJYzrK/agent.10847 TMUX=/tmp/tmux-193821/default,26303,0 CROS_CACHEDIR=/work/chromiumos/.cache DEPOT_TOOLS=/work/chromiumos/chromium/tools/depot_tools -- /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done └─python2,74239 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done └─python2,74250 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done └─python2,74255 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done └─bash,74259 -c sleep 30; echo done └─sleep,74524 30 pprabhu@pprabhu:trace$ sudo kill -SIGINT 74235 pprabhu@pprabhu:trace$ Traceback (most recent call last): File "/work/chromiumos/chromite/bin/cros_sdk", line 169, in <module> DoMain() File "/work/chromiumos/chromite/bin/cros_sdk", line 165, in DoMain commandline.ScriptWrapperMain(FindTarget) File "/work/chromiumos/chromite/lib/commandline.py", line 911, in ScriptWrapperMain ret = target(argv[1:]) File "/work/chromiumos/chromite/scripts/cros_sdk.py", line 1163, in main chroot_command) File "/work/chromiumos/chromite/scripts/cros_sdk.py", line 274, in EnterChroot mute_output=False) File "/work/chromiumos/chromite/lib/cros_build_lib.py", line 622, in RunCommand (cmd_result.output, cmd_result.error) = proc.communicate(input) File "/usr/lib/python2.7/subprocess.py", line 796, in communicate self.wait() File "/usr/lib/python2.7/subprocess.py", line 1376, in wait pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0) File "/usr/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call return func(*args) File "/work/chromiumos/chromite/lib/cros_build_lib.py", line 369, in _KillChildProcess if not signals.RelaySignal(original_handler, signum, frame): File "/work/chromiumos/chromite/lib/signals.py", line 27, in RelaySignal handler(signum, frame) KeyboardInterrupt [1]+ Exit 1 cros_sdk bash -c "sleep 30; echo done" pprabhu@pprabhu:trace$
,
Jan 9 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/7928b3e6193e8d6b380fcffee055121e31f97289 commit 7928b3e6193e8d6b380fcffee055121e31f97289 Author: Prathmesh Prabhu <pprabhu@chromium.org> Date: Tue Jan 09 06:42:11 2018 Forward signals to processes inside pid namespace. This improves somewhat over the existing behaviour. Now, SIGINT (not sent via Ctrl-C, but only to the launched process) and SIGTERM are forwarded to the actual process running inside the pid namespace. But, - SIGKILL will still only kill the waiting external-init process, and the actual processes inside the namespace will continue to live. This improves the situation particularly when cbuildbot runs any of the build commands via cros_sdk. That results in a RunCommand call to cros_sdk. If the outer RunCommand is aborted (due to timeout), it sends SIGTERM to the cros_sdk process. BUG= chromium:798784 TEST=cros_sdk bash -c "sleep 30; echo done" & # Then SIGTERM the process created, the spawned bash process # actually exits immediately. Change-Id: Idf41098d0225733edb1ef7a8518cf7c629192dd4 Reviewed-on: https://chromium-review.googlesource.com/849535 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org> [modify] https://crrev.com/7928b3e6193e8d6b380fcffee055121e31f97289/lib/namespaces.py
,
Jan 9 2018
|
||||
►
Sign in to add a comment |
||||
Comment 1 by pprabhu@chromium.org
, Jan 3 2018The problem here is that the implementation of cros_sdk involves one or more process forks that don't properly forward SIGTERM to their children. A simple repro setup is to run a command via cros_sdk and kill the parent process. The child process continues: pprabhu@pprabhu:trace$ cros_sdk bash -c "sleep 30; echo done" & [1] 29926 pprabhu@pprabhu:trace$ pstree -a -p 29926 sudo,29926 SSH_AGENT_PID=10941 SSH_AUTH_SOCK=/tmp/ssh-faa0hbwJYzrK/agent.10847 TMUX=/tmp/tmux-193821/default,26303,0 CROS_CACHEDIR=/work/chromiumos/.cache... └─python2,29934 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done └─python2,29947 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done └─python2,29954 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done └─bash,29958 -c sleep 30; echo done └─sleep,30227 30 pprabhu@pprabhu:trace$ sudo kill -SIGTERM 29926 cros_sdk: Signaled to shutdown: caught 15 signal. pprabhu@pprabhu:trace$ pstree -a -p 29934 [1]+ Exit 1 cros_sdk bash -c "sleep 30; echo done" pprabhu@pprabhu:trace$ pstree -a -p 29947 python2,29947 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done └─python2,29954 /work/chromiumos/chromite/bin/cros_sdk bash -c sleep 30; echo done └─bash,29958 -c sleep 30; echo done └─sleep,30227 30 pprabhu@pprabhu:trace$ done