With ext4crypt, autotest test job aborts at 'restart ui' |
|||||
Issue descriptionOS: Chrome OS 9233 + ext4crypt patches. What steps will reproduce the problem? (1) Log in by some regular account. (2) test_that -b $BOARD $REMOTE cheets_CTSHelper Expected: the UI restarts and the test starts up with a fake login. Instead: ... 13:34:02 ERROR| __init__ at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py:44 13:34:02 ERROR| self._cri.RestartUI(self.browser_options.clear_enterprise_policy) 13:34:02 ERROR| RestartUI at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/core/cros_interface.py:555 13:34:02 ERROR| self.RunCmdOnDevice(stop_cmd) 13:34:02 ERROR| RunCmdOnDevice at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/core/cros_interface.py:221 13:34:02 ERROR| quiet=quiet) 13:34:02 ERROR| GetAllCmdOutput at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/core/cros_interface.py:65 13:34:02 ERROR| stdout, stderr = p.communicate() 13:34:02 ERROR| communicate at /usr/local/lib64/python2.7/subprocess.py:799 13:34:02 ERROR| return self._communicate(input) 13:34:02 ERROR| _communicate at /usr/local/lib64/python2.7/subprocess.py:1409 13:34:02 ERROR| stdout, stderr = self._communicate_with_poll(input) 13:34:02 ERROR| _communicate_with_poll at /usr/local/lib64/python2.7/subprocess.py:1463 13:34:02 ERROR| ready = poller.poll() 13:34:02 ERROR| PrintStackAndExit at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/util/global_hooks.py:38 13:34:02 ERROR| exception_formatter.PrintFormattedFrame(stack_frame, exception_string) ... Here, stop_cmd = 'stop ui' and global_hooks.py:38 is a hook for SIGTERM signal. Hence, while running 'stop ui', for some reason the autotest script itself is being killed by SIGTERM and failing to continue running the rest. As far as I remember this was not the case for ecryptfs-based backend. (Though not sure what can make the difference...) This is probably not a blocker (since we can still run the test by manually signing out and running test_that), but it is still annoying.
,
Jan 31 2017
memo: My guess from home without any testing is that this may be because https://chromium.git.corp.google.com/chromiumos/platform2/+/b1046234bef29203d050a5de65d9f002c32eaeed/cryptohome/mount.cc#763 now we are always immediately invalidating the key, while previously we did only lazy unmount if any file is opened. It may be causing unexpected file IO error somewhere in autotest.
,
Feb 1 2017
No time to do further investigation yet, but I noticed that this is a P1 bug. Most of the tests that does Chrome sign-in are actually affected. For instance, `test_that video_ChromeRTCHWDecodeUsed` fails. The point is, nicely written tests like https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/client/site_tests/video_ChromeRTCHWDecodeUsed/video_ChromeRTCHWDecodeUsed.py#70 cleanly log out from Chrome (by the with-statement) before finishing the test. It is `restart ui` that causes the same problem as `stop ui`. cheets_CTSHelper was only one of few tests that does not sign-out; most of the tests actually hit by this and marked as failing.
,
Feb 1 2017
,
Feb 2 2017
Hm, it may be simpler. /var/log/shutdown_force_kill_processes recorded that autotest related processes are killed by https://cs.corp.google.com/chromeos_public/src/platform2/init/killers?type=cs&q=kill_with_open_files_on&l=56 https://cs.corp.google.com/chromeos_public/src/platform2/login_manager/init/scripts/ui-post-stop?type=cs&q=kill_with_open_files_on&l=23 So they are killed exactly because they are opening files. What's not working well is, for some reason, $ lsof /home/chronos/u-* lists up processes not opening files matching /home/chronos/u-* too. autotest is killed because it is opening "/usr/local/lib64/libpython2.7.so.1.0", etc, etc.... Weird.
,
Feb 2 2017
TIL: lsof $mountOnPoint lists up all the open files on the same filesystem, so in our case all open files in /dev/sda1 are affected, including /usr/local. In the ecyptfs era, it wasn't the case because the ecryptfs is treated as a separate fs. I'll seek more to find if there's a nice way to handle the situation.
,
Feb 2 2017
Dumping out my work record today. (1) lsof has the "-D" option that does recursive directory traversal, so `lsof -D /home/chronos/u-hash` limits the output only under that path. However, this literally does slow recursive search on the whole directory tree that probably we don't want to do during shutdown. Besides, I'm not sure the files under Android /data/data can be detected by this or not. (2) lsof can also emit the file path (in addition to fd/pid), essentially by reading the symlinks /proc/$pid/fd/$fd. We may be able to filter it by whitelist/blacklist. Note that processes in side the Android container has their path inside the container. For logged-in users /home/chronos/u-* lives on /dev/sda1 (or corresponding device file on each chromebook), so, what it kills are: /dev/sda1 on /mnt/stateful_partition type ext4 (rw,nosuid,nodev,noexec,relatime,seclabel,commit=600,data=ordered) /dev/sda1 on /home type ext4 (rw,nodev,relatime,seclabel,commit=600,data=ordered) /dev/sda1 on /usr/local type ext4 (rw,nodev,relatime,seclabel,commit=600,data=ordered) /dev/sda1 on /home/chronos/user type ext4 (rw,nodev,relatime,seclabel,commit=600,data=ordered) /dev/sda1 on /home/user/45d4ae3aa864acc14aecc866544c12ba05e9e9f4 type ext4 (rw,nodev,relatime,seclabel,commit=600,data=ordered) /dev/sda1 on /home/chronos/u-45d4ae3aa864acc14aecc866544c12ba05e9e9f4 type ext4 (rw,nodev,relatime,seclabel,commit=600,data=ordered) /dev/sda1 on /home/root/45d4ae3aa864acc14aecc866544c12ba05e9e9f4 type ext4 (rw,nodev,relatime,seclabel,commit=600,data=ordered) /dev/sda1 on /opt/google/containers/android/rootfs/android-data type ext4 (rw,nodev,relatime,seclabel,commit=600,data=ordered) /dev/sda1 on /run/containers/android_eYAz38/root/cache type ext4 (rw,nodev,relatime,seclabel,commit=600,data=ordered) /dev/sda1 on /run/containers/android_eYAz38/root/data type ext4 (rw,nosuid,nodev,relatime,seclabel,commit=600,data=ordered) The paths that we don't want to kill are * /mnt/stateful_partition * /usr/local * /home and others are the things really under cryptohome.
,
Feb 10 2017
The current status. I created a local change to further filter pids by paths (/home(/root|/user/|chronos)|/data) using lsof -Fn. I'll clean it up and send for review soon.
,
Feb 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/421af1309ff12b8d8aa0d4fdc5222f4872f3e52d commit 421af1309ff12b8d8aa0d4fdc5222f4872f3e52d Author: Kazuhiro Inaba <kinaba@chromium.org> Date: Wed Feb 15 18:47:31 2017 ext4crypto: Extend kill_with_open_files_on with filtering by path. The change is necessary for let 'stop ui' not kill irrelevant processes, especially the autotest scripts on /usr/local, while trying to only kill processes accessing files on (to be unmounted) cryptohome. The previous implementation killing all the pids of `lsof -t /home/chronos/u-<hash>` worked on ecryptfs-based cryptohome, because each home directory mounted an encrypted vault directory. On ext4 crypto, /home/chronos/u-<hash> is merely a bind mount to an encyrpted directory in the ext4 filesystem on a disk device (e.g., /dev/xxx1.) The existing code killed more than expected, out of home directories. This CL adds the ability for the killer script to filter the pid by open file path, and utilize it by filtering the candidates killed by 'stop ui' with known whitelist of paths pointing inside the user data directories (/home/(chronos|user|root) and /data from inside ARC container.) BUG= chromium:686611 TEST=killers_unittest TEST=test_that suite:bvt-cq TEST=manually checked Chrome signout, with or without file-touching processes. Change-Id: I0b5de80df89d1df565bf06724a54206f2c2d7dfd Reviewed-on: https://chromium-review.googlesource.com/441429 Commit-Ready: Kazuhiro Inaba <kinaba@chromium.org> Tested-by: Kazuhiro Inaba <kinaba@chromium.org> Reviewed-by: Ryo Hashimoto <hashimoto@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> [modify] https://crrev.com/421af1309ff12b8d8aa0d4fdc5222f4872f3e52d/init/killers_unittest [modify] https://crrev.com/421af1309ff12b8d8aa0d4fdc5222f4872f3e52d/init/killers [modify] https://crrev.com/421af1309ff12b8d8aa0d4fdc5222f4872f3e52d/login_manager/init/scripts/ui-post-stop
,
Feb 15 2017
,
Apr 11 2017
|
|||||
►
Sign in to add a comment |
|||||
Comment 1 by kinaba@chromium.org
, Jan 30 2017