eve and caroline informational Chrome PFQ failed cheets_ContainerMount bvt-arc HWTest occasionally
Issue description
It started failing from https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8942668424417683312
Error message:
cheets_ContainerMount FAIL: Mountpoint leaked after logout. Current mounts: ['/run/arc/oem', '/run/arc/shared_mounts', '/run/arc/oem', '/run/arc/sdcard', '/run/arc/debugfs/sync', '/run/arc/debugfs/tracing', '/run/arc/media', '/run/arc/adbd']
06/26 06:04:49.670 WARNI|cheets_ContainerMo:0318| Failed to read mountinfo for pid 4989: [Errno 2] No such file or directory: '/proc/4989/mountinfo'
06/26 06:05:12.227 ERROR| utils:2631| Timed out waiting for unnamed condition
,
Jun 29 2018
Seen again in the caroline-informational build: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8942367709618001488
cheets_ContainerMount FAIL: Mountpoint leaked after logout. Current mounts: ['/run/arc/oem', '/run/arc/shared_mounts', '/run/arc/oem', '/run/arc/sdcard', '/run/arc/debugfs/tracing', '/run/arc/media', '/run/arc/adbd']
,
Jun 29 2018
,
Jun 29 2018
The log of the latest caroline informational build in #2 (https://stainless.corp.google.com/browse/chromeos-autotest-results/212961484-chromeos-test/) contains a large number of android-run_oci.* files, which seems unusual to me. +ARC constables
,
Jun 30 2018
Assigning to nya@ since he's next week's non-PDT ARC constable.
,
Jul 2
(CCing cmtm@, ARC constable PST)
,
Jul 3
ARC constables, will you be able to investigate and fix this soon? It's continuing to fail on informational PFQ builders, e.g. eve-tot-chrome-pfq-informational at http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8942085789674749888, and I suspect it will cause some failed nightly PFQ runs as well.
,
Jul 3
Sorry, I'm aware of this problem but I'm prioritizing issue 858902 since it's happening on CQ. I'll take a look at this problem today once I get something for the other issue.
,
Jul 3
These mount points are supposed to be unmounted by arc-setup --stop (the UnmountOnStop function). We do lazy unmounts and ignore their errors. In the arc-lifetime.log file, I can see messages like: "Failed to lazy-umount /run/arc/shared_mounts: Invalid argument". Normally, when a lazy unmount is run on a non-existent directory, I'd expect to see: "Failed to stat /run/arc/shared_mounts/demo_apps: No such file or directory". From the literature I've consulted, using lazy unmounts like this seems ill-advised. They're meant to avoid holding up system shutdown when unmounting NFS mounts. After fixing this bug, we might want to revisit our overall mount cleanup strategy.
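For reference, a minimal sketch of telling the two arc-lifetime.log messages apart. The errno interpretations follow the umount(2) man page (EINVAL: target is not a mount point; ENOENT: path does not exist) and are not confirmed against the arc-setup source. The function name classify_unmount_log_line and the hint table are hypothetical.

```python
import errno
import re

# Assumption: error text in the log is the standard strerror() string.
ERRNO_BY_TEXT = {
    "Invalid argument": errno.EINVAL,
    "No such file or directory": errno.ENOENT,
}

# Interpretations taken from umount(2); hypothetical wording.
HINT_BY_ERRNO = {
    errno.EINVAL: "target is not a mount point",
    errno.ENOENT: "path does not exist",
}

LOG_RE = re.compile(r"Failed to (lazy-umount|stat) (\S+): (.+)$")


def classify_unmount_log_line(line):
    """Return (operation, path, hint) for a known failure line, else None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    operation, path, error_text = match.groups()
    code = ERRNO_BY_TEXT.get(error_text.strip())
    if code is None:
        return None
    return operation, path, HINT_BY_ERRNO[code]
```

Under that reading, "Invalid argument" would mean the directory exists but nothing is mounted on it when the unmount runs.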
,
Jul 3
The error message "Failed to lazy-umount /run/arc/shared_mounts: Invalid argument" also occurs in the passing test runs. It is therefore likely a red herring.
,
Jul 4
The mount /run/arc/shared_mounts (and others, I suspect) is actually supposed to be unmounted in "arc-setup --boot-continue", and it seems that this is done successfully. The test run doesn't report the mount in either the host or the container namespace while it's running. What seems to be the issue is that the container starts up again soon after it shuts down, but before the test finishes, so the test detects the mounts from the new container's startup. We can see this by looking at the timestamps in the android-run_oci.* files. Why the container is starting up again before the test is finished needs to be investigated.
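A rough sketch of how mounts under /run/arc can be listed for a given process's mount namespace, assuming the standard /proc/<pid>/mountinfo layout where the mount point is the fifth whitespace-separated field. This is not the test's actual code; parse_mount_points and read_mount_points are hypothetical names. A process that has already exited is treated as having no mountinfo, which matches the "[Errno 2] No such file or directory" warning in the failure log.

```python
def parse_mount_points(mountinfo_text, prefix="/run/arc"):
    """Return mount points at or below `prefix` from mountinfo contents."""
    mounts = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # malformed or empty line
        mount_point = fields[4]
        if mount_point == prefix or mount_point.startswith(prefix + "/"):
            mounts.append(mount_point)
    return mounts


def read_mount_points(pid, prefix="/run/arc"):
    """Read mounts visible to `pid`; return None if the process is gone."""
    try:
        with open("/proc/%d/mountinfo" % pid) as mountinfo:
            return parse_mount_points(mountinfo.read(), prefix)
    except FileNotFoundError:
        # The process exited between being listed and being inspected.
        return None
```

Calling read_mount_points for both a host process and the container's init process would cover the two namespaces mentioned above.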
,
Jul 4
,
Jul 6
It seems that "restart ui" is called in the "self._chrome.close()" call in the cleanup method of ArcTest in arc.py. This is called before _assert_no_leftover_mounts() in cheets_ContainerMount's cleanup function (it's called using the super() method). This causes a race between the container startup upon restart of the ui and the check for missing mounts. The mini-container didn't exist when this part of the test was developed. I'm guessing that a recent speedup in how fast a session can be restarted is exposing it now. I can't see a trivial way of fixing the ordering: cheets_ContainerMount doesn't get to run code between the container shutting down and it starting up again. It might be worth considering getting rid of the test altogether, since most of the mount points we were worried about leaking are no longer a concern: they only exist in the container's mount namespace and are taken down when it is taken down.
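The ordering can be illustrated with a toy model. This is not the real arc.py or cheets_ContainerMount code: FakeDevice, ArcTestModel and ContainerMountModel are hypothetical stand-ins, and the model makes the unlucky ordering deterministic, whereas on a real device it only happens when the restart wins the race.

```python
class FakeDevice:
    """Hypothetical stand-in for the device under test."""

    def __init__(self):
        self.container_mounts = ["/run/arc/sdcard"]
        self.events = []

    def close_chrome(self):
        # Models self._chrome.close(): "restart ui" stops the container,
        # and the mini-container comes back up right after.
        self.events.append("restart ui")
        self.container_mounts = []  # old container's mounts are gone
        self.events.append("container restarted")
        self.container_mounts = ["/run/arc/sdcard"]  # new container's mounts


class ArcTestModel:
    """Models the base class whose cleanup closes Chrome."""

    def __init__(self, device):
        self.device = device

    def cleanup(self):
        self.device.close_chrome()


class ContainerMountModel(ArcTestModel):
    """Models the subclass that checks mounts after the base cleanup."""

    def cleanup(self):
        super().cleanup()
        # The check only runs after the restart, so it can see the
        # mounts of the new container and report them as leaked.
        self.device.events.append("mount check")
        self.leftover = list(self.device.container_mounts)
```

In this model the subclass has no hook between "container stopped" and "container restarted", which is the difficulty described above.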
,
Jul 9
,
Jul 9
Instead of my previous proposal, I'll add the mini-container mounts to the list of mounts that aren't considered leaked mounts.
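A minimal sketch of that approach, assuming the mini-container's mounts are the ones listed in the failure messages above. The real list lives in the autotest change and may differ; MINI_CONTAINER_MOUNTS and find_leaked_mounts are hypothetical names.

```python
# Assumption: taken from the "Current mounts" list in the failure message,
# not from the actual fix.
MINI_CONTAINER_MOUNTS = frozenset([
    "/run/arc/oem",
    "/run/arc/shared_mounts",
    "/run/arc/sdcard",
    "/run/arc/debugfs/sync",
    "/run/arc/debugfs/tracing",
    "/run/arc/media",
    "/run/arc/adbd",
])


def find_leaked_mounts(current_mounts, expected=MINI_CONTAINER_MOUNTS):
    """Return the sorted, de-duplicated mounts that are not expected."""
    return sorted(set(m for m in current_mounts if m not in expected))
```

With this filter, a restarted mini-container no longer fails the check, while any mount outside the expected set is still reported.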
,
Jul 10
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/autotest-cheets/+/e3a0b12812a41b3a6cea11da7542c6c913d460f5 commit e3a0b12812a41b3a6cea11da7542c6c913d460f5 Author: Chris Morin <cmtm@google.com> Date: Tue Jul 10 05:35:48 2018
,
Jul 10
Comment 1 by x...@chromium.org
, Jun 27 2018