chromium-sdk: Builders showing failure due to failing clean build root
Issue description

Follow-up from issue 860508: https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=chromiumos-sdk&buildBranch=master is all red.

The task fails to clean up; see https://chrome-swarming.appspot.com/task?id=3edee6b484c12010&refresh=10&show_raw=1&wide_logs=true

Failed to delete /b/swarming/c/nJ. The following files remain:
...
- /b/swarming/c/nJ/repository/new-sdk-chroot/build/amd64-generic/var/tmp/portage
- /b/swarming/c/nJ/repository/new-sdk-chroot/build/arm-generic/tmp/screen
- /b/swarming/c/nJ/repository/new-sdk-chroot/build/arm-generic/var/tmp/portage
- /b/swarming/c/nJ/repository/new-sdk-chroot/tmp
- /b/swarming/c/nJ/repository/new-sdk-chroot/tmp/screen
3225 2018-07-24 00:13:11.875 E: internal failure: [Errno 1] Operation not permitted: '/b/swarming/c/nJ/repository/new-sdk-chroot/tmp/screen'

A guess is that the chroot has files owned by 'root' (?), and since the Swarming bot runs as 'chrome-bot', it can't remove them.

It's odd that it only failed to remove those paths. Normally, there are a lot more files in the chroot that are owned by root.

...

When we run `cros_sdk --delete`, we do run the cleanup as root, but we also make sure to clean up any stray mounts. If we need a hammer outside of CrOS code, this should probably work fine:

sudo rm -rf --one-file-system <chromiumos checkout>
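The guess above (leftover entries owned by someone other than 'chrome-bot') could be checked with something like the sketch below. It is only an illustration: `find_foreign_owned` is a hypothetical helper, not part of Swarming or chromite. It lists every entry under a root that is not owned by the given user and, like `rm --one-file-system`, avoids descending into other mounted file systems.

```python
import os


def find_foreign_owned(root, uid=None):
    """Return paths under root (root included) not owned by uid.

    Hypothetical diagnostic helper; uid defaults to the current effective user.
    """
    uid = os.geteuid() if uid is None else uid
    root_st = os.lstat(root)
    foreign = [root] if root_st.st_uid != uid else []
    for dirpath, dirnames, filenames in os.walk(root):
        kept = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_uid != uid:
                foreign.append(path)
            # Stay on the root's file system so stray mounts are not walked,
            # in the spirit of `rm --one-file-system`.
            if st.st_dev == root_st.st_dev:
                kept.append(name)
        dirnames[:] = kept  # prune the walk in place
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return sorted(foreign)
```

Run as the bot user against a stale build root, an empty result would argue against the ownership theory, while a short list would show exactly which entries need root to remove.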
,
Jul 25
Looking at the last failure from this morning: https://chrome-swarming.appspot.com/task?id=3ee6907aee9d4910&refresh=10&show_raw=1

I logged in to swarm-cros-548 and here's a directory listing:

$ ls -l /b/swarming/c
total 84
drwxr-xr-x 3 chrome-bot chrome-bot 4096 Jul 25 08:12 CD
drwxr-xr-x 504 chrome-bot chrome-bot 65536 Jul 25 08:17 g6
drwxr-xr-x 3 chrome-bot chrome-bot 4096 Jul 19 17:27 Jp
drwxr-xr-x 2 chrome-bot chrome-bot 4096 Jul 25 15:54 named
drwxr-xr-x 12 chrome-bot chrome-bot 4096 Jul 22 05:55 qn
-rw-r--r-- 1 chrome-bot chrome-bot 219 Jul 25 15:54 state.json

So there are some build directories hanging around from a few days ago, but not the one in the error messages from the build linked above:

Failed to delete /b/swarming/c/vE. The following files remain:
- /b/swarming/c/vE
- /b/swarming/c/vE/repository
- /b/swarming/c/vE/repository/new-sdk-chroot
- /b/swarming/c/vE/repository/new-sdk-chroot/build
- /b/swarming/c/vE/repository/new-sdk-chroot/build/amd64-generic
- /b/swarming/c/vE/repository/new-sdk-chroot/build/amd64-generic/tmp
- /b/swarming/c/vE/repository/new-sdk-chroot/build/amd64-generic/tmp/screen
- /b/swarming/c/vE/repository/new-sdk-chroot/build/amd64-generic/var
- /b/swarming/c/vE/repository/new-sdk-chroot/build/amd64-generic/var/tmp
- /b/swarming/c/vE/repository/new-sdk-chroot/build/amd64-generic/var/tmp/portage
- /b/swarming/c/vE/repository/new-sdk-chroot/build/arm-generic
- /b/swarming/c/vE/repository/new-sdk-chroot/build/arm-generic/tmp
- /b/swarming/c/vE/repository/new-sdk-chroot/build/arm-generic/tmp/screen
- /b/swarming/c/vE/repository/new-sdk-chroot/build/arm-generic/var
- /b/swarming/c/vE/repository/new-sdk-chroot/build/arm-generic/var/tmp
- /b/swarming/c/vE/repository/new-sdk-chroot/build/arm-generic/var/tmp/portage
- /b/swarming/c/vE/repository/new-sdk-chroot/tmp
- /b/swarming/c/vE/repository/new-sdk-chroot/tmp/screen

What's deleting the vE directory if we aren't? And why doesn't whatever that is also delete the other, older build roots?
,
Jul 25
Another random thought: maybe there's something funky about /tmp/screen or /var/tmp/portage? I checked perms on the stale roots, though, and I don't see anything strange on those. /tmp/screen sometimes contains Unix domain sockets, but that shouldn't be particularly interesting: even those in use would be rm'able.

$ getfacl CD/repository/chroot/build/daisy_spring/var/tmp/portage/
# file: CD/repository/chroot/build/daisy_spring/var/tmp/portage/
# owner: 250
# group: root
user::rwx
group::r-x
other::r-x

jclinton@swarm-cros-548:/b/swarming/c$ getfacl CD/repository/chroot/build/daisy_spring/tmp/screen/
# file: CD/repository/chroot/build/daisy_spring/tmp/screen/
# owner: root
# group: 406
user::rwx
group::rwx
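One further thing that could be checked, offered only as a possibility and not confirmed anywhere in this thread: on Linux, whether an entry can be removed depends on its parent directory, not just on the entry's own mode bits. If the parent has the sticky bit set (as /tmp and /var/tmp conventionally do), only the entry's owner, the directory's owner, or a privileged process may remove it, and the failure is EPERM ("Operation not permitted"). A minimal sketch with a hypothetical helper name and return shape:

```python
import os
import stat


def deletion_barriers(path, uid=None):
    """Report the ownership facts that decide whether uid may remove path.

    Hypothetical diagnostic; uid defaults to the current effective user.
    """
    uid = os.geteuid() if uid is None else uid
    parent = os.path.dirname(os.path.abspath(path))
    entry_st = os.lstat(path)
    parent_st = os.stat(parent)
    sticky = bool(parent_st.st_mode & stat.S_ISVTX)
    return {
        "entry_owner": entry_st.st_uid,
        "parent_owner": parent_st.st_uid,
        "parent_sticky": sticky,
        # In a sticky directory, removal is limited to the entry's owner,
        # the directory's owner, or root (uid 0 used here as an
        # approximation of "privileged").
        "blocked_by_sticky": sticky
        and uid not in (0, entry_st.st_uid, parent_st.st_uid),
    }
```

Calling it on one of the leftover paths, for example a `tmp/screen` directory, would show whether a sticky parent is in play for the bot's user.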
,
Jul 26
As the bug describes, it's because of the chrome-bot vs. root differentiation, which is WAI, so I'm taking this out of the trooper queue and cc'ing Marc-Antoine for further insights.
,
Jul 26
The bot runs a task, and the bot expects to be able to delete the files that task created. Interestingly, the bot already takes care of the read/write bits and already tries to take ownership via passwordless sudo: https://cs.chromium.org/chromium/infra/luci/client/utils/file_path.py?l=1108 Maybe this part doesn't have strong enough guarantees?
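To make the two steps mentioned above concrete (fix the read/write bits, then try to take ownership through passwordless sudo), here is a rough sketch of the general approach. It is not the code in file_path.py; the function names and the injectable `run` parameter are assumptions made for illustration.

```python
import os
import stat
import subprocess


def _open_up(path, uid, foreign):
    """Give the owner rwx on directories we own; record entries we do not own."""
    st = os.lstat(path)
    if st.st_uid != uid:
        foreign.append(path)  # cannot chmod this; leave it for chown
        return
    if stat.S_ISDIR(st.st_mode):  # lstat, so symlinks are never followed
        os.chmod(path, stat.S_IMODE(st.st_mode) | stat.S_IRWXU)
        for name in os.listdir(path):
            _open_up(os.path.join(path, name), uid, foreign)


def make_tree_deletable_sketch(root, run=subprocess.call):
    """Best-effort: return True if everything under root now looks removable.

    Illustrative only; `run` is injectable so the sudo call can be stubbed.
    """
    uid, gid = os.geteuid(), os.getegid()
    foreign = []
    _open_up(root, uid, foreign)
    if not foreign:
        return True
    # sudo -n never prompts: it fails if passwordless sudo is not set up.
    cmd = ["sudo", "-n", "chown", "-R", "%d:%d" % (uid, gid), "--"] + foreign
    if run(cmd) != 0:
        return False
    still_foreign = []
    for path in foreign:
        _open_up(path, uid, still_foreign)
    return not still_foreign
```

A sketch like this only addresses ownership and mode bits inside the tree; anything held in place by a mount or by restrictions on a parent directory would need separate handling.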
,
Jul 26
The question for the Troopers was not in the description; it was in #2: what is deleting these directories if the Recipe isn't?
,
Jul 26
It's the bot itself, trying to clean up the task's residue.
,
Jul 26
Okay, so, in the listing in #2, we see that there are directories hanging around from other builds that *didn't* fail in the Recipe's cleanup step. Does the bot itself not always perform the clean-up task? I'm trying to understand the lifecycle here and whether there's more than one bug.
,
Jul 26
What you are seeing is generated by the bot, not the task. The problem is that the bot tries to clean up the caches, not only the current cache but *all caches*, as part of the task content. That should be done outside of the task's scope. That's definitely a bug, which I'll address in issue 868083. The bug here is that the bot is not able to delete the files, which is problematic. I think fs.make_tree_deleteable() needs fine tuning for the ChromeOS case.
,
Sep 14
The last set of builds on the chromiumos-sdk builder has been green, so closing this as fixed. https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=chromiumos-sdk&buildBranch=master
Comment 1 by jclinton@chromium.org
, Jul 25