
Issue 867622

Starred by 3 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Sep 14
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 868083




chromium-sdk: Builders showing failure due to failing clean build root

Project Member Reported by jclinton@chromium.org, Jul 25

Issue description

Follow-up from issue 860508:

https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=chromiumos-sdk&buildBranch=master is all red.

The task fails to clean up; see https://chrome-swarming.appspot.com/task?id=3edee6b484c12010&refresh=10&show_raw=1&wide_logs=true

Failed to delete /b/swarming/c/nJ. The following files remain:
...
- /b/swarming/c/nJ/repository/new-sdk-chroot/build/amd64-generic/var/tmp/portage
- /b/swarming/c/nJ/repository/new-sdk-chroot/build/arm-generic/tmp/screen
- /b/swarming/c/nJ/repository/new-sdk-chroot/build/arm-generic/var/tmp/portage
- /b/swarming/c/nJ/repository/new-sdk-chroot/tmp
- /b/swarming/c/nJ/repository/new-sdk-chroot/tmp/screen
3225 2018-07-24 00:13:11.875 E: internal failure: [Errno 1] Operation not permitted: '/b/swarming/c/nJ/repository/new-sdk-chroot/tmp/screen'

A guess is that the chroot has files owned by 'root' (?), and since the Swarming bot runs as 'chrome-bot', it can't remove them.

It's odd that it only failed to remove those paths. Normally, there are a lot more files in the chroot that are owned by root ...
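
One way to test the ownership guess is to list everything under a leftover build root that is not owned by the bot user. The following is a minimal sketch, not part of the Swarming or CrOS code; the helper name `find_foreign_owned` is hypothetical.

```python
import os


def find_foreign_owned(root, uid=None):
    """Return sorted paths under root (root included) not owned by uid.

    Hypothetical helper for illustration. uid defaults to the current user,
    which on a Swarming bot would be the 'chrome-bot' account.
    """
    if uid is None:
        uid = os.getuid()
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    # lstat() so that symlinks are judged by their own owner, not the target's.
    return sorted(p for p in paths if os.lstat(p).st_uid != uid)
```

Running it as the bot user against a stale root such as /b/swarming/c/CD would show whether the undeletable paths are the only ones with a different owner.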

When we run `cros_sdk --delete`, we do indeed run the cleanup as root, but we also make sure to clean up any stray mounts. If we need a hammer outside of CrOS code, this should probably work fine:
  sudo rm -rf --one-file-system <chromiumos checkout>
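
For reference, the idea behind `--one-file-system` is to stay on the device the starting directory lives on. Here is a rough sketch of that check in Python; it only walks the tree and deletes nothing, and `walk_one_file_system` is a hypothetical name, not an existing tool.

```python
import os


def walk_one_file_system(root):
    """Yield directories under root that are on the same device as root.

    Illustrative only: it compares st_dev values, so a bind mount of the
    same filesystem would not be detected by this check.
    """
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, _ in os.walk(root):
        # Prune in place so os.walk never descends into another mount.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        yield dirpath
```

Any directory missing from the result, compared with a plain `os.walk`, sits on a different filesystem and is a candidate stray mount.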

 
I think this is making it hard to see real failures but it's not actually impacting SDK uprev?
Labels: Foundation-Troopers
Looking at the last failure from this morning: https://chrome-swarming.appspot.com/task?id=3ee6907aee9d4910&refresh=10&show_raw=1

I logged in to swarm-cros-548 and here's a directory listing:
$ ls -l /b/swarming/c
total 84
drwxr-xr-x   3 chrome-bot chrome-bot  4096 Jul 25 08:12 CD
drwxr-xr-x 504 chrome-bot chrome-bot 65536 Jul 25 08:17 g6
drwxr-xr-x   3 chrome-bot chrome-bot  4096 Jul 19 17:27 Jp
drwxr-xr-x   2 chrome-bot chrome-bot  4096 Jul 25 15:54 named
drwxr-xr-x  12 chrome-bot chrome-bot  4096 Jul 22 05:55 qn
-rw-r--r--   1 chrome-bot chrome-bot   219 Jul 25 15:54 state.json

So there are some build directories hanging around from a few days ago, but not the one in the error messages from the build linked above:
Failed to delete /b/swarming/c/vE. The following files remain:
- /b/swarming/c/vE
- /b/swarming/c/vE/repository
- /b/swarming/c/vE/repository/new-sdk-chroot
- /b/swarming/c/vE/repository/new-sdk-chroot/build
- /b/swarming/c/vE/repository/new-sdk-chroot/build/amd64-generic
- /b/swarming/c/vE/repository/new-sdk-chroot/build/amd64-generic/tmp
- /b/swarming/c/vE/repository/new-sdk-chroot/build/amd64-generic/tmp/screen
- /b/swarming/c/vE/repository/new-sdk-chroot/build/amd64-generic/var
- /b/swarming/c/vE/repository/new-sdk-chroot/build/amd64-generic/var/tmp
- /b/swarming/c/vE/repository/new-sdk-chroot/build/amd64-generic/var/tmp/portage
- /b/swarming/c/vE/repository/new-sdk-chroot/build/arm-generic
- /b/swarming/c/vE/repository/new-sdk-chroot/build/arm-generic/tmp
- /b/swarming/c/vE/repository/new-sdk-chroot/build/arm-generic/tmp/screen
- /b/swarming/c/vE/repository/new-sdk-chroot/build/arm-generic/var
- /b/swarming/c/vE/repository/new-sdk-chroot/build/arm-generic/var/tmp
- /b/swarming/c/vE/repository/new-sdk-chroot/build/arm-generic/var/tmp/portage
- /b/swarming/c/vE/repository/new-sdk-chroot/tmp
- /b/swarming/c/vE/repository/new-sdk-chroot/tmp/screen

What's deleting the vE directory if we aren't? And why doesn't whatever that is also delete the other older build roots?

Another random thought: maybe there's something funky about /tmp/screen or /var/tmp/portage? I checked perms on the stale roots, though, and I don't see anything strange on those. /tmp/screen sometimes contains unix domain sockets but that shouldn't be particularly interesting: even those in use would be rm'able.

$ getfacl CD/repository/chroot/build/daisy_spring/var/tmp/portage/
# file: CD/repository/chroot/build/daisy_spring/var/tmp/portage/
# owner: 250
# group: root
user::rwx
group::r-x
other::r-x

jclinton@swarm-cros-548:/b/swarming/c$ getfacl CD/repository/chroot/build/daisy_spring/tmp/screen/
# file: CD/repository/chroot/build/daisy_spring/tmp/screen/
# owner: root
# group: 406
user::rwx
group::rwx
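
One possible explanation, not confirmed anywhere in this thread: /tmp and /var/tmp are conventionally sticky (mode 1777), and in a sticky directory unlink(2) and rmdir(2) return EPERM unless the caller is root, the owner of the entry, or the owner of the directory. That would match the "[Errno 1] Operation not permitted" above and the fact that only entries directly under tmp and var/tmp are left over. A minimal sketch of the check, with a hypothetical helper name:

```python
import os
import stat


def sticky_blocks_delete(path, uid):
    """Return True if the sticky bit on path's parent would stop uid
    from deleting path.

    Hypothetical helper; it models only the sticky-bit rule, not the
    ordinary write permission on the parent directory.
    """
    parent = os.path.dirname(os.path.abspath(path))
    parent_st = os.lstat(parent)
    if not parent_st.st_mode & stat.S_ISVTX:
        return False  # Parent is not sticky, so the rule does not apply.
    if uid == 0:
        return False  # root is exempt.
    # Allowed only for the owner of the entry or the owner of the directory.
    return uid not in (os.lstat(path).st_uid, parent_st.st_uid)
```

Calling it with the bot's uid on a leftover tmp/screen or var/tmp/portage directory would show whether this rule is what blocks the removal.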

Cc: mar...@chromium.org
Labels: -Foundation-Troopers
As the bug describes, it's because of the chrome-bot vs. root differentiation, which is WAI, so I'm taking this out of the trooper queue and cc'ing Marc-Antoine for further insights.
The bot runs a task, the bot expects to be able to delete the files created.

Interestingly, the bot already takes care of the read/write bits and already tries to take ownership via passwordless sudo:
https://cs.chromium.org/chromium/infra/luci/client/utils/file_path.py?l=1108

Maybe this part doesn't have strong enough guarantees?
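
To illustrate where such a guarantee can break down, here is a minimal sketch of the permission-bit half of that approach. It is not the code in file_path.py; `make_dirs_writable` is a hypothetical name, and the sudo-based ownership change is left out entirely.

```python
import os
import stat


def make_dirs_writable(root):
    """Add u+rwx to every directory under root; return paths that failed.

    Illustrative sketch only. chmod() succeeds only for the owner (or
    root), so directories owned by another user end up in the returned
    list and would need a separate chown step.
    """
    failed = []

    def grant(path):
        try:
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            os.chmod(path, mode | stat.S_IRWXU)
        except PermissionError:
            failed.append(path)

    grant(root)
    for dirpath, dirnames, _ in os.walk(root):
        # Fix children before os.walk descends, so unreadable ones get listed.
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                grant(path)
    return failed
```

A non-empty return value on a leftover build root would point at directories whose mode bits the bot user cannot fix on its own.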
The question for the Troopers was not in the description; it was in #2: what is deleting these directories if the Recipe isn't?

It's the bot itself, trying to clean up the task's residue.
Okay, so in the listing in #2, we see that there are directories hanging around from other builds that *didn't* fail in the Recipe's cleanup step. Does the bot itself not always perform the clean-up task?

Trying to understand the lifecycle here and if there's more than one bug.

Blockedon: 868083
What you are seeing is generated by the bot, not the task.

The problem is that the bot tries to clean up the caches, not only the current cache but *all caches*, as part of the task content. That should be done outside of the task's scope. That's definitely a bug, which I'll address in issue 868083.

The bug here is that the bot is not able to delete the files, which is problematic. I think fs.make_tree_deleteable() needs fine tuning for the ChromeOS case.
Status: Fixed (was: Available)
The last set of builds on the chromiumos-sdk builder has been green, so closing this as fixed.

https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=chromiumos-sdk&buildBranch=master
