termina: sudo fails in container due to uid changes
Issue description
On affected containers, the filesystem owners/groups no longer match the container.
From the VM:
(termina) localhost /mnt/stateful/lxd/containers/penguin # ls -lh
total 8.0K
-r-------- 1 root root 4.0K Oct 10 21:27 backup.yaml
-rw-r--r-- 1 root root 532 Jul 12 05:29 metadata.yaml
drwxr-xr-x 1 2000000 2000000 148 Oct 10 21:49 rootfs
drwxr-xr-x 1 root root 42 Jul 12 05:29 templates
From the container:
root@penguin:~# ls -l /
total 0
drwxr-xr-x 2 nobody nogroup 40 Oct 10 21:27 ChromeOS
drwxr-xr-x 1 1000000 1000000 1858 Aug 22 22:28 bin
drwxr-xr-x 1 1000000 1000000 0 Feb 23 2018 boot
drwxr-xr-x 9 root root 560 Oct 10 21:27 dev
drwxr-xr-x 1 1000000 1000000 2614 Oct 11 00:23 etc
drwxr-xr-x 1 1000000 1000000 16 Jul 17 00:43 home
drwxr-xr-x 1 1000000 1000000 164 Sep 22 01:02 lib
drwxr-xr-x 1 1000000 1000000 40 Jul 12 05:25 lib64
drwxr-xr-x 1 1000000 1000000 0 Jul 12 05:25 media
drwxr-xr-x 1 1000000 1000000 0 Jul 12 05:25 mnt
drwxr-xr-x 1 1000000 1000000 12 Jul 17 00:43 opt
dr-xr-xr-x 142 nobody nogroup 0 Oct 10 21:27 proc
drwx------ 1 1000000 1000000 106 Sep 22 01:36 root
...
The id map has the container root still as 1000000. But the filesystem is all 2000000. This breaks programs like sudo that need to be setuid root in their namespace.
root@penguin:~# ls -l /usr/bin/sudo
-rwsr-xr-x 1 1000000 1000000 140944 Jun 5 2017 /usr/bin/sudo
,
Oct 12
Never mind, there was a misunderstanding on a private chat; this bug is irrelevant to comment #1.
,
Oct 12
This looks like the container's filesystem got remapped but the container didn't get stopped and started. What do
cat /proc/self/uid_map
cat /proc/self/gid_map
show on such a container?
,
Oct 12
Thanks Christian, output below.
[smbarber@penguin:~]
% cat /proc/self/uid_map
0 1000000 1000
1000 1000 1
1001 1001001 999998999
[smbarber@penguin:~]
% cat /proc/self/gid_map
0 1000000 1000
1000 1000 1
1001 1001001 999998999
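To make the numbers above easier to read, here is a minimal Python sketch of how a uid_map like this one translates ids in both directions. The function names are hypothetical and only illustrate the three-column format (id inside the namespace, id outside, range length); 65534 is the kernel's default overflow id, which is what shows up as nobody/nogroup for unmapped owners.

```python
from typing import List, Optional, Tuple

# One tuple per uid_map line: (container_start, host_start, length).
IdMap = List[Tuple[int, int, int]]

# Default kernel overflow id; unmapped owners show up as nobody/nogroup.
OVERFLOW_ID = 65534

# The uid_map quoted in this comment.
UID_MAP = """\
0 1000000 1000
1000 1000 1
1001 1001001 999998999
"""


def parse_id_map(text: str) -> IdMap:
    """Parse the contents of /proc/<pid>/uid_map or gid_map."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            ranges.append((int(fields[0]), int(fields[1]), int(fields[2])))
    return ranges


def host_to_container(host_id: int, ranges: IdMap) -> int:
    """Return the id a process inside the namespace sees for an on-disk owner."""
    for inside, outside, length in ranges:
        if outside <= host_id < outside + length:
            return inside + (host_id - outside)
    return OVERFLOW_ID


def container_to_host(container_id: int, ranges: IdMap) -> Optional[int]:
    """Return the on-disk owner for an id inside the namespace, if mapped."""
    for inside, outside, length in ranges:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None
```

With this map, `host_to_container(2000000, parse_id_map(UID_MAP))` gives 1000000, which is consistent with the rootfs owned by 2000000 in the VM showing up as 1000000 inside the container, while container root (uid 0) corresponds to 1000000 on disk.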
,
Nov 22
On version 71.0.3578.49 (Official Build) beta
$ sudo apt-get install git-email
sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set
$ ls -al /usr/bin/sudo
-rwsr-xr-x 1 1000000 1000000 140944 Jun 5 2017 /usr/bin/sudo
You partly broke my Linux container, I'm sad ;-)
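For anyone who wants to check other binaries for the same problem, the condition in that error message can be restated as a small Python sketch. This is not sudo's actual code, and the helper names are made up for illustration; it only tests the two things the message mentions, owner uid 0 and the setuid bit.

```python
import os
import stat


def setuid_root_ok(st_uid: int, st_mode: int) -> bool:
    """True if the owner is uid 0 and the setuid bit is set."""
    return st_uid == 0 and bool(st_mode & stat.S_ISUID)


def check_path(path: str) -> bool:
    """Apply the same test to a file on disk, as seen from this namespace."""
    st = os.stat(path)
    return setuid_root_ok(st.st_uid, st.st_mode)
```

The listing above corresponds to `setuid_root_ok(1000000, 0o104755)`, which is false: the setuid bit is there, but the owner seen from inside the container is 1000000 rather than 0.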
,
Nov 28
I'm also experiencing this.
[nelhage@penguin:~]$ ls -l /var/
total 8.0K
drwxr-xr-x 1 nobody nogroup 330 Nov 17 15:33 backups/
drwxr-xr-x 1 nobody nogroup 120 Oct 1 16:14 cache/
drwxr-xr-x 1 nobody nogroup 14 Sep 25 20:48 games/
drwxr-xr-x 1 nobody nogroup 330 Oct 1 16:14 lib/
drwxrwsr-x 1 nobody nogroup 0 Jun 26 12:03 local/
lrwxrwxrwx 1 nobody nogroup 9 Sep 25 02:17 lock -> /run/lock/
drwxr-xr-x 1 nobody nogroup 190 Sep 8 00:16 log/
drwxrwsr-x 1 nobody nogroup 0 Sep 7 05:25 mail/
drwxr-xr-x 1 nobody nogroup 0 Sep 7 05:25 opt/
lrwxrwxrwx 1 nobody nogroup 4 Sep 25 02:17 run -> /run/
drwxr-xr-x 1 nobody nogroup 8 Sep 7 05:25 spool/
drwxrwxrwt 1 nobody nogroup 3.4K Nov 28 03:23 tmp/
[nelhage@penguin:~]$ sudo chown root /var/lib/
chown: changing ownership of '/var/lib/': Operation not permitted
[nelhage@penguin:~]$ stat /var/
File: /var/
Size: 100 Blocks: 0 IO Block: 4096 directory
Device: 2ah/42d Inode: 40912 Links: 1
Access: (0755/drwxr-xr-x) Uid: (65534/ nobody) Gid: (65534/ nogroup)
Access: 2018-11-27 16:06:08.458969135 +0000
Modify: 2018-09-25 20:48:48.295169614 +0000
Change: 2018-11-17 20:47:43.375911162 +0000
Birth: -
I exported the VM using `vmc export` and mounted it on another machine, and can confirm that `lxd/storage-pools/default/containers/penguin/rootfs/var` is now owned by uid=0 instead of uid=1000000 like the other directories there:
# ls -l /mnt/lxd/storage-pools/default/containers/penguin/rootfs/
total 0
drwxr-xr-x 1 1000000 1000000 1418 Oct 1 08:59 bin
drwxr-xr-x 1 1000000 1000000 0 Jun 26 05:03 boot
drwxr-xr-x 1 1000000 1000000 0 Nov 17 12:54 ChromeOS
drwxr-xr-x 1 1000000 1000000 114 Sep 6 22:25 dev
drwxr-xr-x 1 1000000 1000000 2472 Nov 27 18:38 etc
drwxr-xr-x 1 1000000 1000000 14 Sep 24 19:19 home
drwxr-xr-x 1 1000000 1000000 166 Oct 1 09:00 lib
drwxr-xr-x 1 1000000 1000000 1304 Sep 24 19:28 lib32
drwxr-xr-x 1 1000000 1000000 40 Sep 6 22:25 lib64
drwxr-xr-x 1 1000000 1000000 0 Sep 6 22:25 media
drwxr-xr-x 1 1000000 1000000 0 Sep 6 22:25 mnt
drwxr-xr-x 1 1000000 1000000 12 Sep 24 19:19 opt
drwxr-xr-x 1 1000000 1000000 0 Jun 26 05:03 proc
drwx------ 1 1000000 1000000 84 Oct 1 09:07 root
drwxr-xr-x 1 1000000 1000000 0 Sep 6 22:25 run
drwxr-xr-x 1 1000000 1000000 1560 Oct 1 08:59 sbin
drwxr-xr-x 1 1000000 1000000 0 Sep 6 22:25 srv
drwxr-xr-x 1 1000000 1000000 0 Jun 26 05:03 sys
drwxrwxrwt 1 1000000 1000000 462 Nov 27 18:41 tmp
drwxr-xr-x 1 1000000 1000000 90 Sep 24 19:28 usr
drwxr-xr-x 1 root root 100 Sep 25 13:48 var
# ls -l /mnt/lxd/storage-pools/default/containers/penguin/rootfs/var/
total 8
drwxr-xr-x 1 root root 330 Nov 17 07:33 backups
drwxr-xr-x 1 root root 120 Oct 1 09:14 cache
drwxr-xr-x 1 root root 14 Sep 25 13:48 games
drwxr-xr-x 1 root root 330 Oct 1 09:14 lib
drwxrwsr-x 1 root staff 0 Jun 26 05:03 local
lrwxrwxrwx 1 root root 9 Sep 24 19:17 lock -> /run/lock
drwxr-xr-x 1 root root 190 Sep 7 17:16 log
drwxrwsr-x 1 root mail 0 Sep 6 22:25 mail
drwxr-xr-x 1 root root 0 Sep 6 22:25 opt
lrwxrwxrwx 1 root root 4 Sep 24 19:17 run -> /run
drwxr-xr-x 1 root root 8 Sep 6 22:25 spool
drwxrwxrwt 1 root root 3120 Nov 27 08:34 tmp
,
Nov 28
Oh, hm, reading the OP's post more carefully my issue may be distinct. They're seeing a different ID mismatch issue.
,
Nov 29
re #5 we'd love to get our hands on your disk image if you still have it, Vincent :)
re #6 that's extremely helpful actually! Your post helps confirm the theory we have as to what might be happening.
We added a raw idmap ("both 1000 1000") so we can share files with the host over 9P. This went live in R71. Any change to the idmap will necessitate remapping the entire container filesystem [1]. And this can take a long time, which could cause the host to think the container startup timed out, so we'd end up killing it.
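As a rough illustration of why the raw entry changes the whole map, the sketch below punches a one-id hole into a single contiguous range. Note the assumptions: the base range `0 1000000 1000000000` is inferred from the range lengths in the uid_map posted earlier in this thread, not stated anywhere, and `punch_hole` is a hypothetical helper, not LXD's implementation.

```python
from typing import List, Tuple

# (container_start, host_start, length), same column order as uid_map.
Range = Tuple[int, int, int]

# Assumed base map before the raw entry was added (inferred, not confirmed).
BASE: Range = (0, 1000000, 1000000000)
# The raw entry: container id 1000 maps straight to host id 1000.
RAW: Range = (1000, 1000, 1)


def punch_hole(base: Range, raw: Range) -> List[Range]:
    """Split `base` around `raw`, keeping the original offset on both sides."""
    b_in, b_out, b_len = base
    r_in, _r_out, r_len = raw
    if not (b_in <= r_in and r_in + r_len <= b_in + b_len):
        raise ValueError("raw entry must lie inside the base range")

    result: List[Range] = []
    before = r_in - b_in
    if before > 0:
        result.append((b_in, b_out, before))
    result.append(raw)
    after = (b_in + b_len) - (r_in + r_len)
    if after > 0:
        # Ids after the hole keep the same container-to-host offset as before.
        result.append((r_in + r_len, b_out + before + r_len, after))
    return result
```

Under those assumptions, `punch_hole(BASE, RAW)` yields the same three ranges as the uid_map from the affected container: `0 1000000 1000`, `1000 1000 1`, `1001 1001001 999998999`.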
volatile.last_state.idmap is only supposed to be reset after the full unshift/shift finishes, but the filesystem in #6 definitely looks like it was interrupted as the shift was finishing in /var.
[1]: https://github.com/lxc/lxd/blob/lxd-3.0.2/lxd/container_lxc.go#L1952
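If it helps anyone inspect an exported image for a half-finished shift, something like the following Python sketch could be run against the mounted rootfs. It is only a diagnostic idea under stated assumptions: the helper names are hypothetical, it needs read access to the whole tree, and the map has to be supplied by hand in uid_map column order.

```python
import os
from collections import Counter
from typing import Dict, List, Tuple

# (container_start, host_start, length), same column order as uid_map.
IdMap = List[Tuple[int, int, int]]


def owner_histogram(root: str) -> Counter:
    """Count on-disk owner uids under `root` without following symlinks."""
    counts = Counter({os.lstat(root).st_uid: 1})
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            counts[os.lstat(os.path.join(dirpath, name)).st_uid] += 1
    return counts


def unmapped_owners(counts: Counter, ranges: IdMap) -> Dict[int, int]:
    """Return the owners (and entry counts) that fall outside every host range."""
    def mapped(uid: int) -> bool:
        return any(host <= uid < host + length for _, host, length in ranges)

    return {uid: n for uid, n in counts.items() if not mapped(uid)}
```

A tree that was shifted completely should have no owners outside the host ranges of its map. Entries owned by host uid 0, like /var in #6, would be reported as unmapped for the map shown earlier in this thread.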
,
Nov 29
Awesome, glad to hear it. If there's anything else I can inspect in the image to debug or further confirm the hypothesis, I'm happy to do so.
,
Dec 2
Just a quick note that we're working on a kernel solution for this moving forward, either by adding a bunch of new properties to the VFS or by introducing a tiny shifting filesystem (shiftfs). We have most of the shiftfs implementation ready to go and expect to roll it out in the next Ubuntu release. With it in place, all files will remain unshifted on the filesystem and will only get shifted for the userns through shiftfs. This will seriously decrease initial startup time by no longer requiring shifting, and it will also avoid issues where shifting is interrupted halfway through (which are notoriously hard or impossible to recover from).
,
Dec 4
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform/tremplin/+/046cf6881a33a4e12aa2bb106d1f68518218a6eb
commit 046cf6881a33a4e12aa2bb106d1f68518218a6eb
Author: Stephen Barber <smbarber@chromium.org>
Date: Tue Dec 04 08:11:35 2018
tremplin: leave raw.idmap alone on existing containers
BUG= chromium:894299
TEST=manually flip raw.idmap and restart VM; check mapping is not reset
Change-Id: I8503f2e56a636010be9a73350eaa6388ed0a35a4
Reviewed-on: https://chromium-review.googlesource.com/1359860
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
[modify] https://crrev.com/046cf6881a33a4e12aa2bb106d1f68518218a6eb/src/chromiumos/tremplin/main.go
,
Dec 4
re #10 that will be great :) we're happy to help out with the kernel solution as well. We'll track supporting this in issue 911372.
In the meantime, we need to avoid these shifts until we have some UI for it. We'd also possibly time out during container startup for large containers, which could leave containers in an intermediate state. This will be tracked in issue 911333.
We will repurpose this bug to track disabling uid remapping in 71 and 72.
Requesting merge for #11 to 71 and 72. I've verified this on ToT, but needed to revert an unrelated LXD 3.0.3 uprev (issue 910806). That won't be needed on 71 and 72 which are still running LXD 3.0.2.
,
Dec 4
This bug requires manual review: Request affecting a post-stable build
Please contact the milestone owner if you have questions.
Owners: benmason@(Android), kariahda@(iOS), kbleicher@(ChromeOS), govind@(Desktop)
For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 5
Hi, we're only approving emergency merges at this point. Does this qualify? Also, can you give me a bit more context as to the nature of the issue and impact if we don't merge? Thanks
,
Dec 5
For Crostini users currently on stable channel, there is a risk that without this change an update to 71+ would cause their container to be left in a broken state. The probability of a broken container is higher for users with more files.
,
Dec 5
@kbleicher this is a P0 for Crostini, users are ending up in a broken state and may lose data without this fix. We want to prevent it from reaching stable.
,
Dec 5
Your change meets the bar and is auto-approved for M72. Please go ahead and merge the CL to branch 3626 manually.
Please contact milestone owner if you have questions.
Owners: govind@(Android), kariahda@(iOS), djmm@(ChromeOS), abdulsyed@(Desktop)
For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 5
Per #16, is this an M71 regression, and has the fix been tested? I assume this is limited to the CL in #11? Any potential impact outside of crostini?
,
Dec 5
Yes, this is a regression in M71 and I've tested the fix in #11 on both updated containers (from 69/70->71) and fresh containers. Impact is limited to crostini only - the fix applies to the crostini VM and won't affect the rest of CrOS.
,
Dec 6
Thanks for #19. Approving merge to M71 Chrome OS.
,
Dec 7
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform/tremplin/+/1dcf36e886716c7278ca9464c6bf84fa56ae0597
commit 1dcf36e886716c7278ca9464c6bf84fa56ae0597
Author: Stephen Barber <smbarber@chromium.org>
Date: Fri Dec 07 00:43:04 2018
tremplin: leave raw.idmap alone on existing containers
BUG= chromium:894299
TEST=manually flip raw.idmap and restart VM; check mapping is not reset
Change-Id: I8503f2e56a636010be9a73350eaa6388ed0a35a4
Reviewed-on: https://chromium-review.googlesource.com/1359860
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
(cherry picked from commit 046cf6881a33a4e12aa2bb106d1f68518218a6eb)
Reviewed-on: https://chromium-review.googlesource.com/c/1366498
Commit-Queue: Stephen Barber <smbarber@chromium.org>
[modify] https://crrev.com/1dcf36e886716c7278ca9464c6bf84fa56ae0597/src/chromiumos/tremplin/main.go
,
Dec 7
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform/tremplin/+/133cd299968a04a47c6e71287ea1a93d1e05fa1b
commit 133cd299968a04a47c6e71287ea1a93d1e05fa1b
Author: Stephen Barber <smbarber@chromium.org>
Date: Fri Dec 07 00:43:06 2018
tremplin: leave raw.idmap alone on existing containers
BUG= chromium:894299
TEST=manually flip raw.idmap and restart VM; check mapping is not reset
Change-Id: I8503f2e56a636010be9a73350eaa6388ed0a35a4
Reviewed-on: https://chromium-review.googlesource.com/1359860
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
(cherry picked from commit 046cf6881a33a4e12aa2bb106d1f68518218a6eb)
Reviewed-on: https://chromium-review.googlesource.com/c/1366387
Commit-Queue: Stephen Barber <smbarber@chromium.org>
[modify] https://crrev.com/133cd299968a04a47c6e71287ea1a93d1e05fa1b/src/chromiumos/tremplin/main.go
,
Dec 7
Merged and new VMs built and set up for testing.
71 - 11151.53.0
72 - 11316.7.0
,
Dec 7
New components are pushed and verified on
71 - 11151.54.0
72 - 11316.9.0
Comment 1 by mutexlox@chromium.org
, Oct 12