Issue 894299

Starred by 14 users

Issue metadata

Status: Verified
Owner:
Closed: Dec 7
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug


Hotlists containing this issue:
LXD


termina: sudo fails in container due to uid changes

Project Member Reported by smbar...@chromium.org, Oct 11

Issue description

On affected containers, the owners/groups on the filesystem no longer match the container's ID map.

From the VM:
(termina) localhost /mnt/stateful/lxd/containers/penguin # ls -lh
total 8.0K
-r-------- 1 root    root    4.0K Oct 10 21:27 backup.yaml
-rw-r--r-- 1 root    root     532 Jul 12 05:29 metadata.yaml
drwxr-xr-x 1 2000000 2000000  148 Oct 10 21:49 rootfs
drwxr-xr-x 1 root    root      42 Jul 12 05:29 templates

From the container:
root@penguin:~# ls -l /
total 0
drwxr-xr-x   2 nobody  nogroup   40 Oct 10 21:27 ChromeOS
drwxr-xr-x   1 1000000 1000000 1858 Aug 22 22:28 bin
drwxr-xr-x   1 1000000 1000000    0 Feb 23  2018 boot
drwxr-xr-x   9 root    root     560 Oct 10 21:27 dev
drwxr-xr-x   1 1000000 1000000 2614 Oct 11 00:23 etc
drwxr-xr-x   1 1000000 1000000   16 Jul 17 00:43 home
drwxr-xr-x   1 1000000 1000000  164 Sep 22 01:02 lib
drwxr-xr-x   1 1000000 1000000   40 Jul 12 05:25 lib64
drwxr-xr-x   1 1000000 1000000    0 Jul 12 05:25 media
drwxr-xr-x   1 1000000 1000000    0 Jul 12 05:25 mnt
drwxr-xr-x   1 1000000 1000000   12 Jul 17 00:43 opt
dr-xr-xr-x 142 nobody  nogroup    0 Oct 10 21:27 proc
drwx------   1 1000000 1000000  106 Sep 22 01:36 root
...

The ID map still has the container root as 1000000, but everything on the filesystem is owned by 2000000. This breaks programs like sudo that need to be setuid root in their namespace.

root@penguin:~# ls -l /usr/bin/sudo
-rwsr-xr-x 1 1000000 1000000 140944 Jun  5  2017 /usr/bin/sudo
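To make the failure concrete, here is a minimal sketch of the condition named in the sudo error quoted later in this thread ("must be owned by uid 0 and have the setuid bit set"). It only approximates that message and is not sudo's actual source; the helper name `setuid_root_ok` is made up for illustration.

```python
import stat


def setuid_root_ok(st_uid, st_mode):
    """Return True when a file is owned by uid 0 and has the setuid bit set.

    Hypothetical helper: it mirrors the wording of sudo's error message,
    not sudo's real implementation.
    """
    return st_uid == 0 and bool(st_mode & stat.S_ISUID)


# -rwsr-xr-x as a mode value: regular file, setuid, 0755.
mode = stat.S_IFREG | 0o4755

# Owner as seen from inside the affected container (listing above).
print(setuid_root_ok(1000000, mode))  # False

# Owner as it should appear inside a healthy container.
print(setuid_root_ok(0, mode))  # True
```

The setuid bit is intact in the listing above; only the owner is wrong, so executing the binary does not yield effective uid 0 in the container's namespace.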
 
Cc: mutexlox@chromium.org
This is blocking me from running unit tests, I believe. The error message I see is:

sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?

My filesystem is not mounted with nosuid.

Any chance of bumping the priority on this? Or is there some workaround I can use?
Cc: -mutexlox@chromium.org
Never mind, there was a misunderstanding in a private chat; this bug is unrelated to the problem in comment #1.
This looks like the container's filesystem got remapped but the container didn't get stopped and started. What does
cat /proc/self/uid_map
cat /proc/self/gid_map
show on such a container?
Thanks Christian, output below.

[smbarber@penguin:~]
% cat /proc/self/uid_map
         0    1000000       1000
      1000       1000          1
      1001    1001001  999998999
[smbarber@penguin:~]
% cat /proc/self/gid_map
         0    1000000       1000
      1000       1000          1
      1001    1001001  999998999
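As a reading aid for the map above, the following sketch translates IDs as stored on disk (VM side) into the IDs a process inside the container sees. Each `uid_map` line is `<first ID inside the namespace> <first ID in the parent namespace> <count>`. The function names are illustrative, and 65534 is assumed to be the kernel's overflow uid, which matches the `nobody` entries reported elsewhere in this thread.

```python
UID_MAP = """\
         0    1000000       1000
      1000       1000          1
      1001    1001001  999998999
"""

OVERFLOW_ID = 65534  # assumed default overflow uid, displayed as "nobody"


def parse_id_map(text):
    """Parse uid_map/gid_map text into (ns_start, host_start, count) tuples."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            ns_start, host_start, count = (int(field) for field in line.split())
            ranges.append((ns_start, host_start, count))
    return ranges


def host_to_ns(host_id, ranges):
    """Map an on-disk (parent namespace) ID to the ID seen in the container.

    Returns None when the ID falls outside every mapped range.
    """
    for ns_start, host_start, count in ranges:
        if host_start <= host_id < host_start + count:
            return ns_start + (host_id - host_start)
    return None


ranges = parse_id_map(UID_MAP)
for host_id in (1000000, 2000000, 1000, 0):
    ns_id = host_to_ns(host_id, ranges)
    shown = OVERFLOW_ID if ns_id is None else ns_id
    print(host_id, "->", shown)
```

Under this map, on-disk uid 2000000 falls in the third range and appears as 1000000 inside the container, which matches the two listings in the issue description. On-disk uid 0 is not mapped at all, so it appears as the overflow uid.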
On version 71.0.3578.49 (Official Build) beta

$ sudo apt-get install git-email
sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set

$ ls -al /usr/bin/sudo
-rwsr-xr-x 1 1000000 1000000 140944 Jun  5  2017 /usr/bin/sudo

You partly broke my Linux container, I'm sad ;-)
I'm also experiencing this.
[nelhage@penguin:~]$ ls -l /var/
total 8.0K
drwxr-xr-x 1 nobody nogroup  330 Nov 17 15:33 backups/
drwxr-xr-x 1 nobody nogroup  120 Oct  1 16:14 cache/
drwxr-xr-x 1 nobody nogroup   14 Sep 25 20:48 games/
drwxr-xr-x 1 nobody nogroup  330 Oct  1 16:14 lib/
drwxrwsr-x 1 nobody nogroup    0 Jun 26 12:03 local/
lrwxrwxrwx 1 nobody nogroup    9 Sep 25 02:17 lock -> /run/lock/
drwxr-xr-x 1 nobody nogroup  190 Sep  8 00:16 log/
drwxrwsr-x 1 nobody nogroup    0 Sep  7 05:25 mail/
drwxr-xr-x 1 nobody nogroup    0 Sep  7 05:25 opt/
lrwxrwxrwx 1 nobody nogroup    4 Sep 25 02:17 run -> /run/
drwxr-xr-x 1 nobody nogroup    8 Sep  7 05:25 spool/
drwxrwxrwt 1 nobody nogroup 3.4K Nov 28 03:23 tmp/
[nelhage@penguin:~]$ sudo chown root /var/lib/
chown: changing ownership of '/var/lib/': Operation not permitted
[nelhage@penguin:~]$ stat /var/
  File: /var/
  Size: 100             Blocks: 0          IO Block: 4096   directory
Device: 2ah/42d Inode: 40912       Links: 1
Access: (0755/drwxr-xr-x)  Uid: (65534/  nobody)   Gid: (65534/ nogroup)
Access: 2018-11-27 16:06:08.458969135 +0000
Modify: 2018-09-25 20:48:48.295169614 +0000
Change: 2018-11-17 20:47:43.375911162 +0000
 Birth: -

I exported the VM using `vmc export` and mounted it on another machine, and can confirm that `lxd/storage-pools/default/containers/penguin/rootfs/var` is now owned by uid=0 instead of uid=1000000 like the other directories there:

# ls -l /mnt/lxd/storage-pools/default/containers/penguin/rootfs/
total 0
drwxr-xr-x 1 1000000 1000000 1418 Oct  1 08:59 bin
drwxr-xr-x 1 1000000 1000000    0 Jun 26 05:03 boot
drwxr-xr-x 1 1000000 1000000    0 Nov 17 12:54 ChromeOS
drwxr-xr-x 1 1000000 1000000  114 Sep  6 22:25 dev
drwxr-xr-x 1 1000000 1000000 2472 Nov 27 18:38 etc
drwxr-xr-x 1 1000000 1000000   14 Sep 24 19:19 home
drwxr-xr-x 1 1000000 1000000  166 Oct  1 09:00 lib
drwxr-xr-x 1 1000000 1000000 1304 Sep 24 19:28 lib32
drwxr-xr-x 1 1000000 1000000   40 Sep  6 22:25 lib64
drwxr-xr-x 1 1000000 1000000    0 Sep  6 22:25 media
drwxr-xr-x 1 1000000 1000000    0 Sep  6 22:25 mnt
drwxr-xr-x 1 1000000 1000000   12 Sep 24 19:19 opt
drwxr-xr-x 1 1000000 1000000    0 Jun 26 05:03 proc
drwx------ 1 1000000 1000000   84 Oct  1 09:07 root
drwxr-xr-x 1 1000000 1000000    0 Sep  6 22:25 run
drwxr-xr-x 1 1000000 1000000 1560 Oct  1 08:59 sbin
drwxr-xr-x 1 1000000 1000000    0 Sep  6 22:25 srv
drwxr-xr-x 1 1000000 1000000    0 Jun 26 05:03 sys
drwxrwxrwt 1 1000000 1000000  462 Nov 27 18:41 tmp
drwxr-xr-x 1 1000000 1000000   90 Sep 24 19:28 usr
drwxr-xr-x 1 root    root     100 Sep 25 13:48 var
# ls -l /mnt/lxd/storage-pools/default/containers/penguin/rootfs/var/
total 8
drwxr-xr-x 1 root root   330 Nov 17 07:33 backups
drwxr-xr-x 1 root root   120 Oct  1 09:14 cache
drwxr-xr-x 1 root root    14 Sep 25 13:48 games
drwxr-xr-x 1 root root   330 Oct  1 09:14 lib
drwxrwsr-x 1 root staff    0 Jun 26 05:03 local
lrwxrwxrwx 1 root root     9 Sep 24 19:17 lock -> /run/lock
drwxr-xr-x 1 root root   190 Sep  7 17:16 log
drwxrwsr-x 1 root mail     0 Sep  6 22:25 mail
drwxr-xr-x 1 root root     0 Sep  6 22:25 opt
lrwxrwxrwx 1 root root     4 Sep 24 19:17 run -> /run
drwxr-xr-x 1 root root     8 Sep  6 22:25 spool
drwxrwxrwt 1 root root  3120 Nov 27 08:34 tmp
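For anyone wanting to run the same check on an exported image, here is a rough sketch that tallies on-disk owner uids under a directory and flags those outside the mapped ranges. The ranges are copied from the `uid_map` output earlier in this thread. The helper names are made up for illustration, and the script is meant to be pointed at a mounted rootfs as root.

```python
import os
from collections import Counter

# (ns_start, host_start, count), taken from the uid_map shown in this thread.
HOST_RANGES = [(0, 1000000, 1000), (1000, 1000, 1), (1001, 1001001, 999998999)]


def tally_owners(root):
    """Count owner uids for root and every entry beneath it (no symlink following)."""
    counts = Counter()
    counts[os.lstat(root).st_uid] += 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            counts[os.lstat(os.path.join(dirpath, name)).st_uid] += 1
    return counts


def unexpected_owners(counts, ranges):
    """Return the uids (with counts) that no range in the ID map covers."""
    def mapped(uid):
        return any(start <= uid < start + n for _, start, n in ranges)
    return {uid: n for uid, n in counts.items() if not mapped(uid)}


# Synthetic tally shaped like the listing above: most entries are owned by
# 1000000, while a handful (as under /var) are owned by 0.
sample = Counter({1000000: 20, 0: 12})
print(unexpected_owners(sample, HOST_RANGES))  # {0: 12}
```

A range check like this catches unmapped owners such as uid 0. It would not flag an ID that is mapped but wrong, such as the 2000000 in the original report, since that value still falls inside the third range.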

Oh, hm, reading the OP's post more carefully my issue may be distinct. They're seeing a different ID mismatch issue.
Cc: chirantan@chromium.org dgreid@chromium.org
re #5 we'd love to get our hands on your disk image if you still have it, Vincent :)

re #6 that's extremely helpful actually! Your post helps confirm the theory we have as to what might be happening.

We added a raw idmap ("both 1000 1000") so we can share files with the host over 9P. This went live in R71. Any change to the idmap necessitates remapping the entire container filesystem [1]. Remapping can take a long time, which could lead the host to conclude that container startup timed out, so we'd end up killing the container.

volatile.last_state.idmap is only supposed to be reset after the full unshift/shift finishes, but the filesystem in #6 definitely looks like it was interrupted as the shift was finishing in /var.

[1]: https://github.com/lxc/lxd/blob/lxd-3.0.2/lxd/container_lxc.go#L1952
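The unshift/shift sequence described above can be sketched as plain arithmetic on ID ranges. This is an illustration, not LXD's code. `NEW_MAP` is the map shown earlier in this thread. `OLD_MAP` is an assumption (a single contiguous range before the raw idmap was added), since the thread does not show the previous map.

```python
# (ns_start, host_start, count)
NEW_MAP = [(0, 1000000, 1000), (1000, 1000, 1), (1001, 1001001, 999998999)]
# ASSUMPTION: pre-R71 map as one contiguous range; not stated in this thread.
OLD_MAP = [(0, 1000000, 1000000000)]


def unshift(host_id, id_map):
    """On-disk ID -> container ID under id_map, or None if unmapped."""
    for ns_start, host_start, count in id_map:
        if host_start <= host_id < host_start + count:
            return ns_start + (host_id - host_start)
    return None


def shift(ns_id, id_map):
    """Container ID -> on-disk ID under id_map, or None if unmapped."""
    for ns_start, host_start, count in id_map:
        if ns_start <= ns_id < ns_start + count:
            return host_start + (ns_id - ns_start)
    return None


def remap(host_id):
    """Full remap of one on-disk ID: unshift with the old map, shift with the new."""
    ns_id = unshift(host_id, OLD_MAP)
    return None if ns_id is None else shift(ns_id, NEW_MAP)


print(remap(1000000))  # container root: stays 1000000
print(remap(1001000))  # container uid 1000: becomes 1000

# A file that was unshifted but never shifted again stays at its container ID.
print(unshift(1000000, OLD_MAP))  # 0

# Shifting an ID that was never unshifted.
print(shift(1000000, NEW_MAP))  # 2000000
```

Under these assumptions, an on-disk uid of 0 is what a file looks like after the unshift pass and before the shift pass, which is the state of `/var` in #6. Applying the shift to an ID that was never unshifted yields 2000000, the value in the original report. Whether that is what actually happened there is not established in this thread.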
Awesome, glad to hear it. If there's anything else I can inspect in the image to debug or further confirm the hypothesis, I'm happy to do so. 
Just a quick note that we're working on a kernel solution for this moving forward, either by adding a bunch of new properties to the VFS or by introducing a tiny shifting filesystem (shiftfs).

We have most of the shiftfs implementation ready to go and expect to roll it out in the next Ubuntu release. With it in place, all files will remain unshifted on the filesystem and will only get shifted for the userns through shiftfs.

This will seriously decrease initial startup time by no longer requiring shifting and will also avoid such issues where shifting is interrupted halfway through (which are notoriously hard/impossible to recover from).
Project Member

Comment 11 by bugdroid1@chromium.org, Dec 4

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/tremplin/+/046cf6881a33a4e12aa2bb106d1f68518218a6eb

commit 046cf6881a33a4e12aa2bb106d1f68518218a6eb
Author: Stephen Barber <smbarber@chromium.org>
Date: Tue Dec 04 08:11:35 2018

tremplin: leave raw.idmap alone on existing containers

BUG= chromium:894299 
TEST=manually flip raw.idmap and restart VM; check mapping is not reset

Change-Id: I8503f2e56a636010be9a73350eaa6388ed0a35a4
Reviewed-on: https://chromium-review.googlesource.com/1359860
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>

[modify] https://crrev.com/046cf6881a33a4e12aa2bb106d1f68518218a6eb/src/chromiumos/tremplin/main.go
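The behaviour named in the commit title can be pictured roughly as follows. This is a hypothetical sketch only: tremplin is written in Go and its real logic lives in `main.go`. The function name and its `container_exists` parameter are invented for illustration. The `raw.idmap` key and the "both 1000 1000" value come from this thread.

```python
RAW_IDMAP = "both 1000 1000"  # value quoted in this thread


def idmap_config_to_apply(container_exists):
    """Return the LXD config keys to set at container startup.

    Hypothetical helper: existing containers keep whatever raw.idmap they
    already have, so no remap of their filesystem is triggered. Only new
    containers get the raw idmap.
    """
    if container_exists:
        return {}
    return {"raw.idmap": RAW_IDMAP}


print(idmap_config_to_apply(container_exists=True))   # {}
print(idmap_config_to_apply(container_exists=False))  # {'raw.idmap': 'both 1000 1000'}
```

Leaving the key untouched on existing containers is consistent with the TEST line above: flipping `raw.idmap` by hand and restarting the VM should not reset the mapping.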

Cc: tbuck...@chromium.org
Labels: -Pri-2 M-71 Merge-Request-71 Merge-Request-72 M-72 Pri-1
Status: Started (was: Assigned)
re #10 that will be great :) we're happy to help out with the kernel solution as well.

We'll track supporting this in issue 911372.

In the meantime, we need to avoid these shifts until we have some UI for it. We'd also possibly time out during container startup for large containers, which could leave containers in an intermediate state. This will be tracked in issue 911333.

We will repurpose this bug to track disabling uid remapping in 71 and 72. Requesting merge for #11 to 71 and 72.

I've verified this on ToT, but needed to revert an unrelated LXD 3.0.3 uprev (issue 910806). That won't be needed on 71 and 72 which are still running LXD 3.0.2.
Project Member

Comment 13 by sheriffbot@chromium.org, Dec 4

Labels: -Merge-Request-71 Hotlist-Merge-Review Merge-Review-71
This bug requires manual review: Request affecting a post-stable build
Please contact the milestone owner if you have questions.
Owners: benmason@(Android), kariahda@(iOS), kbleicher@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Hi, we're only approving emergency merges at this point. Does this qualify?

Also, can you give me a bit more context on the nature of the issue and the impact if we don't merge? Thanks
For Crostini users currently on stable channel, there is a risk that without this change an update to 71+ would cause their container to be left in a broken state. The probability of a broken container is higher for users with more files.
@kbleicher this is a P0 for Crostini, users are ending up in a broken state and may lose data without this fix. We want to prevent it from reaching stable.
Project Member

Comment 17 by sheriffbot@chromium.org, Dec 5

Labels: -Merge-Request-72 Hotlist-Merge-Approved Merge-Approved-72
Your change meets the bar and is auto-approved for M72. Please go ahead and merge the CL to branch 3626 manually. Please contact milestone owner if you have questions.
Owners: govind@(Android), kariahda@(iOS), djmm@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Per #16: is this an M71 regression, and has the fix been tested? I assume this is limited to the CL in #11? Any potential impact outside of crostini?
Yes, this is a regression in M71 and I've tested the fix in #11 on both updated containers (from 69/70->71) and fresh containers.

Impact is limited to crostini only - the fix applies to the crostini VM and won't affect the rest of CrOS.
Labels: -Merge-Review-71 Merge-Approved-71
Thanks for #19. Approving merge to M71 Chrome OS.

Project Member

Comment 21 by bugdroid1@chromium.org, Dec 7

Labels: merge-merged-release-R72-11316.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/tremplin/+/1dcf36e886716c7278ca9464c6bf84fa56ae0597

commit 1dcf36e886716c7278ca9464c6bf84fa56ae0597
Author: Stephen Barber <smbarber@chromium.org>
Date: Fri Dec 07 00:43:04 2018

tremplin: leave raw.idmap alone on existing containers

BUG= chromium:894299 
TEST=manually flip raw.idmap and restart VM; check mapping is not reset

Change-Id: I8503f2e56a636010be9a73350eaa6388ed0a35a4
Reviewed-on: https://chromium-review.googlesource.com/1359860
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
(cherry picked from commit 046cf6881a33a4e12aa2bb106d1f68518218a6eb)
Reviewed-on: https://chromium-review.googlesource.com/c/1366498
Commit-Queue: Stephen Barber <smbarber@chromium.org>

[modify] https://crrev.com/1dcf36e886716c7278ca9464c6bf84fa56ae0597/src/chromiumos/tremplin/main.go

Project Member

Comment 22 by bugdroid1@chromium.org, Dec 7

Labels: merge-merged-release-R71-11151.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/tremplin/+/133cd299968a04a47c6e71287ea1a93d1e05fa1b

commit 133cd299968a04a47c6e71287ea1a93d1e05fa1b
Author: Stephen Barber <smbarber@chromium.org>
Date: Fri Dec 07 00:43:06 2018

tremplin: leave raw.idmap alone on existing containers

BUG= chromium:894299 
TEST=manually flip raw.idmap and restart VM; check mapping is not reset

Change-Id: I8503f2e56a636010be9a73350eaa6388ed0a35a4
Reviewed-on: https://chromium-review.googlesource.com/1359860
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
(cherry picked from commit 046cf6881a33a4e12aa2bb106d1f68518218a6eb)
Reviewed-on: https://chromium-review.googlesource.com/c/1366387
Commit-Queue: Stephen Barber <smbarber@chromium.org>

[modify] https://crrev.com/133cd299968a04a47c6e71287ea1a93d1e05fa1b/src/chromiumos/tremplin/main.go

Labels: -Merge-Approved-71 -Merge-Approved-72 Merge-Merged
Status: Fixed (was: Started)
Merged and new VMs built and set up for testing.

71 - 11151.53.0
72 - 11316.7.0
Status: Verified (was: Fixed)
New components are pushed and verified on
71 - 11151.54.0
72 - 11316.9.0
