New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 810235 link

Starred by 5 users

Issue metadata

Status: Fixed
Owner:
Last visit 15 days ago
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug-Security

Blocking:
issue 707539



Sign in to add a comment

user namespaces allow for unprivileged noexec bypass

Project Member Reported by vapier@chromium.org, Feb 8 2018

Issue description

reproduction:
(1) start off as chronos (or anyone else unprivileged)
(2) Create a random dir anywhere writable.
    mkdir /tmp/foo
(3) Run a command in a new user namespace as root.  This allows mounting of pseudo filesystems.
    This will be a new tmpfs with exec/suid/dev enabled.
    /sbin/minijail0 -i -I -U -m'0 1000 1' -- /bin/bash -c 'mount -t tmpfs tmpfs /tmp/foo; sleep 999999' &
(4) Use that processes mount namespace visible via the proc's root path.
    cd /proc/$!/root/tmp/foo
(5) Create the script to run
    printf '#!/bin/sh\necho bad\n' > test.sh
    chmod a+rx test.sh
(6) Run it.
    ./test.sh
 
mmm drop the "-i" flag from (3) so it doesn't exit

minijail is just a convenience factor.  if we had `unshare` on the system, it would expose the same issue.
we can't disable user namespace usage by unprivileged users because Chrome uses it for its own sandbox (so as to avoid needing to be set*id).

i think we should be able to restrict mounts though.  if a user namespace is active, any attempts to mount w/out nodev/noexec/nosuid would be rejected.  not sure if we can pull this off in our CrOS LSM.

Comment 3 Deleted

Comment 4 Deleted

Comment 5 Deleted

Comment 6 Deleted

Comment 7 Deleted

Comment 8 Deleted

Cc: drinkcat@chromium.org keescook@chromium.org lhchavez@chromium.org dnschn...@chromium.org hidehiko@chromium.org yusukes@chromium.org
brainstormed a bit with Jorge

- we can't nuke /proc/<pid>/root/ because we have some tools (like crash reporting and containers) using it to get visibility in a process's mount namespace w/out having to exit its own

- what if we restricted mounting of filesystems w/out noexec/nodev/nosuid in general once the system is up and running ?  we could add a sysctl knob with your standard disable/enabled/enabled-lock settings which would require these bits at the lsm level.

i'm thinking of making it a knob because our system remounts /tmp as exec+suid for test images.

for non-dev mode, we can set it to enabled-lock, but for dev mode, we can set it to enabled.  this won't mess up our testing coverage, but will allow people (like crouton) to disable on the fly if they need to.  it would mean we'd break existing scripts if they don't toggle the sysctl knob :).

open questions:
- i don't think we can nerf all mounts because ARC++ will need to mount its containers w/exec (and prob suid) enabled
- i'm guessing ARC++ also wants devtmpfs/devpts/selinuxfs which means we can't nerf nodev everywhere
- are there any pseudo mounts that require exec/suid once the system is up and running ?  if not, we can start with blocking those bits on all pseudo mounts.
- what details does the lsm have at mount time that we can cheaply check/enforce ?  if we have access to the fstype (e.g. "tmpfs" or "ext4") and the mount flags (e.g. MS_NODEV), then that's perfect.
> - i don't think we can nerf all mounts because ARC++ will need to mount its containers w/exec (and prob suid) enabled

Yes, ARC++ mounts its squashfs images (system.raw.img and vendor.raw.img) with exec+suid. Its /dev (tmpfs) is also exec+dev. IIRC, /dev needs exec because /dev/ashmem doesn't fully work if /dev doesn't have exec.

> - i'm guessing ARC++ also wants devtmpfs/devpts/selinuxfs which means we can't nerf nodev everywhere

Yes ARC++ container has /dev/pts with dev.

> - are there any pseudo mounts that require exec/suid once the system is up and running ?  if not, we can start with blocking those bits on all pseudo mounts.

Roughly speaking, ARC++ container partially starts right after Chrome OS login screen is shown (without /data partition), and once the user signs in, the user's /data (in the user's home) is remounted with exec and then passed into the container.

Also, if the whole container crashes for some reason during user session, Chrome automatically restart it. So ARC++ container startup could happen in the middle of user session.

I think adding the restriction of filesystems unless the caller has the CAP_SYS_ADMIN in the init namespace (or is a bind-mount) _might_ work. There are a few of these mounts that are performed within the container namespace, but those can be moved to the intermediate namespace (prior to dropping caps), such as the ones mentioned in Comment #10, or fixed from within the container:

:/ # mount | grep -v 'nosuid,nodev,noexec' 
/dev/loop1 on / type squashfs (ro,seclabel,relatime)  # intermediate ns
/dev/loop2 on /vendor type squashfs (ro,seclabel,nodev,relatime)  # intermediate ns
tmpfs on /dev type tmpfs (rw,seclabel,nosuid,relatime,mode=755,uid=655360,gid=655360) # intermediate ns
devpts on /dev/pts type devpts (rw,seclabel,nosuid,noexec,relatime,mode=600,ptmxmode=666) # tricky
devpts on /dev/ptmx type devpts (rw,seclabel,nosuid,noexec,relatime,mode=600,ptmxmode=666) # tricky
none on /sys/fs/selinux type selinuxfs (rw,nosuid,noexec,relatime)  # bind-mount from outside
none on /acct type cgroup (rw,nodev,relatime,cpuacct) # this and all the other cgroup mounts can be nosuid,noexec
tmpfs on /mnt type tmpfs (rw,seclabel,nodev,relatime,mode=755,uid=655360,gid=656360) # not sure if Android wants this to be exec,suid for a reason
none on /dev/cpuctl type cgroup (rw,nodev,relatime,cpu)
tmpfs on /storage type tmpfs (rw,seclabel,nodev,relatime,mode=755,uid=655360,gid=656360) # bind-mount from /mnt/runtime/default
tmpfs on /var/run/inputbridge type tmpfs (rw,seclabel,nodev,relatime,mode=775,uid=655360,gid=656360) # Can make it nosuid,noexec
/dev/mmcblk0p1 on /data type ext4 (rw,seclabel,nosuid,nodev,noatime,commit=600,data=ordered)  # Mounted from outside
passthrough on /var/run/arc/media/removable type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000,default_permissions,allow_other)  # Mounted from outside
tmpfs on /var/run/inputbridge type tmpfs (rw,seclabel,nodev,relatime,mode=775,uid=655360,gid=656360)  # Can make this nosuid,noexec

After some tweaking I *think* I can trim that list down to:

:/ # mount | grep -v 'nosuid,nodev,noexec'
/dev/loop98 on / type squashfs (ro,seclabel,relatime)
/dev/loop99 on /vendor type squashfs (ro,seclabel,nodev,relatime)
tmpfs on /dev type tmpfs (rw,seclabel,nosuid,relatime,mode=755,uid=655360,gid=655360)
devpts on /dev/pts type devpts (rw,seclabel,nosuid,noexec,relatime,mode=600,ptmxmode=666)
devpts on /dev/ptmx type devpts (rw,seclabel,nosuid,noexec,relatime,mode=600,ptmxmode=666)
none on /sys/fs/selinux type selinuxfs (rw,nosuid,noexec,relatime)
tmpfs on /mnt type tmpfs (rw,seclabel,nodev,relatime,mode=755,uid=655360,gid=656360)
tmpfs on /storage type tmpfs (rw,seclabel,nodev,relatime,mode=755,uid=655360,gid=656360)
/dev/mmcblk0p1 on /data type ext4 (rw,seclabel,nosuid,nodev,noatime,commit=600,data=ordered)
passthrough on /var/run/arc/media/removable type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000,default_permissions,allow_other)

I was thinking about this and realized we don't *really* use unprivileged user namespaces that much, do we? The Android container is set up as root. Maybe we should just disallow user namespaces, possibly until we figure out a better way to fix this.
Half of it is setup as root (e.g. the mounts performed in the intermediate namespace are done as real root before entering userns), but the other half is setup as android-root, where it does expect to be able to do some mounts (but the code changes above should limit that impact).

Unfortunately due to the way the user namespaces interact with the other namespaces I was not able to split the container setup from the creation of the user namespace :(
Do we call the unshare() syscall (or clone() with the respective flag) as root or not? I think that's the only thing that matters here right?
oh i see your point. we invoke that as root, but Chrome does spawn some of its processes (notably the nacl_helper and nacl_helper_nonsfi) in a new userns in an unprivileged fashion:

localhost ~ # ls -l /proc/*/ns/user | sed -e 's%.*/proc/\([0-9]*\)/ns/user -> user:\[\([0-9]*\)\]%\2 \1%' | sort -n
4026531837 1  # initns
...
4026531837 99
4026532504 1881 # NaCl helper
4026532575 1884 # NaCl helper-nonsfi
4026532576 1879 # Chrome
4026532576 1887 # Chrome
4026532576 4061 # Chrome
4026532576 4394 # Chrome
4026532576 5541 # Chrome
4026532583 5681 # Android init
4026532583 5690
...
localhost ~ # grep Uid /proc/1791/status
Uid:    1000    1000    1000    1000
localhost ~ # pstree -p -s 1879
init(1)---session_manager(1751)---chrome(1791)---chrome(1879)-+-chrome(1887)-+-chrome(4061)-+-{Chrome_ChildIOT}(4069)
                                                              |              |              |-{CompositorTileW}(4073)
                                                              ...
                                                              |-nacl_helper(1881)
                                                              `-nacl_helper_non(1884)
After I applied a logging-only version of vapier@'s suggestion ( https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/917210 ) it seems like ARC is the sole offender:

2018-02-13T14:33:35.133058-08:00 NOTICE kernel: [   25.146747] Chromium OS LSM: sb_mount Mounting a filesystem without noexec outside of init ns prohibited obj="/opt/google/containers/android/rootfs/root/var/run/arc/shared_mounts" pid=2026 cmdline="/usr/bin/run_oci --log_dir=/var/log/android start android-master-run_oci"
2018-02-13T14:33:35.153074-08:00 NOTICE kernel: [   25.166128] Chromium OS LSM: sb_mount Mounting a filesystem without noexec outside of init ns prohibited obj="/" pid=2026 cmdline="/usr/bin/run_oci --log_dir=/var/log/android start android-master-run_oci"
2018-02-13T14:33:35.219031-08:00 NOTICE kernel: [   25.232360] Chromium OS LSM: sb_mount Mounting a filesystem without noexec outside of init ns prohibited obj="/mnt" pid=2026 cmdline="/init"
2018-02-13T14:33:35.219047-08:00 NOTICE kernel: [   25.232457] Chromium OS LSM: sb_mount Mounting a filesystem without noexec outside of init ns prohibited obj="/config" pid=2026 cmdline="/init"
2018-02-13T14:33:35.235023-08:00 NOTICE kernel: [   25.248013] Chromium OS LSM: sb_mount Mounting a filesystem without noexec outside of init ns prohibited obj="/" pid=2026 cmdline="/init"
2018-02-13T14:33:35.235039-08:00 NOTICE kernel: [   25.248065] Chromium OS LSM: sb_mount Mounting a filesystem without noexec outside of init ns prohibited obj="/storage" pid=2026 cmdline="/init"

I totally forgot about the shared_mounts one since it is short-lived (it goes away after logging in), but I'll send a fix for it soon. I'm pretty sure I can fix the complaints about /config with another AOSP patch, but /mnt and /storage might need some more talks with Android folks. I already sent some emails about it.

The complaints for / from init might be benign since it's trying to do something that would be forbidden anyways.
i'm fairly certain the Chrome sandbox is utilizing user namespaces so that it doesn't need to be set*id.  imo that's an improvement we don't want to go back on.

another possible solution raised would be alt syscall.  if we defined a new alt_syscall table that is used by everyone by default (e.g. by upstart itself as pid 1 switching tables before spawning anything), we would be able to hook the mount function:
- in the default table, we'd always block things
- in the ARC++ table, we'd allow specific combinations

i tried to look at your logging kernel patch, but i guess it's marked private ?  i think it would be useful to add kernel logging all the time here.  we could do this in the CrOS LSM sb_mount hook ?
Removed private. Yes, it's in the CrOS LSM sb_mount hook.
Ah yeah forgot about the Chrome sandbox. We definitely don't want to break that.

So, the LSM can only access superblock info, but that wouldn't allow us to hook subsequent bind mounts? Is that correct?
OK, we're down to just /mnt within Android. The owner of that has been OOO, but they're going to return tomorrow. IF we can mark /mnt as noexec,nosuid, we'll be good to go!
#comment20: iiuc bind mounts are still accessible, since the hook is called once for every call to mount(2). I had to add bind-mounts to the list of exceptions to avoid a whole bunch of errors, since bind-mounts and changes to mount  propagation flags don't let you change the exec,suid,dev flags.

remounting (except with MS_BIND, which only lets you change the ro/rw flag) is forbidden within user namespaces anyways.
So doesn't that mean that we can implement the LSM-based solution? Mike, Mattias, what's missing here?
i believe that is the case. with all the in-flight patches, the PoC no longer works and Android still boots.
the lsm superblock/mount flag limitations apply to lsm hooks that happen after the mount succeeds (e.g. inode_follow_link and file_open only get access to the sb), but if the lsm hooks the initial mount call (e.g. sb_mount), then we have access to all the args passed in to the kernel (source/type/dest/flags).  so we should have a clean point in the lsm to hook this event.

i've posted some updates to the existing lsm code to start adding more debugging info (incorporating some of Luis's changes).  once those settle, we can look at Luis's poc CL to see if we can pull this off in the lsm only.
Owner: lhchavez@chromium.org
Status: Started (was: Assigned)
Status update:

* All necessary changes in Android (AOSP) have been approved and need to propagate towards the nyc-mr1-arc branch https://googleplex-android-review.git.corp.google.com/q/topic:%22nyc-mr1-arc-mount-flags%22+(status:open%20OR%20status:merged). This might take some time because the Android CQ for that branch is busted: https://bugs.chromium.org/p/chromium/issues/detail?id=812467
* Once the PFQ rolls both the NYC and master containers, we can land https://chrome-internal-review.googlesource.com/c/chromeos/cheets-scripts/+/570514 to get rid of the very last unsafe mount from the Android side.
* With all those things landed, we can finally apply the kernel patch that uses the proposal outlined in #c9, #c11 https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/917210 (which is now ready for review).

With that, we'll be ready to start cherry-picking to the release branches.
 Issue 809624  has been merged into this issue.
Status update:

* All changes in Android are in both master-arc-dev and nyc-mr1-arc branches.
* Both NYC and master Android PFQs are currently busted and have not updated in days [1][2] D:

I'll try to get those unblocked so we can proceed.

1: https://uberchromegw.corp.google.com/i/chromeos/builders/master-nyc-android-pfq/
2: https://uberchromegw.corp.google.com/i/chromeos/builders/master-mst-android-pfq/
Project Member

Comment 29 by bugdroid1@chromium.org, Feb 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/11f1a760ead9b1f0771b51767b47709ebcbc061c

commit 11f1a760ead9b1f0771b51767b47709ebcbc061c
Author: Luis Hector Chavez <lhchavez@google.com>
Date: Thu Feb 22 06:26:52 2018

arc: Remove /dev/input from config.json

/dev/input was added to config.json as a glorified mkdir(2). It has
since been moved as an actual mkdir in init.cheets.rc in the Android
side, to avoid creating a mount altogether.

BUG= chromium:810235 
TEST=Android still boots
TBR=yusukes@google.com

Change-Id: I06ffbed091069513f35c8a319759d0d3e3022371
Reviewed-on: https://chromium-review.googlesource.com/929001
Commit-Ready: Luis Hector Chavez <lhchavez@chromium.org>
Tested-by: Luis Hector Chavez <lhchavez@chromium.org>
Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org>
Reviewed-by: Yury Khmel <khmel@chromium.org>

[modify] https://crrev.com/11f1a760ead9b1f0771b51767b47709ebcbc061c/arc/container-bundle/README.md
[modify] https://crrev.com/11f1a760ead9b1f0771b51767b47709ebcbc061c/arc/container-bundle/nyc/config.json
[modify] https://crrev.com/11f1a760ead9b1f0771b51767b47709ebcbc061c/arc/container-bundle/master/config.json

Status update:

The only missing thing to land is the LSM itself. I'm also preparing the set of cherry-picks for M65.

Turns out I can skip one of the Chrome OS-side changes (the one for /var/run/inputbridge) in favor of an Android-side version for M65, which should be safer since we can just land all the Android changes at once without having this weird synchronizations between repositories.

If all goes well, by EOW everything should have landed.
Project Member

Comment 31 by bugdroid1@chromium.org, Feb 22 2018

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/132548e180ca9dcbc01c4f38e862cada3f118521

commit 132548e180ca9dcbc01c4f38e862cada3f118521
Author: Luis Hector Chavez <lhchavez@google.com>
Date: Thu Feb 22 09:53:31 2018

CHROMIUM: LSM: Deny mounting filesystems as exec in unprivileged userns

This change makes chromiumos_security_sb_mount() forbid mounting
filesystems without the MS_NOEXEC flag outside of the init namespace.

BUG= chromium:810235 
TEST=Android can still boot
Signed-off-by: Luis Hector Chavez <lhchavez@chromium.org>

Change-Id: I40d22bdd637b1113bb53db7856bdd06331083cbd
Reviewed-on: https://chromium-review.googlesource.com/917210
Commit-Ready: Luis Hector Chavez <lhchavez@chromium.org>
Tested-by: Luis Hector Chavez <lhchavez@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/132548e180ca9dcbc01c4f38e862cada3f118521/security/chromiumos/lsm.c

Comment 32 Deleted

Out of curiosity, did someone test Mike's repro steps with the updated LSM?
I did: https://bugs.chromium.org/p/chromium/issues/detail?id=810235#c24

In case anybody else wants to try it out, grab latest canary on a 4.4 board.

Comment 35 Deleted

Comment 36 Deleted

As far as I can tell, the way this is being implemented shouldn't affect crouton, right?  crouton mounts/bind-mounts with exec as root from crosh, which I assume still has CAP_SYS_ADMIN.
Crouton should be fine. If you're running as root you should be fine.
Labels: Merge-Request-65
Status updates:

* Kernel cherry-picks are in the CQ for 3.14, 3.18. Kernel cherry-pick for 4.14 is ongoing due to merge conflicts.
* Android-side merges have been approved and have landed for M65: https://googleplex-android-review.git.corp.google.com/q/topic:%22nyc-mr1-arc-m65-mount-flags%22+(status:open%20OR%20status:merged)
* Android-side merges were not approved for M64.
* Requesting merge for https://chrome-internal-review.googlesource.com/c/chromeos/cheets-scripts/+/572751 and the kernel changes in M65.
Thanks for the update. It's OK to not merge this back to 64.
Labels: -M-64 M-65
Project Member

Comment 42 by sheriffbot@chromium.org, Feb 22 2018

Labels: -Merge-Request-65 Merge-Review-65 Hotlist-Merge-Review
This bug requires manual review: We are only 11 days from stable.
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), cmasso@(iOS), bhthompson@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
I think this will break the linux VMs on 4.14. 4.14 is what the VM guest runs. We could turn off the chromium lsm in the guest, or we could put the new mount namepsace feature behind a different option.
Cherry picked to 4.14 and confirmed that the container in the VM won't start anymore.
Labels: -Hotlist-Merge-Review -Merge-Review-65 Merge-Approved-65
Project Member

Comment 46 by bugdroid1@chromium.org, Feb 23 2018

Labels: merge-merged-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/9264f12338bddba8a1d1fcd425a0f425cd53297e

commit 9264f12338bddba8a1d1fcd425a0f425cd53297e
Author: Luis Hector Chavez <lhchavez@google.com>
Date: Fri Feb 23 11:36:45 2018

CHROMIUM: LSM: Deny mounting filesystems as exec in unprivileged userns

This change makes chromiumos_security_sb_mount() forbid mounting
filesystems without the MS_NOEXEC flag outside of the init namespace.

BUG= chromium:810235 
TEST=Android can still boot
Signed-off-by: Luis Hector Chavez <lhchavez@chromium.org>

Change-Id: I40d22bdd637b1113bb53db7856bdd06331083cbd
Reviewed-on: https://chromium-review.googlesource.com/917210
Commit-Ready: Luis Hector Chavez <lhchavez@chromium.org>
Tested-by: Luis Hector Chavez <lhchavez@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
(cherry picked from commit 132548e180ca9dcbc01c4f38e862cada3f118521)
Reviewed-on: https://chromium-review.googlesource.com/931781

[modify] https://crrev.com/9264f12338bddba8a1d1fcd425a0f425cd53297e/security/chromiumos/lsm.c

The kernel changes don't apply cleanly to M65 due to them depending on a previous logging improvement that vapier@ made and making them not depend on those is kind of hard (I would basically need to merge both changes together). Chatted with vapier@ offline and he is of the idea that it's probably better to skip merging to R65 since the LSM changes are fairly invasive and we don't have any way to report the failures.

Note that all the Android-side changes have already been merged and the latest R65 build already has them.

Options:

* Drop merging altogether and let it just be fixed in M66.
* Cherry-pick vapier@'s logging improvements to make the LSM changes apply cleanly.
* Make further improvements to make it WARN instead of just pr_notice.

Thoughts?
It's OK to skip merging this to 65.
Project Member

Comment 49 by bugdroid1@chromium.org, Feb 23 2018

Labels: merge-merged-chromeos-4.14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7a840abdff7ccb7251686a6c29e10930e55fe38e

commit 7a840abdff7ccb7251686a6c29e10930e55fe38e
Author: Luis Hector Chavez <lhchavez@google.com>
Date: Fri Feb 23 19:16:28 2018

CHROMIUM: LSM: Deny mounting filesystems as exec in unprivileged userns

This change makes chromiumos_security_sb_mount() forbid mounting
filesystems without the MS_NOEXEC flag outside of the init namespace.

BUG= chromium:810235 
TEST=Android can still boot
Signed-off-by: Luis Hector Chavez <lhchavez@chromium.org>

Change-Id: I40d22bdd637b1113bb53db7856bdd06331083cbd
Reviewed-on: https://chromium-review.googlesource.com/917210
Commit-Ready: Luis Hector Chavez <lhchavez@chromium.org>
Tested-by: Luis Hector Chavez <lhchavez@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
(cherry picked from commit 132548e180ca9dcbc01c4f38e862cada3f118521)
Reviewed-on: https://chromium-review.googlesource.com/931782
Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org>

[modify] https://crrev.com/7a840abdff7ccb7251686a6c29e10930e55fe38e/security/chromiumos/lsm.c
[modify] https://crrev.com/7a840abdff7ccb7251686a6c29e10930e55fe38e/security/chromiumos/Kconfig

Project Member

Comment 50 by bugdroid1@chromium.org, Feb 23 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a16f24e4bec2e90403246f5e22b6f6536051ee1c

commit a16f24e4bec2e90403246f5e22b6f6536051ee1c
Author: Luis Hector Chavez <lhchavez@google.com>
Date: Fri Feb 23 19:16:30 2018

CHROMIUM: config: Disable SECURITY_CHROMIUMOS_NO_UNPRIVILEGED_UNSAFE_MOUNTS for VMs

This change disables SECURITY_CHROMIUMOS_NO_UNPRIVILEGED_UNSAFE_MOUNTS
for VMs, since they expect to be able to have executable tmpfs mounts.

BUG= chromium:810235 
TEST=Pre-CQ
Signed-off-by: Luis Hector Chavez <lhchavez@chromium.org>

Change-Id: I7308ac090847a02fe1d3644dc148e01b622fd9bc
Reviewed-on: https://chromium-review.googlesource.com/932823
Commit-Ready: Luis Hector Chavez <lhchavez@chromium.org>
Tested-by: Luis Hector Chavez <lhchavez@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/a16f24e4bec2e90403246f5e22b6f6536051ee1c/arch/arm64/configs/chromiumos-container-vm-arm64_defconfig
[modify] https://crrev.com/a16f24e4bec2e90403246f5e22b6f6536051ee1c/arch/x86/configs/chromiumos-container-vm-x86_64_defconfig

Project Member

Comment 51 by bugdroid1@chromium.org, Feb 23 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/67fcdebc2f6f48623cae43579bcb847cc492fa6b

commit 67fcdebc2f6f48623cae43579bcb847cc492fa6b
Author: Luis Hector Chavez <lhchavez@google.com>
Date: Fri Feb 23 19:16:33 2018

CHROMIUM: config: Normalize config.

Ran chromeos/scripts/kernelconfig oldconfig, accepted defaults.

This stems from the SECURITY_CHROMIUMOS_NO_UNPRIVILEGED_UNSAFE_MOUNTS
flag that was recently introduced

BUG= chromium:810235 
TEST=Pre-CQ
Signed-off-by: Luis Hector Chavez <lhchavez@chromium.org>

Change-Id: I53f55769c75706c52d0b1e786aac1689fed7a1f9
Reviewed-on: https://chromium-review.googlesource.com/932824
Commit-Ready: Luis Hector Chavez <lhchavez@chromium.org>
Tested-by: Luis Hector Chavez <lhchavez@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/67fcdebc2f6f48623cae43579bcb847cc492fa6b/chromeos/config/base.config

Labels: -M-65 -Merge-Approved-65 M-66
Status: Fixed (was: Started)
All right, everything has landed to M-66. Given #comment48, there is nothing left to do. Removing the merge labels and marking this as fixed.
Project Member

Comment 53 by sheriffbot@chromium.org, Feb 24 2018

Labels: -Restrict-View-SecurityTeam Restrict-View-SecurityNotify
Cc: uekawa@chromium.org
Cc: adityakali@google.com
I think this is going to break containers for us (lakitu) so I am likely going to disable this.

But I wanted to understand what exactly this is trying to fix. When I run the command in the repro (/sbin/minijail0 -i -I -U -m'0 1000 1'), this not just creates a new USERNS for me, but also new MNT namespace. And so the tmpfs mount in that example is happening in a private mount namespace.

Why are we expecting that mount-flags (noexec, nosuid, etc.) from the parent's MNT-ns should have any meaning inside the private MNTns of the container? Doesn't that defeat the purpose of private mount namespaces?

re#56:

The issue is that the new mount namespace is visible in the init mount namespace through at least /proc/pid/self.

This creating the ability for an unprivileged user to get a tmpfs mount that was executable. This is going to be hard to address in a backwards compatible way, so we've got this chromium LSM solution to patch it up for now.
What Dylan said. Namespaces prevent things from leaking out, but usually don't prevent other things from reaching in. Whatever's executing that Minijail command lives in a namespace *outside* the newly created one, and can therefore access it, at least via /proc/pid. And that gives folks a writable+executable partition, which we try really hard to prevent.
Yeah. Thanks for explaining. The leak via /proc/<pid>/root (and also cwd) is troublesome.
I was confused because looking at the patch https://crrev.com/c/931782, it seemed to be denying mounts which should be fine if USERNS and MNTNS are both unshared.

I wonder if blocking access to the /proc/<pid>/{cwd, root} entries themselves would be OK for container users. A naive way could be to show them only if both 'current' and 'task' are in same mntns. Something like:

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 333138142a65b..82527ebf7b35f 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -198,7 +198,7 @@ static int proc_root_link(struct dentry *dentry, struct path *path)
        struct task_struct *task = get_proc_task(d_inode(dentry));
        int result = -ENOENT;
 
-       if (task) {
+       if (task && (capable(CAP_SYS_ADMIN) || (task->nsproxy->mnt_ns == current->nsproxy->mnt_ns))) {
                result = get_task_root(task, path);
                put_task_struct(task);
        }


This solution could work for containers too because the expected way to access other mntns is by attaching to the target MNTns (nsenter/setns). Haven't thought about all scenarios thoroughly though.
If I could have removed {cwd, root}, I would have, but ARC++ had some not-unreasonable reasons to keep them in. If you can do it, I think you should.
Project Member

Comment 61 by bugdroid1@chromium.org, Mar 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/03ed3d5d09961fa924ddc7f605f20a4ea9600a71

commit 03ed3d5d09961fa924ddc7f605f20a4ea9600a71
Author: Aditya Kali <adityakali@google.com>
Date: Thu Mar 01 20:39:08 2018

lakitu: kernel: Disable SECURITY_CHROMIUMOS_NO_UNPRIVILEGED_UNSAFE_MOUNTS

The newly introduced kernel option (CL:931782) will prevent creation of
tmpfs and other executable mounts inside containers and potentially break
certain use-cases on lakitu.

BUG= chromium:810235 
TEST=trybots
RELEASE_NOTE=None

Change-Id: I6bb7f007809744c5cf4b478d7e2122cec76a169f
Reviewed-on: https://chromium-review.googlesource.com/942364
Commit-Ready: Aditya Kali <adityakali@google.com>
Tested-by: Aditya Kali <adityakali@google.com>
Reviewed-by: Jorge Lucangeli Obes <jorgelo@chromium.org>
Reviewed-by: Daniel Wang <wonderfly@google.com>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/03ed3d5d09961fa924ddc7f605f20a4ea9600a71/overlay-lakitu/sys-kernel/lakitu-kernel-4_14/files/base.config
[modify] https://crrev.com/03ed3d5d09961fa924ddc7f605f20a4ea9600a71/overlay-lakitu/sys-kernel/lakitu-kernel-4_14/lakitu-kernel-4_14-9999.ebuild

Cc: dvyukov@google.com
Project Member

Comment 63 by sheriffbot@chromium.org, Jun 2 2018

Labels: -Restrict-View-SecurityNotify allpublic
This bug has been closed for more than 14 weeks. Removing security view restrictions.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Sign in to add a comment