New issue
Advanced search Search tips

Issue 866377 link

Starred by 8 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug

Blocked on:
issue 920092

Blocking:
issue 866129



Sign in to add a comment

Investigate and implement further hardening FUSE file systems mounted by cros-disks

Project Member Reported by mnissler@chromium.org, Jul 23

Issue description

Spin-off from  issue 866016 : The sshfs and ssh processes on the Chrome OS side are running with powerful capabilities. If that's indeed the case (I haven't verified), then that's risky since these processes directly handle untrusted input.

We should perform some due diligence to see if/how we can improve this:

* As a first step: Determine and document what processes are running, what privileges (capabilities in particular) they have and what IPC surface they open (network ports and such). Results of this investigation should ideally go into the security section of the design doc.

* As a second step: See what we can do to tighten the sandbox. We might need CAP_SYS_ADMIN for mount syscalls etc., but maybe we can drop more capabilities in process that don't need them and/or before we actually start processing untrusted data?
 
Cc: joelhockey@chromium.org
Cc: jorgelo@chromium.org
So cros-disks set up the sandbox for every FUSE helper tho inherit its caps, but not to be able to gain new through PR_SET_NO_NEW_PRIVS. As a result every one of them will have chown, setgid, setuid and sys_admin, out of which only sys_admin is actually used (rest needed only for cros-disks), and only by the fusermount which already has sys_admin fcap, so if wasn't for the particular setup it would work without any caps at all.

First thing first: we are using third party FUSE modules, and we will use them as there is absolutely no way we are going to reimplement every each one of them to do some kind of proxying of syscalls.

Now what I think we can actually do:

Ideally we would drop all caps from a sandboxed FUSE module but leave possibility to obtain sys_admin through exec fusermount. But from my reading of minijail code this particular combination is not supported - by dropping a capability it also blocks it from reobtaining it back through exec (?). In the current format we can only drop all but sys_admin.

Another option would be changing libminijail to be more flexible on what it does with capabilities, so it would support different caps and bcaps. I'd assume it's possible, but I guess jorgelo@ would know better.
Copying relevant bits from my reply on the doc comment thread (which is becoming unwieldy anyways):

In case of cros-disks / FUSE you need CAP_SYS_ADMIN to be able to perform mount and unmount system calls. Ideally, the helpers that actually implement the file system wouldn't run with that privilege though. The art is to find a way to split things up so the privileged parts a simple and safe, while the tricky and dangerous parts run in a completely deprivileged process.

I am not too familiar with FUSE, but conceptually it should be possible to have one process open /dev/fuse, perform the mount system call, then fork a deprivileged child process that runs the file system implementation (passing the /dev/fuse file descriptor). I doesn't look like the current fuse implementation supports this, but maybe it's worth a shot to attempt implementing this (and upstreaming it?)
Another idea: FUSE already supports controlled privilege escalation via fusermount, but as previously discussed that's designed based on setuid which we'd like to avoid.

Spelunking a bit in the code, it looks like fusermount does the following: (1) opening /dev/fuse and (2) mounting the file system. Now if we did these two things *before* spawning the file system helper, then make a stub fusermount implementation that just passes the /dev/fuse FD via the fusermount mechanism (it uses a pipe to pass the FD) back to the file system helper, we might be able to execute the helper fully deprivileged? Sounds a bit crazy, but probably worth a shot?
Problem arises that different helpers have different arguments, and validation of the arguments, interactive logins, etc. must happen before mount() is called and /dev/fuse opened. Also unmount() comes from untrusted code for some situations, e.g. shutdown/disconnect handling. So you definitely can't just run all unprivileged code strictly after privileged.

Maybe we can split fusermount into privileged daemon and unnpriviliged client and make them talk IPC passing /dev/fuse descriptor, but that's unlikely will ever get upstreamed, so we will become responsible for maintaining it in a compatible way. My gut feeling is that it might not go well in the long run.

How bad is having CAPBSET permitting sys_admin only, and drop all caps for the helper?
Cc: benchan@chromium.org
Are you saying that the helper needs to adjust mount options depending on user interaction? That sounds scary on its own - do you have specific example?

I might be wrong, but if I understand the code correctly, then the sequence of events is this:
1. FD=open(/dev/fuse)
2. call mount(), passing fd=FD as a mount option
3. The kernel will start sending requests via the FD to interact with the file system

What are the specific technical reasons you can think of that prevent steps 1 and 2 to happen before spawning the helper?

At cleanup time, there's an umount() call and the FD gets closed. I don't think the kernel can make any assumptions on the order of these and needs to handle both situations gracefully. So cros-disks could just handle the helper exit and perform the umount() itself.
Forgot about the cap_bset question: Even if we enable the capability in the bounding set, how would the fusermount child process acquire it? Are you thinking file capabilities? That has many of the same problems that setuid has, so I'd rather avoid it.
Before any mount there should be some clear understanding what is being mounted and whether it's legit to mount that thing in the present situation, with provided credentials, etc. You can't just mount and then check if the thing you are mounting even exists.

Yes, I mean fcaps. It has fewer problems than suid - it's only one capability after all, not all of them.
> Before any mount there should be some clear understanding what is being mounted and whether it's legit to mount that thing in the present situation, with provided credentials, etc. You can't just mount and then check if the thing you are mounting even exists.

Why not? If the FUSE helper fails, we just unmount and we're good. The decision what gets mount and where is made before opening /dev/fuse and mounting anyways. Note that this isn't any different from a FUSE helper failure or crash after the mount() takes place (e.g. the SSH connection might die at any time, so you can be sure the target will keep existing even if you checked before).

> Yes, I mean fcaps. It has fewer problems than suid - it's only one capability after all, not all of them.

CAP_SYS_ADMIN is the most powerful capability and has been shown to allow escalation to full privilege repeatedly in the past (for the unlikely event that CAP_SYS_ADMIN doesn't already give the attacker sufficient powers already).
Summary: Investigate and implement further hardening FUSE file systems mounted by cros-disks (was: Investigate and implement further hardening for SSHFS)
This really applies to all FUSE file systems mounted by cros-disks, adjusting title accordingly.
Cc: slangley@chromium.org weifangsun@chromium.org
Labels: M-70
Cc: tbuck...@chromium.org
Found some time today to make a proof-of-concept of the idea discussed above (i.e. opening /dev/fuse and mounting up front, then spawning the fuse helper).

To play with this:

1. Get fuse_no_privs.c and compile like so: clang -Wall -o /tmp/fuse_no_privs ~/fuse_no_privs.c -lminijail
2. Copy to Chrome OS device or VM: scp /tmp/fuse_no_privs $CROS:/tmp/
3. Build ntfs-3g with full libfuse (libfuse-lite doesn't support fusermount): USE=external-fuse emerge-$BOARD -v1 ntfs3g
4. cros deploy --board $BOARD $CROS ntfs3g

Then, on the device or VM (note in particular the Cap* and NoNewPrivs lines in the proc status output):

localhost ~ # dd if=/dev/zero of=/tmp/ntfs.bin count=100 bs=1M
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.138572 s, 757 MB/s
localhost ~ # mkfs.ntfs -F /tmp/ntfs.bin
/tmp/ntfs.bin is not a block device.
mkntfs forced anyway.
The sector size was not specified for /tmp/ntfs.bin and it could not be obtained automatically.  It has been set to 512 bytes.
The partition start sector was not specified for /tmp/ntfs.bin and it could not be obtained automatically.  It has been set to 0.
The number of sectors per track was not specified for /tmp/ntfs.bin and it could not be obtained automatically.  It has been set to 0.
The number of heads was not specified for /tmp/ntfs.bin and it could not be obtained automatically.  It has been set to 0.
Cluster size has been automatically set to 4096 bytes.
To boot from a device, Windows needs the 'partition start sector', the 'sectors per track' and the 'number of heads' to be set.
Windows will not be able to boot from this device.
Initializing device with zeroes: 100% - Done.
Creating NTFS volume structures.
mkntfs completed successfully. Have a nice day.
localhost ~ # chown nobody:nogroup /tmp/ntfs.bin
localhost ~ # mkdir /run/fuse_mnt
localhost ~ # /tmp/fuse_no_privs $(which mount.ntfs-3g) /tmp/ntfs.bin /run/fuse_mnt
Minijailed fuse helper started: 0
Mounted successfully at /run/fuse_mnt.
localhost ~ # ls -ld /run/fuse_mnt
drwxrwxrwx. 1 nobody nogroup 4096 Aug 20 13:52 /run/fuse_mnt
localhost ~ # cat /proc/$(pidof mount.ntfs-3g)/status
Name:   mount.ntfs-3g
State:  S (sleeping)
Tgid:   9854
Ngid:   0
Pid:    9854
PPid:   1
TracerPid:      0
Uid:    65534   65534   65534   65534
Gid:    65533   65533   65533   65533
FDSize: 64
Groups:
NStgid: 9854
NSpid:  9854
NSpgid: 9854
NSsid:  9854
VmPeak:     9628 kB
VmSize:     9628 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:       528 kB
VmRSS:       528 kB
VmData:      664 kB
VmStk:       132 kB
VmExe:        48 kB
VmLib:      2452 kB
VmPTE:        36 kB
VmSwap:        0 kB
Threads:        1
SigQ:   0/15568
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000180004003
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000000000000000
CapAmb: 0000000000000000
NoNewPrivs:     1
Seccomp:        0
Cpus_allowed:   f
Cpus_allowed_list:      0-3
Mems_allowed:   1
Mems_allowed_list:      0
voluntary_ctxt_switches:        2
nonvoluntary_ctxt_switches:     0
localhost ~ # echo "Look mom, no privs!" > /run/fuse_mnt/file
localhost ~ # ls -l /run/fuse_mnt
total 1
-rwxrwxrwx. 1 nobody nogroup 20 Aug 20 13:54 file
localhost ~ # cat /run/fuse_mnt/file
Look mom, no privs!
localhost ~ # umount /run/fuse_mnt
localhost ~ # pidof mount.ntfs-3g

There is probably a prettier way to do this by patching libfuse to accept a pre-opened and pre-mounted /dev/fuse file descriptor so we don't have to emulate fusermount behavior. Anyhow, I think it's very much worthwhile to implement a version of this in cros-disks so we can put the fuse helpers in a very strict sandbox.
fuse_no_privs.c
3.4 KB View Download
Blocking: 866129
OK, so here's another version that relies on a small libfuse patch (attached) that makes libfuse honor a new fuse_fd parameter that directs it to use an inherited /dev/fuse fd instead of trying to open /dev/fuse and mounting the file system itself (or indirectly via fusermount). Opening /dev/fuse and mounting happens in advance.

localhost ~ # exec 17<>/dev/fuse
localhost ~ # mount -i -o fd=17,rootmode=40000,user_id=0,group_id=0 -t fuse none /run/fuse_mnt &
[1] 13331                                                                                                                           
localhost ~ # minijail0 -u nobody -g nogroup -c 0 -e -l -n -N -p -r -v -P /var/empty -b / -b /dev -b /proc -k tmpfs,/run,tmpfs,0xe -b /tmp/ntfs.bin,/run/ntfs.bin,1 /usr/bin/ntfs-3g -o fuse_fd=17 /run/ntfs.bin '' &                                                    
[2] 13333
localhost ~ # cat /proc/$(pidof ntfs-3g)/status
Name:   ntfs-3g
State:  S (sleeping)
Tgid:   13337
Ngid:   0
Pid:    13337
PPid:   13334
TracerPid:      0
Uid:    65534   65534   65534   65534
Gid:    65533   65533   65533   65533
FDSize: 64
Groups:
NStgid: 13337   3
NSpid:  13337   3
NSpgid: 13337   3
NSsid:  13337   3
VmPeak:     9632 kB
VmSize:     9632 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:       548 kB
VmRSS:       548 kB
VmData:      668 kB
VmStk:       132 kB
VmExe:        48 kB
VmLib:      2452 kB
VmPTE:        36 kB
VmSwap:        0 kB
Threads:        1
SigQ:   0/15568
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000180004003
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000000000000000
CapAmb: 0000000000000000
NoNewPrivs:     1
Seccomp:        0
Cpus_allowed:   f
Cpus_allowed_list:      0-3
Mems_allowed:   1
Mems_allowed_list:      0
voluntary_ctxt_switches:        1
nonvoluntary_ctxt_switches:     3
[1]-  Done                    mount -i -o fd=17,rootmode=40000,user_id=0,group_id=0 -t fuse none /run/fuse_mnt                      
localhost ~ # cat /proc/$(pidof ntfs-3g)/mounts                                                                                     
rootfs / rootfs rw,seclabel 0 0                                                                                                     
/dev/root / ext2 ro,seclabel,relatime 0 0
devtmpfs /dev devtmpfs ro,seclabel,relatime,size=1992812k,nr_inodes=498203,mode=755 0 0
proc /proc proc ro,relatime 0 0
tmpfs /run tmpfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0
tmp /run/ntfs.bin tmpfs rw,seclabel,nodev,relatime 0 0
localhost ~ # ls -l /run/fuse_mnt                                                                                                   
total 0                                                                                                                             
-rwxrwxrwx. 1 nobody nogroup 0 Aug 27 05:58 file                                                                                    
localhost ~ # umount /run/fuse_mnt                                                                                                  

Some notes:

1. The mount command hangs because it tries to access the mounted target while the FUSE helper isn't running yet (this is easy to avoid in cros-disks).

2. The FUSE helper is started in a somewhat unconventional way (not even specifying the mount point to prevent it from stat()-ing it and hanging, no actual mount options). This might trigger unusual (and thus buggy) code paths. We can always give the FUSE helper harmless dummy parameters though. Probably makes sense to experiment with a couple more FUSE file systems to see what works best.

3. I'm throwing all the isolation options minijail has at the FUSE helper. The only thing lacking is seccomp policies, but that shouldn't be hard to add (and in case of avfsd, the policy already exists).

4. The patch is against fuse 2.9.7 as present in our tree - it'll probably make sense to adjust it for upstream libfuse and ask whether they're willing to accept it (or something similar that helps our use case).
0001-Add-option-to-parse-pre-created-FUSE-FD.patch
2.3 KB Download
Labels: -M-70 M-71
I've polished the FUSE change proposed in comment 16 a bit. Upstream pull request: https://github.com/libfuse/libfuse/pull/291
Labels: -M-71 M-72
Good news: The change to allow fully deprivileged fuse file systems has been merged in upstream FUSE. With https://github.com/libfuse/libfuse/commit/da7c9b228aaf31f37684e106b75262055ca440de you can now do this (note the new drop_privileges option):

root@toroa:/usr/local/google/home/mnissler/src/libfuse/build/example# mount -t fuse3 -o drop_privileges hello /mnt/tmp              
root@toroa:/usr/local/google/home/mnissler/src/libfuse/build/example# cat /mnt/tmp/hello                                            
Hello World!                                                                                                                        
root@toroa:/usr/local/google/home/mnissler/src/libfuse/build/example# cat /proc/$(pgrep hello)/status                               
Name:   hello                                                                                                                       
Umask:  0022
State:  S (sleeping)
Tgid:   84194
Ngid:   0
Pid:    84194
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 64
Groups: 0 126 
NStgid: 84194
NSpid:  84194
NSpgid: 84194
NSsid:  84194
VmPeak:   225764 kB
VmSize:   160228 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:       324 kB
VmRSS:       324 kB
RssAnon:             276 kB
RssFile:              48 kB
RssShmem:              0 kB
VmData:    17136 kB
VmStk:       132 kB
VmExe:         8 kB
VmLib:      2060 kB
VmPTE:        56 kB
VmPMD:        12 kB
VmSwap:        0 kB
HugetlbPages:          0 kB
Threads:        3
SigQ:   3/257200
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000080
SigCgt: 0000000180005003
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000000000000000
CapAmb: 0000000000000000
Seccomp:        0
Cpus_allowed:   ffff,ffffffff
Cpus_allowed_list:      0-47
Mems_allowed:   00000000,00000003
Mems_allowed_list:      0-1
voluntary_ctxt_switches:        1
nonvoluntary_ctxt_switches:     0


So I think we have a clear path forward now. Up next:

1. Figure out how to make this available in our libfuse copy. Can we just uprev? Or merely apply the patches for now?
2. Tweak cros-disks to use this, either by (1) launching the FUSE file system via mount.fuse3 or (2) by opening /dev/fuse itself and passing the file descriptor to the FUSE file system.
3. Re-assess the sandbox and see whether we can now tighten more.

dats@, benchan@ - any chance one of you can pick these up?
If looked a tiny bit into upgrading fuse to a fuse3 version. Turns out that our packages that depend on fuse mostly don't support fuse3 yet :-/

So backporting https://github.com/libfuse/libfuse/commit/64e11073b9347fcf9c6d1eea143763ba9e946f70 might be the best choice for now?
Here's a CL for the backport in case we decide to go for that: https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1273218

That doesn't include the mount.fuse changes though, so it'd be on cros-disks to open /dev/fuse, mount, and pass the file descriptor to the FUSE file system.
Project Member

Comment 23 by bugdroid1@chromium.org, Oct 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/6df1b1b474622635f2ff5508aa6453a48e928817

commit 6df1b1b474622635f2ff5508aa6453a48e928817
Author: Mattias Nissler <mnissler@chromium.org>
Date: Fri Oct 12 22:16:52 2018

cros-disks: Tighten capabilities

It turns out that we are running quite a few subprocesses without
restricting capabilities, so these inherit cros-disks' comparatively
powerful capabilities. This change restricts capabilities as much as
possible in the current status quo.

BUG=chromium:866377,chromium:867509
TEST=Passes platform_CrosDisks{Filesystem,Format,Rename,Archive}

Change-Id: Iee40b4bedb76f90179eb73694b5713ceceb4bc2d
Reviewed-on: https://chromium-review.googlesource.com/1276166
Commit-Ready: Mattias Nissler <mnissler@chromium.org>
Tested-by: Mattias Nissler <mnissler@chromium.org>
Reviewed-by: Ben Chan <benchan@chromium.org>

[modify] https://crrev.com/6df1b1b474622635f2ff5508aa6453a48e928817/cros-disks/rename_manager.cc
[modify] https://crrev.com/6df1b1b474622635f2ff5508aa6453a48e928817/cros-disks/format_manager.cc
[modify] https://crrev.com/6df1b1b474622635f2ff5508aa6453a48e928817/cros-disks/archive_manager.cc
[modify] https://crrev.com/6df1b1b474622635f2ff5508aa6453a48e928817/cros-disks/fuse_mounter.cc
[modify] https://crrev.com/6df1b1b474622635f2ff5508aa6453a48e928817/cros-disks/device_ejector.cc
[modify] https://crrev.com/6df1b1b474622635f2ff5508aa6453a48e928817/cros-disks/cros-disks.conf

Cc: zentaro@chromium.org
I poked a bit at cros-disks to see how hard it would be to make it open /dev/fuse, mount, then pass the file descriptor to the fuse binary. I've got it as far as mounting exfat successfully and passing platform_CrosDisksFilesystem.exfat. Needs more work though, in particular regarding figuring out how to determine the proper mount options/parameters to pass to the kernel. Need to take care of other stuff now though, so I dumped the experimental code I have here: https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1283329
Cc: och...@chromium.org
Labels: -M-72
+ochang@ - I chatted briefly with him about this, in case he's interested in taking things further.
Cc: amistry@chromium.org
+amistry as he mentioned he may have some cycle

Comment 28 by amistry@chromium.org, Jan 20 (2 days ago)

Cc: dats@chromium.org
Owner: amistry@chromium.org

Comment 29 by amistry@chromium.org, Yesterday (32 hours ago)

Blockedon: 920092

Sign in to add a comment