Investigate and implement further hardening FUSE file systems mounted by cros-disks |
|||||||||||||||
Issue descriptionSpin-off from issue 866016 : The sshfs and ssh processes on the Chrome OS side are running with powerful capabilities. If that's indeed the case (I haven't verified), then that's risky since these processes directly handle untrusted input. We should perform some due diligence to see if/how we can improve this: * As a first step: Determine and document what processes are running, what privileges (capabilities in particular) they have and what IPC surface they open (network ports and such). Results of this investigation should ideally go into the security section of the design doc. * As a second step: See what we can do to tighten the sandbox. We might need CAP_SYS_ADMIN for mount syscalls etc., but maybe we can drop more capabilities in process that don't need them and/or before we actually start processing untrusted data?
,
Jul 24
So cros-disks set up the sandbox for every FUSE helper tho inherit its caps, but not to be able to gain new through PR_SET_NO_NEW_PRIVS. As a result every one of them will have chown, setgid, setuid and sys_admin, out of which only sys_admin is actually used (rest needed only for cros-disks), and only by the fusermount which already has sys_admin fcap, so if wasn't for the particular setup it would work without any caps at all. First thing first: we are using third party FUSE modules, and we will use them as there is absolutely no way we are going to reimplement every each one of them to do some kind of proxying of syscalls. Now what I think we can actually do: Ideally we would drop all caps from a sandboxed FUSE module but leave possibility to obtain sys_admin through exec fusermount. But from my reading of minijail code this particular combination is not supported - by dropping a capability it also blocks it from reobtaining it back through exec (?). In the current format we can only drop all but sys_admin. Another option would be changing libminijail to be more flexible on what it does with capabilities, so it would support different caps and bcaps. I'd assume it's possible, but I guess jorgelo@ would know better.
,
Jul 24
Copying relevant bits from my reply on the doc comment thread (which is becoming unwieldy anyways): In case of cros-disks / FUSE you need CAP_SYS_ADMIN to be able to perform mount and unmount system calls. Ideally, the helpers that actually implement the file system wouldn't run with that privilege though. The art is to find a way to split things up so the privileged parts a simple and safe, while the tricky and dangerous parts run in a completely deprivileged process. I am not too familiar with FUSE, but conceptually it should be possible to have one process open /dev/fuse, perform the mount system call, then fork a deprivileged child process that runs the file system implementation (passing the /dev/fuse file descriptor). I doesn't look like the current fuse implementation supports this, but maybe it's worth a shot to attempt implementing this (and upstreaming it?)
,
Jul 24
Another idea: FUSE already supports controlled privilege escalation via fusermount, but as previously discussed that's designed based on setuid which we'd like to avoid. Spelunking a bit in the code, it looks like fusermount does the following: (1) opening /dev/fuse and (2) mounting the file system. Now if we did these two things *before* spawning the file system helper, then make a stub fusermount implementation that just passes the /dev/fuse FD via the fusermount mechanism (it uses a pipe to pass the FD) back to the file system helper, we might be able to execute the helper fully deprivileged? Sounds a bit crazy, but probably worth a shot?
,
Jul 25
Problem arises that different helpers have different arguments, and validation of the arguments, interactive logins, etc. must happen before mount() is called and /dev/fuse opened. Also unmount() comes from untrusted code for some situations, e.g. shutdown/disconnect handling. So you definitely can't just run all unprivileged code strictly after privileged. Maybe we can split fusermount into privileged daemon and unnpriviliged client and make them talk IPC passing /dev/fuse descriptor, but that's unlikely will ever get upstreamed, so we will become responsible for maintaining it in a compatible way. My gut feeling is that it might not go well in the long run. How bad is having CAPBSET permitting sys_admin only, and drop all caps for the helper?
,
Jul 25
Are you saying that the helper needs to adjust mount options depending on user interaction? That sounds scary on its own - do you have specific example? I might be wrong, but if I understand the code correctly, then the sequence of events is this: 1. FD=open(/dev/fuse) 2. call mount(), passing fd=FD as a mount option 3. The kernel will start sending requests via the FD to interact with the file system What are the specific technical reasons you can think of that prevent steps 1 and 2 to happen before spawning the helper? At cleanup time, there's an umount() call and the FD gets closed. I don't think the kernel can make any assumptions on the order of these and needs to handle both situations gracefully. So cros-disks could just handle the helper exit and perform the umount() itself.
,
Jul 25
Forgot about the cap_bset question: Even if we enable the capability in the bounding set, how would the fusermount child process acquire it? Are you thinking file capabilities? That has many of the same problems that setuid has, so I'd rather avoid it.
,
Jul 26
Before any mount there should be some clear understanding what is being mounted and whether it's legit to mount that thing in the present situation, with provided credentials, etc. You can't just mount and then check if the thing you are mounting even exists. Yes, I mean fcaps. It has fewer problems than suid - it's only one capability after all, not all of them.
,
Jul 26
> Before any mount there should be some clear understanding what is being mounted and whether it's legit to mount that thing in the present situation, with provided credentials, etc. You can't just mount and then check if the thing you are mounting even exists. Why not? If the FUSE helper fails, we just unmount and we're good. The decision what gets mount and where is made before opening /dev/fuse and mounting anyways. Note that this isn't any different from a FUSE helper failure or crash after the mount() takes place (e.g. the SSH connection might die at any time, so you can be sure the target will keep existing even if you checked before). > Yes, I mean fcaps. It has fewer problems than suid - it's only one capability after all, not all of them. CAP_SYS_ADMIN is the most powerful capability and has been shown to allow escalation to full privilege repeatedly in the past (for the unlikely event that CAP_SYS_ADMIN doesn't already give the attacker sufficient powers already).
,
Jul 27
This really applies to all FUSE file systems mounted by cros-disks, adjusting title accordingly.
,
Aug 8
,
Aug 9
,
Aug 9
,
Aug 20
Found some time today to make a proof-of-concept of the idea discussed above (i.e. opening /dev/fuse and mounting up front, then spawning the fuse helper). To play with this: 1. Get fuse_no_privs.c and compile like so: clang -Wall -o /tmp/fuse_no_privs ~/fuse_no_privs.c -lminijail 2. Copy to Chrome OS device or VM: scp /tmp/fuse_no_privs $CROS:/tmp/ 3. Build ntfs-3g with full libfuse (libfuse-lite doesn't support fusermount): USE=external-fuse emerge-$BOARD -v1 ntfs3g 4. cros deploy --board $BOARD $CROS ntfs3g Then, on the device or VM (note in particular the Cap* and NoNewPrivs lines in the proc status output): localhost ~ # dd if=/dev/zero of=/tmp/ntfs.bin count=100 bs=1M 100+0 records in 100+0 records out 104857600 bytes (105 MB, 100 MiB) copied, 0.138572 s, 757 MB/s localhost ~ # mkfs.ntfs -F /tmp/ntfs.bin /tmp/ntfs.bin is not a block device. mkntfs forced anyway. The sector size was not specified for /tmp/ntfs.bin and it could not be obtained automatically. It has been set to 512 bytes. The partition start sector was not specified for /tmp/ntfs.bin and it could not be obtained automatically. It has been set to 0. The number of sectors per track was not specified for /tmp/ntfs.bin and it could not be obtained automatically. It has been set to 0. The number of heads was not specified for /tmp/ntfs.bin and it could not be obtained automatically. It has been set to 0. Cluster size has been automatically set to 4096 bytes. To boot from a device, Windows needs the 'partition start sector', the 'sectors per track' and the 'number of heads' to be set. Windows will not be able to boot from this device. Initializing device with zeroes: 100% - Done. Creating NTFS volume structures. mkntfs completed successfully. Have a nice day. localhost ~ # chown nobody:nogroup /tmp/ntfs.bin localhost ~ # mkdir /run/fuse_mnt localhost ~ # /tmp/fuse_no_privs $(which mount.ntfs-3g) /tmp/ntfs.bin /run/fuse_mnt Minijailed fuse helper started: 0 Mounted successfully at /run/fuse_mnt. localhost ~ # ls -ld /run/fuse_mnt drwxrwxrwx. 1 nobody nogroup 4096 Aug 20 13:52 /run/fuse_mnt localhost ~ # cat /proc/$(pidof mount.ntfs-3g)/status Name: mount.ntfs-3g State: S (sleeping) Tgid: 9854 Ngid: 0 Pid: 9854 PPid: 1 TracerPid: 0 Uid: 65534 65534 65534 65534 Gid: 65533 65533 65533 65533 FDSize: 64 Groups: NStgid: 9854 NSpid: 9854 NSpgid: 9854 NSsid: 9854 VmPeak: 9628 kB VmSize: 9628 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 528 kB VmRSS: 528 kB VmData: 664 kB VmStk: 132 kB VmExe: 48 kB VmLib: 2452 kB VmPTE: 36 kB VmSwap: 0 kB Threads: 1 SigQ: 0/15568 SigPnd: 0000000000000000 ShdPnd: 0000000000000000 SigBlk: 0000000000000000 SigIgn: 0000000000001000 SigCgt: 0000000180004003 CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: 0000000000000000 CapAmb: 0000000000000000 NoNewPrivs: 1 Seccomp: 0 Cpus_allowed: f Cpus_allowed_list: 0-3 Mems_allowed: 1 Mems_allowed_list: 0 voluntary_ctxt_switches: 2 nonvoluntary_ctxt_switches: 0 localhost ~ # echo "Look mom, no privs!" > /run/fuse_mnt/file localhost ~ # ls -l /run/fuse_mnt total 1 -rwxrwxrwx. 1 nobody nogroup 20 Aug 20 13:54 file localhost ~ # cat /run/fuse_mnt/file Look mom, no privs! localhost ~ # umount /run/fuse_mnt localhost ~ # pidof mount.ntfs-3g There is probably a prettier way to do this by patching libfuse to accept a pre-opened and pre-mounted /dev/fuse file descriptor so we don't have to emulate fusermount behavior. Anyhow, I think it's very much worthwhile to implement a version of this in cros-disks so we can put the fuse helpers in a very strict sandbox.
,
Aug 24
,
Aug 28
OK, so here's another version that relies on a small libfuse patch (attached) that makes libfuse honor a new fuse_fd parameter that directs it to use an inherited /dev/fuse fd instead of trying to open /dev/fuse and mounting the file system itself (or indirectly via fusermount). Opening /dev/fuse and mounting happens in advance. localhost ~ # exec 17<>/dev/fuse localhost ~ # mount -i -o fd=17,rootmode=40000,user_id=0,group_id=0 -t fuse none /run/fuse_mnt & [1] 13331 localhost ~ # minijail0 -u nobody -g nogroup -c 0 -e -l -n -N -p -r -v -P /var/empty -b / -b /dev -b /proc -k tmpfs,/run,tmpfs,0xe -b /tmp/ntfs.bin,/run/ntfs.bin,1 /usr/bin/ntfs-3g -o fuse_fd=17 /run/ntfs.bin '' & [2] 13333 localhost ~ # cat /proc/$(pidof ntfs-3g)/status Name: ntfs-3g State: S (sleeping) Tgid: 13337 Ngid: 0 Pid: 13337 PPid: 13334 TracerPid: 0 Uid: 65534 65534 65534 65534 Gid: 65533 65533 65533 65533 FDSize: 64 Groups: NStgid: 13337 3 NSpid: 13337 3 NSpgid: 13337 3 NSsid: 13337 3 VmPeak: 9632 kB VmSize: 9632 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 548 kB VmRSS: 548 kB VmData: 668 kB VmStk: 132 kB VmExe: 48 kB VmLib: 2452 kB VmPTE: 36 kB VmSwap: 0 kB Threads: 1 SigQ: 0/15568 SigPnd: 0000000000000000 ShdPnd: 0000000000000000 SigBlk: 0000000000000000 SigIgn: 0000000000001000 SigCgt: 0000000180004003 CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: 0000000000000000 CapAmb: 0000000000000000 NoNewPrivs: 1 Seccomp: 0 Cpus_allowed: f Cpus_allowed_list: 0-3 Mems_allowed: 1 Mems_allowed_list: 0 voluntary_ctxt_switches: 1 nonvoluntary_ctxt_switches: 3 [1]- Done mount -i -o fd=17,rootmode=40000,user_id=0,group_id=0 -t fuse none /run/fuse_mnt localhost ~ # cat /proc/$(pidof ntfs-3g)/mounts rootfs / rootfs rw,seclabel 0 0 /dev/root / ext2 ro,seclabel,relatime 0 0 devtmpfs /dev devtmpfs ro,seclabel,relatime,size=1992812k,nr_inodes=498203,mode=755 0 0 proc /proc proc ro,relatime 0 0 tmpfs /run tmpfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0 tmp /run/ntfs.bin tmpfs rw,seclabel,nodev,relatime 0 0 localhost ~ # ls -l /run/fuse_mnt total 0 -rwxrwxrwx. 1 nobody nogroup 0 Aug 27 05:58 file localhost ~ # umount /run/fuse_mnt Some notes: 1. The mount command hangs because it tries to access the mounted target while the FUSE helper isn't running yet (this is easy to avoid in cros-disks). 2. The FUSE helper is started in a somewhat unconventional way (not even specifying the mount point to prevent it from stat()-ing it and hanging, no actual mount options). This might trigger unusual (and thus buggy) code paths. We can always give the FUSE helper harmless dummy parameters though. Probably makes sense to experiment with a couple more FUSE file systems to see what works best. 3. I'm throwing all the isolation options minijail has at the FUSE helper. The only thing lacking is seccomp policies, but that shouldn't be hard to add (and in case of avfsd, the policy already exists). 4. The patch is against fuse 2.9.7 as present in our tree - it'll probably make sense to adjust it for upstream libfuse and ask whether they're willing to accept it (or something similar that helps our use case).
,
Aug 28
,
Aug 29
I've polished the FUSE change proposed in comment 16 a bit. Upstream pull request: https://github.com/libfuse/libfuse/pull/291
,
Oct 4
,
Oct 10
Good news: The change to allow fully deprivileged fuse file systems has been merged in upstream FUSE. With https://github.com/libfuse/libfuse/commit/da7c9b228aaf31f37684e106b75262055ca440de you can now do this (note the new drop_privileges option): root@toroa:/usr/local/google/home/mnissler/src/libfuse/build/example# mount -t fuse3 -o drop_privileges hello /mnt/tmp root@toroa:/usr/local/google/home/mnissler/src/libfuse/build/example# cat /mnt/tmp/hello Hello World! root@toroa:/usr/local/google/home/mnissler/src/libfuse/build/example# cat /proc/$(pgrep hello)/status Name: hello Umask: 0022 State: S (sleeping) Tgid: 84194 Ngid: 0 Pid: 84194 PPid: 1 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize: 64 Groups: 0 126 NStgid: 84194 NSpid: 84194 NSpgid: 84194 NSsid: 84194 VmPeak: 225764 kB VmSize: 160228 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 324 kB VmRSS: 324 kB RssAnon: 276 kB RssFile: 48 kB RssShmem: 0 kB VmData: 17136 kB VmStk: 132 kB VmExe: 8 kB VmLib: 2060 kB VmPTE: 56 kB VmPMD: 12 kB VmSwap: 0 kB HugetlbPages: 0 kB Threads: 3 SigQ: 3/257200 SigPnd: 0000000000000000 ShdPnd: 0000000000000000 SigBlk: 0000000000000000 SigIgn: 0000000000000080 SigCgt: 0000000180005003 CapInh: 0000000000000000 CapPrm: 0000000000000000 CapEff: 0000000000000000 CapBnd: 0000000000000000 CapAmb: 0000000000000000 Seccomp: 0 Cpus_allowed: ffff,ffffffff Cpus_allowed_list: 0-47 Mems_allowed: 00000000,00000003 Mems_allowed_list: 0-1 voluntary_ctxt_switches: 1 nonvoluntary_ctxt_switches: 0 So I think we have a clear path forward now. Up next: 1. Figure out how to make this available in our libfuse copy. Can we just uprev? Or merely apply the patches for now? 2. Tweak cros-disks to use this, either by (1) launching the FUSE file system via mount.fuse3 or (2) by opening /dev/fuse itself and passing the file descriptor to the FUSE file system. 3. Re-assess the sandbox and see whether we can now tighten more. dats@, benchan@ - any chance one of you can pick these up?
,
Oct 10
If looked a tiny bit into upgrading fuse to a fuse3 version. Turns out that our packages that depend on fuse mostly don't support fuse3 yet :-/ So backporting https://github.com/libfuse/libfuse/commit/64e11073b9347fcf9c6d1eea143763ba9e946f70 might be the best choice for now?
,
Oct 10
Here's a CL for the backport in case we decide to go for that: https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1273218 That doesn't include the mount.fuse changes though, so it'd be on cros-disks to open /dev/fuse, mount, and pass the file descriptor to the FUSE file system.
,
Oct 12
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/6df1b1b474622635f2ff5508aa6453a48e928817 commit 6df1b1b474622635f2ff5508aa6453a48e928817 Author: Mattias Nissler <mnissler@chromium.org> Date: Fri Oct 12 22:16:52 2018 cros-disks: Tighten capabilities It turns out that we are running quite a few subprocesses without restricting capabilities, so these inherit cros-disks' comparatively powerful capabilities. This change restricts capabilities as much as possible in the current status quo. BUG=chromium:866377,chromium:867509 TEST=Passes platform_CrosDisks{Filesystem,Format,Rename,Archive} Change-Id: Iee40b4bedb76f90179eb73694b5713ceceb4bc2d Reviewed-on: https://chromium-review.googlesource.com/1276166 Commit-Ready: Mattias Nissler <mnissler@chromium.org> Tested-by: Mattias Nissler <mnissler@chromium.org> Reviewed-by: Ben Chan <benchan@chromium.org> [modify] https://crrev.com/6df1b1b474622635f2ff5508aa6453a48e928817/cros-disks/rename_manager.cc [modify] https://crrev.com/6df1b1b474622635f2ff5508aa6453a48e928817/cros-disks/format_manager.cc [modify] https://crrev.com/6df1b1b474622635f2ff5508aa6453a48e928817/cros-disks/archive_manager.cc [modify] https://crrev.com/6df1b1b474622635f2ff5508aa6453a48e928817/cros-disks/fuse_mounter.cc [modify] https://crrev.com/6df1b1b474622635f2ff5508aa6453a48e928817/cros-disks/device_ejector.cc [modify] https://crrev.com/6df1b1b474622635f2ff5508aa6453a48e928817/cros-disks/cros-disks.conf
,
Oct 16
,
Oct 16
I poked a bit at cros-disks to see how hard it would be to make it open /dev/fuse, mount, then pass the file descriptor to the fuse binary. I've got it as far as mounting exfat successfully and passing platform_CrosDisksFilesystem.exfat. Needs more work though, in particular regarding figuring out how to determine the proper mount options/parameters to pass to the kernel. Need to take care of other stuff now though, so I dumped the experimental code I have here: https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1283329
,
Oct 18
+ochang@ - I chatted briefly with him about this, in case he's interested in taking things further.
,
Oct 18
+amistry as he mentioned he may have some cycle
,
Jan 20
(2 days ago)
,
Yesterday
(32 hours ago)
|
|||||||||||||||
►
Sign in to add a comment |
|||||||||||||||
Comment 1 by dgreid@chromium.org
, Jul 23