PID/TID caching removal in glibc breaks the user namespaces sandbox |
|||||
Issue descriptionSome time ago, the glibc 2.24 package was updated in Fedora 25 and incorporated https://sourceware.org/git/?p=glibc.git;a=commit;h=c579f48edba88380635ab98cb612030e3ed8691e ("Remove cached PID/TID in clone"). Since then, I'm unable to run content_shell and chrome in debug mode, as I get crashes similar to this one at launch time (sometimes I just get a blank content_shell instead): [1:3:0620/164907.817244:109618146953:FATAL:lock_impl_posix.cc(65)] Check failed: rv == 0 (35 vs. 0). Resource deadlock avoided #0 0x7f59fc5ba17b <unknown> #1 0x7f59fc5b8e7c <unknown> #2 0x7f59fc62c9e3 <unknown> #3 0x7f59fc745cbd <unknown> #4 0x7f59ec2add03 base::Lock::Acquire() #5 0x7f59ec2acaa3 base::AutoLock::AutoLock() #6 0x7f59ec2e4414 mojo::edk::NodeController::OnAcceptBrokerClient() #7 0x7f59ec2db580 mojo::edk::NodeChannel::OnChannelMessage() #8 0x7f59ec2b0bb4 mojo::edk::Channel::OnReadComplete() #9 0x7f59ec2b328a mojo::edk::(anonymous namespace)::ChannelPosix::OnFileCanReadWithoutBlocking() #10 0x7f59fc66b191 <unknown> #11 0x7f59fc66c439 base::MessagePumpLibevent::OnLibeventNotification() #12 0x7f59fc8d3c3e event_process_active #13 0x7f59fc8d3287 event_base_loop #14 0x7f59fc66c633 base::MessagePumpLibevent::Run() #15 0x7f59fc656488 base::MessageLoop::Run() #16 0x7f59fc700b0d base::RunLoop::Run() #17 0x7f59fc7b84d4 base::Thread::Run() #18 0x7f59fc7b8d3a base::Thread::ThreadMain() #19 0x7f59fc79a0da base::(anonymous namespace)::ThreadFunc() #20 0x7f5a01e9b4a4 start_thread #21 0x7f59ecff87ef __GI___clone Received signal 6 #0 0x7f59fc5ba17b base::debug::StackTrace::StackTrace() #1 0x7f59fc5b8e7c base::debug::StackTrace::StackTrace() #2 0x7f59fc5b9c8f base::debug::(anonymous namespace)::StackDumpSignalHandler() #3 0x7f5a01ea5060 <unknown> #4 0x7f59ecf42fff __GI_raise #5 0x7f59ecf4442a __GI_abort #6 0x7f59fc5b6e76 base::debug::(anonymous namespace)::DebugBreak() #7 0x7f59fc5b6e58 base::debug::BreakDebugger() #8 0x7f59fc62ce64 logging::LogMessage::~LogMessage() #9 0x7f59fc745cbd base::internal::LockImpl::Lock() #10 0x7f59ec2add03 base::Lock::Acquire() #11 0x7f59ec2acaa3 base::AutoLock::AutoLock() #12 0x7f59ec2e4414 mojo::edk::NodeController::OnAcceptBrokerClient() #13 0x7f59ec2db580 mojo::edk::NodeChannel::OnChannelMessage() #14 0x7f59ec2b0bb4 mojo::edk::Channel::OnReadComplete() #15 0x7f59ec2b328a mojo::edk::(anonymous namespace)::ChannelPosix::OnFileCanReadWithoutBlocking() #16 0x7f59fc66b191 base::MessagePumpLibevent::FileDescriptorWatcher::OnFileCanReadWithoutBlocking() #17 0x7f59fc66c439 base::MessagePumpLibevent::OnLibeventNotification() #18 0x7f59fc8d3c3e event_process_active #19 0x7f59fc8d3287 event_base_loop #19 0x7f59fc8d3287 event_base_loop #20 0x7f59fc66c633 base::MessagePumpLibevent::Run() #21 0x7f59fc656488 base::MessageLoop::Run() #22 0x7f59fc700b0d base::RunLoop::Run() #23 0x7f59fc7b84d4 base::Thread::Run() #24 0x7f59fc7b8d3a base::Thread::ThreadMain() #25 0x7f59fc79a0da base::(anonymous namespace)::ThreadFunc() #26 0x7f5a01e9b4a4 start_thread #27 0x7f59ecff87ef __GI___clone errno 35 is EDEADLK, which posix_mutex_lock() returns when the current thread already owns the given mutex. I haven't been able to figure out what exactly is going on, but that glibc commit + the user namespaces sandboxing code reliably fails here. Rolling back glibc, building an older commit from its releases/2.24/master branch, just passing --no-sandbox as well as passing --single-process also work fine. I've checked that it's not seccomp that's at fault here, as passing --disable-seccomp-filter-sandbox still leads to the same crash. I've tried both Linux 4.10 and 4.11 (I'm currently on 4.11.5) with the same results.
,
Jun 20 2017
Looks like glibc stopped populating the PID/TID cache in clone, which we rely on in our ForkWithFlags function: https://cs.chromium.org/chromium/src/base/process/launch.h?type=cs&q=ForkWithFlags&sq=package:chromium&l=313 Annoyingly, they stopped populating the TID cache in clone, but they didn't actually remove all uses of the cache: https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_mutex_lock.c;h=dc9ca4c4764be2654141493330bd8a91a989f601;hb=HEAD#l148 At a glance, I unfortunately didn't see a nice way to trigger population of the tid cache short of doing a pointless additional fork or I guess doing horrible, potentially libc version dependent guessing to figure out where the tid cache lives in TLS and fix it up. glibc has unfortunately done a very thorough job here of making it an utter pain to call clone() with the various namespacing flags. I haven't looked at this code in a while, but as ugly as it is, I wonder if we could get away with using: unshare(CLONE_NEWPID); fork(); in NamespaceSandbox::ForkInNewPidNamespace. I believe our other main use of clone, starting up the zygote, is fine because the cloned child with the invalid cached tid forks off the real zygote process (fork populates the tid cache), and the original process becomes a reaper process (CreateInitProcessReaper) which should not do any pthread stuff that depends on a cached PID/TID.
,
Jun 22 2017
That's a very interesting explanation, thanks for the write-up. Is there anyone who I could assign this bug to, or is there anything we need to do in glibc?
,
Jun 23 2017
,
Jun 23 2017
Just to add another data point, I recently started running into this (trying to run a component debug chromium build) on Debian Testing (now buster) with user namespaces enabled (Debian defaults to disabled). It appears that the recent Debian Stable (stretch) also has this glibc change. Note that (on Debian) if I disable user namespaces sh -c 'echo 0 > /proc/sys/kernel/unprivileged_userns_clone' and then build the 'old' chrome_sandbox and set it up then that works fine (as expected). Since Debian ships with user namespaces off by default, this means most Debian users probably aren't seeing this issue (since user namespaces are disabled). Ubuntu, however, turns user namespaces on by default, so the next Ubuntu release will surface this sort of issue far more often.
,
Jun 23 2017
I'll take it for now to see if I can hand it off to someone else (maybe Robert?). We should at least figure out an owner before M-61 branches.
,
Jun 23 2017
Also, for those following at home, note that Ricky's suggestion works by putting the child created by fork() in a new namespace. unshare(CLONE_NEWPID) never changes the pid namespace of the current process, because a process should never see its own pid change during its lifetime.
,
Jul 14 2017
Friendly ping here. /proc/sys/kernel/unprivileged_userns_clone is specific to Debian/Ubuntu AFAIK; with Fedora 26 being based on glibc 2.25 I think it'll be harder to keep a local checkout with the pid caching change reverted, and like bungeman@ said this will probably cause a lot of pain for Ubuntu users in the future.
,
Jul 14 2017
I'll take another look next week. We definitely need to fix this somehow.
,
Jul 19 2017
Upstream glibc bug created: https://sourceware.org/bugzilla/show_bug.cgi?id=21793 Also, we have a workaround CL: https://chromium-review.googlesource.com/c/575769/ Maybe it would also be possible to statically link libc until the upstream fix lands and is backported?
,
Jul 20 2017
Thanks for creating the upstream glibc, bug. I want to respond directly here to reiterate a point I made there too. Once you call clone() you own the created thread, and you cannot use any POSIX Thread functions provided by the libc you left behind when you cloned a new thread manually. If you are using clone() as a convenience function for unshare() then you should use unshare(). Take care that calling clone() yourself has implications on all the thread APIs, from thread local storage, to cancellation, to joining, to exiting threads, etc. You really really have to be ready to take over all aspects of the new thread using your own threading runtime. The point of clone() is to implement threading libraries, like those provided by glibc. I'm all ears when it comes to hearing about what problems you use clone() to solve and finding a better solution for you.
,
Jul 20 2017
Lastly, regarding comment 10, I strongly suggest against bundling/statically linking to glibc, such a solution will cause you all sorts of of potential pain, from security (need to recompile for latest CVE fixes), to deployment (all IdM [read NSS] functions go through the normal plugin infrastructure [read dlopen()] and require matching build/host runtimes).
,
Jul 20 2017
thomasanderson@: Thanks for making the workaround - did you by any chance try the unshare(CLONE_NEWPID); fork() idea mentioned in comment 2 and run into issues? It would be nice not to pay the extra fork on Linux, and while it's not very likely to result in harmful side effects, hardcoding offsets into internal glibc structures is... unfortunate. codonell@: Thanks for posting here. It's been a while since I've looked at Chrome's Linux sandbox, but here's what I recall about the functionality we wanted clone for: Say you process that wants to repeatedly fork children, some which are placed in a separate user and PID namespace and some which are not. We cannot repeatedly call unshare(CLONE_NEWUSER | CLONE_NEWPID); fork(); because this places the parent process into the user namespace (and the parent process cannot rejoin its original user namespace/reset the PID namespace for future children without having CAP_SYS_ADMIN in the appropriate original namespaces). What we really want is a version of fork which supports these clone flags, but also updates THREAD_SELF->tid. Right now, if someone wants to do this, the best way I can think of is to pay an extra fork: fork(); unshare(...); fork(). All that said, while I think the above is a situation that I think is difficult to do with what glibc provides today, and I'd love for it to be conveniently doable, it does not perfectly match the situation that we're running into in this bug. Like I mentioned in comment #2, I think we might be able to get away with unshare(CLONE_NEWPID); fork() each time the zygote forks, since in our situation: - The the main zygote process should have a correct cached TID. The zygote is started using our problematic ForkWithFlags API, so immediately after calling ForkWithFlags, the resulting child process does have an incorrect cached TID. Luckily for us, we use this process as init for the zygote's PID namespace, so we never call any thread APIs in this thread. Instead, we fork off a process for the real zygote process, and doing so writes a correct cached TID. - The zygote never needs to fork a process that isn't in a new PID namespace. - The children that the zygote spawns remain in the same user namespace as the zygote (we just drop all capabilities in the child). The zygote process already has CAP_SYS_ADMIN in its user namespace, so unshare(CLONE_NEWPID) should be allowed. I haven't tested this approach at all though, so folks can correct me if I'm mistaken.
,
Jul 20 2017
Er correction: Everything I said about the zygote process becoming the init process is irrelevant because we start the zygote via LaunchProcess, which is a ForkWithFlags followed by an execve. After execve, the processes TID cache is setup properly, obviously :-) So whether we go with the double-forking approach or the unshare approach (assuming it works), we only need to worry about fixing NamespaceSandbox::ForkInNewPidNamespace.
,
Jul 25 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/24df419517bb8ca8ac2b75eb752bd9a133267108 commit 24df419517bb8ca8ac2b75eb752bd9a133267108 Author: Tom Anderson <thomasanderson@chromium.org> Date: Tue Jul 25 02:41:01 2017 Update glibc's thread descriptor after clone()'ing Bug: 735048 Change-Id: Ic60bc676c4d743e24168cf9f7cf3e4b6414ba3b4 R: jorgelo@chromium.org Reviewed-on: https://chromium-review.googlesource.com/575769 Commit-Queue: Thomas Anderson <thomasanderson@chromium.org> Reviewed-by: Ricky Zhou <rickyz@chromium.org> Reviewed-by: Lei Zhang <thestig@chromium.org> Reviewed-by: Jorge Lucangeli Obes <jorgelo@chromium.org> Cr-Commit-Position: refs/heads/master@{#489195} [modify] https://crrev.com/24df419517bb8ca8ac2b75eb752bd9a133267108/base/process/launch.h [modify] https://crrev.com/24df419517bb8ca8ac2b75eb752bd9a133267108/sandbox/linux/services/namespace_sandbox.cc
,
Jul 25 2017
Should be fixed now. Thanks Ricky for all the help!
,
Jul 25 2017
,
Jul 27 2017
This is a warning that the fix you have committed in 24df419517bb8ca8ac2b75eb752bd9a133267108 is completely unsupported and will break when we change the TCB layout. You have no guarantees given by writing into implementation dependent structures. Your only safe implementation is to avoid using clone(), and use the double-fork() which was suggested in comment 13.
,
Jul 27 2017
Thanks, we're aware that we're modifying internal glibc data structures. However, the layout doesn't appear to have changed in almost a decade. And if it does, uses will get a loud CHECK() failure and we'll see it in our crash reports and be able to respond quickly |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by jorgelo@chromium.org
, Jun 20 2017Labels: OS-Chrome