Locking problems in binder when running ARC++ with v4.12 kernels |
||||||
Issue descriptionIf ChromeOS specific Android binder patches from chromeos-4.4 are applied to chromeos-4.12, the following locking problems have been observed. [ 60.537913] BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/v4.4/mm/vmalloc.c:1491 [ 60.550905] in_atomic(): 1, irqs_disabled(): 0, pid: 97, name: kworker/2:2 [ 60.558634] CPU: 2 PID: 97 Comm: kworker/2:2 Not tainted 4.12.7 #10 [ 60.565651] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.41.0 07/17/2017 [ 60.573837] Workqueue: events binder_deferred_func [ 60.579200] Call Trace: [ 60.581958] dump_stack+0x4d/0x63 [ 60.585674] ___might_sleep+0x192/0x1a9 [ 60.589963] __might_sleep+0xe1/0xed [ 60.593959] ? binder_deferred_func+0x71a/0x7d3 [ 60.599053] remove_vm_area+0x26/0xe8 [ 60.603153] __vunmap+0x46/0xd1 [ 60.606672] vfree+0x78/0x7b [ 60.609901] binder_deferred_func+0x723/0x7d3 [ 60.614776] process_one_work+0x2ca/0x4da [ 60.619268] worker_thread+0x2e2/0x449 [ 60.623466] ? create_worker+0x2f9/0x2f9 [ 60.627856] kthread+0x221/0x231 [ 60.631467] ? kthread_flush_work+0x147/0x147 [ 60.636343] ret_from_fork+0x22/0x30 [ 130.894317] BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/v4.4/security/selinux/hooks.c:252 [ 130.908348] in_atomic(): 1, irqs_disabled(): 0, pid: 3601, name: Binder:27_2 [ 130.916263] CPU: 2 PID: 3601 Comm: Binder:27_2 Tainted: G W 4.12.7 #12 [ 130.924835] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.41.0 07/17/2017 [ 130.933010] Call Trace: [ 130.935751] dump_stack+0x4d/0x63 [ 130.939462] ___might_sleep+0x192/0x1a9 [ 130.943752] __might_sleep+0xe1/0xed [ 130.947750] ? __inode_security_revalidate+0x22/0x83 [ 130.953303] __inode_security_revalidate+0x3e/0x83 [ 130.958665] backing_inode_security+0x2d/0x3f [ 130.963545] selinux_binder_transfer_file+0xad/0x147 [ 130.969101] security_binder_transfer_file+0x3e/0x5a [ 130.974656] binder_translate_fd+0x191/0x370 [ 130.979432] ? _copy_from_user+0x5f/0x8e [ 130.983818] binder_transaction+0x1e23/0x2b02 [ 130.988692] binder_thread_write+0xd62/0x17dc [ 130.993565] ? avc_has_perm_noaudit+0xdd/0x103 [ 130.998535] binder_ioctl_write_read+0x16f/0x3dd [ 131.003708] ? __might_sleep+0xe1/0xed [ 131.007906] ? __might_fault+0x48/0x65 [ 131.012101] binder_ioctl+0x18a/0x59d [ 131.016198] vfs_ioctl+0x4b/0x5d [ 131.019808] do_vfs_ioctl+0x643/0x666 [ 131.023895] SyS_ioctl+0x57/0x79 [ 131.027506] entry_SYSCALL_64_fastpath+0x27/0xa8 This is seen with the following patches applied to chromeos-4.12 (cherry-picked from chromeos-4.4). CHROMIUM: android: binder: another use of binder_lock_interruptible android: binder: Turn off warning for NO_WAIT allocation CHROMIUM: android: binder: do not create thread when trying to kill it CHROMIUM: android: binder: introduce binder_lock_interruptible CHROMIUM: android: binder: do not take binder lock when thread forcibly woken up CHROMIUM: android: binder: push binder_lock down into ioctl ops CHROMIUM: android: binder: reset BINDER_LOOPER_STATE_NEED_RETURN earlier FROMLIST: android: binder: check result of binder_get_thread() in binder_poll() CHROMIUM: android: binder: Fix potential scheduling-while-atomic android: binder: Disable preemption while holding the global binder lock. android: binder: Use wake up hint for synchronous transactions. CHROMIUM: android: binder: get rid of BINDER_LOOPER_STATE_WAITING It is not seen if the above patches are not applied.
,
Aug 15 2017
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 8caddac16a9d..25603bc4d658 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -1778,7 +1778,9 @@ static int binder_translate_fd(int fd,
ret = -EBADF;
goto err_fget;
}
+ preempt_enable_no_resched();
ret = security_binder_transfer_file(proc->tsk, target_proc->tsk, file);
+ preempt_disable();
if (ret < 0) {
ret = -EPERM;
goto err_security;
@@ -3813,7 +3815,9 @@ static void binder_deferred_release(struct binder_proc *proc)
page_count++;
}
kfree(proc->pages);
+ preempt_enable_no_resched();
vfree(proc->buffer);
+ preempt_disable();
}
put_task_struct(proc->tsk);
fixes the above problems but is likely incomplete, and I have no idea if the fix is correct. It follows 'CHROMIUM: android: binder: Fix potential scheduling-while-atomic'.
,
Aug 16 2017
I would be tempted to drop the CHROMIUM binder patches from 4.12. Specifically note that upstream is working on removing the one global binder lock. It's currently staged in linuxnext, so perhaps it will go into 4.14? With the upstream patches to remove the global binder lock nearly all of our patches aren't needed. Specifically they focus on dealing with performance issues around the global binder lock and thus they make no sense when there is no global binder lock. So we should somehow note that we should pick the upstream binder patches when they land upstream. In the short term (even if we don't want to just pick the staged patches) one could also argue that our local patches aren't needed. They are mostly for performance, not correctness. Thus, if we're waiting for things to settle upstream we can just drop them. --- NOTE: Dmitry expressed some dissatisfaction with the binder patches that are going upstream and thinks that they will need some cleanup. Some of the local CHROMIUM patches could be used as inspiration for some of that cleanup. In the past he has volunteered to lead this effort.
,
Aug 16 2017
#3: sgtm. Should I apply the post-4.12 binder patches instead ? it might be worth a try, and it would ensure some additional test coverage.
,
Aug 16 2017
I do not think this is the binder locking patches that are to blame. We have this whole preempt_enable_no_resched() sprinkled across binder that is coming form Android and not in upstream. Dylan will have toadjust the code brought from upstream to account for this.
,
Aug 16 2017
Ah, sorry, it is 4.12, not 4.4.
,
Aug 16 2017
@4: It seems line a sane thing to do.
,
Aug 16 2017
Ok, I'll do that and refer to this bug.
,
Aug 16 2017
Created bug #756083 to track cleanups.
,
Sep 1 2017
Consider this one fixed. The original problem is no longer seen after switching to upstream binder code. |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by groeck@chromium.org
, Aug 15 2017