New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 755697 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

Locking problems in binder when running ARC++ with v4.12 kernels

Project Member Reported by groeck@chromium.org, Aug 15 2017

Issue description

If ChromeOS specific Android binder patches from chromeos-4.4 are applied to chromeos-4.12, the following locking problems have been observed.

[   60.537913] BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/v4.4/mm/vmalloc.c:1491
[   60.550905] in_atomic(): 1, irqs_disabled(): 0, pid: 97, name: kworker/2:2
[   60.558634] CPU: 2 PID: 97 Comm: kworker/2:2 Not tainted 4.12.7 #10
[   60.565651] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.41.0 07/17/2017
[   60.573837] Workqueue: events binder_deferred_func
[   60.579200] Call Trace:
[   60.581958]  dump_stack+0x4d/0x63
[   60.585674]  ___might_sleep+0x192/0x1a9
[   60.589963]  __might_sleep+0xe1/0xed
[   60.593959]  ? binder_deferred_func+0x71a/0x7d3
[   60.599053]  remove_vm_area+0x26/0xe8
[   60.603153]  __vunmap+0x46/0xd1
[   60.606672]  vfree+0x78/0x7b
[   60.609901]  binder_deferred_func+0x723/0x7d3
[   60.614776]  process_one_work+0x2ca/0x4da
[   60.619268]  worker_thread+0x2e2/0x449
[   60.623466]  ? create_worker+0x2f9/0x2f9
[   60.627856]  kthread+0x221/0x231
[   60.631467]  ? kthread_flush_work+0x147/0x147
[   60.636343]  ret_from_fork+0x22/0x30


[  130.894317] BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/v4.4/security/selinux/hooks.c:252
[  130.908348] in_atomic(): 1, irqs_disabled(): 0, pid: 3601, name: Binder:27_2
[  130.916263] CPU: 2 PID: 3601 Comm: Binder:27_2 Tainted: G        W       4.12.7 #12
[  130.924835] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.41.0 07/17/2017
[  130.933010] Call Trace:
[  130.935751]  dump_stack+0x4d/0x63
[  130.939462]  ___might_sleep+0x192/0x1a9
[  130.943752]  __might_sleep+0xe1/0xed
[  130.947750]  ? __inode_security_revalidate+0x22/0x83
[  130.953303]  __inode_security_revalidate+0x3e/0x83
[  130.958665]  backing_inode_security+0x2d/0x3f
[  130.963545]  selinux_binder_transfer_file+0xad/0x147
[  130.969101]  security_binder_transfer_file+0x3e/0x5a
[  130.974656]  binder_translate_fd+0x191/0x370
[  130.979432]  ? _copy_from_user+0x5f/0x8e
[  130.983818]  binder_transaction+0x1e23/0x2b02
[  130.988692]  binder_thread_write+0xd62/0x17dc
[  130.993565]  ? avc_has_perm_noaudit+0xdd/0x103
[  130.998535]  binder_ioctl_write_read+0x16f/0x3dd
[  131.003708]  ? __might_sleep+0xe1/0xed
[  131.007906]  ? __might_fault+0x48/0x65
[  131.012101]  binder_ioctl+0x18a/0x59d
[  131.016198]  vfs_ioctl+0x4b/0x5d
[  131.019808]  do_vfs_ioctl+0x643/0x666
[  131.023895]  SyS_ioctl+0x57/0x79
[  131.027506]  entry_SYSCALL_64_fastpath+0x27/0xa8


This is seen with the following patches applied to chromeos-4.12 (cherry-picked from chromeos-4.4).

CHROMIUM: android: binder: another use of binder_lock_interruptible
android: binder: Turn off warning for NO_WAIT allocation
CHROMIUM: android: binder: do not create thread when trying to kill it
CHROMIUM: android: binder: introduce binder_lock_interruptible
CHROMIUM: android: binder: do not take binder lock when thread forcibly woken up
CHROMIUM: android: binder: push binder_lock down into ioctl ops
CHROMIUM: android: binder: reset BINDER_LOOPER_STATE_NEED_RETURN earlier
FROMLIST: android: binder: check result of binder_get_thread() in binder_poll()
CHROMIUM: android: binder: Fix potential scheduling-while-atomic
android: binder: Disable preemption while holding the global binder lock.
android: binder: Use wake up hint for synchronous transactions.
CHROMIUM: android: binder: get rid of BINDER_LOOPER_STATE_WAITING

It is not seen if the above patches are not applied.

 

Comment 1 by groeck@chromium.org, Aug 15 2017

Cc: dtor@chromium.org diand...@chromium.org

Comment 2 by groeck@chromium.org, Aug 15 2017

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 8caddac16a9d..25603bc4d658 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -1778,7 +1778,9 @@ static int binder_translate_fd(int fd,
                ret = -EBADF;
                goto err_fget;
        }
+       preempt_enable_no_resched();
        ret = security_binder_transfer_file(proc->tsk, target_proc->tsk, file);
+       preempt_disable();
        if (ret < 0) {
                ret = -EPERM;
                goto err_security;
@@ -3813,7 +3815,9 @@ static void binder_deferred_release(struct binder_proc *proc)
                        page_count++;
                }
                kfree(proc->pages);
+               preempt_enable_no_resched();
                vfree(proc->buffer);
+               preempt_disable();
        }
 
        put_task_struct(proc->tsk);

fixes the above problems but is likely incomplete, and I have no idea if the fix is correct. It follows 'CHROMIUM: android: binder: Fix potential scheduling-while-atomic'.

Owner: dtor@chromium.org
I would be tempted to drop the CHROMIUM binder patches from 4.12.  Specifically note that upstream is working on removing the one global binder lock.  It's currently staged in linuxnext, so perhaps it will go into 4.14?

With the upstream patches to remove the global binder lock nearly all of our patches aren't needed.  Specifically they focus on dealing with performance issues around the global binder lock and thus they make no sense when there is no global binder lock.  So we should somehow note that we should pick the upstream binder patches when they land upstream.

In the short term (even if we don't want to just pick the staged patches) one could also argue that our local patches aren't needed.  They are mostly for performance, not correctness.  Thus, if we're waiting for things to settle upstream we can just drop them.

---

NOTE: Dmitry expressed some dissatisfaction with the binder patches that are going upstream and thinks that they will need some cleanup.  Some of the local CHROMIUM patches could be used as inspiration for some of that cleanup.  In the past he has volunteered to lead this effort.


Comment 4 by groeck@chromium.org, Aug 16 2017

#3: sgtm. Should I apply the post-4.12 binder patches instead ? it might be worth a try, and it would ensure some additional test coverage.

Comment 5 by dtor@chromium.org, Aug 16 2017

Owner: dgreid@chromium.org
I do not think this is the binder locking patches that are to blame. We have this whole preempt_enable_no_resched() sprinkled across binder that is coming form Android and not in upstream. Dylan will have toadjust the code brought from upstream to account for this.

Comment 6 by dtor@chromium.org, Aug 16 2017

Owner: dtor@chromium.org
Ah, sorry, it is 4.12, not 4.4.
@4: It seems line a sane thing to do.

Comment 8 by groeck@chromium.org, Aug 16 2017

Owner: groeck@chromium.org
Status: Started (was: Untriaged)
Ok, I'll do that and refer to this bug. 

Created  bug #756083  to track cleanups.
Status: Fixed (was: Started)
Consider this one fixed. The original problem is no longer seen after switching to upstream binder code.


Sign in to add a comment