Issue 767096

Issue metadata

Status: Fixed
Closed: Nov 2017
Pri: 2
Type: Bug




binder: "BUG: sleeping function called from invalid context"

Project Member Reported by groeck@chromium.org, Sep 20 2017

Issue description

Observed once.

[58638.944378] BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/v4.12/mm/memory.c:1320
[58638.957699] in_atomic(): 1, irqs_disabled(): 0, pid: 51, name: kswapd0
[58638.965016] CPU: 1 PID: 51 Comm: kswapd0 Not tainted 4.12.13 #8
[58638.971646] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.41.0 07/17/2017
[58638.979819] Call Trace:
[58638.982562]  dump_stack+0x4d/0x63
[58638.986272]  ___might_sleep+0x192/0x1a9
[58638.990565]  unmap_page_range+0x6db/0x733
[58638.995065]  unmap_single_vma+0xad/0xb9
[58638.999364]  zap_page_range+0x162/0x185
[58639.003662]  ? do_raw_spin_unlock+0xc7/0xd1
[58639.008343]  binder_alloc_free_page+0x19c/0x3c3
[58639.013414]  __list_lru_walk_one.isra.11+0xb3/0x198
[58639.018872]  ? binder_shrink_count+0x19/0x19
[58639.023650]  list_lru_walk_node+0xe/0x10
[58639.028038]  binder_shrink_scan+0x4c/0x65
[58639.032523]  shrink_slab.part.58+0x2b8/0x42b
[58639.037300]  shrink_node+0xdd/0x2cf
[58639.041203]  balance_pgdat+0x19e/0x2a5
[58639.045398]  kswapd+0x450/0x5b7
[58639.048915]  ? wake_up_atomic_t+0x2c/0x2c
[58639.053402]  ? balance_pgdat+0x2a5/0x2a5
[58639.057790]  kthread+0x221/0x231
[58639.061401]  ? kthread_flush_work+0x147/0x147
[58639.066276]  ret_from_fork+0x22/0x30

 

Comment 1 by groeck@chromium.org, Sep 20 2017

Cc: m...@android.com dtor@chromium.org tkjos@google.com
Call sequence:
  zap_page_range
  -> unmap_page_range
    -> zap_p4d_range
       -> zap_pud_range
          -> cond_resched
             -> ___might_sleep

And:
    binder_shrink_scan
    -> list_lru_walk_node
       -> __list_lru_walk_one
          -> spin_lock()
          -> binder_alloc_free_page() [ with spinlock active ]

The calling context, together with the spinlock passed to binder_alloc_free_page(), suggests that the callback is expected to release the spinlock before calling zap_page_range(), but I don't know the code well enough to be sure.
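
To make the two call sequences above concrete, here is a minimal userspace sketch of why the check fires. Everything prefixed with model_ is a hypothetical stand-in, not kernel code: taking the lock bumps a counter that plays the role of the preempt count, and the might-sleep check complains whenever that counter is non-zero.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of these are the kernel's real code. */
static int model_preempt_count;  /* > 0 means "atomic context" in this model */
static int model_bug_reports;    /* number of simulated BUG reports */

static void model_spin_lock(void)
{
    model_preempt_count++;
}

static void model_spin_unlock(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

/* Mimics the debug check: complain if a function that may sleep runs while atomic. */
static void model_might_sleep(const char *caller)
{
    if (model_preempt_count > 0) {
        model_bug_reports++;
        fprintf(stderr, "model BUG: sleeping function called from invalid context (%s)\n",
                caller);
    }
}

/* Stand-in for zap_page_range(), which eventually reaches cond_resched(). */
static void model_zap_page_range(void)
{
    model_might_sleep("zap_page_range");
}

/* Stand-in for the isolate callback as it behaved when this bug was filed. */
static void model_free_page_buggy(void)
{
    model_zap_page_range();  /* still under the walker's spinlock */
}

/* Stand-in for the lru walker: takes the lock, then invokes the callback. */
static int model_walk_buggy(void)
{
    model_bug_reports = 0;
    model_spin_lock();
    model_free_page_buggy();
    model_spin_unlock();
    return model_bug_reports;
}
```

Calling model_walk_buggy() once reports one simulated BUG, matching the in_atomic(): 1 line in the trace.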

Comment 2 by groeck@chromium.org, Sep 20 2017

Also:

[ 3546.342638] BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/v4.12/kernel/fork.c:927
[ 3546.355705] in_atomic(): 1, irqs_disabled(): 0, pid: 50, name: kswapd0
[ 3546.363048] CPU: 2 PID: 50 Comm: kswapd0 Not tainted 4.12.13 #9
[ 3546.369670] Hardware name: Google Eve/Eve, BIOS Google_Eve.9584.41.0 07/17/2017
[ 3546.377842] Call Trace:
[ 3546.380576]  dump_stack+0x4d/0x63
[ 3546.384296]  ___might_sleep+0x192/0x1a9
[ 3546.388591]  __might_sleep+0xe1/0xed
[ 3546.392801]  ? up_write+0x16/0x35
[ 3546.396508]  mmput+0x20/0x33
[ 3546.399721]  binder_alloc_free_page+0x256/0x3de
[ 3546.404801]  __list_lru_walk_one.isra.11+0xb3/0x198
[ 3546.410254]  ? binder_shrink_count+0x19/0x19
[ 3546.415029]  list_lru_walk_node+0xe/0x10
[ 3546.419414]  binder_shrink_scan+0x4c/0x65
[ 3546.423896]  shrink_slab.part.58+0x2b8/0x42b
[ 3546.428670]  shrink_node+0xdd/0x2cf
[ 3546.432571]  balance_pgdat+0x19e/0x2a5
[ 3546.436753]  kswapd+0x450/0x5b7
[ 3546.440257]  ? wake_up_atomic_t+0x2c/0x2c
[ 3546.444729]  ? balance_pgdat+0x2a5/0x2a5
[ 3546.449114]  kthread+0x221/0x231
[ 3546.452714]  ? kthread_flush_work+0x147/0x147
[ 3546.457587]  ret_from_fork+0x22/0x30

mmput must not be called in atomic context either.
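
For illustration, the fix recorded later in this issue switches to mmput_async on this path. As far as I understand it, the idea is to defer the final, possibly sleeping teardown to a worker instead of running it in the caller's atomic context. The sketch below models only that idea in userspace; struct model_mm, the pending list, and all model_ functions are assumptions made for the example, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an address space with a user count. */
struct model_mm {
    int users;                     /* reference count */
    bool torn_down;                /* set once teardown has run */
    struct model_mm *next_pending; /* link in the deferred-work list */
};

static struct model_mm *model_pending; /* teardown work queued for later */

/* Teardown may sleep in the real kernel; here it only sets a flag. */
static void model_teardown(struct model_mm *mm)
{
    mm->torn_down = true;
}

/*
 * Async put: safe to call from "atomic context" in this model because the
 * last reference only queues the teardown instead of running it.
 */
static void model_mmput_async(struct model_mm *mm)
{
    if (--mm->users == 0) {
        mm->next_pending = model_pending;
        model_pending = mm;
    }
}

/* Runs later from a context that is allowed to sleep; returns items processed. */
static int model_run_deferred_work(void)
{
    int processed = 0;

    while (model_pending != NULL) {
        struct model_mm *mm = model_pending;

        model_pending = mm->next_pending;
        mm->next_pending = NULL;
        model_teardown(mm);
        processed++;
    }
    return processed;
}
```

In this model, dropping the last reference with model_mmput_async() leaves torn_down false until model_run_deferred_work() runs.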

Comment 3 by tkjos@google.com, Sep 20 2017

There is a fix for this in progress:
https://android-review.googlesource.com/#/c/kernel/common/+/478862/

Comment 4 by groeck@chromium.org, Oct 16 2017

Status: Started (was: Assigned)

Comment 5 by bugdroid1@chromium.org, Oct 16 2017

Labels: merge-merged-chromeos-4.12
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/53936ea322c4104fef59c9fad0ba4ba386a93700

commit 53936ea322c4104fef59c9fad0ba4ba386a93700
Author: Sherry Yang <sherryy@android.com>
Date: Mon Oct 16 23:11:41 2017

BACKPORT: android: binder: drop lru lock in isolate callback

Drop the global lru lock in isolate callback before calling zap_page_range
which calls cond_resched, and re-acquire the global lru lock before
returning.  Also change return code to LRU_REMOVED_RETRY.

Use mmput_async when fail to acquire mmap sem in an atomic context.

Fix "BUG: sleeping function called from invalid context"
errors when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.

Also restore mmput_async, which was initially introduced in ec8d7c14e
("mm, oom_reaper: do not mmput synchronously from the oom reaper
context"), and was removed in 212925802 ("mm: oom: let oom_reap_task and
exit_mmap run concurrently").

BUG=chromium:767096
TEST=Build and run

Change-Id: I3648ba112b32c348dbbc3aebe99d24bf70386395
Link: http://lkml.kernel.org/r/20170914182231.90908-1-sherryy@android.com
Fixes: f2517eb76f1f2 ("android: binder: Add global lru shrinker to binder")
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported-by: Kyle Yan <kyan@codeaurora.org>
Acked-by: Arve Hjønnevåg <arve@android.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Martijn Coenen <maco@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Riley Andrews <riandrews@android.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hoeun Ryu <hoeun.ryu@gmail.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[backport: The restored functions were never removed in v4.12]
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit a1b2289cef92)
Reviewed-on: https://chromium-review.googlesource.com/675570
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/53936ea322c4104fef59c9fad0ba4ba386a93700/drivers/android/binder_alloc.c
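
The locking pattern described in the commit message above can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the code in binder_alloc.c: the lock is a plain flag, struct model_page and every model_ name are hypothetical, and the walker's restart on the retry code reflects my understanding of how the list_lru walker treats LRU_REMOVED_RETRY.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes modelled on the list_lru ones named in the commit. */
enum model_lru_status { MODEL_LRU_REMOVED, MODEL_LRU_REMOVED_RETRY };

struct model_page {
    bool mapped;  /* still mapped into the (modelled) user address space */
    bool on_lru;  /* still linked on the (modelled) lru list */
};

static bool model_lock_held;        /* stands in for the global lru spinlock */
static int model_sleeps_under_lock; /* counts sleeping calls made while locked */

/* Stand-in for zap_page_range(): may sleep, so it must not run under the lock. */
static void model_zap_page_range(struct model_page *page)
{
    if (model_lock_held)
        model_sleeps_under_lock++;
    page->mapped = false;
}

/* Isolate callback following the pattern from the commit message. */
static enum model_lru_status model_isolate(struct model_page *page)
{
    assert(model_lock_held);      /* the walker calls us with the lock held */
    page->on_lru = false;         /* unlink while the list is still protected */

    model_lock_held = false;      /* drop the lru lock before sleeping work */
    model_zap_page_range(page);
    model_lock_held = true;       /* re-acquire before returning */

    return MODEL_LRU_REMOVED_RETRY;
}

/* Stand-in for the lru walker; returns the number of pages isolated. */
static int model_walk(struct model_page *pages, int n)
{
    int isolated = 0;

    model_lock_held = true;
    for (int i = 0; i < n; i++) {
        if (!pages[i].on_lru)
            continue;
        if (model_isolate(&pages[i]) == MODEL_LRU_REMOVED_RETRY) {
            isolated++;
            i = -1;  /* lock was dropped: restart, the list may have changed */
            continue;
        }
        isolated++;
    }
    model_lock_held = false;
    return isolated;
}
```

The retry code matters because the callback dropped the lock: the walker can no longer trust its position in the list, so in this model it rescans from the start and skips pages that were already unlinked.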

Comment 6 by bugdroid1@chromium.org, Oct 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/237e41d476f57c6ba8600357c7d5f8727b49687a

commit 237e41d476f57c6ba8600357c7d5f8727b49687a
Author: Sherry Yang <sherryy@android.com>
Date: Mon Oct 16 23:11:42 2017

FROMLIST: android: binder: Remove unused vma argument

(from https://patchwork.kernel.org/patch/9954123/)

The vma argument in update_page_range is no longer
used after 74310e06 ("android: binder: Move buffer
out of area shared with user space"), since mmap_handler
no longer calls update_page_range with a vma.

Test: ran binderLibTest, throughputtest, interfacetest and
mempressure w/lockdep
Bug: b:36007193, chromium:767096
Change-Id: Ibd6f24c11750f8f7e6ed56e40dd18c08e02ace25
Acked-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Sherry Yang <sherryy@android.com>
(cherry picked from commit edd2131714af4ece5cb61afd27e2ce7fc9e0906a)
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/718484
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/237e41d476f57c6ba8600357c7d5f8727b49687a/drivers/android/binder_alloc.c

Comment 7 by groeck@chromium.org, Nov 17 2017

Status: Fixed (was: Started)
Fixed in chromeos-4.14. WontFix in chromeos-4.12.
