
Issue 822360


Issue metadata

Status: Fixed
Owner:
Closed: Jul 19
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




zswap parameter z3fold causes kernel panic

Project Member Reported by asavery@chromium.org, Mar 15 2018

Issue description

What steps will reproduce the problem?
(1) Enable zswap on device, create a swap file, disable zram
(2) set /sys/module/zswap/parameters/zpool to z3fold
(3) run platform_MemoryPressure autotest

What is the expected result?
The test will run, using the swap file and zswap.

What happens instead?
The device reboots part way through the test.

I am attaching the dmesg output.

 
Attachment: z3fold_dmesg (147 KB)
BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/v4.14/mm/page_alloc.c:4145 
This could be fixed by crrev/969068, since zswap_get_swap_cache_page calls __read_swap_cache_async with GFP_KERNEL only.
Attaching dmesg output after applying change from comment 1.
Attachment: z3fold_dmesg_2 (34.9 KB)
Cc: sonnyrao@chromium.org semenzato@chromium.org
The problem is in __swap_writepage: we call get_swap_bio with GFP_NOIO as the gfp_mask, which allows bio_alloc_bioset to let mempool_alloc sleep.

I think we are holding a lock somewhere, preventing the task from sleeping, which leads to the BUG() and then the deadlock.

Comment 4 by sonny...@google.com, Mar 20 2018

I haven't looked into the specifics but I assume you had DEBUG_ATOMIC_SLEEP turned on to get that message in #1?

You could also try turning on the other lock-debugging flags -- I think Guenter added a USE flag called "lockdebug" that you can enable to get some more output.

4.14 is close enough to upstream that you might just report the issue upstream and see what they say.
I enabled the "lockdebug" flag and ran the test again. The first error that occurs is BUG: MAX_LOCK_DEPTH too low! I increased MAX_LOCK_DEPTH from 48 to 400, and the same thing happens. I am attaching both dmesg outputs.
Attachment: z3fold_lock_depth_dmesg (12.8 KB)
Attachment: z3fold_lock_depth_increased_dmesg (52.6 KB)
Cc: groeck@chromium.org
re #5 -- this looks pretty broken, maybe z3fold is missing a lockdep annotation.
I looked at v4.16-rc6 and it's pretty much identical -- so might be worth reporting this upstream

Also, what happens if you run the test with zbud?
Running the test with zbud works, I only see this problem with z3fold.

Comment 8 by groeck@chromium.org, Mar 28 2018

I assume / hope this build isn't using clang? If it is, please recompile with gcc and try again.


re #8 -- why would that matter?
Ok -- could you try building the kernel with USE=-clang and repeating the test, just to rule out a bad compiler?
The (complete) kernel log should show if the image was built with clang.

Ok, I built the kernel with USE=-clang, and I am attaching the new dmesg output. Now, instead of the MAX_LOCK_DEPTH error, the first error message is:
[  234.392580] ============================================
[  234.393030] WARNING: possible recursive locking detected
[  234.393030] 4.14.27 #21 Not tainted
[  234.393030] --------------------------------------------
[  234.393030] chrome/4690 is trying to acquire lock:
[  234.393030]  (&(&zhdr->page_lock)->rlock){+.+.}, at: [<ffffffffbabfdb2b>] z3fold_zpool_map+0x4b/0xc4
[  234.393030]
[  234.393030] but task is already holding lock:
[  234.393030]  (&(&zhdr->page_lock)->rlock){+.+.}, at: [<ffffffffbabfd2b0>] z3fold_zpool_shrink+0x118/0x303
[  234.393030]
[  234.393030] other info that might help us debug this:
[  234.393030]  Possible unsafe locking scenario:
[  234.393030]
[  234.393030]        CPU0
[  234.393030]        ----
[  234.393030]   lock(&(&zhdr->page_lock)->rlock);
[  234.393030]   lock(&(&zhdr->page_lock)->rlock);
[  234.393030]
[  234.393030]  *** DEADLOCK ***
[  234.393030]
[  234.393030]  May be due to missing lock nesting notation
[  234.393030]

Attachment: z3fold_no_clang_dmesg (63.9 KB)
Since you are running this in a VM: does it depend on a ChromeOS kernel/branch, or could you run the same test with the latest upstream kernel?
I suspect that the same problem exists in the upstream kernel. We might be able to get support from the file author if we can show that the problem exists upstream.

Can someone provide details on how to "Enable zswap on device, create a swap file, disable zram"?

Yes, here are two CLs that will allow you to create a swap file and enable zswap: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1005513
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1005512
And here is a CL that will create a swap file alongside zram: https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1006064

You don't have to use swap.sh to create the swap file. As long as you configure the kernel to allow disk-based swap, you can just create a file and then use mkswap and swapon, but when you reboot the device it will default back to zram. To disable zram, use 'swapoff /dev/zram0'. Enable zswap with 'echo 1 > /sys/module/zswap/parameters/enabled'.
Owner: groeck@chromium.org
Status: Started (was: Untriaged)
Submitted patch series ending with CL:1017014 for testing.

asavery@: Please test the patch series with your setup if you find the time.

Comment 20 Deleted

I tested the patch series with my setup, running the platform_MemoryPressure autotest (both with the realistic and simple parameters). It gets much farther and utilizes zswap more before rebooting, but it does eventually reboot with BUG: unable to handle kernel NULL pointer dereference at (null).
Attachment: z3fold_with_patches_dmesg (5.4 KB)
Can you provide additional information about your VM? The command line would be most helpful. Complete dmesg logs would also be appreciated.

The command line:  
BOOT_IMAGE=vmlinuz.A init=/sbin/init boot=local rootwait ro noresume noswap loglevel=7 noinitrd console=ttyS0  root=PARTUUID=16F35D3F-5BA7-804D-A85D-60B9CE6656BA i915.modeset=1 cros_legacy cros_debug

and I am attaching the complete dmesg logs.
Attachment: z3fold_with_patches_full_dmesg (33.6 KB)
Sorry, I meant the kvm/qemu command line, not the kernel command line.

The crash isn't surprising: z3fold_zpool_map() returns a NULL pointer. zswap_writeback_entry() calls zpool_map_handle(), which calls z3fold_zpool_map(), and doesn't expect a NULL return value. The real problem is therefore indicated by "z3fold: unknown buddy id 0".

Sorry, the kvm/qemu command line:
Launching: /usr/local/google/home/asavery/chromiumos/chroot/usr/bin/qemu-system-x86_64 -enable-kvm -m 8G -smp 8 -cpu SandyBridge,-invpcid,-tsc-deadline,check -vga virtio -pidfile /tmp/kvm.41291.pid -chardev pipe,id=control_pipe,path=/tmp/kvm.41291.monitor -serial file:/tmp/kvm.41291.serial -mon chardev=control_pipe -daemonize -device virtio-net,netdev=eth0 -netdev user,id=eth0,net=10.0.2.0/27,hostfwd=tcp:127.0.0.1:9222-:22 -drive file=../build/images/amd64-generic/latest/chromiumos_qemu_image.bin,index=0,media=disk,cache=unsafe

Comment 27 by bugdroid1@chromium.org, May 1 2018

Labels: merge-merged-chromeos-4.14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/e6f9d17f5d16eefd9851e93423a98b6b1f54cf46

commit e6f9d17f5d16eefd9851e93423a98b6b1f54cf46
Author: Vitaly Wool <vitalywool@gmail.com>
Date: Tue May 01 15:31:23 2018

UPSTREAM: z3fold: limit use of stale list for allocation

Currently if z3fold couldn't find an unbuddied page it would first try
to pull a page off the stale list.  The problem with this approach is
that we can't 100% guarantee that the page is not processed by the
workqueue thread at the same time unless we run cancel_work_sync() on
it, which we can't do if we're in an atomic context.  So let's just
limit stale list usage to non-atomic contexts only.

Link: http://lkml.kernel.org/r/47ab51e7-e9c1-d30e-ab17-f734dbc3abce@gmail.com
Signed-off-by: Vitaly Vul <vitaly.vul@sony.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: <Oleksiy.Avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

BUG= chromium:822360 
TEST=Build and boot

Change-Id: Ib2260339af06cf88d19c26751fd32de3fc75cb7e
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 5c9bab592f53328b3e00d7293378b714abf47a2a)
Reviewed-on: https://chromium-review.googlesource.com/1017011
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/e6f9d17f5d16eefd9851e93423a98b6b1f54cf46/mm/z3fold.c


Comment 28 by bugdroid1@chromium.org, May 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/c35058acde78ac1a485e87651a45154c1419bb1f

commit c35058acde78ac1a485e87651a45154c1419bb1f
Author: Xidong Wang <wangxidong_97@163.com>
Date: Tue May 01 15:31:25 2018

UPSTREAM: z3fold: fix memory leak

In z3fold_create_pool(), the memory allocated by __alloc_percpu() is not
released on the error path that pool->compact_wq , which holds the
return value of create_singlethread_workqueue(), is NULL.  This will
result in a memory leak bug.

[akpm@linux-foundation.org: fix oops on kzalloc() failure, check __alloc_percpu() retval]
Link: http://lkml.kernel.org/r/1522803111-29209-1-git-send-email-wangxidong_97@163.com
Signed-off-by: Xidong Wang <wangxidong_97@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

BUG= chromium:822360 
TEST=Build and boot

Change-Id: I4373a18a70df50494d18c5e9a688dc05714771e3
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 1ec6995d1290bfb87cc3a51f0836c889e857cef9)
Reviewed-on: https://chromium-review.googlesource.com/1017012
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/c35058acde78ac1a485e87651a45154c1419bb1f/mm/z3fold.c


Comment 29 by bugdroid1@chromium.org, May 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/712c2dc9b2bb70365b306dc87d9e1554f3bd747a

commit 712c2dc9b2bb70365b306dc87d9e1554f3bd747a
Author: Matthew Wilcox <mawilcox@microsoft.com>
Date: Tue May 01 15:31:26 2018

UPSTREAM: mm/z3fold.c: use gfpflags_allow_blocking

We have a perfectly good macro to determine whether the gfp flags allow
you to sleep or not; use it instead of trying to infer it.

Link: http://lkml.kernel.org/r/20180408062206.GC16007@bombadil.infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vitaly Wool <vitalywool@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

BUG= chromium:822360 
TEST=Build and boot

Change-Id: I68e0df63718a3cbe1c76de17bec488ad413e1bac
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 8a97ea546bb6532f77a0efe165012ee0d0c4b903)
Reviewed-on: https://chromium-review.googlesource.com/1017013
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/712c2dc9b2bb70365b306dc87d9e1554f3bd747a/mm/z3fold.c


Comment 30 by bugdroid1@chromium.org, May 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/4ff158e81c898bfa4c09762c4c55058197a25d3e

commit 4ff158e81c898bfa4c09762c4c55058197a25d3e
Author: Vitaly Wool <vitalywool@gmail.com>
Date: Tue May 01 15:31:28 2018

FROMLIST: z3fold: fix reclaim lock-ups

Do not try to optimize in-page object layout while the page is
under reclaim. This fixes lock-ups on reclaim and improves reclaim
performance at the same time.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>

BUG= chromium:822360 
TEST=Build and boot

Change-Id: I19ad95065c27f376bd06bba14a4210133d394ce2
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(am from https://patchwork.kernel.org/patch/10371431/)
Reviewed-on: https://chromium-review.googlesource.com/1017014
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/4ff158e81c898bfa4c09762c4c55058197a25d3e/mm/z3fold.c

Another problem, observed with ToT and various different cache size settings:
In zswap_writeback_entry(), the tree variable (assigned around line 869) is sometimes set to NULL after OOM killed a process. This results in a subsequent crash in "spin_lock(&tree->lock);". In this situation, zpool_map_handle() did not return NULL. Analysis is still ongoing.
After instrumenting the code, I found the following:

In zswap.c:zswap_writeback_entry():

    tree = zswap_trees[swp_type(swpentry)];

In the problem case, swp_type(swpentry) has a value of 127. zswap_trees[] has MAX_SWAPFILES entries; MAX_SWAPFILES is 30, so zswap_trees[127] points well beyond the array. This is confirmed by enabling KASAN, which reports exactly that:

BUG: KASAN: global-out-of-bounds in zswap_writeback_entry+0xee/0x5d0
Read of size 8 at addr ffffffff97155af8 by task memory-eater/619

CPU: 4 PID: 619 Comm: memory-eater Not tainted 4.17.0-rc3-yocto-standard+ #26
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x7c/0xbb
 print_address_description+0x195/0x270
 kasan_report+0x260/0x380
 ? zswap_writeback_entry+0xee/0x5d0
 zswap_writeback_entry+0xee/0x5d0

and:

(gdb) l *zswap_writeback_entry+0xee      
0xffffffff812d3b4e is in zswap_writeback_entry (mm/zswap.c:870).
865		/* extract swpentry from data */
866		zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO);
867		BUG_ON(IS_ERR_OR_NULL(zhdr));
868		swpentry = zhdr->swpentry; /* here */
869		zpool_unmap_handle(pool, handle);
870		tree = zswap_trees[swp_type(swpentry)];
871		offset = swp_offset(swpentry);
872	
873		if (tree == NULL) {
874			WARN(1, "tree is NULL: zhdr=%p, type=%d (max=%d), offset=%ld, handle=%0lx\n",

Comment 33 by groeck@google.com, May 2 2018

The error case in #32 is seen because 'buddy' is 3 (LAST). At the same time, the value of last_chunks in the respective z3fold_header is 0. In z3fold_map(), the following code:
    addr += PAGE_SIZE - (zhdr->last_chunks << CHUNK_SHIFT);
returns addr + PAGE_SIZE, i.e. a pointer to the next page. It should never do this.

The problem described in #21/#23 is a bit different but similar: in that case, the value of "buddy" is 0. This causes z3fold_map() to return NULL, which, as I mentioned in #25, is not handled by the calling code.

Both situations indicate that zswap_writeback_entry() is called with an invalid handle. In both cases the call path is through z3fold_zpool_shrink().

The code itself actually suggests that there may be a problem. In z3fold_reclaim_page():

"We need encode the handles before unlocking, since we can race with free that will set (first|last)_chunks to 0".

If this comment is correct, the free can happen before the eviction, which in turn accesses (through the mapping code) a last_chunks value that has already been cleared.

More instrumentation shows that this is in fact the problem.

z3fold: zhdr ffff8801f0150000 released (buddy=1, first_chunks=0, middle_chunks=0, last_chunks=0, addr=ffff8801f0150000)
z3fold: zhdr ffff8801703bd000 released (buddy=1, first_chunks=0, middle_chunks=0, last_chunks=0, addr=ffff8801703bd000)
z3fold: zhdr ffff8801efdd5000 released (buddy=1, first_chunks=0, middle_chunks=0, last_chunks=0, addr=ffff8801efdd5000)
z3fold: zhdr ffff8801efdd5000 released (buddy=3, first_chunks=0, middle_chunks=0, last_chunks=0, addr=ffff8801efdd5000)

This is from z3fold_map(), seen just before the crash. The complete instrumentation code (diffs against mainline) is attached.

Attachment: instrumentation (3.7 KB)
asavery@: Can you try again with CL:1040870? It fixes the problem reported in #33. It would be great to know whether the problem you reported in #21 has the same root cause or if there is another problem lurking in the code.

Attaching workaround for reference.

Attachment: 0001-WIP-HACK-mm-z3fold-Work-around-race-condition.patch (2.2 KB)
I tried again with CL:1040870 but am still getting the problem from #21: BUG: unable to handle kernel NULL pointer dereference at (null).
Complete dmesg log is attached.
Attachment: z3fold_with_workaround_dmesg (33.2 KB)
asavery@: I got another patch to try; it is supposed to fix #21: CL:1042872. Can you test again with both CLs (if you check out CL:1040870 you'll get both)?

With both CL:1042872 and CL:1040870 applied on top of v4.17-rc3, I managed to get:

WARNING: CPU: 6 PID: 608 at mm/z3fold.c:257 __release_z3fold_page.constprop.4+0x105/0x120

followed by:

BUG: unable to handle kernel paging request at 0000002400000158
PGD 80000002187f2067 P4D 80000002187f2067 PUD 0
Oops: 0002 [#1] PREEMPT SMP PTI
Modules linked in:
CPU: 0 PID: 1204 Comm: memory-eater Tainted: G        W         4.17.0-rc3-yocto-standard+ #62
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
RIP: 0010:__lock_acquire+0xf6/0x18f0
...
Call Trace:
 ? __lock_acquire+0x2af/0x18f0
 ? __lock_acquire+0x2af/0x18f0
 ? lock_acquire+0x93/0x230
 lock_acquire+0x93/0x230
 ? z3fold_zpool_malloc+0x118/0x7e0
 _raw_spin_trylock+0x65/0x80
 ? z3fold_zpool_malloc+0x118/0x7e0
 ? do_raw_spin_lock+0xad/0xb0
 z3fold_zpool_malloc+0x118/0x7e0
 zswap_frontswap_store+0x497/0x7c0
 __frontswap_store+0x6e/0xf0
 swap_writepage+0x49/0x70

Seen once so far. Log file attached.

Attachment: z3fold-crash (5.9 KB)
With both CL:1042872 and CL:1040870 applied, I had to run the test twice before the device crashed. The first time I ran the test, it failed with FAIL: Unhandled DevtoolsTargetCrashException: Devtools target crashed, with this behavior in the dmesg log:
[  436.237109] TaskSchedulerFo: page allocation stalls for 33954ms, order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)
[  436.242068] TaskSchedulerFo cpuset=chrome mems_allowed=0
[  436.242981] CPU: 3 PID: 6922 Comm: TaskSchedulerFo Not tainted 4.14.37 #27
[  436.243986] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[  436.245660] Call Trace:
[  436.246034]  dump_stack+0x67/0x92
[  436.246533]  warn_alloc+0x107/0x1ab
[  436.247049]  __alloc_pages_slowpath+0x452/0xca7
[  436.247712]  __alloc_pages_nodemask+0x260/0x3a6
[  436.248362]  __read_swap_cache_async+0xa7/0x188
[  436.249022]  read_swap_cache_async+0x2b/0x5d
[  436.249639]  swapin_readahead+0x1aa/0x1e2
[  436.250215]  ? find_get_entry+0x8e/0x9c
[  436.250769]  ? pagecache_get_page+0x2b/0x134
[  436.251391]  ? do_swap_page+0x131/0x432
[  436.251946]  do_swap_page+0x131/0x432
[  436.252476]  ? do_futex+0xf1/0x6ce
[  436.252969]  __handle_mm_fault+0x87e/0x9a5
[  436.253560]  handle_mm_fault+0xed/0x101
[  436.254114]  __do_page_fault+0x28f/0x429
[  436.254682]  ? page_fault+0x2f/0x50
[  436.255191]  page_fault+0x45/0x50

Running the test again causes a crash. I am attaching the full dmesg log.
Attachment: z3fold_dmesg (426 KB)
It appears that the "page allocation stalls" message can itself result in OOM lockups. The fix is available in CL:1044302, or as part of the series ending with CL:1040870.

With CL:1044302 applied, the test still causes a crash.
Complete dmesg logs attached.
Attachment: z3fold_2_dmesg (106 KB)
Components: OS>Kernel

Comment 43 by bugdroid1@chromium.org, May 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/fef19f88438030c917258696b1bb14a5ff1fac67

commit fef19f88438030c917258696b1bb14a5ff1fac67
Author: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed May 09 21:40:16 2018

UPSTREAM: mm: don't warn about allocations which stall for too long

Commit 63f53dea0c98 ("mm: warn about allocations which stall for too
long") was a great step for reducing possibility of silent hang up
problem caused by memory allocation stalls.  But this commit reverts it,
for it is possible to trigger OOM lockup and/or soft lockups when many
threads concurrently called warn_alloc() (in order to warn about memory
allocation stalls) due to current implementation of printk(), and it is
difficult to obtain useful information due to limitation of synchronous
warning approach.

Current printk() implementation flushes all pending logs using the
context of a thread which called console_unlock().  printk() should be
able to flush all pending logs eventually unless somebody continues
appending to printk() buffer.

Since warn_alloc() started appending to printk() buffer while waiting
for oom_kill_process() to make forward progress when oom_kill_process()
is processing pending logs, it became possible for warn_alloc() to force
oom_kill_process() loop inside printk().  As a result, warn_alloc()
significantly increased possibility of preventing oom_kill_process()
from making forward progress.

---------- Pseudo code start ----------
Before warn_alloc() was introduced:

  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    }
    goto retry;

After warn_alloc() was introduced:

  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    } else if (waited_for_10seconds()) {
      atomic_inc(&printk_pending_logs);
    }
    goto retry;
---------- Pseudo code end ----------

Although waited_for_10seconds() becomes true once per 10 seconds,
unbounded number of threads can call waited_for_10seconds() at the same
time.  Also, since threads doing waited_for_10seconds() keep doing
almost busy loop, the thread doing print_one_log() can use little CPU
resource.  Therefore, this situation can be simplified like

---------- Pseudo code start ----------
  retry:
    if (mutex_trylock(&oom_lock)) {
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_lock)
    } else {
      atomic_inc(&printk_pending_logs);
    }
    goto retry;
---------- Pseudo code end ----------

when printk() is called faster than print_one_log() can process a log.

One of possible mitigation would be to introduce a new lock in order to
make sure that no other series of printk() (either oom_kill_process() or
warn_alloc()) can append to printk() buffer when one series of printk()
(either oom_kill_process() or warn_alloc()) is already in progress.

Such serialization will also help obtaining kernel messages in readable
form.

---------- Pseudo code start ----------
  retry:
    if (mutex_trylock(&oom_lock)) {
      mutex_lock(&oom_printk_lock);
      while (atomic_read(&printk_pending_logs) > 0) {
        atomic_dec(&printk_pending_logs);
        print_one_log();
      }
      // Send SIGKILL here.
      mutex_unlock(&oom_printk_lock);
      mutex_unlock(&oom_lock)
    } else {
      if (mutex_trylock(&oom_printk_lock)) {
        atomic_inc(&printk_pending_logs);
        mutex_unlock(&oom_printk_lock);
      }
    }
    goto retry;
---------- Pseudo code end ----------

But this commit does not go that direction, for we don't want to
introduce a new lock dependency, and we unlikely be able to obtain
useful information even if we serialized oom_kill_process() and
warn_alloc().

Synchronous approach is prone to unexpected results (e.g.  too late [1],
too frequent [2], overlooked [3]).  As far as I know, warn_alloc() never
helped with providing information other than "something is going wrong".
I want to consider asynchronous approach which can obtain information
during stalls with possibly relevant threads (e.g.  the owner of
oom_lock and kswapd-like threads) and serve as a trigger for actions
(e.g.  turn on/off tracepoints, ask libvirt daemon to take a memory dump
of stalling KVM guest for diagnostic purpose).

This commit temporarily loses ability to report e.g.  OOM lockup due to
unable to invoke the OOM killer due to !__GFP_FS allocation request.
But asynchronous approach will be able to detect such situation and emit
warning.  Thus, let's remove warn_alloc().

[1] https://bugzilla.kernel.org/show_bug.cgi?id=192981
[2] http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@mail.gmail.com
[3] commit db73ee0d46379922 ("mm, vmscan: do not loop on too_many_isolated for ever"))

Link: http://lkml.kernel.org/r/1509017339-4802-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

BUG= chromium:822360 
TEST=Run under memory pressure with z3fold enabled

Change-Id: I79dfae890121b671dce2163b9a6546b268373bdc
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 400e22499dd92613821374c8c6c88c7225359980)
Reviewed-on: https://chromium-review.googlesource.com/1044302
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/fef19f88438030c917258696b1bb14a5ff1fac67/mm/page_alloc.c

What's the status here? It looks like there are still crashes. I've been testing z3fold on 4.14 Samus (with RAM reduced to 2G) and can pretty consistently crash the system running platform_MemoryPressure.

It looks like memory corruption of some sort because the crash signatures aren't consistent and I see things like corrupted spinlocks.
z3fold is pretty much unusable. I had tried to get it fixed, but there are lots of race conditions in the code, so I finally gave up.

That's too bad; I don't see how we could ever switch to zswap without something better than zbud. Is upstream aware of the issues?
Yes, I had a number of exchanges with the submitter (Vitaly Wool) before I gave up on it.

Comment 48 by bugdroid1@chromium.org, Jul 19

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/117fe398696326c1f685c1849431b4a4cd260846

commit 117fe398696326c1f685c1849431b4a4cd260846
Author: Jongseok Kim <ks77sj@gmail.com>
Date: Thu Jul 19 09:52:16 2018

FROMLIST: z3fold: fix wrong handling of headless pages

During the processing of headless pages in z3fold_reclaim_page(),
there was a problem that the zhdr pointed to another page
or a page was already released in z3fold_free(). So, the wrong page
is encoded in headless, or test_bit does not work properly
in z3fold_reclaim_page(). This patch fixed these problems.

Signed-off-by: Jongseok Kim <ks77sj@gmail.com>
(am from https://patchwork.kernel.org/patch/10510583/)

BUG= chromium:822360 
TEST=Boot with z3fold enabled

Change-Id: I2d7f60f415892272c41792e0271c8727e22129a3
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1142465
Tested-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/117fe398696326c1f685c1849431b4a4cd260846/mm/z3fold.c

Status: Fixed (was: Started)
Looks like the latest patch fixes all known problems. Marking as fixed.
