Issue 703196

Starred by 1 user

Issue metadata

Status: Verified
Owner: ----
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Kernel Crash on Caroline - BUG: unable to handle kernel NULL pointer dereference

Project Member Reported by rookrishna@chromium.org, Mar 20 2017

Issue description

58.0.3029.19/9334.13.0 Caroline

Device in use .

Crashes to black screen and reboots

https://crash.corp.google.com/browse?stbtiq=c30a631660000000


 

Comment 1 by ka...@chromium.org, Mar 20 2017

Labels: zram
Is it crashing in a loop, or is this a single observation?

<1>[13898.613529] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
<1>[13898.613550] IP: [<ffffffff9d54e8cf>] obj_free+0x6b/0xa1
<4>[13898.613567] PGD 0 
<4>[13898.613575] Oops: 0000 [#1] PREEMPT SMP 
<0>[13898.615849] gsmi: Log Shutdown Reason 0x03
<4>[13898.615857] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat xt_TCPMSS ip6table_mangle veth uinput snd_soc_dmic snd_soc_skl_nau88l25_ssm4567 snd_soc_hdac_hdmi rfcomm asix snd_soc_skl snd_soc_skl_ipc iwlmvm snd_soc_sst_acpi ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat snd_soc_sst_ipc snd_soc_sst_dsp xt_mark btusb iwlwifi snd_hda_ext_core snd_hda_core memconsole_x86_legacy iwl7000_mac80211 bridge btrtl btbcm btintel bluetooth usbnet mii memconsole snd_soc_nau8825 snd_soc_ssm4567 fuse stp llc zram cfg80211 ip6table_filter iio_trig_sysfs cros_ec_sensors_ring cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq kfifo_buf industrialio snd_seq_device uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core joydev
<4>[13898.616044] CPU: 3 PID: 20636 Comm: Chrome_IOThread Not tainted 3.18.0-13942-g317806e #1
<4>[13898.616056] Hardware name: Google Caroline/Caroline, BIOS Google_Caroline.7820.263.0 01/26/2017
<4>[13898.616069] task: ffff88015bcc1b60 ti: ffff880178e08000 task.ti: ffff880178e08000
<4>[13898.616080] RIP: 0010:[<ffffffff9d54e8cf>]  [<ffffffff9d54e8cf>] obj_free+0x6b/0xa1
<4>[13898.616095] RSP: 0018:ffff880178e0ba88  EFLAGS: 00010286
<4>[13898.616104] RAX: ffff880109d60000 RBX: 0000000000000000 RCX: 8000000000000000
<4>[13898.616115] RDX: 0000000000000000 RSI: 0004275800000000 RDI: 0000000109d60000
<4>[13898.616126] RBP: ffff880178e0baa8 R08: 0000000000000000 R09: 000000000021e63a
<4>[13898.616136] R10: ffff880178e0bb78 R11: ffff880006597908 R12: 0000000000000000
<4>[13898.616147] R13: 0004275800000000 R14: ffff880071c5bae0 R15: 0004275800000000
<4>[13898.616159] FS:  00007d6f65170700(0000) GS:ffff88017ed80000(0000) knlGS:0000000000000000
<4>[13898.616171] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[13898.616180] CR2: 0000000000000010 CR3: 000000013aeb4000 CR4: 00000000003407e0
<4>[13898.616190] Stack:
<4>[13898.616195]  ffffea00041db9c0 ffff880071c5bae0 ffff8801241ec050 ffff880178f05f00
<4>[13898.616212]  ffff880178e0bae8 ffffffff9d54eabf 00000000001603b0 00000000008798e0
<4>[13898.616228]  ffff880176b87400 ffff88016cc04fc0 00000000008798e0 00000000000005b9
<4>[13898.616244] Call Trace:
<4>[13898.616255]  [<ffffffff9d54eabf>] zs_free+0x78/0xda
<4>[13898.616271]  [<ffffffffc00a0a87>] zram_release+0x154c/0x2056 [zram]
<4>[13898.616285]  [<ffffffffc00a0b10>] zram_release+0x15d5/0x2056 [zram]
<4>[13898.616297]  [<ffffffff9d540b9b>] swap_entry_free+0x260/0x26c
<4>[13898.616309]  [<ffffffff9d542b2e>] free_swap_and_cache+0x48/0xf8
<4>[13898.616322]  [<ffffffff9d521fb2>] shmem_undo_range+0x19e/0x548
<4>[13898.616336]  [<ffffffff9d522371>] shmem_truncate_range+0x15/0x31
<4>[13898.616348]  [<ffffffff9d52285d>] shmem_fallocate+0x172/0x3c1
<4>[13898.616361]  [<ffffffff9d52f9a0>] ? tlb_flush_mmu+0x38/0x53
<4>[13898.616373]  [<ffffffff9d554f9a>] ? __sb_start_write+0xb0/0xf9
<4>[13898.616385]  [<ffffffff9d550efa>] do_fallocate+0x13a/0x167
<4>[13898.616398]  [<ffffffff9d53f37e>] SyS_madvise+0x258/0x64c
<4>[13898.616410]  [<ffffffff9d5369d6>] ? vm_munmap+0x50/0x5e
<4>[13898.616423]  [<ffffffff9daa649c>] system_call_fastpath+0x1c/0x21
<4>[13898.616433] Code: 07 49 89 fc f6 c4 08 75 04 4c 8b 67 30 48 8b 0f 49 63 46 28 31 d2 80 e5 08 75 04 48 8b 57 10 48 0f af d8 48 01 d3 e8 19 f9 ff ff <49> 8b 54 24 10 48 89 14 18 41 80 7e 44 00 74 09 49 c7 44 24 30 
<1>[13898.616552] RIP  [<ffffffff9d54e8cf>] obj_free+0x6b/0xa1
<4>[13898.616563]  RSP <ffff880178e0ba88>
<4>[13898.616570] CR2: 0000000000000010
<4>[13898.616577] ---[ end trace 177604adbdb4fd52 ]---
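
A note on reading the oops above: the faulting instruction marked in the Code: line, `<49> 8b 54 24 10`, decodes to a 64-bit load from `0x10(%r12)`. R12 is zero in the register dump, so the access lands at address 0x10, which matches CR2. In other words, the code read a field at offset 0x10 through a NULL pointer. The sketch below only illustrates that arithmetic; `struct example_obj` and `field16_address()` are hypothetical and are not the kernel's types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit layout: only the 0x10 offset of the third field matters. */
struct example_obj {
    uint64_t field0;   /* offset 0x00 */
    uint64_t field8;   /* offset 0x08 */
    uint64_t field16;  /* offset 0x10 */
};

/* Address the CPU would touch for base->field16, computed without dereferencing. */
static uint64_t field16_address(uint64_t base)
{
    return base + offsetof(struct example_obj, field16);
}
```

Calling `field16_address(0)` gives 0x10, the same value the oops reports in CR2 for a NULL base pointer.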
This is a single observation. 

Comment 3 by ka...@chromium.org, Mar 20 2017

Cc: groeck@chromium.org sonnyrao@chromium.org pyeh@chromium.org
Labels: -Pri-1 Pri-2
Summary: Kernel Crash on Caroline - BUG: unable to handle kernel NULL pointer dereference (was: Kernel Crash on Caroline)
Cc: cylee@chromium.org bccheng@chromium.org
There may be an unmerged patch for a similar crash:
https://patchwork.kernel.org/patch/8051251/

I'm not sure whether this is the same issue, though.

Comment 6 by groeck@chromium.org, Mar 20 2017

Upstream c102f07ca0b0 ("zsmalloc: fix migrate_zspage-zs_free race condition")

Thanks, Guenter. It looks like 4.4 has this fix but 3.18 doesn't. CL here:

https://chromium-review.googlesource.com/457150

Comment 8 by bugdroid1@chromium.org, Mar 22 2017

Labels: merge-merged-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a8c81f7aca71d637e67c38b13f95ab7660a00ae7

commit a8c81f7aca71d637e67c38b13f95ab7660a00ae7
Author: Junil Lee <junil0814.lee@lge.com>
Date: Wed Mar 22 03:59:54 2017

UPSTREAM: zsmalloc: fix migrate_zspage-zs_free race condition

record_obj() in migrate_zspage() does not preserve handle's
HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
(accidentally) un-pins the handle, while migrate_zspage() still performs
an explicit unpin_tag() on that handle.  This additional explicit
unpin_tag() introduces a race condition with zs_free(), which can pin
that handle by this time, so the handle becomes un-pinned.

Schematically, it goes like this:

  CPU0                                        CPU1
  migrate_zspage
    find_alloced_obj
      trypin_tag
        set HANDLE_PIN_BIT                    zs_free()
                                                pin_tag()
  obj_malloc() -- new object, no tag
  record_obj() -- remove HANDLE_PIN_BIT           set HANDLE_PIN_BIT
  unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT

The race condition may result in a NULL pointer dereference:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
  PC is at get_zspage_mapping+0x0/0x24
  LR is at obj_free.isra.22+0x64/0x128
  Call trace:
     get_zspage_mapping+0x0/0x24
     zs_free+0x88/0x114
     zram_free_page+0x64/0xcc
     zram_slot_free_notify+0x90/0x108
     swap_entry_free+0x278/0x294
     free_swap_and_cache+0x38/0x11c
     unmap_single_vma+0x480/0x5c8
     unmap_vmas+0x44/0x60
     exit_mmap+0x50/0x110
     mmput+0x58/0xe0
     do_exit+0x320/0x8dc
     do_group_exit+0x44/0xa8
     get_signal+0x538/0x580
     do_signal+0x98/0x4b8
     do_notify_resume+0x14/0x5c

This patch keeps the lock bit in the migration path and updates the
value atomically.

BUG= chromium:703196 
TEST=build/boot on caroline

Signed-off-by: Junil Lee <junil0814.lee@lge.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: <stable@vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c102f07ca0b04f2cb49cfc161c83f6239d17f491)
Signed-off-by: Sonny Rao <sonnyrao@chromium.org>

Change-Id: Ibf8c8d03e1f994c42ff341912f3d69aac21d2345
Reviewed-on: https://chromium-review.googlesource.com/457150
Commit-Ready: Sonny Rao <sonnyrao@chromium.org>
Tested-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/a8c81f7aca71d637e67c38b13f95ab7660a00ae7/mm/zsmalloc.c
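
The interleaving in the commit message above can be replayed deterministically in user space. The following is a minimal single-threaded sketch, not the kernel code: `PIN_BIT`, `try_pin()`, `unpin()`, `record_obj_model()` and `free_path_still_pinned()` are hypothetical names, the handle is modelled as a plain integer whose low bit is assumed free for the pin, and the `keep_pin` flag stands in for the fix described as keeping the lock bit in the migration path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PIN_BIT ((uintptr_t)1)  /* stand-in for HANDLE_PIN_BIT (assumption) */

/* Take the pin if it is free; returns true on success. */
static bool try_pin(uintptr_t *handle)
{
    if (*handle & PIN_BIT)
        return false;
    *handle |= PIN_BIT;
    return true;
}

/* Drop the pin unconditionally, as an unlock would. */
static void unpin(uintptr_t *handle)
{
    *handle &= ~PIN_BIT;
}

/* Store the new object location; keep_pin models the fixed behaviour. */
static void record_obj_model(uintptr_t *handle, uintptr_t obj, bool keep_pin)
{
    if (keep_pin)
        obj |= PIN_BIT;
    *handle = obj;
}

/*
 * Replay the schematic: CPU0 is the migration path, CPU1 is the free path.
 * Returns true if CPU1 ends up holding a pin that is still set.
 */
static bool free_path_still_pinned(bool keep_pin)
{
    uintptr_t handle = 0x1000;   /* old object location, unpinned */
    bool cpu1_holds;

    try_pin(&handle);                           /* CPU0: trypin_tag */
    cpu1_holds = try_pin(&handle);              /* CPU1: pin_tag, must wait */
    record_obj_model(&handle, 0x2000, keep_pin);/* CPU0: record_obj */
    if (!cpu1_holds)
        cpu1_holds = try_pin(&handle);          /* CPU1: retry */
    unpin(&handle);                             /* CPU0: unpin_tag */
    if (!cpu1_holds)
        cpu1_holds = try_pin(&handle);          /* CPU1: retry after unlock */

    return cpu1_holds && (handle & PIN_BIT) != 0;
}
```

With `keep_pin` false, the free path acquires the pin right after `record_obj_model()` clears it, and the migration path's `unpin()` then removes that pin, so the function returns false. With `keep_pin` true, the free path can only acquire the pin after the migration path has released it, and the function returns true.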

Labels: Merge-Request-58
Labels: -Merge-Request-58 Merge-Approved-58
https://feedback.corp.google.com/product/208/neutron?lView=rd&lReport=55530801315 appears to be another instance of this crash; merge approved.

Comment 11 by bugdroid1@chromium.org, Mar 22 2017

Labels: merge-merged-release-R58-9334.B-chromeos-3.18
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/8c75b50bda0d7c5cd900836bc05a648e564b7f53

commit 8c75b50bda0d7c5cd900836bc05a648e564b7f53
Author: Junil Lee <junil0814.lee@lge.com>
Date: Wed Mar 22 22:27:29 2017

UPSTREAM: zsmalloc: fix migrate_zspage-zs_free race condition

record_obj() in migrate_zspage() does not preserve handle's
HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
(accidentally) un-pins the handle, while migrate_zspage() still performs
an explicit unpin_tag() on that handle.  This additional explicit
unpin_tag() introduces a race condition with zs_free(), which can pin
that handle by this time, so the handle becomes un-pinned.

Schematically, it goes like this:

  CPU0                                        CPU1
  migrate_zspage
    find_alloced_obj
      trypin_tag
        set HANDLE_PIN_BIT                    zs_free()
                                                pin_tag()
  obj_malloc() -- new object, no tag
  record_obj() -- remove HANDLE_PIN_BIT           set HANDLE_PIN_BIT
  unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT

The race condition may result in a NULL pointer dereference:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
  PC is at get_zspage_mapping+0x0/0x24
  LR is at obj_free.isra.22+0x64/0x128
  Call trace:
     get_zspage_mapping+0x0/0x24
     zs_free+0x88/0x114
     zram_free_page+0x64/0xcc
     zram_slot_free_notify+0x90/0x108
     swap_entry_free+0x278/0x294
     free_swap_and_cache+0x38/0x11c
     unmap_single_vma+0x480/0x5c8
     unmap_vmas+0x44/0x60
     exit_mmap+0x50/0x110
     mmput+0x58/0xe0
     do_exit+0x320/0x8dc
     do_group_exit+0x44/0xa8
     get_signal+0x538/0x580
     do_signal+0x98/0x4b8
     do_notify_resume+0x14/0x5c

This patch keeps the lock bit in the migration path and updates the
value atomically.

BUG= chromium:703196 
TEST=build/boot on caroline

Signed-off-by: Junil Lee <junil0814.lee@lge.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: <stable@vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c102f07ca0b04f2cb49cfc161c83f6239d17f491)
Signed-off-by: Sonny Rao <sonnyrao@chromium.org>

Change-Id: Ibf8c8d03e1f994c42ff341912f3d69aac21d2345
Reviewed-on: https://chromium-review.googlesource.com/457150
Commit-Ready: Sonny Rao <sonnyrao@chromium.org>
Tested-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit a8c81f7aca71d637e67c38b13f95ab7660a00ae7)
Reviewed-on: https://chromium-review.googlesource.com/457785
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>
Commit-Queue: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/8c75b50bda0d7c5cd900836bc05a648e564b7f53/mm/zsmalloc.c


Comment 12 by bugdroid1@chromium.org, Mar 22 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/8c75b50bda0d7c5cd900836bc05a648e564b7f53

Status: Fixed (was: Untriaged)
I think this is probably fixed now. Reopen if it is seen again.

Comment 14 by son...@google.com, Mar 24 2017

Status: Verified (was: Fixed)
Not able to reproduce this issue on build 9334.20.0

Comment 15 by sheriffbot@chromium.org, Mar 27 2017

Cc: bhthompson@google.com
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Approved-58
Merge is complete.
