Issue 862467

Issue metadata

Status: Fixed
Owner:
Closed: Jul 11
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Samus running the 4.14 kernel crashes in i915 during platform_MemoryPressure

Project Member Reported by sonnyrao@chromium.org, Jul 11

Issue description

Got this crash on Samus with the 4.14 kernel while running platform_MemoryPressure:

<4>[ 1466.846211] general protection fault: 0000 [#1] PREEMPT SMP PTI
<0>[ 1466.848522] gsmi: Log Shutdown Reason 0x03
<4>[ 1466.848525] Modules linked in: rfcomm cmac uinput btusb btrtl btbcm btintel bluetooth ecdh_generic snd_hda_codec_hdmi uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_hda_intel videobuf2_core snd_hda_codec snd_hwdep snd_hda_core lzo lzo_compress acpi_als kfifo_buf industrialio snd_soc_sst_acpi snd_soc_acpi snd_soc_acpi_intel_match zram xt_nat bridge stp llc snd_seq_dummy ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat xt_mark fuse snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device ip6table_filter iwlmvm iwl7000_mac80211 asix usbnet mii iwlwifi cfg80211 joydev
<4>[ 1466.848568] CPU: 1 PID: 3732 Comm: chrome Tainted: G     U          4.14.51 #2
<4>[ 1466.848570] Hardware name: GOOGLE Samus, BIOS Google_Samus.6300.174.0 04/02/2015
<4>[ 1466.848573] task: ffff8fa1b3ceab80 task.stack: ffffb4e5412f0000
<4>[ 1466.848580] RIP: 0010:gen8_ppgtt_alloc_pdp+0x17d/0x266
<4>[ 1466.848583] RSP: 0000:ffffb4e5412f38e0 EFLAGS: 00010206
<4>[ 1466.848586] RAX: 00056aaf40000000 RBX: ffff8fa11d4ec000 RCX: 00000001d1e50003
<4>[ 1466.848589] RDX: 0000000000001000 RSI: 00000001dd067083 RDI: 000000000000008d
<4>[ 1466.848591] RBP: ffffb4e5412f3950 R08: 0000000000000000 R09: 0000000000000000
<4>[ 1466.848593] R10: ffffffffb4728cc0 R11: ffffffffb4728cc0 R12: ffff8fa0bfd5e000
<4>[ 1466.848596] R13: 0000000011a53000 R14: ffff8fa0bcd278e0 R15: ffff8fa0bcd27560
<4>[ 1466.848599] FS:  000078172e24d740(0000) GS:ffff8fa1bec80000(0000) knlGS:0000000000000000
<4>[ 1466.848602] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 1466.848604] CR2: 0000259de817a168 CR3: 0000000219d4a005 CR4: 00000000003606e0
<4>[ 1466.848606] Call Trace:
<4>[ 1466.848613]  gen8_ppgtt_alloc_4lvl+0xcb/0x16a
<4>[ 1466.848617]  ppgtt_bind_vma+0x74/0x7d
<4>[ 1466.848623]  i915_vma_bind+0xc4/0xd9
<4>[ 1466.848627]  __i915_vma_do_pin+0x2e8/0x336
<4>[ 1466.848631]  ? __radix_tree_insert+0xaa/0xdc
<4>[ 1466.848634]  eb_lookup_vmas+0x3f6/0x89b
<4>[ 1466.848638]  i915_gem_do_execbuffer+0x4f1/0xdc1
<4>[ 1466.848644]  ? preempt_schedule_irq+0x3c/0x4e
<4>[ 1466.848647]  ? retint_kernel+0x1b/0x1d
<4>[ 1466.848651]  i915_gem_execbuffer2+0x18f/0x347
<4>[ 1466.848655]  ? i915_gem_execbuffer+0x289/0x289
<4>[ 1466.848660]  drm_ioctl_kernel+0x6c/0xa8
<4>[ 1466.848664]  drm_ioctl+0x267/0x353
<4>[ 1466.848668]  ? i915_gem_execbuffer+0x289/0x289
<4>[ 1466.848672]  ? __inode_security_revalidate+0x34/0x67
<4>[ 1466.848677]  vfs_ioctl+0x21/0x2f
<4>[ 1466.848681]  do_vfs_ioctl+0x4c4/0x4e7
<4>[ 1466.848685]  ? security_file_ioctl+0x3b/0x4f
<4>[ 1466.848689]  SyS_ioctl+0x57/0x79
<4>[ 1466.848693]  do_syscall_64+0x64/0x72
<4>[ 1466.848698]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
<4>[ 1466.848702] RIP: 0033:0x78172e841967
<4>[ 1466.848705] RSP: 002b:00007ffce8ebc798 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[ 1466.848708] RAX: ffffffffffffffda RBX: 0000000000000038 RCX: 000078172e841967
<4>[ 1466.848711] RDX: 00007ffce8ebc7e0 RSI: 0000000040406469 RDI: 0000000000000032
<4>[ 1466.848714] RBP: 00007ffce8ebc7c0 R08: 0000000000000098 R09: 00000b1797e0a000
<4>[ 1466.848717] R10: 000059accaaa1401 R11: 0000000000000246 R12: 0000000000000032
<4>[ 1466.848720] R13: 00000b179360cbe0 R14: 00007ffce8ebc7e0 R15: 0000000040406469
<4>[ 1466.848723] Code: 00 00 49 8b 3f 40 80 ce 83 e8 e6 e1 ff ff 48 8b 45 a8 4c 89 78 10 49 8b 3c 24 e8 c7 d1 ff ff 49 8b 4f 08 48 8b 7d 98 48 83 c9 03 <48> 89 0c f8 e8 31 dd ff ff 41 ff 84 24 10 10 00 00 8b 45 a4 41 
<1>[ 1466.848768] RIP: gen8_ppgtt_alloc_pdp+0x17d/0x266 RSP: ffffb4e5412f38e0
<4>[ 1466.848784] ---[ end trace bdafc554eb4c74a0 ]---
<0>[ 1466.849347] Kernel panic - not syncing: Fatal exception
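
A note for anyone decoding the oops above: the faulting bytes marked in the Code: line, <48> 89 0c f8, appear to decode as mov %rcx,(%rax,%rdi,8), i.e. a store of the new entry (RCX, low bits 0x3) into slot RDI of the table that RAX points to. RAX does not look like a kernel pointer, which would explain a general protection fault rather than a page fault. The sketch below just redoes that arithmetic. The helper names are made up for illustration, and 48-bit virtual addressing is an assumption about this machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Effective address of an x86-64 base + index*scale operand (no displacement). */
static uint64_t effective_address(uint64_t base, uint64_t index, unsigned int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale;
}

/* Assumption: 48-bit virtual addresses, so bits 63..47 must all be equal. */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the 17 bits that must match */

    return top == 0 || top == 0x1ffff;
}
```

With RAX = 0x00056aaf40000000 and RDI = 0x8d from the register dump, effective_address(rax, rdi, 8) gives 0x00056aaf40000468, which is_canonical_48() rejects. A normal kernel pointer such as RBX (0xffff8fa11d4ec000) passes the same check.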

This is fixed upstream by commit b715a2f0c7714a399e7f8e951cc8dea9cd4eeb4b
("drm/i915/ppgtt: Pin page directories before allocation").


I have no reason to believe this crash is Samus-specific, so I think it's worth landing the upstream fix for every i915 user on 4.14.



 
Project Member

Comment 1 by bugdroid1@chromium.org, Jul 11

Labels: merge-merged-chromeos-4.14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/402cee31d5c4ddaf8729ae9e1ca5a5a46ddc6a11

commit 402cee31d5c4ddaf8729ae9e1ca5a5a46ddc6a11
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jul 11 19:13:19 2018

UPSTREAM: drm/i915/ppgtt: Pin page directories before allocation

Commit e2b763caa6eb ("drm/i915: Remove bitmap tracking for used-pdpes")
believed that because it did not insert its freshly allocated page
directory into the pd tree, it was safe from the shrinker. I failed to
heed the lesson learnt from commit dd19674bacba ("drm/i915: Remove bitmap
tracking for used-ptes") that we need to pin all the levels in the tree
before hitting the shrinker or else the shrinker may free an upper layer
as we proceed to allocate the tree. Thus leaving dangling pointers
everywhere and a GPF should we hit direct reclaim at just the wrong
moment.

CPU: 0 PID: 7374 Comm: chromium Tainted: P           O    4.14.13-1-ARCH #1
Hardware name: Apple Inc. MacBookPro12,1/Mac-E43C1C25D4880AD6, BIOS MBP121.88Z.0167.B33.1706181928 06/18/2017
task: ffff994f696c2c40 task.stack: ffffb1a789d4c000
RIP: 0010:gen8_ppgtt_set_pde.isra.40+0x48/0x70 [i915]
RSP: 0018:ffffb1a789d4f940 EFLAGS: 00010206
RAX: 81c1788cc4f68138 RBX: ffff994f54db8000 RCX: ffff994f696c2c40
RDX: 000000023bc73003 RSI: ffff994d598b6b80 RDI: ffff994f54db8000
RBP: ffff994d598b6b80 R08: 0000000000000000 R09: 0000000000000000
R10: ffffb1a789d4f550 R11: ffff994eaf3c3208 R12: 0000000000000027
R13: 0000000000005000 R14: 0000000004e8f000 R15: ffff994f54dba000
FS:  00007f585886aa00(0000) GS:ffff994faec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004ac8e8 CR3: 00000002552c8004 CR4: 00000000003606f0
Call Trace:
 gen8_ppgtt_alloc_pdp+0x178/0x320 [i915]
 gen8_ppgtt_alloc_4lvl+0x5f/0x150 [i915]
 ppgtt_bind_vma+0x30/0x70 [i915]
 i915_vma_bind+0x68/0xd0 [i915]
 __i915_vma_do_pin+0x2d6/0x3a0 [i915]
 eb_lookup_vmas+0x7a2/0xb50 [i915]
 i915_gem_do_execbuffer+0x4d7/0x10e0 [i915]
 ? sock_wfree+0x34/0x60
 ? unix_stream_read_generic+0x1f9/0x7e0
 ? import_iovec+0x37/0xd0
 ? i915_gem_execbuffer2+0x5d/0x390 [i915]
 i915_gem_execbuffer2+0x1b7/0x390 [i915]
 ? i915_gem_execbuffer+0x2d0/0x2d0 [i915]
 drm_ioctl_kernel+0x59/0xb0 [drm]
 drm_ioctl+0x2d5/0x370 [drm]
 ? i915_gem_execbuffer+0x2d0/0x2d0 [i915]
 ? __seccomp_filter+0x3b/0x260
 do_vfs_ioctl+0xa1/0x610
 ? syscall_trace_enter+0xdb/0x2b0
 SyS_ioctl+0x74/0x80
 do_syscall_64+0x55/0x110
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f584fa82d27
RSP: 002b:00007ffee14a7828 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000003b0126a1030 RCX: 00007f584fa82d27
RDX: 00007ffee14a7870 RSI: 0000000040406469 RDI: 0000000000000080
RBP: 00007ffee14a7870 R08: 0000000000000002 R09: 0000000000000077
R10: 00007f5839f2b780 R11: 0000000000000246 R12: 0000000040406469
R13: 0000000000000080 R14: 00007f5842b00040 R15: 0000000000000000
Code: 01 00 83 81 58 0a 00 00 01 48 2b 05 13 9d fd c9 48 c1 f8 06 48 c1 e0 0c 48 8d 04 d0 48 8b 56 08 48 03 05 0c 9d fd c9 48 83 ca 03 <48> 89 10 83 a9 58 0a 00 00 01 65 ff 0d 37 03 fb 3e 74 02 f3 c3
RIP: gen8_ppgtt_set_pde.isra.40+0x48/0x70 [i915] RSP: ffffb1a789d4f940

BUG= chromium:862467 
TEST=run platform_MemoryPressure on Samus running 4.14 -- no kernel crashes

Reported-by: Eric Blau <eblau@eblau.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104773
Fixes: e2b763caa6eb ("drm/i915: Remove bitmap tracking for used-pdpes")
References: dd19674bacba ("drm/i915: Remove bitmap tracking for used-ptes")
Testcase: igt/drv_selftest/live_gtt (igt_ppgtt_shrink_boom)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180131214440.7141-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
(cherry picked from commit b715a2f0c7714a399e7f8e951cc8dea9cd4eeb4b)
Signed-off-by: Sonny Rao <sonnyrao@chromium.org>

Change-Id: I8efaa2119377f2eecd5fa37bafba23391e04fc63
Reviewed-on: https://chromium-review.googlesource.com/1132617
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>

[modify] https://crrev.com/402cee31d5c4ddaf8729ae9e1ca5a5a46ddc6a11/drivers/gpu/drm/i915/i915_gem_gtt.c

Status: Fixed (was: Started)
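
For context, the ordering problem described in the commit message can be modelled with a small toy. This is not the i915 code: the struct and function names (toy_vm, toy_alloc, toy_shrink and so on) are hypothetical, the "shrinker" is a plain function call, and a simple counter stands in for the driver's used-entries tracking. It only shows why the parent level has to be pinned before any allocation that might enter reclaim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define ENTRIES 4

/* Hypothetical upper-level table: reclaimable while `used` is 0. */
struct page_dir {
    unsigned int used;          /* pin / live-entry count */
    void *entry[ENTRIES];
};

struct toy_vm {
    struct page_dir *pd;        /* upper level, NULL once reclaimed */
    bool reclaim_pending;       /* simulate memory pressure on the next allocation */
};

/* Stand-in for the shrinker: frees the upper level if nothing pins it. */
static void toy_shrink(struct toy_vm *vm)
{
    if (vm->pd && vm->pd->used == 0) {
        free(vm->pd);
        vm->pd = NULL;
    }
}

/* Stand-in for an allocation that may enter direct reclaim. */
static void *toy_alloc(struct toy_vm *vm, size_t size)
{
    if (vm->reclaim_pending) {
        vm->reclaim_pending = false;
        toy_shrink(vm);
    }
    return calloc(1, size);
}

/* Unpinned order: reports whether the parent is still there afterwards. */
static bool toy_parent_survives_unpinned(struct toy_vm *vm)
{
    void *child = toy_alloc(vm, 64);

    free(child);
    return vm->pd != NULL;
}

/* Pinned order: take the pin first, then allocate the child. */
static bool toy_alloc_entry_pinned(struct toy_vm *vm, unsigned int idx)
{
    struct page_dir *pd = vm->pd;
    void *child;

    assert(pd && idx < ENTRIES);
    pd->used++;                 /* pin before anything that may reclaim */
    child = toy_alloc(vm, 64);
    if (!child) {
        pd->used--;             /* drop the pin again on failure */
        return false;
    }
    assert(vm->pd == pd);       /* the pin kept the parent alive */
    pd->entry[idx] = child;     /* the pin now accounts for the live entry */
    return true;
}
```

In this toy, toy_parent_survives_unpinned() returns false when reclaim_pending is set, because the parent is freed while the child is being allocated. In the real driver a caller in that position would then write through a stale pointer, which matches the GPF in the traces. toy_alloc_entry_pinned() under the same pressure leaves the parent in place with used == 1.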
