USB3 NULL pointer dereference at logical disconnect |
||||||||||||||
Issue description
The following crash is seen with teemo on a regular basis.
[ 449.783006] audit: type=1400 audit(1542985832.796:692): avc:
granted { execute } for pid=4657 comm="init" name="dash" dev="sda3"
ino=16471 scontext=u:r:cros_init:s0 tcontext=u:object_r:sh_exec:s0
tclass=file
[ 449.783066] audit: type=1400 audit(1542985832.796:693): avc:
granted { execute } for pid=4657 comm="cros-machine-id"
path="/bin/dash" dev="sda3" ino=16471 scontext=u:r:cros_init:s0
tcontext=u:object_r:sh_exec:s0 tclass=file
[ 449.820209] BUG: unable to handle kernel NULL pointer dereference
at 000000000000001c
[ 449.820213] IP: [<ffffffffbb18593b>] xhci_find_slot_id_by_port+0x20/0x56
[ 449.820214] PGD 0
[ 449.820215] Oops: 0000 [#1] PREEMPT SMP
[ 449.822084] gsmi: Log Shutdown Reason 0x03
[ 449.822098] Modules linked in: cmac rfcomm evdi uinput
snd_soc_kbl_rt5663_max98927 snd_soc_hdac_hdmi snd_seq_dummy snd_seq
snd_soc_skl_ssp_clk snd_soc_dmic snd_soc_skl snd_soc_skl_ipc
snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_sst_match snd_hda_ext_core
snd_hda_core cros_ec_cec r8169 bridge snd_soc_rt5663 acpi_als stp
kfifo_buf llc snd_soc_rl6231 industrialio ipt_MASQUERADE zram
nf_nat_masquerade_ipv4 xt_mark fuse btusb btrtl btbcm btintel joydev
bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
cdc_ether usbnet videobuf2_core snd_usb_audio snd_usbmidi_lib
snd_hwdep snd_rawmidi snd_seq_device ip6table_filter iwlmvm r8152 mii
iwlwifi iwl7000_mac80211 cfg80211
[ 449.822099] CPU: 2 PID: 1341 Comm: mtpd Not tainted 4.4.162 #1
[ 449.822100] Hardware name: Google Fizz/Fizz, BIOS
Google_Fizz.10139.141.0 06/13/2018
[ 449.822101] task: ffff880172cc8e40 task.stack: ffff88006aaf8000
[ 449.822102] RIP: 0010:[<ffffffffbb18593b>] [<ffffffffbb18593b>]
xhci_find_slot_id_by_port+0x20/0x56
[ 449.822103] RSP: 0018:ffff88006aafb728 EFLAGS: 00010086
[ 449.822104] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[ 449.822104] RDX: 0000000000000004 RSI: ffff8801798602a8 RDI: ffff880179860000
[ 449.822105] RBP: ffff88006aafb728 R08: 0000000000000003 R09: ffff880075628000
[ 449.822105] R10: 0000000000000001 R11: ffffffffbad61d78 R12: 00000000ffffffe0
[ 449.822105] R13: 0000000000000004 R14: ffff8801798602a8 R15: ffff8801798602a8
[ 449.822106] FS: 00007e5d0b31c740(0000) GS:ffff88017ec80000(0000)
knlGS:0000000000000000
[ 449.822107] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 449.822107] CR2: 000000000000001c CR3: 000000006aacc000 CR4: 0000000000360670
[ 449.822108] Stack:
[ 449.822109] ffff88006aafb800 ffffffffbb186c2e 0e99c6a447658e8c
ffffffffbb7ebccb
[ 449.822110] ffff88006aafb770 ffffffffbad611e1 ffff8801575434d0
00000000024080c0
[ 449.822111] ffff88017a003d00 ffff8801575434d0 000000000000000f
000000007a003d00
[ 449.822112] Call Trace:
[ 449.822114] [<ffffffffbb186c2e>] xhci_hub_control+0x11cd/0x12aa
[ 449.822116] [<ffffffffbad611e1>] ? __might_sleep+0x41/0x7d
[ 449.822118] [<ffffffffbb297ebd>] usb_hcd_submit_urb+0x5bb/0x7b0
[ 449.822119] [<ffffffffbad60a52>] ? scheduler_tick+0xd6/0xd6
[ 449.822120] [<ffffffffbb156ae7>] ? usb_free_urb+0x48/0x63
[ 449.822122] [<ffffffffbadcd32b>] ? kfree+0x96/0x42c
[ 449.822123] [<ffffffffbb156ff2>] usb_submit_urb+0x383/0x51a
[ 449.822124] [<ffffffffbb1579c6>] usb_start_wait_urb+0x52/0xd7
[ 449.822125] [<ffffffffbb1577a8>] usb_control_msg+0xc5/0x122
[ 449.822127] [<ffffffffbb14de5b>] usb_clear_port_feature+0x31/0x37
[ 449.822128] [<ffffffffbb14f244>] usb_port_resume+0x114/0x432
[ 449.822129] [<ffffffffbb15afa9>] ? autosuspend_check+0xd1/0xd1
[ 449.822130] [<ffffffffbb1630a3>] generic_resume+0x15/0x1e
[ 449.822131] [<ffffffffbb15abb7>] usb_resume_both+0x9b/0x125
[ 449.822132] [<ffffffffbb15afc3>] usb_runtime_resume+0x1a/0x1c
[ 449.822133] [<ffffffffbb0e4977>] __rpm_callback+0xd5/0x184
[ 449.822135] [<ffffffffbae909f7>] ? do_sys_poll+0x38b/0x51f
[ 449.822136] [<ffffffffbb0e47d5>] rpm_callback+0x27/0x7e
[ 449.822137] [<ffffffffbb15afa9>] ? autosuspend_check+0xd1/0xd1
[ 449.822138] [<ffffffffbb295e89>] rpm_resume+0x4e3/0x598
[ 449.822140] [<ffffffffbb29598e>] __pm_runtime_resume+0x6b/0x83
[ 449.822141] [<ffffffffbb15acfa>] pm_runtime_get_sync+0xc/0xe
[ 449.822142] [<ffffffffbb15acd1>] usb_autoresume_device+0x1e/0x3b
[ 449.822143] [<ffffffffbb15ef3b>] usbdev_open+0xae/0x21c
[ 449.822144] [<ffffffffbadd5953>] chrdev_open+0x164/0x19b
[ 449.822145] [<ffffffffbae88a70>] vfs_open+0x1b0/0x2f3
[ 449.822147] [<ffffffffbae8c5d2>] path_openat+0x3e5/0x1110
[ 449.822148] [<ffffffffbae8c17e>] do_filp_open+0x79/0xe8
[ 449.822149] [<ffffffffbad5eef7>] ? task_work_run+0x7f/0xac
[ 449.822150] [<ffffffffbae8b340>] ? getname_flags+0x3b/0x1a5
[ 449.822152] [<ffffffffbb3fd1e7>] ? _raw_spin_unlock+0x1f/0x52
[ 449.822153] [<ffffffffbae88ceb>] SyS_open+0x106/0x2dc
[ 449.822154] [<ffffffffbacb1de1>] ? syscall_trace_enter_phase1+0xa7/0x121
[ 449.822156] [<ffffffffbb3fd923>] entry_SYSCALL_64_fastpath+0x31/0xab
[ 449.822169] Code: e5 81 e7 e9 ff 00 4e 89 f8 5d c3 0f 1f 44 00 00
55 48 89 e5 31 c0 45 31 c0 eb 36 4e 8b 8c c6 38 02 00 00 4d 85 c9 74
26 49 8b 09 <83> 79 1c 05 41 0f 92 c2 83 bf c0 00 00 00 3f 0f 9f c1 41
38 ca
[ 449.822170] RIP [<ffffffffbb18593b>] xhci_find_slot_id_by_port+0x20/0x56
[ 449.822171] RSP <ffff88006aafb728>
[ 449.822171] CR2: 000000000000001c
[ 449.822172] ---[ end trace f3cac307afb127e4 ]---
[ 449.828611] Kernel panic - not syncing: Fatal exception
[ 449.836078] Kernel Offset: 0x39c00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 449.836231] gsmi: Log Shutdown Reason 0x02
[ 457.753608] ACPI MEMORY or I/O RESET_REG.
,
Nov 26
Any chance this could be backported to R70 as well?
,
Nov 26
,
Nov 26
I was told that there will be no more ChromeOS builds for R70.
,
Nov 26
,
Nov 27
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/4be1b892ee2087b474654f9af605bc8572df41b2 commit 4be1b892ee2087b474654f9af605bc8572df41b2 Author: Mathias Nyman <mathias.nyman@linux.intel.com> Date: Tue Nov 27 04:31:19 2018 UPSTREAM: xhci: Fix USB3 NULL pointer dereference at logical disconnect. Hub driver will try to disable a USB3 device twice at logical disconnect, racing with xhci_free_dev() callback from the first port disable. This can be triggered with "udisksctl power-off --block-device <disk>" or by writing "1" to the "remove" sysfs file for a USB3 device in 4.17-rc4. USB3 devices don't have a similar disabled link state as USB2 devices, and use a U3 suspended link state instead. In this state the port is still enabled and connected. hub_port_connect() first disconnects the device, then later it notices that device is still enabled (due to U3 states) it will try to disable the port again (set to U3). The xhci_free_dev() called during device disable is async, so checking for existing xhci->devs[i] when setting link state to U3 the second time was successful, even if device was being freed. The regression was caused by, and whole thing revealed by, Commit 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device") which sets xhci->devs[i]->udev to NULL before xhci_virt_dev() returned. and causes a NULL pointer dereference the second time we try to set U3. Fix this by checking xhci->devs[i]->udev exists before setting link state. The original patch went to stable so this fix needs to be applied there as well. Fixes: 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device") Cc: <stable@vger.kernel.org> Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch> Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 2278446e2b7cd33ad894b32e7eb63afc7db6c86e) BUG= chromium:908455 TEST=udisksctl power-off --block-device <disk> Change-Id: Ief142b666aadf014d00a8bb3e92a4d71030bc924 Signed-off-by: Guenter Roeck <groeck@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/1351289 Reviewed-by: Dariusz Marcinkiewicz <darekm@google.com> Reviewed-by: Felix Ekblom <felixe@chromium.org> Reviewed-by: Dmitry Torokhov <dtor@chromium.org> [modify] https://crrev.com/4be1b892ee2087b474654f9af605bc8572df41b2/drivers/usb/host/xhci-hub.c
,
Nov 27
,
Nov 27
This bug requires manual review: We don't branch M72 until 2018-11-29. Please contact the milestone owner if you have questions. Owners: govind@(Android), kariahda@(iOS), djmm@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Nov 27
This bug requires manual review: We are only 6 days from stable. Please contact the milestone owner if you have questions. Owners: benmason@(Android), kariahda@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Nov 27
Hi, I don't see comments regarding testing / verification. Given pending stable I'd like to capture those details
,
Nov 27
#10: What kind of testing information do you require ? For my education, and since we had this several times now, is there some common document describing those testing requirements that I can rely on in the future ? Comments from Felix Ekblom (felixe@) about the patch as applied here: "As pointed out by +Stefan Adolfsson, https://patchwork.kernel.org/patch/10394653/ might be interesting for the USB XHCI issue #3 above. Since applying that patch locally I have not seen the USB crash (though not enough time has passed to say anything conclusive)." Based on code inspection, my conclusion is 1) that the patch does indeed fix the observed problem, and that 2) the patch, even if it does not completely fix the problem, adds zero additional risk. If that is insufficient to accept the patch into M-71, I would suggest to let it rest in mainline for an extended period of time and verify with the following query on crash.corp.google.com if it is still seen and where. "product_name='ChromeOS' AND EXISTS (SELECT 1 FROM UNNEST(productdata) WHERE Key='exec_name' AND STRPOS(Value, 'kernel') > 0) AND stable_signature like '%xhci_find_slot_id_by_port%'" Unfortunately, Teemo devices do not currently show up with this query, so the result may be less than perfect, but maybe that will change in the future as the affected devices become more common in the field.
,
Nov 27
,
Nov 28
I'd like to see that the changes have been tested on M72, etc. and that the change had the desired effect without creating other problems. Ideally on more than one board. Submitting a CL with no testing detail makes it impossible to approve. Thanks for the update Approving merge to M71 Chrome OS.
,
Nov 28
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b3236cae9e59881db1d87a5ce658f1a28b1cb278 commit b3236cae9e59881db1d87a5ce658f1a28b1cb278 Author: Mathias Nyman <mathias.nyman@linux.intel.com> Date: Wed Nov 28 23:33:38 2018 UPSTREAM: xhci: Fix USB3 NULL pointer dereference at logical disconnect. Hub driver will try to disable a USB3 device twice at logical disconnect, racing with xhci_free_dev() callback from the first port disable. This can be triggered with "udisksctl power-off --block-device <disk>" or by writing "1" to the "remove" sysfs file for a USB3 device in 4.17-rc4. USB3 devices don't have a similar disabled link state as USB2 devices, and use a U3 suspended link state instead. In this state the port is still enabled and connected. hub_port_connect() first disconnects the device, then later it notices that device is still enabled (due to U3 states) it will try to disable the port again (set to U3). The xhci_free_dev() called during device disable is async, so checking for existing xhci->devs[i] when setting link state to U3 the second time was successful, even if device was being freed. The regression was caused by, and whole thing revealed by, Commit 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device") which sets xhci->devs[i]->udev to NULL before xhci_virt_dev() returned. and causes a NULL pointer dereference the second time we try to set U3. Fix this by checking xhci->devs[i]->udev exists before setting link state. The original patch went to stable so this fix needs to be applied there as well. Fixes: 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device") Cc: <stable@vger.kernel.org> Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch> Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 2278446e2b7cd33ad894b32e7eb63afc7db6c86e) BUG= chromium:908455 TEST=udisksctl power-off --block-device <disk> Change-Id: Ief142b666aadf014d00a8bb3e92a4d71030bc924 Signed-off-by: Guenter Roeck <groeck@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/1351289 Reviewed-by: Dariusz Marcinkiewicz <darekm@google.com> Reviewed-by: Felix Ekblom <felixe@chromium.org> Reviewed-by: Dmitry Torokhov <dtor@chromium.org> (cherry picked from commit 4be1b892ee2087b474654f9af605bc8572df41b2) Reviewed-on: https://chromium-review.googlesource.com/c/1352555 [modify] https://crrev.com/b3236cae9e59881db1d87a5ce658f1a28b1cb278/drivers/usb/host/xhci-hub.c
,
Dec 3
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 3
|
||||||||||||||
►
Sign in to add a comment |
||||||||||||||
Comment 1 by lndmrk@chromium.org
, Nov 26