New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 908455 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Nov 27
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

USB3 NULL pointer dereference at logical disconnect

Project Member Reported by groeck@chromium.org, Nov 26

Issue description

The following crash is seen with teemo on a regular basis.

[  449.783006] audit: type=1400 audit(1542985832.796:692): avc:
granted  { execute } for  pid=4657 comm="init" name="dash" dev="sda3"
ino=16471 scontext=u:r:cros_init:s0 tcontext=u:object_r:sh_exec:s0
tclass=file
[  449.783066] audit: type=1400 audit(1542985832.796:693): avc:
granted  { execute } for  pid=4657 comm="cros-machine-id"
path="/bin/dash" dev="sda3" ino=16471 scontext=u:r:cros_init:s0
tcontext=u:object_r:sh_exec:s0 tclass=file
[  449.820209] BUG: unable to handle kernel NULL pointer dereference
at 000000000000001c
[  449.820213] IP: [<ffffffffbb18593b>] xhci_find_slot_id_by_port+0x20/0x56
[  449.820214] PGD 0
[  449.820215] Oops: 0000 [#1] PREEMPT SMP
[  449.822084] gsmi: Log Shutdown Reason 0x03
[  449.822098] Modules linked in: cmac rfcomm evdi uinput
snd_soc_kbl_rt5663_max98927 snd_soc_hdac_hdmi snd_seq_dummy snd_seq
snd_soc_skl_ssp_clk snd_soc_dmic snd_soc_skl snd_soc_skl_ipc
snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_sst_match snd_hda_ext_core
snd_hda_core cros_ec_cec r8169 bridge snd_soc_rt5663 acpi_als stp
kfifo_buf llc snd_soc_rl6231 industrialio ipt_MASQUERADE zram
nf_nat_masquerade_ipv4 xt_mark fuse btusb btrtl btbcm btintel joydev
bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
cdc_ether usbnet videobuf2_core snd_usb_audio snd_usbmidi_lib
snd_hwdep snd_rawmidi snd_seq_device ip6table_filter iwlmvm r8152 mii
iwlwifi iwl7000_mac80211 cfg80211
[  449.822099] CPU: 2 PID: 1341 Comm: mtpd Not tainted 4.4.162 #1
[  449.822100] Hardware name: Google Fizz/Fizz, BIOS
Google_Fizz.10139.141.0 06/13/2018
[  449.822101] task: ffff880172cc8e40 task.stack: ffff88006aaf8000
[  449.822102] RIP: 0010:[<ffffffffbb18593b>]  [<ffffffffbb18593b>]
xhci_find_slot_id_by_port+0x20/0x56
[  449.822103] RSP: 0018:ffff88006aafb728  EFLAGS: 00010086
[  449.822104] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[  449.822104] RDX: 0000000000000004 RSI: ffff8801798602a8 RDI: ffff880179860000
[  449.822105] RBP: ffff88006aafb728 R08: 0000000000000003 R09: ffff880075628000
[  449.822105] R10: 0000000000000001 R11: ffffffffbad61d78 R12: 00000000ffffffe0
[  449.822105] R13: 0000000000000004 R14: ffff8801798602a8 R15: ffff8801798602a8
[  449.822106] FS:  00007e5d0b31c740(0000) GS:ffff88017ec80000(0000)
knlGS:0000000000000000
[  449.822107] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  449.822107] CR2: 000000000000001c CR3: 000000006aacc000 CR4: 0000000000360670
[  449.822108] Stack:
[  449.822109]  ffff88006aafb800 ffffffffbb186c2e 0e99c6a447658e8c
ffffffffbb7ebccb
[  449.822110]  ffff88006aafb770 ffffffffbad611e1 ffff8801575434d0
00000000024080c0
[  449.822111]  ffff88017a003d00 ffff8801575434d0 000000000000000f
000000007a003d00
[  449.822112] Call Trace:
[  449.822114]  [<ffffffffbb186c2e>] xhci_hub_control+0x11cd/0x12aa
[  449.822116]  [<ffffffffbad611e1>] ? __might_sleep+0x41/0x7d
[  449.822118]  [<ffffffffbb297ebd>] usb_hcd_submit_urb+0x5bb/0x7b0
[  449.822119]  [<ffffffffbad60a52>] ? scheduler_tick+0xd6/0xd6
[  449.822120]  [<ffffffffbb156ae7>] ? usb_free_urb+0x48/0x63
[  449.822122]  [<ffffffffbadcd32b>] ? kfree+0x96/0x42c
[  449.822123]  [<ffffffffbb156ff2>] usb_submit_urb+0x383/0x51a
[  449.822124]  [<ffffffffbb1579c6>] usb_start_wait_urb+0x52/0xd7
[  449.822125]  [<ffffffffbb1577a8>] usb_control_msg+0xc5/0x122
[  449.822127]  [<ffffffffbb14de5b>] usb_clear_port_feature+0x31/0x37
[  449.822128]  [<ffffffffbb14f244>] usb_port_resume+0x114/0x432
[  449.822129]  [<ffffffffbb15afa9>] ? autosuspend_check+0xd1/0xd1
[  449.822130]  [<ffffffffbb1630a3>] generic_resume+0x15/0x1e
[  449.822131]  [<ffffffffbb15abb7>] usb_resume_both+0x9b/0x125
[  449.822132]  [<ffffffffbb15afc3>] usb_runtime_resume+0x1a/0x1c
[  449.822133]  [<ffffffffbb0e4977>] __rpm_callback+0xd5/0x184
[  449.822135]  [<ffffffffbae909f7>] ? do_sys_poll+0x38b/0x51f
[  449.822136]  [<ffffffffbb0e47d5>] rpm_callback+0x27/0x7e
[  449.822137]  [<ffffffffbb15afa9>] ? autosuspend_check+0xd1/0xd1
[  449.822138]  [<ffffffffbb295e89>] rpm_resume+0x4e3/0x598
[  449.822140]  [<ffffffffbb29598e>] __pm_runtime_resume+0x6b/0x83
[  449.822141]  [<ffffffffbb15acfa>] pm_runtime_get_sync+0xc/0xe
[  449.822142]  [<ffffffffbb15acd1>] usb_autoresume_device+0x1e/0x3b
[  449.822143]  [<ffffffffbb15ef3b>] usbdev_open+0xae/0x21c
[  449.822144]  [<ffffffffbadd5953>] chrdev_open+0x164/0x19b
[  449.822145]  [<ffffffffbae88a70>] vfs_open+0x1b0/0x2f3
[  449.822147]  [<ffffffffbae8c5d2>] path_openat+0x3e5/0x1110
[  449.822148]  [<ffffffffbae8c17e>] do_filp_open+0x79/0xe8
[  449.822149]  [<ffffffffbad5eef7>] ? task_work_run+0x7f/0xac
[  449.822150]  [<ffffffffbae8b340>] ? getname_flags+0x3b/0x1a5
[  449.822152]  [<ffffffffbb3fd1e7>] ? _raw_spin_unlock+0x1f/0x52
[  449.822153]  [<ffffffffbae88ceb>] SyS_open+0x106/0x2dc
[  449.822154]  [<ffffffffbacb1de1>] ? syscall_trace_enter_phase1+0xa7/0x121
[  449.822156]  [<ffffffffbb3fd923>] entry_SYSCALL_64_fastpath+0x31/0xab
[  449.822169] Code: e5 81 e7 e9 ff 00 4e 89 f8 5d c3 0f 1f 44 00 00
55 48 89 e5 31 c0 45 31 c0 eb 36 4e 8b 8c c6 38 02 00 00 4d 85 c9 74
26 49 8b 09 <83> 79 1c 05 41 0f 92 c2 83 bf c0 00 00 00 3f 0f 9f c1 41
38 ca
[  449.822170] RIP  [<ffffffffbb18593b>] xhci_find_slot_id_by_port+0x20/0x56
[  449.822171]  RSP <ffff88006aafb728>
[  449.822171] CR2: 000000000000001c
[  449.822172] ---[ end trace f3cac307afb127e4 ]---
[  449.828611] Kernel panic - not syncing: Fatal exception
[  449.836078] Kernel Offset: 0x39c00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  449.836231] gsmi: Log Shutdown Reason 0x02
[  457.753608] ACPI MEMORY or I/O RESET_REG.


 
Cc: lndmrk@chromium.org
Any chance this could be backported to R70 as well?
Cc: jtho@chromium.org
Labels: Kernel-4.4
I was told that there will be no more ChromeOS builds for R70.

Status: Started (was: Assigned)
Project Member

Comment 6 by bugdroid1@chromium.org, Nov 27

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/4be1b892ee2087b474654f9af605bc8572df41b2

commit 4be1b892ee2087b474654f9af605bc8572df41b2
Author: Mathias Nyman <mathias.nyman@linux.intel.com>
Date: Tue Nov 27 04:31:19 2018

UPSTREAM: xhci: Fix USB3 NULL pointer dereference at logical disconnect.

Hub driver will try to disable a USB3 device twice at logical disconnect,
racing with xhci_free_dev() callback from the first port disable.

This can be triggered with "udisksctl power-off --block-device <disk>"
or by writing "1" to the "remove" sysfs file for a USB3 device
in 4.17-rc4.

USB3 devices don't have a similar disabled link state as USB2 devices,
and use a U3 suspended link state instead. In this state the port
is still enabled and connected.

hub_port_connect() first disconnects the device, then later it notices
that device is still enabled (due to U3 states) it will try to disable
the port again (set to U3).

The xhci_free_dev() called during device disable is async, so checking
for existing xhci->devs[i] when setting link state to U3 the second time
was successful, even if device was being freed.

The regression was caused by, and whole thing revealed by,
Commit 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device")
which sets xhci->devs[i]->udev to NULL before xhci_virt_dev() returned.
and causes a NULL pointer dereference the second time we try to set U3.

Fix this by checking xhci->devs[i]->udev exists before setting link state.

The original patch went to stable so this fix needs to be applied there as
well.

Fixes: 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device")
Cc: <stable@vger.kernel.org>
Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2278446e2b7cd33ad894b32e7eb63afc7db6c86e)

BUG= chromium:908455 
TEST=udisksctl power-off --block-device <disk>

Change-Id: Ief142b666aadf014d00a8bb3e92a4d71030bc924
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1351289
Reviewed-by: Dariusz Marcinkiewicz <darekm@google.com>
Reviewed-by: Felix Ekblom <felixe@chromium.org>
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>

[modify] https://crrev.com/4be1b892ee2087b474654f9af605bc8572df41b2/drivers/usb/host/xhci-hub.c

Labels: Merge-Request-71 Merge-Request-72
Project Member

Comment 8 by sheriffbot@chromium.org, Nov 27

Labels: -Merge-Request-72 Merge-Review-72 Hotlist-Merge-Review
This bug requires manual review: We don't branch M72 until 2018-11-29.
Please contact the milestone owner if you have questions.
Owners: govind@(Android), kariahda@(iOS), djmm@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 9 by sheriffbot@chromium.org, Nov 27

Labels: -Merge-Request-71 Merge-Review-71
This bug requires manual review: We are only 6 days from stable.
Please contact the milestone owner if you have questions.
Owners: benmason@(Android), kariahda@(iOS), kbleicher@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Hi, I don't see comments regarding testing / verification. Given pending stable I'd like to capture those details
Labels: -Merge-Review-72
#10: What kind of testing information do you require ? For my education, and since we had this several times now, is there some common document describing those testing requirements that I can rely on in the future ?

Comments from Felix Ekblom (felixe@) about the patch as applied here:

"As pointed out by +Stefan Adolfsson, https://patchwork.kernel.org/patch/10394653/ might be interesting for the USB XHCI  issue #3  above. Since applying that patch locally I have not seen the USB crash (though not enough time has passed to say anything conclusive)."

Based on code inspection, my conclusion is 1) that the patch does indeed fix the observed problem, and that 2) the patch, even if it does not completely fix the problem, adds zero additional risk. If that is insufficient to accept the patch into M-71, I would suggest to let it rest in mainline for an extended period of time and verify with the following query on crash.corp.google.com if it is still seen and where.

"product_name='ChromeOS' AND EXISTS (SELECT 1 FROM UNNEST(productdata) WHERE Key='exec_name' AND STRPOS(Value, 'kernel') > 0) AND stable_signature like '%xhci_find_slot_id_by_port%'"

Unfortunately, Teemo devices do not currently show up with this query, so the result may be less than perfect, but maybe that will change in the future as the affected devices become more common in the field.

Status: Fixed (was: Started)
Labels: -Merge-Review-71 Merge-Approved-71
I'd like to see that the changes have been tested on M72, etc. and that the change had the desired effect without creating other problems.  Ideally on more than one board.

Submitting a CL with no testing detail makes it impossible to approve.

Thanks for the update

Approving merge to M71 Chrome OS.

Project Member

Comment 14 by bugdroid1@chromium.org, Nov 28

Labels: merge-merged-release-R71-11151.B-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b3236cae9e59881db1d87a5ce658f1a28b1cb278

commit b3236cae9e59881db1d87a5ce658f1a28b1cb278
Author: Mathias Nyman <mathias.nyman@linux.intel.com>
Date: Wed Nov 28 23:33:38 2018

UPSTREAM: xhci: Fix USB3 NULL pointer dereference at logical disconnect.

Hub driver will try to disable a USB3 device twice at logical disconnect,
racing with xhci_free_dev() callback from the first port disable.

This can be triggered with "udisksctl power-off --block-device <disk>"
or by writing "1" to the "remove" sysfs file for a USB3 device
in 4.17-rc4.

USB3 devices don't have a similar disabled link state as USB2 devices,
and use a U3 suspended link state instead. In this state the port
is still enabled and connected.

hub_port_connect() first disconnects the device, then later it notices
that device is still enabled (due to U3 states) it will try to disable
the port again (set to U3).

The xhci_free_dev() called during device disable is async, so checking
for existing xhci->devs[i] when setting link state to U3 the second time
was successful, even if device was being freed.

The regression was caused by, and whole thing revealed by,
Commit 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device")
which sets xhci->devs[i]->udev to NULL before xhci_virt_dev() returned.
and causes a NULL pointer dereference the second time we try to set U3.

Fix this by checking xhci->devs[i]->udev exists before setting link state.

The original patch went to stable so this fix needs to be applied there as
well.

Fixes: 44a182b9d177 ("xhci: Fix use-after-free in xhci_free_virt_device")
Cc: <stable@vger.kernel.org>
Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2278446e2b7cd33ad894b32e7eb63afc7db6c86e)

BUG= chromium:908455 
TEST=udisksctl power-off --block-device <disk>

Change-Id: Ief142b666aadf014d00a8bb3e92a4d71030bc924
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1351289
Reviewed-by: Dariusz Marcinkiewicz <darekm@google.com>
Reviewed-by: Felix Ekblom <felixe@chromium.org>
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>
(cherry picked from commit 4be1b892ee2087b474654f9af605bc8572df41b2)
Reviewed-on: https://chromium-review.googlesource.com/c/1352555

[modify] https://crrev.com/b3236cae9e59881db1d87a5ce658f1a28b1cb278/drivers/usb/host/xhci-hub.c

Project Member

Comment 15 by sheriffbot@chromium.org, Dec 3

Cc: kbleicher@google.com
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Approved-71

Sign in to add a comment