New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 801716 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

mwifiex (pcie): remove races with reset

Project Member Reported by briannorris@chromium.org, Jan 12 2018

Issue description

It's easy to crash mwifiex_pcie (Marvell Wifi driver for PCIe devices) by trying to reset the firmware at the same time as removing the device. e.g., it's easy enough to lock up the system or crash the kernel with the following two commands:

echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
rmmod mwifiex_pcie

(Sometimes it takes a few tries to get the timing right.)

There are upstream patches to fix this.
 
Labels: Kernel-4.4
Project Member

Comment 2 by bugdroid1@chromium.org, Feb 17 2018

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f24d870959d6970a795f99d49c2e0bd5c59e976d

commit f24d870959d6970a795f99d49c2e0bd5c59e976d
Author: Christoph Hellwig <hch@lst.de>
Date: Sat Feb 17 05:41:27 2018

UPSTREAM: PCI: Protect pci_error_handlers->reset_notify() usage with device_lock()

Every method in struct device_driver or structures derived from it like
struct pci_driver MUST provide exclusion vs the driver's ->remove() method,
usually by using device_lock().

Protect use of pci_error_handlers->reset_notify() by holding the device
lock while calling it.

Note:

  - pci_dev_lock() calls device_lock() in addition to blocking user-space
    config accesses.

  - pci_err_handlers->reset_notify() is used inside
    pci_dev_save_and_disable() and pci_dev_restore().  We could hold the
    device lock directly in pci_reset_notify(), but we expand the region
    since we have several calls following each other.

Without this, ->reset_notify() may race with ->remove() calls, which can be
easily triggered in NVMe.

[bhelgaas: changelog, add pci_reset_notify() comment]
[bhelgaas: fold in fix from Dan Carpenter <dan.carpenter@oracle.com>:
http://lkml.kernel.org/r/20170701135323.x5vaj4e2wcs2mcro@mwanda]
Link: http://lkml.kernel.org/r/20170601111039.8913-2-hch@lst.de
Reported-by: Rakesh Pandit <rakesh@tuxera.com>
Tested-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
(cherry picked from commit b014e96d1abbd67404bbe2018937b46466299e9e)

BUG= chromium:801716 
TEST=`test_that ... network_WiFi_Reset` on kevin

Change-Id: I76135ec347f6b1578cb2f3d2dd968bf7b23b64db
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/871173
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>

[modify] https://crrev.com/f24d870959d6970a795f99d49c2e0bd5c59e976d/drivers/pci/pci.c

Project Member

Comment 3 by bugdroid1@chromium.org, Feb 17 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/2aae7ff4506c785d4ef005d02f2139f2f1405688

commit 2aae7ff4506c785d4ef005d02f2139f2f1405688
Author: Brian Norris <briannorris@chromium.org>
Date: Sat Feb 17 05:41:29 2018

UPSTREAM: mwifiex: resolve reset vs. remove()/shutdown() deadlocks

Commit b014e96d1abb ("PCI: Protect pci_error_handlers->reset_notify()
usage with device_lock()") resolves races between driver reset and
removal, but it introduces some new deadlock problems. If we see a
timeout while we've already started suspending, removing, or shutting
down the driver, we might see:

(a) a worker thread, running mwifiex_pcie_work() ->
    mwifiex_pcie_card_reset_work() -> pci_reset_function()
(b) a removal thread, running mwifiex_pcie_remove() ->
    mwifiex_free_adapter() -> mwifiex_unregister() ->
    mwifiex_cleanup_pcie() -> cancel_work_sync(&card->work)

Unfortunately, mwifiex_pcie_remove() already holds the device lock that
pci_reset_function() is now requesting, and so we see a deadlock.

It's necessary to cancel and synchronize our outstanding work before
tearing down the driver, so we can't have this work wait indefinitely
for the lock.

It's reasonable to only "try" to reset here, since this will mostly
happen for cases where it's already difficult to reset the firmware
anyway (e.g., while we're suspending or powering off the system). And if
reset *really* needs to happen, we can always try again later.

Fixes: b014e96d1abb ("PCI: Protect pci_error_handlers->reset_notify() usage with device_lock()")
Cc: <stable@vger.kernel.org>
Cc: Xinming Hu <huxm@marvell.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
(cherry picked from commit a64e7a79dd6030479caad603c8d78e6c9c14904f)

BUG= chromium:801716 
TEST=`test_that ... network_WiFi_Reset` on kevin; also manual tests like
     this:
     `while : ; do echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset;
     sleep 0.05; rmmod mwifiex_pcie; modprobe mwifiex_pcie; sleep 2;
     done`

Change-Id: I334990d3419bc0d0b3264ce87d9939d9ca8ff645
Reviewed-on: https://chromium-review.googlesource.com/871174
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>

[modify] https://crrev.com/2aae7ff4506c785d4ef005d02f2139f2f1405688/drivers/net/wireless/marvell/mwifiex/pcie.c

Status: Fixed (was: Started)
Cc: grundler@chromium.org diand...@chromium.org
Status: Verified (was: Fixed)
Bulk verify old fixed bugs...

Sign in to add a comment