mwifiex (pcie): remove races with reset |
|||||
Issue descriptionIt's easy to crash mwifiex_pcie (Marvell Wifi driver for PCIe devices) by trying to reset the firmware at the same time as removing the device. e.g., it's easy enough to lock up the system or crash the kernel with the following two commands: echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset rmmod mwifiex_pcie (Sometimes it takes a few tries to get the timing right.) There are upstream patches to fix this.
,
Feb 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f24d870959d6970a795f99d49c2e0bd5c59e976d commit f24d870959d6970a795f99d49c2e0bd5c59e976d Author: Christoph Hellwig <hch@lst.de> Date: Sat Feb 17 05:41:27 2018 UPSTREAM: PCI: Protect pci_error_handlers->reset_notify() usage with device_lock() Every method in struct device_driver or structures derived from it like struct pci_driver MUST provide exclusion vs the driver's ->remove() method, usually by using device_lock(). Protect use of pci_error_handlers->reset_notify() by holding the device lock while calling it. Note: - pci_dev_lock() calls device_lock() in addition to blocking user-space config accesses. - pci_err_handlers->reset_notify() is used inside pci_dev_save_and_disable() and pci_dev_restore(). We could hold the device lock directly in pci_reset_notify(), but we expand the region since we have several calls following each other. Without this, ->reset_notify() may race with ->remove() calls, which can be easily triggered in NVMe. [bhelgaas: changelog, add pci_reset_notify() comment] [bhelgaas: fold in fix from Dan Carpenter <dan.carpenter@oracle.com>: http://lkml.kernel.org/r/20170701135323.x5vaj4e2wcs2mcro@mwanda] Link: http://lkml.kernel.org/r/20170601111039.8913-2-hch@lst.de Reported-by: Rakesh Pandit <rakesh@tuxera.com> Tested-by: Rakesh Pandit <rakesh@tuxera.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> (cherry picked from commit b014e96d1abbd67404bbe2018937b46466299e9e) BUG= chromium:801716 TEST=`test_that ... network_WiFi_Reset` on kevin Change-Id: I76135ec347f6b1578cb2f3d2dd968bf7b23b64db Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/871173 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> [modify] https://crrev.com/f24d870959d6970a795f99d49c2e0bd5c59e976d/drivers/pci/pci.c
,
Feb 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/2aae7ff4506c785d4ef005d02f2139f2f1405688 commit 2aae7ff4506c785d4ef005d02f2139f2f1405688 Author: Brian Norris <briannorris@chromium.org> Date: Sat Feb 17 05:41:29 2018 UPSTREAM: mwifiex: resolve reset vs. remove()/shutdown() deadlocks Commit b014e96d1abb ("PCI: Protect pci_error_handlers->reset_notify() usage with device_lock()") resolves races between driver reset and removal, but it introduces some new deadlock problems. If we see a timeout while we've already started suspending, removing, or shutting down the driver, we might see: (a) a worker thread, running mwifiex_pcie_work() -> mwifiex_pcie_card_reset_work() -> pci_reset_function() (b) a removal thread, running mwifiex_pcie_remove() -> mwifiex_free_adapter() -> mwifiex_unregister() -> mwifiex_cleanup_pcie() -> cancel_work_sync(&card->work) Unfortunately, mwifiex_pcie_remove() already holds the device lock that pci_reset_function() is now requesting, and so we see a deadlock. It's necessary to cancel and synchronize our outstanding work before tearing down the driver, so we can't have this work wait indefinitely for the lock. It's reasonable to only "try" to reset here, since this will mostly happen for cases where it's already difficult to reset the firmware anyway (e.g., while we're suspending or powering off the system). And if reset *really* needs to happen, we can always try again later. Fixes: b014e96d1abb ("PCI: Protect pci_error_handlers->reset_notify() usage with device_lock()") Cc: <stable@vger.kernel.org> Cc: Xinming Hu <huxm@marvell.com> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> (cherry picked from commit a64e7a79dd6030479caad603c8d78e6c9c14904f) BUG= chromium:801716 TEST=`test_that ... network_WiFi_Reset` on kevin; also manual tests like this: `while : ; do echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset; sleep 0.05; rmmod mwifiex_pcie; modprobe mwifiex_pcie; sleep 2; done` Change-Id: I334990d3419bc0d0b3264ce87d9939d9ca8ff645 Reviewed-on: https://chromium-review.googlesource.com/871174 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Tested-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Grant Grundler <grundler@chromium.org> [modify] https://crrev.com/2aae7ff4506c785d4ef005d02f2139f2f1405688/drivers/net/wireless/marvell/mwifiex/pcie.c
,
Feb 21 2018
,
Feb 21 2018
,
Sep 13
Bulk verify old fixed bugs... |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by briannorris@chromium.org
, Jan 12 2018