Issue 666587

Starred by 4 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




Kernel crash in iwlwifi on repeated rmmod/modprobe

Project Member Reported by kirtika@chromium.org, Nov 18 2016

Issue description

OS: Chrome OS cyan-release/R55-8805.0.0

I am running the following command on a cyan release and can reliably repro a crash/reboot each time:

for i in `seq 1 100`; do echo "-- attempt $i ---"; rmmod bluetooth; rmmod iwlmvm iwlwifi; modprobe iwlwifi; modprobe bluetooth;done

(before this, I ran 'stop bluetoothd' and 'rmmod btbcm btrtl btusb rfcomm btintel' to allow rmmod bluetooth to succeed). 


/dev/pstore/console-ramoops output is attached.



[ 3469.666830] audit: type=1400 audit(1479438858.420:363): avc:  denied  { ioctl } for  pid=883 comm="wpa_supplicant" path="socket:[43227]" dev="sockfs" ino=43227 ioctlcmd=8933 scontext=u:r:chromeos:s0 tcontext=u:r:chromeos:s0 tclass=unix_dgram_socket permissive=1
[ 3469.668376] iwlwifi 0000:02:00.0: L1 Enabled - LTR Enabled
[ 3469.669036] iwlwifi 0000:02:00.0: L1 Enabled - LTR Enabled
[ 3469.730920] iwlwifi 0000:02:00.0: L1 Enabled - LTR Enabled
[ 3469.731557] iwlwifi 0000:02:00.0: L1 Enabled - LTR Enabled
[ 3469.766490] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 3469.766514] IP: [<ffffffffc0387108>] __iwl7000_regulatory_set_wiphy_regd+0x295/0x2df [iwl7000_mac80211]
[ 3469.766541] PGD 0 
[ 3469.766548] Oops: 0000 [#1] PREEMPT SMP 
[ 3469.770398] gsmi: Log Shutdown Reason 0x03
[ 3469.770407] Modules linked in: iwlmvm(-) iwlwifi cdc_ether usbnet evdi uinput ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat xt_mark snd_hda_codec_hdmi bridge snd_hda_intel snd_hda_codec i2c_dev snd_hwdep snd_soc_sst_cht_bsw_max98090_ti snd_soc_max98090 snd_hda_core snd_intel_sst_acpi snd_soc_sst_acpi snd_intel_sst_core snd_soc_sst_mfld_platform memconsole_x86_legacy memconsole stp llc fuse zram ip6table_filter r8152 mii iwl7000_mac80211 cfg80211 iio_trig_sysfs uvcvideo cros_ec_sensors_ring videobuf2_vmalloc videobuf2_memops cros_ec_sensors joydev videobuf2_core cros_ec_sensors_core industrialio_triggered_buffer kfifo_buf industrialio snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device ppp_async ppp_generic slhc tun [last unloaded: bluetooth]
[ 3469.770583] CPU: 3 PID: 883 Comm: wpa_supplicant Tainted: G        W      3.18.0-13242-g686d930 #1
[ 3469.770595] Hardware name: GOOGLE Cyan, BIOS Google_Cyan.7287.57.82 08/21/2016
[ 3469.770606] task: ffff880078225220 ti: ffff88017208c000 task.ti: ffff88017208c000
[ 3469.770616] RIP: 0010:[<ffffffffc0387108>]  [<ffffffffc0387108>] __iwl7000_regulatory_set_wiphy_regd+0x295/0x2df [iwl7000_mac80211]
[ 3469.770643] RSP: 0018:ffff88017208f9e8  EFLAGS: 00010246
[ 3469.770651] RAX: ffffffffc03a9880 RBX: 0000000000000000 RCX: 00000000c03a0b0a
[ 3469.770661] RDX: ffffffffc03a0a0a RSI: ffff880075e528e0 RDI: ffffffffc03ae0b0
[ 3469.770671] RBP: ffff88017208fa68 R08: 0000000000000007 R09: ffff8800639616c4
[ 3469.770681] R10: ffffffffc03d7b74 R11: ffffffffc03d7b74 R12: ffff880063961600
[ 3469.770691] R13: ffff8800789302a0 R14: ffff880078932528 R15: ffff880078932528
[ 3469.770701] FS:  00007ff301aa4700(0000) GS:ffff88017fd80000(0000) knlGS:0000000000000000
[ 3469.770712] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3469.770721] CR2: 0000000000000010 CR3: 0000000172071000 CR4: 00000000001007e0
[ 3469.770730] Stack:
[ 3469.770735]  00000000000001c1 00000000d82e4853 ffff880163618790 ffffffffc03d7b74
[ 3469.770751]  ffffffffc03d1c40 ffff88017a8e5098 0100000200003030 ffff880075e52814
[ 3469.770765]  ffff880163618780 00000000d82e4853 ffff8800789318a0 ffff8800789302a0
[ 3469.770780] Call Trace:
[ 3469.770798]  [<ffffffffc0387199>] __iwl7000_regulatory_set_wiphy_regd_sync_rtnl+0x47/0x5d [iwl7000_mac80211]
[ 3469.770817]  [<ffffffffc052b2b4>] iwl_mvm_init_mcc+0x2a0/0x2cb [iwlmvm]
[ 3469.770831]  [<ffffffffc05248c4>] iwl_mvm_up+0x49b/0x60b [iwlmvm]
[ 3469.770845]  [<ffffffffc0529769>] __iwl_mvm_mac_start+0x253/0x389 [iwlmvm]
[ 3469.770860]  [<ffffffffc0529870>] __iwl_mvm_mac_start+0x35a/0x389 [iwlmvm]
[ 3469.770877]  [<ffffffffc02e9e3f>] ? cfg80211_leave+0x470/0x90b [cfg80211]
[ 3469.770895]  [<ffffffffc033ffeb>] drv_start+0x64/0xbb [iwl7000_mac80211]
[ 3469.770914]  [<ffffffffc035247d>] ieee80211_do_open+0x176/0x685 [iwl7000_mac80211]
[ 3469.770933]  [<ffffffffc0352987>] ieee80211_do_open+0x680/0x685 [iwl7000_mac80211]
[ 3469.770951]  [<ffffffffa4db7232>] __dev_open+0x91/0xcd
[ 3469.770961]  [<ffffffffa4db74b0>] __dev_change_flags+0xaa/0x141
[ 3469.770972]  [<ffffffffa4db756f>] dev_change_flags+0x28/0x5f
[ 3469.770985]  [<ffffffffa4e25fd7>] devinet_ioctl+0x31a/0x661
[ 3469.770998]  [<ffffffffa492e23a>] ? might_fault+0x3e/0x40
[ 3469.771009]  [<ffffffffa4e26a81>] inet_ioctl+0x8d/0xa9
[ 3469.771020]  [<ffffffffa4d9ca09>] sock_do_ioctl+0x27/0x45
[ 3469.771030]  [<ffffffffa4d9cc39>] sock_ioctl+0x212/0x21f
[ 3469.771042]  [<ffffffffa4a27f50>] ? ioctl_has_perm+0x9a/0xda
[ 3469.771053]  [<ffffffffa495f565>] do_vfs_ioctl+0x39a/0x460
[ 3469.771064]  [<ffffffffa495f685>] SyS_ioctl+0x5a/0x7f
[ 3469.771075]  [<ffffffffa4e96f9c>] system_call_fastpath+0x1c/0x21
[ 3469.771084] Code: ff e9 93 fe ff ff 4c 3b 6a 98 74 10 48 8b 12 48 81 fa 80 98 3a c0 75 ee 31 db eb 03 48 89 d3 48 c7 c7 b0 e0 3a c0 e8 33 f7 b0 e4 <4c> 8b 6b 10 48 c7 c7 b0 e0 3a c0 4c 89 63 10 e8 8e f7 b0 e4 4c 
[ 3469.771189] RIP  [<ffffffffc0387108>] __iwl7000_regulatory_set_wiphy_regd+0x295/0x2df [iwl7000_mac80211]
[ 3469.771209]  RSP <ffff88017208f9e8>
[ 3469.771215] CR2: 0000000000000010
[ 3469.771223] ---[ end trace ba26185cd92ef563 ]---
[ 3469.787297] Kernel panic - not syncing: Fatal exception
[ 3469.787361] Kernel Offset: 0x23800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 3469.787616] gsmi: Log Shutdown Reason 0x02
[ 3469.804164] ACPI MEMORY or I/O RESET_REG.



 

Comment 1 by matt.c...@intel.com, Nov 18 2016

Hi Kirtika,

OK, I will take a look at it.

Comment 2 by matt.c...@intel.com, Nov 21 2016

Hi Kirtika,

OK, it looks like this is easy to reproduce. I will look at the code; I am not sure yet whether it is a race.

Comment 3 by matt.c...@intel.com, Nov 22 2016

Hi Kirtika,

It looks like the radio is being brought up, which calls into iwlmvm.ko, at the same time that iwlmvm.ko is being removed. The mvm-related data is lost as a result, and we run into:
unable to handle kernel NULL pointer dereference at 0000000000000010

rmmod is probably a bit brutal. I will try "modprobe -r" to see whether it waits until iwlmvm is no longer in use, or refuses to remove iwlmvm.ko.
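
To make the fault address concrete: the register dump in the report shows RBX = 0 and CR2 = 0x10, and the faulting instruction bytes `4c 8b 6b 10` decode to a load from `0x10(%rbx)`. That pattern is what reading a structure member at offset 0x10 through a NULL pointer looks like. The sketch below is only an illustration. `struct list_node`, `struct regd_entry`, and `regd_fault_address()` are hypothetical names, since the real structure used by `__iwl7000_regulatory_set_wiphy_regd` is not shown in this report.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel list node: two pointers. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* Hypothetical entry layout (assumption, not the real driver structure). */
struct regd_entry {
    struct list_node list;  /* offset 0x00 on a 64-bit build */
    const void *regd;       /* offset 0x10 on a 64-bit build */
};

/*
 * Compute the address the CPU would try to read for entry->regd.
 * Nothing is dereferenced here, so passing NULL is safe.
 */
uintptr_t regd_fault_address(const struct regd_entry *entry)
{
    return (uintptr_t)entry + offsetof(struct regd_entry, regd);
}
```

On a 64-bit build, `regd_fault_address(NULL)` evaluates to 0x10, which matches the CR2 value in the oops.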

Comment 4 by matt.c...@intel.com, Nov 22 2016

Hi Kirtika,

I am debugging this issue. It could be a race condition, so I am working on another way to look at it.

Comment 5 by kirtika@google.com, Jul 28 2017

Owner: curtissa@google.com

Comment 6 by kirtika@google.com, Jul 28 2017

Assigning to curtissa@ as he is looking at adding a stress test for this. 

Project Member

Comment 7 by sheriffbot@chromium.org, Feb 12 2018

Labels: Hotlist-Recharge-BouncingOwner
Owner: ----
The assigned owner "curtissa@google.com" is not able to receive e-mails, please re-triage.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 8 by kirtika@google.com, Mar 8 2018

Cc: grundler@chromium.org
Owner: kirtika@google.com
Status: Assigned (was: Untriaged)
Matt, thanks for looking into this. Module removal has _always_ raced with every other activity the driver performs: IRQs, DMA, control data, and registration with other subsystems. Even today, one of the oldest NIC drivers (tulip) still has unfixed race conditions.

Each race needs to be handled with "shared state" (e.g. local driver state such as is_being_removed), careful ordering of shutdown, and polling or some other guarantee at each step that the functionality is properly quiesced.

In general, the high level sequence is to:
o stop accepting new requests
o unregister from subsystems
o stop IRQs
o stop or wait for in-flight DMA
o disable PCI device
o abandon or otherwise "complete" outstanding transactions
o unmap DMA'able memory
o release all other memory
o unbind from device

But some of this will vary depending on what system services the device uses and how they are used.
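
The "shared state" idea above can be sketched as a small single-threaded userspace model. This is not driver code: `struct fake_driver`, `fake_submit()`, `fake_complete()`, and `fake_remove()` are hypothetical names, and a real driver would also need locking or atomics plus the subsystem-specific steps from the list above. The model only shows the ordering: refuse new work first, wait until in-flight work has drained, and release resources last.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver state (assumption, for illustration only). */
struct fake_driver {
    bool is_being_removed;    /* shared flag checked on every new request */
    int in_flight;            /* requests submitted but not yet completed */
    bool resources_released;  /* set only after the driver is quiesced */
};

/* Accept a new request unless removal has started. */
bool fake_submit(struct fake_driver *drv)
{
    if (drv->is_being_removed)
        return false;
    drv->in_flight++;
    return true;
}

/* Mark one outstanding request as finished. */
void fake_complete(struct fake_driver *drv)
{
    assert(drv->in_flight > 0);
    drv->in_flight--;
}

/*
 * One polling step of removal. Returns true once teardown is done;
 * false means work is still in flight and the caller must poll again.
 */
bool fake_remove(struct fake_driver *drv)
{
    drv->is_being_removed = true;    /* 1. stop accepting new requests */
    if (drv->in_flight > 0)
        return false;                /* 2. not quiesced yet */
    drv->resources_released = true;  /* 3. only now release resources */
    return true;
}
```

In this model, a request submitted before `fake_remove()` keeps the resources alive until `fake_complete()` is called, and any `fake_submit()` issued after removal has started is rejected.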
Cc: abhishekbh@chromium.org
Abhishek and I talked briefly about this offline.  One concern that came up is that (AIUI) iwlmvm.ko (formally) depends on iwlwifi.ko, because it calls iwl_opmode_register() to register a bunch of callbacks during module initialization.  Thereafter, iwlwifi (informally) depends on iwlmvm because iwlwifi can invoke any of these callbacks when it needs to perform an MVM operation.

So what is done to make sure that iwlwifi will not try to invoke an iwlmvm callback while iwlmvm is in the process of getting unloaded?

(I'm also curious as to whether the failure is seen if you run `ifconfig wlan0 down` to quiesce iwlwifi prior to unloading any modules.)

Should iwlwifi call try_module_get() on iwlmvm when wlan0 comes up, and call module_put() when wlan0 goes down?  What I see on my local system is that iwlmvm's use count is 0.
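
The pinning idea in the question above can be illustrated with a userspace model loosely based on the kernel's try_module_get() and module_put(). Everything here is an assumption made for illustration: `struct fake_module` and the `fake_*` functions are hypothetical, and this is not what iwlwifi does today. The point is only that a nonzero use count makes an unload attempt fail instead of racing with an interface that is up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a module's unload bookkeeping. */
struct fake_module {
    int refcnt;   /* use count, as reported by lsmod */
    bool going;   /* set once unloading has started */
};

/* Take a reference; fails if the module is already being unloaded. */
bool fake_try_module_get(struct fake_module *m)
{
    if (m->going)
        return false;
    m->refcnt++;
    return true;
}

/* Drop a reference taken by fake_try_module_get(). */
void fake_module_put(struct fake_module *m)
{
    assert(m->refcnt > 0);
    m->refcnt--;
}

/* Model of a non-forced rmmod: refuse while the module is in use. */
bool fake_unload(struct fake_module *m)
{
    if (m->refcnt > 0)
        return false;
    m->going = true;
    return true;
}

/* Bringing the interface up pins the op-mode module. */
bool fake_iface_up(struct fake_module *opmode)
{
    return fake_try_module_get(opmode);
}

/* Taking the interface down releases the pin. */
void fake_iface_down(struct fake_module *opmode)
{
    fake_module_put(opmode);
}
```

With this model, `fake_unload()` fails while the interface is up and succeeds after `fake_iface_down()`. Once unloading has begun, `fake_iface_up()` fails cleanly rather than calling into a module that is going away.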
Owner: kirtika@chromium.org
