Somewhere around 5 percent of all kernel crash reports have "kmem_cache_alloc" in the signature for 10032.86.0 release. The query is:
product.name='ChromeOS' AND EXISTS (SELECT 1 FROM UNNEST(productdata)
WHERE Key='exec_name' AND Value='kernel') AND product.Version='10032.86.0' AND
experimental.stable_signature_2 LIKE '%kmem_cache_alloc%'
The problem is kmem_cache_alloc is the "victim" of something else screwing up the mem pool it operates on. This is what I'm seeing locally when I try to summarize the crash reports I've downloaded (en mass):
fgrep kmem_cache_alloc */10032.86.0/feedback-*-kernel-10032.86.0-20180213.txt
CHELL/10032.86.0/feedback-CHELL-kernel-10032.86.0-20180213.txt: 69 kmem_cache_alloc+0x96/0x13f
GNAWTY/10032.86.0/feedback-GNAWTY-kernel-10032.86.0-20180213.txt: 635 kmem_cache_alloc+0x6e/0x123
PAINE/10032.86.0/feedback-PAINE-kernel-10032.86.0-20180213.txt: 274 kmem_cache_alloc+0x69/0x10c
SAMUS/10032.86.0/feedback-SAMUS-kernel-10032.86.0-20180213.txt: 10 kmem_cache_alloc+0x69/0x10c
SWANKY/10032.86.0/feedback-SWANKY-kernel-10032.86.0-20180213.txt: 162 kmem_cache_alloc+0x69/0x18e
TERRA/10032.86.0/feedback-TERRA-kernel-10032.86.0-20180213.txt: 1253 kmem_cache_alloc+0x96/0x13f
Let's just look at GNAWTY (chromeos-4.4) though this is clearly happening on other platforms:
find GNAWTY/10032.86.0/ -name \*.kcrash | wc -l
12268
grundler <2101>find GNAWTY/10032.86.0/ -name \*.kcrash | xargs grep -m1 "RIP: .* kmem_cache_alloc+" | wc -l
648
First clue is to look at "WARNINGS" from those 648 crash reports JUST from iwlwifi driver:
find GNAWTY/10032.86.0/ -name \*.kcrash | xargs grep -l -m1 "RIP: .* kmem_cache_alloc+" | xargs grep 'WARNING: CPU: .*v4.4/drivers/net/wireless/iwl7000/' | sed -e 's;.*v4.4/drivers/net/wireless/iwl7000/;;' | sort | uniq -c
2 iwlwifi/mvm/binding.c:216 iwl_mvm_binding_remove_vif+0x51/0x1e3 [iwlmvm]
31 iwlwifi/mvm/mac80211.c:2190 iwl_mvm_recalc_multicast+0x399/0x749 [iwlmvm]
2 iwlwifi/mvm/mac80211.c:2268 iwl_mvm_recalc_multicast+0x52b/0x749 [iwlmvm]
4 iwlwifi/mvm/mac80211.c:2949 iwl_mvm_rx_mfuart_notif+0x26b4/0x305e [iwlmvm]
26 iwlwifi/mvm/mac80211.c:2954 iwl_mvm_rx_mfuart_notif+0x26b4/0x305e [iwlmvm]
85 iwlwifi/mvm/time-event.c:512 iwl_mvm_te_clear_data+0xd8/0x2ad [iwlmvm]
902 iwlwifi/mvm/time-event.c:724 iwl_mvm_remove_time_event+0xb2/0xcf [iwlmvm]
2 iwlwifi/mvm/tx.c:1514 iwl_mvm_rx_tx_cmd+0x415/0x8b6 [iwlmvm]
8 iwlwifi/mvm/utils.c:804 iwl_mvm_enable_txq+0x311/0x352 [iwlmvm]
4 iwlwifi/mvm/utils.c:886 iwl_mvm_disable_txq+0x335/0x34a [iwlmvm]
6 iwlwifi/pcie/trans.c:2086 iwl_trans_pcie_reclaim+0x1a89/0x2b93 [iwlwifi]
1148 iwlwifi/pcie/tx.c:608 iwl_pcie_irq_msix_handler+0xa29/0xc07 [iwlwifi]
46 iwlwifi/pcie/tx.c:608 iwl_pcie_irq_msix_handler+0xa29/0xc07 [iwlwifi]()
37 mac80211/driver-ops.c:175 drv_conf_tx+0x92/0x147 [iwl7000_mac80211]
2 mac80211/driver-ops.c:175 drv_conf_tx+0x92/0x147 [iwl7000_mac80211]()
6 mac80211/driver-ops.c:39 drv_stop+0x3c/0xd3 [iwl7000_mac80211]
6 mac80211/driver-ops.h:17 ieee80211_purge_tx_queue+0xad/0x17f [iwl7000_mac80211]
851 mac80211/sta_info.c:996 sta_set_sinfo+0x78b/0x844 [iwl7000_mac80211]
6 mac80211/util.c:1873 ieee80211_reconfig+0x174/0xb5b [iwl7000_mac80211]
(This is where I wished the WARNING was a bit more informative)
Then look at frequency iwlwifi shows up in the stack traces (after the crash):
find GNAWTY/10032.86.0/ -name \*.kcrash | xargs grep -h -A 60 "RIP: .* kmem_cache_alloc+" | fgrep -A 2 "Call Trace:" | fgrep -v "] <IRQ>" | sed -e 's; \[iwlmvm\];;' -e 's;.*] ;;' | grep -v -e "Call Trace:" -e "--" | sort | uniq -c | sort -n
...
5 security_file_alloc+0x25/0x35
5 selinux_file_alloc_security+0x27/0x48
6 anon_vma_clone+0x50/0x14b
6 anon_vma_fork+0x3a/0x137
6 ? __iwl7000_ieee80211_get_tx_rates+0x276/0x4bf [iwl7000_mac80211]
6 ? iwl_mvm_rx_frame_release+0x1f9/0xccd
6 prepare_creds+0x40/0x119
6 ? put_dec+0x53/0x58
6 SYSC_faccessat+0x42/0x1d8
7 do_execve+0x23/0x25
7 do_execveat_common.isra.32+0x2c3/0x70c
8 ? inotify_handle_event+0x84/0x102
8 inotify_handle_event+0x84/0x102
8 ? iwl_mvm_rx_frame_release+0x75f/0xccd
8 radix_tree_maybe_preload+0x11/0x1d
8 __radix_tree_preload+0x45/0x9e
8 ? _raw_spin_unlock+0xe/0x20
11 _do_fork+0x8d/0x284
13 security_inode_alloc+0x2d/0x3d
13 selinux_inode_alloc_security+0x2a/0x7c
15 copy_process+0xdde/0x18d0
16 ? __iwl7000_ieee80211_proberesp_get+0x1523/0x1fe3 [iwl7000_mac80211]
26 ? _raw_spin_lock_irqsave+0x19/0x1f
56 ? __switch_to+0x25d/0x477
220 __ip6_dst_alloc+0x1e/0x68
322 dst_alloc+0x38/0x94
524 iwl_mvm_set_tx_cmd_rate+0x401/0x94b
920 iwl_mvm_set_tx_cmd_rate+0x10c/0x94b
I'm pretty sure there is a correlation here. I don't have time right now to dig up the stack traces but the following 10 "randomly selected" ReportIDs are available to Google Chrome OS kernel engineers:
find GNAWTY/10032.86.0/ -name \*.kcrash | xargs grep -A 60 'RIP: .* kmem_cache_alloc+' | fgrep "iwl_mvm_set_tx_cmd_rate+" | cut -d '-' -f 2 | sed 's;.kcrash;;' | sort -u | sort -R | head -10
664dabf817953c9b
7c67031ab7f06b92
088d1217fbc14731
c5a10e4ab9ad1194
75e682c55133f4a1
9bdf83a626c3c506
95983efa384e557b
e8c6637613729d44
0e3ff0a13471aeaa
0ab23eb7814cd49e
Comment 1 by zalcorn@chromium.org
, Nov 16