New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 693724 link

Starred by 14 users

Issue metadata

Status: Verified
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug


Show other hotlists

Hotlists containing this issue:
Hotlist-1


Sign in to add a comment

Kevin sometimes does not detect WiFi Module.

Project Member Reported by aashuto...@chromium.org, Feb 17 2017

Issue description

Chrome Version: <From about:version: Google Chrome 56.0.2924.101>
Chrome OS Version: <From about:version: Platform 9000.86.0 & 9000.87.1>
Chrome OS Platform: <Kevin>
Network info: <WiFi>

Please specify Cr-* of the system to which this bug/feature applies (add
the label below).

Steps To Reproduce:
(1) None right now. 


Expected Result:
Device should be able to detect WiFi all the time.

Actual Result:
Device does not detect WiFi Module. 

How frequently does this problem reproduce? (Always, sometimes, hard to
reproduce?)
Sometimes.

What is the impact to the user, and is there a workaround? If so, what is
it?
Workaround: reboot the device.

Please provide any additional information below. Attach a screen shot or
log if possible.
I saw somthing similar earlier : https://bugs.chromium.org/p/chromium/issues/detail?id=683331

For graphics-related bugs, please copy/paste the contents of the about:gpu
page at the end of this report.


 
log-021717-105548.tar.gz
1.6 MB Download
debug-logs_20170217-103514 (1).tgz
2.8 MB Download
Cc: yoshiat@google.com
Owner: snanda@chromium.org
This is a recent regression most likely starting with 9000.86.0

snanda@ - please help assign an owner. Thanks
not reproduced on kevin 
Steps:

1.Connect to wifi network
2.Perform au to 9000.87.1 and reboot the device after successful au.

Results;
Wifi connected successfully after reboot.


Cc: rpra...@marvell.com briannorris@chromium.org dnschn...@chromium.org wnhuang@chromium.org snanda@chromium.org kirtika@chromium.org
Though I'm not 100% certain, I swear there are already bugs about Kevin's WiFi sometimes not coming up at boot and that they aren't new regressions with 9000.86.  My best guess is that you just happened to get a lucky reboot.

Adding a few WiFi folks here.  I'll also dig into the logs a little too.  If someone knows of an existing bug tracking this, please yell.

Added David too since I think he's personally been experiencing various WiFi problems.

Comment 5 Deleted

Ah!  Finally dug through the logs a bit more.  Actually, we have this in your logs:

2017-02-17T10:29:34.709741-08:00 INFO kernel: [   21.818204] mwifiex_pcie 0000:01:00.0: FW failed to be active in time
2017-02-17T10:29:34.709812-08:00 INFO kernel: [   21.818262] mwifiex_pcie 0000:01:00.0: info: mwifiex_fw_dpc: unregister device

---

So I guess the answer is that this appears to be a failure that wasn't analyzed before?

===

Deleting the old comment #5 since it's obviously not the same thing (sorry for the noise).
Just a tiny bit more digging from the logs, I notice that we have:

2017-02-17T10:29:19.558558-08:00 INFO kernel: [    6.667349] mwifiex_pcie 0000:01:00.0: info: FW download over, size 597136 bytes
...
...
2017-02-17T10:29:34.709741-08:00 INFO kernel: [   21.818204] mwifiex_pcie 0000:01:00.0: FW failed to be active in time
2017-02-17T10:29:34.709812-08:00 INFO kernel: [   21.818262] mwifiex_pcie 0000:01:00.0: info: mwifiex_fw_dpc: unregister device


That's roughly 15 seconds and matches the 15 seconds of polling that mwifiex_check_fw_status() will do because:
* MAX_FIRMWARE_POLL_TRIES is 150
* Each poll does msleep(100);

...so we really do seem to be polling for about 15 seconds and never seeing WiFi become active.

Comment 8 by snanda@chromium.org, Feb 17 2017

Owner: diand...@chromium.org
Setting owner to Doug since he is investigating it :).  We can hand-off to Brian once he's back from his vacation.

Comment 9 by dchan@google.com, Feb 17 2017

Cc: gkihumba@chromium.org
gkihumba
Status: Available (was: Unconfirmed)
I'm at a conference all next week, so for all practical purposes I'm done investigating this...
Owner: kirtika@chromium.org
Kirtika volunteered.  Thanks!

Comment 12 by dchan@google.com, Feb 17 2017

Labels: Proj-kevin
Status: Assigned (was: Available)
Cc: aaboagye@chromium.org
Also seen at https://feedback.corp.google.com/product/208/neutron?lView=rd&lReport=53382419496

+Aseda: Dumb question, the EC should not have anything to do with the wifi card
on pcie, but the two log messages are suspiciously close and both indicate a timeout. Could you take a look and offer a comment? 


[   10.121472] cros-ec-spi spi5.0: EC failed to respond in time
[   10.121504] cros-ec-spi spi5.0: Command xfer error (err:-110)
[   10.121510] cros-ec-spi spi5.0: error getting time from EC: -110
[   10.121517] cros-ec-rtc cros-ec-rtc: error getting time: -110
[   17.965515] mwifiex_pcie 0000:01:00.0: FW failed to be active in time
[   17.965555] mwifiex_pcie 0000:01:00.0: info: mwifiex_fw_dpc: unregister device
[   36.894191] zram: 2226 (metrics_daemon) Attribute compr_data_size (and others) will be removed. See zram documentation.
[   44.551372] audit: type=1400 audit(1487192668.650:4): avc:  denied  { use } for  pid=4235 comm="init" path="/init" dev="loop1" ino=13 scontext=u:r:init:s0 tcontext=u:r:chromeos:s0 tclass=fd permissive=0
[   44.555000] IPv6: ADDRCONF(NETDEV_UP): veth_android: link is not ready
[   44.559553] device veth_android entered promiscuous mode
[   44.844948] arc0: renamed from slave_android
[   45.354932] audit: type=1400 audit(1487192669.453:5): avc:  denied  { read } for  pid=4235 comm="init" path="/home/.shadow/18b568794391224e8c0c7d0009676d0db7a50e0b/vault/root/ECRYPTFS_FNEK_ENCRYPTED.FWY2QZvTWf9fiEStTqoyDg1D.dbfxOBkZwn2kfY4hbbCNxwGflXXAAG2kE--/ECRYPTFS_FNEK_ENCRYPTED.FWY2QZvTWf9fiEStTqoyDg1D.dbfxOBkZwn26Euy3kNrUVyLGW98CMaub---/ECRYPTFS_FNEK_ENCRYPTED.FWY2QZvTWf9fiEStTqoyDg1D.dbfxOBkZwn2Bn7YJNGBv1iHD8hwGOYT-E--/ECRYPTFS_FNEK_ENCRYPTED.FWY2QZvTWf9fiEStTqoyDg1D.dbfxOBkZwn2LeJoTDpCdSwo7cduqh8tBk--/ECRYPTFS_FNEK_ENCRYPTED.FWY2QZvTWf9fiEStTqoyDg1D.dbfxOBkZwn2fw3UDoMQVuMytwStDOmWXU--" dev="mmcblk0p1" ino=524791 scontext=u:r:init:s0 tcontext=u:object_r:keystore_data_file:s0 tclass=file permissive=0
[   45.359753] audit: type=1400 audit(1487192669.458:6): avc:  denied  { read } for  pid=4235 comm="init" path="/home/.shadow/18b568794391224e8c0c7d0009676d0db7a50e0b/vault/root/ECRYPTFS_FNEK_ENCRYPTED.FWY2QZvTWf9fiEStTqoyDg1D.dbfxOBkZwn2kfY4hbbCNxwGflXXAAG2kE--/ECRYPTFS_FNEK_ENCRYPTED.FWY2QZvTWf9fiEStTqoyDg1D.dbfxOBkZwn26Euy3kNrUVyLGW98CMaub---/ECRYPTFS_FNEK_ENCRYPTED.FWY2QZvTWf9fiEStTqoyDg1D.dbfxOBkZwn2Bn7YJNGBv1iHD8hwGOYT-E--/ECRYPTFS_FNEK_ENCRYPTED.FWY2QZvTWf9fiEStTqoyDg1D.dbfxOBkZwn2LeJoTDpCdSwo7cduqh8tBk--/ECRYPTFS_FNEK_ENCRYPTED.FWY2QZvTWf9fiEStTqoyDg1D.dbfxOBkZwn2goaKMe8XpIA4BJIX4LXgE---/ECRYPTFS_FNEK_ENCRYPTED.FWY2QZvTWf9fiEStTqoyDg1D.dbfxOBkZwn2Pfku-aFNGVPd2xP0KoXqL---" dev="mmcblk0p1" ino=524796 scontext=u:r:init:s0 tcontext=u:object_r:keystore_data_file:s0 tclass=file permissive=0
[   45.843673] Unhandled debug exception: aarch32 BKPT (0xe0000070) at 0x0000000040391400
[   45.843703] Unhandled debug exception: aarch32 BKPT (0xe000007f) at 0x0000000040391400
[   45.963958] audit: type=1400 audit(1487192670.062:7): avc:  denied  { mounton } for  pid=4235 comm="init" path="/sys/kernel/debug" dev="tmpfs" ino=15749 scontext=u:r:init:s0 tcontext=u:object_r:debugfs:s0 tclass=dir permissive=0
[   45.965240] audit: type=1400 audit(1487192670.063:8): avc:  denied  { module_request } for  pid=4235 comm="init" kmod="fs-configfs" scontext=u:r:init:s0 tcontext=u:r:kernel:s0 tclass=system permissive=0
[   46.009267] binder: 4461:4461 transaction failed 29189, size 0-0
[   46.066251] binder: 4477:4477 transaction failed 29189, size 0-0


The EC shouldn't have anything to do with the WiFi card as far as I'm aware. Here's my analysis of the error messages you pointed out.

2017-02-15T13:03:54.221764-08:00 WARNING kernel: [   10.121472] cros-ec-spi spi5.0: EC failed to respond in time
2017-02-15T13:03:54.221792-08:00 ERR kernel: [   10.121504] cros-ec-spi spi5.0: Command xfer error (err:-110)
2017-02-15T13:03:54.221794-08:00 ERR kernel: [   10.121510] cros-ec-spi spi5.0: error getting time from EC: -110
2017-02-15T13:03:54.221797-08:00 ERR kernel: [   10.121517] cros-ec-rtc cros-ec-rtc: error getting time: -110
2017-02-15T13:03:54.231862-08:00 INFO session_manager[1500]: [INFO:server_backed_state_key_generator.cc(131)] Stable device secret missing!
2017-02-15T13:04:02.065845-08:00 INFO kernel: [   17.965515] mwifiex_pcie 0000:01:00.0: FW failed to be active in time
2017-02-15T13:04:02.065906-08:00 INFO kernel: [   17.965555] mwifiex_pcie 0000:01:00.0: info: mwifiex_fw_dpc: unregister device

2017-02-15 13:03:54 is equivalent to 1487192634 epoch timestamp. Matching that with the EC logs shows the following:

[6330.585149 ERR-GTH]
BAD-in_msg=[00 00 00 ]
[6330.925886 HC 0x18 err 1]
[6330.929679 HC 0x18 err 1]
[6330.930897 HC 0x18 err 1]
[6357.399685 KB poll]
[6365.061159 KB wait]

The "ERR-GTH" indicates that the AP asserted chip select, but didn't actually send the EC anything after 10ms. (What took the AP so long?) We assume it's a glitch and reset to get ready to receive real transactions. The "BAD-in_msg" log indicates the AP sent us 3 bytes, which would have been the start of a new transaction, but those bytes weren't what we were expecting (and were all zeroes).

Shortly afterwards, the EC was able to successfully receive host commands from the AP and respond. ("err 1" indicates that we successfully received and parsed the host command request, but the command doesn't exist for us). We also don't report the successful host commands because it's kinda spammy now. But rest assured that many more host commands came through successfully.

Maybe the kernel was pretty busy at the time and that's how it got briefly out of sync. This does also seem like it happened around boot time. I also noticed in the EC logs, that the AP seemed hung on the previous boot. The user held the power button to force it off and then turned it on again.

Not sure how much that helps :c
EC issues should get a lot better after <https://chromium-review.googlesource.com/c/439713/> lands (and people have the corresponding EC change).  Still more work to do in  bug #57403 , though.

Sometimes if the EC gets an error in just the right place it can wreck havoc.  I didn't do enough digging, but I think I saw that as described in  bug #62478 , comment 4.

...oddly in that last bug Brian also said that it caused WiFi to fail to come up for him, which doesn't make tons of sense to me (??)
*but numbers above are partner tracker numbers...
Labels: -M-56 M-58
Owner: briannorris@chromium.org
Status: Started (was: Assigned)
OK, so I've hacked up remove() support for the RK3399 PCIe driver, which makes it much easier to test this by just removing/reprobing the entire device. e.g.:

while : ; do
echo f8000000.pcie > /sys/bus/platform/drivers/rockchip-pcie/unbind
sleep 0.5
echo f8000000.pcie > /sys/bus/platform/drivers/rockchip-pcie/bind
sleep 5
[ -n "$(iw dev)" ] || break
done

I can try to upload my hacky changes to keep that from crashing eventually. But that process will actually reset the whole bridge and endpoint, including reloading the firmware. An alternate test is something like:

rmmod mwifiex_pcie; modprobe mwifiex_pcie

in a loop. That will *not* reload the FW, since it doesn't actually reset the PCIe endpoint, and so it retains the firmware.

---

Anyway, I eventually was able to reproduce this with the unbind/bind test. This one was quick actually, but I don't know if I was just (un)lucky. Eventually, the FW just stops loading:

...
[  376.907740] PCI host bridge /pcie@f8000000 ranges:
[  376.912647]   MEM 0xfa000000..0xfa5fffff -> 0xfa000000
[  376.917834]    IO 0xfa600000..0xfa6fffff -> 0xfa600000
[  376.923390] rockchip-pcie f8000000.pcie: PCI host bridge to bus 003f:00
[  376.930099] pci_bus 003f:00: root bus resource [bus 00-01]
[  376.935600] pci_bus 003f:00: root bus resource [mem 0xfa000000-0xfa5fffff]
[  376.942490] pci_bus 003f:00: root bus resource [io  0x0000-0xfffff] (bus address [0xfa600000-0xfa6fffff])
[  376.953723] pci 003f:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[  376.965255] pci 003f:00:00.0: BAR 8: assigned [mem 0xfa000000-0xfa1fffff]
[  376.972647] pci 003f:01:00.0: BAR 0: assigned [mem 0xfa000000-0xfa0fffff 64bit pref]
[  376.981234] pci 003f:01:00.0: BAR 2: assigned [mem 0xfa100000-0xfa1fffff 64bit pref]
[  376.989517] pci 003f:00:00.0: PCI bridge to [bus 01]
[  376.995091] pci 003f:00:00.0:   bridge window [mem 0xfa000000-0xfa1fffff]
[  377.002014] pcieport 003f:00:00.0: enabling device (0000 -> 0002)
[  377.008603] pcieport 003f:00:00.0: Signaling PME through PCIe PME interrupt
[  377.015595] pci 003f:01:00.0: Signaling PME through PCIe PME interrupt
[  377.022599] mwifiex_pcie 003f:01:00.0: enabling device (0000 -> 0002)
[  377.029093] mwifiex_pcie: try set_consistent_dma_mask(32)
[  377.034538] mwifiex_pcie: PCI memory map Virt0: ffffff8011f80000 PCI memory map Virt2: ffffff8012100000
[  377.044010] mwifiex: rx work enabled, cpus 6
[  378.065360] mwifiex_pcie 003f:01:00.0: info: FW download over, size 597136 bytes
[  378.975367] usb 2-1: new high-speed USB device number 65 using ehci-platform
[  379.102685] usb 2-1: New USB device found, idVendor=1286, idProduct=204e
[  379.109471] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  379.117230] usb 2-1: Product: Bluetooth and Wireless LAN Composite Device
[  379.124564] usb 2-1: Manufacturer: Marvell
[  379.128706] usb 2-1: SerialNumber: 0000000000000000
[  393.222205] mwifiex_pcie 003f:01:00.0: FW failed to be active in time
[  393.228659] mwifiex_pcie 003f:01:00.0: info: mwifiex_fw_dpc: unregister device

Trying to reload the driver (rmmod mwifiex_pcie; modprobe mwifiex_pcie) doesn't actually get the firmware to reload; instead it just stops after loading 0 bytes (!!):

[ 2642.294687] rockchip-dp ff970000.edp: Link Training Clock Recovery success
[ 2642.309455] rockchip-dp ff970000.edp: Link Training success!
[ 2655.888322] mwifiex_pcie 003f:01:00.0: fail to init wakeup for mwifiex
[ 2655.895029] mwifiex_pcie: try set_consistent_dma_mask(32)
[ 2655.900731] mwifiex_pcie: PCI memory map Virt0: ffffff8012280000 PCI memory map Virt2: ffffff8012400000
[ 2655.910371] mwifiex: rx work enabled, cpus 6
[ 2656.026092] mwifiex_pcie 003f:01:00.0: info: FW download over, size 0 bytes
[ 2671.182304] mwifiex_pcie 003f:01:00.0: FW failed to be active in time
[ 2671.188802] mwifiex_pcie 003f:01:00.0: info: mwifiex_fw_dpc: unregister device

And trying again:

[ 2691.367315] mwifiex_pcie 003f:01:00.0: fail to init wakeup for mwifiex
[ 2691.374018] mwifiex_pcie: try set_consistent_dma_mask(32)
[ 2691.380136] mwifiex_pcie: PCI memory map Virt0: ffffff8012580000 PCI memory map Virt2: ffffff8012700000
[ 2691.389923] mwifiex: rx work enabled, cpus 6
[ 2691.505165] mwifiex_pcie 003f:01:00.0: info: FW download over, size 0 bytes

But, just resetting the endpoint again (unbind/rebind) seems to be enough to get it working again.

Either the Marvell FW init sequence isn't reliable here, or maybe we have some kind of issue with power sequencing?

Related to the latter: one random report I made (https://b.corp.google.com/issues/35587376) mentioned that at one point, Wifi-init failures actually correlated with failures in other parts of the boot sequence (DP / display issues). Combined with the mention in comment #14 of another feedback report where EC failures correlated with Wifi FW failure, it seems like other tasks are able to influence this behavior.

I'll take this back from Kirtika, as this is much more likely a platform issue than a general Wifi issue.

Marvell folks (if you're hearing me? I wonder if this should move to the Partner tracker / buganizer): any guesses as to the cause of this behavior? Might a problem with power sequence timing cause this?

I'll see if I can repro this reliably in a reasonable time, and I'll see if I can play with the power sequence timings to affect this.

---

M-56 has definitely sailed, and not likely making M-57
This is a test message.
Kindly allow me to go through the bug and respond within this week.
Filed this to keep Partner issues in the same place:

https://b.corp.google.com/issues/36264732

I can leave this open for now, in case anyone needs to find this externally.
Curious if there has been any progress on this issue. 
Labels: -M-58 M-64
Status: Fixed (was: Started)
This should be fixed with the firmware update in here:

https://issuetracker.google.com/36264732
https://chromium-review.googlesource.com/#/c/717726/

That made it into M-64.

There could be other issues that appear similarly, so if this still happens after M-64...feel free to report a new bug!
Status: Verified (was: Fixed)
No longer reproducible. Closing as verified. 

Sign in to add a comment