set_wifi_transmit_power crashes on kevin |
|||||||||||
Issue descriptionNot sure if this is actually contributing to test failures (e.g. to https://luci-milo.appspot.com/buildbot/chromeos/kevin-paladin/2954 ) but I'm seeing numerous crashdumps on kevin for set_wifi_transmit_power https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos6-row2-rack24-host20/2349203-provision/20170511175204/crashinfo.chromeos6-row2-rack24-host20/ https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/154241131-chromeos-test/chromeos6-row4-rack5-host7/crashinfo.chromeos6-row4-rack5-host7/ At the very least, they are making it harder to investigate and root cause the real issue.
,
Nov 6 2017
Looking at crashes on my own dev machine:
(gdb) bt
#0 0xeb5c3b36 in ?? () from r/lib/libc.so.6
#1 0xeb5d25b2 in raise () from r/lib/libc.so.6
#2 0xeb5d337a in abort () from r/lib/libc.so.6
#3 0xeb7d7580 in base::debug::BreakDebugger () at base/debug/debugger_posix.cc:249
#4 0xeb7e8c68 in logging::LogMessage::~LogMessage (this=0x2) at base/logging.cc:755
#5 0x06fc6364 in (anonymous namespace)::SetWiFiTransmitPower::GetWirelessDeviceIndex ()
at ../../../../../../../../../../mnt/host/source/src/platform2/power_manager/powerd/set_wifi_transmit_power.cc:144
#6 (anonymous namespace)::SetWiFiTransmitPower::SetPowerMode (this=<optimized out>, tablet=<optimized out>)
at ../../../../../../../../../../mnt/host/source/src/platform2/power_manager/powerd/set_wifi_transmit_power.cc:214
#7 main (argc=2, argv=<optimized out>) at ../../../../../../../../../../mnt/host/source/src/platform2/power_manager/powerd/set_wifi_transmit_power.cc:243
That's:
LOG(FATAL) << "No wireless driver found";
---
Searching through my email, I find this is something that was talked about a year ago but no resolution came of it.
...oh, but it looks like b/35582340 was opened but then closed in July as "Won't Fix (Obsolete)"
---
OK, I think that's all the context I have here. Basically we don't _think_ there's any serious problem but getting it so that this application doesn't "crash" seems like a wise thing to do.
,
Nov 8 2017
Note also that this is high on the list of kernel "crashes". puthik@: are you actually planning to look at this? If not we should find another owner who will actually fix this.
,
Nov 8 2017
As discussed earlier, it's super easy to just land a patch to just not make this a fatal error. ...the one problem is that it leaves us open to a race condition expressed like this: 1. powerd starts up 2. powerd tries to call "set_wifi_transmit_power", which fails because WiFi isn't up. 3. WiFi comes up. 4. WiFi won't be running with the correct transmit power mode. Did I get that right, or is there something else that means the above race won't happen? --- Here's an idea that might (?) work? Can we just have powerd register for udev events that say a WiFi device was added? When that happens it can always call "set_wifi_transmit_power" again? I think powerd always registers for udev events so it knows about new input devices that come and go, so it shouldn't be too hard to add another udev observer? --- I suppose we could just get rid of the fatal error without fixing the race condition. That would leave us in no worse shape than we were before except it wouldn't have anything to remind people that we need to fix the face.
,
Nov 8 2017
+kirtika / cernekee I'll let wifi team do this then.
,
Nov 8 2017
> LOG(FATAL) << "No wireless driver found"; Do we think this is caused by a race between powerd and the driver loading, or by the driver failing to initialize the wifi hardware?
,
Nov 8 2017
If it's a race, I'm happy to make powerd watch for udev events. If it's a more serious problem, let's keep the LOG(FATAL) so we have a signal that things are broken.
,
Nov 8 2017
This behavior is still silly, IMO: localhost ~ # set_wifi_transmit_power localhost ~ # rmmod mwifiex_pcie localhost ~ # set_wifi_transmit_power [1108/125325:FATAL:set_wifi_transmit_power.cc(144)] No wireless driver found Aborted (core dumped) It's fine to log things, but there's no reason to force a crash. The coredump is useless, except insofar as it made someone finally pay attention (but do we do that in general? I don't try to panic the kernel just because something unexpected happened). I believe Dan had the right idea here with: https://b.corp.google.com/issues/35582340#comment4 "We could make it exit with 0 or 1 or whatever instead." --- About the race... I'm not aware of any synchronization between powerd and Wifi driver initialization. We have this init script: /etc/init/preload-network.conf That tries to optimize getting network drivers loaded before we start up system services, but it's no guarantee -- because network driver firmware load/init can take a while, the network device appears asynchronously. You can be finished with 'modprobe' but still not have a network interface for another second or 2. So, we definitely need to have something synchronized to the addition of network interfaces. udev is probably the right way to do that. Wei-Ning, are you still paying attention to this? You closed the last bug about this because you were looking at Intel designs. How does this feature work on Intel systems? Does it have the same problem? (Probably yes?)
,
Nov 8 2017
I forgot to address this part: > Do we think this is caused [...] by the driver failing to initialize the wifi hardware? That *is* possible. There's a non-negligible (but not very large) number of people who were hitting firmware bugs like this: https://issuetracker.google.com/36264732 https://bugs.chromium.org/p/chromium/issues/detail?id=693724 Stats like this [1] confirm there were at least a few thousand recent instances of "no wifi device present" (but only a handful of unique boots). So, even if that's the root cause...we can track this (and fix it) via means other than looking for powerd tool crash reports. And we think it's fixed on M-64. We should fix the race regardless, and also kill the spurious (IMO) crash dumps. [1] https://uma.googleplex.com/p/chrome/histograms?endDate=latest&dayCount=7&histograms=Network.Shill.Wifi.DevicePresenceStatus%2CPlatform.WiFiDeviceCount&fixupData=true&uniqueUsers=true&showMax=true&filters=platform%2Ceq%2CC%2Chw_class%2Ccnt%2CKEVIN%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
,
Nov 8 2017
> I'm not aware of any synchronization between powerd and Wifi driver initialization. We have this init script: Right. So we fix the issue by: 1. Have powerd call "set_wifi_transmit_power" on _either_ a tablet mode switch or the WiFi driver getting loaded (signaled by udev). 2. We make "set_wifi_transmit_power" not abort if it's called when the WiFi driver isn't loaded yet. I don't think either is terribly hard, though personally I wouldn't be super fast at the powerd change just because my C++ is rusty. ...so I was hoping someone else would volunteer to own it. ;-)
,
Nov 8 2017
I'll do it if someone provides details about the udev events that powerd should watch for.
,
Nov 8 2017
Probably just device add with SUBSYSTEM=net and DEVTYPE=wlan. e.g., on Kevin: # cat /sys/devices/platform/f8000000.pcie/pci0000\:00/0000\:00\:00.0/0000\:01\:00.0/net/mlan0/uevent DEVTYPE=wlan INTERFACE=mlan0 IFINDEX=5 and on an ath10k system: # cat /sys/devices/platform/f8000000.pcie/pci0000\:00/0000\:00\:00.0/0000\:01\:00.0/net/wlan0/uevent DEVTYPE=wlan INTERFACE=wlan0 IFINDEX=7 And the relevant kernel event on a Kevin: KERNEL[81505.519413] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0 (net) ACTION=add DEVPATH=/devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0 DEVTYPE=wlan IFINDEX=6 INTERFACE=mlan0 SEQNUM=1894 SUBSYSTEM=net ... UDEV [81505.543407] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0 (net) ACTION=add DEVPATH=/devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0 DEVTYPE=wlan ID_BUS=pci ID_MM_CANDIDATE=1 ID_MODEL_ID=0x2b42 ID_NET_DRIVER=mwifiex_pcie ID_NET_NAME_MAC=wlx1867b010a111 ID_NET_NAME_PATH=wlp1s0 ID_PATH=platform-f8000000.pcie-pci-0000:01:00.0 ID_PATH_TAG=platform-f8000000_pcie-pci-0000_01_00_0 ID_PCI_CLASS_FROM_DATABASE=Network controller ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller ID_VENDOR_ID=0x1b4b IFINDEX=6 INTERFACE=mlan0 SEQNUM=1894 SUBSYSTEM=net USEC_INITIALIZED=81505530833
,
Nov 8 2017
If I record with 'udevadm monitor' and run 'modprobe mwifiex_pcie'
localhost ~ # modprobe mwifiex_pcie
KERNEL[322.324547] add /module/mwifiex_pcie (module)
UDEV [322.341309] add /module/mwifiex_pcie (module)
KERNEL[322.354130] add /bus/pci/drivers/mwifiex_pcie (drivers)
UDEV [322.362265] add /bus/pci/drivers/mwifiex_pcie (drivers)
KERNEL[322.499244] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/ieee80211/phy2 (ieee80211)
KERNEL[322.501392] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/ieee80211/phy2/rfkill3 (rfkill)
UDEV [322.523890] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/ieee80211/phy2 (ieee80211)
KERNEL[322.526913] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0 (net)
KERNEL[322.530807] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/rx-0 (queues)
UDEV [322.532423] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/ieee80211/phy2/rfkill3 (rfkill)
KERNEL[322.534867] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-0 (queues)
KERNEL[322.537502] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-1 (queues)
KERNEL[322.538963] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-2 (queues)
KERNEL[322.540234] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-3 (queues)
UDEV [322.559991] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0 (net)
UDEV [322.567935] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/rx-0 (queues)
UDEV [322.575223] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-0 (queues)
UDEV [322.575817] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-1 (queues)
UDEV [322.580239] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-2 (queues)
UDEV [322.581402] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-3 (queues)
We'd want to listen for something where it's guaranteed that the 'mlan0' device is there. Is that this one?
UDEV [322.559991] add /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0 (net)
The "set_wifi_transmit_power.cc" program seems to handle net devices of either "mlan0" or "wlan0". See the code:
index = if_nametoindex("mlan0");
if (index)
return index;
index = if_nametoindex("wlan0");
if (index)
return index;
...so perhaps you'd want additions of net devices that end in mlan0 or wlan0?
,
Nov 8 2017
Ah, or just listen to Brian. :-)
,
Nov 8 2017
BTW, this is wrong:
WirelessDriver SetWiFiTransmitPower::GetWirelessDriverType() {
if (base::PathExists(base::FilePath(kMwifiexModulePath))) {
return WirelessDriver::MWIFIEX;
}
return WirelessDriver::IWL;
}
There exist Wifi devices that are neither iwlwifi nor mwifiex.
,
Nov 8 2017
And it's also possible to have the mwifiex module loaded while you're not using it. (We don't do this in production, but it's not too far-fetched.)
,
Nov 8 2017
> BTW, this is wrong... If you want to get technical, it's probably also not massively safe to hardcode "mlan0" or "wlan0". I could sorta believe that it would be possible to plug in a USB based WiFi module and have that module end up being mlan0 / wlan0. I'm just sayin. Probably we should file separate bugs about those things, though.
,
Nov 8 2017
> Probably we should file separate bugs about those things, though. filed: https://bugs.chromium.org/p/chromium/issues/detail?id=782924
,
Nov 9 2017
Thanks!
,
Nov 10 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5 commit 8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5 Author: Daniel Erat <derat@chromium.org> Date: Fri Nov 10 03:16:30 2017 power: Pass devtype to system::UdevSubsystemObserver. Make powerd's system::Udev class pass the devtype associated with udev events to system::UdevSubsystemObserver implementations. This is needed to detect wifi devices appearing. Also introduce a UdevEvent struct and make minor improvements to observer-managing code in Udev and UdevStub. (Unfortunately, libchrome is still too old for base::ObserverList to be iterable.) BUG= chromium:781838 TEST=manual: verified that powerd (with another change) gets udev add events with subsystem "net" and devtype "wlan" Change-Id: I22719a07e2a46027e3949e09bef0224ee0f0e224 Reviewed-on: https://chromium-review.googlesource.com/760673 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Eric Caruso <ejcaruso@chromium.org> [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/udev_stub.h [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/pluggable_internal_backlight.h [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/display/display_watcher_unittest.cc [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/power_supply.h [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/power_supply_unittest.cc [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/input_watcher.cc [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/input_watcher.h [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/pluggable_internal_backlight_unittest.cc [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/input_watcher_unittest.cc [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/udev_subsystem_observer.h [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/pluggable_internal_backlight.cc [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/udev.h [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/udev.cc [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/display/display_watcher.cc [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/display/display_watcher.h [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/power_supply.cc [modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/udev_stub.cc
,
Nov 10 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/ef19c783e469f03ca8d71d4610b22e8d64bb157c commit ef19c783e469f03ca8d71d4610b22e8d64bb157c Author: Daniel Erat <derat@chromium.org> Date: Fri Nov 10 10:17:24 2017 power: Set wifi transmit power when device added. Move powerd's Daemon class's wifi transmit power code to a new policy::WifiController class. Make the new class subscribe to udev events on the "net" subsystem and set the power when "wlan" devices are added. This is intended to work around a race where powerd starts before wifi devices are initialized. BUG= chromium:781838 TEST=added unit tests; also manually verified that powerd sets wifi transmit power after device appears when booting kevin Change-Id: Ie7468c8f28da167c17f5648a772166c8943d94e8 Reviewed-on: https://chromium-review.googlesource.com/761407 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Eric Caruso <ejcaruso@chromium.org> [add] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/policy/wifi_controller.h [modify] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/power_manager.gyp [add] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/policy/wifi_controller_unittest.cc [modify] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/daemon.cc [modify] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/udev/93-powerd-late.rules [modify] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/daemon.h [modify] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/daemon_unittest.cc [modify] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/policy/input_device_controller_unittest.cc [add] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/policy/wifi_controller.cc
,
Nov 10 2017
Thanks for the above! ...sounds like the last thing is to just change the error so it's no longer a "fatal" error, then we're done, right? Could even put a comment somewhere in the code that says that this case is expected if powerd loads before the WiFi driver and that powerd will automatically retry when the wifi driver is loaded. :)
,
Nov 11 2017
Yep, I have an in-progress change to do that (along with cleaning up various other things in this program). :-)
,
Nov 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/fb5d18437804906b4df0c7babc47a5124b2af826 commit fb5d18437804906b4df0c7babc47a5124b2af826 Author: Daniel Erat <derat@chromium.org> Date: Wed Nov 15 00:03:23 2017 power: Make set_wifi_transmit_power not crash on error. Update powerd's set_wifi_transmit_power helper program to return exit code 1 when it fails to find a wifi chipset instead of aborting. powerd runs the program on boot, and the chipset sometimes isn't available at that point (e.g. on kevin). Also make various other cleanups to this program's code. BUG= chromium:781838 TEST=manual: added some logging and checked via /var/log/powerd.out that the program still runs successfully after restarting powerd Change-Id: I6cecec6fdbcce69fff4d85717d73a4dbcd4250ce Reviewed-on: https://chromium-review.googlesource.com/766530 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> [modify] https://crrev.com/fb5d18437804906b4df0c7babc47a5124b2af826/power_manager/powerd/set_wifi_transmit_power.cc
,
Nov 15 2017
,
Mar 29 2018
Are the crashes being reported here the same thing? https://crash.corp.google.com/browse?q=product.name%3D%27ChromeOS%27%20AND%20EXISTS%20(SELECT%201%20FROM%20UNNEST(productdata)%20WHERE%20Key%3D%27hwclass%27%20AND%20Value%20LIKE%20%27KEVIN%20%25%27)%20AND%20product.Version%3D%2710032.86.0%27#-property-selector,-stablesignature:50,magicsignaturesorted:50 In any case, 4/5 crashes reported on kevin are still for set_wifi_transmit_power.
,
Mar 29 2018
#26: That query is against M63. Are you still seeing crashes in M64 (i.e. the first release with the fix for this)? I don't see crashes for set_wifi_transmit_power in the top 100 executables when I look at 10176.76.0.
,
Mar 29 2018
derat, you are totally right - my apologies - I confused 10323 with 10032. It's clearly fixed in 10323: https://crash.corp.google.com/browse?q=product.name%3D%27ChromeOS%27%20AND%20EXISTS%20(SELECT%201%20FROM%20UNNEST(productdata)%20WHERE%20Key%3D%27hwclass%27%20AND%20Value%20LIKE%20%27KEVIN%20%25%27)%20AND%20product.Version%3D%2710323.62.0%27
,
Sep 13
Bulk verify old fixed bugs... |
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by diand...@chromium.org
, Nov 6 2017