New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 781838 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

set_wifi_transmit_power crashes on kevin

Project Member Reported by akes...@chromium.org, Nov 6 2017

Issue description

Cc: groeck@chromium.org wnhuang@chromium.org
Guenter noticed this a little while ago, too.  I swear I also remember something weird where this call was happening too early at bootup and it was causing the "crash", but I can't 100% swear.  I'll see if I can dig up anything.  
Looking at crashes on my own dev machine:

(gdb) bt
#0  0xeb5c3b36 in ?? () from r/lib/libc.so.6
#1  0xeb5d25b2 in raise () from r/lib/libc.so.6
#2  0xeb5d337a in abort () from r/lib/libc.so.6
#3  0xeb7d7580 in base::debug::BreakDebugger () at base/debug/debugger_posix.cc:249
#4  0xeb7e8c68 in logging::LogMessage::~LogMessage (this=0x2) at base/logging.cc:755
#5  0x06fc6364 in (anonymous namespace)::SetWiFiTransmitPower::GetWirelessDeviceIndex ()
    at ../../../../../../../../../../mnt/host/source/src/platform2/power_manager/powerd/set_wifi_transmit_power.cc:144
#6  (anonymous namespace)::SetWiFiTransmitPower::SetPowerMode (this=<optimized out>, tablet=<optimized out>)
    at ../../../../../../../../../../mnt/host/source/src/platform2/power_manager/powerd/set_wifi_transmit_power.cc:214
#7  main (argc=2, argv=<optimized out>) at ../../../../../../../../../../mnt/host/source/src/platform2/power_manager/powerd/set_wifi_transmit_power.cc:243

That's:

  LOG(FATAL) << "No wireless driver found";

---

Searching through my email, I find this is something that was talked about a year ago but no resolution came of it.  

...oh, but it looks like b/35582340 was opened but then closed in July as "Won't Fix (Obsolete)"

---

OK, I think that's all the context I have here.  Basically we don't _think_ there's any serious problem but getting it so that this application doesn't "crash" seems like a wise thing to do.
Note also that this is high on the list of kernel "crashes".  puthik@: are you actually planning to look at this?  If not we should find another owner who will actually fix this.
Cc: derat@chromium.org
As discussed earlier, it's super easy to just land a patch to just not make this a fatal error.

...the one problem is that it leaves us open to a race condition expressed like this:

1. powerd starts up
2. powerd tries to call "set_wifi_transmit_power", which fails because WiFi isn't up.
3. WiFi comes up.
4. WiFi won't be running with the correct transmit power mode.

Did I get that right, or is there something else that means the above race won't happen?

---

Here's an idea that might (?) work?  Can we just have powerd register for udev events that say a WiFi device was added?  When that happens it can always call "set_wifi_transmit_power" again?  I think powerd always registers for udev events so it knows about new input devices that come and go, so it shouldn't be too hard to add another udev observer?

---

I suppose we could just get rid of the fatal error without fixing the race condition.  That would leave us in no worse shape than we were before except it wouldn't have anything to remind people that we need to fix the face.


Cc: puthik@chromium.org kirtika@chromium.org cernekee@chromium.org
Owner: ----
+kirtika / cernekee

I'll let wifi team do this then.


Cc: briannorris@chromium.org
> LOG(FATAL) << "No wireless driver found";

Do we think this is caused by a race between powerd and the driver loading, or by the driver failing to initialize the wifi hardware?

Comment 7 by derat@chromium.org, Nov 8 2017

If it's a race, I'm happy to make powerd watch for udev events.

If it's a more serious problem, let's keep the LOG(FATAL) so we have a signal that things are broken.
This behavior is still silly, IMO:

localhost ~ # set_wifi_transmit_power   
localhost ~ # rmmod mwifiex_pcie
localhost ~ # set_wifi_transmit_power 
[1108/125325:FATAL:set_wifi_transmit_power.cc(144)] No wireless driver found

Aborted (core dumped)

It's fine to log things, but there's no reason to force a crash. The coredump is useless, except insofar as it made someone finally pay attention (but do we do that in general? I don't try to panic the kernel just because something unexpected happened). I believe Dan had the right idea here with:

https://b.corp.google.com/issues/35582340#comment4

"We could make it exit with 0 or 1 or whatever instead."

---

About the race...

I'm not aware of any synchronization between powerd and Wifi driver initialization. We have this init script:

/etc/init/preload-network.conf

That tries to optimize getting network drivers loaded before we start up system services, but it's no guarantee -- because network driver firmware load/init can take a while, the network device appears asynchronously. You can be finished with 'modprobe' but still not have a network interface for another second or 2.

So, we definitely need to have something synchronized to the addition of network interfaces. udev is probably the right way to do that.

Wei-Ning, are you still paying attention to this? You closed the last bug about this because you were looking at Intel designs. How does this feature work on Intel systems? Does it have the same problem? (Probably yes?)
I forgot to address this part:

> Do we think this is caused [...] by the driver failing to initialize the wifi hardware?

That *is* possible. There's a non-negligible (but not very large) number of people who were hitting firmware bugs like this:

https://issuetracker.google.com/36264732
https://bugs.chromium.org/p/chromium/issues/detail?id=693724

Stats like this [1] confirm there were at least a few thousand recent instances of "no wifi device present" (but only a handful of unique boots).

So, even if that's the root cause...we can track this (and fix it) via means other than looking for powerd tool crash reports. And we think it's fixed on M-64.

We should fix the race regardless, and also kill the spurious (IMO) crash dumps.


[1]
https://uma.googleplex.com/p/chrome/histograms?endDate=latest&dayCount=7&histograms=Network.Shill.Wifi.DevicePresenceStatus%2CPlatform.WiFiDeviceCount&fixupData=true&uniqueUsers=true&showMax=true&filters=platform%2Ceq%2CC%2Chw_class%2Ccnt%2CKEVIN%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
> I'm not aware of any synchronization between powerd and Wifi driver initialization. We have this init script:

Right.  So we fix the issue by:

1. Have powerd call "set_wifi_transmit_power" on _either_ a tablet mode switch or the WiFi driver getting loaded (signaled by udev).

2. We make "set_wifi_transmit_power" not abort if it's called when the WiFi driver isn't loaded yet.


I don't think either is terribly hard, though personally I wouldn't be super fast at the powerd change just because my C++ is rusty.  ...so I was hoping someone else would volunteer to own it.  ;-)

Components: OS>Systems>Network OS>Kernel>Power
Labels: -Pri-1 Pri-2
Owner: derat@chromium.org
Status: Assigned (was: Untriaged)
I'll do it if someone provides details about the udev events that powerd should watch for.
Probably just device add with SUBSYSTEM=net and DEVTYPE=wlan. e.g., on Kevin:

# cat /sys/devices/platform/f8000000.pcie/pci0000\:00/0000\:00\:00.0/0000\:01\:00.0/net/mlan0/uevent
DEVTYPE=wlan
INTERFACE=mlan0
IFINDEX=5


and on an ath10k system:

# cat /sys/devices/platform/f8000000.pcie/pci0000\:00/0000\:00\:00.0/0000\:01\:00.0/net/wlan0/uevent
DEVTYPE=wlan
INTERFACE=wlan0
IFINDEX=7



And the relevant kernel event on a Kevin:

KERNEL[81505.519413] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0 (net)
ACTION=add
DEVPATH=/devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0
DEVTYPE=wlan
IFINDEX=6
INTERFACE=mlan0
SEQNUM=1894
SUBSYSTEM=net

...

UDEV  [81505.543407] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0 (net)
ACTION=add
DEVPATH=/devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0
DEVTYPE=wlan
ID_BUS=pci
ID_MM_CANDIDATE=1
ID_MODEL_ID=0x2b42
ID_NET_DRIVER=mwifiex_pcie
ID_NET_NAME_MAC=wlx1867b010a111
ID_NET_NAME_PATH=wlp1s0
ID_PATH=platform-f8000000.pcie-pci-0000:01:00.0
ID_PATH_TAG=platform-f8000000_pcie-pci-0000_01_00_0
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_ID=0x1b4b
IFINDEX=6
INTERFACE=mlan0
SEQNUM=1894
SUBSYSTEM=net
USEC_INITIALIZED=81505530833
If I record with 'udevadm monitor' and run 'modprobe mwifiex_pcie'

localhost ~ # modprobe mwifiex_pcie
KERNEL[322.324547] add      /module/mwifiex_pcie (module)
UDEV  [322.341309] add      /module/mwifiex_pcie (module)
KERNEL[322.354130] add      /bus/pci/drivers/mwifiex_pcie (drivers)
UDEV  [322.362265] add      /bus/pci/drivers/mwifiex_pcie (drivers)
KERNEL[322.499244] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/ieee80211/phy2 (ieee80211)
KERNEL[322.501392] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/ieee80211/phy2/rfkill3 (rfkill)
UDEV  [322.523890] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/ieee80211/phy2 (ieee80211)
KERNEL[322.526913] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0 (net)
KERNEL[322.530807] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/rx-0 (queues)
UDEV  [322.532423] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/ieee80211/phy2/rfkill3 (rfkill)
KERNEL[322.534867] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-0 (queues)
KERNEL[322.537502] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-1 (queues)
KERNEL[322.538963] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-2 (queues)
KERNEL[322.540234] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-3 (queues)
UDEV  [322.559991] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0 (net)
UDEV  [322.567935] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/rx-0 (queues)
UDEV  [322.575223] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-0 (queues)
UDEV  [322.575817] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-1 (queues)
UDEV  [322.580239] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-2 (queues)
UDEV  [322.581402] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0/queues/tx-3 (queues)


We'd want to listen for something where it's guaranteed that the 'mlan0' device is there.  Is that this one?

  UDEV  [322.559991] add      /devices/platform/f8000000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/net/mlan0 (net)


The "set_wifi_transmit_power.cc" program seems to handle net devices of either "mlan0" or "wlan0".  See the code:

  index = if_nametoindex("mlan0");
  if (index)
    return index;

  index = if_nametoindex("wlan0");
  if (index)
    return index;

...so perhaps you'd want additions of net devices that end in mlan0 or wlan0?
Ah, or just listen to Brian.  :-)
BTW, this is wrong:

WirelessDriver SetWiFiTransmitPower::GetWirelessDriverType() {
  if (base::PathExists(base::FilePath(kMwifiexModulePath))) {
    return WirelessDriver::MWIFIEX;
  }
  return WirelessDriver::IWL;
}


There exist Wifi devices that are neither iwlwifi nor mwifiex.
And it's also possible to have the mwifiex module loaded while you're not using it. (We don't do this in production, but it's not too far-fetched.)
> BTW, this is wrong...

If you want to get technical, it's probably also not massively safe to hardcode "mlan0" or "wlan0".  I could sorta believe that it would be possible to plug in a USB based WiFi module and have that module end up being mlan0 / wlan0.  I'm just sayin.

Probably we should file separate bugs about those things, though.
> Probably we should file separate bugs about those things, though.

filed:

https://bugs.chromium.org/p/chromium/issues/detail?id=782924
Status: Started (was: Assigned)
Thanks!
Project Member

Comment 20 by bugdroid1@chromium.org, Nov 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5

commit 8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5
Author: Daniel Erat <derat@chromium.org>
Date: Fri Nov 10 03:16:30 2017

power: Pass devtype to system::UdevSubsystemObserver.

Make powerd's system::Udev class pass the devtype associated
with udev events to system::UdevSubsystemObserver
implementations. This is needed to detect wifi devices
appearing.

Also introduce a UdevEvent struct and make minor
improvements to observer-managing code in Udev and UdevStub.
(Unfortunately, libchrome is still too old for
base::ObserverList to be iterable.)

BUG= chromium:781838 
TEST=manual: verified that powerd (with another change) gets
     udev add events with subsystem "net" and devtype "wlan"

Change-Id: I22719a07e2a46027e3949e09bef0224ee0f0e224
Reviewed-on: https://chromium-review.googlesource.com/760673
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Eric Caruso <ejcaruso@chromium.org>

[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/udev_stub.h
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/pluggable_internal_backlight.h
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/display/display_watcher_unittest.cc
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/power_supply.h
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/power_supply_unittest.cc
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/input_watcher.cc
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/input_watcher.h
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/pluggable_internal_backlight_unittest.cc
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/input_watcher_unittest.cc
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/udev_subsystem_observer.h
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/pluggable_internal_backlight.cc
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/udev.h
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/udev.cc
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/display/display_watcher.cc
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/display/display_watcher.h
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/power_supply.cc
[modify] https://crrev.com/8cbc60cb4a9ae568bd331cdef0bbf5c663a054a5/power_manager/powerd/system/udev_stub.cc

Project Member

Comment 21 by bugdroid1@chromium.org, Nov 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/ef19c783e469f03ca8d71d4610b22e8d64bb157c

commit ef19c783e469f03ca8d71d4610b22e8d64bb157c
Author: Daniel Erat <derat@chromium.org>
Date: Fri Nov 10 10:17:24 2017

power: Set wifi transmit power when device added.

Move powerd's Daemon class's wifi transmit power code to a
new policy::WifiController class. Make the new class
subscribe to udev events on the "net" subsystem and set the
power when "wlan" devices are added. This is intended to
work around a race where powerd starts before wifi devices
are initialized.

BUG= chromium:781838 
TEST=added unit tests; also manually verified that powerd
     sets wifi transmit power after device appears when
     booting kevin

Change-Id: Ie7468c8f28da167c17f5648a772166c8943d94e8
Reviewed-on: https://chromium-review.googlesource.com/761407
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Eric Caruso <ejcaruso@chromium.org>

[add] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/policy/wifi_controller.h
[modify] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/power_manager.gyp
[add] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/policy/wifi_controller_unittest.cc
[modify] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/daemon.cc
[modify] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/udev/93-powerd-late.rules
[modify] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/daemon.h
[modify] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/daemon_unittest.cc
[modify] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/policy/input_device_controller_unittest.cc
[add] https://crrev.com/ef19c783e469f03ca8d71d4610b22e8d64bb157c/power_manager/powerd/policy/wifi_controller.cc

Thanks for the above!

...sounds like the last thing is to just change the error so it's no longer a "fatal" error, then we're done, right?  Could even put a comment somewhere in the code that says that this case is expected if powerd loads before the WiFi driver and that powerd will automatically retry when the wifi driver is loaded.

:)

Comment 23 by derat@chromium.org, Nov 11 2017

Yep, I have an in-progress change to do that (along with cleaning up various other things in this program). :-)
Project Member

Comment 24 by bugdroid1@chromium.org, Nov 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/fb5d18437804906b4df0c7babc47a5124b2af826

commit fb5d18437804906b4df0c7babc47a5124b2af826
Author: Daniel Erat <derat@chromium.org>
Date: Wed Nov 15 00:03:23 2017

power: Make set_wifi_transmit_power not crash on error.

Update powerd's set_wifi_transmit_power helper program to
return exit code 1 when it fails to find a wifi chipset
instead of aborting. powerd runs the program on boot, and
the chipset sometimes isn't available at that point (e.g. on
kevin).

Also make various other cleanups to this program's code.

BUG= chromium:781838 
TEST=manual: added some logging and checked via
     /var/log/powerd.out that the program still runs
     successfully after restarting powerd

Change-Id: I6cecec6fdbcce69fff4d85717d73a4dbcd4250ce
Reviewed-on: https://chromium-review.googlesource.com/766530
Commit-Ready: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>

[modify] https://crrev.com/fb5d18437804906b4df0c7babc47a5124b2af826/power_manager/powerd/set_wifi_transmit_power.cc

Labels: M-64
Status: Fixed (was: Started)

Comment 27 by derat@chromium.org, Mar 29 2018

Cc: grundler@chromium.org
#26: That query is against M63. Are you still seeing crashes in M64 (i.e. the first release with the fix for this)? I don't see crashes for set_wifi_transmit_power in the top 100 executables when I look at 10176.76.0.
Status: Verified (was: Fixed)
Bulk verify old fixed bugs...

Sign in to add a comment