Issue 827413

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Feature

shill: make robust to missing cfg80211/nl80211

Project Member Reported by briannorris@chromium.org, Mar 30 2018

Issue description

Forked from comments like this:

https://bugs.chromium.org/p/chromium/issues/detail?id=826900#c7

Today, shill will crash if it can't retrieve the nl80211 family ID (this can happen if the cfg80211 module is not loaded):

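  if (netlink_manager_) {
    netlink_manager_->Init();
    // Look up (and cache) the generic netlink family ID for nl80211.
    uint16_t nl80211_family_id =
        netlink_manager_->GetFamily(Nl80211Message::kMessageTypeString,
                                    Bind(&Nl80211Message::CreateMessage));
    // The lookup fails if cfg80211 is not loaded; LOG(FATAL) then aborts shill.
    if (nl80211_family_id == NetlinkMessage::kIllegalMessageType) {
      LOG(FATAL) << "Didn't get a legal message type for 'nl80211' messages.";
    }
    // (rest of the block elided)
  }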

We have previously relied on an init script to forcibly 'modprobe cfg80211' before starting shill, and recently we fell back to just relying on the appropriate module(s) being loaded automatically when the first nl80211 request gets made:

https://elixir.bootlin.com/linux/v4.15.14/source/net/netlink/genetlink.c#L868

(That defines the generic netlink family autoloading mechanism, supported as of Linux v3.3; nl80211 started using this in v3.11. We've ported that to all our current kernels now, so it should be working.)
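
For reference, a minimal sketch of the module alias that autoloading path requests, to my understanding of the linked genetlink.c code (a request_module() call with the format "net-pf-%d-proto-%d-family-%s"). The helper name and the two constants are made up for illustration; the values are meant to mirror PF_NETLINK and NETLINK_GENERIC from the kernel headers.

```cpp
#include <string>

// Assumed to match PF_NETLINK and NETLINK_GENERIC in the kernel headers.
constexpr int kPfNetlink = 16;
constexpr int kNetlinkGeneric = 16;

// Hypothetical helper: builds the module alias the kernel would ask
// modprobe for when an unknown generic netlink family is requested.
std::string GenlFamilyModuleAlias(const std::string& family) {
  return "net-pf-" + std::to_string(kPfNetlink) + "-proto-" +
         std::to_string(kNetlinkGeneric) + "-family-" + family;
}
```

If that reading is right, the first nl80211 family lookup ends up being roughly equivalent to a modprobe of the alias GenlFamilyModuleAlias("nl80211"), which is what replaces the explicit 'modprobe cfg80211' from the init script.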

This still makes things a bit fragile, though. If we ever determine that the autoloading is getting in our way (see recent issues with the iwl7000 driver), this would be a major blocker to unravel. So I'd like to investigate whether we can remove this dependency.

This bug should track our ability to remove this assumption, and instead make shill dynamically pick up any nl80211 hooks when devices are loaded. (Or else, serve as a pointer for anyone who claims we can fork the cfg80211 stack in the future and disable nl80211 autoloading. [1])

A first stab at this shows that the above FATAL assertion is just used for caching the NL80211 family ID, and it isn't immediately required. However, it seems like shill usually only picks up new Wifi devices by listening for nl80211 events...so this is a chicken and egg problem :)
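
A rough sketch of what "cache the family ID, but don't die if it's missing" could look like. This is not shill code: LazyFamilyId, Resolver, and the kIllegalMessageType placeholder value are all hypothetical, and the resolver callback just stands in for the netlink_manager_->GetFamily() call above.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Placeholder for NetlinkMessage::kIllegalMessageType; the real value
// in shill may differ.
constexpr uint16_t kIllegalMessageType = 0;

// Caches the nl80211 family ID once a lookup succeeds, and treats a
// failed lookup as "not available yet" instead of a fatal error.
class LazyFamilyId {
 public:
  using Resolver = std::function<uint16_t()>;

  explicit LazyFamilyId(Resolver resolver) : resolver_(std::move(resolver)) {
    assert(resolver_);
  }

  // Returns the cached ID, or retries the lookup if none is cached.
  // std::nullopt means nl80211 is currently unavailable.
  std::optional<uint16_t> Get() {
    if (!cached_) {
      const uint16_t id = resolver_();
      if (id == kIllegalMessageType) return std::nullopt;
      cached_ = id;
    }
    return cached_;
  }

 private:
  Resolver resolver_;
  std::optional<uint16_t> cached_;
};
```

Callers would then have to handle the empty case, which is where the chicken-and-egg problem shows up: something other than an nl80211 event has to trigger the retry.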

IIUC, to really unravel this, we'd have to plumb into the uevent framework / udev to get shill to pick up Wifi devices that way instead, and only start using nl80211 after it has found a Wifi device.
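
As a sketch of the non-nl80211 detection side: wireless netdevs registered through cfg80211 report DEVTYPE=wlan in their uevent data (e.g. /sys/class/net/wlan0/uevent), so a check along these lines could decide when it is worth trying nl80211. The function names are hypothetical, and a real implementation would presumably hook into udev events rather than read sysfs directly.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns true if the uevent text for a network interface contains a
// "DEVTYPE=wlan" line.
bool IsWlanUevent(const std::string& uevent_contents) {
  std::istringstream lines(uevent_contents);
  std::string line;
  while (std::getline(lines, line)) {
    if (line == "DEVTYPE=wlan") return true;
  }
  return false;
}

// Hypothetical helper: reads /sys/class/net/<ifname>/uevent. Returns an
// empty string if the file cannot be opened.
std::string ReadUevent(const std::string& ifname) {
  std::ifstream file("/sys/class/net/" + ifname + "/uevent");
  std::ostringstream contents;
  contents << file.rdbuf();
  return contents.str();
}
```

With something like IsWlanUevent(ReadUevent("wlan0")) returning true, shill would know a Wifi device exists and could only then resolve the nl80211 family.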

Or let me know if I'm misunderstanding something so far.


[1] Feel free to loop in anyone from Intel, in case they want to claim this is easy.
 

Comment 1 by kirtika@google.com, Mar 30 2018

The original behavior was added here and has not been touched much in 6 years: 
https://chromium-review.googlesource.com/#/c/chromiumos/platform/shill/+/44770/


>(Or else, serve as a pointer for anyone who claims we can fork the cfg80211 stack in the future and disable nl80211 autoloading. [1])
> [1] Feel free to loop in anyone from Intel, in case they want to claim this is easy.
 
I'd like to add that it's really our fault if it's not easy today - we can't be sitting on a ~5-year-old codebase that supports the cfg80211 version of ~3 years ago and either doesn't support or breaks on newer netlink messages. Irrespective of the cfg80211 forking, this will come back to bite us later.

Labels: Enterprise-Triaged