New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 869205 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 13
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Nyan devices failing network_WiFi_SuspendStress.Hidden testcase with "FAIL(Discovery timed out)" error on channel 36/48

Project Member Reported by jmuppala@chromium.org, Jul 30

Issue description

devices nyan_big, nyan_blaze and nyan_kitty failing on channel 36 or 48, passes on channel 6 always.

The same devices pass the following testcase: network_WiFi_SimpleConnect.wifi_checkHidden without any issues using same channels 6, 36 and 48. 

hostnames:
nyan_big -> chromeos2-row11-rack5-host4 (fails only on channel 36)
nyan_blaze -> chromeos2-row11-rack2-host1 (fails only on channel 36)
nyan_kitty -> chromeos15-row2-rack2-host2(fails on channel 36 or 48)

Logs@
https://stainless.corp.google.com/search?view=matrix&row=build&col=model&first_date=2018-05-27&last_date=2018-06-30&test=network_WiFi_SuspendStress.Hidden&status=GOOD&status=WARN&status=FAIL&status=ERROR&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false



Sample failure:
06/30 09:55:07.704 DEBUG|             utils:0218| Running 'rsync -L  --timeout=1800 --rsh='/usr/bin/ssh -a -x  -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22' -az --no-o --no-g  root@chromeos15-row2-rack2-host2:"/tmp/autoserv-4P8Vtv/sysinfo.pickle" "/tmp/tmpeSiKiA"'
06/30 09:55:08.134 DEBUG|              test:0420| after_iteration_hooks completed
06/30 09:55:08.134 WARNI|              test:0637| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 116, in run_once
    self.context.assert_connect_wifi(assoc_params)
  File "/usr/local/autotest/server/cros/network/wifi_test_context_manager.py", line 273, in assert_connect_wifi
    connect_name, assoc_result.failure_reason))
TestFail: Expected connection to SuspendStress_b_k1lqu_ch36 to succeed, but it failed with reason: FAIL(Discovery timed out).
06/30 09:55:08.139 DEBUG|              test:0642| Running cleanup for test.
06/30 09:55:08.140 DEBUG|   logging_manager:0627| Logging subprocess finished
06/30 09:55:08.144 DEBUG|   logging_manager:0627| Logging subprocess finished
 
Cc: briannorris@chromium.org kirtika@chromium.org
Labels: -Pri-3 Pri-2
Continues to fail on nyan and veyron devices...

https://stainless.corp.google.com/search?view=matrix&row=queued_date&col=model&test=network_WiFi_SuspendStress.Hidden&status=GOOD&status=WARN&status=FAIL&status=ERROR&reason=Discovery+timed+out&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false&days=25


Below is partial messages log that device initially associated to shows SuspendStress_a_q1sgp_ch6. Followed by a disconnect from this ssid and failed association back to the same ssid. Then it seems to configure SuspendStress_b_4fru8_ch36 but after "Discovering..." it tries to associate back to SuspendStress_a_q1sgp_ch6. I also only see 1 hostapd_router log file. Autoserve debug logs claim that AP was configured with SuspendStress_b_4fru8_ch36 and messages log below seems to indicate the same but in that case shouldn't there be a second hostapd_router log file?

Here is a link to the complete logs for nyan_big at chromeos2-row11-rack5-host4 - https://stainless.corp.google.com/browse/chromeos-autotest-results/225065731-chromeos-test/



2018-08-08T10:48:43.381206-07:00 INFO kernel: [ 4438.856462] mwifiex_sdio mmc1:0001:1: info: trying to associate to SuspendStress_a_q1sgp_ch6 and bssid 04:f0:21:03:7e:c0
2018-08-08T10:48:43.391190-07:00 INFO kernel: [ 4438.867618] mwifiex_sdio mmc1:0001:1: info: trying to associate	to 'SuspendStress_a_q1sgp_ch6' bssid 04:f0:21:03:7e:c0
2018-08-08T10:48:43.411206-07:00 INFO kernel: [ 4438.890249] mwifiex_sdio mmc1:0001:1: info: associated to bssid 04:f0:21:03:7e:c0 successfully
2018-08-08T10:48:43.411219-07:00 INFO kernel: [ 4438.890478] IPv6: ADDRCONF(NETDEV_CHANGE): mlan0: link becomes ready
2018-08-08T10:48:43.420834-07:00 ERR wpa_supplicant[532]: WMM AC: Missing IEs
2018-08-08T10:48:43.753642-07:00 INFO avahi-daemon[1818]: Joining mDNS multicast group on interface mlan0.IPv4 with address 192.168.0.14.
2018-08-08T10:48:43.757355-07:00 INFO avahi-daemon[1818]: New relevant interface mlan0.IPv4 for mDNS.
2018-08-08T10:48:43.757469-07:00 INFO avahi-daemon[1818]: Registering new address record for 192.168.0.14 on mlan0.IPv4.
2018-08-08T10:48:43.794373-07:00 ERR shill[1013]: [ERROR:error.cc(138)] [dns_client.cc(112)]: No valid DNS server addresses
2018-08-08T10:48:43.794659-07:00 ERR shill[1013]: [ERROR:http_request.cc(125)] Failed to start DNS client: No valid DNS server addresses
2018-08-08T10:48:43.800969-07:00 ERR shill[1013]: [ERROR:error.cc(138)] [dns_client.cc(112)]: No valid DNS server addresses
2018-08-08T10:48:43.801270-07:00 ERR shill[1013]: [ERROR:http_request.cc(125)] Failed to start DNS client: No valid DNS server addresses
2018-08-08T10:48:44.141395-07:00 INFO sshd[32362]: Accepted publickey for root from 100.116.60.159 port 43260 ssh2: RSA SHA256:Fp1qWjFLyK1cTpiI5rdk7iEJwoK9lcnYAgbQtGC3jzU
2018-08-08T10:48:44.252654-07:00 NOTICE ag[32369]: autotest server[stack::ipv4_address_and_prefix|addresses|run] -> ssh_run(ip addr show mlan0 2> /dev/null)
2018-08-08T10:48:44.257434-07:00 INFO sshd[32362]: Received disconnect from 100.116.60.159 port 43260:11: disconnected by user
2018-08-08T10:48:44.257500-07:00 INFO sshd[32362]: Disconnected from user root 100.116.60.159 port 43260
2018-08-08T10:48:44.390457-07:00 INFO sshd[32371]: Accepted publickey for root from 100.116.60.159 port 43262 ssh2: RSA SHA256:Fp1qWjFLyK1cTpiI5rdk7iEJwoK9lcnYAgbQtGC3jzU
2018-08-08T10:48:44.501706-07:00 NOTICE ag[32378]: autotest server[stack::ping|ping|run] -> ssh_run(ping -c 10 192.168.0.254)
2018-08-08T10:48:46.812387-07:00 ERR shill[1013]: [ERROR:error.cc(138)] [dns_client.cc(112)]: No valid DNS server addresses
2018-08-08T10:48:46.812608-07:00 ERR shill[1013]: [ERROR:http_request.cc(125)] Failed to start DNS client: No valid DNS server addresses
2018-08-08T10:48:49.822348-07:00 ERR shill[1013]: [ERROR:error.cc(138)] [dns_client.cc(112)]: No valid DNS server addresses
2018-08-08T10:48:49.825375-07:00 ERR shill[1013]: [ERROR:http_request.cc(125)] Failed to start DNS client: No valid DNS server addresses
2018-08-08T10:48:53.531673-07:00 INFO sshd[32371]: Received disconnect from 100.116.60.159 port 43262:11: disconnected by user
2018-08-08T10:48:53.534832-07:00 INFO sshd[32371]: Disconnected from user root 100.116.60.159 port 43262
2018-08-08T10:48:53.741606-07:00 INFO kernel: [ 4449.175377] mwifiex_sdio mmc1:0001:1: info: successfully disconnected from 04:f0:21:03:7e:c0:	reason code 3
2018-08-08T10:48:53.762412-07:00 INFO kernel: [ 4449.191175] cfg80211: Calling CRDA to update world regulatory domain
2018-08-08T10:48:53.763074-07:00 ERR wpa_supplicant[532]: nl80211: Failed to open /proc/sys/net/ipv4/conf/mlan0/drop_unicast_in_l2_multicast: No such file or directory
2018-08-08T10:48:53.763086-07:00 ERR wpa_supplicant[532]: nl80211: Failed to set IPv4 unicast in multicast filter
2018-08-08T10:48:53.771250-07:00 INFO kernel: [ 4449.208015] cfg80211: World regulatory domain updated:
2018-08-08T10:48:53.771267-07:00 INFO kernel: [ 4449.208026] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
2018-08-08T10:48:53.771272-07:00 INFO kernel: [ 4449.208032] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
2018-08-08T10:48:53.771276-07:00 INFO kernel: [ 4449.208038] cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 2000 mBm)
2018-08-08T10:48:53.771280-07:00 INFO kernel: [ 4449.208044] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm)
2018-08-08T10:48:53.771283-07:00 INFO kernel: [ 4449.208049] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm)
2018-08-08T10:48:53.771288-07:00 INFO kernel: [ 4449.208054] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz), (N/A, 2000 mBm)
2018-08-08T10:48:53.771292-07:00 INFO kernel: [ 4449.208059] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm)
2018-08-08T10:48:53.771295-07:00 INFO kernel: [ 4449.208064] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm)
2018-08-08T10:48:53.771299-07:00 INFO kernel: [ 4449.208069] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm)
2018-08-08T10:48:53.771302-07:00 INFO kernel: [ 4449.208422] cfg80211: Calling CRDA for country: US
2018-08-08T10:48:53.769417-07:00 ERR shill[1013]: [ERROR:wifi.cc(988)] Disconnected due to inability to connect to the service.
2018-08-08T10:48:53.781237-07:00 INFO kernel: [ 4449.214318] cfg80211: Regulatory domain changed to country: US
2018-08-08T10:48:53.781255-07:00 INFO kernel: [ 4449.214329] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
2018-08-08T10:48:53.781261-07:00 INFO kernel: [ 4449.214336] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 3000 mBm)
2018-08-08T10:48:53.781266-07:00 INFO kernel: [ 4449.214341] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 1700 mBm)
2018-08-08T10:48:53.781270-07:00 INFO kernel: [ 4449.214346] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz), (N/A, 2300 mBm)
2018-08-08T10:48:53.781274-07:00 INFO kernel: [ 4449.214352] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2300 mBm)
2018-08-08T10:48:53.781279-07:00 INFO kernel: [ 4449.214357] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 3000 mBm)
2018-08-08T10:48:53.781283-07:00 INFO kernel: [ 4449.214362] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 4000 mBm)
2018-08-08T10:48:53.841335-07:00 INFO kernel: [ 4449.277523] mwifiex_sdio mmc1:0001:1: info: scan: num_probes = 4
2018-08-08T10:48:53.882374-07:00 INFO avahi-daemon[1818]: Withdrawing address record for 192.168.0.14 on mlan0.
2018-08-08T10:48:53.882560-07:00 INFO avahi-daemon[1818]: Leaving mDNS multicast group on interface mlan0.IPv4 with address 192.168.0.14.
2018-08-08T10:48:53.882686-07:00 INFO avahi-daemon[1818]: Interface mlan0.IPv4 no longer relevant for mDNS.
2018-08-08T10:48:57.522696-07:00 INFO kernel: [ 4452.936080] mwifiex_sdio mmc1:0001:1: info: trying to associate to SuspendStress_a_q1sgp_ch6 and bssid 04:f0:21:03:7e:c0
2018-08-08T10:48:57.542206-07:00 INFO kernel: [ 4452.957835] mwifiex_sdio mmc1:0001:1: info: trying to associate	to 'SuspendStress_a_q1sgp_ch6' bssid 04:f0:21:03:7e:c0
2018-08-08T10:48:58.561541-07:00 INFO kernel: [ 4453.972705] mwifiex_sdio mmc1:0001:1: ASSOC_RESP: failed, status code=2	err=0xfffc a_id=0x3fff
2018-08-08T10:48:58.561592-07:00 INFO kernel: [ 4453.972955] mwifiex_sdio mmc1:0001:1: info: association to bssid 04:f0:21:03:7e:c0 failed
2018-08-08T10:48:58.582562-07:00 ERR wpa_supplicant[532]: nl80211: Failed to open /proc/sys/net/ipv4/conf/mlan0/drop_unicast_in_l2_multicast: No such file or directory
2018-08-08T10:48:58.582775-07:00 ERR wpa_supplicant[532]: nl80211: Failed to set IPv4 unicast in multicast filter
2018-08-08T10:48:59.092320-07:00 INFO kernel: [ 4454.508843] mwifiex_sdio mmc1:0001:1: info: scan: num_probes = 4
2018-08-08T10:48:59.753853-07:00 DEBUG shill_xmlrpc_server[30386]: [DEBUG] connect_wifi()
2018-08-08T10:48:59.754588-07:00 DEBUG shill_xmlrpc_server[30386]: [DEBUG] Constructing AssociationParameters object with args=[], kwargs={'save_credentials': False, 'association_timeout': 15, 'ssid': 'SuspendStress_b_4fru8_ch36', 'station_type': None, 'configuration_timeout': 15, 'security_config': {'xmlrpc_struct_type_key': 'SecurityConfig', 'security': 'none'}, 'autoconnect': None, 'expect_failure': False, 'discovery_timeout': 15, 'bgscan_config': {'signal': -50, 'short_interval': 30, 'long_interval': 180, 'interface': None, 'xmlrpc_struct_type_key': 'BgscanConfiguration', 'method': 'default'}, 'guid': None, 'is_hidden': True}
2018-08-08T10:48:59.758660-07:00 DEBUG shill_xmlrpc_server[30386]: [DEBUG] Constructing SecurityConfig object with args=[], kwargs={'security': 'none'}
2018-08-08T10:48:59.759299-07:00 DEBUG shill_xmlrpc_server[30386]: [DEBUG] Constructing BgscanConfiguration object with args=[], kwargs={'interface': None, 'signal': -50, 'long_interval': 180, 'short_interval': 30, 'method': 'default'}
2018-08-08T10:48:59.760160-07:00 INFO shill_xmlrpc_server[30386]: [INFO] Using default interface for bgscan configuration
2018-08-08T10:48:59.760656-07:00 DEBUG shill_xmlrpc_server[30386]: [DEBUG] list_controlled_wifi_interfaces()
2018-08-08T10:48:59.775047-07:00 INFO shill_xmlrpc_server[30386]: [INFO] Attempting to connect to SuspendStress_b_4fru8_ch36
2018-08-08T10:48:59.775503-07:00 INFO shill_xmlrpc_server[30386]: [INFO] Configuring SuspendStress_b_4fru8_ch36 as a hidden network.
2018-08-08T10:48:59.805841-07:00 INFO shill_xmlrpc_server[30386]: [INFO] Configured service: SuspendStress_b_4fru8_ch36
2018-08-08T10:48:59.806315-07:00 INFO shill_xmlrpc_server[30386]: [INFO] Configured hidden service: SuspendStress_b_4fru8_ch36
2018-08-08T10:48:59.806740-07:00 INFO shill_xmlrpc_server[30386]: [INFO] Discovering...
2018-08-08T10:49:02.771987-07:00 INFO kernel: [ 4458.169198] mwifiex_sdio mmc1:0001:1: info: scan: num_probes = 4
2018-08-08T10:49:02.851350-07:00 INFO kernel: [ 4458.253101] mwifiex_sdio mmc1:0001:1: info: trying to associate to SuspendStress_a_q1sgp_ch6 and bssid 04:f0:21:03:7e:c0
2018-08-08T10:49:02.871219-07:00 INFO kernel: [ 4458.268208] mwifiex_sdio mmc1:0001:1: info: trying to associate	to 'SuspendStress_a_q1sgp_ch6' bssid 04:f0:21:03:7e:c0
2018-08-08T10:49:03.896001-07:00 INFO kernel: [ 4459.282219] mwifiex_sdio mmc1:0001:1: ASSOC_RESP: failed, status code=2	err=0xfffc a_id=0x3fff
2018-08-08T10:49:03.896124-07:00 INFO kernel: [ 4459.282467] mwifiex_sdio mmc1:0001:1: info: association to bssid 04:f0:21:03:7e:c0 failed
2018-08-08T10:49:03.903470-07:00 ERR wpa_supplicant[532]: nl80211: Failed to open /proc/sys/net/ipv4/conf/mlan0/drop_unicast_in_l2_multicast: No such file or directory
2018-08-08T10:49:03.903588-07:00 ERR wpa_supplicant[532]: nl80211: Failed to set IPv4 unicast in multicast filter
Owner: briannorris@chromium.org
Status: Started (was: Unconfirmed)
I believe Qualcomm is seeing this too, and they've actually done a bit of debugging:

https://issuetracker.google.com/118664967

It looks like we're still trying to connect to the old (no-longer running) channel 6 AP when we're supposed to be switching to a new channel 36 AP. (Or, the same with 36 -> 48 -- this test runs a loop across {6,36,48}.) This is a combination of the fact that hidden scan results don't immediately get evicted -- they hang around for a little while -- and that shill/supplicant haven't been told to kill the old service. So supplicant is busy telling us to reconnect (fruitlessly), sucking up our time, and preventing us from scanning promptly for the next AP. This understandably causes timeouts.

I'm cooking up a fix to clear out the old Service in between each loop iteration, which should hopefully avoid this problem.
Issue 525139 has been merged into this issue.
Project Member

Comment 4 by bugdroid1@chromium.org, Nov 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/06b59ffd4639ec9038d65101bb7c7532355476e6

commit 06b59ffd4639ec9038d65101bb7c7532355476e6
Author: Brian Norris <briannorris@chromium.org>
Date: Tue Nov 13 19:57:53 2018

[autotest] network_WiFi_SuspendStress: clear networks between configs

Scan results may remain listed by a driver for some time after the last
probe response we've seen (e.g., expiry time can be around 30 seconds).
In this test, we run multiple configurations (e.g., AP channels)
back-to-back, without clearing any of the shill profile information or
wpa_supplicant configuration. Together, this means that as we start a
second iteration of this loop, sometimes
(a) the AP goes down (setting up the next hostapd instance tears down
    the old one)
(b) DUT disconnects
(c) new AP (new channel) starts up
(d) DUT hasn't found new AP yet, but still has old results --
    wpa_supplicant tries to connect (shill never told it not to)
(e) DUT wastes a lot of time on associating to an AP that doesn't exist,
    and it can't perform additional scans in the meantime
(f) concurrently with (d) and (e), the test is sending a Connect request
    to shill; this connection times out in the discovery phase

If we force a disconnect and clear out the shill profile before we start
the next loop, then we won't be affected by old profile results.

BUG= chromium:869205 , b:118664967
TEST=`test_that ... network_WiFi_SuspendStress.Hidden` on a few devices

Change-Id: I5e0b66beea99a1f37f81577050639e5799862ca4
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/1327510
Reviewed-by: Grant Grundler <grundler@chromium.org>

[modify] https://crrev.com/06b59ffd4639ec9038d65101bb7c7532355476e6/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py

Status: Fixed (was: Started)

Sign in to add a comment