Issue metadata
Sign in to add a comment
|
[Autotest failure] network_WiFi_RoamFT: failing on various boards with "Failed to roam" error |
||||||||||||||||||||||
Issue description
Failing on various boards with "Failed to roam" error. Failing pretty consistently on nocturne.
10/29 03:10:34.554 DEBUG| utils:0287| [stdout] Connected to 02:00:00:00:03:00 (on wlan0)
10/29 03:10:34.555 DEBUG| utils:0287| [stdout] SSID: RoamFT_a_8dlzh_ch1
10/29 03:10:34.555 DEBUG| utils:0287| [stdout] freq: 2412
10/29 03:10:34.555 DEBUG| utils:0287| [stdout] RX: 8156 bytes (39 packets)
10/29 03:10:34.555 DEBUG| utils:0287| [stdout] TX: 6558 bytes (34 packets)
10/29 03:10:34.555 DEBUG| utils:0287| [stdout] signal: -3 dBm
10/29 03:10:34.555 DEBUG| utils:0287| [stdout] tx bitrate: 54.0 MBit/s
10/29 03:10:34.556 DEBUG| utils:0287| [stdout]
10/29 03:10:34.556 DEBUG| utils:0287| [stdout] bss flags: short-slot-time
10/29 03:10:34.556 DEBUG| utils:0287| [stdout] dtim period: 2
10/29 03:10:34.556 DEBUG| utils:0287| [stdout] beacon int: 100
10/29 03:10:34.567 DEBUG| ssh_host:0310| Running (ssh) '/bin/ip link del veth0' from 'execute|_call_run_once|run_once|delete_link|run|run_very_slowly'
10/29 03:10:35.482 DEBUG| utils:0287| [stderr] Cannot find device "veth0"
10/29 03:10:35.493 DEBUG| ssh_host:0310| Running (ssh) '/bin/ip link del veth1' from 'execute|_call_run_once|run_once|delete_link|run|run_very_slowly'
10/29 03:10:36.478 DEBUG| utils:0287| [stderr] Cannot find device "veth1"
10/29 03:10:36.489 DEBUG| ssh_host:0310| Running (ssh) 'ip link add veth0 type veth peer name veth1' from '_call_test_function|execute|_call_run_once|run_once|run|run_very_slowly'
10/29 03:10:37.587 DEBUG| ssh_host:0310| Running (ssh) 'ifconfig veth0 up' from '_call_test_function|execute|_call_run_once|run_once|run|run_very_slowly'
10/29 03:10:38.675 DEBUG| ssh_host:0310| Running (ssh) 'ifconfig veth1 up' from '_call_test_function|execute|_call_run_once|run_once|run|run_very_slowly'
10/29 03:10:40.246 DEBUG| ssh_host:0310| Running (ssh) 'ip link set veth0 master hostapbr0' from '_call_test_function|execute|_call_run_once|run_once|run|run_very_slowly'
10/29 03:10:41.314 DEBUG| ssh_host:0310| Running (ssh) 'ip link set veth1 master hostapbr1' from '_call_test_function|execute|_call_run_once|run_once|run|run_very_slowly'
10/29 03:10:42.322 DEBUG| ssh_host:0310| Running (ssh) '/usr/sbin/iw dev managed0 set txpower fixed 100' from 'execute|_call_run_once|run_once|set_tx_power|run|run_very_slowly'
10/29 03:10:43.471 DEBUG| ssh_host:0310| Running (ssh) 'su wpa -s /bin/bash -c "/usr/bin/wpa_cli scan"' from 'execute|_call_run_once|run_once|run_wpa_cli_cmd|run|run_very_slowly'
10/29 03:10:43.926 DEBUG| utils:0287| [stdout] Selected interface 'wlan0'
10/29 03:10:43.927 DEBUG| utils:0287| [stdout] OK
10/29 03:10:43.990 INFO |network_WiFi_RoamF:0182| Attempting to roam.
10/29 03:10:43.994 DEBUG| ssh_host:0310| Running (ssh) '/usr/sbin/iw dev wlan0 link' from 'wait_for_roam|poll_for_condition|attempt_roam|get_current_bssid|run|run_very_slowly'
10/29 03:10:44.558 DEBUG| utils:0287| [stdout] Connected to 02:00:00:00:03:00 (on wlan0)
10/29 03:10:44.559 DEBUG| utils:0287| [stdout] SSID: RoamFT_a_8dlzh_ch1
10/29 03:10:44.559 DEBUG| utils:0287| [stdout] freq: 2412
10/29 03:10:44.559 DEBUG| utils:0287| [stdout] RX: 8519 bytes (45 packets)
10/29 03:10:44.559 DEBUG| utils:0287| [stdout] TX: 6654 bytes (35 packets)
10/29 03:10:44.559 DEBUG| utils:0287| [stdout] signal: -33 dBm
10/29 03:10:44.559 DEBUG| utils:0287| [stdout] tx bitrate: 54.0 MBit/s
10/29 03:10:44.559 DEBUG| utils:0287| [stdout]
10/29 03:10:44.559 DEBUG| utils:0287| [stdout] bss flags: short-slot-time
10/29 03:10:44.560 DEBUG| utils:0287| [stdout] dtim period: 2
10/29 03:10:44.560 DEBUG| utils:0287| [stdout] beacon int: 100
10/29 03:10:44.561 DEBUG| wifi_client:0972| Current BSSID is 02:00:00:00:03:00.
10/29 03:10:45.066 DEBUG| ssh_host:0310| Running (ssh) '/usr/sbin/iw dev wlan0 link' from 'wait_for_roam|poll_for_condition|attempt_roam|get_current_bssid|run|run_very_slowly'
.
.
.
.
.
10/29 03:11:15.099 DEBUG| utils:0287| [stdout] Connected to 02:00:00:00:03:00 (on wlan0)
10/29 03:11:15.100 DEBUG| utils:0287| [stdout] SSID: RoamFT_a_8dlzh_ch1
10/29 03:11:15.100 DEBUG| utils:0287| [stdout] freq: 2412
10/29 03:11:15.100 DEBUG| utils:0287| [stdout] RX: 12497 bytes (70 packets)
10/29 03:11:15.100 DEBUG| utils:0287| [stdout] TX: 6750 bytes (36 packets)
10/29 03:11:15.100 DEBUG| utils:0287| [stdout] signal: -20 dBm
10/29 03:11:15.100 DEBUG| utils:0287| [stdout] tx bitrate: 54.0 MBit/s
10/29 03:11:15.100 DEBUG| utils:0287| [stdout]
10/29 03:11:15.100 DEBUG| utils:0287| [stdout] bss flags: short-slot-time
10/29 03:11:15.100 DEBUG| utils:0287| [stdout] dtim period: 2
10/29 03:11:15.100 DEBUG| utils:0287| [stdout] beacon int: 100
10/29 03:11:15.101 DEBUG| wifi_client:0972| Current BSSID is 02:00:00:00:03:00.
10/29 03:11:15.101 ERROR| utils:2769| Timed out waiting for condition: Wait for a roam to the given bssid
10/29 03:11:15.101 DEBUG| wifi_client:0991| Failed to roam to 02:00:00:00:04:00 in 15.732602 seconds.
10/29 03:11:15.101 DEBUG| test:0381| Test failed due to Failed to roam.. Exception log follows the after_iteration_hooks.
.
.
.
10/29 03:11:40.995 WARNI| test:0606| The test failed with the following exception
Traceback (most recent call last):
File "/usr/local/autotest/client/common_lib/test.py", line 600, in _exec
_call_test_function(self.execute, *p_args, **p_dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 800, in _call_test_function
return func(*args, **dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 464, in execute
postprocess_profiled_run, args, dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 371, in _call_run_once
self.run_once(*args, **dargs)
File "/usr/local/autotest/server/site_tests/network_WiFi_RoamFT/network_WiFi_RoamFT.py", line 189, in run_once
raise error.TestFail('Failed to roam.')
https://stainless.corp.google.com/search?view=matrix&row=board_model&col=build&first_date=2018-10-04&last_date=2018-10-31&test=network_WiFi_RoamFT.mixedEAP&board=auron_paine%7Ccandy%7Ccaroline%7Cdaisy_skate%7Celm%7Ceve%24%7Cgrunt%7Ckevin%7Cnocturne%7Cnyan_big%7Coctopus%7Cpeach_pit%7Cpeppy%7Crelm%7Csamus%7Cscarlet%7Cveyron_speedy&status=GOOD&status=FAIL&status=ERROR&status=ABORT&reason=Failed+to+roam.&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
,
Dec 21
I guess we're trying one approach that does not fix BRCM issues, so de-duplicating: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1385527
,
Dec 21
Am I confused or does this failure actually have nothing to do with the client (BRCM or Marvell or whatever): 10/29 03:10:34.567 DEBUG| ssh_host:0310| Running (ssh) '/bin/ip link del veth0' from 'execute|_call_run_once|run_once|delete_link|run|run_very_slowly' 10/29 03:10:35.482 DEBUG| utils:0287| [stderr] Cannot find device "veth0" 10/29 03:10:35.493 DEBUG| ssh_host:0310| Running (ssh) '/bin/ip link del veth1' from 'execute|_call_run_once|run_once|delete_link|run|run_very_slowly' 10/29 03:10:36.478 DEBUG| utils:0287| [stderr] Cannot find device "veth1" This is referring to the virtual interfaces on the AP side.
,
Dec 21
I think the log snippet pasted above was due to test flakiness. There is an issue that has to do with BRCM and also one that has to do with the signal/est throughput.
,
Dec 21
#13: That's not flakiness, it's just a harmless failure in our just-to-be-safe cleanup, IIUC:
# Cleanup veth interfaces from previous runs
self.context.router.delete_link(self.veth0)
self.context.router.delete_link(self.veth1)
^^ Those lines are a defensive check, in case the cleanup() doesn't run for some reason.
---
The only reason I mention BRCM is that I initially called this a duplicate of bug 907656 . But it looks like Matthew is going to solve these two issues differently.
,
Dec 21
Ah right, my mistake.
,
Dec 21
(And my mistake: I mean #3, not #13)
,
Dec 28
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/623de6da873ccd9a556d06bfce069a91dc1a02f4 commit 623de6da873ccd9a556d06bfce069a91dc1a02f4 Author: Matthew Wang <matthewmwang@chromium.org> Date: Fri Dec 28 02:35:56 2018 network_WiFi_RoamFT: Change second AP to use 11n with HT40 This results in the second AP having a much higher estimated throughput than the first AP, reliably causing a roam. BUG=chromium:900453 TEST=Ran on a few devices currently failing the autotest with reason "Failed to roam" and they pass. Ran on caroline at my desk and it still passes. Change-Id: Ic00c5eef0e6a28ebc4c1e2d519e87e76951cf73a Reviewed-on: https://chromium-review.googlesource.com/1385527 Commit-Ready: Matthew Wang <matthewmwang@chromium.org> Tested-by: Matthew Wang <matthewmwang@chromium.org> Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org> [modify] https://crrev.com/623de6da873ccd9a556d06bfce069a91dc1a02f4/server/site_tests/network_WiFi_RoamFT/network_WiFi_RoamFT.py
,
Jan 2
"Failed to roam" failures are significantly reduced, but we still have several "failed to roam" failures beyond 11482.0.0, where the CL landed: https://stainless.corp.google.com/search?view=matrix&row=board_model&col=build&first_date=2018-12-28&last_date=2019-01-02&test=network_WiFi_RoamFT.mixedEAP&build=R73&status=GOOD&status=FAIL&status=ERROR&status=ABORT&reason=Failed+to+roam.&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false The veyron_{mickey,minnie,speedy} failures are all BRCM (known issue 907656 ). But cyan and wizpig are still failing regularly.
,
Jan 2
I had a look and all of the non-BRCM "failed to roam" failures are for some reason still running the old version of the test (the second AP is still on channel 48 whereas it should now be on channel 157). Any idea why this is happening?
,
Jan 8
It's back to the "correct" channels now, but still failing on cyan, wizpig, and maybe elm. I do see pretty large signal discrepancies (20dBm?) between 2.4G and 5G APs for these runs. Is that expected?
,
Jan 8
It looks like those signal discrepancies is the reason the remaining tests are failing. I'm not sure why that's happening though. |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by briannorris@chromium.org
, Nov 27Status: Duplicate (was: Untriaged)