network_WiFi_MaskedBSSID.wifi_masked_bssid test failing with "SSID CrOS_Masked{0,1} is not in scan results" error
Issue description
Logs: https://stainless.corp.google.com/search?view=matrix&row=build&col=model&first_date=2018-05-31&last_date=2018-06-27&test=network_WiFi_MaskedBSSID.wifi_masked_bssid&status=GOOD&status=WARN&status=FAIL&status=ERROR&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
Sample failure:
06/24 21:11:12.621 WARNI| test:0637| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/network_WiFi_MaskedBSSID/network_WiFi_MaskedBSSID.py", line 42, in run_once
    [config.ssid for config in configurations])
  File "/usr/local/autotest/server/cros/network/wifi_client.py", line 586, in scan
    self.assert_bsses_include_ssids(bss_list, ssids)
  File "/usr/local/autotest/server/cros/network/wifi_client.py", line 280, in assert_bsses_include_ssids
    (ssid, found_bsses))
TestFail: SSID CrOS_Masked1 is not in scan results: [IwBss(bss='00:11:22:33:44:55', frequency=2412, ssid='CrOS_Masked0', security='open', ht=None, signal=-35.0)]
Devices seeing this issue:
The following devices always fail this test case:
astronaut -> chromeos15-row1-rack2-host6
gandof -> chromeos15-row1-rack8-host1
lulu -> chromeos15-row1-rack5-host4
Some devices fail intermittently:
edgar -> chromeos15-row1-rack10-host2
orco -> chromeos15-row1-rack9-host2
wizpig -> chromeos15-row1-rack6-host2
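For readers unfamiliar with the check that raises this error, here is a minimal sketch of the idea. It is not the autotest implementation: `IwBss` only mirrors the fields printed in the failure message above, `TestFail` is a local stand-in, and the body of `assert_bsses_include_ssids` is an assumption that the real helper simply looks for each expected SSID among the scanned BSSes.

```python
import collections

# Field names copied from the IwBss(...) repr in the failure message.
IwBss = collections.namedtuple(
    'IwBss', ['bss', 'frequency', 'ssid', 'security', 'ht', 'signal'])


class TestFail(Exception):
    """Local stand-in for autotest's TestFail, so the sketch is standalone."""


def assert_bsses_include_ssids(found_bsses, expected_ssids):
    """Raise TestFail for the first expected SSID missing from the scan."""
    found_ssids = {bss.ssid for bss in found_bsses}
    for ssid in expected_ssids:
        if ssid not in found_ssids:
            raise TestFail('SSID %s is not in scan results: %r' %
                           (ssid, found_bsses))


# Scan result taken from the sample failure: only CrOS_Masked0 was seen.
scan = [IwBss(bss='00:11:22:33:44:55', frequency=2412, ssid='CrOS_Masked0',
              security='open', ht=None, signal=-35.0)]

try:
    assert_bsses_include_ssids(scan, ['CrOS_Masked0', 'CrOS_Masked1'])
    failure = None
except TestFail as e:
    failure = str(e)
```

With the scan list from the sample failure, `failure` holds a message about CrOS_Masked1, which matches the shape of the error reported here.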
,
Aug 1
I'm not sure the highlighted snippets capture what's really going wrong. It's pretty suspicious that the problem is consistently with the 2nd SSID, on phy2. I'm suspecting that phy2 is not actually a good radio to use on the Whirlwind. I'm seeing some related problems on Gale, since it doesn't have that 3rd phy (it only has a 2GHz phy0 and a 5GHz phy1), and once I rewrite the test to avoid using the 3rd radio (which I already did locally for Gale), it passes consistently on that astronaut. BTW, I believe network_WiFi_VerifyRouter isn't actually verifying phy2 on Whirlwind -- it seems to only test the first two radios (despite the comments that suggest otherwise). If I'm correct, we need to take a closer look at our lab verification tests, as well as our use of phy2 on Whirlwind. I have a few changes related to testing out Gale.
,
Aug 1
Brian, I am suspicious of any TX use of phy2. Historically (currently?), phy2 was only used to scan and measure noise of other channels while phy0/phy1 were in normal use. The idea was a third "ear" would allow the cloud to figure out the best channels (one in each band) for a given whirlwind to be using.
,
Aug 1
(This was mostly written before comment #3)
So I'm pretty sure it's just that phy2 isn't very useful on these routers. I'm not sure if it's a defective assembly, an inherent issue with the 3rd radio on Whirlwind, or a little of both. But if I do the following in network_WiFi_VerifyRouter, then it will actually verify the 3rd radio in AP mode again:
diff --git a/server/site_tests/network_WiFi_VerifyRouter/network_WiFi_VerifyRouter.py b/server/site_tests/network_WiFi_VerifyRouter/network_WiFi_VerifyRouter.py
index 07a09bdabe69..60a34aeb0c1a 100644
--- a/server/site_tests/network_WiFi_VerifyRouter/network_WiFi_VerifyRouter.py
+++ b/server/site_tests/network_WiFi_VerifyRouter/network_WiFi_VerifyRouter.py
@@ -66,7 +66,7 @@ class network_WiFi_VerifyRouter(wifi_cell_test_base.WiFiCellTestBase):
# Setup two APs on |channel|. configure() will spread these across
# radios.
n_mode = hostap_config.HostapConfig.MODE_11N_MIXED
- ap_config = hostap_config.HostapConfig(channel=channel, mode=n_mode)
+ ap_config = hostap_config.HostapConfig(channel=channel, mode=n_mode, min_streams=1)
self.context.configure(ap_config)
self.context.configure(ap_config, multi_interface=True)
failures = []
and it passes on this device (where $subject was already passing):
chromeos15-row4-rack12-host4
but it fails on this, where $subject always fails:
chromeos15-row1-rack2-host6
---
EDIT, after comment #3:
I suppose it makes sense to avoid this radio as an AP then. We should still probably validate it as a monitor though, since we might end up using it as such (especially if we don't install a separate pcap device). I'm not sure the best way to do that though...maybe just keep the above change, to validate in AP mode?
I'm going to assign to Harpreet for now, to audit the lab for badly-installed Whirlwinds, in case this is partially a lab issue.
,
Aug 1
,
Aug 1
I've got some work in progress for utilizing the radios differently on this test, so I'll take $subject. I filed bug 870042 to track lab verification, and assigned *that* to Harpreet.
,
Aug 1
I think this is another symptom of the same problem:
https://stainless.corp.google.com/search?view=matrix&row=build&col=hostname&first_date=2018-07-26&last_date=2018-08-01&test=%5Enetwork%5C_WiFi%5C_BgscanBackoff%5C.&reason=%5EBackground+scans+should+detect+new+BSSeswithin+an+associated+ESS.&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
All of these network_WiFi_BgscanBackoff runs use phy2 on Whirlwind, and the test is failing with "Background scans should detect new BSSeswithin an associated ESS". That's probably because phy2 is not properly (?) broadcasting the new BSS.
,
Aug 2
Re #4, chromeos15-row4-rack12-host4 is an OTA setup where the Whirlwind in use is not taken apart, whereas chromeos15-row1-rack2-host6 is a conductive setup where the Whirlwind was disassembled and the antennas are connected as shown in the picture at the link below. We only connect the 2.4GHz (phy0) and 5GHz (phy1) antennas and do not connect the aux-radio, which is what phy2 seems to be.
https://screenshot.googleplex.com/1fJgPBpNQV2
Here are more details about the conductive setup:
https://docs.google.com/document/d/1-mI6OIUgZhCcaprc9HFYwa5UzMA0twNFk9ccWoK4GJI/edit#
Given the above, the network_WiFi_MaskedBSSID test still passes on approximately half of the conductive setups (anything in chromeos15-row1 racks 1 to 11; see the stainless link below). Does that mean it is able to get some signal on phy2 over-the-air or that it may be using a different (phy0 or phy1) interface in those cases?
https://stainless.corp.google.com/search?view=matrix&row=hostname&col=build&test=network_WiFi_MaskedBSSID.wifi_masked_bssid&hostname=%5Echromeos15-row1-&status=GOOD&status=WARN&status=FAIL&status=ERROR&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false&days=15
,
Aug 2
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/042f9b2be03c04e57959a94aa64d1e059086191f
commit 042f9b2be03c04e57959a94aa64d1e059086191f
Author: Brian Norris <briannorris@chromium.org>
Date: Thu Aug 02 20:53:47 2018
network_WiFi_VerifyRouter: verify all router radios
Whirlwind has a 3rd radio (phy2) with a single antenna. The current test skips this radio, because it can only support a single spatial stream, and our defaults look for a minimum of 2. Lower the minimum for this test, so we pick up the radio still.
BUG=chromium:866181
TEST=network_WiFi_VerifyRouter -- fails on routers where phy2 isn't working properly for whatever reason
Change-Id: Ib24336986e8bb49698050d8987aa69093ec316a8
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1158796
Reviewed-by: Grant Grundler <grundler@chromium.org>
[modify] https://crrev.com/042f9b2be03c04e57959a94aa64d1e059086191f/server/site_tests/network_WiFi_VerifyRouter/network_WiFi_VerifyRouter.py
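The effect of the minimum-stream default described in this commit can be illustrated with a small sketch. Everything here is hypothetical except what the thread states (phy2 has a single antenna and supports one spatial stream; the default minimum is 2): the `Radio` tuple, the `usable_radios` helper, and the stream counts assumed for phy0 and phy1 are for illustration only and are not the autotest radio-selection code.

```python
import collections

# Hypothetical description of a router radio; 'streams' is the number of
# spatial streams it supports.
Radio = collections.namedtuple('Radio', ['phy', 'streams'])

# Whirlwind as described in this thread. phy2 is the single-antenna radio;
# the stream counts for phy0 and phy1 are assumed (anything >= 2 behaves
# the same in this sketch).
WHIRLWIND = [
    Radio(phy='phy0', streams=2),
    Radio(phy='phy1', streams=2),
    Radio(phy='phy2', streams=1),
]


def usable_radios(radios, min_streams=2):
    """Return the phys that satisfy the requested minimum stream count."""
    return [r.phy for r in radios if r.streams >= min_streams]


default_choice = usable_radios(WHIRLWIND)                 # default minimum of 2
lowered_choice = usable_radios(WHIRLWIND, min_streams=1)  # as in this change
```

Under these assumptions, the default selection never includes phy2, while lowering the minimum to 1 makes it eligible, which is why the change makes the test start exercising the third radio.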
,
Aug 2
> Does that mean it is able to get some signal on phy2 over-the-air or that it may be using a different (phy0 or phy1) interface in those cases?
It's just getting an extremely weak signal over the air, but it's enough to at least register a scan. (That's all this is looking for -- it doesn't need to associate.) See one log:
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] BSS 00:11:22:33:44:55(on wlan0)
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] TSF: 4505984 usec (0d, 00:00:04)
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] freq: 2412
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] beacon interval: 100 TUs
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] capability: ESS (0x0001)
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] signal: -98.00 dBm
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] last seen: 101 ms ago
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] Information elements from Probe Response frame:
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] SSID: CrOS_Masked1
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] Supported rates: 1.0* 2.0* 5.5 11.0
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] DS Parameter set: channel 1
07/28 16:40:29.315 DEBUG| utils:0287| [stdout] TIM: DTIM Count 0 DTIM Period 2 Bitmap Control 0x0 Bitmap[0] 0x0
07/28 16:40:29.315 DEBUG| utils:0287| [stdout] Extended capabilities: Extended Channel Switching, SSID List, 6
And the hostapd log is clearly showing it on phy2:
1532821223.101726: nl80211: interface managed1 in phy phy2
https://stainless.corp.google.com/browse/chromeos-autotest-results/221669331-chromeos-test/
https://storage.cloud.google.com/chromeos-autotest-results/221669331-chromeos-test/chromeos15-row1-rack1-host3/network_WiFi_MaskedBSSID/debug/hostapd_router_1_managed1.log
> We only connect the 2.4GHz (phy0) and 5GHz (phy1) antennas and do not connect the aux-radio, which is what phy2 seems to be.
OK. Then I don't know what's up with stuff like this:
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/321917
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/283621
We should probably rip out any of that stuff that allows using the 1x1 AUX radio. I've partially done that, but we should do that 100% if we don't expect this radio to be hooked up. (And I should probably also just revert comment #9 too.)
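To make the point about the weak signal concrete, the scan log quoted in this comment can be reduced to the fields that matter with a short parser. This is only a sketch: `parse_scan_log` is a hypothetical helper, not the autotest scan parser, and it assumes each log line carries the `[stdout] ` prefix seen in the log from this comment.

```python
import re

# A few lines copied from the log quoted in this comment.
LOG = """\
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] BSS 00:11:22:33:44:55(on wlan0)
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] freq: 2412
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] signal: -98.00 dBm
07/28 16:40:29.314 DEBUG| utils:0287| [stdout] SSID: CrOS_Masked1
07/28 16:40:29.315 DEBUG| utils:0287| [stdout] Extended capabilities: Extended Channel Switching, SSID List, 6
"""


def parse_scan_log(text):
    """Collect bss, frequency, signal, and ssid for each BSS in the log."""
    results = []
    current = None
    for line in text.splitlines():
        # Drop the autotest logging prefix, if present.
        line = line.split('[stdout] ', 1)[-1].strip()
        match = re.match(r'BSS ([0-9a-f:]{17})', line)
        if match:
            current = {'bss': match.group(1)}
            results.append(current)
            continue
        if current is None:
            continue
        match = re.match(r'(freq|signal|SSID): (.*)', line)
        if not match:
            continue
        key, value = match.groups()
        if key == 'freq':
            current['frequency'] = int(value)
        elif key == 'signal':
            current['signal'] = float(value.split()[0])  # drop the "dBm" unit
        else:
            current['ssid'] = value
    return results


bsses = parse_scan_log(LOG)
```

The single parsed entry shows CrOS_Masked1 at -98 dBm, so the SSID is present in the scan even though the signal is far too weak to be considered a healthy link.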
,
Aug 3
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/1e1bc01176164206941956a673a740343b8e7b89
commit 1e1bc01176164206941956a673a740343b8e7b89
Author: Brian Norris <briannorris@chromium.org>
Date: Fri Aug 03 04:50:00 2018
[autotest] network_WiFi_MaskedBSSID: stop requiring MULTI_AP_SAME_BAND
This test has some strange requirements: it wants to set up an illegal configuration, with 2 BSS's using the same BSSID, to imitate some broken routers in the field. The Linux mac80211 framework doesn't accept this, returning -ENOTUNIQ instead, so this doesn't work when you run both of these BSS's on the same radio.
This all worked OK on APs that had more than 1 radio for each band (so you work around Linux's per-interface BSSID restriction), but it doesn't work on Gale, where we force the 2 BSS's onto the same radio.
We can work around this by just switching this test to put the two incompatible BSS's on separate bands (2.4GHz / 5GHz), and then drop the CAPABILITY_MULTI_AP_SAME_BAND requirement.
This should also fix some issues seen on some Whirlwind routers, where the 3rd radio (phy2) wasn't operating reliably. We may try to avoid using this radio entirely in the future, so this is a good start.
BUG=chromium:774808, chromium:866181
TEST=network_WiFi_MaskedBSSID.wifi_masked_bssid on Gale
Change-Id: Idab9c56f42ad426a0f6b323e49539699679cd2d4
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1159461
Reviewed-by: Grant Grundler <grundler@chromium.org>
[modify] https://crrev.com/1e1bc01176164206941956a673a740343b8e7b89/server/site_tests/network_WiFi_MaskedBSSID/network_WiFi_MaskedBSSID.py
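The constraint this commit works around can be modeled in a few lines. This is a toy model, not mac80211 or hostapd code: `Radio`, `NotUniqueError`, and `start_masked_bsses` are hypothetical names, and the only behavior taken from the commit message is that a second BSS with an already-used BSSID is rejected on the same radio but accepted on a different one.

```python
class NotUniqueError(Exception):
    """Stand-in for the -ENOTUNIQ result mentioned in the commit message."""


class Radio(object):
    """Toy radio that refuses two BSSes with the same BSSID."""

    def __init__(self, phy):
        self.phy = phy
        self.bssids = set()

    def add_bss(self, ssid, bssid):
        if bssid in self.bssids:
            raise NotUniqueError(
                '%s: BSSID %s already in use, cannot start %s' %
                (self.phy, bssid, ssid))
        self.bssids.add(bssid)


def start_masked_bsses(radio_for_first, radio_for_second):
    """Start the two same-BSSID networks on the given radios."""
    bssid = '00:11:22:33:44:55'
    radio_for_first.add_bss('CrOS_Masked0', bssid)
    radio_for_second.add_bss('CrOS_Masked1', bssid)


# Separate bands means separate radios, so both BSSes come up.
start_masked_bsses(Radio('phy0'), Radio('phy1'))

# Forcing both onto one radio (the Gale case in the commit message) fails.
single = Radio('phy0')
try:
    start_masked_bsses(single, single)
    same_radio_ok = True
except NotUniqueError:
    same_radio_ok = False
```

In this model, placing the two BSSes on phy0 and phy1 succeeds without needing a second radio in the same band, which is the reason the capability requirement can be dropped.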
,
Aug 3
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/1a7644e393fbb03980c495524b0aaee9afdb1d6d
commit 1a7644e393fbb03980c495524b0aaee9afdb1d6d
Author: Brian Norris <briannorris@chromium.org>
Date: Fri Aug 03 04:50:01 2018
[autotest] network_WiFi_BgscanBackoff: straighten out router requirements
As written today, this test requires CAPABILITY_MULTI_AP_SAME_BAND but doesn't declare it. The test starts up multiple BSS's at distinct frequencies, which can't be served by a single radio.
For the 'wifi_bgscan_backoff' test, this isn't really required; the test can just as well be run on separate 2G vs. 5G bands.
For the '5760noise_check' variant, we explicitly wanted to test two 5GHz channels. This isn't possible on Gale, so let's add a capability check so this test gets a TEST_NA result.
As a related effect, this also should move the .wifi_bgscan_backoff variant to avoid running on phy2 on Whirlwind, which can help avoid some flakiness. Whirlwind's phy2 is not known to be a reliable transmitter, and we may stop using it entirely soon.
BUG=chromium:774808, chromium:866181
TEST=network_WiFi_BgscanBackoff.wifi_bgscan_backoff and network_WiFi_BgscanBackoff.5760_noise_check with gale; the former now passes, and the latter gets TEST_NA
Change-Id: I9f6d7ea0dba86d84aaa8cbc8ca236baf8fbdf92b
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1159462
Reviewed-by: Grant Grundler <grundler@chromium.org>
[modify] https://crrev.com/1a7644e393fbb03980c495524b0aaee9afdb1d6d/server/site_tests/network_WiFi_BgscanBackoff/network_WiFi_BgscanBackoff.py
[modify] https://crrev.com/1a7644e393fbb03980c495524b0aaee9afdb1d6d/server/site_tests/network_WiFi_BgscanBackoff/control.5760noise_check
[modify] https://crrev.com/1a7644e393fbb03980c495524b0aaee9afdb1d6d/server/site_tests/network_WiFi_BgscanBackoff/control.wifi_bgscan_backoff
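The capability check described in this commit follows a common pattern: declare what the router must support and report TEST_NA instead of failing when it does not. The sketch below is an illustration only. `CAPABILITY_MULTI_AP_SAME_BAND` is the name used in the commit message, but its string value, the `TestNAError` class, and the `require_capabilities` and `run_variant` helpers are assumptions made here, not the autotest definitions.

```python
# Name from the commit message; the string value is an assumption.
CAPABILITY_MULTI_AP_SAME_BAND = 'multi_ap_same_band'


class TestNAError(Exception):
    """Local stand-in for a 'test not applicable' result."""


def require_capabilities(router_capabilities, required):
    """Raise TestNAError if the router lacks any required capability."""
    missing = [cap for cap in required if cap not in router_capabilities]
    if missing:
        raise TestNAError('Router lacks: %s' % ', '.join(missing))


def run_variant(router_capabilities, needs_two_channels_in_one_band):
    """Return 'TEST_NA' when the router cannot host the variant."""
    try:
        if needs_two_channels_in_one_band:
            require_capabilities(router_capabilities,
                                 [CAPABILITY_MULTI_AP_SAME_BAND])
    except TestNAError:
        return 'TEST_NA'
    return 'RAN'


# A router modeled with one radio per band (how the thread describes Gale)
# has no same-band multi-AP capability.
one_radio_per_band = set()
noise_check_result = run_variant(one_radio_per_band, True)
backoff_result = run_variant(one_radio_per_band, False)
```

Declaring the requirement up front keeps an unsupported configuration from showing up as a test failure in the results dashboard.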
,
Aug 7
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/66c7d49d0456ba37ad68963872a80f64a65ece2e
commit 66c7d49d0456ba37ad68963872a80f64a65ece2e
Author: Brian Norris <briannorris@chromium.org>
Date: Tue Aug 07 08:50:35 2018
Revert "network_WiFi_VerifyRouter: verify all router radios"
This reverts commit 042f9b2be03c04e57959a94aa64d1e059086191f and adjusts some related comments (that were previously incorrect). It turns out we *don't* want to use the 3rd Whirlwind radio as an AP, and it's often not even connected in conductive setups. So don't try to verify it.
BUG=chromium:866181
TEST=network_WiFi_VerifyRouter -- see that it doesn't pick up phy2 on whirlwind
Change-Id: I5f1ce02be49c192f422b8f2a2dc19a59547358fc
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1162982
[modify] https://crrev.com/66c7d49d0456ba37ad68963872a80f64a65ece2e/server/site_tests/network_WiFi_VerifyRouter/network_WiFi_VerifyRouter.py
,
Aug 14
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/e942af85c9cd6d05b468e41be9bea1fd26eddd97
commit e942af85c9cd6d05b468e41be9bea1fd26eddd97
Author: Brian Norris <briannorris@chromium.org>
Date: Tue Aug 14 23:05:20 2018
[autotest] network_WiFi_BgscanBackoff: stop using Whirlwind's phy2
The 5760noise_check variant of this test was requesting 1 spatial stream, so that it could run on Whirlwind's phy2. Don't do this, because phy2 is not normally used in production as a transmitter, and because our lab conductive setups don't usually wire up its antennas.
As an effect of this, we can't support 2 simultaneous channels on the 5 GHz band. Just use different channels from the 2.4 and 5 GHz bands.
Per Kirtika's suggestion, I make the 5760noise_check variant roughly comparable to the wifi_bgscan_backoff variant, so that we can see whether channel 153 (a known noisy channel) behaves significantly differently than a known less-noisy 5GHz channel.
BUG=chromium:774808, chromium:866181
TEST=network_WiFi_BgscanBackoff.wifi_bgscan_backoff and network_WiFi_BgscanBackoff.5760_noise_check with gale; both now pass; run w/ whirlwind, and see we avoid phy2
Change-Id: I22227cee072d362ceb00bda39876717a374a1d44
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1162983
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
[modify] https://crrev.com/e942af85c9cd6d05b468e41be9bea1fd26eddd97/server/site_tests/network_WiFi_BgscanBackoff/network_WiFi_BgscanBackoff.py
[modify] https://crrev.com/e942af85c9cd6d05b468e41be9bea1fd26eddd97/server/site_tests/network_WiFi_BgscanBackoff/control.5760noise_check
[modify] https://crrev.com/e942af85c9cd6d05b468e41be9bea1fd26eddd97/server/site_tests/network_WiFi_BgscanBackoff/control.wifi_bgscan_backoff
,
Aug 15
OK, I think this should all be fixed. Stainless shows that there's very little red here now.
,
Oct 10
Closing this as verified. I no longer see failures with the "SSID CrOS_Masked1 is not in scan results" error:
https://stainless.corp.google.com/search?view=matrix&row=board_model&col=build&first_date=2018-10-04&last_date=2018-10-10&test=network_WiFi_MaskedBSSID.wifi_masked_bssid&build=R70%7CR71&status=GOOD&status=WARN&status=FAIL&status=ERROR&status=TEST_NA&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
,
Oct 10
Re-opening as this is failing on cyan and auron_paine.
https://stainless.corp.google.com/search?view=list&first_date=2018-10-04&last_date=2018-10-10&test=network_WiFi_MaskedBSSID.wifi_masked_bssid&build=R70%7CR71&status=WARN&status=FAIL&status=ERROR&status=TEST_NA&reason=SSID+CrOS_Masked&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
,
Oct 11
Well, the previous bug was specifically about Masked1, which was caused by the way we incorrectly set up the 2nd BSS. But I'll rename the bug and maybe look at it. BTW, anyone know why network_WiFi_VerifyRouter is barely running at all in the lab even though we scheduled it? There's a whole lot of NOT_RUN.
,
Oct 15
> BTW, anyone know why network_WiFi_VerifyRouter is barely running at all in the lab even though we scheduled it? There's a whole lot of NOT_RUN.
I suspect it is due to high load. We run wifi_matfunc (on beta and stable) and the wifi_end_to_end suite (on beta and stable), along with nightly ToT runs for the wifi_matfunc, perf, and end-to-end suites on day 3 of the week. wifi_update_router runs on day 4 of the week, so its jobs are plausibly timing out.
https://stainless.corp.google.com/search?view=matrix&row=build&col=queued_date&first_date=2018-08-18&last_date=2018-10-15&suite=wifi_update_router&test=network_WiFi_VerifyRouter&status=GOOD&status=WARN&status=FAIL&status=ERROR&status=ABORT&status=ALERT&status=RUNNING&status=TEST_NA&status=NOSTATUS&status=NOT_RUN&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=true
Suite scheduler link:
https://cs.corp.google.com/chromeos_public/infra/suite_scheduler/configs/suite_scheduler.ini?q=suite_sched&g=0&l=1
Comment 1 by harpreet@chromium.org
, Jul 25