network_WiFi_MalformedProbeResp failing with "Client failed to receive probe responses" |
||||||
Issue descriptionThe following command is failing 06/28 04:11:19.704 DEBUG| ssh_host:0301| Running (ssh) '(time -p /usr/sbin/iw dev mlan0 scan freq 2412) 2>&1' from '_call_run_once|run_once|scan|timed_scan|run|run_very_slowly' 06/28 04:11:20.081 DEBUG| utils:0286| [stdout] command failed: Device or resource busy (-16) >> I can try if adding sleep before scanning fixes the issue. So, bss_list is None and iterating through a None object throws an error. Also we need to raise a test fail exception if bss list empty.
,
Dec 1
That failure message isn't very helpful. We should at least fix that so that a better error message like "Client failed to receive probe responses" can be reported. https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1357250
,
Dec 1
,
Dec 1
I think you've got this one :) As noted on your CL, the problem is that we're using 'iw' for scans, while supplicant owns the device. That's a problem: 11/30 12:07:30.944 DEBUG| utils:0287| [stdout] command failed: Device or resource busy (-16) But the way the test is written, it will still keep trying over some time interval, which should hopefully be enough to "eventually" get a valid scan result. We might want to improve the test to ask shill/supplicant for scans instead, but I think we have that problem elsewhere too.
,
Dec 1
,
Dec 1
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/900aaff6d54d19f35e7439e46335cdf2308a8aa9 commit 900aaff6d54d19f35e7439e46335cdf2308a8aa9 Author: Ben Chan <benchan@chromium.org> Date: Sat Dec 01 13:19:29 2018 network_WiFi_MalformedProbeResp: fix an unhandled exception The following failure message, triggered by an unhandled exception, isn't very helpful: File "/usr/local/autotest/server/site_tests/network_WiFi_MalformedProbeResp/network_WiFi_MalformedProbeResp.py", line 63, in run_once for bss in bss_list: TypeError: 'NoneType' object is not iterable This CL updates network_WiFi_MalformedProbeResp to handle the absence of scan result more gracefully, so that a better error message like "Client failed to receive probe responses" can be reported instead. BUG=chromium:868503 TEST=Run network_WiFi_MalformedProbeResp. Change-Id: If2364d3d5417fcb5900abfd85593777a61b594e7 Reviewed-on: https://chromium-review.googlesource.com/1357250 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Tested-by: Ben Chan <benchan@chromium.org> Reviewed-by: Brian Norris <briannorris@chromium.org> [modify] https://crrev.com/900aaff6d54d19f35e7439e46335cdf2308a8aa9/server/site_tests/network_WiFi_MalformedProbeResp/network_WiFi_MalformedProbeResp.py
,
Dec 3
From the first comment: > So, bss_list is None and iterating through a None object throws an error. Also we need to raise a test fail exception if bss list empty. We don't necessarily need an exception for this case -- we retry under a timeout, so it's still possible we pick up good scan results later. Ben fixed that, so we can actually keep retrying. But so far, that isn't really helping. With the fix, we now have a few results that show: TEST_FAIL: Client failed to receive probe responses So it remains to be seen if fixing the supplicant vs. iw scanning issues will fix the problem (e.g., perhaps we're never completing *any* iw scans), or else maybe there's a true bug on these devices, such that they're not picking up these probes at all.
,
Dec 3
Looking a little closer, this does indeed seem to simply be a test bug. On one of the failures I mentioned in comment #7, it looks like we were unlucky enough to see errno -16 (busy) on all 6 scan attempts, but in the meantime, net.log shows that we *did* pick up the probe frames: DEBUG wpa_supplicant[769]: mlan0: BSS: Add new id 15 BSSID xx:xx:xx:xx:xx:xx SSID 'TestingProbes00000000' freq 2412 So if we just fix up the -EBUSY race behaviors, we should be OK.
,
Dec 5
Still happening plenty, but now we just get a proper error ("Client failed to receive probe responses"). I looked a little bit, and
1. It's not very reproducible on demand, even when I lock these same devices down
2. This test is always scheduled after network_WiFi_SuspendStress.Hidden, at least on these devices
3. #2 very often ABORTs after an hour
So maybe it's influenced by a lack of cleanup from previous tests too?
Anyway, I think we have more or less the idea on how to solve this: either get shill to stop scanning, or else direct our scans through shill/supplicant instead.
I don't think Kirtika was clearly signed up for doing the above -- her first attempt wasn't quite right, anyway. So I'll put this back as Available.
On a side note: I'll probably file a bug for #3 if there isn't one already. We're wasting a lot of lab time if those flaky tests are really taking over an hour just to do ~15 suspend/resume cycles. It seems like some quirky SSH behavior I think, where we hang for several minutes (sometimes 5-15 minutes) in the "going to suspend" phase, even if the device still managed to go through sleep/wake on its normal time.
,
Jan 15
|
||||||
►
Sign in to add a comment |
||||||
Comment 1 by harpreet@chromium.org
, Aug 9Labels: -Pri-3 Pri-2
For posterity here is the trace... 08/09 11:38:55.253 WARNI| test:0637| The test failed with the following exception Traceback (most recent call last): File "/usr/local/autotest/client/common_lib/test.py", line 631, in _exec _call_test_function(self.execute, *p_args, **p_dargs) File "/usr/local/autotest/client/common_lib/test.py", line 837, in _call_test_function raise error.UnhandledTestFail(e) UnhandledTestFail: Unhandled TypeError: 'NoneType' object is not iterable Traceback (most recent call last): File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function return func(*args, **dargs) File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute dargs) File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry postprocess_profiled_run, args, dargs) File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once self.run_once(*args, **dargs) File "/usr/local/autotest/server/site_tests/network_WiFi_MalformedProbeResp/network_WiFi_MalformedProbeResp.py", line 63, in run_once for bss in bss_list: TypeError: 'NoneType' object is not iterable There is already a 10sec sleep in between scans. SCAN_LOOP_SLEEP_SEC = 10 . . . while time.time() - start_time < self.SCAN_LOOP_SEC: bss_list = self.context.client.iw_runner.scan( self.context.client.wifi_if, [2412]) for bss in bss_list: logging.debug('found bss: %s', bss.ssid) if bss.ssid == 'TestingProbes00000000': rx_probe_resp_count += 1 time.sleep(self.SCAN_LOOP_SLEEP_SEC) Looks like this is mainly an issue with veyron devices. https://stainless.corp.google.com/search?view=matrix&row=model&col=build&test=%5Enetwork%5C_WiFi%5C_MalformedProbeResp%24&status=WARN&status=FAIL&status=ERROR&status=ABORT&status=ALERT&reason=Unhandled+TypeError&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false&days=30