Nyan devices failing network_WiFi_SuspendStress.Hidden testcase with "FAIL(Discovery timed out)" error on channel 36/48
Issue description

Devices nyan_big, nyan_blaze, and nyan_kitty fail on channel 36 or 48 but always pass on channel 6. The same devices pass network_WiFi_SimpleConnect.wifi_checkHidden without any issues on the same channels (6, 36, and 48).

Hostnames:
nyan_big -> chromeos2-row11-rack5-host4 (fails only on channel 36)
nyan_blaze -> chromeos2-row11-rack2-host1 (fails only on channel 36)
nyan_kitty -> chromeos15-row2-rack2-host2 (fails on channel 36 or 48)

Logs: https://stainless.corp.google.com/search?view=matrix&row=build&col=model&first_date=2018-05-27&last_date=2018-06-30&test=network_WiFi_SuspendStress.Hidden&status=GOOD&status=WARN&status=FAIL&status=ERROR&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false

Sample failure:

06/30 09:55:07.704 DEBUG| utils:0218| Running 'rsync -L --timeout=1800 --rsh='/usr/bin/ssh -a -x -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22' -az --no-o --no-g root@chromeos15-row2-rack2-host2:"/tmp/autoserv-4P8Vtv/sysinfo.pickle" "/tmp/tmpeSiKiA"'
06/30 09:55:08.134 DEBUG| test:0420| after_iteration_hooks completed
06/30 09:55:08.134 WARNI| test:0637| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 116, in run_once
    self.context.assert_connect_wifi(assoc_params)
  File "/usr/local/autotest/server/cros/network/wifi_test_context_manager.py", line 273, in assert_connect_wifi
    connect_name, assoc_result.failure_reason))
TestFail: Expected connection to SuspendStress_b_k1lqu_ch36 to succeed, but it failed with reason: FAIL(Discovery timed out).
06/30 09:55:08.139 DEBUG| test:0642| Running cleanup for test.
06/30 09:55:08.140 DEBUG| logging_manager:0627| Logging subprocess finished
06/30 09:55:08.144 DEBUG| logging_manager:0627| Logging subprocess finished
Nov 8
I believe Qualcomm is seeing this too, and they've actually done a bit of debugging: https://issuetracker.google.com/118664967 It looks like we're still trying to connect to the old (no-longer running) channel 6 AP when we're supposed to be switching to a new channel 36 AP. (Or, the same with 36 -> 48 -- this test runs a loop across {6,36,48}.) This is a combination of the fact that hidden scan results don't immediately get evicted -- they hang around for a little while -- and that shill/supplicant haven't been told to kill the old service. So supplicant is busy telling us to reconnect (fruitlessly), sucking up our time, and preventing us from scanning promptly for the next AP. This understandably causes timeouts. I'm cooking up a fix to clear out the old Service in between each loop iteration, which should hopefully avoid this problem.
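The stale-result behavior described above can be illustrated with a toy model. This is only a sketch: `ScanCache`, its methods, and the 30-second expiry are hypothetical and chosen for illustration; they are not wpa_supplicant's or the driver's actual data structures or values.

```python
from dataclasses import dataclass, field


@dataclass
class ScanCache:
    """Toy scan-result cache (hypothetical, for illustration only).

    An entry stays a connection candidate until `expiry_s` seconds
    have passed since the AP was last heard.
    """
    expiry_s: float = 30.0  # assumed expiry, not a measured value
    last_seen: dict = field(default_factory=dict)  # (ssid, channel) -> time

    def saw(self, ssid, channel, now):
        # Record a probe response from this AP at time `now`.
        self.last_seen[(ssid, channel)] = now

    def candidates(self, now):
        # Entries that have not yet expired are still offered for connection.
        return sorted(k for k, t in self.last_seen.items()
                      if now - t <= self.expiry_s)


cache = ScanCache()
cache.saw("SuspendStress_ch6", 6, now=0.0)   # last probe response from old AP

# The old AP is torn down right after t=0, yet at t=10 it is still listed.
stale = cache.candidates(now=10.0)
# Only after the expiry window does the entry disappear.
expired = cache.candidates(now=45.0)
```

In this model the old channel 6 entry remains a candidate for the whole expiry window, which is the period in which a supplicant that was never told to drop the network could keep retrying it.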
Nov 8
Issue 525139 has been merged into this issue.
Nov 13
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/06b59ffd4639ec9038d65101bb7c7532355476e6

commit 06b59ffd4639ec9038d65101bb7c7532355476e6
Author: Brian Norris <briannorris@chromium.org>
Date: Tue Nov 13 19:57:53 2018

[autotest] network_WiFi_SuspendStress: clear networks between configs

Scan results may remain listed by a driver for some time after the last probe response we've seen (e.g., expiry time can be around 30 seconds). In this test, we run multiple configurations (e.g., AP channels) back-to-back, without clearing any of the shill profile information or wpa_supplicant configuration.

Together, this means that as we start a second iteration of this loop, sometimes
(a) the AP goes down (setting up the next hostapd instance tears down the old one)
(b) DUT disconnects
(c) new AP (new channel) starts up
(d) DUT hasn't found new AP yet, but still has old results -- wpa_supplicant tries to connect (shill never told it not to)
(e) DUT wastes a lot of time on associating to an AP that doesn't exist, and it can't perform additional scans in the meantime
(f) concurrently with (d) and (e), the test is sending a Connect request to shill; this connection times out in the discovery phase

If we force a disconnect and clear out the shill profile before we start the next loop, then we won't be affected by old profile results.

BUG=chromium:869205, b:118664967
TEST=`test_that ... network_WiFi_SuspendStress.Hidden` on a few devices

Change-Id: I5e0b66beea99a1f37f81577050639e5799862ca4
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/c/1327510
Reviewed-by: Grant Grundler <grundler@chromium.org>

[modify] https://crrev.com/06b59ffd4639ec9038d65101bb7c7532355476e6/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py
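The shape of the fix in the commit message (disconnect and forget the network before moving to the next configuration) might look roughly like the sketch below. It is not the actual autotest change: `FakeClient`, `forget_network`, and `run_configs` are hypothetical stand-ins, and the model makes the failure deterministic, whereas the real one is timing-dependent.

```python
class FakeClient:
    """Hypothetical stand-in for the DUT; not the autotest or shill API."""

    def __init__(self):
        self.known = set()  # SSIDs the connection manager still remembers
        self.log = []

    def connect(self, ssid):
        # Model the bug: a remembered SSID from an earlier config keeps the
        # DUT busy retrying a dead AP, so discovery of the new one times out.
        stale = self.known - {ssid}
        if stale:
            self.log.append(("timeout", ssid))
            return False
        self.known.add(ssid)
        self.log.append(("connected", ssid))
        return True

    def disconnect(self, ssid):
        self.log.append(("disconnect", ssid))

    def forget_network(self, ssid):
        # Stands in for clearing the network's profile entry.
        self.known.discard(ssid)


def run_configs(client, channels, clear_between):
    """Connect once per channel; optionally clean up after each config."""
    results = []
    for ch in channels:
        ssid = "SuspendStress_ch%d" % ch  # illustrative SSID format
        results.append(client.connect(ssid))
        if clear_between:
            client.disconnect(ssid)
            client.forget_network(ssid)
    return results


without_fix = run_configs(FakeClient(), [6, 36, 48], clear_between=False)
with_fix = run_configs(FakeClient(), [6, 36, 48], clear_between=True)
```

In this model only the first configuration connects without cleanup, while all three connect when the network is cleared between iterations, which matches the channel 6 pass and channel 36/48 failure pattern reported in the issue description.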
Nov 13
Comment 1 by harpreet@chromium.org, Aug 10
Labels: -Pri-3 Pri-2