[Wifi_Matfunc] Wifi_SuspendStress tests are failing on (Teemo & Kench) with "Connection refused" error |
||||||||||
Issue descriptionWifi_SuspendStress tests fail with the below error message: "failing with Unhandled error: [Errno 111] Connection refused" Stainless link - https://stainless.corp.google.com/search?view=matrix&row=build&col=test&first_date=2017-10-01&last_date=2018-04-02&test=%5Enetwork%5C_WiFi%5C_SuspendStress%5C.11a%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.11b%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.11g%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.24HT40%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.Hidden%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.WEP40%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.WPA2%24&board=fizz&status=GOOD&status=WARN&status=FAIL&status=ERROR&status=ABORT&status=ALERT&status=RUNNING&status=TEST_NA&status=NOSTATUS&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false Error snapshot: 03/27 20:19:35.740 ERROR| threading:0823| Exception in thread Thread-1: 03/27 20:19:35.741 ERROR| threading:0823| Traceback (most recent call last): 03/27 20:19:35.741 ERROR| threading:0823| File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner 03/27 20:19:35.742 ERROR| threading:0823| self.run() 03/27 20:19:35.743 ERROR| threading:0823| File "/mnt/host/source/src/third_party/autotest/files/server/cros/stress.py", line 61, in run 03/27 20:19:35.743 ERROR| threading:0823| self._loop_stressor() 03/27 20:19:35.744 ERROR| threading:0823| File "/mnt/host/source/src/third_party/autotest/files/server/cros/stress.py", line 183, in _loop_stressor 03/27 20:19:35.745 ERROR| threading:0823| self.stressor() 03/27 20:19:35.745 ERROR| threading:0823| File "/mnt/host/source/src/third_party/autotest/files/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 103, in powerd_non_servo_wifi_suspend 03/27 20:19:35.746 ERROR| threading:0823| self.stress_wifi_suspend(try_servo=False) 03/27 20:19:35.747 ERROR| threading:0823| File "/mnt/host/source/src/third_party/autotest/files/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 88, in stress_wifi_suspend 03/27 20:19:35.747 ERROR| threading:0823| self.context.client.do_suspend(_DELAY) 03/27 20:19:35.748 ERROR| threading:0823| File "/mnt/host/source/src/third_party/autotest/files/server/cros/network/wifi_client.py", line 734, in do_suspend 03/27 20:19:35.749 ERROR| threading:0823| self._shill_proxy.do_suspend(seconds) 03/27 20:19:35.749 ERROR| threading:0823| File "/usr/lib64/python2.7/xmlrpclib.py", line 1240, in __call__ 03/27 20:19:35.750 ERROR| threading:0823| return self.__send(self.__name, args) 03/27 20:19:35.750 ERROR| threading:0823| File "/usr/lib64/python2.7/xmlrpclib.py", line 1599, in __request 03/27 20:19:35.751 ERROR| threading:0823| verbose=self.__verbose 03/27 20:19:35.752 ERROR| threading:0823| File "/usr/lib64/python2.7/xmlrpclib.py", line 1280, in request 03/27 20:19:35.752 ERROR| threading:0823| return self.single_request(host, handler, request_body, verbose) 03/27 20:19:35.753 ERROR| threading:0823| File "/usr/lib64/python2.7/xmlrpclib.py", line 1308, in single_request 03/27 20:19:35.753 ERROR| threading:0823| self.send_content(h, request_body) 03/27 20:19:35.754 ERROR| threading:0823| File "/usr/lib64/python2.7/xmlrpclib.py", line 1456, in send_content 03/27 20:19:35.754 ERROR| threading:0823| connection.endheaders(request_body) 03/27 20:19:35.755 ERROR| threading:0823| File "/usr/lib64/python2.7/httplib.py", line 1049, in endheaders 03/27 20:19:35.756 ERROR| threading:0823| self._send_output(message_body) 03/27 20:19:35.756 ERROR| threading:0823| File "/usr/lib64/python2.7/httplib.py", line 893, in _send_output 03/27 20:19:35.757 ERROR| threading:0823| self.send(msg) 03/27 20:19:35.757 ERROR| threading:0823| File "/usr/lib64/python2.7/httplib.py", line 855, in send 03/27 20:19:35.758 ERROR| threading:0823| self.connect() 03/27 20:19:35.758 ERROR| threading:0823| File "/usr/lib64/python2.7/httplib.py", line 832, in connect 03/27 20:19:35.759 ERROR| threading:0823| self.timeout, self.source_address) 03/27 20:19:35.759 ERROR| threading:0823| File "/usr/lib64/python2.7/socket.py", line 575, in create_connection 03/27 20:19:35.760 ERROR| threading:0823| raise err 03/27 20:19:35.761 ERROR| threading:0823| error: [Errno 111] Connection refused 03/27 20:19:35.761 ERROR| threading:0823| 03/27 20:19:35.762 DEBUG| test:0410| Test failed due to [Errno 111] Connection refused. Exception log follows the after_iteration_hooks.
,
Apr 3 2018
,
Apr 4 2018
,
Apr 4 2018
(Random mumbling): I couldn't figure out when was this last passing (I checked for the last 3 months), was this test ever passing? Secondly, is this test only run on this DUT (because may be this is places in some wifi shielded cell)?
,
Apr 4 2018
BTW, I ran suspend_stress_test a couple of times on this DUT, and there were instances where it did not come back (as there were when it did come back). So yeah, there is some issue.
,
Apr 4 2018
I believe that this is the only DUT running the test. If suspend/resume is known to be reliable on Fizz, then there's probably some DUT-specific problem.
,
Apr 4 2018
igo@ Can we provide another DUT to the lab team to swap this one out? Investigating this DUT may be easier on the desk.
,
Apr 4 2018
The suspend resume tests never passed on fizz devices. Those devices have not been in the lab that long. From the link below, various suspend/resume tests seems to be failing pretty consistently on celes, kench, monroe, nyan_kitty, temmo... https://stainless.corp.google.com/search?view=matrix&row=model&col=build&first_date=2018-03-08&last_date=2018-04-04&test=%5Enetwork%5C_WiFi%5C_SuspendStress%5C.11a%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.11b%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.11g%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.24HT40%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.Hidden%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.WEP40%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.WPA2%24&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
,
Apr 4 2018
[+benzh] Hi Ben, I assumed suspend / resume is quite stable on Fizz?
,
Apr 18 2018
WiFi Suspend Stress fails on various boards (bob, celes, eve, kench, kevin, teemo, tidus, mighty) consistently due to "Unhandled error: [Errno 111] Connection refused" https://stainless.corp.google.com/search?view=matrix&row=model&col=test&first_date=2018-03-21&last_date=2018-04-17&suite=wifi_matfunc&status=FAIL&reason=Connection+refused&exclude_cts=true&exclude_not_run=true&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
,
Apr 18 2018
,
Apr 18 2018
(Assigning to Kirtika as discussed)
,
Apr 18 2018
Sample log:
04/17 10:43:09.097 INFO | site_linux_router:0454| AP configured.
04/17 10:43:09.098 INFO |wifi_test_context_:0255| Connecting to SuspendStress_a_i6ggc_ch1.
04/17 10:43:13.482 DEBUG| xmlrpc_types:0056| Constructing AssociationResult object with args=[], kwargs={'configuration_time': 0.4057610034942627, 'association_time': 0.20685601234436035, 'discovery_time': 3.7171101570129395, 'success': True, 'failure_reason': 'SUCCESS(Connection successful)'}
04/17 10:43:13.482 INFO |wifi_test_context_:0263| Finished connection attempt to SuspendStress_a_i6ggc_ch1 with times: discovery=3.72, association=0.21, configuration=0.41.
04/17 10:43:13.495 DEBUG| ssh_host:0301| Running (ssh) 'iw dev wlan0 info' from 'assert_connect_wifi|wifi_signal_level|signal_level|is_wifi_device|run|run_very_slowly'
04/17 10:43:13.815 DEBUG| utils:0282| [stdout] Interface wlan0
04/17 10:43:13.815 DEBUG| utils:0282| [stdout] ifindex 2
04/17 10:43:13.815 DEBUG| utils:0282| [stdout] wdev 0x1
04/17 10:43:13.815 DEBUG| utils:0282| [stdout] addr f8:59:71:e8:7a:2e
04/17 10:43:13.816 DEBUG| utils:0282| [stdout] type managed
04/17 10:43:13.816 DEBUG| utils:0282| [stdout] wiphy 0
04/17 10:43:13.816 DEBUG| utils:0282| [stdout] channel 1 (2412 MHz), width: 20 MHz (no HT), center1: 2412 MHz
04/17 10:43:13.816 DEBUG| utils:0282| [stdout] txpower 22.00 dBm
04/17 10:43:13.874 DEBUG| ssh_host:0301| Running (ssh) 'iw dev wlan0 link' from 'run_once|assert_connect_wifi|wifi_signal_level|signal_level|run|run_very_slowly'
04/17 10:43:14.203 DEBUG| utils:0282| [stdout] Connected to f4:f2:6d:ce:ac:91 (on wlan0)
04/17 10:43:14.204 DEBUG| utils:0282| [stdout] SSID: SuspendStress_a_i6ggc_ch1
04/17 10:43:14.204 DEBUG| utils:0282| [stdout] freq: 2412
04/17 10:43:14.204 DEBUG| utils:0282| [stdout] RX: 2761 bytes (19 packets)
04/17 10:43:14.204 DEBUG| utils:0282| [stdout] TX: 2429 bytes (17 packets)
04/17 10:43:14.204 DEBUG| utils:0282| [stdout] signal: -6 dBm
04/17 10:43:14.205 DEBUG| utils:0282| [stdout] tx bitrate: 11.0 MBit/s
04/17 10:43:14.205 DEBUG| utils:0282| [stdout]
04/17 10:43:14.205 DEBUG| utils:0282| [stdout] bss flags:
04/17 10:43:14.205 DEBUG| utils:0282| [stdout] dtim period: 2
04/17 10:43:14.205 DEBUG| utils:0282| [stdout] beacon int: 100
04/17 10:43:14.207 INFO |wifi_test_context_:0282| Connected successfully to SuspendStress_a_i6ggc_ch1, signal level: -6.
04/17 10:43:14.207 INFO |network_WiFi_Suspe:0142| Running Non integration Wifi Stress test.
04/17 10:43:14.208 INFO | stress:0182| Stressor iteration: 1 of 5
04/17 10:43:14.212 DEBUG| ssh_host:0301| Running (ssh) 'if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi' from '_loop_stressor|powerd_non_servo_wifi_suspend|stress_wifi_suspend|get_boot_id|run|run_very_slowly'
04/17 10:43:14.545 DEBUG| utils:0282| [stdout] d44b7e11-ac2b-473d-9534-fffc10c45ae6
04/17 10:43:14.596 INFO | wifi_client:0733| Suspending DUT for 10 seconds...
04/17 11:13:58.114 ERROR| threading:0823| Exception in thread Thread-1:
04/17 11:13:58.115 ERROR| threading:0823| Traceback (most recent call last):
04/17 11:13:58.115 ERROR| threading:0823| File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
04/17 11:13:58.116 ERROR| threading:0823| self.run()
04/17 11:13:58.116 ERROR| threading:0823| File "/usr/local/autotest/server/cros/stress.py", line 61, in run
04/17 11:13:58.116 ERROR| threading:0823| self._loop_stressor()
04/17 11:13:58.116 ERROR| threading:0823| File "/usr/local/autotest/server/cros/stress.py", line 183, in _loop_stressor
04/17 11:13:58.117 ERROR| threading:0823| self.stressor()
04/17 11:13:58.117 ERROR| threading:0823| File "/usr/local/autotest/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 103, in powerd_non_servo_wifi_suspend
04/17 11:13:58.117 ERROR| threading:0823| self.stress_wifi_suspend(try_servo=False)
04/17 11:13:58.118 ERROR| threading:0823| File "/usr/local/autotest/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 88, in stress_wifi_suspend
04/17 11:13:58.118 ERROR| threading:0823| self.context.client.do_suspend(_DELAY)
04/17 11:13:58.118 ERROR| threading:0823| File "/usr/local/autotest/server/cros/network/wifi_client.py", line 734, in do_suspend
04/17 11:13:58.119 ERROR| threading:0823| self._shill_proxy.do_suspend(seconds)
04/17 11:13:58.119 ERROR| threading:0823| File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
04/17 11:13:58.119 ERROR| threading:0823| return self.__send(self.__name, args)
04/17 11:13:58.120 ERROR| threading:0823| File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
04/17 11:13:58.120 ERROR| threading:0823| verbose=self.__verbose
04/17 11:13:58.120 ERROR| threading:0823| File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
04/17 11:13:58.121 ERROR| threading:0823| return self.single_request(host, handler, request_body, verbose)
04/17 11:13:58.121 ERROR| threading:0823| File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request
04/17 11:13:58.121 ERROR| threading:0823| self.send_content(h, request_body)
04/17 11:13:58.122 ERROR| threading:0823| File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content
04/17 11:13:58.122 ERROR| threading:0823| connection.endheaders(request_body)
04/17 11:13:58.122 ERROR| threading:0823| File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
04/17 11:13:58.123 ERROR| threading:0823| self._send_output(message_body)
04/17 11:13:58.123 ERROR| threading:0823| File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
04/17 11:13:58.123 ERROR| threading:0823| self.send(msg)
04/17 11:13:58.124 ERROR| threading:0823| File "/usr/lib/python2.7/httplib.py", line 826, in send
04/17 11:13:58.124 ERROR| threading:0823| self.connect()
04/17 11:13:58.124 ERROR| threading:0823| File "/usr/lib/python2.7/httplib.py", line 807, in connect
04/17 11:13:58.124 ERROR| threading:0823| self.timeout, self.source_address)
04/17 11:13:58.125 ERROR| threading:0823| File "/usr/lib/python2.7/socket.py", line 571, in create_connection
04/17 11:13:58.125 ERROR| threading:0823| raise err
04/17 11:13:58.125 ERROR| threading:0823| error: [Errno 111] Connection refused
04/17 11:13:58.126 ERROR| threading:0823|
04/17 11:13:58.127 DEBUG| test:0410| Test failed due to [Errno 111] Connection refused. Exception log follows the after_iteration_hooks.
Here are the complete test and device logs for the above test..https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/192625581-chromeos-test/chromeos15-row4-rack12-host2/
,
Apr 18 2018
Missed some...
04/17 11:13:58.127 DEBUG| test:0410| Test failed due to [Errno 111] Connection refused. Exception log follows the after_iteration_hooks.
04/17 11:13:58.128 DEBUG| test:0415| Starting after_iteration_hooks for network_WiFi_SuspendStress.11b
04/17 11:13:58.142 DEBUG| ssh_host:0301| Running (ssh) 'test -e "/tmp/sysinfo/autoserv-ozMxUb"' from 'decorated_func|wrapper|_install|path_exists|run|run_very_slowly'
04/17 11:15:01.373 DEBUG| utils:0282| [stderr] ssh: connect to host chromeos15-row4-rack12-host2 port 22: Connection timed out
04/17 11:15:01.376 ERROR| log:0027| post-test iteration server sysinfo error:
04/17 11:15:01.377 ERROR| traceback:0013| Traceback (most recent call last):
04/17 11:15:01.377 ERROR| traceback:0013| File "/usr/local/autotest/client/common_lib/log.py", line 25, in decorated_func
04/17 11:15:01.378 ERROR| traceback:0013| fn(*args, **dargs)
04/17 11:15:01.378 ERROR| traceback:0013| File "/usr/local/autotest/server/test.py", line 66, in wrapper
04/17 11:15:01.378 ERROR| traceback:0013| host, at, outputdir = self._install()
04/17 11:15:01.379 ERROR| traceback:0013| File "/usr/local/autotest/server/test.py", line 135, in _install
04/17 11:15:01.379 ERROR| traceback:0013| if not autodir or not self.host.path_exists(autodir):
04/17 11:15:01.380 ERROR| traceback:0013| File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 526, in path_exists
04/17 11:15:01.380 ERROR| traceback:0013| ignore_status=True)
04/17 11:15:01.380 ERROR| traceback:0013| File "/usr/local/autotest/server/hosts/ssh_host.py", line 323, in run
04/17 11:15:01.381 ERROR| traceback:0013| return self.run_very_slowly(*args, **kwargs)
04/17 11:15:01.381 ERROR| traceback:0013| File "/usr/local/autotest/server/hosts/ssh_host.py", line 312, in run_very_slowly
04/17 11:15:01.382 ERROR| traceback:0013| ssh_failure_retry_ok)
04/17 11:15:01.382 ERROR| traceback:0013| File "/usr/local/autotest/server/hosts/ssh_host.py", line 254, in _run
04/17 11:15:01.382 ERROR| traceback:0013| raise error.AutoservSSHTimeout("ssh timed out", result)
04/17 11:15:01.388 ERROR| traceback:0013| AutoservSSHTimeout: ('ssh timed out', * Command:
04/17 11:15:01.389 ERROR| traceback:0013| /usr/bin/ssh -a -x -o Protocol=2 -o StrictHostKeyChecking=no -o
04/17 11:15:01.389 ERROR| traceback:0013| UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o
04/17 11:15:01.389 ERROR| traceback:0013| ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4
04/17 11:15:01.390 ERROR| traceback:0013| -l root -p 22 chromeos15-row4-rack12-host2 "export LIBC_FATAL_STDERR_=1;
04/17 11:15:01.390 ERROR| traceback:0013| if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\"
04/17 11:15:01.390 ERROR| traceback:0013| \"server[stack::_install|path_exists|run] -> ssh_run(test -e
04/17 11:15:01.391 ERROR| traceback:0013| \\\"/tmp/sysinfo/autoserv-ozMxUb\\\")\";fi; test -e \"/tmp/sysinfo
04/17 11:15:01.391 ERROR| traceback:0013| /autoserv-ozMxUb\""
04/17 11:15:01.391 ERROR| traceback:0013| Exit status: 255
04/17 11:15:01.391 ERROR| traceback:0013| Duration: 63.2169890404
04/17 11:15:01.392 ERROR| traceback:0013|
04/17 11:15:01.392 ERROR| traceback:0013| stderr:
04/17 11:15:01.392 ERROR| traceback:0013| ssh: connect to host chromeos15-row4-rack12-host2 port 22: Connection timed out)
04/17 11:15:01.393 DEBUG| test:0420| after_iteration_hooks completed
04/17 11:15:01.395 WARNI| test:0637| The test failed with the following exception
Traceback (most recent call last):
File "/usr/local/autotest/client/common_lib/test.py", line 631, in _exec
_call_test_function(self.execute, *p_args, **p_dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 837, in _call_test_function
raise error.UnhandledTestFail(e)
UnhandledTestFail: Unhandled error: [Errno 111] Connection refused
Traceback (most recent call last):
File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
return func(*args, **dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
postprocess_profiled_run, args, dargs)
File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
self.run_once(*args, **dargs)
File "/usr/local/autotest/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 147, in run_once
stressor.wait()
File "/usr/local/autotest/server/cros/stress.py", line 209, in wait
self.reraise()
File "/usr/local/autotest/server/cros/stress.py", line 61, in run
self._loop_stressor()
File "/usr/local/autotest/server/cros/stress.py", line 183, in _loop_stressor
self.stressor()
File "/usr/local/autotest/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 103, in powerd_non_servo_wifi_suspend
self.stress_wifi_suspend(try_servo=False)
File "/usr/local/autotest/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 88, in stress_wifi_suspend
self.context.client.do_suspend(_DELAY)
File "/usr/local/autotest/server/cros/network/wifi_client.py", line 734, in do_suspend
self._shill_proxy.do_suspend(seconds)
File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
return self.__send(self.__name, args)
File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
return self.single_request(host, handler, request_body, verbose)
File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request
self.send_content(h, request_body)
File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content
connection.endheaders(request_body)
File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 826, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 807, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
,
Apr 18 2018
Why are we looking at this as a Wifi issue? Comment #5 confirmed issues on suspend_stress_test (which does not really involve Wifi much). And those latest pastes look very very similar, where we time out waiting for the device to come back after 30 minutes. Similarly, there are plenty of power_SuspendStress failures, across a lot of boards. Filtering just for "Autotest client terminated unexpectedly: DUT rebooted during the test run.": https://stainless.corp.google.com/search?view=matrix&row=model&col=build&first_date=2018-04-12&last_date=2018-04-18&test=%5Epower%5C_SuspendStress%5C..*%24&reason=Autotest+client+terminated+unexpectedly%3A+DUT+rebooted+during+the+test+run.&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
,
Apr 18 2018
Poking at one arbitrary device from the failure list (chromeos15-row1-rack4-host6), I see that the ethernet link often takes a really long time to come up after suspend/resume. e.g.: [ 5659.322415] PM: suspend exit 2018-04-18 20:22:53.495005400 UTC [ 5678.089814] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 1 or in some cases, it bounces up and down: [ 5173.315217] PM: suspend exit 2018-04-18 20:14:47.467667532 UTC [ 5173.447165] [drm] RC6 on [ 5180.340079] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 1 [ 5180.595877] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 0 [ 5183.923951] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 1 [ 5184.307765] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 0 [ 5187.507698] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 1 [ 5187.891817] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 0 [ 5191.475692] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 1 [ 5191.859553] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 0 [ 5196.513220] i2c_designware i2c_designware.2: i2c_dw_handle_tx_abort: slave address not acknowledged (7bit mode) [ 5199.923257] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 1 In fact, I can't even pass power_SuspendStress.bare here. But it's not just because of the loss of SSH connectivity; it looks like the device actively rebooted shortly after a suspend/resume attempt: 2018-04-18T13:30:59.541186-07:00 NOTICE pre-shutdown[8622]: Shutting down for reboot: not-via-powerd For some reason, this tickles me as a 'recover_duts' problem, combined with flaky ethernet.
,
Apr 18 2018
Ah, so the above failure is because power_SuspendStress now is somewhat aggressive about getting the network back up; if it doesn't come up in approximately 10 seconds, we reboot. Perhaps that's too aggressive (I think I'll increase that). At any rate, the ethernet on this particular device is very flaky. It's going in and out even with no apparent involvement from either suspend/resume or recover_duts. After talking with grundler, I see that several of those failing devices seem to have an unsupported ASIX ethernet dongle: chromeos15-row1-rack3-host2:Bus 002 Device 007: ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet chromeos15-row1-rack3-host3:Bus 002 Device 002: ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet chromeos15-row1-rack4-host6:Bus 002 Device 002: ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet chromeos15-row1-rack6-host6:Bus 002 Device 002: ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet chromeos15-row1-rack8-host6:Bus 002 Device 002: ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet It's our understanding that all such adapters should be getting replaced. Filed bug 834493 for these. There are likely different problems on some of the others.
,
Apr 18 2018
re comment #16: the rate that the link is bouncing suggests to me the link partner is bouncing the link or there is a flakey cable. That's too quick for recover_duts. re comment #17: AX88179 (GigE) was more ~5x reliable than AX88772 variants (100Mbit). But still not as good as any RTL8153 based USB ethernet dongle on the older kernels that I originally measured this against. So while I'd still like to see the AX88179 dongles replaced, IIRC, the failure to bring a link up was slightly less than 1 in 1000 linkup/down events. So not likely the issue with chromeos15-row1-rack4-host6. My guess is something else is wrong with the link.
,
Apr 19 2018
I see the same sample failure for kevin and bob on R66 and R67 builds. R65 and R68 does not see this issue. https://stainless.corp.google.com/search?view=list&first_date=2018-03-09&last_date=2018-04-19&suite=wifi_matfunc&test=%5Enetwork_WiFi_Reset%24&board=bob%7Ckevin&status=GOOD&status=FAIL&status=ERROR&status=ABORT&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=true Traceback (most recent call last): File "/usr/local/autotest/client/common_lib/test.py", line 631, in _exec _call_test_function(self.execute, *p_args, **p_dargs) File "/usr/local/autotest/client/common_lib/test.py", line 837, in _call_test_function raise error.UnhandledTestFail(e) UnhandledTestFail: Unhandled error: [Errno 111] Connection refused Traceback (most recent call last): File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function return func(*args, **dargs) File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute dargs) File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry postprocess_profiled_run, args, dargs) File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once self.run_once(*args, **dargs) File "/usr/local/autotest/server/site_tests/network_WiFi_Reset/network_WiFi_Reset.py", line 116, in run_once client.do_suspend(self._SUSPEND_DELAY) File "/usr/local/autotest/server/cros/network/wifi_client.py", line 734, in do_suspend self._shill_proxy.do_suspend(seconds) File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__ return self.__send(self.__name, args) File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request verbose=self.__verbose File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request return self.single_request(host, handler, request_body, verbose) File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request self.send_content(h, request_body) File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content connection.endheaders(request_body) File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders self._send_output(message_body) File "/usr/lib/python2.7/httplib.py", line 864, in _send_output self.send(msg) File "/usr/lib/python2.7/httplib.py", line 826, in send self.connect() File "/usr/lib/python2.7/httplib.py", line 807, in connect self.timeout, self.source_address) File "/usr/lib/python2.7/socket.py", line 571, in create_connection raise err error: [Errno 111] Connection refused
,
Apr 19 2018
The above comment is for network_WiFi_Reset test.
,
Apr 19 2018
For network_WiFi_Reset on Marvell 8997 systems; see this: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/852560 Particularly: """ OK, well Marvell is hopeless at fixing this, and I don't have time to debug failure modes in the autotest framework. If the error path takes 4+ minutes, then so be it. Not my fault... Watch out for seeing all-red for this test on Kevin/Bob now. """ And I finally bothered landing a hack solution to that here: https://b.corp.google.com/issues/64074778#comment50 So, the test is known broken on M66 on Bob and Kevin (where the test CL landed; the feature was already broken long before that), and is fixed now on M68 and also cherry-picked to M67 (please watch for new results >=10575.4.0). That should be a different class of issues than the $subject, where SuspendStress is failing on multiple non-Marvell boards -- clearly due to either ethernet network connectivity or suspend/resume crashing (or both). It's also quite likely not related to Wifi.
,
Apr 24 2018
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/940e7b00c6336a3034d9efdb751750b1472b6f56 commit 940e7b00c6336a3034d9efdb751750b1472b6f56 Author: Dinesh Kumar Sunkara <dsunkara@chromium.org> Date: Tue Apr 24 21:27:37 2018
,
Apr 25 2018
After replacing the dongle, test now always fails with the reason "<class 'autotest_lib.client.cros.power.sys_power.SpuriousWakeupError'>:Woke from suspend early" Below is the link to stainless: https://stainless.corp.google.com/search?view=list&first_date=2018-04-19&last_date=2018-04-25&suite=%5Ewifi%5C_matfunc%24&test=network_WiFi_SuspendStress*&model=teemo&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false localhost ~ # lsusb Bus 001 Device 004: ID 0424:ec00 Standard Microsystems Corp. SMSC9512/9514 Fast Ethernet Adapter Bus 002 Device 002: ID 0bda:8153 Realtek Semiconductor Corp. Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 003: ID 8087:0a2a Intel Corp. Bus 001 Device 002: ID 0424:9514 Standard Microsystems Corp. SMC9514 Hub Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
,
Aug 2
|
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by briannorris@chromium.org
, Apr 2 2018Owner: yungleem@google.com