New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 828178 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Oct 15
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

[Wifi_Matfunc] Wifi_SuspendStress tests are failing on (Teemo & Kench) with "Connection refused" error

Project Member Reported by rpattumani@chromium.org, Apr 2 2018

Issue description

Wifi_SuspendStress tests fail with the below error message:
"failing with Unhandled error: [Errno 111] Connection refused"

Stainless link - 

https://stainless.corp.google.com/search?view=matrix&row=build&col=test&first_date=2017-10-01&last_date=2018-04-02&test=%5Enetwork%5C_WiFi%5C_SuspendStress%5C.11a%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.11b%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.11g%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.24HT40%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.Hidden%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.WEP40%24%7C%5Enetwork%5C_WiFi%5C_SuspendStress%5C.WPA2%24&board=fizz&status=GOOD&status=WARN&status=FAIL&status=ERROR&status=ABORT&status=ALERT&status=RUNNING&status=TEST_NA&status=NOSTATUS&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false

Error snapshot:

03/27 20:19:35.740 ERROR|         threading:0823| Exception in thread Thread-1:
03/27 20:19:35.741 ERROR|         threading:0823| Traceback (most recent call last):
03/27 20:19:35.741 ERROR|         threading:0823|   File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner
03/27 20:19:35.742 ERROR|         threading:0823|     self.run()
03/27 20:19:35.743 ERROR|         threading:0823|   File "/mnt/host/source/src/third_party/autotest/files/server/cros/stress.py", line 61, in run
03/27 20:19:35.743 ERROR|         threading:0823|     self._loop_stressor()
03/27 20:19:35.744 ERROR|         threading:0823|   File "/mnt/host/source/src/third_party/autotest/files/server/cros/stress.py", line 183, in _loop_stressor
03/27 20:19:35.745 ERROR|         threading:0823|     self.stressor()
03/27 20:19:35.745 ERROR|         threading:0823|   File "/mnt/host/source/src/third_party/autotest/files/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 103, in powerd_non_servo_wifi_suspend
03/27 20:19:35.746 ERROR|         threading:0823|     self.stress_wifi_suspend(try_servo=False)
03/27 20:19:35.747 ERROR|         threading:0823|   File "/mnt/host/source/src/third_party/autotest/files/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 88, in stress_wifi_suspend
03/27 20:19:35.747 ERROR|         threading:0823|     self.context.client.do_suspend(_DELAY)
03/27 20:19:35.748 ERROR|         threading:0823|   File "/mnt/host/source/src/third_party/autotest/files/server/cros/network/wifi_client.py", line 734, in do_suspend
03/27 20:19:35.749 ERROR|         threading:0823|     self._shill_proxy.do_suspend(seconds)
03/27 20:19:35.749 ERROR|         threading:0823|   File "/usr/lib64/python2.7/xmlrpclib.py", line 1240, in __call__
03/27 20:19:35.750 ERROR|         threading:0823|     return self.__send(self.__name, args)
03/27 20:19:35.750 ERROR|         threading:0823|   File "/usr/lib64/python2.7/xmlrpclib.py", line 1599, in __request
03/27 20:19:35.751 ERROR|         threading:0823|     verbose=self.__verbose
03/27 20:19:35.752 ERROR|         threading:0823|   File "/usr/lib64/python2.7/xmlrpclib.py", line 1280, in request
03/27 20:19:35.752 ERROR|         threading:0823|     return self.single_request(host, handler, request_body, verbose)
03/27 20:19:35.753 ERROR|         threading:0823|   File "/usr/lib64/python2.7/xmlrpclib.py", line 1308, in single_request
03/27 20:19:35.753 ERROR|         threading:0823|     self.send_content(h, request_body)
03/27 20:19:35.754 ERROR|         threading:0823|   File "/usr/lib64/python2.7/xmlrpclib.py", line 1456, in send_content
03/27 20:19:35.754 ERROR|         threading:0823|     connection.endheaders(request_body)
03/27 20:19:35.755 ERROR|         threading:0823|   File "/usr/lib64/python2.7/httplib.py", line 1049, in endheaders
03/27 20:19:35.756 ERROR|         threading:0823|     self._send_output(message_body)
03/27 20:19:35.756 ERROR|         threading:0823|   File "/usr/lib64/python2.7/httplib.py", line 893, in _send_output
03/27 20:19:35.757 ERROR|         threading:0823|     self.send(msg)
03/27 20:19:35.757 ERROR|         threading:0823|   File "/usr/lib64/python2.7/httplib.py", line 855, in send
03/27 20:19:35.758 ERROR|         threading:0823|     self.connect()
03/27 20:19:35.758 ERROR|         threading:0823|   File "/usr/lib64/python2.7/httplib.py", line 832, in connect
03/27 20:19:35.759 ERROR|         threading:0823|     self.timeout, self.source_address)
03/27 20:19:35.759 ERROR|         threading:0823|   File "/usr/lib64/python2.7/socket.py", line 575, in create_connection
03/27 20:19:35.760 ERROR|         threading:0823|     raise err
03/27 20:19:35.761 ERROR|         threading:0823| error: [Errno 111] Connection refused
03/27 20:19:35.761 ERROR|         threading:0823| 
03/27 20:19:35.762 DEBUG|              test:0410| Test failed due to [Errno 111] Connection refused. Exception log follows the after_iteration_hooks.
 

 
autoserv.DEBUG
206 KB Download
Labels: -M66 -M67 M-66 M-67
Owner: yungleem@google.com
Looks like the device either never came back or came back very slowly from suspend.

Somebody should check out this device to see if it's in good shape.
Cc: igo@chromium.org yungleem@chromium.org
Owner: rajatja@chromium.org

Comment 3 by rajatja@google.com, Apr 4 2018

Owner: rajatja@google.com

Comment 4 by rajatja@google.com, Apr 4 2018

(Random mumbling): I couldn't figure out when was this last passing (I checked for the last 3 months), was this test ever passing? Secondly, is this test only run on this DUT (because may be this is places in some wifi shielded cell)?

Comment 5 by rajatja@google.com, Apr 4 2018

BTW, I ran suspend_stress_test a couple of times on this DUT, and there were instances where it did not come back (as there were when it did come back). So yeah, there is some issue.
I believe that this is the only DUT running the test. If suspend/resume is known to be reliable on Fizz, then there's probably some DUT-specific problem.

Comment 7 by rajatja@google.com, Apr 4 2018

igo@ Can we provide another DUT to the lab team to swap this one out? Investigating this DUT may be easier on the desk.

Comment 9 by rajatja@google.com, Apr 4 2018

Cc: benzh@google.com benzh@chromium.org
[+benzh]

Hi Ben, I assumed suspend / resume is quite stable on Fizz?
Cc: rpattumani@chromium.org
Labels: wifi-test-failures
WiFi Suspend Stress fails on various boards (bob, celes, eve, kench, kevin, teemo, tidus, mighty) consistently due to "Unhandled error: [Errno 111] Connection refused"


https://stainless.corp.google.com/search?view=matrix&row=model&col=test&first_date=2018-03-21&last_date=2018-04-17&suite=wifi_matfunc&status=FAIL&reason=Connection+refused&exclude_cts=true&exclude_not_run=true&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
Summary: [Wifi_Matfunc] Wifi_SuspendStress tests are failing on (Teemo & Kench) with "Connection refused" error (was: [Wifi_Matfunc] Wifi_SuspendStress tests are failing on (Teemo & Kench) Board-Fizz.)

Comment 12 by rajatja@google.com, Apr 18 2018

Owner: kirtika@google.com
(Assigning to Kirtika as discussed)
Sample log: 

04/17 10:43:09.097 INFO | site_linux_router:0454| AP configured.
04/17 10:43:09.098 INFO |wifi_test_context_:0255| Connecting to SuspendStress_a_i6ggc_ch1.
04/17 10:43:13.482 DEBUG|      xmlrpc_types:0056| Constructing AssociationResult object with args=[], kwargs={'configuration_time': 0.4057610034942627, 'association_time': 0.20685601234436035, 'discovery_time': 3.7171101570129395, 'success': True, 'failure_reason': 'SUCCESS(Connection successful)'}
04/17 10:43:13.482 INFO |wifi_test_context_:0263| Finished connection attempt to SuspendStress_a_i6ggc_ch1 with times: discovery=3.72, association=0.21, configuration=0.41.
04/17 10:43:13.495 DEBUG|          ssh_host:0301| Running (ssh) 'iw dev wlan0 info' from 'assert_connect_wifi|wifi_signal_level|signal_level|is_wifi_device|run|run_very_slowly'
04/17 10:43:13.815 DEBUG|             utils:0282| [stdout] Interface wlan0
04/17 10:43:13.815 DEBUG|             utils:0282| [stdout] 	ifindex 2
04/17 10:43:13.815 DEBUG|             utils:0282| [stdout] 	wdev 0x1
04/17 10:43:13.815 DEBUG|             utils:0282| [stdout] 	addr f8:59:71:e8:7a:2e
04/17 10:43:13.816 DEBUG|             utils:0282| [stdout] 	type managed
04/17 10:43:13.816 DEBUG|             utils:0282| [stdout] 	wiphy 0
04/17 10:43:13.816 DEBUG|             utils:0282| [stdout] 	channel 1 (2412 MHz), width: 20 MHz (no HT), center1: 2412 MHz
04/17 10:43:13.816 DEBUG|             utils:0282| [stdout] 	txpower 22.00 dBm
04/17 10:43:13.874 DEBUG|          ssh_host:0301| Running (ssh) 'iw dev wlan0 link' from 'run_once|assert_connect_wifi|wifi_signal_level|signal_level|run|run_very_slowly'
04/17 10:43:14.203 DEBUG|             utils:0282| [stdout] Connected to f4:f2:6d:ce:ac:91 (on wlan0)
04/17 10:43:14.204 DEBUG|             utils:0282| [stdout] 	SSID: SuspendStress_a_i6ggc_ch1
04/17 10:43:14.204 DEBUG|             utils:0282| [stdout] 	freq: 2412
04/17 10:43:14.204 DEBUG|             utils:0282| [stdout] 	RX: 2761 bytes (19 packets)
04/17 10:43:14.204 DEBUG|             utils:0282| [stdout] 	TX: 2429 bytes (17 packets)
04/17 10:43:14.204 DEBUG|             utils:0282| [stdout] 	signal: -6 dBm
04/17 10:43:14.205 DEBUG|             utils:0282| [stdout] 	tx bitrate: 11.0 MBit/s
04/17 10:43:14.205 DEBUG|             utils:0282| [stdout] 
04/17 10:43:14.205 DEBUG|             utils:0282| [stdout] 	bss flags:	
04/17 10:43:14.205 DEBUG|             utils:0282| [stdout] 	dtim period:	2
04/17 10:43:14.205 DEBUG|             utils:0282| [stdout] 	beacon int:	100
04/17 10:43:14.207 INFO |wifi_test_context_:0282| Connected successfully to SuspendStress_a_i6ggc_ch1, signal level: -6.
04/17 10:43:14.207 INFO |network_WiFi_Suspe:0142| Running Non integration Wifi Stress test.
04/17 10:43:14.208 INFO |            stress:0182| Stressor iteration: 1 of 5
04/17 10:43:14.212 DEBUG|          ssh_host:0301| Running (ssh) 'if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi' from '_loop_stressor|powerd_non_servo_wifi_suspend|stress_wifi_suspend|get_boot_id|run|run_very_slowly'
04/17 10:43:14.545 DEBUG|             utils:0282| [stdout] d44b7e11-ac2b-473d-9534-fffc10c45ae6
04/17 10:43:14.596 INFO |       wifi_client:0733| Suspending DUT for 10 seconds...
04/17 11:13:58.114 ERROR|         threading:0823| Exception in thread Thread-1:
04/17 11:13:58.115 ERROR|         threading:0823| Traceback (most recent call last):
04/17 11:13:58.115 ERROR|         threading:0823|   File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
04/17 11:13:58.116 ERROR|         threading:0823|     self.run()
04/17 11:13:58.116 ERROR|         threading:0823|   File "/usr/local/autotest/server/cros/stress.py", line 61, in run
04/17 11:13:58.116 ERROR|         threading:0823|     self._loop_stressor()
04/17 11:13:58.116 ERROR|         threading:0823|   File "/usr/local/autotest/server/cros/stress.py", line 183, in _loop_stressor
04/17 11:13:58.117 ERROR|         threading:0823|     self.stressor()
04/17 11:13:58.117 ERROR|         threading:0823|   File "/usr/local/autotest/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 103, in powerd_non_servo_wifi_suspend
04/17 11:13:58.117 ERROR|         threading:0823|     self.stress_wifi_suspend(try_servo=False)
04/17 11:13:58.118 ERROR|         threading:0823|   File "/usr/local/autotest/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 88, in stress_wifi_suspend
04/17 11:13:58.118 ERROR|         threading:0823|     self.context.client.do_suspend(_DELAY)
04/17 11:13:58.118 ERROR|         threading:0823|   File "/usr/local/autotest/server/cros/network/wifi_client.py", line 734, in do_suspend
04/17 11:13:58.119 ERROR|         threading:0823|     self._shill_proxy.do_suspend(seconds)
04/17 11:13:58.119 ERROR|         threading:0823|   File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
04/17 11:13:58.119 ERROR|         threading:0823|     return self.__send(self.__name, args)
04/17 11:13:58.120 ERROR|         threading:0823|   File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
04/17 11:13:58.120 ERROR|         threading:0823|     verbose=self.__verbose
04/17 11:13:58.120 ERROR|         threading:0823|   File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
04/17 11:13:58.121 ERROR|         threading:0823|     return self.single_request(host, handler, request_body, verbose)
04/17 11:13:58.121 ERROR|         threading:0823|   File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request
04/17 11:13:58.121 ERROR|         threading:0823|     self.send_content(h, request_body)
04/17 11:13:58.122 ERROR|         threading:0823|   File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content
04/17 11:13:58.122 ERROR|         threading:0823|     connection.endheaders(request_body)
04/17 11:13:58.122 ERROR|         threading:0823|   File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
04/17 11:13:58.123 ERROR|         threading:0823|     self._send_output(message_body)
04/17 11:13:58.123 ERROR|         threading:0823|   File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
04/17 11:13:58.123 ERROR|         threading:0823|     self.send(msg)
04/17 11:13:58.124 ERROR|         threading:0823|   File "/usr/lib/python2.7/httplib.py", line 826, in send
04/17 11:13:58.124 ERROR|         threading:0823|     self.connect()
04/17 11:13:58.124 ERROR|         threading:0823|   File "/usr/lib/python2.7/httplib.py", line 807, in connect
04/17 11:13:58.124 ERROR|         threading:0823|     self.timeout, self.source_address)
04/17 11:13:58.125 ERROR|         threading:0823|   File "/usr/lib/python2.7/socket.py", line 571, in create_connection
04/17 11:13:58.125 ERROR|         threading:0823|     raise err
04/17 11:13:58.125 ERROR|         threading:0823| error: [Errno 111] Connection refused
04/17 11:13:58.126 ERROR|         threading:0823| 
04/17 11:13:58.127 DEBUG|              test:0410| Test failed due to [Errno 111] Connection refused. Exception log follows the after_iteration_hooks.


Here are the complete test and device logs for the above test..https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/192625581-chromeos-test/chromeos15-row4-rack12-host2/

Missed some...

04/17 11:13:58.127 DEBUG|              test:0410| Test failed due to [Errno 111] Connection refused. Exception log follows the after_iteration_hooks.
04/17 11:13:58.128 DEBUG|              test:0415| Starting after_iteration_hooks for network_WiFi_SuspendStress.11b
04/17 11:13:58.142 DEBUG|          ssh_host:0301| Running (ssh) 'test -e "/tmp/sysinfo/autoserv-ozMxUb"' from 'decorated_func|wrapper|_install|path_exists|run|run_very_slowly'
04/17 11:15:01.373 DEBUG|             utils:0282| [stderr] ssh: connect to host chromeos15-row4-rack12-host2 port 22: Connection timed out
04/17 11:15:01.376 ERROR|               log:0027| post-test iteration server sysinfo error:
04/17 11:15:01.377 ERROR|         traceback:0013| Traceback (most recent call last):
04/17 11:15:01.377 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/log.py", line 25, in decorated_func
04/17 11:15:01.378 ERROR|         traceback:0013|     fn(*args, **dargs)
04/17 11:15:01.378 ERROR|         traceback:0013|   File "/usr/local/autotest/server/test.py", line 66, in wrapper
04/17 11:15:01.378 ERROR|         traceback:0013|     host, at, outputdir = self._install()
04/17 11:15:01.379 ERROR|         traceback:0013|   File "/usr/local/autotest/server/test.py", line 135, in _install
04/17 11:15:01.379 ERROR|         traceback:0013|     if not autodir or not self.host.path_exists(autodir):
04/17 11:15:01.380 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 526, in path_exists
04/17 11:15:01.380 ERROR|         traceback:0013|     ignore_status=True)
04/17 11:15:01.380 ERROR|         traceback:0013|   File "/usr/local/autotest/server/hosts/ssh_host.py", line 323, in run
04/17 11:15:01.381 ERROR|         traceback:0013|     return self.run_very_slowly(*args, **kwargs)
04/17 11:15:01.381 ERROR|         traceback:0013|   File "/usr/local/autotest/server/hosts/ssh_host.py", line 312, in run_very_slowly
04/17 11:15:01.382 ERROR|         traceback:0013|     ssh_failure_retry_ok)
04/17 11:15:01.382 ERROR|         traceback:0013|   File "/usr/local/autotest/server/hosts/ssh_host.py", line 254, in _run
04/17 11:15:01.382 ERROR|         traceback:0013|     raise error.AutoservSSHTimeout("ssh timed out", result)
04/17 11:15:01.388 ERROR|         traceback:0013| AutoservSSHTimeout: ('ssh timed out', * Command: 
04/17 11:15:01.389 ERROR|         traceback:0013|     /usr/bin/ssh -a -x   -o Protocol=2 -o StrictHostKeyChecking=no -o
04/17 11:15:01.389 ERROR|         traceback:0013|     UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o
04/17 11:15:01.389 ERROR|         traceback:0013|     ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4
04/17 11:15:01.390 ERROR|         traceback:0013|     -l root -p 22 chromeos15-row4-rack12-host2 "export LIBC_FATAL_STDERR_=1;
04/17 11:15:01.390 ERROR|         traceback:0013|     if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\"
04/17 11:15:01.390 ERROR|         traceback:0013|     \"server[stack::_install|path_exists|run] -> ssh_run(test -e
04/17 11:15:01.391 ERROR|         traceback:0013|     \\\"/tmp/sysinfo/autoserv-ozMxUb\\\")\";fi; test -e \"/tmp/sysinfo
04/17 11:15:01.391 ERROR|         traceback:0013|     /autoserv-ozMxUb\""
04/17 11:15:01.391 ERROR|         traceback:0013| Exit status: 255
04/17 11:15:01.391 ERROR|         traceback:0013| Duration: 63.2169890404
04/17 11:15:01.392 ERROR|         traceback:0013| 
04/17 11:15:01.392 ERROR|         traceback:0013| stderr:
04/17 11:15:01.392 ERROR|         traceback:0013| ssh: connect to host chromeos15-row4-rack12-host2 port 22: Connection timed out)
04/17 11:15:01.393 DEBUG|              test:0420| after_iteration_hooks completed
04/17 11:15:01.395 WARNI|              test:0637| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 837, in _call_test_function
    raise error.UnhandledTestFail(e)
UnhandledTestFail: Unhandled error: [Errno 111] Connection refused
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 147, in run_once
    stressor.wait()
  File "/usr/local/autotest/server/cros/stress.py", line 209, in wait
    self.reraise()
  File "/usr/local/autotest/server/cros/stress.py", line 61, in run
    self._loop_stressor()
  File "/usr/local/autotest/server/cros/stress.py", line 183, in _loop_stressor
    self.stressor()
  File "/usr/local/autotest/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 103, in powerd_non_servo_wifi_suspend
    self.stress_wifi_suspend(try_servo=False)
  File "/usr/local/autotest/server/site_tests/network_WiFi_SuspendStress/network_WiFi_SuspendStress.py", line 88, in stress_wifi_suspend
    self.context.client.do_suspend(_DELAY)
  File "/usr/local/autotest/server/cros/network/wifi_client.py", line 734, in do_suspend
    self._shill_proxy.do_suspend(seconds)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
Cc: tbroch@chromium.org
Why are we looking at this as a Wifi issue? Comment #5 confirmed issues on suspend_stress_test (which does not really involve Wifi much). And those latest pastes look very very similar, where we time out waiting for the device to come back after 30 minutes.

Similarly, there are plenty of power_SuspendStress failures, across a lot of boards. Filtering just for "Autotest client terminated unexpectedly: DUT rebooted during the test run.":

https://stainless.corp.google.com/search?view=matrix&row=model&col=build&first_date=2018-04-12&last_date=2018-04-18&test=%5Epower%5C_SuspendStress%5C..*%24&reason=Autotest+client+terminated+unexpectedly%3A+DUT+rebooted+during+the+test+run.&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
Cc: grundler@chromium.org
Poking at one arbitrary device from the failure list (chromeos15-row1-rack4-host6), I see that the ethernet link often takes a really long time to come up after suspend/resume. e.g.:

[ 5659.322415] PM: suspend exit 2018-04-18 20:22:53.495005400 UTC
[ 5678.089814] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 1

or in some cases, it bounces up and down:

[ 5173.315217] PM: suspend exit 2018-04-18 20:14:47.467667532 UTC
[ 5173.447165] [drm] RC6 on
[ 5180.340079] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 1
[ 5180.595877] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 0
[ 5183.923951] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 1
[ 5184.307765] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 0
[ 5187.507698] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 1
[ 5187.891817] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 0
[ 5191.475692] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 1
[ 5191.859553] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 0
[ 5196.513220] i2c_designware i2c_designware.2: i2c_dw_handle_tx_abort: slave address not acknowledged (7bit mode)
[ 5199.923257] ax88179_178a 2-5:1.0 eth0: ax88179 - Link status is: 1


In fact, I can't even pass power_SuspendStress.bare here. But it's not just because of the loss of SSH connectivity; it looks like the device actively rebooted shortly after a suspend/resume attempt:

2018-04-18T13:30:59.541186-07:00 NOTICE pre-shutdown[8622]: Shutting down for reboot: not-via-powerd


For some reason, this tickles me as a 'recover_duts' problem, combined with flaky ethernet.
Ah, so the above failure is because power_SuspendStress now is somewhat aggressive about getting the network back up; if it doesn't come up in approximately 10 seconds, we reboot. Perhaps that's too aggressive (I think I'll increase that). At any rate, the ethernet on this particular device is very flaky. It's going in and out even with no apparent involvement from either suspend/resume or recover_duts.

After talking with grundler, I see that several of those failing devices seem to have an unsupported ASIX ethernet dongle:

chromeos15-row1-rack3-host2:Bus 002 Device 007: ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
chromeos15-row1-rack3-host3:Bus 002 Device 002: ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
chromeos15-row1-rack4-host6:Bus 002 Device 002: ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
chromeos15-row1-rack6-host6:Bus 002 Device 002: ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
chromeos15-row1-rack8-host6:Bus 002 Device 002: ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet

It's our understanding that all such adapters should be getting replaced. Filed bug 834493 for these.

There are likely different problems on some of the others.
re comment #16: the rate that the link is bouncing suggests to me the link partner is bouncing the link or there is a flakey cable. That's too quick for recover_duts.


re comment #17: AX88179 (GigE) was more ~5x reliable than AX88772 variants (100Mbit). But still not as good as any RTL8153 based USB ethernet dongle on the older kernels that I originally measured this against.

So while I'd still like to see the AX88179 dongles replaced, IIRC, the failure to bring a link up was slightly less than 1 in 1000 linkup/down events. So not likely the issue with chromeos15-row1-rack4-host6. My guess is something else is wrong with the link.
I see the same sample failure for kevin and bob on R66 and R67 builds. R65 and R68 does not see this issue.

https://stainless.corp.google.com/search?view=list&first_date=2018-03-09&last_date=2018-04-19&suite=wifi_matfunc&test=%5Enetwork_WiFi_Reset%24&board=bob%7Ckevin&status=GOOD&status=FAIL&status=ERROR&status=ABORT&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=true


Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 837, in _call_test_function
    raise error.UnhandledTestFail(e)
UnhandledTestFail: Unhandled error: [Errno 111] Connection refused
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/network_WiFi_Reset/network_WiFi_Reset.py", line 116, in run_once
    client.do_suspend(self._SUSPEND_DELAY)
  File "/usr/local/autotest/server/cros/network/wifi_client.py", line 734, in do_suspend
    self._shill_proxy.do_suspend(seconds)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python2.7/httplib.py", line 1013, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 864, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 826, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
The above comment is for network_WiFi_Reset test. 
For network_WiFi_Reset on Marvell 8997 systems; see this:

https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/852560

Particularly:

"""
OK, well Marvell is hopeless at fixing this, and I don't have time to debug failure modes in the autotest framework. If the error path takes 4+ minutes, then so be it. Not my fault...

Watch out for seeing all-red for this test on Kevin/Bob now.
"""

And I finally bothered landing a hack solution to that here:

https://b.corp.google.com/issues/64074778#comment50

So, the test is known broken on M66 on Bob and Kevin (where the test CL landed; the feature was already broken long before that), and is fixed now on M68 and also cherry-picked to M67 (please watch for new results >=10575.4.0).

That should be a different class of issues than the $subject, where SuspendStress is failing on multiple non-Marvell boards -- clearly due to either ethernet network connectivity or suspend/resume crashing (or both). It's also quite likely not related to Wifi.
Project Member

Comment 22 by bugdroid1@chromium.org, Apr 24 2018

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/940e7b00c6336a3034d9efdb751750b1472b6f56

commit 940e7b00c6336a3034d9efdb751750b1472b6f56
Author: Dinesh Kumar Sunkara <dsunkara@chromium.org>
Date: Tue Apr 24 21:27:37 2018

After replacing the dongle, test now always fails with the reason "<class 'autotest_lib.client.cros.power.sys_power.SpuriousWakeupError'>:Woke from suspend early"

Below is the link to stainless: 

https://stainless.corp.google.com/search?view=list&first_date=2018-04-19&last_date=2018-04-25&suite=%5Ewifi%5C_matfunc%24&test=network_WiFi_SuspendStress*&model=teemo&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false


localhost ~ # lsusb
Bus 001 Device 004: ID 0424:ec00 Standard Microsystems Corp. SMSC9512/9514 Fast Ethernet Adapter
Bus 002 Device 002: ID 0bda:8153 Realtek Semiconductor Corp. 
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 8087:0a2a Intel Corp. 
Bus 001 Device 002: ID 0424:9514 Standard Microsystems Corp. SMC9514 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Owner: kirtika@chromium.org

Sign in to add a comment