Issue 714286

Issue metadata

Status: Duplicate
Merged: issue 714259
Owner: ----
Closed: Apr 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

test was aborted because of ssh connection timeout

Project Member Reported by nxia@chromium.org, Apr 21 2017

Issue description

https://luci-milo.appspot.com/buildbot/chromeos/veyron_speedy-paladin/5060

http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=113653732

START	cheets_CTS.com.android.cts.dram	cheets_CTS.com.android.cts.dram	timestamp=1492776514	localtime=Apr 21 05:08:34	
	START	----	----	timestamp=1492776716	localtime=Apr 21 05:11:56	
		GOOD	----	sysinfo.before	timestamp=1492776716	localtime=Apr 21 05:11:56	
	END GOOD	----	----	timestamp=1492776716	localtime=Apr 21 05:11:56	
	START	----	----	timestamp=1492776952	localtime=Apr 21 05:15:52	
		GOOD	----	sysinfo.iteration.before	timestamp=1492776952	localtime=Apr 21 05:15:52	
	END GOOD	----	----	timestamp=1492776952	localtime=Apr 21 05:15:52	
	START	----	reboot	timestamp=1492777259	localtime=Apr 21 05:20:59	
		GOOD	----	reboot.start	timestamp=1492777259	localtime=Apr 21 05:20:59	
		GOOD	----	reboot.verify	timestamp=1492777283	localtime=Apr 21 05:21:23	
	END GOOD	----	reboot	kernel=3.14.0	localtime=Apr 21 05:21:25	timestamp=1492777285	
INFO	----	----	Job aborted by autotest_system on 2017-04-21 05:23:42
INFO	----	----	timestamp=1492777716	localtime=Apr 21 05:28:36	Start crashcollection record
INFO	----	Orphaned Crash Dump	timestamp=1492777716	localtime=Apr 21 05:28:36	/var/spool/crash/sslh_fork.20170421.052103.2356.core
INFO	----	Orphaned Crash Dump	timestamp=1492777716	localtime=Apr 21 05:28:36	/var/spool/crash/sslh_fork.20170421.052103.2356.diaglog
INFO	----	----	timestamp=1492777716	localtime=Apr 21 05:28:36	End crashcollection record

04/21 05:24:02.909 DEBUG|        base_utils:0185| Running '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_q9pElQssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/tmpPElBqf -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=300 -l root -p 22 chromeos4-row4-rack10-host22'
04/21 05:24:08.041 INFO |      abstract_ssh:0824| Timed out waiting for master-ssh connection to be established.
04/21 05:24:16.763 DEBUG|      abstract_ssh:0756| Nuking master_ssh_job.
04/21 05:24:17.770 DEBUG|      abstract_ssh:0762| Cleaning master_ssh_tempdir.
04/21 05:24:18.448 DEBUG|          ssh_host:0284| Running (ssh) 'test ! -e /var/log/messages || cp -f /var/log/messages /var/tmp/messages.autotest_start'
04/21 05:24:18.467 INFO |      abstract_ssh:0809| Starting master ssh connection '/usr/bin/ssh -a -x   -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_JPD93yssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row4-rack10-host22'
04/21 05:24:18.468 DEBUG|        base_utils:0185| Running '/usr/bin/ssh -a -x   -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_JPD93yssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row4-rack10-host22'
04/21 05:24:23.559 INFO |      abstract_ssh:0824| Timed out waiting for master-ssh connection to be established.
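
The gap between launching the master ssh process and the "Timed out waiting" message can be read straight from the log timestamps. Below is a minimal Python 3 sketch, assuming the debug-line layout seen above (`MM/DD HH:MM:SS.mmm LEVEL| module:line| message`). `parse_line` and `master_ssh_waits` are hypothetical helper names, not part of autotest, and the year is an assumption because the log lines omit it.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt above: timestamp, level, module:line, message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>\w+)\s*\|\s*(?P<module>\w+):(?P<line>\d+)\| (?P<msg>.*)$"
)


def parse_line(line, year=2017):
    """Return (datetime, level, module, message), or None for non-log lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # The log omits the year, so it is supplied by the caller (assumption).
    when = datetime.strptime(
        "%d %s" % (year, m.group("ts")), "%Y %m/%d %H:%M:%S.%f"
    )
    return when, m.group("level"), m.group("module"), m.group("msg")


def master_ssh_waits(lines):
    """Seconds between each master-ssh launch and the timeout that follows it."""
    waits = []
    started = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, _level, module, msg = parsed
        if module == "base_utils" and "ControlMaster=yes" in msg:
            # base_utils logs the actual launch of the ssh master process.
            started = when
        elif "Timed out waiting for master-ssh" in msg and started is not None:
            waits.append((when - started).total_seconds())
            started = None
    return waits
```

Applied to the excerpt above, this gives about 5.1 seconds for each of the two attempts, well short of the `ConnectTimeout=30` passed to ssh. That suggests the wait is bounded by something other than ssh's own connect timeout.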

Comment 1 by nxia@chromium.org, Apr 21 2017

Summary: test was aborted because of ssh connection timeout (was: test failed with ssh connection timeout and crash infos )
https://luci-milo.appspot.com/buildbot/chromeos/veyron_minnie-paladin/2326

04/20 09:36:59.479 DEBUG|          ssh_host:0284| Running (ssh) 'rm -f /usr/local/autotest/control.autoserv;rm -f /usr/local/autotest/control.autoserv.state;rm -f /usr/local/autotest/control;rm -f /usr/local/autotest/control.state'
04/20 09:42:00.134 DEBUG|        retry_util:0129| ending retries with error: <class 'chromite.lib.timeout_util.TimeoutError'>(Timeout occurred- waited 300 seconds.)
04/20 09:42:00.135 ERROR|        server_job:0809| Exception escaped control file, job aborting:
Traceback (most recent call last):
  File "/usr/local/autotest/server/server_job.py", line 801, in run
    self._execute_code(server_control_file, namespace)
  File "/usr/local/autotest/server/server_job.py", line 1301, in _execute_code
    execfile(code_file, namespace, namespace)
  File "/usr/local/autotest/results/113505234-chromeos-test/chromeos4-row9-rack11-host22/control.srv", line 10, in <module>
    job.parallel_simple(run_client, machines)
  File "/usr/local/autotest/server/server_job.py", line 625, in parallel_simple
    return_results=return_results)
  File "/usr/local/autotest/server/subcommand.py", line 93, in parallel_simple
    function(arg)
  File "/usr/local/autotest/results/113505234-chromeos-test/chromeos4-row9-rack11-host22/control.srv", line 7, in run_client
    at.run(control, host=host, use_packaging=use_packaging)
  File "/usr/local/autotest/server/autotest.py", line 381, in run
    client_disconnect_timeout, use_packaging=use_packaging)
  File "/usr/local/autotest/server/autotest.py", line 428, in _do_run
    repos = self.get_fetch_location()
  File "/usr/local/autotest/server/site_autotest.py", line 103, in get_fetch_location
    found_repo = self._get_fetch_location_from_host_attribute()
  File "/usr/local/autotest/server/site_autotest.py", line 47, in _get_fetch_location_from_host_attribute
    hosts = afe.get_hosts(hostname=self.host.hostname)
  File "/usr/local/autotest/server/frontend.py", line 538, in get_hosts
    hosts = self.run('get_hosts', **query_args)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 111, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 122, in GenericRetry
    ret = functor(*args, **kwargs)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 81, in _run
    return super(RetryingAFE, self).run(call, **dargs)
  File "/usr/local/autotest/server/frontend.py", line 107, in run
    result = utils.strip_unicode(rpc_call(**dargs))
  File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 114, in __call__
    respdata = urllib2.urlopen(request).read()
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1187, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1089, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 400, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/local/autotest/site-packages/chromite/lib/timeout_util.py", line 62, in kill_us
    raise TimeoutError(error_message % {'time': max_run_time})
TimeoutError: Timeout occurred- waited 300 seconds.
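
The last frame of the traceback is `kill_us` in chromite's `timeout_util.py`, raised while `socket.recv` was blocked. That is consistent with a signal-driven timeout interrupting the AFE RPC (`get_hosts`) after 300 seconds. The following is a minimal Python 3 sketch of that general technique using `SIGALRM`, not chromite's actual implementation. `deadline` and `DeadlineExceeded` are hypothetical names, and the sketch assumes a Unix host and that it runs in the main thread.

```python
import contextlib
import signal
import time


class DeadlineExceeded(Exception):
    """Raised when the guarded block runs longer than allowed."""


@contextlib.contextmanager
def deadline(seconds):
    """Interrupt the enclosed block with DeadlineExceeded after `seconds`."""

    def kill_us(signum, frame):
        # Runs in the main thread when SIGALRM arrives, even while the
        # thread is blocked in a system call such as recv() or sleep().
        raise DeadlineExceeded(
            "Timeout occurred- waited %s seconds." % seconds
        )

    previous = signal.signal(signal.SIGALRM, kill_us)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Always cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


if __name__ == "__main__":
    try:
        with deadline(0.5):
            time.sleep(5)  # stands in for a blocking RPC read
    except DeadlineExceeded as exc:
        print("aborted:", exc)
```

With this pattern the exception surfaces wherever the main thread happens to be blocked, which would explain why the traceback ends inside `socket.py` rather than in the RPC wrapper itself.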

Comment 2 by nxia@chromium.org, Apr 24 2017

Mergedinto: 714259
Status: Duplicate (was: Untriaged)