lars-release canary failed: SSHConnectionError: Connection to 100.115.202.109 timed out |
|||||||
Issue descriptionThis build: https://uberchromegw.corp.google.com/i/chromeos/builders/lars-release/builds/821 autotest log: ------------- 02/03 06:10:10.734 DEBUG| ssh_host:0272| Running (ssh) 'rsync --version' 02/03 06:10:10.914 DEBUG| abstract_ssh:0465| Using Rsync. 02/03 06:10:10.914 DEBUG| base_utils:0185| Running 'rsync -L --delete --timeout=1800 --rsh='/usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_H6AvbHssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g "/usr/local/autotest/client/" "root@chromeos4-row11-rack7-host15:"/tmp/sysinfo/autoserv-T7h4nZ""' -------------- DUT is pingable, ssh-able, relevant part of log: -------------- 2017-02-03T06:10:05.148626-08:00 NOTICE ag[15783]: autotest server[stack::send_file|use_rsync|_check_rsync] -> ssh_run(rsync --version) 2017-02-03T06:19:20.489001-08:00 NOTICE simple-rotate[17122]: rotating /var/log/tlsdate.log 2017-02-03T06:25:59.935676-08:00 INFO periodic_scheduler[18112]: crash_sender: running /sbin/crash_sender 2017-02-03T06:25:59.944914-08:00 NOTICE crash_sender[18122]: Exiting early due to test image. 2017-02-03T06:25:59.948080-08:00 INFO periodic_scheduler[18124]: crash_sender: job completed 2017-02-03T06:26:03.858729-08:00 INFO periodic_scheduler[18140]: cros-machine-id-regen: running /usr/sbin/cros-machine-id-regen -r periodic -t 21600 2017-02-03T06:26:03.873558-08:00 NOTICE cros-machine-id-regen[18154]: Not regenerating since we did so 11161 seconds ago. 2017-02-03T06:26:03.874837-08:00 INFO periodic_scheduler[18155]: cros-machine-id-regen: job completed 2017-02-03T07:07:01.392447-08:00 NOTICE simple-rotate[24014]: rotating /var/log/tlsdate.log 2017-02-03T07:28:00.037526-08:00 INFO periodic_scheduler[27038]: crash_sender: running /sbin/crash_sender 2017-02-03T07:28:00.046953-08:00 NOTICE crash_sender[27048]: Exiting early due to test image. 2017-02-03T07:28:00.049963-08:00 INFO periodic_scheduler[27050]: crash_sender: job completed -------------- It seems rsync --version came through, but working rsync command did not. Sounds like connectivity issue in the lab.
,
Feb 3 2017
I think the build URL may be wrong, should be 820 instead of 821.
I have this snippet from build 820:
02/03 06:10:10.102 DEBUG| abstract_ssh:0587| Host chromeos4-row11-rack7-host15 is now up
02/03 06:10:10.102 INFO | autotest:0243| Installing autotest on chromeos4-row11-rack7-host15
02/03 06:10:10.102 INFO | autotest:0248| Using installation dir /tmp/sysinfo/autoserv-T7h4nZ
02/03 06:10:10.103 DEBUG| ssh_host:0272| Running (ssh) 'mkdir -p /tmp/sysinfo/autoserv-T7h4nZ'
02/03 06:10:10.288 DEBUG| ssh_host:0272| Running (ssh) 'rm -rf /tmp/sysinfo/autoserv-T7h4nZ/results/*'
02/03 06:10:10.733 WARNI| site_autotest:0052| No job_repo_url for <remote host: chromeos4-row11-rack7-host15>
02/03 06:10:10.734 INFO | autotest:0267| Could not install autotest using the packaging system: No repos to install an autotest client from. Trying other methods
02/03 06:10:10.734 DEBUG| abstract_ssh:0448| send_file. source: /usr/local/autotest/client/, dest: /tmp/sysinfo/autoserv-T7h4nZ, delete_dest: True,preserve_symlinks:False
02/03 06:10:10.734 DEBUG| ssh_host:0272| Running (ssh) 'rsync --version'
02/03 06:10:10.914 DEBUG| abstract_ssh:0465| Using Rsync.
02/03 06:10:10.914 DEBUG| base_utils:0185| Running 'rsync -L --delete --timeout=1800 --rsh='/usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_H6AvbHssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g "/usr/local/autotest/client/" "root@chromeos4-row11-rack7-host15:"/tmp/sysinfo/autoserv-T7h4nZ""'
02/03 06:10:53.017 DEBUG| monitors:0191| ts_mon: sending 9 metrics to PubSub
02/03 08:17:56.353 WARNI| abstract_ssh:0474| trying scp, rsync failed: Command <rsync -L --delete --timeout=1800 --rsh='/usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_H6AvbHssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g "/usr/local/autotest/client/" "root@chromeos4-row11-rack7-host15:"/tmp/sysinfo/autoserv-T7h4nZ""> failed, rc=11, Command returned non-zero exit status
* Command:
rsync -L --delete --timeout=1800 --rsh='/usr/bin/ssh -a -x -o
ControlPath=/tmp/_autotmp_H6AvbHssh-master/socket -o
StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes
-o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3
-o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g
"/usr/local/autotest/client/"
"root@chromeos4-row11-rack7-host15:"/tmp/sysinfo/autoserv-T7h4nZ""
Exit status: 11
Duration: 7665.39836812
How did an rsync command with a 1800 second timeout end up taking 7665 seconds???? The only explanation I can think of is that maybe rsync was going to start counting only after a successful ssh connection, which may not have happened. But the ssh has a connect timeout of 30s and a ServerAlive interval of 900s. SAD.
,
Feb 6 2017
Likely due to dhcp outage, marking fixed. Reopen if keeps happening.
,
Apr 17 2017
,
May 30 2017
,
Aug 1 2017
,
Oct 14 2017
|
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by slavamn@chromium.org
, Feb 3 2017