New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 688424 link

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Last visit > 30 days ago
Closed: Feb 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

lars-release canary failed: SSHConnectionError: Connection to 100.115.202.109 timed out

Project Member Reported by slavamn@chromium.org, Feb 3 2017

Issue description


This build:
https://uberchromegw.corp.google.com/i/chromeos/builders/lars-release/builds/821

autotest log:

-------------
02/03 06:10:10.734 DEBUG|          ssh_host:0272| Running (ssh) 'rsync --version'
02/03 06:10:10.914 DEBUG|      abstract_ssh:0465| Using Rsync.
02/03 06:10:10.914 DEBUG|        base_utils:0185| Running 'rsync -L --delete --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o ControlPath=/tmp/_autotmp_H6AvbHssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g "/usr/local/autotest/client/" "root@chromeos4-row11-rack7-host15:"/tmp/sysinfo/autoserv-T7h4nZ""'
--------------

DUT is pingable, ssh-able, relevant part of log:
--------------
2017-02-03T06:10:05.148626-08:00 NOTICE ag[15783]: autotest server[stack::send_file|use_rsync|_check_rsync] -> ssh_run(rsync --version)
2017-02-03T06:19:20.489001-08:00 NOTICE simple-rotate[17122]: rotating /var/log/tlsdate.log
2017-02-03T06:25:59.935676-08:00 INFO periodic_scheduler[18112]: crash_sender: running /sbin/crash_sender
2017-02-03T06:25:59.944914-08:00 NOTICE crash_sender[18122]: Exiting early due to test image.
2017-02-03T06:25:59.948080-08:00 INFO periodic_scheduler[18124]: crash_sender: job completed
2017-02-03T06:26:03.858729-08:00 INFO periodic_scheduler[18140]: cros-machine-id-regen: running /usr/sbin/cros-machine-id-regen -r periodic -t 21600
2017-02-03T06:26:03.873558-08:00 NOTICE cros-machine-id-regen[18154]: Not regenerating since we did so 11161 seconds ago.
2017-02-03T06:26:03.874837-08:00 INFO periodic_scheduler[18155]: cros-machine-id-regen: job completed
2017-02-03T07:07:01.392447-08:00 NOTICE simple-rotate[24014]: rotating /var/log/tlsdate.log
2017-02-03T07:28:00.037526-08:00 INFO periodic_scheduler[27038]: crash_sender: running /sbin/crash_sender
2017-02-03T07:28:00.046953-08:00 NOTICE crash_sender[27048]: Exiting early due to test image.
2017-02-03T07:28:00.049963-08:00 INFO periodic_scheduler[27050]: crash_sender: job completed
--------------

It seems rsync --version came through, but working rsync command did not. Sounds like connectivity issue in the lab.
 
Summary: lars-release canary failed: SSHConnectionError: Connection to 100.115.202.109 timed out (was: larse-release canary failed: SSHConnectionError: Connection to 100.115.202.109 timed out )
Labels: OS-Chrome
I think the build URL may be wrong, should be 820 instead of 821.

I have this snippet from build 820:

02/03 06:10:10.102 DEBUG|      abstract_ssh:0587| Host chromeos4-row11-rack7-host15 is now up
02/03 06:10:10.102 INFO |          autotest:0243| Installing autotest on chromeos4-row11-rack7-host15
02/03 06:10:10.102 INFO |          autotest:0248| Using installation dir /tmp/sysinfo/autoserv-T7h4nZ
02/03 06:10:10.103 DEBUG|          ssh_host:0272| Running (ssh) 'mkdir -p /tmp/sysinfo/autoserv-T7h4nZ'
02/03 06:10:10.288 DEBUG|          ssh_host:0272| Running (ssh) 'rm -rf /tmp/sysinfo/autoserv-T7h4nZ/results/*'
02/03 06:10:10.733 WARNI|     site_autotest:0052| No job_repo_url for <remote host: chromeos4-row11-rack7-host15>
02/03 06:10:10.734 INFO |          autotest:0267| Could not install autotest using the packaging system: No repos to install an autotest client from. Trying other methods
02/03 06:10:10.734 DEBUG|      abstract_ssh:0448| send_file. source: /usr/local/autotest/client/, dest: /tmp/sysinfo/autoserv-T7h4nZ, delete_dest: True,preserve_symlinks:False
02/03 06:10:10.734 DEBUG|          ssh_host:0272| Running (ssh) 'rsync --version'
02/03 06:10:10.914 DEBUG|      abstract_ssh:0465| Using Rsync.
02/03 06:10:10.914 DEBUG|        base_utils:0185| Running 'rsync -L --delete --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o ControlPath=/tmp/_autotmp_H6AvbHssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g "/usr/local/autotest/client/" "root@chromeos4-row11-rack7-host15:"/tmp/sysinfo/autoserv-T7h4nZ""'
02/03 06:10:53.017 DEBUG|          monitors:0191| ts_mon: sending 9 metrics to PubSub
02/03 08:17:56.353 WARNI|      abstract_ssh:0474| trying scp, rsync failed: Command <rsync -L --delete --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o ControlPath=/tmp/_autotmp_H6AvbHssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g "/usr/local/autotest/client/" "root@chromeos4-row11-rack7-host15:"/tmp/sysinfo/autoserv-T7h4nZ""> failed, rc=11, Command returned non-zero exit status
* Command: 
    rsync -L --delete --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o
    ControlPath=/tmp/_autotmp_H6AvbHssh-master/socket -o
    StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes
    -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3
    -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g
    "/usr/local/autotest/client/"
    "root@chromeos4-row11-rack7-host15:"/tmp/sysinfo/autoserv-T7h4nZ""
Exit status: 11
Duration: 7665.39836812

How did an rsync command with a 1800 second timeout end up taking 7665 seconds????  The only explanation I can think of is that maybe rsync was going to start counting only after a successful ssh connection, which may not have happened.  But the ssh has a connect timeout of 30s and a ServerAlive interval of 900s.  SAD.


Owner: sbasi@chromium.org
Status: Fixed (was: Untriaged)
Likely due to dhcp outage, marking fixed. Reopen if keeps happening.

Comment 4 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 5 by dchan@google.com, May 30 2017

Labels: VerifyIn-60

Comment 6 by dchan@chromium.org, Aug 1 2017

Labels: VerifyIn-61

Comment 7 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)

Sign in to add a comment