
Issue 714259

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Tests failed or aborted because of network connection issues (ssh connection failures, rsync failures, remote command failures)

Project Member Reported by nxia@chromium.org, Apr 21 2017

Issue description

https://luci-milo.appspot.com/buildbot/chromeos/peppy-paladin/14929

http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=113662642


https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/113662642-chromeos-test/chromeos4-row6-rack13-host19/




04/21 07:56:45.177 DEBUG|      abstract_ssh:0357| Using Rsync.
04/21 07:56:45.178 DEBUG|        base_utils:0185| Running 'rsync -l  --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o ControlPath=/tmp/_autotmp_iXMfJQssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g root@chromeos4-row6-rack13-host19:"/usr/local/autotest/results/default/" "/usr/local/autotest/results/113662642-chromeos-test/chromeos4-row6-rack13-host19"'
04/21 07:56:55.051 WARNI|      abstract_ssh:0387| trying scp, rsync failed: Command <rsync -l  --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o ControlPath=/tmp/_autotmp_iXMfJQssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g root@chromeos4-row6-rack13-host19:"/usr/local/autotest/results/default/" "/usr/local/autotest/results/113662642-chromeos-test/chromeos4-row6-rack13-host19"> failed, rc=23, Command returned non-zero exit status
* Command: 
    rsync -l  --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o ControlPath=/tmp
    /_autotmp_iXMfJQssh-master/socket -o StrictHostKeyChecking=no -o
    UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o
    ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4
    -o Protocol=2 -l root -p 22' -az --no-o --no-g
    root@chromeos4-row6-rack13-host19:"/usr/local/autotest/results/default/"
    "/usr/local/autotest/results/113662642-chromeos-
    test/chromeos4-row6-rack13-host19"
Exit status: 23
Duration: 9.76802301407

stderr:
rsync: change_dir "/usr/local/autotest/results/default" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.0]
rsync: [Receiver] write error: Broken pipe (32) (23)
04/21 07:56:55.053 DEBUG|      abstract_ssh:0390| Trying scp.
04/21 07:56:55.054 DEBUG|          ssh_host:0284| Running (ssh) 'ls "/usr/local/autotest/results/default/"*'
04/21 07:57:09.410 DEBUG|        base_utils:0280| [stderr] ls: cannot access /usr/local/autotest/results/default/*: No such file or directory
04/21 07:57:16.276 DEBUG|          ssh_host:0284| Running (ssh) 'ls "/usr/local/autotest/results/default/".[!.]*'
04/21 07:57:24.399 DEBUG|        base_utils:0280| [stderr] ls: cannot access /usr/local/autotest/results/default/.[!.]*: No such file or directory
04/21 07:57:24.403 DEBUG|        server_job:1371| Client state file /usr/local/autotest/results/113662642-chromeos-test/chromeos4-row6-rack13-host19/control.autoserv.state not found
04/21 07:57:24.405 DEBUG|          base_job:0392| Persistent state client.* deleted
04/21 07:57:24.407 DEBUG|          autotest:0966| Autotest job finishes.
04/21 07:57:24.408 ERROR|        server_job:0809| Exception escaped control file, job aborting:
Traceback (most recent call last):
  File "/usr/local/autotest/server/server_job.py", line 801, in run
    self._execute_code(server_control_file, namespace)
  File "/usr/local/autotest/server/server_job.py", line 1301, in _execute_code
    execfile(code_file, namespace, namespace)
  File "/usr/local/autotest/results/113662642-chromeos-test/chromeos4-row6-rack13-host19/control.srv", line 10, in <module>
    job.parallel_simple(run_client, machines)
  File "/usr/local/autotest/server/server_job.py", line 625, in parallel_simple
    return_results=return_results)
  File "/usr/local/autotest/server/subcommand.py", line 93, in parallel_simple
    function(arg)
  File "/usr/local/autotest/results/113662642-chromeos-test/chromeos4-row6-rack13-host19/control.srv", line 7, in run_client
    at.run(control, host=host, use_packaging=use_packaging)
  File "/usr/local/autotest/server/autotest.py", line 381, in run
    client_disconnect_timeout, use_packaging=use_packaging)
  File "/usr/local/autotest/server/autotest.py", line 464, in _do_run
    client_disconnect_timeout=client_disconnect_timeout)
  File "/usr/local/autotest/server/autotest.py", line 896, in execute_control
    boot_id = self.host.get_boot_id()
  File "/usr/local/autotest/client/common_lib/hosts/base_classes.py", line 238, in get_boot_id
    boot_id = self.run(cmd, timeout=timeout).stdout.strip()
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 300, in run
    raise error.AutoservRunError(timeout_message, cmderr.args[1])
AutoservRunError: Timeout encountered: /usr/bin/ssh -a -x    -o ControlPath=/tmp/_autotmp_ryEZZ8ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row6-rack13-host19 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::_do_run|execute_control|get_boot_id] -> ssh_run(if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi)\";fi; if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi"
* Command: 
    /usr/bin/ssh -a -x    -o ControlPath=/tmp/_autotmp_ryEZZ8ssh-
    master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
    -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o
    ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22
    chromeos4-row6-rack13-host19 "export LIBC_FATAL_STDERR_=1; if type
    \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\"
    \"server[stack::_do_run|execute_control|get_boot_id] -> ssh_run(if [ -f
    '/proc/sys/kernel/random/boot_id' ]; then cat
    '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available';
    fi)\";fi; if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat
    '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi"
Exit status: 255
Duration: 61.5624740124

stdout:
4c362b13-4584-4c9a-86dd-59720bc9b38e
 

Comment 1 by nxia@chromium.org, Apr 21 2017

Summary: Test failures caused by unstable ssh connections and rsync failures (was: peppey-paladin: platform_Perf failed because ssh connection and rsync failure)
https://luci-milo.appspot.com/buildbot/chromeos/lumpy-paladin/28067
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/113649221-chromeos-test/chromeos6-row2-rack8-host4/debug

Comment 3 by nxia@chromium.org, Apr 22 2017

Summary: Tests failed because of ssh connection failures and rsync failures (was: Test failures caused by unstable ssh connections and rsync failures)
https://luci-milo.appspot.com/buildbot/chromeos/x86-zgb-paladin/9662

Comment 4 by nxia@chromium.org, Apr 22 2017

Summary: Tests failed because of network connection issues (ssh connection failures, rsync failures, remote command failures) (was: Tests failed because of ssh connection failures and rsync failures)
I'm going to merge the different bugs into this one. They were all caused by network connectivity issues but showed different symptoms: ssh failures, remote command failures, rsync failures, etc.
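For triaging merged reports, the symptom grouping described above could be sketched roughly as follows. This is a minimal illustration, not part of autotest: the label names, the regular expressions, and the `classify_line` and `summarize` helpers are all hypothetical, and the patterns are modelled only on the log lines quoted in this issue. Treating an ssh timeout as an "ssh connection failure" is likewise an assumption.

```python
import re

# Hypothetical symptom labels, borrowed from the wording of this issue's summary.
RSYNC_FAILURE = "rsync failure"
SSH_FAILURE = "ssh connection failure"
REMOTE_COMMAND_FAILURE = "remote command failure"

# Patterns are checked in order; the first match wins.
# They are assumptions based only on the log excerpt in this issue.
_PATTERNS = [
    (re.compile(r"rsync failed|rsync error"), RSYNC_FAILURE),
    (re.compile(r"AutoservRunError: Timeout encountered: \S*ssh\b"), SSH_FAILURE),
    (re.compile(r"AutoservRunError"), REMOTE_COMMAND_FAILURE),
]


def classify_line(line):
    """Return a symptom label for one log line, or None if nothing matches."""
    for pattern, label in _PATTERNS:
        if pattern.search(line):
            return label
    return None


def summarize(log_text):
    """Count how many lines of a log fall under each symptom label."""
    counts = {}
    for line in log_text.splitlines():
        label = classify_line(line)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Running `summarize` over a job's debug log gives a rough per-symptom count, which may help decide whether a new report belongs with the issues merged here. It does not prove that the network was the root cause.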

Comment 5 by nxia@chromium.org, Apr 22 2017

Issue 713535 has been merged into this issue.

Comment 6 by nxia@chromium.org, Apr 22 2017

Summary: Tests failed or aborted because of network connection issues (ssh connection failures, rsync failures, remote command failures) (was: Tests failed because of network connection issues (ssh connection failures, rsync failures, remote command failures))

Comment 7 by nxia@chromium.org, Apr 22 2017

Cc: tbroch@chromium.org zhihongyu@chromium.org owenlin@chromium.org
Issue 713011 has been merged into this issue.

Comment 8 by nxia@chromium.org, Apr 22 2017

Issue 714275 has been merged into this issue.

Comment 9 by nxia@chromium.org, Apr 22 2017

Cc: jrbarnette@chromium.org
Issue 713845 has been merged into this issue.

Comment 10 by nxia@chromium.org, Apr 24 2017

Issue 714286 has been merged into this issue.

Comment 11 by aut...@google.com, Apr 25 2017

Owner: nxia@chromium.org

Comment 12 by nxia@chromium.org, May 16 2017

Cc: akes...@chromium.org
Issue 713825 has been merged into this issue.

Comment 13 by nxia@chromium.org, May 16 2017

Status: Fixed (was: Untriaged)
This is believed to be fixed. See the postmortem for more details: http://shortn/_6ecl5t5CQ7

Comment 14 by nxia@chromium.org, Jun 20 2017

Issue 714252 has been merged into this issue.
Labels: VerifyIn-61

Comment 16 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)
