New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 652198 link

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

provision timeout during rsync

Project Member Reported by akes...@chromium.org, Oct 3 2016

Issue description

Example build:


Provision Job:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/79055769-chromeos-test/


Reading from the sysinfo/CrOS... update logs, it seems like the first update timed out during initial rsync

2016/10/03 01:18:03.007 INFO |      auto_updater:0376| Copying rootfs payload to device...
2016/10/03 01:18:03.008 DEBUG|    cros_build_lib:0565| RunCommand: rsync --perms --verbose --times --compress --omit-dir-times --exclude .svn --links --rsync-path 'PATH=/usr/local/bin:/usr/local/sbin:$PATH  rsync' --recursive --rsh 'ssh -p 22 -oConnectionAttempts=4 -oUserKnownHostsFile=/dev/null -oProtocol=2 -oConnectTimeout=30 -oServerAliveCountMax=3 -oStrictHostKeyChecking=no -oServerAliveInterval=10 -oNumberOfPasswordPrompts=0 -oIdentitiesOnly=yes -i /tmp/ssh-tmp9g0JnE/testing_rsa' /home/chromeos-test/images/link-paladin/R55-8858.0.0-rc3/update.gz '[root@chromeos4-row10-rack4-host7]:/mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.PpluF9lWif/static/pregenerated'
Warning: Permanently added 'chromeos4-row10-rack4-host7,100.115.200.239' (RSA) to the list of known hosts.^M
sending incremental file list
update.gz
rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(632) [sender=3.1.0]
rsync: [sender] write error: Broken pipe (32)
2016/10/03 01:44:43.502 DEBUG|        retry_util:0108| Encounter unexpected exception <class 'chromite.lib.timeout_util.TimeoutError'>(Timeout occurred- waited 1800 seconds.), will not retry.
2016/10/03 01:44:43.503 DEBUG|    cros_build_lib:0565| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmp9g0JnE/testing_rsa root@chromeos4-row10-rack4-host7 -- rm -rf /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.PpluF9lWif
2016/10/03 01:44:44.132 DEBUG|       cros_update:0224| Error happens in CrOS auto-update: TimeoutError('Timeout occurred- waited 1800 seconds.',)


Then the second attempt also timed out during an rsync step

2016/10/03 02:15:16.889 DEBUG|    cros_build_lib:0565| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpEX7_x8/testing_rsa root@chromeos4-row10-rack4-host7 -- rm -f /var/run/update_engine_autoupdate_completed
Warning: Permanently added 'chromeos4-row10-rack4-host7,100.115.200.239' (RSA) to the list of known hosts.^M
2016/10/03 02:15:17.228 DEBUG|    cros_build_lib:0565| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpEX7_x8/testing_rsa root@chromeos4-row10-rack4-host7 -- stop ui
Warning: Permanently added 'chromeos4-row10-rack4-host7,100.115.200.239' (RSA) to the list of known hosts.^M
2016/10/03 02:15:18.712 DEBUG|    cros_build_lib:0565| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpEX7_x8/testing_rsa root@chromeos4-row10-rack4-host7 -- rm -rf /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.FC9SiTaBuy
2016/10/03 02:15:19.071 DEBUG|       cros_update:0224| Error happens in CrOS auto-update: TimeoutError('Timeout occurred- waited 1800 seconds.',)


Both of those timeouts are, I believe, 30 minute timeouts relative to the start of that provision attempt.


The provision DEBUG log also has lots of warnings about retried SSH commands. These taken together suggest that the devserver was network-overloaded at the time of this provision job.
 
Cc: jrbarnette@chromium.org cywang@chromium.org kevcheng@chromium.org ayatane@chromium.org
+ayatane do we have sysmon metrics about devservers yet?
The devserver in question was 100.115.219.132 though I'm not sure how to determine it's hostname from that.
Myriad provision timeouts were also visible in https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_speedy-paladin/builds/3273 in the same CQ run.

Comment 4 by autumn@chromium.org, Oct 12 2016

Labels: -current-issue
Owner: xixuan@chromium.org
Xixuan - can you take a look?
Cc: semenzato@chromium.org
Status: Archived (was: Untriaged)
This bug has not been touched in over a year.  It is probably no longer relevant.

Sign in to add a comment