provision timeout during rsync |
||||
Issue descriptionExample build: Provision Job: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/79055769-chromeos-test/ Reading from the sysinfo/CrOS... update logs, it seems like the first update timed out during initial rsync 2016/10/03 01:18:03.007 INFO | auto_updater:0376| Copying rootfs payload to device... 2016/10/03 01:18:03.008 DEBUG| cros_build_lib:0565| RunCommand: rsync --perms --verbose --times --compress --omit-dir-times --exclude .svn --links --rsync-path 'PATH=/usr/local/bin:/usr/local/sbin:$PATH rsync' --recursive --rsh 'ssh -p 22 -oConnectionAttempts=4 -oUserKnownHostsFile=/dev/null -oProtocol=2 -oConnectTimeout=30 -oServerAliveCountMax=3 -oStrictHostKeyChecking=no -oServerAliveInterval=10 -oNumberOfPasswordPrompts=0 -oIdentitiesOnly=yes -i /tmp/ssh-tmp9g0JnE/testing_rsa' /home/chromeos-test/images/link-paladin/R55-8858.0.0-rc3/update.gz '[root@chromeos4-row10-rack4-host7]:/mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.PpluF9lWif/static/pregenerated' Warning: Permanently added 'chromeos4-row10-rack4-host7,100.115.200.239' (RSA) to the list of known hosts.^M sending incremental file list update.gz rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(632) [sender=3.1.0] rsync: [sender] write error: Broken pipe (32) 2016/10/03 01:44:43.502 DEBUG| retry_util:0108| Encounter unexpected exception <class 'chromite.lib.timeout_util.TimeoutError'>(Timeout occurred- waited 1800 seconds.), will not retry. 2016/10/03 01:44:43.503 DEBUG| cros_build_lib:0565| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmp9g0JnE/testing_rsa root@chromeos4-row10-rack4-host7 -- rm -rf /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.PpluF9lWif 2016/10/03 01:44:44.132 DEBUG| cros_update:0224| Error happens in CrOS auto-update: TimeoutError('Timeout occurred- waited 1800 seconds.',) Then the second attempt also timed out during an rsync step 2016/10/03 02:15:16.889 DEBUG| cros_build_lib:0565| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpEX7_x8/testing_rsa root@chromeos4-row10-rack4-host7 -- rm -f /var/run/update_engine_autoupdate_completed Warning: Permanently added 'chromeos4-row10-rack4-host7,100.115.200.239' (RSA) to the list of known hosts.^M 2016/10/03 02:15:17.228 DEBUG| cros_build_lib:0565| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpEX7_x8/testing_rsa root@chromeos4-row10-rack4-host7 -- stop ui Warning: Permanently added 'chromeos4-row10-rack4-host7,100.115.200.239' (RSA) to the list of known hosts.^M 2016/10/03 02:15:18.712 DEBUG| cros_build_lib:0565| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpEX7_x8/testing_rsa root@chromeos4-row10-rack4-host7 -- rm -rf /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.FC9SiTaBuy 2016/10/03 02:15:19.071 DEBUG| cros_update:0224| Error happens in CrOS auto-update: TimeoutError('Timeout occurred- waited 1800 seconds.',) Both of those timeouts are, I believe, 30 minute timeouts relative to the start of that provision attempt. The provision DEBUG log also has lots of warnings about retried SSH commands. These taken together suggest that the devserver was network-overloaded at the time of this provision job.
,
Oct 3 2016
The devserver in question was 100.115.219.132 though I'm not sure how to determine it's hostname from that.
,
Oct 3 2016
Myriad provision timeouts were also visible in https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_speedy-paladin/builds/3273 in the same CQ run.
,
Oct 12 2016
Xixuan - can you take a look?
,
Nov 21 2016
,
Mar 10 2018
This bug has not been touched in over a year. It is probably no longer relevant. |
||||
►
Sign in to add a comment |
||||
Comment 1 by akes...@chromium.org
, Oct 3 2016