provision failure: 'ssh: connect to host <hostname> port 22: Connection refused'
Issue description
A rash of failures on a veyron_mighty-paladin run: https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_mighty-paladin/builds/3332
One of the failures from that build: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/80122053-chromeos-test/chromeos4-row6-rack10-host17/sysinfo/
2016/10/10 01:04:49.544 DEBUG| auto_updater:0903| Start post check for rootfs update...
2016/10/10 01:04:49.546 DEBUG| cros_build_lib:0565| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpIhLbBB/testing_rsa root@chromeos4-row6-rack10-host17 -- rootdev -s
2016/10/10 01:04:49.891 DEBUG| auto_updater:0307| Current root device is /dev/mmcblk0p3
2016/10/10 01:04:49.893 DEBUG| cros_build_lib:0565| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpIhLbBB/testing_rsa root@chromeos4-row6-rack10-host17 -- cgpt show -n -i 4 -P '$(rootdev -s -d)'
2016/10/10 01:04:50.258 DEBUG| cros_build_lib:0614| (stdout): 2
2016/10/10 01:04:50.258 DEBUG| cros_build_lib:0616| (stderr): Warning: Permanently added 'chromeos4-row6-rack10-host17,100.115.197.113' (RSA) to the list of known hosts.
2016/10/10 01:04:50.259 DEBUG| cros_build_lib:0565| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpIhLbBB/testing_rsa root@chromeos4-row6-rack10-host17 -- cgpt show -n -i 2 -P '$(rootdev -s -d)'
2016/10/10 01:04:50.616 DEBUG| cros_build_lib:0614| (stdout): 1
2016/10/10 01:04:50.616 DEBUG| cros_build_lib:0616| (stderr): Warning: Permanently added 'chromeos4-row6-rack10-host17,100.115.197.113' (RSA) to the list of known hosts.
... ... ...
2016/10/10 01:12:57.037 INFO | remote_access:0371| Cannot connect to device; reboot in progress.
2016/10/10 01:12:57.037 ERROR| cros_build_lib:0660| Reboot has not completed after 480 seconds; giving up.
2016/10/10 01:12:57.038 DEBUG| cros_build_lib:0565| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpIhLbBB/testing_rsa root@chromeos4-row6-rack10-host17 -- rm -rf /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.dWjRcwmYjB
2016/10/10 01:13:00.111 ERROR| remote_access:0832| Error connecting to device chromeos4-row6-rack10-host17
2016/10/10 01:13:00.112 DEBUG| cros_update:0224| Error happens in CrOS auto-update: SSHConnectionError('ssh: connect to host chromeos4-row6-rack10-host17 port 22: Connection refused\r\n',)
,
Oct 15 2016
and on elm-paladin: https://uberchromegw.corp.google.com/i/chromeos/builders/elm-paladin/builds/789
,
Oct 15 2016
Hmm... looks like it hit all of the hwtest in this master run: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/12578
,
Oct 15 2016
,
Oct 15 2016
And this master run as well: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/12581
,
Oct 17 2016
Looks like there was some flakiness between the devservers and the DUTs during the first several hours of Oct 10. Holding this bug for tracking.
,
Oct 17 2016
Assigning to Xixuan for now (reassign to someone else if needed).
,
Oct 19 2016
Actually, I'm not sure whether this happens often. Maybe we can retry some specific commands in auto-update to work around this kind of network flakiness. For now, we can hold this and see whether more failures caused by network issues follow.
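For reference, a rough sketch of what retrying a specific command could look like. This is not the auto_updater or cros_build_lib code; `run_with_retry`, `is_transient`, and the `TRANSIENT_MARKERS` list are hypothetical names made up for illustration, and the set of error strings treated as transient is an assumption. The idea is to retry only when stderr looks like a connection-level failure, and to back off between attempts.

```python
import subprocess
import time

# Assumption: substrings in ssh's stderr that suggest a transient network
# problem rather than a real command failure. Illustrative, not exhaustive.
TRANSIENT_MARKERS = ("Connection refused", "Connection timed out")


def is_transient(stderr):
    """Return True if stderr looks like a connection-level failure."""
    return any(marker in (stderr or "") for marker in TRANSIENT_MARKERS)


def run_with_retry(cmd, attempts=3, delay=1.0, backoff=2.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying only on transient-looking connection errors.

    runner and sleep are injectable so the logic can be exercised
    without a real DUT. Returns the last CompletedProcess.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if not is_transient(result.stderr):
            break  # a real failure; retrying would only hide it
        if attempt < attempts:
            sleep(delay)
            delay *= backoff  # wait longer before each further attempt
    return result
```

Something like `run_with_retry(["ssh", "root@<hostname>", "--", "rootdev", "-s"])` would then survive a brief refusal while still failing fast on non-network errors. It would not help the case in the log above, where the DUT never came back within the 480 second reboot wait.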
,
Oct 19 2016
,
Dec 9 2016
Looks like we aren't seeing large-scale failures like the ones we hit on Oct 15. I'm closing this bug for now; feel free to re-open it. The 'connection refused' error probably comes from network flakiness or authorization issues, not from the code.
Comment 1 by kevcheng@chromium.org
, Oct 15 2016