test_push: repair failing because DUT doesn't have rsync after powerwash
Issue description

The failure is here: http://chromeos-shard2-staging.hot.corp.google.com/results/hosts/chromeos4-row10-rack9-host15/577-repair/

Looking at the autoupdate logs:

2017/03/14 14:34:54.187 INFO | auto_updater:0489| Copying devserver package to device...
2017/03/14 14:34:54.225 DEBUG| cros_build_lib:0564| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmp_iwJhE/testing_rsa root@100.115.201.101 -- mkdir -p /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.f4V7F9rqc1
Warning: Permanently added '100.115.201.101' (RSA) to the list of known hosts.
2017/03/14 14:34:54.411 DEBUG| cros_build_lib:0564| RunCommand: rsync --perms --verbose --times --compress --omit-dir-times --exclude .svn --links --rsync-path 'PATH=/usr/local/bin:/usr/local/sbin:$PATH rsync' --recursive --rsh 'ssh -p 22 -oConnectionAttempts=4 -oUserKnownHostsFile=/dev/null -oProtocol=2 -oConnectTimeout=30 -oServerAliveCountMax=3 -oStrictHostKeyChecking=no -oServerAliveInterval=10 -oNumberOfPasswordPrompts=0 -oIdentitiesOnly=yes -i /tmp/ssh-tmp_iwJhE/testing_rsa' /tmp/cros-update_100.115.201.101_23625/src '[root@100.115.201.101]:/mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.f4V7F9rqc1/'
Warning: Permanently added '100.115.201.101' (RSA) to the list of known hosts.
bash: rsync: command not found
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: remote command not found (code 127) at io.c(226) [sender=3.1.0]

It looks like rsync wasn't available on the remote.

My first guess is that we changed this behaviour here: https://chromium-review.googlesource.com/c/450886/9/lib/remote_access.py#704

Maybe we were falling back to scp in this case because mode=None and self.HasRsync() would return False (it checks whether the remote has rsync).

I think the right thing to do here is to check self.HasRsync() even when mode=rsync and fall back to scp as before.

Assigning to ihf@ (his CL), adding deputy to CC because this may be happening in prod as well.
,
Mar 14 2017
You can check the repair logs from here: http://chromeos-shard2-staging.hot.corp.google.com/afe/#tab_id=view_host&object_id=2
,
Mar 14 2017
Updated (permanent) link to logs: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row10-rack9-host15/581-repair/20171403143726/

03/14 14:41:12.336 ERROR| repair:0449| Repair failed: Powerwash and then re-install the stable build via AU
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 447, in _repair_host
    self.repair(host)
  File "/usr/local/autotest/server/hosts/cros_repair.py", line 349, in repair
    super(PowerWashRepair, self).repair(host)
  File "/usr/local/autotest/server/hosts/cros_repair.py", line 330, in repair
    afe_utils.machine_install_and_update_labels(host, repair=True)
  File "/usr/local/autotest/server/afe_utils.py", line 206, in machine_install_and_update_labels
    *args, **dargs)
  File "/usr/local/autotest/server/hosts/cros_host.py", line 742, in machine_install_by_devserver
    force_update=force_update, full_update=force_full_update)
  File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 2087, in auto_update
    raise DevServerException(error_msg % (host_name, error_list[0]))
DevServerException: CrOS auto-update failed for host chromeos4-row10-rack9-host15: Could not copy /tmp/cros-update_chromeos4-row10-rack9-host15_18021/src to device.
,
Mar 14 2017
chromeos-shard2-staging does have rsync. Are you saying some lxc base images don't have rsync? Yes, I did explicitly request rsync for text file transfers, which were free to use scp before. I can change it so we fall back to scp again even when we request rsync.
,
Mar 14 2017
Logs say this happens in TransferDevServerPackage(). I will change CopyToDevice() to fall back.
,
Mar 14 2017
It's the devserver that kicks off the rsync, so the server that has a problem with rsync is 100.115.219.132. The output of rsync is:

rsync version 3.1.0 protocol version 31
Copyright (C) 1996-2013 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
...
...
     --read-batch=FILE       read a batched update from FILE
     --protocol=NUM          force an older protocol version to be used
     --iconv=CONVERT_SPEC    request charset conversion of filenames
     --checksum-seed=NUM     set block/file checksum seed (advanced)
 -4, --ipv4                  prefer IPv4
 -6, --ipv6                  prefer IPv6
     --version               print version number
(-h) --help                  show this help (-h is --help only if used alone)

Use "rsync --daemon --help" to see the daemon-mode command-line options.
Please see the rsync(1) and rsyncd.conf(5) man pages for full documentation.
See http://rsync.samba.org/ for updates, bug reports, and answers
rsync error: syntax or usage error (code 1) at main.c(1556) [client=3.1.0]
,
Mar 14 2017
Re #4: I'm saying that the DUT doesn't have rsync. And indeed:

pprabhu@pprabhu:~$ ssh root@chromeos4-row10-rack9-host15.cros
The authenticity of host 'chromeos4-row10-rack9-host15.cros.corp.google.com (100.115.201.101)' can't be established.
RSA key fingerprint is SHA256:EBc8rsAhukHwitqKZ2EO/ucnmRrrCJQUPtQTAsmB9eo.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'chromeos4-row10-rack9-host15.cros.corp.google.com,100.115.201.101' (RSA) to the list of known hosts.
localhost ~ # rsync
-bash: rsync: command not found
localhost ~ # sudo rsync
sudo: rsync: command not found

This happened just after a powerwash test. This is expected, since rsync is part of the test image and powerwash blows away stateful (where rsync is installed). scp is always present, and we should always fall back to it.
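The manual check above can also be scripted. The following is only a sketch, not chromite's HasRsync() implementation: build_probe_command and remote_has_program are hypothetical names, and the PATH prefix is an assumption copied from the --rsync-path value in the autoupdate log. The runner is injectable so the logic can be exercised without a DUT.

```python
import shlex
import subprocess

# Assumption: mirrors the PATH used by --rsync-path in the autoupdate log,
# since test-image tools live under /usr/local on the stateful partition.
REMOTE_PATH_PREFIX = "PATH=/usr/local/bin:/usr/local/sbin:$PATH"


def build_probe_command(host, program, user="root"):
    """Return an ssh argv that exits 0 only if `program` is found remotely."""
    # `command -v` is the POSIX shell builtin for locating an executable.
    remote = "%s; command -v %s >/dev/null 2>&1" % (
        REMOTE_PATH_PREFIX, shlex.quote(program))
    return ["ssh", "-oBatchMode=yes", "%s@%s" % (user, host), "--", remote]


def remote_has_program(host, program, run=subprocess.call):
    """True if the probe exits 0. `run` takes an argv and returns an exit code."""
    return run(build_probe_command(host, program)) == 0
```

On a freshly powerwashed DUT the probe for rsync would be expected to exit non-zero, which is the signal to use scp instead.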
,
Mar 14 2017
Yes, understood. https://chromium-review.googlesource.com/#/c/455237/
,
Mar 14 2017
,
Mar 14 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/3398d1f358e6c3810d3026032986075d7fd97d80

commit 3398d1f358e6c3810d3026032986075d7fd97d80
Author: Ilja H. Friedel <ihf@chromium.org>
Date: Tue Mar 14 22:23:02 2017

remote_access: check if rsync is on device.

Use scp as default to copy to device.
Use rsync as default to copy from device.
For all rsync usage check if it exists on device.

BUG= chromium:701553
TEST=pylint

Change-Id: Ic50c8fc70b32c4a9b1cd0979e630458ba51b6f7d
Reviewed-on: https://chromium-review.googlesource.com/455237
Reviewed-by: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Ilja H. Friedel <ihf@chromium.org>

[modify] https://crrev.com/3398d1f358e6c3810d3026032986075d7fd97d80/lib/remote_access.py
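The three rules in the commit message can be read as a small decision function. This is a sketch of that reading, not the code in lib/remote_access.py: choose_copy_tool, its parameters, and the direction strings are hypothetical names chosen for illustration.

```python
def choose_copy_tool(direction, requested=None, device_has_rsync=False):
    """Pick 'rsync' or 'scp' for one transfer.

    direction: 'to_device' or 'from_device'.
    requested: None (use the per-direction default), 'rsync' or 'scp'.
    device_has_rsync: result of probing the device for rsync.
    """
    if direction not in ("to_device", "from_device"):
        raise ValueError("unknown direction: %r" % (direction,))
    if requested not in (None, "rsync", "scp"):
        raise ValueError("unknown mode: %r" % (requested,))

    # Defaults from the commit message: scp to the device, rsync from it.
    if requested is None:
        requested = "scp" if direction == "to_device" else "rsync"

    # Every rsync use is gated on rsync actually existing on the device,
    # including the case where the caller asked for rsync explicitly.
    if requested == "rsync" and not device_has_rsync:
        return "scp"
    return requested
```

Under this reading, an explicit rsync request against a powerwashed DUT yields scp, which is the case that failed in the repair log.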
,
Mar 14 2017
As for the unit test request: do we have a mock for the state of a DUT? The problem here was that nobody ever called the function with the parameter 'rsync', even though the function advertised it. So we effectively hit this with an integration test, but that was too late. Notice that xixuan was also worried about using scp, not about using rsync. Maybe the logic cleanup + comment in my change in #8 makes the situation more obvious?
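One possible shape for such a test, assuming nothing about chromite's existing test helpers: unittest.mock can stand in for the DUT state. Here copy_to_device is a hypothetical stand-in for the function under test, and Rsync and Scp are assumed method names on the mocked device; only HasRsync appears in this thread.

```python
import unittest
from unittest import mock


def copy_to_device(device, src, dest, mode=None):
    """Hypothetical function under test: use rsync only if the device has it."""
    if mode == "rsync" and device.HasRsync():
        return device.Rsync(src, dest)
    return device.Scp(src, dest)


class CopyToDeviceTest(unittest.TestCase):
    """Exercises the mode='rsync' path that was never called before."""

    def test_rsync_requested_but_missing_falls_back_to_scp(self):
        device = mock.Mock()
        device.HasRsync.return_value = False  # DUT state right after powerwash
        copy_to_device(device, "/tmp/src", "/tmp/dest", mode="rsync")
        device.Scp.assert_called_once_with("/tmp/src", "/tmp/dest")
        device.Rsync.assert_not_called()

    def test_rsync_requested_and_present_uses_rsync(self):
        device = mock.Mock()
        device.HasRsync.return_value = True  # regular test image
        copy_to_device(device, "/tmp/src", "/tmp/dest", mode="rsync")
        device.Rsync.assert_called_once_with("/tmp/src", "/tmp/dest")
        device.Scp.assert_not_called()
```

Saved as a module, this runs under python -m unittest. The point of the mock is that the powerwashed state becomes a one-line setup instead of something only an integration run can reach.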
,
Mar 14 2017
Can you push this change? I need to go into the lab right now and will be away from the computer for a bit.
,
Mar 14 2017
I will start the push right now.
,
Mar 15 2017
@xixuan: This change is needed on the devserver.
- test_push doesn't test devserver because we don't actually have a staging devserver at all.
- devservers run cros/master code, so we don't need an update to the cros/prod branch for this to go live.
So, you can just update the devservers without waiting for a test_push.
,
Mar 15 2017
yep I always forget to tell devserver push/normal push :( starting devserver push.
,
Mar 15 2017
Thank you!
,
Mar 15 2017
Pushed. Verified 100.115.219.132 and it has the changes.
,
Mar 15 2017
Yes, looking good now on chromeos4-devserver3/100.115.219.131 as well.
Comment 1 by ihf@chromium.org
, Mar 14 2017