samus-release: key_verify failed for server_host_key
Reported by email@example.com (Project Member), Aug 3
Justin and I have been playing with this machine. For both of us, it failed on the first SSH connection to it, and then succeeded on all subsequent connections:

  ssh -i ~/cros2/src/scripts/mod_for_test_scripts/ssh_keys/testing_rsa firstname.lastname@example.org echo hi
  ssh_dispatch_run_fatal: Connection to 100.115.203.7 port 22: incorrect signature

  ssh -i ~/cros2/src/scripts/mod_for_test_scripts/ssh_keys/testing_rsa email@example.com echo hi
  hi

I asked someone else to try SSHing in, but it succeeded for them on the first try.
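If this reproduces again, running the same command with extra verbosity should show exactly which host key and signature check produced "incorrect signature". This is just standard OpenSSH debug output (-vvv), using the same key and placeholder target as above; nothing here is specific to our tooling:

  ssh -vvv -i ~/cros2/src/scripts/mod_for_test_scripts/ssh_keys/testing_rsa firstname.lastname@example.org echo hi

The lines around "Server host key:" and the key exchange are the interesting ones.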
This seems likely to be a result of a stale testing_rsa, which is updated when you use it. That would be consistent with it working immediately for some people but not others. Testing locally, I don't have a file at the specified testing_rsa location, but ssh picked up another copy of the file that I do have.
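A cheap way to test the stale-key theory would be to compare the copy ssh actually used against the checked-in one; if the sums differ, that lines up with the "updated when you use it" behavior described above. (The second path is hypothetical, just wherever your other copy of the key happens to live.)

  md5sum ~/cros2/src/scripts/mod_for_test_scripts/ssh_keys/testing_rsa
  md5sum /path/to/other/checkout/mod_for_test_scripts/ssh_keys/testing_rsa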
I don't see any invocation of ssh in the test being run there, though it is an autoupdate test, so I would not be surprised if there is one a couple more archaeological digs down from where I checked. I'll look into it.
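In case it saves someone a dig, something like the grep below should surface any ssh invocation buried in the test or its helpers. The path is a guess at where the autoupdate test lives in my checkout; adjust to wherever it actually sits:

  grep -rn --include='*.py' 'ssh' ~/cros2/src/third_party/autotest/files/server/site_tests/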
jrbarnette@ suggests that this could be caused by a powerwash invalidating the credentials, because the DUT's identity changes.
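If that's what is happening, a stale cached host key on the requesting side could be ruled out by clearing the client's entry for the DUT before retrying. ssh-keygen -R is standard OpenSSH; the IP is the DUT from the first comment:

  ssh-keygen -R 100.115.203.7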
When Justin and I investigated, we both logged onto the same machine. The sequence was something like:
1) Justin: fail
2) Justin: succeed
3) Justin: succeed
...a minute passes. A reboot is possible, but I don't think we rebooted.
4) Evan: fail
5) Evan: succeed
6) Evan: succeed
7) Guenter: succeed (I asked someone else to try it just to see if it was the first time for everyone.)
Also, IIRC from looking at the build history, this happens at random/unpredictable points throughout the build.
The reboot would be affecting the machine sending the request (Justin/Evan/Guenter), not the machine being connected to. However, this problem seems to be much newer than the changes that introduced powerwashing to this test (added in May); it has only surfaced in July and later.
My understanding is that this blocks release of new versions for samus. Is that correct? If so, the priority bump and Chase seem appropriate.
Seems confined to certain DUTs or a certain time window; removing from the chase queue.
This failure blocked at least one PFQ run:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/suiteDetails?suiteId=231387884

Quite a few tests look to have failed in the past due to this, but all of them seem to be from the same single DUT, "chromeos4-row12-rack6-host3":
https://stainless.corp.google.com/search?view=list&first_date=2018-08-14&last_date=2018-08-28&suite=bvt&board=samus&status=GOOD&status=WARN&status=FAIL&status=ERROR&reason=AutoservRunError%3A+command+execution+error&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
(click Columns => Host)
I filed b/120932045 to get chromeos4-row12-rack6-host3 replaced.