Issue 870787

Starred by 3 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Dec 12
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug


samus-release: key_verify failed for server_host_key

Project Member Reported by, Aug 3

Issue description

The samus-release builder has been consistently failing. One example of a failing build:

It's a little difficult to find the actual failure, but one message I've seen consistently in the errors of the last three builds is:

key_verify failed for server_host_key

One set of potentially interesting-looking errors is here:

In my searches, I've only seen this message produced by SSH. Is it possible that the DUT SSHes out at times during the build and has an out-of-date set of keys for the hosts it connects to?
Justin and I have been playing with this machine. For both of us, it failed on the first SSH connection to it, and then succeeded on all subsequent connections:

ssh -i ~/cros2/src/scripts/mod_for_test_scripts/ssh_keys/testing_rsa root@chromeos4-row12-rack6-host3.cros echo hi
ssh_dispatch_run_fatal: Connection to port 22: incorrect signature
ssh -i ~/cros2/src/scripts/mod_for_test_scripts/ssh_keys/testing_rsa root@chromeos4-row12-rack6-host3.cros echo hi
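
To check whether the first-connection failure reproduces from a given workstation, the two commands above can be repeated from a script. This is a minimal Python sketch, assuming OpenSSH's `ssh` is on PATH. The helper names `ssh_echo`, `probe`, and `only_first_fails` are hypothetical, and the key path and hostname are copied from the commands in this issue:

```python
import os
import subprocess
from typing import Callable, List, Sequence

# Hypothetical defaults copied from the commands in this issue; adjust to your checkout.
KEY = "~/cros2/src/scripts/mod_for_test_scripts/ssh_keys/testing_rsa"
HOST = "root@chromeos4-row12-rack6-host3.cros"


def ssh_echo(key: str = KEY, host: str = HOST, timeout: float = 60.0) -> bool:
    """Run `ssh -i KEY HOST echo hi` once; return True if ssh exited with status 0."""
    cmd = [
        "ssh",
        "-i", os.path.expanduser(key),  # no shell involved, so expand "~" ourselves
        "-o", "BatchMode=yes",          # fail instead of prompting for a password
        host, "echo", "hi",
    ]
    # Raises subprocess.TimeoutExpired if the connection hangs longer than `timeout`.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0


def probe(attempt: Callable[[], bool], n: int = 3) -> List[bool]:
    """Call attempt() n times back to back and record each outcome in order."""
    return [attempt() for _ in range(n)]


def only_first_fails(outcomes: Sequence[bool]) -> bool:
    """True for the reported pattern: the first attempt fails, every later one succeeds."""
    return len(outcomes) > 1 and not outcomes[0] and all(outcomes[1:])
```

For example, `probe(ssh_echo)` makes three connections in a row. If the failure described here reproduces, it returns `[False, True, True]`, and `only_first_fails` on that list returns True. Passing the attempt in as a callable lets the outcome logic be checked without a live DUT.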

I asked someone else to try SSHing in, but it succeeded for them on the first try.
This seems likely to be the result of a stale testing_rsa key, which gets updated when you use it. That would be consistent with it working immediately for some people but not for others. When I tested locally, I didn't have a file at the specified testing_rsa path, but ssh used another copy of the key that I do have.
I don't see any invocation of ssh in the test being run there, but it is an autoupdate test, so I would not be surprised if there is one a couple more archaeological digs down from where I checked. I'll look into it.
jrbarnette@ suggests that this could be caused by a powerwash invalidating the credentials, because the DUT's identity changes.
When Justin and I investigated, we both logged onto the same machine. The sequence was something like:

1) Justin fail
2) Justin succeed
3) Justin succeed
...a minute passes. Reboot is possible, but I don't think we rebooted.
4) Evan fail
5) Evan succeed
6) Evan succeed
7) Guenter succeed (I asked someone else to try it just to see if it was the first time for everyone).
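
The attempt log above can be encoded to make the pattern explicit. This is a small Python sketch; the helper names are hypothetical, and the data is exactly the seven attempts listed:

```python
from typing import Dict, Iterable, Tuple

# The attempts listed above, in order, as (person, succeeded) pairs.
ATTEMPTS = [
    ("Justin", False), ("Justin", True), ("Justin", True),
    ("Evan", False), ("Evan", True), ("Evan", True),
    ("Guenter", True),
]


def first_attempt_outcomes(attempts: Iterable[Tuple[str, bool]]) -> Dict[str, bool]:
    """Map each person to whether their first connection succeeded."""
    first: Dict[str, bool] = {}
    for who, ok in attempts:
        first.setdefault(who, ok)  # keep only the first attempt per person
    return first


def failures_only_on_first_attempt(attempts: Iterable[Tuple[str, bool]]) -> bool:
    """True if every failed attempt was that person's first connection."""
    seen = set()
    for who, ok in attempts:
        if not ok and who in seen:
            return False
        seen.add(who)
    return True
```

On this log, `failures_only_on_first_attempt(ATTEMPTS)` is True, while `first_attempt_outcomes(ATTEMPTS)` shows Guenter's first attempt succeeding. In other words, failures are tied to a client's first connection, but not every client sees one. The code only restates the observations; it does not test any theory about the cause.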

Also, IIRC from looking at the build history, this happens at random, unpredictable points throughout the build.

Any reboot would have affected the machines sending the requests (Justin's, Evan's, and Guenter's), not the DUT receiving them.

However, this problem seems to be much newer than the changes that added powerwashing to this test: those landed in May, while this problem has only surfaced since July.
Labels: -Pri-3 Chase-Pending Pri-1
Status: Available (was: Untriaged)
My understanding is that this blocks release of new versions for samus. Is that correct? If so, the priority bump and Chase seem appropriate.
Labels: -Pri-1 -Chase-Pending Pri-2
This seems confined to certain DUTs or a certain time window; removing it from the chase queue.
Status: Fixed (was: Available)
I filed b/120932045 to get chromeos4-row12-rack6-host3 replaced.
