Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




samus-release: key_verify failed for server_host_key

Project Member Reported by evgreen@chromium.org, Aug 3

Issue description

The samus-release builder has been consistently failing. One example of a failing build:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8939220612054347072

It's a little difficult to find the actual failure, but one message I've seen consistently in the errors of the last three builds is:

key_verify failed for server_host_key

One set of potentially interesting errors is here:
https://00e9e64bacac34fc7dd480a8aa424978171798c51996e4765c-apidata.googleusercontent.com/download/storage/v1/b/chromeos-autotest-results/o/223526513-chromeos-test%2Fchromeos4-row12-rack6-host3%2Fdebug%2Fautoserv.ERROR?qk=AD5uMEsNr_GwWJU3Utx6zyCZcCjMQEdUyGCNmHOTLDcQZlrLc2JAogjgcqJ78QvMxecEl0DCMFGGV3vKuYwiyRK59hBCaM-r53c_UmEK_w10x7FNwYM9UW2aETUDdb-ogL6B89b319F_kmdB4i6E7HJBztsF8i7Lj0fZtlc9IOL8QTMRo5SyXXtyBY3GVAsAdUtLGkqbFSHnoJbgXe1A5ehWV4XleBF5oVfMsG26nebBzZ1ofUr9A58wCeYJDwdR4VgdLPRlMYeDpP9ub8kHW1N4bXASdWtXPKHaGVDZ6GAjm685YCJcJxshnytmFGB21nYEpuANfw0EZEH8fCPFSyzQJjcc4I8qXE7jA9g6K0Etpgp3d6kRI2Ob8NxazNKPRuKu57u3Uj_NEIz4Q19fqu7lb0DTb_vc8gywkNmgA8bKfONVJH70gYYTnME5OCGXrrElILnyGPXhGUl6PbQXeeh5Y9ZfgQV7VmfkCSpVi9_BlrqpTDYDCnjnPq39Yu6kXm64MSc9xatD9vDquNWIKUv4MPI4Y5gXySn5jjfNV6JQ-P_1tn2eFHDhMNbmG9skyXjSQ-dq_v67KOxhzfXd93u_DN_a4DDLhOK0uhjeMwe9L3ihEbyRCkyVS-WO8dcYtksg5cvR5wCgKREH1LOkpocokKHuv4J_OMq9GFReV4Gxbt-FBK8XEmUbLB74JMQi25ZkHeFglSajtNjwhgHwMYNMDO9d6Gt811A5dObv1MVUTrdyra7g63kyuRq4C7E7qzIHwRvzlisfPY_MkH6gZ9eTJ0drfLE0hqpXydIMqM4NcK77aU36yfES7yVbOAZXvkhM2sTJ_Z1c_q9y07oOa5AcHFV7wrmseQ

In my searches, I have only seen this message produced by SSH. Is it possible that the DUT SSHes out at times during the build and has an out-of-date set of keys for the hosts it connects to?
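Because finding this message currently means eyeballing each build's logs, here is a minimal Python sketch that counts log lines containing the two SSH failure strings quoted in this issue: `key_verify failed for server_host_key` and the `incorrect signature` error from the manual repro reported later in the thread. It is an assumption-laden illustration, not existing Chromium OS tooling: `SIGNATURES` and `count_signatures` are hypothetical names, and each signature is treated as a plain substring.

```python
from collections import Counter
from typing import Iterable

# Failure strings quoted in this issue. Matching them as plain substrings is
# an assumption; real autoserv lines may carry timestamps or other prefixes.
SIGNATURES = {
    "key_verify": "key_verify failed for server_host_key",
    "incorrect_signature": "incorrect signature",
}


def count_signatures(lines: Iterable[str]) -> Counter:
    """Count how many log lines contain each known SSH failure signature."""
    counts = Counter()
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return counts
```

Running it over each failing build's downloaded autoserv.ERROR file (for example, `with open(path, errors="replace") as f: print(count_signatures(f))`) would show whether the signature appears in every failing build or only on particular DUTs.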
 
Justin and I have been playing with this machine. For both of us, it failed on the first SSH connection to it, and then succeeded on all subsequent connections:

ssh -i ~/cros2/src/scripts/mod_for_test_scripts/ssh_keys/testing_rsa root@chromeos4-row12-rack6-host3.cros echo hi
ssh_dispatch_run_fatal: Connection to 100.115.203.7 port 22: incorrect signature
ssh -i ~/cros2/src/scripts/mod_for_test_scripts/ssh_keys/testing_rsa root@chromeos4-row12-rack6-host3.cros echo hi
hi

I asked someone else to try SSHing in, but it succeeded for them on the first try.
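The "first attempt fails, later attempts succeed" pattern can be checked mechanically instead of by hand. Below is a minimal Python 3.7+ sketch, assuming the ssh command quoted above; the key path and hostname are the reporter's and will differ elsewhere. `run_ssh_once`, `probe`, and `fails_only_first` are hypothetical helpers, not part of any existing lab tooling.

```python
import os
import subprocess
from typing import Callable, List, Sequence

# Command quoted in this issue. The key path and DUT hostname are specific to
# the reporter's checkout and lab. expanduser() is needed because "~" is not
# expanded when subprocess runs without a shell.
SSH_CMD = [
    "ssh", "-i",
    os.path.expanduser("~/cros2/src/scripts/mod_for_test_scripts/ssh_keys/testing_rsa"),
    "root@chromeos4-row12-rack6-host3.cros", "echo", "hi",
]


def run_ssh_once(cmd: Sequence[str] = SSH_CMD, timeout: float = 30.0) -> bool:
    """Run one ssh attempt and report whether it exited with status 0."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def probe(attempt: Callable[[], bool], n: int = 5) -> List[bool]:
    """Call attempt() n times back to back and record each outcome."""
    return [attempt() for _ in range(n)]


def fails_only_first(results: List[bool]) -> bool:
    """True for the pattern seen here: first attempt fails, all later ones succeed."""
    return len(results) >= 2 and not results[0] and all(results[1:])
```

Based on the reports in this thread, `fails_only_first(probe(run_ssh_once))` from a client that has not yet connected to an affected DUT would be expected to return True. Printing the raw list from `probe` also shows whether failures recur after the first attempt.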

This seems likely to be the result of a stale testing_rsa, which is updated when you use it. That would be consistent with the connection working immediately for some people but not for others. When testing locally, I don't have a file at the specified testing_rsa location, but it used another copy of the file that I do have.

I don't see any invocation of ssh in the test being run there, though it is an autoupdate test, so I would not be surprised if there is one a couple more archaeological digs down from where I checked. I'll look into it.

jrbarnette@ suggests that this could be caused by a powerwash invalidating the credentials, because the DUT's identity changes.

When Justin and I investigated, we both logged onto the same machine. The sequence was something like:

1) Justin fail
2) Justin succeed
3) Justin succeed
...a minute passes. Reboot is possible, but I don't think we rebooted.
4) Evan fail
5) Evan succeed
6) Evan succeed
7) Guenter succeed (I asked someone else to try it just to see if it was the first time for everyone).

Also, IIRC from looking at the build history, this happens at unpredictable points throughout the build.


The reboot would affect the machine sending the request (Justin/Evan/Guenter), not the one receiving it.

However, this problem seems much newer than the changes that introduced powerwashing to this test: those were added in May, and this problem only surfaced in July.
Labels: -Pri-3 Chase-Pending Pri-1
Status: Available (was: Untriaged)
My understanding is that this blocks release of new versions for samus. Is that correct? If so, the priority bump and Chase seem appropriate.

Comment 8 by akes...@chromium.org, Aug 13 (2 days ago)

Labels: -Pri-1 -Chase-Pending Pri-2
Seems confined to certain DUTs or a certain time window; removing from the chase queue.
