Issue 742041

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Many *-paladin HWTests fail to provision due to SSH timeout

Project Member Reported by oka@chromium.org, Jul 13 2017

Issue description

https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/15360

Sample log:
veyron_mighty:
https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fveyron_mighty-paladin%2F5906%2F%2B%2Frecipes%2Fsteps%2FHWTest__bvt-inline_%2F0%2Fstdout

Here is an excerpt:
3771435 Provision started on: 2017-07-12 21:06:11 status FAIL
3771185 Repair started on: 2017-07-12 20:29:51 status PASS
3770921 Provision started on: 2017-07-12 19:47:53 status FAIL
ERROR:root:host: chromeos4-row6-rack11-host17, status: Repairing, locked: False diagnosis: Working
labels: ['bluetooth', 'ec:cros', 'veyron_mighty', 'board:veyron_mighty', 'pool:cq', 'audio_loopback_dongle', 'cts_abi_arm', 'internal_display', 'os:cros', 'power:battery', 'storage:mmc', 'servo', 'arc', 'hw_video_acc_vp8', 'hw_video_acc_h264', 'webcam', 'hw_video_acc_enc_vp8', 'sku:mighty_rk3288_2Gb', 'variant:mighty', 'phase:PVT2', 'touchpad']
Last 10 jobs within 1:48:00:
3771411 Provision started on: 2017-07-12 21:04:59 status FAIL
3771182 Repair started on: 2017-07-12 20:29:40 status PASS
3770917 Provision started on: 2017-07-12 19:47:50 status FAIL
INFO:root:Reason: Suite job failed or provisioning failed.
INFO:root:
 07-12-2017 [21:23:50] Output below this line is for buildbot consumption:
INFO:root:@@@STEP_LINK@[Test-Logs]: Suite job: ABORT@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283245-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Flake-Dashboard]: Suite job@https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=Suite job@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host17: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host17 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283320-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host8: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host8 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283322-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row4-rack13-host1: SSHConnectionError: ssh: connect to host chromeos4-row4-rack13-host1 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283323-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host14: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host14 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283325-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row4-rack13-host15: SSHConnectionError: ssh: connect to host chromeos4-row4-rack13-host15 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283327-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host6: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host6 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283329-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host17: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host17 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283330-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host21: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host21 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283332-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack10-host4: SSHConnectionError: ssh: connect to host chromeos4-row6-rack10-host4 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283334-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host2: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host2 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283336-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host7: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host7 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283338-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host4: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host4 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283340-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host20: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host20 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283342-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host9: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host9 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283344-chromeos-test/@@@
INFO:root:@@@STEP_LINK@[Test-Logs]: provision: FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row6-rack11-host5: SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host5 port 22: Connection timed out@http://cautotest/tko/retrieve_logs.cgi?job=/results/128283346-chromeos-test/@@@
INFO:root:Will return from run_suite with status: INFRA_FAILURE
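
To see whether the timeouts cluster on particular racks, the affected hosts can be pulled out of a log like the one above. The following is a minimal sketch, not part of the autotest tooling: the function names are hypothetical, and the `<lab>-row<N>-rack<N>-host<N>` naming scheme is an assumption inferred only from the hostnames visible in this excerpt.

```python
import re
from collections import Counter

# Matches the hostname in the SSHConnectionError text seen in the log above.
SSH_TIMEOUT_RE = re.compile(
    r"ssh: connect to host (?P<host>\S+) port 22: Connection timed out"
)

# Assumed hostname layout, inferred from names such as
# "chromeos4-row6-rack11-host17" in this excerpt.
HOST_RE = re.compile(
    r"^(?P<lab>\w+)-row(?P<row>\d+)-rack(?P<rack>\d+)-host(?P<host>\d+)$"
)


def timed_out_hosts(log_text):
    """Return the sorted, de-duplicated hosts that hit an SSH timeout."""
    return sorted({m.group("host") for m in SSH_TIMEOUT_RE.finditer(log_text)})


def count_by_rack(hosts):
    """Count hosts per lab/row/rack; unparseable names go under 'unknown'."""
    counts = Counter()
    for host in hosts:
        m = HOST_RE.match(host)
        if m:
            key = "{lab}-row{row}-rack{rack}".format(**m.groupdict())
        else:
            key = "unknown"
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Two message fragments copied from the excerpt above.
    sample = (
        "SSHConnectionError: ssh: connect to host chromeos4-row6-rack11-host17 "
        "port 22: Connection timed out\n"
        "SSHConnectionError: ssh: connect to host chromeos4-row4-rack13-host1 "
        "port 22: Connection timed out\n"
    )
    for rack, n in sorted(count_by_rack(timed_out_hosts(sample)).items()):
        print(rack, n)
```

Run over the full stdout of the failed HWTest step, this gives a quick per-rack tally. A tally spread across many racks and boards would point away from a single piece of lab hardware and toward a software change.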
 

Comment 1 by oka@chromium.org, Jul 13 2017

Components: Infra>Client>ChromeOS

Comment 2 by oka@chromium.org, Jul 13 2017

Cc: sergeybe...@chromium.org estaab@chromium.org

Comment 3 by xixuan@chromium.org, Jul 13 2017

Owner: skau@chromium.org
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/15357
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/15360

Two master builds (linked above) hit large-scale 'DUT cannot reboot' failures. The error later went away, but it left some peach_pit DUTs broken. Any suspicious CLs? I marked 2 of them -1, but they don't seem to be the culprit, since the failure happened twice.

I've marked peach_pit as experimental until the lab team moves these DUTs back to the lab with servos. Right now they don't have any servos, so if they're broken, they stay broken.
FWIW, I restarted chromeos master yesterday around 6:30pm PDT (draining started around 5:20pm): https://crbug.com/739890

Note that pin bump was done 2 days prior: https://crbug.com/739890#c3

Comment 6 by xixuan@chromium.org, Jul 13 2017

Labels: -Pri-1 Pri-0
Boards such as elm and cave failed again due to the same provision issue. Raising to P0. The tree is throttled.

Comment 7 by skau@chromium.org, Jul 13 2017

The issue looks to have subsided. No idea what the root cause is.

peach_pit boards are still down.

Comment 8 by xixuan@chromium.org, Jul 13 2017

Labels: -Pri-0 Pri-1
We suspect that https://chromium-review.googlesource.com/c/565632/ is causing these errors.

peach_pit DUTs are being fixed: b/63642466.

Comment 9 by skau@chromium.org, Jul 14 2017

A handful of hana boards haven't recovered. Request for fix:
b/63682227

Comment 10 by skau@chromium.org, Jul 14 2017

Issue 742561 has been merged into this issue.
Status: Fixed (was: Untriaged)
Let's keep this bug focused on the 'ssh connection error', whose cause has already been confirmed to be the bad CL 565632. Marking this as fixed.

Hana is always experimental; its DUT shortage will be tracked in b/63682227.
