
Issue 788456

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Last visit > 30 days ago
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




All board:reef model:reef pool:suites devices in Repair Failed state

Project Member Reported by jclinton@chromium.org, Nov 24 2017

Issue description

While researching other issues, I noticed that all 6/6 board:reef model:reef pool:suites devices are in the Repair Failed state.

Is there alerting for this?


 
Cc: jkop@chromium.org
We don't have alerting for unhealthy pools, though that's a direction I want to investigate. +jkop is building dashboards and metrics from the pool balancer.

We do have DUT utilization dashboards that are filterable by board and pool, e.g. https://viceroy.corp.google.com/chromeos/dut_utilization?board=reef&pool=managed%3Asuites&status=Running&topstreams=5&duration=1d&mdb_role=chrome-infra&refresh=-1
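
Until proper alerting exists, a cron'd check along these lines could flag a broken pool. This is only a rough sketch: it assumes the dut-status -p/-b invocation and the OK/NO status column shown in the snapshot later in this bug, and the threshold and mail recipient are placeholders, not an existing alerting config.

#!/bin/sh
# Hypothetical stopgap pool-health check, not an existing service.
# Assumes dut-status output as pasted below in this bug: a header line,
# then one line per host with the status (OK/NO/??) in the second field.
POOL=suites
BOARD=reef
THRESHOLD=3   # placeholder: how many non-OK DUTs before complaining
broken=$(dut-status -p "$POOL" -b "$BOARD" | awk 'NR > 1 && $2 != "OK"' | wc -l)
if [ "$broken" -ge "$THRESHOLD" ]; then
  echo "$BOARD/$POOL has $broken non-OK DUTs" \
    | mail -s "DUT pool health: $BOARD/$POOL" someone@example.com
fi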
Cc: pprabhu@chromium.org
Current dut-status snapshot suggests that the pool has several non-broken DUTs.

$ dut-status -p suites -b reef
hostname                       S   last checked         URL
chromeos6-row4-rack9-host15    NO  2017-11-27 05:11:10  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack9-host15/61854797-repair/
chromeos6-row4-rack9-host8     OK  2017-11-27 11:33:26  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack9-host8/18945-reset/
chromeos6-row4-rack9-host6     OK  2017-11-27 11:33:52  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack9-host6/18950-reset/
chromeos6-row4-rack9-host4     OK  2017-11-27 11:34:01  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack9-host4/18951-reset/
chromeos6-row4-rack9-host2     OK  2017-11-27 11:32:22  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack9-host2/18944-reset/
chromeos6-row4-rack9-host14    OK  2017-11-27 11:33:32  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack9-host14/18947-reset/
chromeos6-row4-rack10-host15   OK  2017-11-27 05:35:45  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack10-host15/18716-verify/
chromeos6-row4-rack10-host19   NO  2017-11-27 06:55:00  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack10-host19/61855096-repair/
chromeos6-row3-rack10-host15   OK  2017-11-27 05:35:45  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack10-host15/18719-verify/
chromeos6-row4-rack10-host14   NO  2017-11-27 10:41:41  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack10-host14/61855776-repair/
chromeos6-row4-rack10-host20   NO  2017-11-27 09:51:20  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack10-host20/61855464-repair/
chromeos6-row4-rack9-host18    OK  2017-11-27 05:35:45  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack9-host18/18725-verify/
chromeos6-row4-rack9-host16    NO  2017-11-27 01:02:44  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack9-host16/61854370-repair/
chromeos6-row4-rack9-host22    NO  2017-11-27 05:11:10  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack9-host22/61854805-repair/
chromeos6-row3-rack12-host7    OK  2017-11-27 05:35:45  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack12-host7/18735-verify/
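
For a quick tally of that snapshot by status, something like the following works (a minimal sketch assuming the column layout above, where the second field is OK/NO/??):

$ dut-status -p suites -b reef | awk 'NR > 1 {n[$2]++} END {for (s in n) print s, n[s]}'

For the listing above that would report roughly "NO 6" and "OK 9", i.e. 15 hosts total with 6 of them broken.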


There are also 15 DUTs rather than 6.

Are we missing the model:reef label on some of them still?
If they haven't been repaired / provisioned successfully in the last two weeks, they wouldn't get the model:reef label.
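
One rough way to spot pool hosts that are missing the model label, as a sketch only: model_reef_hosts.txt here is a hypothetical hand-exported list of hosts currently carrying model:reef (e.g. copied from the AFE hosts page), not an existing file.

$ dut-status -p suites -b reef | awk 'NR > 1 {print $1}' | sort > pool_hosts.txt
$ sort model_reef_hosts.txt > model_hosts.txt
$ comm -23 pool_hosts.txt model_hosts.txt   # hosts in the pool but lacking model:reef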

Status: Fixed (was: Untriaged)
Spot-checked chromeos6-row4-rack10-host15; it seems to be working, is in pool suites, and has the model label.

OP, how did you determine that 6/6 were broken? Absent other evidence, marking this as fixed.
Labels: -Pri-1 Pri-2
Status: Assigned (was: Fixed)
http://cautotest/afe/#tab_id=hosts with board:reef pool:suites now shows 15 hosts instead of the 6 it showed when I reported this, but there are still 6 in the Repair Failed state.

Owner: xixuan@chromium.org
-> deputy

Comment 7 by xixuan@chromium.org, Nov 29 2017

Only 2 DUTs are in the Repair Failed state. Sending them to the lab for fixing:
https://b.corp.google.com/issues/69916339
Cc: -akes...@chromium.org -pprabhu@chromium.org dgarr...@chromium.org
Owner: pho...@chromium.org
Passing to current deputy & secondary: b/69874059 is filed to fix these DUTs; however, I just checked and there is another batch of reef suites DUTs in Repair Failed (updated on that bug):

chromeos6-row4-rack9-host17    NO  2017-12-07 10:49:12  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack9-host17/79404-repair/
chromeos6-row4-rack10-host19   NO  2017-12-07 10:49:12  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack10-host19/79405-repair/
chromeos6-row4-rack10-host18   NO  2017-12-07 10:49:12  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack10-host18/79406-repair/
chromeos6-row4-rack9-host16    ??  2017-11-30 10:48:57  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack9-host16/37812-reset/
chromeos6-row4-rack9-host22    NO  2017-12-07 09:17:02  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack9-host22/61899707-repair/

Comment 9 by pho...@chromium.org, Dec 18 2017

Cc: pprabhu@chromium.org
Owner: jrbarnette@chromium.org
pass to current deputy, +secondary
Status: WontFix (was: Assigned)
Reef pools aren't an issue at this time.
