Starred by 3 users

Issue metadata

Status: Archived
Closed: Dec 2016
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug


No machine available in moblab cq pool

Project Member Reported by, Dec 5 2016

Issue description

+sheriff / Infradeputy

From log in

One machine is locked, and the other three are in the Repair Failed state.

Attempting to display pool info: cq
host: chromeos2-row1-rack8-host1, status: Ready, locked: True diagnosis: Unused
host: chromeos2-row1-rack8-host3, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host1, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host5, status: Repair Failed, locked: False diagnosis: Failed repair
Reason: Some test(s) was aborted before running, suite must have timed out.
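To make the "no machine available" reading explicit, here is a minimal sketch that parses the pool-info lines above and lists every host that cannot take work (locked, or in any status other than Ready). The helper name `unavailable_hosts` and the line-format regex are illustrative assumptions based only on the output shown here, not part of any lab tool.

```python
import re

# Lines copied from the "Attempting to display pool info: cq" output above.
POOL_INFO = """\
host: chromeos2-row1-rack8-host1, status: Ready, locked: True diagnosis: Unused
host: chromeos2-row1-rack8-host3, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host1, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host5, status: Repair Failed, locked: False diagnosis: Failed repair
"""

# Assumed layout: "host: <name>, status: <status>, locked: <bool> diagnosis: <text>"
LINE_RE = re.compile(
    r"host: (?P<host>[^,]+), status: (?P<status>[^,]+), "
    r"locked: (?P<locked>True|False) diagnosis: (?P<diagnosis>.*)"
)


def unavailable_hosts(text):
    """Return (host, reason) pairs for hosts that cannot run tests (hypothetical helper)."""
    result = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a host line
        if m["locked"] == "True":
            # A locked host is withheld from scheduling even if it reports Ready.
            result.append((m["host"], "locked"))
        elif m["status"] != "Ready":
            result.append((m["host"], m["status"]))
    return result


for host, reason in unavailable_hosts(POOL_INFO):
    print(f"{host}: {reason}")
```

With all four hosts reported, the CQ pool has zero usable guado_moblab DUTs, which matches the suite timing out before any test ran.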
Status: Assigned
From the deputy e-mail earlier today, I see that both the CQ and BVT
pools are in a similar state:

Status for pool:bvt, by board:
Board                    Bad  Idle  Good Total
guado_moblab               3     1     0     4
link                       1     0     7     8

Status for pool:cq, by board:
Board                    Bad  Idle  Good Total
guado_moblab               3     1     0     4
whirlwind                  1     0     7     8

Comment 3 by, Dec 5 2016

Issue 671287 has been merged into this issue.
Looking at the CQ pool, all four Moblab hosts are offline
(no answer to ping).

The repair logs don't report this, though.  So, we seem
to have two problems:
  * The Moblab instances are broken in various ways.
  * Repair isn't properly reporting the problems.

Holding just this one bug (for now) while I sort out what's
really going on.

All four Moblab instances went offline in sequence after provisioning.
It looks like there may be a bad build.  All four BVT instances have
a problem, too, so the problem could be ToT (not a bad CL).

$ dut-status -b guado_moblab -p cq -g
    2016-12-05 11:17:46  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170275-repair/
    2016-12-05 11:13:10  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170272-provision/
    2016-11-08 15:42:21  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/84729938-chromeos-test/
    2016-11-08 15:30:29  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/116355-provision/
    2016-12-03 20:24:30  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166260-repair/
    2016-12-03 20:19:52  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166246-provision/
    2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/162908-verify/
    2016-12-03 17:32:19  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/166116-repair/
    2016-12-03 17:27:41  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/166114-provision/
    2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/162909-verify/
    2016-12-03 10:18:23  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165561-repair/
    2016-12-03 10:13:57  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165558-provision/
    2016-12-03 04:30:55  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/88725551-chromeos-test/
    2016-12-03 04:16:44  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165080-provision/
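The claim that every instance went offline right after provisioning can be checked mechanically against the `dut-status` history above. The sketch below is a hypothetical helper, `failed_after_provision`; it assumes host jobs appear as `.../results/hosts/<host>/<job id>-<task>/` and skips suite test jobs, which carry no host name in their URL. For each host it sorts the jobs by timestamp and flags the host if its last two jobs are a provision followed by a repair marked `NO`.

```python
import re
from collections import defaultdict
from datetime import datetime

# The `dut-status -b guado_moblab -p cq -g` output above, indentation removed.
HISTORY = """\
2016-12-05 11:17:46  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170275-repair/
2016-12-05 11:13:10  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170272-provision/
2016-11-08 15:42:21  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/84729938-chromeos-test/
2016-11-08 15:30:29  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/116355-provision/
2016-12-03 20:24:30  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166260-repair/
2016-12-03 20:19:52  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166246-provision/
2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/162908-verify/
2016-12-03 17:32:19  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/166116-repair/
2016-12-03 17:27:41  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/166114-provision/
2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/162909-verify/
2016-12-03 10:18:23  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165561-repair/
2016-12-03 10:13:57  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165558-provision/
2016-12-03 04:30:55  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/88725551-chromeos-test/
2016-12-03 04:16:44  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165080-provision/
"""

# Assumed line shape: "<date> <time>  <OK|NO|-->  <url ending in /results/hosts/<host>/<id>-<task>/>"
ENTRY_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?P<result>OK|NO|--)\s+"
    r"\S*/results/hosts/(?P<host>[^/]+)/\d+-(?P<task>[a-z]+)/"
)


def failed_after_provision(text):
    """Return hosts whose latest job is a failed repair right after a provision."""
    by_host = defaultdict(list)
    for line in text.splitlines():
        m = ENTRY_RE.search(line)
        if not m:
            continue  # suite test jobs have no host in the URL
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        by_host[m["host"]].append((ts, m["task"], m["result"]))

    bad = []
    for host, entries in by_host.items():
        entries.sort()  # chronological order; the output is grouped by host, not time
        if len(entries) < 2:
            continue
        (_, prev_task, _), (_, last_task, last_result) = entries[-2], entries[-1]
        if prev_task == "provision" and last_task == "repair" and last_result == "NO":
            bad.append(host)
    return sorted(bad)


print(failed_after_provision(HISTORY))
```

Sorting per host matters because `-g` groups entries by DUT, so the raw lines are not in global time order.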

Looking at the logs, it seems all moblab instances were offline at the
start of provisioning.  So, something happened in the lab (or at least
on the DUTs) in between testing.

I've filed ticket b2/33346512 to request repair and diagnosis of all eight
moblab instances.

Labels: -Pri-1 Pri-0
CQ is blocked by this.
Status: Fixed
The most recent moblab paladin run passed. This was probably fixed by b/33346512.
Status: Untriaged
Only one build succeeded.

Builds 4429, 4430, 4431, and 4432 failed.
Status: Fixed
Build 4433 succeeded.

Comment 12 by, Mar 4 2017

Labels: VerifyIn-58

Comment 13 by, Apr 17 2017

Labels: VerifyIn-59

Comment 14 by, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61
Status: Archived
