Starred by 3 users
Status: Archived
Closed: Dec 2016
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug

No machine available in moblab cq pool
Project Member Reported by, Dec 5 2016
+sheriff / Infradeputy

From log in

1 machine is locked and another 3 are in Repair Failed state.

Attempting to display pool info: cq
host: chromeos2-row1-rack8-host1, status: Ready, locked: True diagnosis: Unused
host: chromeos2-row1-rack8-host3, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host1, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host5, status: Repair Failed, locked: False diagnosis: Failed repair
Reason: Some test(s) was aborted before running, suite must have timed out.
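
The summary above (one locked host, three in Repair Failed) can be tallied mechanically from the pool listing. The following is a minimal sketch, assuming the line format is exactly what the log excerpt shows; `summarize_pool` and the regular expression are hypothetical helpers written for illustration, not part of any autotest tool.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt above (an assumption, not an
# official format definition).
HOST_LINE = re.compile(
    r"host: (?P<host>\S+), status: (?P<status>[^,]+), "
    r"locked: (?P<locked>True|False) diagnosis: (?P<diagnosis>.+)"
)


def summarize_pool(log_text):
    """Return (Counter of host statuses, list of locked host names)."""
    statuses = Counter()
    locked = []
    for line in log_text.splitlines():
        match = HOST_LINE.match(line.strip())
        if match is None:
            continue  # skip the header, the "Reason:" line, and anything else
        statuses[match["status"]] += 1
        if match["locked"] == "True":
            locked.append(match["host"])
    return statuses, locked


POOL_LOG = """\
Attempting to display pool info: cq
host: chromeos2-row1-rack8-host1, status: Ready, locked: True diagnosis: Unused
host: chromeos2-row1-rack8-host3, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host1, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host5, status: Repair Failed, locked: False diagnosis: Failed repair
Reason: Some test(s) was aborted before running, suite must have timed out.
"""

if __name__ == "__main__":
    statuses, locked = summarize_pool(POOL_LOG)
    for status, count in statuses.items():
        print(f"{status}: {count}")
    print("locked:", ", ".join(locked))
```

A host that is Ready but locked is still unavailable to the scheduler, which is why the sketch reports locked hosts separately from the status counts.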
Status: Assigned
From the deputy e-mail earlier today, I see that both the CQ and BVT
pools are in a similar state:

Status for pool:bvt, by board:
Board                    Bad  Idle  Good Total
guado_moblab               3     1     0     4
link                       1     0     7     8

Status for pool:cq, by board:
Board                    Bad  Idle  Good Total
guado_moblab               3     1     0     4
whirlwind                  1     0     7     8
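
To spot boards with no usable hosts in a status table like the ones above, the rows can be parsed and filtered on the Good column. This is a sketch only, assuming whitespace-separated columns in the order Board, Bad, Idle, Good, Total as shown in the e-mail; `parse_pool_table` and `boards_without_good_hosts` are hypothetical names.

```python
def parse_pool_table(text):
    """Parse 'Board Bad Idle Good Total' rows into a dict keyed by board."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row is a board name followed by four integer columns;
        # header and title lines fail the isdigit() check and are skipped.
        if len(fields) == 5 and all(f.isdigit() for f in fields[1:]):
            bad, idle, good, total = (int(f) for f in fields[1:])
            rows[fields[0]] = {
                "bad": bad, "idle": idle, "good": good, "total": total,
            }
    return rows


def boards_without_good_hosts(rows):
    """Return the boards whose Good count is zero, sorted by name."""
    return sorted(b for b, counts in rows.items() if counts["good"] == 0)


CQ_TABLE = """\
Status for pool:cq, by board:
Board                    Bad  Idle  Good Total
guado_moblab               3     1     0     4
whirlwind                  1     0     7     8
"""

if __name__ == "__main__":
    print(boards_without_good_hosts(parse_pool_table(CQ_TABLE)))
```

Filtering on Good rather than Bad is deliberate here: a board with one bad host out of eight (whirlwind) can still serve the pool, while a board with zero good hosts cannot.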

Comment 3 by, Dec 5 2016
Issue 671287 has been merged into this issue.
Looking at the CQ pool, all four Moblab hosts are offline
(no answer to ping).

The repair logs don't show the hosts as offline.  So, we seem
to have two problems:
  * The Moblab instances are broken in various ways.
  * Repair isn't properly reporting the problems.

Holding just this one bug (for now) while I sort out what's
really going on.

All four Moblab instances went offline in sequence after provisioning.
It looks like there may be a bad build.  All four BVT instances have
a problem, too, so the problem could be ToT (not a bad CL).

$ dut-status -b guado_moblab -p cq -g
    2016-12-05 11:17:46  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170275-repair/
    2016-12-05 11:13:10  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170272-provision/
    2016-11-08 15:42:21  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/84729938-chromeos-test/
    2016-11-08 15:30:29  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/116355-provision/
    2016-12-03 20:24:30  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166260-repair/
    2016-12-03 20:19:52  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166246-provision/
    2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/162908-verify/
    2016-12-03 17:32:19  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/166116-repair/
    2016-12-03 17:27:41  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/166114-provision/
    2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/162909-verify/
    2016-12-03 10:18:23  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165561-repair/
    2016-12-03 10:13:57  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165558-provision/
    2016-12-03 04:30:55  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/88725551-chromeos-test/
    2016-12-03 04:16:44  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165080-provision/
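
To see when each host was last known good, the output above can be reduced to the most recent OK entry per host. The sketch below assumes that OK marks a successful task, NO a failed one, and -- an entry with no pass/fail verdict, and that the host name can be read from the `/results/hosts/<host>/` part of the URL; these are inferences from the pasted output, and `last_ok_per_host` is a hypothetical helper, not a `dut-status` feature.

```python
import re
from datetime import datetime

# Line layout inferred from the pasted output (an assumption).
STATUS_LINE = re.compile(
    r"(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<flag>OK|NO|--)\s+(?P<url>\S+)"
)
HOST_IN_URL = re.compile(r"/results/hosts/(?P<host>[^/]+)/\d+-\w+/")


def last_ok_per_host(output):
    """Map each host to the datetime of its most recent OK entry."""
    last_ok = {}
    for line in output.splitlines():
        entry = STATUS_LINE.search(line)
        if entry is None or entry["flag"] != "OK":
            continue
        host_match = HOST_IN_URL.search(entry["url"])
        if host_match is None:
            continue  # test-job URLs carry no host name
        when = datetime.strptime(entry["when"], "%Y-%m-%d %H:%M:%S")
        host = host_match["host"]
        if host not in last_ok or when > last_ok[host]:
            last_ok[host] = when
    return last_ok


SAMPLE = """\
    2016-12-05 11:17:46  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170275-repair/
    2016-11-08 15:30:29  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/116355-provision/
    2016-12-03 20:24:30  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166260-repair/
    2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/162908-verify/
"""

if __name__ == "__main__":
    for host, when in sorted(last_ok_per_host(SAMPLE).items()):
        print(host, when.isoformat(sep=" "))
```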

Looking at the logs, it seems all moblab instances were offline at the
start of provisioning.  So, something happened in the lab (or at least
on the DUTs) in between testing.

I've filed ticket b/33346512 to request repair and diagnosis of all eight
moblab instances.

Labels: -Pri-1 Pri-0
CQ is blocked by this.
Status: Fixed
Most recent moblab paladin passed. This was probably fixed by b/33346512.
Status: Untriaged
Only one build succeeded.

Builds 4432, 4431, 4430, and 4429 failed.
Status: Fixed
Build 4433 succeeded.
Comment 12 by, Mar 4 2017
Labels: VerifyIn-58
Labels: VerifyIn-59
Labels: VerifyIn-60
Labels: VerifyIn-61
Comment 16 by, Oct 14
Status: Archived