Starred by 3 users
Status: Archived
Owner:
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug



No machine available in moblab cq pool
Project Member Reported by puthik@chromium.org, Dec 5 2016
+sheriff / Infradeputy

From log in https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin

1 machine is locked and another 3 are in Repair Failed state.


Attempting to display pool info: cq
host: chromeos2-row1-rack8-host1, status: Ready, locked: True diagnosis: Unused
host: chromeos2-row1-rack8-host3, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host1, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host5, status: Repair Failed, locked: False diagnosis: Failed repair
Reason: Some test(s) was aborted before running, suite must have timed out.
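
For anyone triaging a similar report, the pool listing above can be tallied mechanically. This is only a rough sketch: the line format is assumed from this one log, and the names `parse_pool_info` and `usable_hosts`, as well as the rule that a host is usable only when it is Ready and unlocked, are illustrative assumptions rather than anything autotest provides.

```python
import re

# Line format assumed from the single log sample in this report.
HOST_LINE = re.compile(
    r"host: (?P<host>\S+), status: (?P<status>[^,]+), "
    r"locked: (?P<locked>True|False) diagnosis: (?P<diagnosis>.+)"
)

def parse_pool_info(text):
    """Turn 'host: ..., status: ..., locked: ... diagnosis: ...' lines into dicts."""
    hosts = []
    for line in text.splitlines():
        m = HOST_LINE.match(line.strip())
        if m:
            hosts.append({
                "host": m.group("host"),
                "status": m.group("status").strip(),
                "locked": m.group("locked") == "True",
                "diagnosis": m.group("diagnosis").strip(),
            })
    return hosts

def usable_hosts(hosts):
    """Hypothetical rule: a host can take work only if Ready and not locked."""
    return [h["host"] for h in hosts
            if h["status"] == "Ready" and not h["locked"]]

POOL_LOG = """\
host: chromeos2-row1-rack8-host1, status: Ready, locked: True diagnosis: Unused
host: chromeos2-row1-rack8-host3, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host1, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host5, status: Repair Failed, locked: False diagnosis: Failed repair
"""

hosts = parse_pool_info(POOL_LOG)
print(len(hosts), "hosts,", len(usable_hosts(hosts)), "usable")
```

With the four lines from this report, no host satisfies that rule, which matches the "no machine available" symptom.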
 
Cc: akes...@chromium.org
Owner: jrbarnette@chromium.org
Status: Assigned
From the deputy e-mail earlier today, I see that both the CQ and BVT
pools are in a similar state:

Status for pool:bvt, by board:
Board                    Bad  Idle  Good Total
guado_moblab               3     1     0     4
link                       1     0     7     8

Status for pool:cq, by board:
Board                    Bad  Idle  Good Total
guado_moblab               3     1     0     4
whirlwind                  1     0     7     8
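
The per-board tables in the deputy e-mail are easy to check in bulk. A small sketch, assuming the five-column layout shown above (Board, Bad, Idle, Good, Total); `parse_board_table` and `boards_without_good_hosts` are hypothetical helper names, not part of any lab tool:

```python
def parse_board_table(text):
    """Parse 'Board Bad Idle Good Total' rows into {board: counts}."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly one board name followed by four integers.
        if len(parts) != 5 or not all(p.isdigit() for p in parts[1:]):
            continue
        bad, idle, good, total = (int(p) for p in parts[1:])
        rows[parts[0]] = {"bad": bad, "idle": idle, "good": good, "total": total}
    return rows

def boards_without_good_hosts(rows):
    """Boards with zero Good hosts, i.e. nothing reliably available."""
    return sorted(board for board, r in rows.items() if r["good"] == 0)

CQ_TABLE = """\
Status for pool:cq, by board:
Board                    Bad  Idle  Good Total
guado_moblab               3     1     0     4
whirlwind                  1     0     7     8
"""

rows = parse_board_table(CQ_TABLE)
print(boards_without_good_hosts(rows))
```

The title and header lines are skipped because their trailing fields are not all integers.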

Comment 3 by jinsong@google.com, Dec 5 2016
Issue 671287 has been merged into this issue.
Looking at the CQ pool, all four Moblab hosts are offline
(no answer to ping).
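
A reachability check like this one can be scripted. The sketch below assumes a Linux `ping` (iputils) where `-c` sets the packet count and `-W` the reply timeout in seconds; `is_reachable` and `offline_hosts` are illustrative names, and this is not how the lab's repair flow does it.

```python
import subprocess

def is_reachable(host, timeout_s=2, run=subprocess.run):
    """Send one ICMP echo request; True if ping exits with status 0.

    Assumes Linux iputils ping flags: -c <count>, -W <timeout seconds>.
    The `run` parameter exists only so the call can be stubbed out.
    """
    result = run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def offline_hosts(hosts, check=is_reachable):
    """Return the hosts that did not answer."""
    return [h for h in hosts if not check(h)]
```

Calling `offline_hosts` with the four CQ hostnames would return whichever of them did not answer.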

The repair logs don't show that the hosts are offline.  So, we seem
to have two problems:
  * The Moblab instances are broken in various ways.
  * Repair isn't properly reporting the problems.

Holding just this one bug (for now) while I sort out what's
really going on.

All four Moblab instances went offline in sequence after provisioning.
It looks like there may be a bad build.  All four BVT instances have
a problem, too, so the problem could be ToT (not a bad CL).

$ dut-status -b guado_moblab -p cq -g
chromeos2-row1-rack8-host1
    2016-12-05 11:17:46  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170275-repair/
    2016-12-05 11:13:10  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170272-provision/
    2016-11-08 15:42:21  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/84729938-chromeos-test/
    2016-11-08 15:30:29  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/116355-provision/
chromeos2-row1-rack8-host3
    2016-12-03 20:24:30  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166260-repair/
    2016-12-03 20:19:52  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166246-provision/
    2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/162908-verify/
chromeos2-row2-rack8-host5
    2016-12-03 17:32:19  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/166116-repair/
    2016-12-03 17:27:41  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/166114-provision/
    2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/162909-verify/
chromeos2-row2-rack8-host1
    2016-12-03 10:18:23  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165561-repair/
    2016-12-03 10:13:57  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165558-provision/
    2016-12-03 04:30:55  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/88725551-chromeos-test/
    2016-12-03 04:16:44  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165080-provision/
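
To compare hosts, it can help to pull the last known-good time for each one out of this output. A rough sketch, assuming the layout shown above (a hostname line followed by indented "timestamp, status, URL" lines) and assuming that `OK` marks a working host, `NO` a broken one, and `--` an event with no diagnosis; the function names are hypothetical:

```python
import re
from datetime import datetime

# Event line layout assumed from the dut-status output in this comment.
EVENT = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(OK|NO|--)\s+(\S+)"
)

def parse_dut_status(text):
    """Return {host: [(datetime, status, url), ...]} in the order printed."""
    history = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        m = EVENT.match(line)
        if m is None:
            # Any non-event line is treated as a hostname header.
            current = line.strip()
            history[current] = []
        elif current is not None:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            history[current].append((when, m.group(2), m.group(3)))
    return history

def last_known_good(history):
    """Most recent OK timestamp per host, or None if there is none."""
    result = {}
    for host, events in history.items():
        good = [when for when, status, _ in events if status == "OK"]
        result[host] = max(good) if good else None
    return result

SAMPLE = """\
chromeos2-row1-rack8-host1
    2016-12-05 11:17:46  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170275-repair/
    2016-12-05 11:13:10  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170272-provision/
    2016-11-08 15:42:21  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/84729938-chromeos-test/
    2016-11-08 15:30:29  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/116355-provision/
chromeos2-row1-rack8-host3
    2016-12-03 20:24:30  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166260-repair/
    2016-12-03 20:19:52  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166246-provision/
    2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/162908-verify/
"""

for host, when in last_known_good(parse_dut_status(SAMPLE)).items():
    print(host, when)
```

The command line itself (`$ dut-status ...`) should be left out of the input, since the sketch would otherwise read it as a hostname.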

Looking at the logs, it seems all moblab instances were offline at the
start of provisioning.  So, something happened in the lab (or at least
on the DUTs) in between testing.

I've filed ticket b/33346512 to request repair and diagnosis of all eight
moblab instances.

Labels: -Pri-1 Pri-0
CQ is blocked by this.
Status: Fixed
The most recent moblab paladin run passed. This was probably fixed by b/33346512.
Status: Untriaged
Only one build has succeeded.

Builds 4432, 4431, 4430, and 4429 failed.

https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin
Status: Fixed
Build 4433 succeeded.
Comment 12 by dchan@google.com, Mar 4 2017
Labels: VerifyIn-58
Labels: VerifyIn-59
Labels: VerifyIn-60
Labels: VerifyIn-61
Comment 16 by dchan@chromium.org, Oct 14
Status: Archived