
Issue 671279

Starred by 3 users

Issue metadata

Status: Archived
Owner:
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug




No machine available in moblab cq pool

Project Member Reported by puthik@chromium.org, Dec 5 2016

Issue description

+sheriff / Infradeputy

From the log at https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin

One machine is locked, and the other three are in the Repair Failed state.


Attempting to display pool info: cq
host: chromeos2-row1-rack8-host1, status: Ready, locked: True diagnosis: Unused
host: chromeos2-row1-rack8-host3, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host1, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host5, status: Repair Failed, locked: False diagnosis: Failed repair
Reason: Some test(s) was aborted before running, suite must have timed out.
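The "no machine available" conclusion follows directly from the pool listing: the only Ready DUT is locked, and the other three are in Repair Failed. A minimal Python sketch of that check, assuming the line format shown in the log above and a hypothetical rule that a DUT can take CQ work only when it is Ready and unlocked (this is an illustration, not the actual autotest scheduler logic):

```python
import re

# Pool listing copied verbatim from the guado_moblab-paladin log above.
POOL_LOG = """\
host: chromeos2-row1-rack8-host1, status: Ready, locked: True diagnosis: Unused
host: chromeos2-row1-rack8-host3, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host1, status: Repair Failed, locked: False diagnosis: Failed repair
host: chromeos2-row2-rack8-host5, status: Repair Failed, locked: False diagnosis: Failed repair
"""

# Assumed line format, inferred only from the four lines above.
LINE_RE = re.compile(
    r"host: (?P<host>[^,]+), status: (?P<status>[^,]+), locked: (?P<locked>True|False)"
)


def parse_pool(text):
    """Return a list of (host, status, locked) tuples, one per host line."""
    hosts = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            hosts.append((m["host"], m["status"], m["locked"] == "True"))
    return hosts


def usable_hosts(hosts):
    """Hypothetical rule: a DUT is usable only if it is Ready and not locked."""
    return [host for host, status, locked in hosts if status == "Ready" and not locked]


hosts = parse_pool(POOL_LOG)
print(usable_hosts(hosts))  # prints []
```

With one locked host and three Repair Failed hosts, the usable set is empty, which is why the suite could not schedule its tests.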
 
Cc: akes...@chromium.org
Owner: jrbarnette@chromium.org
Status: Assigned (was: Available)
From the deputy e-mail earlier today, I see that both the CQ and BVT
pools are in a similar state:

Status for pool:bvt, by board:
Board                    Bad  Idle  Good Total
guado_moblab               3     1     0     4
link                       1     0     7     8

Status for pool:cq, by board:
Board                    Bad  Idle  Good Total
guado_moblab               3     1     0     4
whirlwind                  1     0     7     8

Comment 3 by jinsong@google.com, Dec 5 2016

Issue 671287 has been merged into this issue.
Looking at the CQ pool, all four Moblab hosts are offline
(no answer to ping).

The repair logs, however, don't report the hosts as offline.  So, we seem
to have two problems:
  * The Moblab instances are broken in various ways.
  * Repair isn't properly reporting the problems.

Holding just this one bug (for now) while I sort out what's
really going on.

All four Moblab instances went offline in sequence after provisioning.
It looks like there may be a bad build.  All four BVT instances have
a problem, too, so the problem could be ToT (not a bad CL).

$ dut-status -b guado_moblab -p cq -g
chromeos2-row1-rack8-host1
    2016-12-05 11:17:46  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170275-repair/
    2016-12-05 11:13:10  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170272-provision/
    2016-11-08 15:42:21  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/84729938-chromeos-test/
    2016-11-08 15:30:29  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/116355-provision/
chromeos2-row1-rack8-host3
    2016-12-03 20:24:30  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166260-repair/
    2016-12-03 20:19:52  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166246-provision/
    2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/162908-verify/
chromeos2-row2-rack8-host5
    2016-12-03 17:32:19  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/166116-repair/
    2016-12-03 17:27:41  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/166114-provision/
    2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/162909-verify/
chromeos2-row2-rack8-host1
    2016-12-03 10:18:23  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165561-repair/
    2016-12-03 10:13:57  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165558-provision/
    2016-12-03 04:30:55  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/88725551-chromeos-test/
    2016-12-03 04:16:44  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165080-provision/

Looking at the logs, it seems all moblab instances were offline at the
start of provisioning.  So, something happened in the lab (or at least
on the DUTs) in between testing.
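The "went offline in sequence after provisioning" observation can be checked mechanically from the dut-status history. A minimal Python sketch, assuming the text layout shown in the output above (host name unindented, events indented, newest first, job kind as the suffix of the last URL path component); the helper names are hypothetical and not part of any lab tool:

```python
from datetime import datetime

# `dut-status -b guado_moblab -p cq -g` output copied from above.
DUT_STATUS = """\
chromeos2-row1-rack8-host1
    2016-12-05 11:17:46  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170275-repair/
    2016-12-05 11:13:10  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/170272-provision/
    2016-11-08 15:42:21  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/84729938-chromeos-test/
    2016-11-08 15:30:29  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host1/116355-provision/
chromeos2-row1-rack8-host3
    2016-12-03 20:24:30  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166260-repair/
    2016-12-03 20:19:52  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/166246-provision/
    2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row1-rack8-host3/162908-verify/
chromeos2-row2-rack8-host5
    2016-12-03 17:32:19  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/166116-repair/
    2016-12-03 17:27:41  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/166114-provision/
    2016-12-02 03:44:07  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host5/162909-verify/
chromeos2-row2-rack8-host1
    2016-12-03 10:18:23  NO http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165561-repair/
    2016-12-03 10:13:57  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165558-provision/
    2016-12-03 04:30:55  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/88725551-chromeos-test/
    2016-12-03 04:16:44  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row2-rack8-host1/165080-provision/
"""


def parse_history(text):
    """Map each host to its events, oldest first, as (time, result, job_kind)."""
    history = {}
    host = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: a new host section starts.
            host = line.strip()
            history[host] = []
            continue
        date, clock, result, url = line.split()
        when = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
        # Job kind is the suffix of the last path component,
        # e.g. ".../170275-repair/" -> "repair".
        kind = url.rstrip("/").rsplit("/", 1)[-1].split("-", 1)[1]
        history[host].append((when, result, kind))
    for events in history.values():
        events.sort()
    return history


def provision_then_failed_repair(events):
    """Return the last provision time if it was followed by a failed repair."""
    if len(events) < 2:
        return None
    (prov_time, _, prov_kind), (_, rep_result, rep_kind) = events[-2:]
    if prov_kind == "provision" and rep_kind == "repair" and rep_result == "NO":
        return prov_time
    return None


history = parse_history(DUT_STATUS)
failures = []
for host, events in history.items():
    when = provision_then_failed_repair(events)
    if when is not None:
        failures.append((when, host))
failures.sort()

for when, host in failures:
    print(when, host)
```

Run against the output above, all four hosts match the "provision, then failed repair" pattern, and sorting by provision time lists them as chromeos2-row2-rack8-host1 (2016-12-03 10:13:57), chromeos2-row2-rack8-host5 (17:27:41), chromeos2-row1-rack8-host3 (20:19:52), and chromeos2-row1-rack8-host1 (2016-12-05 11:13:10).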

I've filed ticket b2/33346512 to request repair and diagnosis of all eight
Moblab instances.

Labels: -Pri-1 Pri-0
CQ is blocked by this.
Status: Fixed (was: Assigned)
The most recent moblab paladin run passed. This was probably fixed by b/33346512.
Status: Untriaged (was: Fixed)
Only one build succeeded.

Builds 4429, 4430, 4431, and 4432 failed.

https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-paladin
Status: Fixed (was: Untriaged)
Build 4433 succeeded.

Comment 12 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 13 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 14 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 16 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)
