Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Mar 21
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




"Not enough DUTs for board: expresso"

Project Member Reported by norvez@chromium.org, Feb 20

Issue description

expresso-release has been failing for ~2 weeks because it can't find enough DUTs.

https://luci-milo.appspot.com/buildbot/chromeos/expresso-release/2125

The HWTest stages show:

"
NotEnoughDutsError: Not enough DUTs for board: expresso, pool: bvt; required: 4, found: 3

"

Assigning to deputy.
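
The HWTest error quoted above amounts to a minimum-availability check: the suite needs a certain number of usable DUTs in the pool and aborts when fewer are found. A minimal sketch of such a check follows. The helper name `check_dut_availability`, its signature, and the exception class defined here are assumptions for illustration, not the actual lab code.

```python
class NotEnoughDutsError(Exception):
    """Raised when a pool has fewer usable DUTs than a suite requires."""


def check_dut_availability(board, pool, required, found):
    """Hypothetical helper: fail fast if the pool is too small.

    The name and signature are assumptions made for this sketch.
    """
    if found < required:
        raise NotEnoughDutsError(
            "Not enough DUTs for board: %s, pool: %s; required: %d, found: %d"
            % (board, pool, required, found))
    return True
```

With the numbers from this report (`required=4`, `found=3`) the sketch raises; with 4 or more DUTs found it returns normally.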

 
There are 6 expresso DUTs in pool:bvt. 3 of them are locked by afaris@ for the reason "battery swelling: send out for replacement".

I checked the other pools. Most of them are affected by the battery swelling issue as well.
Cc: jrbarnette@chromium.org
It looks like we have 0 spares. 3 DUTs aren't enough to run the tests before the timeout.

I'm not sure what to do except push for the replacements.

Status: Started
Owner: pprabhu@chromium.org
This is still happening: https://luci-milo.appspot.com/buildbot/chromeos/expresso-release/2150

If we don't expect to get replacements anytime soon, can we make this release builder experimental for now?
Still failing: https://luci-milo.appspot.com/buildbot/chromeos/expresso-release/2204

I will put up a CL to mark this builder (and maybe some others) as experimental.
> I will put up a CL to mark this builder (and maybe some others) as experimental.

Wait, I've just checked inventory.  The database records some 30
expresso DUTs.  I don't know if they all work, but that's enough.

However, the automated inventory believes there are only 11 DUTs,
so there's a disconnect somewhere.  We should sort out the infra
problem first...

Digging deeper into the inventory, most of the expresso units aren't
working because of battery problems.  Replacements are on order.
When you look at what's left over, it's woefully inadequate:
$ atest host list -b board:expresso --unlocked | count_labels -p
      3 bvt
      2 cts
      1 performance
      1 suites
      1 wificell
      1 wifichaos
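
For readers without the `count_labels` helper, the tally above is a count of `pool:` labels across the listed hosts. A rough sketch of that idea follows, assuming each host is represented as a list of label strings such as `pool:bvt`. The input format and the function name `count_pool_labels` are assumptions and do not reflect the real tool's interface.

```python
from collections import Counter


def count_pool_labels(hosts):
    """Count how many hosts carry each pool label.

    hosts: dict mapping hostname -> list of label strings
           (assumed format, e.g. ["board:expresso", "pool:bvt"]).
    """
    prefix = "pool:"
    counts = Counter()
    for labels in hosts.values():
        for label in labels:
            if label.startswith(prefix):
                counts[label[len(prefix):]] += 1
    return counts
```

Labels that do not start with `pool:` (such as `board:expresso`) are ignored, so a host with no pool label contributes nothing to the tally.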

For now, I think the right answer is to re-assign the cts, performance,
and suites pools to bvt, so that we can at least cover the release
builders.
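
The arithmetic behind that proposal can be checked directly from the unlocked counts listed above and the `required: 4` figure in the error message. The sketch below is illustrative only; `merged_pool_size` is a hypothetical helper, and an unlocked DUT is not necessarily a working one, so the real number may come out lower.

```python
# Minimum DUT count taken from the NotEnoughDutsError message.
REQUIRED_BVT = 4

# Unlocked expresso DUTs per pool, copied from the listing above.
unlocked = {
    "bvt": 3,
    "cts": 2,
    "performance": 1,
    "suites": 1,
    "wificell": 1,
    "wifichaos": 1,
}


def merged_pool_size(counts, target, donors):
    """Size of the target pool after absorbing every DUT from the donor pools."""
    return counts.get(target, 0) + sum(counts.get(p, 0) for p in donors)


total = merged_pool_size(unlocked, "bvt", ["cts", "performance", "suites"])
print(total, total >= REQUIRED_BVT)  # prints "7 True"
```

Leaving wificell and wifichaos alone still gives bvt some headroom above the 4-DUT minimum, provided the reassigned units actually work.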

All right, after fighting (repeatedly) with balance-pool, I broke down
and manually reassigned pool labels until all working DUTs were in the
BVT pool:
    $ dut-status -b expresso -p bvt
    hostname                       S   last checked         URL
    chromeos4-row4-rack9-host5     OK  2018-03-20 12:00:55  http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row4-rack9-host5/1094847-provision/
    chromeos4-row4-rack9-host6     OK  2018-03-20 12:01:01  http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row4-rack9-host6/1094849-provision/
    chromeos4-row4-rack8-host20    OK  2018-03-20 12:05:46  http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row4-rack8-host20/1094852-provision/
    chromeos4-row4-rack9-host8     OK  2018-03-20 12:05:46  http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row4-rack9-host8/1094851-provision/
    chromeos2-row6-rack9-host6     OK  2018-03-20 11:59:55  http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack9-host6/1094846-provision/
    chromeos2-row6-rack9-host3     OK  2018-03-20 12:07:16  http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack9-host3/1094854-provision/

I'll note that I'm continuing to fight with the system:  Even after
assigning DUTs to pools, they keep forgetting their pool label.

Owner: jrbarnette@chromium.org
OK.  The system _seems_ to have finally decided to remember the
pool assignments.  Here's where we stand with pool assignments:
    $ atest host list -b board:expresso | count_labels -p
          6 bvt
         25 suites
          1 wificell
          1 wifichaos

Everything in pool:suites is broken.  One DUT seems to be stuck in
repair; the others are all locked and awaiting replacement.

Status: Verified
