"Not enough DUTs for board: expresso"
Issue description
expresso-release has been failing for ~2 weeks because it can't find enough DUTs:
https://luci-milo.appspot.com/buildbot/chromeos/expresso-release/2125
The HWTest stages show:
"NotEnoughDutsError: Not enough DUTs for board: expresso, pool: bvt; required: 4, found: 3"
Assigning to deputy.
,
Feb 20 2018
It looks like we have 0 spares. 3 DUTs aren't enough to run the tests before timeout. I'm not sure what to do except push for the replacements.
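For anyone reading along, the shortfall check behind that HWTest error can be sketched roughly as below. This is only an illustration, not the actual autotest code: check_pool and its parameters are hypothetical names, and only the exception name and message format are taken from the log quoted in the issue description.

```python
class NotEnoughDutsError(Exception):
    """Raised when a pool has fewer usable DUTs than a suite requires."""


def check_pool(board, pool, required, found):
    """Hypothetical helper: return the number of spare DUTs, or raise.

    Not the real autotest implementation; it only mirrors the message
    format seen in the HWTest log.
    """
    if found < required:
        raise NotEnoughDutsError(
            "Not enough DUTs for board: %s, pool: %s; required: %d, found: %d"
            % (board, pool, required, found))
    return found - required


# Reproduce the situation from the failing build: 4 required, 3 found.
try:
    check_pool("expresso", "bvt", required=4, found=3)
except NotEnoughDutsError as err:
    message = str(err)
```

With 3 DUTs in the pool and 4 required, the check fails before any test is scheduled, which matches the observation that there are 0 spares.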
,
Feb 20 2018
,
Feb 26 2018
,
Feb 28 2018
This is still happening:
https://luci-milo.appspot.com/buildbot/chromeos/expresso-release/2150
If we don't expect to get replacements anytime soon, can we make this release builder experimental for now?
,
Mar 20 2018
Still failing:
https://luci-milo.appspot.com/buildbot/chromeos/expresso-release/2204
I will put up a CL to mark this builder (and maybe some others) as experimental.
,
Mar 20 2018
> I will put up a CL to mark this builder (and maybe some others) as experimental.
Wait, I've just checked inventory. The database records some 30 expresso DUTs. I don't know if they all work, but that's enough. However, the automated inventory believes there are only 11 DUTs, so there's a disconnect somewhere. We should sort out the infra problem first...
,
Mar 20 2018
Digging deeper into the inventory, most of the expresso units aren't
working because of battery problems. Replacements are on order.
When you look at what's left over, it's woefully inadequate:
$ atest host list -b board:expresso --unlocked | count_labels -p
3 bvt
2 cts
1 performance
1 suites
1 wificell
1 wifichaos
For now, I think the right answer is to re-assign the cts, performance,
and suites pools to bvt, so that we can at least cover the release
builders.
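As a rough sanity check on that proposal, here is a small sketch that parses the per-pool counts listed above and folds the donor pools into bvt. The helper names (parse_counts, merge_into) are made up for illustration and are not part of atest or count_labels; the input is simply the listing pasted as text. The result is an upper bound, since a reassigned DUT could still fail provisioning.

```python
from collections import Counter

# Per-pool counts of unlocked expresso DUTs, copied from the listing above.
LISTING = """\
3 bvt
2 cts
1 performance
1 suites
1 wificell
1 wifichaos
"""


def parse_counts(text):
    """Parse lines of the form '<count> <pool>' into a Counter."""
    counts = Counter()
    for line in text.splitlines():
        number, pool = line.split()
        counts[pool] += int(number)
    return counts


def merge_into(counts, donors, target):
    """Return a copy of counts with every donor pool folded into target."""
    merged = Counter(counts)
    for pool in donors:
        merged[target] += merged.pop(pool, 0)
    return merged


counts = parse_counts(LISTING)
merged = merge_into(counts, ["cts", "performance", "suites"], "bvt")
print(merged["bvt"])  # at most this many DUTs would be available to bvt
```

Anything at or above the 4 DUTs that HWTest requires would unblock the release builder, with whatever is left over acting as spares.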
,
Mar 20 2018
All right, after fighting (repeatedly) with balance-pool, I broke down
and manually reassigned pool labels until all working DUTs were in the
BVT pool:
$ dut-status -b expresso -p bvt
hostname S last checked URL
chromeos4-row4-rack9-host5 OK 2018-03-20 12:00:55 http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row4-rack9-host5/1094847-provision/
chromeos4-row4-rack9-host6 OK 2018-03-20 12:01:01 http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row4-rack9-host6/1094849-provision/
chromeos4-row4-rack8-host20 OK 2018-03-20 12:05:46 http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row4-rack8-host20/1094852-provision/
chromeos4-row4-rack9-host8 OK 2018-03-20 12:05:46 http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row4-rack9-host8/1094851-provision/
chromeos2-row6-rack9-host6 OK 2018-03-20 11:59:55 http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack9-host6/1094846-provision/
chromeos2-row6-rack9-host3 OK 2018-03-20 12:07:16 http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row6-rack9-host3/1094854-provision/
I'll note that I'm still fighting with the system: even after being
assigned to pools, the DUTs keep forgetting their pool label.
,
Mar 20 2018
OK. The system _seems_ to have finally decided to remember the
pool assignments. Here's where we stand with pool assignments:
$ atest host list -b board:expresso | count_labels -p
6 bvt
25 suites
1 wificell
1 wifichaos
Everything in pool:suites is broken. One DUT seems to be stuck in
repair; the others are all locked and awaiting replacement.
,
Mar 21 2018
Comment 1 by waihong@chromium.org, Feb 20 2018