
Issue 596711

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




balance_pool: error messages don't explain about locked DUTs.

Project Member Reported by kevcheng@chromium.org, Mar 21 2016

Issue description

on stumpy:

$ dut-status  -o -p bvt -b banjo
hostname                       S   last checked         URL
chromeos4-row5-rack2-host1     OK  2016-03-21 16:05:58  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row5-rack2-host1/52592863-reset/
chromeos4-row5-rack2-host5     OK  2016-03-21 16:01:19  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row5-rack2-host5/52592603-reset/
chromeos4-row5-rack3-host1     OK  2016-03-21 16:01:45  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row5-rack3-host1/52592628-reset/
chromeos4-row5-rack2-host13    OK  2016-03-21 16:04:28  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row5-rack2-host13/52592770-reset/
chromeos4-row5-rack2-host11    NO  2016-03-21 15:31:44  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row5-rack2-host11/52591207-repair/
chromeos4-row5-rack3-host15    OK  2016-03-21 16:05:58  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row5-rack3-host15/52592862-reset/


but when I rebalance I get:
$ balance_pool bvt stumpy

Balancing stumpy bvt pool:
Total 5 DUTs, 2 working, 3 broken, 0 reserved.
Target is 5 working DUTs; grow pool by 3 DUTs.
stumpy bvt pool has 1 spares available.
ERROR: Not enough spares: need 3, only have 1.
ERROR: stumpy bvt pool: Refusing to act on pool with 3 broken DUTs.
ERROR: Please investigate this board to see if there is a bug 
ERROR: that is bricking devices. Once you have finished your 
ERROR: investigation, you can force a rebalance with 
ERROR: --force-rebalance


I expected balance_pool to report 4 working, 1 broken, 0 reserved.
 
Whoops, the dut-status output above was for the wrong board (banjo). Here is the one for stumpy:

$ dut-status  -o -p bvt -b stumpy
hostname                       S   last checked         URL
chromeos4-row2-rack9-host2     NO  2016-03-21 15:38:47  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row2-rack9-host2/286932-repair/
chromeos4-row9-rack2-host11    OK  2016-03-21 15:34:21  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row9-rack2-host11/286930-reset/
chromeos4-row2-rack8-host22    OK  2016-03-21 15:41:57  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row2-rack8-host22/286935-reset/
chromeos2-row24-rack11-host2   OK  2016-03-21 08:12:30  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row24-rack11-host2/286223-reset/
chromeos2-row24-rack11-host1   OK  2016-03-21 08:11:57  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row24-rack11-host1/286214-reset/

14 of the 24 stumpy DUTs are locked, which is probably the cause of the status miscalculation.
Summary: balance_pool: error messages don't explain about locked DUTs. (was: balance_pool: will sometimes deduce incorrect board states)
Yes, balance_pool believes there are so many bad DUTs because
they're temporarily locked.

The counts in the error message should call out locked DUTs as a
separate status, since balance_pool counts them as bad while
dut-status shows them as good.
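
To illustrate the suggestion, here is a minimal sketch of a summary line that reports locked DUTs as their own count instead of folding them into "broken". This is not the actual balance_pool code: the `Dut` class, `summarize_pool`, and `format_summary` are hypothetical names, the "reserved" count is left out for brevity, and which DUTs are locked in the sample data is an assumption made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Dut:
    """Hypothetical stand-in for a host record; not the autotest class."""
    hostname: str
    working: bool  # last diagnosis, i.e. what dut-status would show as OK/NO
    locked: bool   # True if the DUT is currently locked


def summarize_pool(duts: Iterable[Dut]) -> Dict[str, int]:
    """Count DUTs, keeping locked DUTs separate from working and broken."""
    duts = list(duts)
    locked = sum(1 for d in duts if d.locked)
    working = sum(1 for d in duts if d.working and not d.locked)
    broken = sum(1 for d in duts if not d.working and not d.locked)
    return {"total": len(duts), "working": working,
            "broken": broken, "locked": locked}


def format_summary(counts: Dict[str, int]) -> str:
    """Render the counts in the style of the existing 'Total N DUTs' line."""
    return ("Total %(total)d DUTs, %(working)d working, "
            "%(broken)d broken, %(locked)d locked." % counts)


if __name__ == "__main__":
    # Assumed sample data: 4 DUTs diagnosed OK (2 of them locked), 1 broken.
    pool = [
        Dut("host-a", working=True, locked=False),
        Dut("host-b", working=True, locked=False),
        Dut("host-c", working=True, locked=True),
        Dut("host-d", working=True, locked=True),
        Dut("host-e", working=False, locked=False),
    ]
    print(format_summary(summarize_pool(pool)))
```

With counts reported this way, a reader comparing the message against dut-status could see that the gap between "working" and the number of OK hosts comes from locked DUTs rather than from newly broken ones.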

Comment 4 by benhenry@google.com, Apr 26 2016

Components: Infra>Client>ChromeOS
Labels: -Infra-ChromeOS
Status: Archived (was: Untriaged)
This bug has not been touched in over a year.  It is probably no longer relevant.
