balance_pool err'ing out due to presumed overloaded AFE
Issue description
Since July:
$ cat balance_pool.log.2016-07-06
Default max broken boards calculated to be 29 for bvt pool
Traceback (most recent call last):
  File "site_utils/balance_pools.py", line 599, in <module>
    main(sys.argv)
  File "site_utils/balance_pools.py", line 576, in main
    if _too_many_broken_boards(inventory, pool, arguments):
  File "site_utils/balance_pools.py", line 446, in _too_many_broken_boards
    if counts.get_broken(pool) != 0]
  File "/usr/local/autotest/site_utils/lab_inventory.py", line 365, in get_broken
    return self._count_pool(_PoolCounts.get_broken, pool)
  File "/usr/local/autotest/site_utils/lab_inventory.py", line 312, in _count_pool
    return get_pool_count(self._pools[pool])
  File "/usr/local/autotest/site_utils/lab_inventory.py", line 231, in get_broken
    return len(self.get_broken_list())
  File "/usr/local/autotest/site_utils/lab_inventory.py", line 225, in get_broken_list
    if h.last_diagnosis()[0] == status_history.BROKEN]
  File "/usr/local/autotest/site_utils/status_history.py", line 571, in last_diagnosis
    self._init_status_task()
  File "/usr/local/autotest/site_utils/status_history.py", line 500, in _init_status_task
    self._afe, self._host.id, self.end_time)
  File "/usr/local/autotest/site_utils/status_history.py", line 235, in get_status_task
    task = afe.get_host_status_task(host_id, query_end)
  File "/usr/local/autotest/server/frontend.py", line 336, in get_host_status_task
    host_id=host_id, end_time=end_time)
  File "/usr/local/autotest/server/frontend.py", line 103, in run
    result = utils.strip_unicode(rpc_call(**dargs))
  File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 114, in __call__
    respdata = urllib2.urlopen(request).read()
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 110] Connection timed out>
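
The traceback shows the failure isn't in balance_pools itself: the AFE RPC issued through server/frontend.py and the json_rpc proxy times out at the urllib2 layer, which is consistent with an overloaded AFE. One possible mitigation would be to retry the RPC with backoff rather than letting a single connection timeout abort the whole run. The helper below is only a sketch; call_with_retries and its parameters are hypothetical and not part of the Autotest tree.

import time
import urllib2

def call_with_retries(rpc_call, max_attempts=3, initial_delay=30, **dargs):
    """Hypothetical helper: retry a flaky AFE RPC with exponential backoff.

    rpc_call is any callable that issues one RPC (e.g. the json_rpc proxy
    method that raised URLError above).  After a failed attempt, sleep
    initial_delay * 2**attempt seconds before trying again.
    """
    for attempt in range(max_attempts):
        try:
            return rpc_call(**dargs)
        except urllib2.URLError:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the original timeout
            time.sleep(initial_delay * 2 ** attempt)

With something like this, the call in status_history.py could be wrapped as call_with_retries(afe.get_host_status_task, host_id=host_id, end_time=query_end), though simply running the cron job at a quieter time (as tried below) may be the cheaper fix.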
Comment 1 by jrbarnette@chromium.org, Jul 6 2016
Yeah, it looks like it; running a one-off of balance_pool and it's currently taking much longer than normal.
Jul 6 2016
Kicking it off ~10AM seems to have worked; perhaps 6AM is a really busy time?

$ site_utils/balance_pools.py --all-boards all_critical_pools
Default max broken boards calculated to be 29 for bvt pool
There are 8 boards in the bvt pool with at least 1 broken DUT (max threshold 29)
butterfly daisy lulu parrot samus samus-cheets stout x86-zgb
Default max broken boards calculated to be 5 for cq pool
There are 0 boards in the cq pool with at least 1 broken DUT (max threshold 5)
Default max broken boards calculated to be 0 for continuous pool
There are 0 boards in the continuous pool with at least 1 broken DUT (max threshold 0)
Default max broken boards calculated to be 1 for cts pool
There are 0 boards in the cts pool with at least 1 broken DUT (max threshold 1)
Balancing x86-zgb bvt pool:
Total 6 DUTs, 3 working, 3 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 3 DUTs.
x86-zgb bvt pool has 0 spares available.
ERROR: Not enough spares: need 3, only have 0.
ERROR: x86-zgb bvt pool: Refusing to act on pool with 3 broken DUTs.
ERROR: Please investigate this board to see if there is a bug
ERROR: that is bricking devices. Once you have finished your
ERROR: investigation, you can force a rebalance with
ERROR: --force-rebalance
Balancing samus-cheets bvt pool:
Total 6 DUTs, 4 working, 2 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 2 DUTs.
samus-cheets bvt pool has 2 spares available.
samus-cheets bvt pool will return 2 broken DUTs, leaving 0 still in the pool.
Transferring 2 DUTs from bvt to suites.
Updating host: chromeos4-row12-rack6-host1.
Removing labels ['pool:bvt'] from host chromeos4-row12-rack6-host1
Adding labels ['pool:suites'] to host chromeos4-row12-rack6-host1
Updating host: chromeos4-row12-rack5-host9.
Removing labels ['pool:bvt'] from host chromeos4-row12-rack5-host9
Adding labels ['pool:suites'] to host chromeos4-row12-rack5-host9
Transferring 2 DUTs from suites to bvt.
Updating host: chromeos4-row12-rack5-host3.
Removing labels ['pool:suites'] from host chromeos4-row12-rack5-host3
Adding labels ['pool:bvt'] to host chromeos4-row12-rack5-host3
Updating host: chromeos4-row12-rack5-host11.
Removing labels ['pool:suites'] from host chromeos4-row12-rack5-host11
Adding labels ['pool:bvt'] to host chromeos4-row12-rack5-host11
Balancing butterfly bvt pool:
Total 6 DUTs, 3 working, 3 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 3 DUTs.
butterfly bvt pool has 11 spares available.
butterfly bvt pool will return 3 broken DUTs, leaving 0 still in the pool.
ERROR: butterfly bvt pool: Refusing to act on pool with 3 broken DUTs.
ERROR: Please investigate this board to see if there is a bug
ERROR: that is bricking devices. Once you have finished your
ERROR: investigation, you can force a rebalance with
ERROR: --force-rebalance
Balancing daisy bvt pool:
Total 6 DUTs, 3 working, 3 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 3 DUTs.
daisy bvt pool has 3 spares available.
daisy bvt pool will return 3 broken DUTs, leaving 0 still in the pool.
ERROR: daisy bvt pool: Refusing to act on pool with 3 broken DUTs.
ERROR: Please investigate this board to see if there is a bug
ERROR: that is bricking devices. Once you have finished your
ERROR: investigation, you can force a rebalance with
ERROR: --force-rebalance
Balancing whirlwind cq pool:
Total 8 DUTs, 6 working, 2 broken, 0 reserved.
Target is 8 working DUTs; grow pool by 2 DUTs.
whirlwind cq pool has 0 spares available.
ERROR: Not enough spares: need 2, only have 0.
Balancing parrot bvt pool:
Total 9 DUTs, 8 working, 1 broken, 0 reserved.
Target is 9 working DUTs; grow pool by 1 DUTs.
parrot bvt pool has 14 spares available.
parrot bvt pool will return 1 broken DUTs, leaving 0 still in the pool.
Transferring 1 DUTs from bvt to suites.
Updating host: chromeos2-row3-rack2-host2.
Removing labels ['pool:bvt'] from host chromeos2-row3-rack2-host2
Adding labels ['pool:suites'] to host chromeos2-row3-rack2-host2
Transferring 1 DUTs from suites to bvt.
Updating host: chromeos2-row3-rack1-host15.
Removing labels ['pool:suites'] from host chromeos2-row3-rack1-host15
Adding labels ['pool:bvt'] to host chromeos2-row3-rack1-host15
Balancing lulu bvt pool:
Total 6 DUTs, 5 working, 1 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 1 DUTs.
lulu bvt pool has 30 spares available.
lulu bvt pool will return 1 broken DUTs, leaving 0 still in the pool.
Transferring 1 DUTs from bvt to suites.
Updating host: chromeos4-row6-rack1-host1.
Removing labels ['pool:bvt'] from host chromeos4-row6-rack1-host1
Adding labels ['pool:suites'] to host chromeos4-row6-rack1-host1
Transferring 1 DUTs from suites to bvt.
Updating host: chromeos2-row5-rack9-host1.
Removing labels ['pool:suites'] from host chromeos2-row5-rack9-host1
Adding labels ['pool:bvt'] to host chromeos2-row5-rack9-host1
Balancing guado_moblab cq pool:
Total 3 DUTs, 2 working, 1 broken, 0 reserved.
Target is 3 working DUTs; grow pool by 1 DUTs.
guado_moblab cq pool has 0 spares available.
ERROR: Not enough spares: need 1, only have 0.
Balancing cyan-cheets bvt pool:
Total 8 DUTs, 7 working, 1 broken, 0 reserved.
Target is 8 working DUTs; grow pool by 1 DUTs.
cyan-cheets bvt pool has 0 spares available.
ERROR: Not enough spares: need 1, only have 0.
Balancing stout bvt pool:
Total 6 DUTs, 5 working, 1 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 1 DUTs.
stout bvt pool has 9 spares available.
stout bvt pool will return 1 broken DUTs, leaving 0 still in the pool.
Transferring 1 DUTs from bvt to suites.
Updating host: chromeos2-row3-rack8-host3.
Removing labels ['pool:bvt'] from host chromeos2-row3-rack8-host3
Adding labels ['pool:suites'] to host chromeos2-row3-rack8-host3
Transferring 1 DUTs from suites to bvt.
Updating host: chromeos2-row3-rack8-host4.
Removing labels ['pool:suites'] from host chromeos2-row3-rack8-host4
Adding labels ['pool:bvt'] to host chromeos2-row3-rack8-host4
Balancing samus bvt pool:
Total 6 DUTs, 4 working, 2 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 2 DUTs.
samus bvt pool has 9 spares available.
samus bvt pool will return 2 broken DUTs, leaving 0 still in the pool.
Transferring 2 DUTs from bvt to suites.
Updating host: chromeos2-row24-rack1-host1.
Removing labels ['pool:bvt'] from host chromeos2-row24-rack1-host1
Adding labels ['pool:suites'] to host chromeos2-row24-rack1-host1
Updating host: chromeos2-row24-rack1-host15.
Removing labels ['pool:bvt'] from host chromeos2-row24-rack1-host15
Adding labels ['pool:suites'] to host chromeos2-row24-rack1-host15
Transferring 2 DUTs from suites to bvt.
Updating host: chromeos4-row12-rack5-host1.
Removing labels ['pool:suites'] from host chromeos4-row12-rack5-host1
Adding labels ['pool:bvt'] to host chromeos4-row12-rack5-host1
Updating host: chromeos2-row24-rack2-host1.
Removing labels ['pool:suites'] from host chromeos2-row24-rack2-host1
Adding labels ['pool:bvt'] to host chromeos2-row24-rack2-host1
Balancing veyron_speedy cq pool:
Total 14 DUTs, 13 working, 1 broken, 0 reserved.
Target is 14 working DUTs; grow pool by 1 DUTs.
veyron_speedy cq pool has 11 spares available.
veyron_speedy cq pool will return 1 broken DUTs, leaving 0 still in the pool.
Transferring 1 DUTs from cq to suites.
Updating host: chromeos4-row4-rack7-host7.
Removing labels ['pool:cq'] from host chromeos4-row4-rack7-host7
Adding labels ['pool:suites'] to host chromeos4-row4-rack7-host7
Transferring 1 DUTs from suites to cq.
Updating host: chromeos4-row4-rack10-host16.
Removing labels ['pool:suites'] from host chromeos4-row4-rack10-host16
Adding labels ['pool:cq'] to host chromeos4-row4-rack10-host16
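
For reference, the arithmetic in the output above reads back directly: the target is the pool's total size, the pool must grow by target minus working DUTs, that growth has to be covered by spares, and a pool with too many broken DUTs is refused unless --force-rebalance is given. Below is a minimal sketch of that per-pool decision; the broken_threshold value is an assumption, and the real check lives in balance_pools.py.

def plan_rebalance(total, working, broken, spares, broken_threshold=3,
                   force=False):
    """Sketch of the per-pool decision visible in the log above.

    Returns the number of DUTs to swap in from spares, or raises if the
    pool cannot be rebalanced safely.  Purely illustrative.
    """
    target = total                 # "Target is N working DUTs"
    needed = target - working      # "grow pool by N DUTs"
    if needed > spares:
        raise RuntimeError('Not enough spares: need %d, only have %d'
                           % (needed, spares))
    if broken >= broken_threshold and not force:
        raise RuntimeError('Refusing to act on pool with %d broken DUTs; '
                           'rerun with --force-rebalance' % broken)
    return needed

For example, the samus-cheets case above (6 total, 4 working, 2 broken, 2 spares) yields 2, matching the two bvt/suites label swaps in the log, while butterfly (3 broken) trips the refusal even though 11 spares are available.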
Jul 9 2016
Looks like the AFE is back to normal, so closing this out for now.