Issue 626045

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Last visit > 30 days ago
Closed: Jul 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



balance_pool err'ing out due to presumed overloaded AFE

Project Member Reported by kevcheng@chromium.org, Jul 6 2016

Issue description

Since July:

$ cat balance_pool.log.2016-07-06
Default max broken boards calculated to be 29 for bvt pool
Traceback (most recent call last):
  File "site_utils/balance_pools.py", line 599, in <module>
    main(sys.argv)
  File "site_utils/balance_pools.py", line 576, in main
    if _too_many_broken_boards(inventory, pool, arguments):
  File "site_utils/balance_pools.py", line 446, in _too_many_broken_boards
    if counts.get_broken(pool) != 0]
  File "/usr/local/autotest/site_utils/lab_inventory.py", line 365, in get_broken
    return self._count_pool(_PoolCounts.get_broken, pool)
  File "/usr/local/autotest/site_utils/lab_inventory.py", line 312, in _count_pool
    return get_pool_count(self._pools[pool])
  File "/usr/local/autotest/site_utils/lab_inventory.py", line 231, in get_broken
    return len(self.get_broken_list())
  File "/usr/local/autotest/site_utils/lab_inventory.py", line 225, in get_broken_list
    if h.last_diagnosis()[0] == status_history.BROKEN]
  File "/usr/local/autotest/site_utils/status_history.py", line 571, in last_diagnosis
    self._init_status_task()
  File "/usr/local/autotest/site_utils/status_history.py", line 500, in _init_status_task
    self._afe, self._host.id, self.end_time)
  File "/usr/local/autotest/site_utils/status_history.py", line 235, in get_status_task
    task = afe.get_host_status_task(host_id, query_end)
  File "/usr/local/autotest/server/frontend.py", line 336, in get_host_status_task
    host_id=host_id, end_time=end_time)
  File "/usr/local/autotest/server/frontend.py", line 103, in run
    result = utils.strip_unicode(rpc_call(**dargs))
  File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 114, in __call__
    respdata = urllib2.urlopen(request).read()
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 110] Connection timed out>

That looks a lot like overload/slowness on cautotest.

cautotest has been dragging for over a week; this is probably
just one more symptom.
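
For reference, the failure above is a single urlopen attempt in the RPC proxy timing out against the slow AFE. A minimal sketch of how such a call could be retried with backoff while the server is overloaded (the wrapper and its timings are assumptions for illustration, not what frontend.py actually does):

import time
import urllib2


def call_with_retries(rpc_call, max_attempts=3, initial_delay_secs=30):
    """Run rpc_call(), retrying on URLError with exponential backoff."""
    delay = initial_delay_secs
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc_call()
        except urllib2.URLError as e:
            if attempt == max_attempts:
                raise  # Out of retries; surface the original error.
            print 'AFE RPC failed (%s); retrying in %d seconds' % (e, delay)
            time.sleep(delay)
            delay *= 2

Usage would be something like call_with_retries(lambda: urllib2.urlopen(request).read()) at the point where the traceback shows the timeout.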

Summary: balance_pool err'ing out due to presumed overloaded AFE (was: balance_pool err'ing out)
Yeah, it looks like it; running a one-off of balance_pool now and it's taking way longer than normal.
Kicking it off ~10AM seems to have worked; perhaps 6AM is a really busy time?

$ site_utils/balance_pools.py --all-boards all_critical_pools
Default max broken boards calculated to be 29 for bvt pool
There are 8 boards in the bvt pool with at least 1 broken DUT (max threshold 29)
butterfly
daisy
lulu
parrot
samus
samus-cheets
stout
x86-zgb
Default max broken boards calculated to be 5 for cq pool
There are 0 boards in the cq pool with at least 1 broken DUT (max threshold 5)
Default max broken boards calculated to be 0 for continuous pool
There are 0 boards in the continuous pool with at least 1 broken DUT (max threshold 0)
Default max broken boards calculated to be 1 for cts pool
There are 0 boards in the cts pool with at least 1 broken DUT (max threshold 1)
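
For context, the threshold check that was blowing up in _too_many_broken_boards amounts to counting the boards with at least one broken DUT in a pool and comparing against the computed maximum, roughly like the simplified sketch below (an assumed dict stands in for the real lab_inventory objects):

def too_many_broken_boards(broken_counts_by_board, pool, max_broken_boards):
    """Sketch: broken_counts_by_board maps board name -> broken DUTs in pool."""
    broken_boards = [board for board, count in broken_counts_by_board.items()
                     if count != 0]
    msg = ('There are %d boards in the %s pool with at least 1 broken DUT '
           '(max threshold %d)' %
           (len(broken_boards), pool, max_broken_boards))
    print msg
    return len(broken_boards) > max_broken_boards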


Balancing x86-zgb bvt pool:
Total 6 DUTs, 3 working, 3 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 3 DUTs.
x86-zgb bvt pool has 0 spares available.
ERROR: Not enough spares: need 3, only have 0.
ERROR: x86-zgb bvt pool: Refusing to act on pool with 3 broken DUTs.
ERROR: Please investigate this board to see if there is a bug 
ERROR: that is bricking devices. Once you have finished your 
ERROR: investigation, you can force a rebalance with 
ERROR: --force-rebalance
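
The per-pool decisions here and in the runs below boil down to simple arithmetic: the target is a fully working pool, the shortfall is target minus working, and the tool refuses to move anything when there are not enough spares or when the broken count reaches a safety cutoff, unless --force-rebalance is passed. A rough sketch of that decision, with parameter names and the exact cutoff inferred from the output rather than taken from balance_pools.py:

def plan_rebalance(total, working, broken, spares,
                   max_broken=3, force_rebalance=False):
    """Return how many DUTs to swap in, or None to leave the pool alone."""
    target = total                 # Target is a fully working pool.
    shortfall = target - working   # "grow pool by N DUTs"
    ok = True
    if spares < shortfall:
        print 'ERROR: Not enough spares: need %d, only have %d.' % (
            shortfall, spares)
        ok = False
    if broken >= max_broken and not force_rebalance:
        print 'ERROR: Refusing to act on pool with %d broken DUTs.' % broken
        ok = False
    return shortfall if ok else None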


Balancing samus-cheets bvt pool:
Total 6 DUTs, 4 working, 2 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 2 DUTs.
samus-cheets bvt pool has 2 spares available.
samus-cheets bvt pool will return 2 broken DUTs, leaving 0 still in the pool.
Transferring 2 DUTs from bvt to suites.
Updating host: chromeos4-row12-rack6-host1.
Removing labels ['pool:bvt'] from host chromeos4-row12-rack6-host1
Adding labels ['pool:suites'] to host chromeos4-row12-rack6-host1
Updating host: chromeos4-row12-rack5-host9.
Removing labels ['pool:bvt'] from host chromeos4-row12-rack5-host9
Adding labels ['pool:suites'] to host chromeos4-row12-rack5-host9
Transferring 2 DUTs from suites to bvt.
Updating host: chromeos4-row12-rack5-host3.
Removing labels ['pool:suites'] from host chromeos4-row12-rack5-host3
Adding labels ['pool:bvt'] to host chromeos4-row12-rack5-host3
Updating host: chromeos4-row12-rack5-host11.
Removing labels ['pool:suites'] from host chromeos4-row12-rack5-host11
Adding labels ['pool:bvt'] to host chromeos4-row12-rack5-host11
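
Each transfer above is just a pair of label edits on the AFE host record: the outgoing DUT loses pool:bvt and gains pool:suites, and the incoming spare gets the reverse. A minimal sketch of one such swap, where remove_host_labels/add_host_labels are placeholder names for whichever AFE client calls the tool actually uses (assumed, not verified autotest APIs):

def move_host_to_pool(afe, hostname, old_pool, new_pool):
    """Sketch of a pool-label swap; the afe method names are assumptions."""
    print 'Updating host: %s.' % hostname
    print "Removing labels ['pool:%s'] from host %s" % (old_pool, hostname)
    afe.remove_host_labels(hostname, ['pool:%s' % old_pool])  # assumed call
    print "Adding labels ['pool:%s'] to host %s" % (new_pool, hostname)
    afe.add_host_labels(hostname, ['pool:%s' % new_pool])     # assumed call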


Balancing butterfly bvt pool:
Total 6 DUTs, 3 working, 3 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 3 DUTs.
butterfly bvt pool has 11 spares available.
butterfly bvt pool will return 3 broken DUTs, leaving 0 still in the pool.
ERROR: butterfly bvt pool: Refusing to act on pool with 3 broken DUTs.
ERROR: Please investigate this board to see if there is a bug 
ERROR: that is bricking devices. Once you have finished your 
ERROR: investigation, you can force a rebalance with 
ERROR: --force-rebalance


Balancing daisy bvt pool:
Total 6 DUTs, 3 working, 3 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 3 DUTs.
daisy bvt pool has 3 spares available.
daisy bvt pool will return 3 broken DUTs, leaving 0 still in the pool.
ERROR: daisy bvt pool: Refusing to act on pool with 3 broken DUTs.
ERROR: Please investigate this board to see if there is a bug 
ERROR: that is bricking devices. Once you have finished your 
ERROR: investigation, you can force a rebalance with 
ERROR: --force-rebalance


Balancing whirlwind cq pool:
Total 8 DUTs, 6 working, 2 broken, 0 reserved.
Target is 8 working DUTs; grow pool by 2 DUTs.
whirlwind cq pool has 0 spares available.
ERROR: Not enough spares: need 2, only have 0.


Balancing parrot bvt pool:
Total 9 DUTs, 8 working, 1 broken, 0 reserved.
Target is 9 working DUTs; grow pool by 1 DUTs.
parrot bvt pool has 14 spares available.
parrot bvt pool will return 1 broken DUTs, leaving 0 still in the pool.
Transferring 1 DUTs from bvt to suites.
Updating host: chromeos2-row3-rack2-host2.
Removing labels ['pool:bvt'] from host chromeos2-row3-rack2-host2
Adding labels ['pool:suites'] to host chromeos2-row3-rack2-host2
Transferring 1 DUTs from suites to bvt.
Updating host: chromeos2-row3-rack1-host15.
Removing labels ['pool:suites'] from host chromeos2-row3-rack1-host15
Adding labels ['pool:bvt'] to host chromeos2-row3-rack1-host15


Balancing lulu bvt pool:
Total 6 DUTs, 5 working, 1 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 1 DUTs.
lulu bvt pool has 30 spares available.
lulu bvt pool will return 1 broken DUTs, leaving 0 still in the pool.
Transferring 1 DUTs from bvt to suites.
Updating host: chromeos4-row6-rack1-host1.
Removing labels ['pool:bvt'] from host chromeos4-row6-rack1-host1
Adding labels ['pool:suites'] to host chromeos4-row6-rack1-host1
Transferring 1 DUTs from suites to bvt.
Updating host: chromeos2-row5-rack9-host1.
Removing labels ['pool:suites'] from host chromeos2-row5-rack9-host1
Adding labels ['pool:bvt'] to host chromeos2-row5-rack9-host1


Balancing guado_moblab cq pool:
Total 3 DUTs, 2 working, 1 broken, 0 reserved.
Target is 3 working DUTs; grow pool by 1 DUTs.
guado_moblab cq pool has 0 spares available.
ERROR: Not enough spares: need 1, only have 0.


Balancing cyan-cheets bvt pool:
Total 8 DUTs, 7 working, 1 broken, 0 reserved.
Target is 8 working DUTs; grow pool by 1 DUTs.
cyan-cheets bvt pool has 0 spares available.
ERROR: Not enough spares: need 1, only have 0.


Balancing stout bvt pool:
Total 6 DUTs, 5 working, 1 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 1 DUTs.
stout bvt pool has 9 spares available.
stout bvt pool will return 1 broken DUTs, leaving 0 still in the pool.
Transferring 1 DUTs from bvt to suites.
Updating host: chromeos2-row3-rack8-host3.
Removing labels ['pool:bvt'] from host chromeos2-row3-rack8-host3
Adding labels ['pool:suites'] to host chromeos2-row3-rack8-host3
Transferring 1 DUTs from suites to bvt.
Updating host: chromeos2-row3-rack8-host4.
Removing labels ['pool:suites'] from host chromeos2-row3-rack8-host4
Adding labels ['pool:bvt'] to host chromeos2-row3-rack8-host4


Balancing samus bvt pool:
Total 6 DUTs, 4 working, 2 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 2 DUTs.
samus bvt pool has 9 spares available.
samus bvt pool will return 2 broken DUTs, leaving 0 still in the pool.
Transferring 2 DUTs from bvt to suites.
Updating host: chromeos2-row24-rack1-host1.
Removing labels ['pool:bvt'] from host chromeos2-row24-rack1-host1
Adding labels ['pool:suites'] to host chromeos2-row24-rack1-host1
Updating host: chromeos2-row24-rack1-host15.
Removing labels ['pool:bvt'] from host chromeos2-row24-rack1-host15
Adding labels ['pool:suites'] to host chromeos2-row24-rack1-host15
Transferring 2 DUTs from suites to bvt.
Updating host: chromeos4-row12-rack5-host1.
Removing labels ['pool:suites'] from host chromeos4-row12-rack5-host1
Adding labels ['pool:bvt'] to host chromeos4-row12-rack5-host1
Updating host: chromeos2-row24-rack2-host1.
Removing labels ['pool:suites'] from host chromeos2-row24-rack2-host1
Adding labels ['pool:bvt'] to host chromeos2-row24-rack2-host1


Balancing veyron_speedy cq pool:
Total 14 DUTs, 13 working, 1 broken, 0 reserved.
Target is 14 working DUTs; grow pool by 1 DUTs.
veyron_speedy cq pool has 11 spares available.
veyron_speedy cq pool will return 1 broken DUTs, leaving 0 still in the pool.
Transferring 1 DUTs from cq to suites.
Updating host: chromeos4-row4-rack7-host7.
Removing labels ['pool:cq'] from host chromeos4-row4-rack7-host7
Adding labels ['pool:suites'] to host chromeos4-row4-rack7-host7
Transferring 1 DUTs from suites to cq.
Updating host: chromeos4-row4-rack10-host16.
Removing labels ['pool:suites'] from host chromeos4-row4-rack10-host16
Adding labels ['pool:cq'] to host chromeos4-row4-rack10-host16

Status: WontFix (was: Untriaged)
Looks like the AFE is back to normal, so closing this out for now.
