Inventory emails blocked on "Host matching query does not exist."
Issue description
2018-08-07 11:14:03 | ERROR | Error escaped main
Traceback (most recent call last):
  File "site_utils/lab_inventory.py", line 1336, in main
    _perform_inventory_reports(arguments)
  File "site_utils/lab_inventory.py", line 1154, in _perform_inventory_reports
    _report_untestable_dut_metrics(inventory)
  File "site_utils/lab_inventory.py", line 1087, in _report_untestable_dut_metrics
    if _host_is_working(history):
  File "site_utils/lab_inventory.py", line 135, in _host_is_working
    return history.last_diagnosis()[0] == status_history.WORKING
  File "/usr/local/autotest/server/lib/status_history.py", line 658, in last_diagnosis
    self._init_status_task()
  File "/usr/local/autotest/server/lib/status_history.py", line 587, in _init_status_task
    self._afe, self._host.id, self.end_time)
  File "/usr/local/autotest/server/lib/status_history.py", line 285, in get_status_task
    task = afe.get_host_status_task(host_id, query_end)
  File "/usr/local/autotest/server/frontend.py", line 633, in get_host_status_task
    host_id=host_id, end_time=end_time)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 131, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 249, in GenericRetry
    return _run()
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 182, in _Wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 248, in _run
    return functor(*args, **kwargs)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 94, in _run
    return super(RetryingAFE, self).run(call, **dargs)
  File "/usr/local/autotest/server/frontend.py", line 108, in run
    result = utils.strip_unicode(rpc_call(**dargs))
  File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 143, in __call__
    raise BuildException(resp['error'])
JSONRPCException: DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 3719}
Traceback (most recent call last):
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 109, in dispatchRequest
    results['result'] = self.invokeServiceEndpoint(meth, args)
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 147, in invokeServiceEndpoint
    return meth(*args)
  File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
    return f(*args, **keyword_args)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1559, in get_host_status_task
    host = models.Host.smart_get(host_id)
  File "/usr/local/autotest/frontend/afe/model_logic.py", line 835, in smart_get
    return manager.get(pk=id_or_name)
  File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
    (self.model._meta.object_name, kwargs))
DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 3719}
2018-08-07 11:14:03 | INFO | Attempting refresh to obtain initial access_token
2018-08-07 11:14:03 | INFO | Refreshing access_token
Aug 9
For the record, this ain't really new.
Aug 9
Did it resolve itself in the past? That's not entirely clear from the previous bug. In any case, I'd suggest a mitigation of catching this error and sending metrics. Figuring out the cause and fixing it is non-urgent.
Aug 9
That host is invalid/deleted:

MySQL [chromeos_autotest_db]> select * from afe_hosts where id=3719;
+------+-----------------------------+--------+----------+---------------+---------+------------+--------------+-----------+-------+--------+----------+-------------+
| id   | hostname                    | locked | synch_id | status        | invalid | protection | locked_by_id | lock_time | dirty | leased | shard_id | lock_reason |
+------+-----------------------------+--------+----------+---------------+---------+------------+--------------+-----------+-------+--------+----------+-------------+
| 3719 | chromeos4-row4-rack5-host18 |      0 |     NULL | Repair Failed |       1 |          0 |         NULL | NULL      |     1 |      1 |      228 |             |
+------+-----------------------------+--------+----------+---------------+---------+------------+--------------+-----------+-------+--------+----------+-------------+
1 row in set (0.00 sec)
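
To illustrate how a row can exist in afe_hosts while a lookup still reports "does not exist", here is a minimal sketch using an in-memory SQLite table. It assumes the failing lookup only considers rows with invalid = 0; the thread implies this but does not show the filter itself. The get_host helper and its valid_only parameter are hypothetical names for this sketch, not the Autotest API.

```python
import sqlite3

# In-memory stand-in for the afe_hosts table, reduced to three columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE afe_hosts (id INTEGER PRIMARY KEY, hostname TEXT, invalid INTEGER)"
)
conn.execute(
    "INSERT INTO afe_hosts VALUES (3719, 'chromeos4-row4-rack5-host18', 1)"
)


def get_host(conn, host_id, valid_only=True):
    """Hypothetical lookup: by default, rows marked invalid are filtered out."""
    query = "SELECT id, hostname, invalid FROM afe_hosts WHERE id = ?"
    if valid_only:
        # Assumption: the real lookup applies an equivalent filter.
        query += " AND invalid = 0"
    return conn.execute(query, (host_id,)).fetchone()


filtered = get_host(conn, 3719)                      # no row survives the filter
unfiltered = get_host(conn, 3719, valid_only=False)  # the row itself is still there
```

Under that assumption, a direct select finds the host but any caller going through the filtered lookup sees nothing, which matches the symptom in the traceback.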
Aug 13
A CL is in flight to harden the inventory script against a missing host.
Aug 15
That was not actually the core problem; the core problem was that the script wasn't even running. jrbarnette@ landed a fix for that, and inventory is running again. No longer Chase.
Aug 23
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/7c4c954b4b28d77feb6db97b71c2d62d24c707e7

commit 7c4c954b4b28d77feb6db97b71c2d62d24c707e7
Author: Jacob Kopczynski <jkop@google.com>
Date: Thu Aug 23 20:20:49 2018

autotest: Catch errors in lab inventory

If there is a missing/invalid DUT whose status is being checked, the exception crashes the entire inventory run without output. This catches those errors and surfaces them as a metric, but continues taking inventory.

BUG=chromium:872830
TEST=Ran a debug run of the inventory script, ran unittests

Change-Id: Ib8bff0a240a963cfa0a41f2f2b1c5cac0c4c3ff5
Reviewed-on: https://chromium-review.googlesource.com/1173657
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Jacob Kopczynski <jkop@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/7c4c954b4b28d77feb6db97b71c2d62d24c707e7/site_utils/lab_inventory.py
[modify] https://crrev.com/7c4c954b4b28d77feb6db97b71c2d62d24c707e7/site_utils/lab_inventory_unittest.py
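
For readers who want the shape of "catch the error, surface it as a metric, keep taking inventory", here is a rough sketch. It is not the code from the CL above: HostLookupError, last_diagnosis, count_untestable, and the Counter used as a stand-in for a metrics backend are all hypothetical names chosen for illustration.

```python
import logging
from collections import Counter


class HostLookupError(Exception):
    """Hypothetical stand-in for the RPC error raised for a missing host."""


def last_diagnosis(host_id, known_hosts):
    """Hypothetical status lookup; raises if the host record is missing."""
    try:
        return known_hosts[host_id]
    except KeyError:
        raise HostLookupError(
            "Host matching query does not exist: pk=%d" % host_id)


def count_untestable(host_ids, known_hosts):
    """Count non-working hosts; record lookup failures instead of crashing."""
    counts = Counter()  # stand-in for a real metrics client
    for host_id in host_ids:
        try:
            status = last_diagnosis(host_id, known_hosts)
        except HostLookupError:
            # One bad host should not abort the whole inventory run.
            logging.exception("Skipping host %d", host_id)
            counts["lookup_error"] += 1
            continue
        if status != "WORKING":
            counts["untestable"] += 1
    return counts


counts = count_untestable([1, 2, 3719], {1: "WORKING", 2: "BROKEN"})
```

The point of the design is that the try/except wraps a single host's lookup rather than the whole loop, so one invalid DUT costs one data point instead of the entire report.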
Aug 30
Sep 26
Sep 26
Comment 1 by jrbarnette@chromium.org
, Aug 9