
Issue 872830

Starred by 2 users

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Inventory emails blocked on "Host matching query does not exist."

Project Member Reported by jkop@chromium.org, Aug 9

Issue description

2018-08-07 11:14:03 | ERROR      | Error escaped main                                           
Traceback (most recent call last):
  File "site_utils/lab_inventory.py", line 1336, in main                                    
    _perform_inventory_reports(arguments)
  File "site_utils/lab_inventory.py", line 1154, in _perform_inventory_reports          
    _report_untestable_dut_metrics(inventory)
  File "site_utils/lab_inventory.py", line 1087, in _report_untestable_dut_metrics           
    if _host_is_working(history):                     
  File "site_utils/lab_inventory.py", line 135, in _host_is_working
    return history.last_diagnosis()[0] == status_history.WORKING
  File "/usr/local/autotest/server/lib/status_history.py", line 658, in last_diagnosis
    self._init_status_task()           
  File "/usr/local/autotest/server/lib/status_history.py", line 587, in _init_status_task  
    self._afe, self._host.id, self.end_time)
  File "/usr/local/autotest/server/lib/status_history.py", line 285, in get_status_task
    task = afe.get_host_status_task(host_id, query_end)
  File "/usr/local/autotest/server/frontend.py", line 633, in get_host_status_task
    host_id=host_id, end_time=end_time)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 131, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 249, in GenericRetry
    return _run()
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 182, in _Wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 248, in _run
    return functor(*args, **kwargs)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 94, in _run
    return super(RetryingAFE, self).run(call, **dargs)
  File "/usr/local/autotest/server/frontend.py", line 108, in run
    result = utils.strip_unicode(rpc_call(**dargs))
  File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 143, in __call__
    raise BuildException(resp['error'])
JSONRPCException: DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 3719}
Traceback (most recent call last):
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 109, in dispatchRequest
    results['result'] = self.invokeServiceEndpoint(meth, args)
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 147, in invokeServiceEndpoint
    return meth(*args)
  File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
    return f(*args, **keyword_args)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1559, in get_host_status_task
    host = models.Host.smart_get(host_id)
  File "/usr/local/autotest/frontend/afe/model_logic.py", line 835, in smart_get
    return manager.get(pk=id_or_name)
  File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
    (self.model._meta.object_name, kwargs))
DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 3719}

2018-08-07 11:14:03 | INFO       | Attempting refresh to obtain initial access_token
2018-08-07 11:14:03 | INFO       | Refreshing access_token
 
Cc: ayatane@chromium.org
Issue 655804 has been merged into this issue.
For the record, this ain't really new.

Did it resolve itself in the past? That's not entirely clear from the previous bug.

In any case, I'd suggest a mitigation of catching this error and sending metrics. Figuring out the cause and fixing it is non-urgent.
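The mitigation suggested here could look roughly like the sketch below. It is not the actual lab_inventory.py code: `HostLookupError`, `count_untestable`, the `(hostname, history)` pairs and the `Counter` used in place of a real metrics client are all hypothetical names chosen for illustration. The only point is that a failed status lookup for one host is logged and counted, and the loop moves on to the next host.

```python
import logging
from collections import Counter


class HostLookupError(Exception):
    """Hypothetical stand-in for the JSONRPCException seen in the traceback."""


def count_untestable(histories, is_working, error_counts=None):
    """Count non-working hosts, skipping hosts whose status check fails.

    histories: iterable of (hostname, history) pairs (hypothetical shape).
    is_working: callable taking a history; may raise HostLookupError.
    error_counts: Counter standing in for a real metrics client.
    """
    if error_counts is None:
        error_counts = Counter()
    untestable = 0
    for hostname, history in histories:
        try:
            working = is_working(history)
        except HostLookupError:
            # Record the failure and keep going instead of aborting the run.
            logging.exception('Could not get status for %s; skipping', hostname)
            error_counts['status_lookup_failed'] += 1
            continue
        if not working:
            untestable += 1
    return untestable, error_counts
```

Catching a narrow exception type rather than a bare `Exception` keeps unrelated bugs visible. Which exception class the real script should catch is a separate decision.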
That host is invalid/deleted:

MySQL [chromeos_autotest_db]> select * from afe_hosts where id=3719;
+------+-----------------------------+--------+----------+---------------+---------+------------+--------------+-----------+-------+--------+----------+-------------+
| id   | hostname                    | locked | synch_id | status        | invalid | protection | locked_by_id | lock_time | dirty | leased | shard_id | lock_reason |
+------+-----------------------------+--------+----------+---------------+---------+------------+--------------+-----------+-------+--------+----------+-------------+
| 3719 | chromeos4-row4-rack5-host18 |      0 |     NULL | Repair Failed |       1 |          0 |         NULL | NULL      |     1 |      1 |      228 |             |
+------+-----------------------------+--------+----------+---------------+---------+------------+--------------+-----------+-------+--------+----------+-------------+
1 row in set (0.00 sec)
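One way a row can be present in afe_hosts and still produce "does not exist" is a lookup that only considers valid rows. The toy sketch below illustrates that idea with an in-memory list. It assumes the RPC-side lookup filters on invalid = 0, which this bug does not confirm. `get_host` and its `valid_only` flag are hypothetical and are not the autotest or Django API.

```python
class DoesNotExist(Exception):
    """Stand-in for the DoesNotExist error in the traceback."""


# One row copied from the query above (only the relevant columns).
HOSTS = [
    {'id': 3719, 'hostname': 'chromeos4-row4-rack5-host18', 'invalid': 1},
]


def get_host(pk, valid_only=True):
    """Look up a host by primary key; hide invalid rows when valid_only is set."""
    for row in HOSTS:
        if row['id'] == pk and not (valid_only and row['invalid']):
            return row
    raise DoesNotExist(
        'Host matching query does not exist. Lookup parameters were %r'
        % ({'pk': pk},))
```

With this model, `get_host(3719, valid_only=False)` returns the row, while `get_host(3719)` raises `DoesNotExist`, matching the symptom reported here.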
Labels: -Chase-Pending Chase
Owner: jkop@chromium.org
Status: Started (was: Available)
A CL is in flight to harden the inventory script against a missing host.
Labels: -Pri-1 -Chase Pri-2
That was not actually the core problem. The core problem was that the script wasn't even running. jrbarnette@ landed a fix for that, and inventory is running again.

No longer Chase.
Project Member Comment 7 by bugdroid1@chromium.org, Aug 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/7c4c954b4b28d77feb6db97b71c2d62d24c707e7

commit 7c4c954b4b28d77feb6db97b71c2d62d24c707e7
Author: Jacob Kopczynski <jkop@google.com>
Date: Thu Aug 23 20:20:49 2018

autotest: Catch errors in lab inventory

If there is a missing/invalid DUT whose status is being checked, the
exception crashes the entire inventory run without output. This catches
those errors and surfaces them as a metric, but continues taking inventory.

BUG=chromium:872830
TEST=Ran a debug run of the inventory script, ran unittests

Change-Id: Ib8bff0a240a963cfa0a41f2f2b1c5cac0c4c3ff5
Reviewed-on: https://chromium-review.googlesource.com/1173657
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Jacob Kopczynski <jkop@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/7c4c954b4b28d77feb6db97b71c2d62d24c707e7/site_utils/lab_inventory.py
[modify] https://crrev.com/7c4c954b4b28d77feb6db97b71c2d62d24c707e7/site_utils/lab_inventory_unittest.py

Status: Fixed (was: Started)
Status: Started (was: Fixed)
This change was reverted in crrev.com/c/1236779.
Cc: -gmeinke@chromium.org -dgarr...@chromium.org -kirtika@chromium.org
