Lab inventory failed to gather status - host lookup failure
Reported by jrbarnette@chromium.org, Oct 13 2016
Issue description
The lab inventory job failed this morning while gathering status.
Here's the headline failure:
Host matching query does not exist. Lookup parameters were {'pk': 2723}
The failure seems to have happened while trying to get status on a Monroe DUT. The problem didn't happen in the inventory runs immediately before and after the failing one.
The full log (including multiple successful runs and the failure) is attached.
Here's the full traceback in all its glory:
====
2016-10-13 06:28:59 | DEBUG | Listing failed DUTs for monroe
2016-10-13 06:29:09 | DEBUG | Encountered unexpected exception <class 'autotest_lib.frontend.afe.json_rpc.proxy.JSONRPCException'>(DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2723}
Traceback (most recent call last):
File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 114, in dispatchRequest
results['result'] = self.invokeServiceEndpoint(meth, args)
File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 154, in invokeServiceEndpoint
return meth(*args)
File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
return f(*args, **keyword_args)
File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1497, in get_host_status_task
host = models.Host.smart_get(host_id)
File "/usr/local/autotest/frontend/afe/model_logic.py", line 831, in smart_get
return manager.get(pk=id_or_name)
File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
return self.get_query_set().get(*args, **kwargs)
File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
(self.model._meta.object_name, kwargs))
DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2723}
), not retrying.
2016-10-13 06:29:09 | ERROR | Unexpected exception: DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2723}
Traceback (most recent call last):
File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 114, in dispatchRequest
results['result'] = self.invokeServiceEndpoint(meth, args)
File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 154, in invokeServiceEndpoint
return meth(*args)
File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
return f(*args, **keyword_args)
File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1497, in get_host_status_task
host = models.Host.smart_get(host_id)
File "/usr/local/autotest/frontend/afe/model_logic.py", line 831, in smart_get
return manager.get(pk=id_or_name)
File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
return self.get_query_set().get(*args, **kwargs)
File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
(self.model._meta.object_name, kwargs))
DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2723}
Traceback (most recent call last):
File "site_utils/lab_inventory.py", line 1151, in main
inventory, arguments.recommend) + '\n\n\n'
File "site_utils/lab_inventory.py", line 670, in _generate_repair_recommendation
if counts.get_broken() != 0:
File "site_utils/lab_inventory.py", line 365, in get_broken
return self._count_pool(_PoolCounts.get_broken, pool)
File "site_utils/lab_inventory.py", line 310, in _count_pool
for counts in self._pools.values()])
File "site_utils/lab_inventory.py", line 231, in get_broken
return len(self.get_broken_list())
File "site_utils/lab_inventory.py", line 225, in get_broken_list
if h.last_diagnosis()[0] == status_history.BROKEN]
File "/usr/local/autotest/server/lib/status_history.py", line 571, in last_diagnosis
self._init_status_task()
File "/usr/local/autotest/server/lib/status_history.py", line 500, in _init_status_task
self._afe, self._host.id, self.end_time)
File "/usr/local/autotest/server/lib/status_history.py", line 235, in get_status_task
task = afe.get_host_status_task(host_id, query_end)
File "/usr/local/autotest/server/frontend.py", line 336, in get_host_status_task
host_id=host_id, end_time=end_time)
File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 111, in run
self, call, **dargs)
File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 100, in GenericRetry
ret = functor(*args, **kwargs)
File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 81, in _run
return super(RetryingAFE, self).run(call, **dargs)
File "/usr/local/autotest/server/frontend.py", line 103, in run
result = utils.strip_unicode(rpc_call(**dargs))
File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 123, in __call__
raise BuildException(resp['error'])
JSONRPCException: DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2723}
Traceback (most recent call last):
File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 114, in dispatchRequest
results['result'] = self.invokeServiceEndpoint(meth, args)
File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 154, in invokeServiceEndpoint
return meth(*args)
File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
return f(*args, **keyword_args)
File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1497, in get_host_status_task
host = models.Host.smart_get(host_id)
File "/usr/local/autotest/frontend/afe/model_logic.py", line 831, in smart_get
return manager.get(pk=id_or_name)
File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
return self.get_query_set().get(*args, **kwargs)
File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
(self.model._meta.object_name, kwargs))
DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2723}
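The client-side half of the traceback shows `get_broken_list` calling `last_diagnosis()` for every DUT inside a single list comprehension, so one failed host lookup aborts the whole count instead of skipping that DUT. The toy model below is a minimal sketch of that propagation under that reading of the traceback; it is not the real `lab_inventory.py` or `status_history.py` code, and `FakeHostHistory`, the local `DoesNotExist` stand-in, and host ids 101 and 102 are hypothetical.

```python
"""Toy model: one failing per-host status lookup aborts a whole count.

All names are hypothetical stand-ins loosely shaped after the traceback;
they are not the real lab_inventory.py / status_history.py code.
"""

BROKEN = 'broken'
WORKING = 'working'


class DoesNotExist(Exception):
    """Stand-in for the error the AFE RPC raises for an unknown host id."""


class FakeHostHistory(object):
    """Stand-in for a per-DUT status history object."""

    def __init__(self, host_id, known_ids, status):
        self.host_id = host_id
        self._known_ids = known_ids
        self._status = status

    def last_diagnosis(self):
        # The real code asks the AFE for the host's last status task;
        # here we simulate the server rejecting an id it cannot find.
        if self.host_id not in self._known_ids:
            raise DoesNotExist(
                "Host matching query does not exist. "
                "Lookup parameters were {'pk': %d}" % self.host_id)
        return (self._status, None)


def get_broken_list(histories):
    # Same shape as the comprehension in the traceback: an exception
    # from any single host escapes, and no partial list is returned.
    return [h for h in histories
            if h.last_diagnosis()[0] == BROKEN]


def get_broken(histories):
    return len(get_broken_list(histories))


if __name__ == '__main__':
    known = {101, 102}
    histories = [
        FakeHostHistory(101, known, BROKEN),
        FakeHostHistory(102, known, WORKING),
        FakeHostHistory(2723, known, WORKING),  # id the server can't find
    ]
    try:
        get_broken(histories)
    except DoesNotExist as e:
        print('inventory count aborted: %s' % e)
```

Running it prints the same `DoesNotExist` message for pk 2723 that ended the real run, even though the other two hosts look up fine.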
Jan 19 2017
Jan 30 2017
Feb 21 2018
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Apr 18 2018
This problem still happens. It's killed several inventory runs this week.
Here's the most recent, from about 2 hours ago:
2018-04-18 14:33:35 | ERROR | Unexpected exception: DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2566}
Traceback (most recent call last):
File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 109, in dispatchRequest
results['result'] = self.invokeServiceEndpoint(meth, args)
File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 147, in invokeServiceEndpoint
return meth(*args)
File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
return f(*args, **keyword_args)
File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1563, in get_host_status_task
host = models.Host.smart_get(host_id)
File "/usr/local/autotest/frontend/afe/model_logic.py", line 835, in smart_get
return manager.get(pk=id_or_name)
File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
return self.get_query_set().get(*args, **kwargs)
File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
(self.model._meta.object_name, kwargs))
DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2566}
Traceback (most recent call last):
File "site_utils/lab_inventory.py", line 1383, in main
_perform_inventory_reports(arguments)
File "site_utils/lab_inventory.py", line 1197, in _perform_inventory_reports
_perform_model_inventory(arguments, inventory, timestamp)
File "site_utils/lab_inventory.py", line 972, in _perform_model_inventory
model_message = _generate_model_inventory_message(inventory)
File "site_utils/lab_inventory.py", line 749, in _generate_model_inventory_message
counts.get_spares_buffer(),
File "site_utils/lab_inventory.py", line 405, in get_spares_buffer
return self.get_total(spare_pool) - self.get_broken()
File "site_utils/lab_inventory.py", line 359, in get_broken
return self._count_pool(_HostSetInventory.get_broken, pool)
File "site_utils/lab_inventory.py", line 304, in _count_pool
self._histories_by_pool.values()])
File "site_utils/lab_inventory.py", line 228, in get_broken
return len(self.get_broken_list())
File "site_utils/lab_inventory.py", line 222, in get_broken_list
if h.last_diagnosis()[0] == status_history.BROKEN]
File "/usr/local/autotest/server/lib/status_history.py", line 658, in last_diagnosis
self._init_status_task()
File "/usr/local/autotest/server/lib/status_history.py", line 587, in _init_status_task
self._afe, self._host.id, self.end_time)
File "/usr/local/autotest/server/lib/status_history.py", line 285, in get_status_task
task = afe.get_host_status_task(host_id, query_end)
File "/usr/local/autotest/server/frontend.py", line 660, in get_host_status_task
host_id=host_id, end_time=end_time)
File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 131, in run
self, call, **dargs)
File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 244, in GenericRetry
return _run()
File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 177, in _Wrapper
ret = func(*args, **kwargs)
File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 243, in _run
return functor(*args, **kwargs)
File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 94, in _run
return super(RetryingAFE, self).run(call, **dargs)
File "/usr/local/autotest/server/frontend.py", line 108, in run
result = utils.strip_unicode(rpc_call(**dargs))
File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 143, in __call__
raise BuildException(resp['error'])
JSONRPCException: DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2566}
Traceback (most recent call last):
File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 109, in dispatchRequest
results['result'] = self.invokeServiceEndpoint(meth, args)
File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 147, in invokeServiceEndpoint
return meth(*args)
File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
return f(*args, **keyword_args)
File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1563, in get_host_status_task
host = models.Host.smart_get(host_id)
File "/usr/local/autotest/frontend/afe/model_logic.py", line 835, in smart_get
return manager.get(pk=id_or_name)
File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
return self.get_query_set().get(*args, **kwargs)
File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
(self.model._meta.object_name, kwargs))
DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2566}
May 14 2018
Uh, what? I have no idea how this error can happen.
May 15 2018
Oh ye of little faith. This is real. You can see it in the logs, if our over-aggressive cleanup hasn't deleted them yet. The traceback above shows the path to the failure. I've no idea how we get into the state that triggers this, but _we do get into that state_, and _this does really happen in production_.
May 15 2018
I'm not doubting that the logs show this; I'm having trouble figuring out how this error could happen. It seems like a fundamental violation of how an RDBMS works. Edit: Oh wait, host 2566 is marked invalid. That seems obvious in hindsight. I guess the inventory script fails to skip invalid hosts at some point and then tries to get their status.
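The "invalid host" explanation can be illustrated with a toy model. This is a minimal sketch, not the AFE's code: it assumes the lookup behind `smart_get` goes through a manager that hides rows marked invalid, while the inventory's host list still contained such a row, so an invalid host looks exactly like a missing one. `Host`, `HostTable`, `valid_get`, and `status_targets` are hypothetical names, and no Django is used.

```python
"""Toy model of a lookup that only sees hosts not marked invalid.

Assumption (not confirmed by the AFE source): the status RPC's lookup
ignores invalid rows, while the inventory listing did not filter them.
All classes and functions here are hypothetical.
"""


class DoesNotExist(Exception):
    """Stand-in for Django's Model.DoesNotExist."""


class Host(object):
    def __init__(self, pk, hostname, invalid=False):
        self.pk = pk
        self.hostname = hostname
        self.invalid = invalid


class HostTable(object):
    def __init__(self, rows):
        self._rows = {h.pk: h for h in rows}

    def all_rows(self):
        # Includes invalid hosts, like a listing that forgets to filter them.
        return list(self._rows.values())

    def valid_get(self, pk):
        # Like a "valid objects only" manager: an invalid row is
        # indistinguishable from a row that does not exist at all.
        host = self._rows.get(pk)
        if host is None or host.invalid:
            raise DoesNotExist(
                "Host matching query does not exist. "
                "Lookup parameters were {'pk': %d}" % pk)
        return host


def status_targets(table):
    # One way to act on the comment above: drop invalid hosts before
    # asking for their status, so the lookup never sees them.
    return [h for h in table.all_rows() if not h.invalid]
```

With `HostTable([Host(1, 'dut-a'), Host(2566, 'dut-b', invalid=True)])`, `all_rows()` still returns host 2566, `valid_get(2566)` raises the same message seen in the April log, and `status_targets()` returns only host 1.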
May 15 2018
This isn't my area of expertise, and I'm disinclined to take ownership merely by virtue of reporting it.
Aug 9
This has been re-reported. Let's use the new bug, because there's no genuinely dispositive content here.
Comment 1 by dshi@chromium.org, Oct 25 2016