
Issue 655804

Issue metadata

Status: Duplicate
Merged: issue 872830
Owner: ----
Closed: Aug 9
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Lab inventory failed to gather status - host lookup failure

Reported by jrbarnette@chromium.org, Oct 13 2016

Issue description

The lab inventory job this morning failed while gathering
status.

Here's the headline failure:
    Host matching query does not exist. Lookup parameters were {'pk': 2723}

The failure seems to have happened trying to get status on a
Monroe DUT.  The problem didn't happen with the inventory runs
immediately preceding and following the failure.

The full log (including multiple successful runs and the failure)
is attached.

Here's the full traceback in all its glory:
====
2016-10-13 06:28:59 | DEBUG      | Listing failed DUTs for monroe
2016-10-13 06:29:09 | DEBUG      | Encountered unexpected exception <class 'autotest_lib.frontend.afe.json_rpc.proxy.JSONRPCException'>(DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2723}
Traceback (most recent call last):
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 114, in dispatchRequest
    results['result'] = self.invokeServiceEndpoint(meth, args)
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 154, in invokeServiceEndpoint
    return meth(*args)
  File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
    return f(*args, **keyword_args)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1497, in get_host_status_task
    host = models.Host.smart_get(host_id)
  File "/usr/local/autotest/frontend/afe/model_logic.py", line 831, in smart_get
    return manager.get(pk=id_or_name)
  File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
    (self.model._meta.object_name, kwargs))
DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2723}
), not retrying.
2016-10-13 06:29:09 | ERROR      | Unexpected exception: DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2723}
Traceback (most recent call last):
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 114, in dispatchRequest
    results['result'] = self.invokeServiceEndpoint(meth, args)
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 154, in invokeServiceEndpoint
    return meth(*args)
  File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
    return f(*args, **keyword_args)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1497, in get_host_status_task
    host = models.Host.smart_get(host_id)
  File "/usr/local/autotest/frontend/afe/model_logic.py", line 831, in smart_get
    return manager.get(pk=id_or_name)
  File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
    (self.model._meta.object_name, kwargs))
DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2723}
Traceback (most recent call last):
  File "site_utils/lab_inventory.py", line 1151, in main
    inventory, arguments.recommend) + '\n\n\n'
  File "site_utils/lab_inventory.py", line 670, in _generate_repair_recommendation
    if counts.get_broken() != 0:
  File "site_utils/lab_inventory.py", line 365, in get_broken
    return self._count_pool(_PoolCounts.get_broken, pool)
  File "site_utils/lab_inventory.py", line 310, in _count_pool
    for counts in self._pools.values()])
  File "site_utils/lab_inventory.py", line 231, in get_broken
    return len(self.get_broken_list())
  File "site_utils/lab_inventory.py", line 225, in get_broken_list
    if h.last_diagnosis()[0] == status_history.BROKEN]
  File "/usr/local/autotest/server/lib/status_history.py", line 571, in last_diagnosis
    self._init_status_task()
  File "/usr/local/autotest/server/lib/status_history.py", line 500, in _init_status_task
    self._afe, self._host.id, self.end_time)
  File "/usr/local/autotest/server/lib/status_history.py", line 235, in get_status_task
    task = afe.get_host_status_task(host_id, query_end)
  File "/usr/local/autotest/server/frontend.py", line 336, in get_host_status_task
    host_id=host_id, end_time=end_time)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 111, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 100, in GenericRetry
    ret = functor(*args, **kwargs)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 81, in _run
    return super(RetryingAFE, self).run(call, **dargs)
  File "/usr/local/autotest/server/frontend.py", line 103, in run
    result = utils.strip_unicode(rpc_call(**dargs))
  File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 123, in __call__
    raise BuildException(resp['error'])
JSONRPCException: DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2723}
Traceback (most recent call last):
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 114, in dispatchRequest
    results['result'] = self.invokeServiceEndpoint(meth, args)
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 154, in invokeServiceEndpoint
    return meth(*args)
  File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
    return f(*args, **keyword_args)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1497, in get_host_status_task
    host = models.Host.smart_get(host_id)
  File "/usr/local/autotest/frontend/afe/model_logic.py", line 831, in smart_get
    return manager.get(pk=id_or_name)
  File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
    (self.model._meta.object_name, kwargs))
DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2723}


 
lab-inventory.log (239 KB)

Comment 1 by dshi@chromium.org, Oct 25 2016

Owner: xixuan@chromium.org
Assign to deputy, seems to be some db inconsistency issue.

Comment 2 by xixuan@chromium.org, Jan 19 2017

Labels: Hotlist-Fixit

Comment 3 by xixuan@chromium.org, Jan 30 2017

Owner: ----

Comment 4 by sheriffbot@chromium.org, Feb 21 2018

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
This problem still happens.  It's killed several inventory runs this week.

Here's the most recent, from about 2 hours ago:
2018-04-18 14:33:35 | ERROR      | Unexpected exception: DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2566}
Traceback (most recent call last):
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 109, in dispatchRequest
    results['result'] = self.invokeServiceEndpoint(meth, args)
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 147, in invokeServiceEndpoint
    return meth(*args)
  File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
    return f(*args, **keyword_args)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1563, in get_host_status_task
    host = models.Host.smart_get(host_id)
  File "/usr/local/autotest/frontend/afe/model_logic.py", line 835, in smart_get
    return manager.get(pk=id_or_name)
  File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
    (self.model._meta.object_name, kwargs))
DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2566}
Traceback (most recent call last):
  File "site_utils/lab_inventory.py", line 1383, in main
    _perform_inventory_reports(arguments)
  File "site_utils/lab_inventory.py", line 1197, in _perform_inventory_reports
    _perform_model_inventory(arguments, inventory, timestamp)
  File "site_utils/lab_inventory.py", line 972, in _perform_model_inventory
    model_message = _generate_model_inventory_message(inventory)
  File "site_utils/lab_inventory.py", line 749, in _generate_model_inventory_message
    counts.get_spares_buffer(),
  File "site_utils/lab_inventory.py", line 405, in get_spares_buffer
    return self.get_total(spare_pool) - self.get_broken()
  File "site_utils/lab_inventory.py", line 359, in get_broken
    return self._count_pool(_HostSetInventory.get_broken, pool)
  File "site_utils/lab_inventory.py", line 304, in _count_pool
    self._histories_by_pool.values()])
  File "site_utils/lab_inventory.py", line 228, in get_broken
    return len(self.get_broken_list())
  File "site_utils/lab_inventory.py", line 222, in get_broken_list
    if h.last_diagnosis()[0] == status_history.BROKEN]
  File "/usr/local/autotest/server/lib/status_history.py", line 658, in last_diagnosis
    self._init_status_task()
  File "/usr/local/autotest/server/lib/status_history.py", line 587, in _init_status_task
    self._afe, self._host.id, self.end_time)
  File "/usr/local/autotest/server/lib/status_history.py", line 285, in get_status_task
    task = afe.get_host_status_task(host_id, query_end)
  File "/usr/local/autotest/server/frontend.py", line 660, in get_host_status_task
    host_id=host_id, end_time=end_time)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 131, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 244, in GenericRetry
    return _run()
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 177, in _Wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 243, in _run
    return functor(*args, **kwargs)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 94, in _run
    return super(RetryingAFE, self).run(call, **dargs)
  File "/usr/local/autotest/server/frontend.py", line 108, in run
    result = utils.strip_unicode(rpc_call(**dargs))
  File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 143, in __call__
    raise BuildException(resp['error'])
JSONRPCException: DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2566}
Traceback (most recent call last):
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 109, in dispatchRequest
    results['result'] = self.invokeServiceEndpoint(meth, args)
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 147, in invokeServiceEndpoint
    return meth(*args)
  File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 270, in new_fn
    return f(*args, **keyword_args)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 1563, in get_host_status_task
    host = models.Host.smart_get(host_id)
  File "/usr/local/autotest/frontend/afe/model_logic.py", line 835, in smart_get
    return manager.get(pk=id_or_name)
  File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
    (self.model._meta.object_name, kwargs))
DoesNotExist: Host matching query does not exist. Lookup parameters were {'pk': 2566}

Cc: jrbarnette@chromium.org ayatane@chromium.org
Status: Unconfirmed (was: Untriaged)
Uh, what?  I have no idea how this error can happen.
Status: Untriaged (was: Unconfirmed)
Oh ye of little faith.

This is real.  You can see it in the logs, if our over-aggressive
cleanup hasn't deleted them yet.  The traceback above shows the path
to the failure.  I've no idea how we get into the state that triggers
this, but _we do get into that state_, and _this does really happen
in production_.
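
For illustration, the traceback shows the broken-host count being built by a list comprehension that calls last_diagnosis() on every host, so a single failed lookup aborts the whole inventory run. Below is a minimal sketch of containing the failure per host instead. All names here (HostLookupError, the toy last_diagnosis, count_broken) are hypothetical stand-ins, not the autotest code, and whether skipping a host is the right behavior is an assumption:

```python
class HostLookupError(Exception):
    """Hypothetical stand-in for the JSONRPCException seen in the log."""


def last_diagnosis(host_id, known_hosts):
    """Toy lookup: raise if the host id is unknown, as the RPC did."""
    if host_id not in known_hosts:
        raise HostLookupError(
            "Host matching query does not exist. "
            "Lookup parameters were {'pk': %d}" % host_id)
    return known_hosts[host_id]


def count_broken(host_ids, known_hosts):
    """Count hosts diagnosed 'broken', skipping ids that fail lookup.

    Returns (broken_count, skipped_ids) so the caller can still report
    which hosts could not be queried.
    """
    broken = 0
    skipped = []
    for host_id in host_ids:
        try:
            status = last_diagnosis(host_id, known_hosts)
        except HostLookupError:
            # One bad host no longer kills the whole count.
            skipped.append(host_id)
            continue
        if status == 'broken':
            broken += 1
    return broken, skipped


# Made-up data: host 2566 is missing from the lookup table.
known = {1: 'working', 2: 'broken'}
print(count_broken([1, 2, 2566], known))
```

Returning the skipped ids alongside the count keeps the anomaly visible in the report rather than silently hiding it.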

Owner: jrbarnette@chromium.org
Status: Assigned (was: Untriaged)
I'm not doubting that the logs show this; I'm having trouble figuring out how this error could happen.  It seems like a fundamental violation of how an RDBMS works.

Edit: Oh wait, host 2566 is marked invalid.  That seems obvious in hindsight.  I guess the inventory script fails to filter out invalid hosts at some point and then tries to get their status.
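
A minimal sketch of the filtering that hypothesis implies, assuming each host record exposes an `invalid` flag. The Host tuple, the valid_hosts helper, and the hostnames are hypothetical and only illustrate the idea; this is not the inventory script's actual code:

```python
from collections import namedtuple

# Hypothetical stand-in for an AFE host record; real host objects
# carry more fields. `invalid` mirrors the flag mentioned above.
Host = namedtuple('Host', ['id', 'hostname', 'invalid'])


def valid_hosts(hosts):
    """Return only the hosts that are not marked invalid."""
    return [h for h in hosts if not h.invalid]


# Made-up inventory; only the id 2566 comes from the log above.
hosts = [
    Host(2565, 'dut-a', False),
    Host(2566, 'dut-b', True),   # marked invalid, as in the failure
    Host(2567, 'dut-c', False),
]

# Only hosts that pass the filter would be queried for status.
print([h.id for h in valid_hosts(hosts)])
```

Applying the filter once, where the host list is first gathered, would keep invalid hosts out of every later status query.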
Owner: ----
Status: Available (was: Assigned)
This isn't my area of expertise, and I'm disinclined to take ownership
merely by virtue of reporting it.

Mergedinto: 872830
Status: Duplicate (was: Available)
This has been re-reported.  Let's use the new bug, because there's
no genuinely dispositive content here.
