
Issue 603247

Starred by 3 users

Issue metadata

Status: Duplicate
Merged: issue 609535
Owner:
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 609535




provision_Firmware: Label matching query does not exist

Project Member Reported by waihong@chromium.org, Apr 13 2016

Issue description

Many firmware provisioning runs failed while removing a label from the host:

DoesNotExist: Label matching query does not exist. Lookup parameters were {'pk': 350225}

Check:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row4-rack10-host14/804120-provision/20161304130204/
 
Pasted the traceback:

04/13 13:03:57.708 ERROR|provision_Firmware:0061| JSONRPCException: DoesNotExist: Label matching query does not exist. Lookup parameters were {'pk': 350225}
Traceback (most recent call last):
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 114, in dispatchRequest
    results['result'] = self.invokeServiceEndpoint(meth, args)
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 154, in invokeServiceEndpoint
    return meth(*args)
  File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 125, in new_fn
    return f(*args, **keyword_args)
  File "/usr/local/autotest/frontend/afe/rpc_utils.py", line 1322, in replacement
    return func(**kwargs)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 214, in label_remove_hosts
    remove_label_from_hosts(id, hosts)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 201, in remove_label_from_hosts
    models.Label.smart_get(id).host_set.remove(*host_objs)
  File "/usr/local/autotest/frontend/afe/model_logic.py", line 846, in smart_get
    return manager.get(pk=id_or_name)
  File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
    (self.model._meta.object_name, kwargs))
DoesNotExist: Label matching query does not exist. Lookup parameters were {'pk': 350225}
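The bottom frames show where the error comes from: `smart_get` ends in a primary-key lookup (`manager.get(pk=id_or_name)`), so any label ID that has no matching row in the database being queried raises `DoesNotExist`. The sketch below reproduces that failure with a plain dictionary standing in for the master's `afe_labels` table. `DoesNotExist`, `smart_get`, and the table contents are simplified, hypothetical stand-ins (assuming the real helper treats integers as IDs and strings as names), not the autotest or Django code.

```python
class DoesNotExist(Exception):
    """Hypothetical stand-in for Django's Label.DoesNotExist."""


# Hypothetical stand-in for the master's afe_labels table: {id: name}.
# In this sketch, label 350225 exists only on the shard, so it is absent here.
MASTER_LABELS = {
    328350: "cros-version:chell-release/R50-7910.0.0",
}


def smart_get(table, id_or_name):
    """Simplified lookup: integers are treated as primary keys, strings as names."""
    if isinstance(id_or_name, int):
        if id_or_name not in table:
            raise DoesNotExist("Label matching query does not exist. "
                               "Lookup parameters were {'pk': %d}" % id_or_name)
        return id_or_name, table[id_or_name]
    for pk, name in table.items():
        if name == id_or_name:
            return pk, name
    raise DoesNotExist("Label matching query does not exist. "
                       "Lookup parameters were {'name': %r}" % id_or_name)


try:
    smart_get(MASTER_LABELS, 350225)
except DoesNotExist as e:
    print("DoesNotExist:", e)
```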

Comment 2 by dshi@chromium.org, Apr 13 2016

This looks like a shard out-of-sync issue: the shard (server14) had the label under a different ID. I manually cleaned up the duplicated label, so the next run should be fine.

Comment 3 by dshi@chromium.org, Apr 13 2016

Cc: shuqianz@chromium.org
Note that some other DUTs have the same issue. Please clean them up.
It seems we need to clean up the shard database. You can use the script site_utils/setup_db.sh.

Comment 6 by dshi@chromium.org, Apr 13 2016

I cleaned up another label, which was associated with 3 other hosts.

Comment 7 by dshi@chromium.org, Apr 14 2016

Status: Fixed (was: Assigned)

Comment 8 by dshi@chromium.org, Apr 16 2016

Status: Assigned (was: Fixed)
It's still happening:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row12-rack3-host11/557117-provision/20161604163029/

I manually deleted the entry in the shard DB so that provisioning can proceed.
On the shard DB:
mysql> select * from afe_labels where id>=328352 and id <=328355;
+--------+----------------------------------------------+---------------+----------+---------+----------------+-----------------+
| id     | name                                         | kernel_config | platform | invalid | only_if_needed | atomic_group_id |
+--------+----------------------------------------------+---------------+----------+---------+----------------+-----------------+
| 328352 | fwrw-version:gnawty-firmware/R34-5216.239.16 |               |        0 |       0 |              0 |            NULL |
+--------+----------------------------------------------+---------------+----------+---------+----------------+-----------------+
1 row in set (0.00 sec)

On the master DB:
mysql> select * from afe_labels where id>=328350 and id <=328355;
+--------+-----------------------------------------------------------------------+---------------+----------+---------+----------------+-----------------+
| id     | name                                                                  | kernel_config | platform | invalid | only_if_needed | atomic_group_id |
+--------+-----------------------------------------------------------------------+---------------+----------+---------+----------------+-----------------+
| 328350 | cros-version:chell-release/R50-7910.0.0                               |               |        0 |       0 |              0 |            NULL |
| 328353 | cros-version:peach_pit-tot-chrome-pfq-informational/R50-7910.0.0-b601 |               |        0 |       0 |              0 |            NULL |
| 328354 | cros-version:peach_pit-paladin/R50-7910.0.0-rc3                       |               |        0 |       0 |              0 |            NULL |
+--------+-----------------------------------------------------------------------+---------------+----------+---------+----------------+-----------------+
3 rows in set (0.00 sec)

It looks as if the write to the master DB failed while the write to the shard DB succeeded.
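The two queries above can be compared mechanically. A minimal sketch, assuming the rows have already been fetched into `{id: name}` dictionaries (the helper name `diff_labels` is hypothetical), that reports label IDs present only on the shard and IDs whose names disagree between the two databases:

```python
def diff_labels(master, shard):
    """Compare {label_id: name} maps fetched from the master and shard afe_labels tables.

    A shard normally holds only a subset of the master's labels, so IDs that
    exist only on the master are expected and are not reported.
    """
    # Rows the shard has but the master lacks: an RPC to the master that
    # looks up one of these IDs fails with DoesNotExist.
    shard_only = sorted(set(shard) - set(master))
    # Same ID on both sides, but a different label name.
    name_mismatch = sorted(
        label_id for label_id in set(shard) & set(master)
        if shard[label_id] != master[label_id])
    return {"shard_only": shard_only, "name_mismatch": name_mismatch}


# Rows copied from the two queries above.
shard_rows = {328352: "fwrw-version:gnawty-firmware/R34-5216.239.16"}
master_rows = {
    328350: "cros-version:chell-release/R50-7910.0.0",
    328353: "cros-version:peach_pit-tot-chrome-pfq-informational/R50-7910.0.0-b601",
    328354: "cros-version:peach_pit-paladin/R50-7910.0.0-rc3",
}
print(diff_labels(master_rows, shard_rows))
# {'shard_only': [328352], 'name_mismatch': []}
```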

Comment 9 by dshi@chromium.org, Apr 17 2016

Cc: fdeng@chromium.org
+shuqianz, fdeng

Possible sharding bug. We need to make sure the label is created in the master DB first.
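One way to enforce that ordering is to let the master assign the ID, confirm the row is readable there, and only then insert it on the shard under that explicit ID, so the shard never allocates an ID of its own. A minimal sketch using two in-memory `sqlite3` databases as stand-ins for the master and shard MySQL databases; the trimmed `afe_labels` schema and the `create_label` helper are hypothetical and are not the autotest shard-sync code.

```python
import sqlite3

# Hypothetical, trimmed-down afe_labels schema (the real table has more columns).
SCHEMA = "CREATE TABLE afe_labels (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"


def create_label(master, shard, name):
    """Create a label on the master first, then copy it to the shard under the same id."""
    # 1. Write to the master; `with` commits on success and rolls back on error.
    with master:
        label_id = master.execute(
            "INSERT INTO afe_labels (name) VALUES (?)", (name,)).lastrowid
    # 2. Read the row back from the master before anything touches the shard.
    row = master.execute(
        "SELECT id FROM afe_labels WHERE id = ?", (label_id,)).fetchone()
    if row is None:
        raise RuntimeError("label %r missing on master; shard not updated" % name)
    # 3. Insert on the shard with the master's id, never a locally allocated one.
    with shard:
        shard.execute(
            "INSERT INTO afe_labels (id, name) VALUES (?, ?)", (label_id, name))
    return label_id


master = sqlite3.connect(":memory:")
shard = sqlite3.connect(":memory:")
for db in (master, shard):
    db.execute(SCHEMA)

label_id = create_label(master, shard, "fwrw-version:gnawty-firmware/R34-5216.239.16")
print(label_id, shard.execute("SELECT name FROM afe_labels").fetchall())
```

In this sketch a failed master write raises before the shard is touched, so the shard cannot end up holding a label ID the master lacks, which is the state observed in the queries from comment 8.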
Cc: kevcheng@chromium.org
+kevcheng@chromium.org

Is this resolved? I know we rolled back automatic label creation because of this problem.
This is actually a separate problem, but I do think auto-labeling will exacerbate it. I also think there is an issue with the sharded label RPCs. I noticed this in the sentinel stdout:

2016-05-03 15:48:52,657.657 INFO | Syncing chromeos-server12.cbf.corp.google.com ..
2016-05-03 15:52:21,295.295 ERROR| Inconsistent HostLabels: {'host_id': 4784L, 'label_ids': set([9952L, 102401L, 362011L, 102628L, 81094L, 90662L, 42855L, 152520L, 357321L, 252106L, 104683L, 42856L, 210L, 361985L, 362199L, 230048L, 25948L, 10974L])} != {'host_id': 4784L, 'label_ids': set([9952L, 102401L, 362011L, 102628L, 81094L, 90662L, 42855L, 152520L, 357321L, 252106L, 104683L, 42856L, 210L, 361985L, 362199L, 215L, 230048L, 25948L, 10974L])}

In this instance, the servo label (215L) was on the shard but not the master.
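The sentinel line compares the label-ID set for host 4784 on one side with the same set on the other. Taking set differences makes the mismatch explicit. This sketch copies the IDs from the log (dropping Python 2's `L` suffix) and assumes, going by the comment above, that the right-hand set is the shard's; `host_label_diff` is a hypothetical helper name.

```python
def host_label_diff(master_ids, shard_ids):
    """Return (only_on_master, only_on_shard) label IDs for one host."""
    return sorted(master_ids - shard_ids), sorted(shard_ids - master_ids)


# Label IDs for host 4784, copied from the sentinel log line.
master_ids = {9952, 102401, 362011, 102628, 81094, 90662, 42855, 152520, 357321,
              252106, 104683, 42856, 210, 361985, 362199, 230048, 25948, 10974}
shard_ids = {9952, 102401, 362011, 102628, 81094, 90662, 42855, 152520, 357321,
             252106, 104683, 42856, 210, 361985, 362199, 215, 230048, 25948, 10974}

only_master, only_shard = host_label_diff(master_ids, shard_ids)
print("host 4784: only on master", only_master, "only on shard", only_shard)
# host 4784: only on master [] only on shard [215]
```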

Comment 12 by dshi@chromium.org, May 5 2016

The original bug was caused by Kevin's CL, which has since been fixed (see bug 603420).

There is still a potential label-mismatch issue; I opened a new bug for it: bug 609535

Comment 13 by dshi@chromium.org, May 5 2016

Status: Fixed (was: Assigned)
Status: Assigned (was: Fixed)
It happened again on some hosts, such as chromeos1-row1-rack1-host6 and chromeos1-row1-rack2-host3:
  https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos1-row1-rack1-host6/825579-repair/20161605141822/debug/
  https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos1-row1-rack2-host3/362579-repair/20161403153024/debug/

Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 395, in _repair_host
    self.repair(host)
  File "/usr/local/autotest/server/hosts/cros_repair.py", line 280, in repair
    host.firmware_install()
  File "/usr/local/autotest/server/hosts/cros_host.py", line 891, in firmware_install
    self._clear_fw_version_labels(rw_only)
  File "/usr/local/autotest/server/hosts/cros_host.py", line 817, in _clear_fw_version_labels
    label.remove_hosts(hosts=[self.hostname])
  File "/usr/local/autotest/server/frontend.py", line 847, in remove_hosts
    return self.afe.run('label_remove_hosts', id=self.id, hosts=hosts)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 111, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 71, in GenericRetry
    return functor(*args, **kwargs)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 81, in _run
    return super(RetryingAFE, self).run(call, **dargs)
  File "/usr/local/autotest/server/frontend.py", line 103, in run
    result = utils.strip_unicode(rpc_call(**dargs))
  File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 134, in __call__
    raise BuildException(resp['error'])
JSONRPCException: JSONRPCException: DoesNotExist: Label matching query does not exist. Lookup parameters were {'pk': 348415}
Blockedon: 609535
Looking at the code, the problem might be an underlying Django/MySQL bug. When a label is created, it is written to the DB and then read back to get the label's ID. Given Dan's debugging in #8, if the label IDs match (which we assume, since the entry is missing from the master DB), then the write to the master DB must have failed in some way. Next steps are to dig through the Django/MySQL logs for anything interesting and go from there.

Debugging progress will be reported on the blocking bug to keep all work on this in one place.
Kevin, should you own this bug? Is there work to be done separately from issue 609535?
I'm thinking we might want to merge these two bugs and make Xixuan the owner.

Dan, what do you think?

Comment 18 by dshi@chromium.org, Jun 2 2016

Mergedinto: 609535
Status: Duplicate (was: Assigned)
merged
