provision_Firmware: Label matching query does not exist
Issue description
Many firmware provision jobs failed while removing a label from the host:
DoesNotExist: Label matching query does not exist. Lookup parameters were {'pk': 350225}
Check:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row4-rack10-host14/804120-provision/20161304130204/
,
Apr 13 2016
It seems to be some kind of shard out-of-sync issue. The shard (server14) got a label with a different ID. I manually cleaned up that duplicated label. The next run should be fine.
,
Apr 13 2016
,
Apr 13 2016
Note that some other DUTs have the same issue. Please clean them up.
,
Apr 13 2016
Seems like we need to clean the shard database. The script you can use is under site_utils/setup_db.sh
,
Apr 13 2016
I cleaned up another label, which is related to 3 other hosts.
,
Apr 14 2016
,
Apr 16 2016
It's still happening: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row12-rack3-host11/557117-provision/20161604163029/

I manually deleted the entry in the shard db so it can move on.

On shard db:

mysql> select * from afe_labels where id>=328352 and id <=328355;
+--------+----------------------------------------------+---------------+----------+---------+----------------+-----------------+
| id     | name                                         | kernel_config | platform | invalid | only_if_needed | atomic_group_id |
+--------+----------------------------------------------+---------------+----------+---------+----------------+-----------------+
| 328352 | fwrw-version:gnawty-firmware/R34-5216.239.16 |               |        0 |       0 |              0 |            NULL |
+--------+----------------------------------------------+---------------+----------+---------+----------------+-----------------+
1 row in set (0.00 sec)

On master db:

mysql> select * from afe_labels where id>=328350 and id <=328355;
+--------+-----------------------------------------------------------------------+---------------+----------+---------+----------------+-----------------+
| id     | name                                                                  | kernel_config | platform | invalid | only_if_needed | atomic_group_id |
+--------+-----------------------------------------------------------------------+---------------+----------+---------+----------------+-----------------+
| 328350 | cros-version:chell-release/R50-7910.0.0                               |               |        0 |       0 |              0 |            NULL |
| 328353 | cros-version:peach_pit-tot-chrome-pfq-informational/R50-7910.0.0-b601 |               |        0 |       0 |              0 |            NULL |
| 328354 | cros-version:peach_pit-paladin/R50-7910.0.0-rc3                       |               |        0 |       0 |              0 |            NULL |
+--------+-----------------------------------------------------------------------+---------------+----------+---------+----------------+-----------------+
3 rows in set (0.00 sec)

It almost seems like the write to master db failed, but to shard db succeeded.
,
Apr 17 2016
+shuqianz, fdeng: possible sharding bug. We need to make sure the master db has the label created first.
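A minimal sketch of the "master first" ordering suggested above, using in-memory stand-ins rather than the real databases. FakeLabelDB and create_label_master_first are hypothetical names for illustration only; how autotest actually creates labels and replicates them to shards is not shown in this thread.

```python
class FakeLabelDB:
    """Hypothetical in-memory stand-in for an afe_labels table."""

    def __init__(self, fail_writes=False):
        self.rows = {}                  # label id -> label name
        self.fail_writes = fail_writes  # simulate a failing write

    def insert(self, label_id, name):
        if self.fail_writes:
            raise RuntimeError("simulated write failure")
        self.rows[label_id] = name


def create_label_master_first(master, shard, label_id, name):
    """Write the label to the master, then mirror it to the shard.

    If the master write raises, the shard is never touched, so the shard
    cannot end up holding a label the master does not know about.
    """
    master.insert(label_id, name)
    shard.insert(label_id, name)
```

With this ordering a failed master write surfaces as an error at creation time, instead of later as a DoesNotExist lookup when the label is removed from a host.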
,
May 4 2016
+ kevcheng@chromium.org, is this resolved? I know we rolled back auto label creation due to this problem
,
May 4 2016
This is actually a separate problem but I do think the auto-labeling will exacerbate this issue. I do think there is an issue with the sharded label rpcs. I noticed from the sentinel stdout:

2016-05-03 15:48:52,657.657 INFO | Syncing chromeos-server12.cbf.corp.google.com ..
2016-05-03 15:52:21,295.295 ERROR| Inconsistent HostLabels: {'host_id': 4784L, 'label_ids': set([9952L, 102401L, 362011L, 102628L, 81094L, 90662L, 42855L, 152520L, 357321L, 252106L, 104683L, 42856L, 210L, 361985L, 362199L, 230048L, 25948L, 10974L])} != {'host_id': 4784L, 'label_ids': set([9952L, 102401L, 362011L, 102628L, 81094L, 90662L, 42855L, 152520L, 357321L, 252106L, 104683L, 42856L, 210L, 361985L, 362199L, 215L, 230048L, 25948L, 10974L])}

In this instance, the servo label (215L) was on the shard but not the master.
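For reference, the mismatched label can be picked out of an "Inconsistent HostLabels" line with a plain set difference. This is only a sketch: diff_label_ids is a hypothetical helper, the IDs are copied from the log line above (written without the Python 2 L suffix), and which side is master and which is shard is taken from the comment, not from the log format.

```python
def diff_label_ids(first, second):
    """Return the label IDs unique to each side of an inconsistent pair."""
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
    }


# Label IDs for host_id 4784, left-hand side of the "!=" in the log line.
first = {
    9952, 102401, 362011, 102628, 81094, 90662, 42855, 152520, 357321,
    252106, 104683, 42856, 210, 361985, 362199, 230048, 25948, 10974,
}
# The right-hand side is identical except for the extra label 215.
second = first | {215}

print(diff_label_ids(first, second))
# -> {'only_in_first': [], 'only_in_second': [215]}
```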
,
May 5 2016
The original bug was caused by Kevin's CL, which is fixed (refer to bug 603420). There is still a potential label mismatch issue; I opened a new bug for that: bug 609535
,
May 5 2016
,
May 16 2016
It happened again at some hosts, like chromeos1-row1-rack1-host6, chromeos1-row1-rack2-host3:

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos1-row1-rack1-host6/825579-repair/20161605141822/debug/
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos1-row1-rack2-host3/362579-repair/20161403153024/debug/

Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 395, in _repair_host
    self.repair(host)
  File "/usr/local/autotest/server/hosts/cros_repair.py", line 280, in repair
    host.firmware_install()
  File "/usr/local/autotest/server/hosts/cros_host.py", line 891, in firmware_install
    self._clear_fw_version_labels(rw_only)
  File "/usr/local/autotest/server/hosts/cros_host.py", line 817, in _clear_fw_version_labels
    label.remove_hosts(hosts=[self.hostname])
  File "/usr/local/autotest/server/frontend.py", line 847, in remove_hosts
    return self.afe.run('label_remove_hosts', id=self.id, hosts=hosts)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 111, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 71, in GenericRetry
    return functor(*args, **kwargs)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 81, in _run
    return super(RetryingAFE, self).run(call, **dargs)
  File "/usr/local/autotest/server/frontend.py", line 103, in run
    result = utils.strip_unicode(rpc_call(**dargs))
  File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 134, in __call__
    raise BuildException(resp['error'])
JSONRPCException: JSONRPCException: DoesNotExist: Label matching query does not exist. Lookup parameters were {'pk': 348415}
,
May 16 2016
Looking at the code, the problem might be an underlying django/mysql bug. When a label is created, it is added to the db and then retrieved to get the id of the label. Given Dan's debug in #8, if the label ids match up (that's assumed, since the entry is missing in the master db), then that implies the write to the master db failed in some way. Next steps are to dig through the django/mysql logs to see if there's anything interesting and go from there. Debugging results will be reported on the blocking bug to keep all work on this centralized.
,
Jun 1 2016
Kevin, should you own this bug? Is there work to be done separately from issue 609535 ?
,
Jun 2 2016
I'm thinking we might want to merge these two bugs and have Xixuan as the owner. Dan, what do you think?
,
Jun 2 2016
Comment 1 by waihong@chromium.org
, Apr 13 2016
Pasted the traceback:

04/13 13:03:57.708 ERROR|provision_Firmware:0061| JSONRPCException: DoesNotExist: Label matching query does not exist. Lookup parameters were {'pk': 350225}
Traceback (most recent call last):
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 114, in dispatchRequest
    results['result'] = self.invokeServiceEndpoint(meth, args)
  File "/usr/local/autotest/frontend/afe/json_rpc/serviceHandler.py", line 154, in invokeServiceEndpoint
    return meth(*args)
  File "/usr/local/autotest/frontend/afe/rpc_handler.py", line 125, in new_fn
    return f(*args, **keyword_args)
  File "/usr/local/autotest/frontend/afe/rpc_utils.py", line 1322, in replacement
    return func(**kwargs)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 214, in label_remove_hosts
    remove_label_from_hosts(id, hosts)
  File "/usr/local/autotest/frontend/afe/rpc_interface.py", line 201, in remove_label_from_hosts
    models.Label.smart_get(id).host_set.remove(*host_objs)
  File "/usr/local/autotest/frontend/afe/model_logic.py", line 846, in smart_get
    return manager.get(pk=id_or_name)
  File "/usr/local/autotest/site-packages/django/db/models/manager.py", line 143, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/autotest/site-packages/django/db/models/query.py", line 389, in get
    (self.model._meta.object_name, kwargs))
DoesNotExist: Label matching query does not exist. Lookup parameters were {'pk': 350225}