Label created on shard but not master
Issue description
The label creation logic appears to be flaky: a label can exist on a shard but not on the master. See comments 8 and 11 in bug 603247.
Autotest creates labels through RPC, and the code path should hit the method _create_label_everywhere whenever a label needs to be created. It first creates the label on the master, then fans the RPC out to the shard and creates the label there. For some unknown reason, a label was created on the shard but not on the master. It almost looks as if Django finished the call to create the label on the master DB and got the right label ID, which was then used to create the label on the shard, but the write to the master DB failed silently.
https://bugs.chromium.org/p/chromium/issues/detail?id=603247#c8
There is a gap in the label IDs in the master database. The other possible cause is that someone manually deleted the label with a SQL command.
,
May 5 2016
xixuan plans to work on resolving the inconsistency between the master DB and the shard DB.
,
May 16 2016
,
May 31 2016
I'm also seeing this on one of my boards.
DoesNotExist: Label matching query does not exist. Lookup parameters were {'pk': 348415}
What's the proper workaround for this? Clean the shard db? To be honest, I don't even know what that pk label represents.
,
May 31 2016
which shard is affected?
,
May 31 2016
xixuan@, I think: chromeos-server33.cbf.corp.google.com. The host is chromeos1-row1-rack1-host4.
,
May 31 2016
Hi, the pk is the label ID. I manually checked the shard and master databases: the shard has the label ('348415', 'fwrw-version:samus-firmware/R39-6300.102.0'), but the master does not.
I manually deleted the label from the shard database. Hope that works.
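The manual check described above (label present on the shard, absent on the master) can be sketched as a set difference over label IDs. This is only an illustration under stated assumptions: in-memory SQLite databases stand in for the real master and shard DBs, and the afe_labels table layout, the helper names make_db and find_orphaned_labels, and the 'board:samus' row are hypothetical, not taken from the Autotest code.

```python
import sqlite3


def make_db(labels):
    """Build an in-memory stand-in for an AFE database (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE afe_labels (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO afe_labels (id, name) VALUES (?, ?)", labels)
    return conn


def find_orphaned_labels(master, shard):
    """Return (id, name) rows that exist on the shard but not on the master."""
    master_ids = {row[0] for row in master.execute("SELECT id FROM afe_labels")}
    shard_rows = shard.execute("SELECT id, name FROM afe_labels ORDER BY id")
    return [row for row in shard_rows if row[0] not in master_ids]


# Illustrative data: the 348415 row is the one reported in this bug;
# the 348414 row is made up so that both databases share one label.
master = make_db([(348414, "board:samus")])
shard = make_db([
    (348414, "board:samus"),
    (348415, "fwrw-version:samus-firmware/R39-6300.102.0"),
])
orphans = find_orphaned_labels(master, shard)
print(orphans)
```

Any row this returns is a candidate for the manual shard cleanup; it says nothing about why the master row is missing.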
,
May 31 2016
Thanks! It seems to have gotten further now.
,
Jun 2 2016
Issue 603247 has been merged into this issue.
,
Jun 2 2016
,
Jun 3 2016
This issue keeps happening on multiple hosts and blocks FAFT from running. Check the provision_FirmwareUpdate job of the following hosts:
http://cautotest/afe/#tab_id=view_host&object_id=3627
http://cautotest/afe/#tab_id=view_host&object_id=3655
http://cautotest/afe/#tab_id=view_host&object_id=4269
http://cautotest/afe/#tab_id=view_host&object_id=4272
http://cautotest/afe/#tab_id=view_host&object_id=4271
http://cautotest/afe/#tab_id=view_host&object_id=3783
http://cautotest/afe/#tab_id=view_host&object_id=2925
I haven't checked all hosts in the FAFT pool. Other hosts may have the same issue.
,
Jun 8 2016
Any update? Or could someone do a quick fix for the hosts mentioned in c#11? autumn@, please prioritize this bug.
,
Jun 8 2016
dshi@, could you do a quick fix by manually cleaning up the unsynced entries in the databases, like what you did in issue 603247 #c2?
https://bugs.chromium.org/p/chromium/issues/detail?id=603247#c2
More failed hosts:
http://cautotest/afe/#tab_id=view_host&object_id=2300
http://cautotest/afe/#tab_id=view_host&object_id=2302
,
Jun 8 2016
Re #13, that bug is about duplicated labels. This one is about labels missing on the master. Xixuan did the last cleanup (by removing the label on the shard).
,
Jun 8 2016
I don't know much about the cause, since issue 603247 was merged into this bug. What's the ETA for the above hosts being workable?
,
Jun 8 2016
Update: For shard chromeos-server14.mtv.corp.google.com:
delete label id 362623, affected hosts:
chromeos4-row4-rack10-host14
chromeos4-row4-rack11-host14
chromeos4-row4-rack10-host14
chromeos4-row4-rack12-host17
delete label id 362846, affected hosts:
chromeos4-row6-rack11-host11
chromeos4-row6-rack11-host12
chromeos4-row6-rack11-host13
chromeos4-row4-rack13-host11
chromeos4-row4-rack13-host13
,
Jun 8 2016
Update: For shard chromeos-server27.mtv.corp.google.com:
delete label id 358557, affected hosts:
chromeos4-row6-rack2-host11
chromeos4-row6-rack2-host13
chromeos4-row6-rack1-host13
chromeos4-row6-rack1-host11
,
Jun 8 2016
For host http://cautotest/afe/#tab_id=view_host&object_id=2925, I don't see the same error (cannot find the key when deleting labels for a host), so it may not be related to this bug.
The manual process is:
1. Check whether the master DB has the label. Probably not.
2. Check whether the shard DB has the label. Probably yes.
3. Manually delete the label from the shard DB, along with the rows that reference it through foreign keys.
If this happens many more times before the bug is completely fixed, a temporary solution may help. For example: when deleting a label, if the master DB doesn't have it, don't raise an error; just delete it directly from the shard DB.
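A minimal sketch of the temporary workaround suggested above, assuming a simplified schema: delete the label on the master only if it exists there, and always delete it (plus its host-label rows) on the shard instead of raising. In-memory SQLite databases stand in for the real master and shard DBs. The table layouts, the helper names, the host IDs 1 and 2, and the label name are hypothetical and only for illustration; the real Autotest RPC code path is not shown here.

```python
import sqlite3

# Assumed, simplified schema: one labels table and one host-to-label join table.
SCHEMA = """
CREATE TABLE afe_labels (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE afe_hosts_labels (host_id INTEGER, label_id INTEGER);
"""


def make_db(labels=(), host_labels=()):
    """Build an in-memory stand-in database with the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO afe_labels VALUES (?, ?)", labels)
    conn.executemany("INSERT INTO afe_hosts_labels VALUES (?, ?)", host_labels)
    return conn


def label_exists(conn, label_id):
    row = conn.execute(
        "SELECT 1 FROM afe_labels WHERE id = ?", (label_id,)).fetchone()
    return row is not None


def delete_label(conn, label_id):
    """Delete the referencing rows first, then the label, in one transaction."""
    with conn:
        conn.execute(
            "DELETE FROM afe_hosts_labels WHERE label_id = ?", (label_id,))
        conn.execute("DELETE FROM afe_labels WHERE id = ?", (label_id,))


def delete_label_everywhere(master, shard, label_id):
    """Tolerant delete: a label missing on the master is not an error.

    Returns True if the master had the label, False otherwise.
    """
    on_master = label_exists(master, label_id)
    if on_master:
        delete_label(master, label_id)
    # Always clean up the shard, even when the master never had the label.
    delete_label(shard, label_id)
    return on_master


# Illustrative case: the label exists only on the shard, attached to two hosts.
master = make_db()
shard = make_db([(362623, "example-label")], [(1, 362623), (2, 362623)])
had_master = delete_label_everywhere(master, shard, 362623)
```

Returning whether the master had the label lets a caller log the inconsistency rather than hide it, which seems useful while the root cause is still unknown.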
,
Mar 19 2018
Comment 1 by dshi@chromium.org, May 5 2016