
Issue 609535

Starred by 5 users

Issue metadata

Status: WontFix
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 603247




Label created on shard but not master

Project Member Reported by dshi@chromium.org, May 5 2016

Issue description

The label creation logic appears to be flaky: a label can end up existing on a shard but not on the master. See comments 8 and 11 in bug 603247.

Autotest creates labels through RPC. When a label needs to be created, the code path should hit the method _create_label_everywhere, which first creates the label on the master, then fans the RPC out to the shard and creates the label there.
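To make the intended ordering concrete, here is a minimal sketch of the master-first, then-fan-out flow described above. It is not the Autotest implementation: `LabelStore` and `create_label_everywhere_sketch` are hypothetical names, and plain in-memory dicts stand in for the master and shard databases.

```python
class LabelStore:
    """Hypothetical in-memory stand-in for one label table (master or shard)."""

    def __init__(self):
        self.labels = {}   # label id -> label name
        self._next_id = 1  # next id to hand out when none is supplied

    def create(self, name, label_id=None):
        # The master allocates a fresh id; a shard is told which id to use.
        if label_id is None:
            label_id = self._next_id
        if label_id in self.labels:
            raise ValueError("label id %d already exists" % label_id)
        self.labels[label_id] = name
        self._next_id = max(self._next_id, label_id + 1)
        return label_id


def create_label_everywhere_sketch(name, master, shards):
    """Create the label on the master first, then reuse its id on each shard."""
    label_id = master.create(name)
    for shard in shards:
        shard.create(name, label_id=label_id)
    return label_id
```

In this model the shard only ever receives an id that the master has already handed out, so a label that exists on a shard but not on the master implies that the master row was lost or never persisted after the id was allocated.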

For some unknown reason, a label was created on the shard but not on the master. It almost looks as if Django finished the call to create the label in the master db and got a valid label ID, which was then used to create the label on the shard, but the write to the master db failed silently.
https://bugs.chromium.org/p/chromium/issues/detail?id=603247#c8
There is a gap in the label IDs in the master database.

The other possible cause is that someone manually deleted that label with a SQL command.
 

Comment 1 by dshi@chromium.org, May 5 2016

Components: OS>Hardware>Firmware
Cc: xixuan@chromium.org
xixuan plans to work on resolving the inconsistency between the master DB and the shard DB.
Blocking: 603247
Cc: aaboagye@chromium.org
I'm also seeing this on one of my boards. 

DoesNotExist: Label matching query does not exist. Lookup parameters were {'pk': 348415}

What's the proper workaround for this? Clean the shard db? To be honest, I don't even know what that pk label represents.

Comment 5 by xixuan@chromium.org, May 31 2016

which shard is affected?
xixuan@, I think: chromeos-server33.cbf.corp.google.com. The host is chromeos1-row1-rack1-host4.

Comment 7 by xixuan@chromium.org, May 31 2016

Hi, the pk represents the label ID. I manually checked the shard and master databases: the shard has the label ('348415', 'fwrw-version:samus-firmware/R39-6300.102.0'), but the master doesn't.

I manually deleted the label from the shard database. Hopefully that works.
Thanks! It seems to have gotten further now.

Comment 9 by dshi@chromium.org, Jun 2 2016

Cc: fdeng@chromium.org dshi@chromium.org dchan@chromium.org waihong@chromium.org shchen@chromium.org
Issue 603247 has been merged into this issue.

Comment 10 by dshi@chromium.org, Jun 2 2016

Owner: xixuan@chromium.org
Labels: -Pri-3 Pri-1
Status: Assigned (was: Available)
This issue consistently happens on multiple hosts and blocks FAFT from running.

Check the provision_FirmwareUpdate job of the following hosts:
http://cautotest/afe/#tab_id=view_host&object_id=3627
http://cautotest/afe/#tab_id=view_host&object_id=3655
http://cautotest/afe/#tab_id=view_host&object_id=4269
http://cautotest/afe/#tab_id=view_host&object_id=4272
http://cautotest/afe/#tab_id=view_host&object_id=4271
http://cautotest/afe/#tab_id=view_host&object_id=3783
http://cautotest/afe/#tab_id=view_host&object_id=2925

I haven't checked all hosts in the FAFT pool. There might be some other hosts having the same issue.
Cc: autumn@chromium.org
Any update? If not, could someone apply a quick fix for the hosts mentioned in c#11?

autumn@, please prioritize this bug.
dshi@, could you do a quick fix by manually cleaning up the unsynced entries in the databases, as you did in issue 603247 #c2:
  https://bugs.chromium.org/p/chromium/issues/detail?id=603247#c2

More failed hosts:
  http://cautotest/afe/#tab_id=view_host&object_id=2300
  http://cautotest/afe/#tab_id=view_host&object_id=2302

Comment 14 by dshi@chromium.org, Jun 8 2016

Re #13, that bug is about duplicated labels. This one is about labels missing on the master. Xixuan did the last cleanup (by removing the label on the shard).
I don't know much about the cause, as issue 603247 was merged into this bug.

What's the ETA for the above hosts being workable?
Update:

For shard chromeos-server14.mtv.corp.google.com:

Deleted label id 362623; affected hosts:
    chromeos4-row4-rack10-host14
    chromeos4-row4-rack11-host14
    chromeos4-row4-rack10-host14
    chromeos4-row4-rack12-host17

Deleted label id 362846; affected hosts:
    chromeos4-row6-rack11-host11
    chromeos4-row6-rack11-host12
    chromeos4-row6-rack11-host13
    chromeos4-row4-rack13-host11
    chromeos4-row4-rack13-host13


    
Update:

For shard chromeos-server27.mtv.corp.google.com:

Deleted label id 358557; affected hosts:
    chromeos4-row6-rack2-host11
    chromeos4-row6-rack2-host13
    chromeos4-row6-rack1-host13
    chromeos4-row6-rack1-host11


For host http://cautotest/afe/#tab_id=view_host&object_id=2925, I don't see the same error (the key cannot be found when deleting labels for a host). It may not be related to this bug.

The manual process is:
1. Check whether the master db has the label. It probably does not.
2. Check whether the shard db has the label. It probably does.
3. Manually delete the label from the shard db, along with the rows that reference it through foreign keys.
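
The three steps above can be sketched as a small script. This is an illustration only: it uses in-memory sqlite3 databases rather than the production databases, and the table names `labels` and `hosts_labels` and the helper names are hypothetical stand-ins, not the actual Autotest schema.

```python
import sqlite3


def make_db():
    """Build a hypothetical, minimal label schema in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE labels (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE hosts_labels (host_id INTEGER, label_id INTEGER)")
    return db


def label_exists(db, label_id):
    row = db.execute("SELECT 1 FROM labels WHERE id = ?", (label_id,)).fetchone()
    return row is not None


def remove_orphan_label(master, shard, label_id):
    """Delete a label from the shard only if the master does not have it.

    Returns True if the shard label was deleted, False otherwise.
    """
    if label_exists(master, label_id):     # step 1: master should NOT have it
        return False
    if not label_exists(shard, label_id):  # step 2: shard should have it
        return False
    with shard:  # step 3: delete referencing rows first, then the label itself
        shard.execute("DELETE FROM hosts_labels WHERE label_id = ?", (label_id,))
        shard.execute("DELETE FROM labels WHERE id = ?", (label_id,))
    return True
```

The referencing rows are deleted before the label so that a schema with enforced foreign keys would not reject the delete, and both statements run in one transaction so a failure leaves the shard unchanged.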

If this happens many more times before the bug is completely fixed, a temporary workaround may help, for example:

When deleting a label, if the master db doesn't have it, don't raise an error; just delete it directly from the shard db.
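
A minimal sketch of that tolerant delete, assuming dict-backed stand-ins for the two label tables. The function name `delete_label_tolerant` and its return value are hypothetical and only illustrate the proposed behaviour; this is not existing Autotest code.

```python
import logging


def delete_label_tolerant(label_id, master_labels, shard_labels):
    """Delete a label from both stores without failing if the master lacks it.

    master_labels and shard_labels are dicts mapping label id -> name
    (hypothetical stand-ins for the master and shard label tables).
    Returns True if the master had the label, False if it was missing there.
    """
    in_master = label_id in master_labels
    if in_master:
        del master_labels[label_id]
    else:
        # Proposed workaround: log the inconsistency instead of raising.
        logging.warning("label %s missing on master; deleting on shard only",
                        label_id)
    # Remove from the shard whether or not the master had it.
    shard_labels.pop(label_id, None)
    return in_master
```

Returning whether the master had the label lets a caller still count or report the inconsistent cases while the underlying bug is being investigated.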
Status: WontFix (was: Assigned)
