Issue 756972

Issue metadata

Status: WontFix
Owner:
Closed: Aug 2017
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



ShardRemoveFromProductionMasterTask did not mark servers as repair_required in server_db

Project Member Reported by pprabhu@chromium.org, Aug 18 2017

Issue description

A bunch of shards were retired as part of issue 753890, but these servers were left behind in server_db as primary shards, causing push-to-prod to fail.
(This would also cause our metrics dashboards to show incorrect graphs, since the retired shards would still be considered to be in prod. I'd even expect shard-down alerts to start firing.)

Details from https://bugs.chromium.org/p/chromium/issues/detail?id=753890#c20

pprabhu@pprabhu:files$ atest server list chromeos-server42.cbf.corp.google.com
Hostname     : chromeos-server42.cbf.corp.google.com
Status       : primary
Roles        : shard
Attributes   : {}
Date Created : 2016-04-18 11:04:57
Date Modified: 2016-04-18 11:04:57
Note         : None

pprabhu@pprabhu:files$ atest server list chromeos-server43.cbf.corp.google.com                                       
Hostname     : chromeos-server43.cbf.corp.google.com
Status       : primary
Roles        : shard
Attributes   : {}
Date Created : 2016-04-18 13:13:20
Date Modified: 2016-04-18 13:13:20
Note         : None

pprabhu@pprabhu:files$ atest server list chromeos-server44.cbf.corp.google.com                                       
Hostname     : chromeos-server44.cbf.corp.google.com
Status       : primary
Roles        : shard
Attributes   : {}
Date Created : 2016-04-20 15:30:31
Date Modified: 2016-04-20 15:30:31
Note         : None

pprabhu@pprabhu:files$ atest server list chromeos-server45.cbf.corp.google.com                                       
Hostname     : chromeos-server45.cbf.corp.google.com
Status       : primary
Roles        : shard
Attributes   : {}
Date Created : 2016-04-26 11:16:28
Date Modified: 2016-04-26 11:16:28
Note         : None

These should not have been in server_db
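
The check done by hand above could be scripted. The following is a minimal sketch, not part of the lab tooling: it assumes the `atest server list` output has the `Key : value` layout shown in this report, and that multiple roles (if any) would be comma-separated. The helper names `parse_server_records` and `stale_primary_shards` are hypothetical.

```python
import re

# Field names taken from the `atest server list` output quoted in this issue.
FIELD_RE = re.compile(
    r"^(Hostname|Status|Roles|Attributes|Date Created|Date Modified|Note)\s*:\s*(.*)$"
)

# Two of the records pasted in the issue description, verbatim.
SAMPLE = """\
pprabhu@pprabhu:files$ atest server list chromeos-server42.cbf.corp.google.com
Hostname     : chromeos-server42.cbf.corp.google.com
Status       : primary
Roles        : shard
Attributes   : {}
Date Created : 2016-04-18 11:04:57
Date Modified: 2016-04-18 11:04:57
Note         : None

Hostname     : chromeos-server43.cbf.corp.google.com
Status       : primary
Roles        : shard
Attributes   : {}
Date Created : 2016-04-18 13:13:20
Date Modified: 2016-04-18 13:13:20
Note         : None
"""


def parse_server_records(text):
    """Split the text into one dict per server; a Hostname line starts a record."""
    records, current = [], {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if not match:
            continue  # shell prompts and blank lines are skipped
        key, value = match.group(1), match.group(2).strip()
        if key == "Hostname" and current:
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


def stale_primary_shards(records, retired_hostnames):
    """Return retired hosts that server_db still lists as primary shards."""
    retired = set(retired_hostnames)
    stale = []
    for record in records:
        # Assumption: multiple roles would be comma-separated.
        roles = [role.strip() for role in record.get("Roles", "").split(",")]
        if (record.get("Hostname") in retired
                and record.get("Status") == "primary"
                and "shard" in roles):
            stale.append(record["Hostname"])
    return stale


if __name__ == "__main__":
    retired = [
        "chromeos-server42.cbf.corp.google.com",
        "chromeos-server43.cbf.corp.google.com",
    ]
    # Both sample hosts are still primary shards, so both are reported.
    print(stale_primary_shards(parse_server_records(SAMPLE), retired))
```

Running a check like this after a shard retirement, before push-to-prod, would flag any retired host that is still listed as a primary shard.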
 
Owner: shuqianz@chromium.org
Status: Assigned (was: Untriaged)
Charlene, can you confirm this is WAI, or was this because you short-circuited the shard update process?
Status: WontFix (was: Assigned)
The ShardRemove... task marks the server as repair_required. These servers must have been removed manually, so server_db was not updated.
Status: Assigned (was: WontFix)
Summary: ShardRemoveFromProductionMasterTask did not mark servers as repair_required in server_db (was: ShardRemoveFromProductionMasterTask should remove shards from server_db)
In this case, they were still marked primary.

I had to manually remove them because they broke push-to-prod.

Status: Unconfirmed (was: Assigned)
Status: WontFix (was: Unconfirmed)
These servers were not removed by the task. For example, chromeos-server72.cbf was removed by the task, so it is marked as 'repair_required'. There is no action needed for the task itself here.
