
Issue 887053


Issue metadata

Status: Fixed
Closed: Sep 19
Pri: 2
Type: Task




Multiple swarming bots are in a dead state

Project Member Reported by mikenichols@chromium.org, Sep 19

Issue description

dgarrett@ pointed out that a number of swarming bots are showing up in a dead state. The oldest has been dead for 9 weeks, a sign that we need to add this to the oncall responsibilities.

We need to clean up these bots so that they are usable in their respective pools.

 
Status: Started (was: Assigned)
Rebooting should recover them, but reimaging them is also reasonable.

ccompute ri <hosts>
All bots have been restarted. Some continue to cycle through the handshake trying to get valid auth (swarm-cros-160 has been cycling for 30 minutes trying to authenticate).

Something seems a bit off about why bots fail and have such a hard time rejoining. In some cases the instances are available yet are considered dead by swarming; swarm-cros-398 is an example. It was cleared and rejoined, yet it still seems to be considered dead, even though I see nothing obviously wrong on the bot and it remains reachable via SSH.

-- Mike
Status: Fixed (was: Started)
Thanks Don. They've all cleared now (398 went back into the pool) other than two that are still waiting for authentication. It is taking a long time, but I'm going to assume they will eventually clear.

-- Mike
