Multiple swarming bots are in a dead state
Issue description
dgarrett@ pointed out that a number of swarming bots are showing up in a dead state. The oldest is 9 weeks old, which signals that we need to make sure this is added to oncall responsibilities. We need to clean up these bots so that they are usable in their respective pools.
Sep 19
Rebooting should recover them, but reimaging them is also reasonable.
ccompute ri <hosts>
Sep 19
All bots have been restarted. Some continue to cycle through trying to get valid auth via a handshake (swarm-cros-160 has been cycling for 30 minutes trying to get authenticated). Something seems off about why bots fail and then have such a hard time rejoining. In some cases the instances are available yet are considered dead by swarming; swarm-cros-398 is an example. It was cleared and rejoined, but it still seems to be considered dead, even though I see nothing obviously wrong on the bot and it remains reachable via SSH. -- Mike
Sep 19
Thanks Don. They've all cleared now (398 went back into the pool) other than two that are still waiting for authentication. It is taking a long time, but I'm going to assume they will eventually clear. -- Mike
Comment 1 by mikenichols@chromium.org, Sep 19