
Issue 863524


Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 801679




Win10 MP bots not getting deleted from swarming

Project Member Reported by bpastene@chromium.org, Jul 13

Issue description

Cc: s...@google.com
Looks like a regression.
Cc: -s...@google.com s...@google.com
These are connection failures. To prevent service interruptions, if a bot doesn't connect within 10 minutes, Swarming gives up on it and requests a new one. It does this by deleting the BotInfo entity and scheduling a termination task. The termination task covers the window after the bot has been disavowed but before MP has actually deleted it: if the VM connects during that window, it picks up the termination task instead of some real workload.

Here's an example:
https://chromium-swarm.appspot.com/bot?id=win10-0af2fd95-us-west1-b-0fp7

This bot was leased at 11:07. Ten minutes later it still hadn't connected, so Swarming deleted the BotInfo and scheduled a termination task in case it did connect. It connected at 11:20, recreating its BotInfo, picked up the termination task, and carried it out, shutting down.
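
The sequence above can be modeled with a small in-memory sketch. This is an illustration only: `FakeSwarming`, its methods, and its data structures are hypothetical and are not the real Swarming or MP code. Only the 10-minute deadline and the order of events are taken from the description; the calendar date is arbitrary.

```python
from datetime import datetime, timedelta

CONNECT_DEADLINE = timedelta(minutes=10)  # deadline stated in the description


class FakeSwarming:
    """Hypothetical in-memory stand-in for the server; not real Swarming code."""

    def __init__(self):
        self.bot_info = {}              # bot_id -> lease time (stands in for BotInfo)
        self.connected = set()          # bots with a live connection
        self.termination_tasks = set()  # bots with a pending termination task

    def lease(self, bot_id, now):
        self.bot_info[bot_id] = now

    def check_deadlines(self, now):
        # Give up on bots that have not connected within the deadline.
        for bot_id, leased_at in list(self.bot_info.items()):
            if bot_id not in self.connected and now - leased_at >= CONNECT_DEADLINE:
                del self.bot_info[bot_id]
                self.termination_tasks.add(bot_id)

    def connect(self, bot_id, now):
        # Behavior described in this issue: a late connection recreates BotInfo.
        self.bot_info.setdefault(bot_id, now)
        self.connected.add(bot_id)
        if bot_id in self.termination_tasks:
            # The bot picks up the termination task and shuts down.
            self.termination_tasks.discard(bot_id)
            self.connected.discard(bot_id)
            return "terminated"
        return "ready"


bot = "win10-0af2fd95-us-west1-b-0fp7"
leased_at = datetime(2000, 1, 1, 11, 7)  # arbitrary date; only the times matter
server = FakeSwarming()
server.lease(bot, leased_at)
server.check_deadlines(leased_at + timedelta(minutes=10))  # 11:17, bot disavowed
result = server.connect(bot, leased_at + timedelta(minutes=13))  # 11:20
stale = bot in server.bot_info  # recreated entry for a bot that has shut down
```

In this model the bot shuts down as intended, but the entry recreated at connect time is left behind, which corresponds to the dead bot seen in the bot list.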

I think the issue is that the bot recreates its BotInfo when it connects. Ideally it would no longer be allowed to connect to the Swarming server at all. Then we could get rid of the safety termination task, and we wouldn't get dead bots.
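
A minimal sketch of that idea, under stated assumptions: the comment does not say how the server would recognize a disavowed bot, so the `disavowed` set, the `StrictServer` class, and the `HandshakeRejected` exception below are all hypothetical, not part of Swarming.

```python
class HandshakeRejected(Exception):
    """Hypothetical error raised when a disavowed bot tries to connect."""


class StrictServer:
    """Hypothetical model of a server that refuses disavowed bots."""

    def __init__(self):
        self.bot_info = set()   # bot IDs with a BotInfo entry
        self.disavowed = set()  # assumption: IDs the server has given up on

    def lease(self, bot_id):
        self.bot_info.add(bot_id)

    def disavow(self, bot_id):
        # Delete the entry and remember that this bot must not come back.
        self.bot_info.discard(bot_id)
        self.disavowed.add(bot_id)

    def handshake(self, bot_id):
        if bot_id in self.disavowed:
            # Refuse before anything is written, so no BotInfo is recreated.
            raise HandshakeRejected(bot_id)
        self.bot_info.add(bot_id)
        return "ready"


strict = StrictServer()
strict.lease("late-bot")
strict.disavow("late-bot")
try:
    strict.handshake("late-bot")
    rejected = False
except HandshakeRejected:
    rejected = True
```

Because the rejection happens before any entry is written, no safety termination task is needed in this model and no stale entry remains.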
Blockedon: 801679
Status: Available (was: Untriaged)
Handshake (issue 801679) is likely the culprit.
Cc: smut@chromium.org
Cc: -s...@google.com
