Swarming: when a leased VM fails to come online fast enough, the associated terminate task lacks capacity
Issue description

In check_for_connection(), in the failed case, either:
- remove the create_terminate_task() call, or
- make this specific create_terminate_task() call not look for task queues.

I think the second is the better implementation; it takes a bit more work, but it shouldn't be too much.

This is a significant blocker for issue 839173.

Ref: https://chromium.googlesource.com/infra/luci/luci-py.git/+/master/appengine/swarming/server/lease_management.py
May 17 2018
The point of the termination task is that if the bot connects in the few moments after Swarming gave up on it but before MP could delete it, we need to ensure it does not accept anything other than a termination task. The termination task should be created while the bot is absent, so that it runs if the bot appears and expires if it doesn't. If I understand correctly, you're still scheduling the task, but you're not bothering to check for capacity (since it's expected that there will be zero capacity). Is that right?
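To make the behavior being discussed concrete, here is a minimal, self-contained sketch. It is not the code in lease_management.py: the Scheduler class, the NoCapacityError exception, the check_capacity parameter, and the 300-second expiration are all hypothetical, and this create_terminate_task() is a toy stand-in that only shares its name with the real function. It shows a terminate task being queued for a bot that has not connected, so the capacity check is skipped rather than allowed to fail.

```python
import dataclasses
from typing import List, Optional


class NoCapacityError(Exception):
    """Raised when no connected bot could run the requested task."""


@dataclasses.dataclass
class Task:
    bot_id: str
    kind: str
    expiration_secs: int


class Scheduler:
    """Toy in-memory scheduler (hypothetical, not Swarming's task queues)."""

    def __init__(self, online_bots):
        self.online_bots = set(online_bots)
        self.pending: List[Task] = []

    def schedule(self, task: Task, check_capacity: bool = True) -> Task:
        # Normal path: refuse tasks that no connected bot could run.
        if check_capacity and task.bot_id not in self.online_bots:
            raise NoCapacityError(task.bot_id)
        self.pending.append(task)
        return task

    def poll(self, bot_id: str) -> Optional[Task]:
        # Called when a bot connects: hand it its pending task, if any.
        for i, task in enumerate(self.pending):
            if task.bot_id == bot_id:
                return self.pending.pop(i)
        return None


def create_terminate_task(scheduler, bot_id, check_capacity=True):
    # Hypothetical stand-in; the expiration value is an assumption.
    task = Task(bot_id=bot_id, kind='terminate', expiration_secs=300)
    return scheduler.schedule(task, check_capacity=check_capacity)
```

In this sketch, calling create_terminate_task(scheduler, 'vm-1') for a bot that never connected raises NoCapacityError, while passing check_capacity=False queues the task so that a late-connecting 'vm-1' receives only the terminate task from poll().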
May 17 2018
Exactly. It's deployed to prod now and it worked; the errors are gone.
Comment 1 by bugdroid1@chromium.org, May 17 2018