
Issue 863150


Issue metadata

Status: Fixed
Owner:
Closed: Nov 19
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Bot should abort when updating a BOT_DIED task

Project Member Reported by mar...@chromium.org, Jul 12

Issue description

Events:
- The bot reaps a task.
- The bot then gets no CPU time for several minutes (e.g. the system is overwhelmed).
- A cron job runs and marks the task as BOT_DIED because the bot is MIA.
- The bot resumes and sends task updates.

The server shall reply with must_stop: True. The code is nearly all there already; it's a simple change to check for BOT_DIED in addition to KILLED:
https://chromium.googlesource.com/infra/luci/luci-py/+/65de3aef5845afd951e64077df8c50d11cc9a2b1/appengine/swarming/handlers_bot.py#1003
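The proposed check could look roughly like the sketch below. This is not the actual handlers_bot.py code: the State enum, STOP_STATES and task_update_response are hypothetical names used only to show the server replying with must_stop for BOT_DIED as well as KILLED.

```python
import enum


class State(enum.Enum):
    # Hypothetical subset of task states, for illustration only.
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    KILLED = "KILLED"
    BOT_DIED = "BOT_DIED"


# States for which the server tells the bot to stop working on the task.
# Previously only KILLED would qualify; the proposal adds BOT_DIED.
STOP_STATES = frozenset({State.KILLED, State.BOT_DIED})


def task_update_response(state):
    """Builds the reply to a bot's task update (hypothetical helper)."""
    return {"must_stop": state in STOP_STATES, "ok": True}
```

With this, an update arriving for a task the cron job already marked BOT_DIED gets must_stop: True, the same as for a KILLED task, while a RUNNING task is unaffected.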
 

Comment 1 by bugdroid1@chromium.org, Jul 12

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/35fe3c70241e1c985c3169e3548a54989a221847

commit 35fe3c70241e1c985c3169e3548a54989a221847
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Jul 12 20:44:23 2018

swarming: properly handle task with sleeping bot

Request the bot to terminate the task early if it had been detected as MIA.

Otherwise you get a BOT_DIED task that runs for a long time after it was
marked as BOT_DIED, which is super confusing.

R=qyearsley@chromium.org

Bug: 863150
Change-Id: I4926e3b0403356603662ed50d92fef49f6e248f0
Reviewed-on: https://chromium-review.googlesource.com/1135664
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/35fe3c70241e1c985c3169e3548a54989a221847/appengine/swarming/handlers_bot.py
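On the bot side, terminating the task early amounts to checking the reply to each update. The sketch below is an assumption about the general shape, not the real bot code: run_task, steps and post_update are hypothetical stand-ins for the bot's task loop and its update call to the server.

```python
def run_task(steps, post_update):
    """Runs steps in order, aborting when the server replies must_stop.

    `steps` is a list of callables and `post_update` is a callable that
    takes the number of finished steps and returns the server's reply
    as a dict. Both are hypothetical stand-ins.
    """
    done = 0
    for step in steps:
        step()
        done += 1
        reply = post_update(done)
        if reply.get("must_stop"):
            # The server no longer considers the task live (e.g. it was
            # marked KILLED or BOT_DIED), so stop instead of running on.
            return "aborted", done
    return "completed", done
```

A bot that wakes up after being marked MIA would then stop at its first update rather than keep running a task already recorded as BOT_DIED.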

Status: Fixed (was: Assigned)
