
Issue 706586

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 781021
Owner: ----
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Feature

Blocked on:
issue 729565

Blocking:
issue 342266



Deny swarming task triggering if no bot can fulfill the task

Project Member Reported by mar...@chromium.org, Mar 29 2017

Issue description

schedule_request() should run a quick query to check whether any bot can fulfill the task; bot presence could be cached in memcache.
https://github.com/luci/luci-py/blob/master/appengine/swarming/server/task_scheduler.py#L476


Maybe the cache from SwarmingBotsService.count could be reused:
https://github.com/luci/luci-py/blob/master/appengine/swarming/handlers_endpoints.py#L792
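
A minimal sketch of the idea above, not the actual luci-py code: a cached presence check that only runs the bot query on a cache miss. `TTLCache` is an in-process stand-in for memcache, and `dimensions_key`, `bot_matches`, `has_capable_bot`, and the `query_bots` callable are hypothetical names introduced for illustration. It assumes each bot advertises a list of values per dimension key.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for memcache (hypothetical, for illustration)."""

    def __init__(self, ttl_seconds, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it so the caller re-runs the query.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)


def dimensions_key(dimensions):
    """Builds a stable cache key from a dict of requested dimensions."""
    return "|".join("%s:%s" % (k, v) for k, v in sorted(dimensions.items()))


def bot_matches(bot_dimensions, requested):
    """A bot matches when it advertises every requested key/value pair."""
    return all(v in bot_dimensions.get(k, ()) for k, v in requested.items())


def has_capable_bot(requested, query_bots, cache):
    """Returns True if any known bot could run a task with these dimensions.

    query_bots is a callable returning an iterable of bot dimension dicts. It
    stands in for the datastore query and is only called on a cache miss.
    """
    key = dimensions_key(requested)
    cached = cache.get(key)
    if cached is not None:
        return cached
    present = any(bot_matches(b, requested) for b in query_bots())
    cache.set(key, present)
    return present
```

Negative results are cached too, so a short TTL matters here: a bot that comes online should become visible to new requests quickly.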
 
Cc: nedngu...@google.com
Cc: martiniss@chromium.org
Components: Speed>Benchmarks>Waterfall
Context: this is needed for the speed waterfall to distinguish between bots dying and tasks expiring due to lack of capacity.

The reason is the device affinity requirement in the perf waterfall: every task requires a specific bot. A usual swarming client can keep waiting for other bots if one bot dies, whereas with perf, a dead bot means the task should never be triggered in the first place.

Comment 3 by jonesmi@google.com, Apr 7 2017

Owner: jonesmi@google.com
Labels: -OS-Linux
Status: Assigned (was: Available)

Comment 5 by jonesmi@google.com, Apr 18 2017

Couple of questions on desired behavior:

1) Do we want this feature to be controlled by a setting, or do we want to do this for all swarming instances? This would affect swarming clients that might want to trigger tasks even though no bot can fulfill them yet. For example, bots that dynamically populate a dimension (e.g. app_version of the system-under-test), where tasks are required to run against N+1 app_version.

2) Do we still want to disallow the task request if there's a quarantined bot that could have otherwise fulfilled it?

2-B) What about if there are bots that can fulfill the task, but none of them are currently idle?

Answering questions in #5:
1) We should probably at least start doing this with an option, so that we can selectively roll it out. Ideally this would be an option in the task request, like "fail_if_no_capacity: true" or something.

2) Yes.

2-B) If there are bots that could fulfill it, just let it sit. This will happen for chromium.perf at least; we trigger ~80 tasks for a single bot, and the bot goes through and executes all of them. The bot will not be idle most of the time, since it'll be executing tasks almost continuously.
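
The three answers above can be read as a small decision rule. The following is a rough sketch under assumptions, not the swarming implementation: `Bot`, `can_run`, and `should_deny` are hypothetical names, and `fail_if_no_capacity` is only the option name floated in answer 1.

```python
from collections import namedtuple

# Hypothetical bot record: dimensions maps each key to a list of values.
Bot = namedtuple("Bot", ["dimensions", "quarantined", "busy"])


def can_run(bot, requested):
    """True if the bot advertises every requested dimension key/value."""
    return all(v in bot.dimensions.get(k, ()) for k, v in requested.items())


def should_deny(requested, bots, fail_if_no_capacity=False):
    """Decides whether a task request should be rejected at trigger time.

    1)   The check only applies when the request opts in.
    2)   Quarantined bots do not count as capacity.
    2-B) Busy bots do count: the task can sit in the queue until one is free.
    """
    if not fail_if_no_capacity:
        return False
    for bot in bots:
        if can_run(bot, requested) and not bot.quarantined:
            return False  # bot.busy is deliberately ignored here
    return True
```

For the chromium.perf case described in 2-B, a single bot that is busy with a long queue of tasks still counts as capacity, so `should_deny` returns False and further tasks for that bot are accepted.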
Cc: dpranke@chromium.org
(Answering as one intended user, but you probably should get some answers from someone more familiar with the way chromium uses swarming for main tests)
That behavior seems reasonable. You would definitely want things to be configurable. I couldn't say whether this would make more sense as a dimension, or as a separate flag to the request.

Comment 9 by mar...@chromium.org, Apr 18 2017

Use a new flag in NewTaskRequest. I'd say "default to verify", i.e. make the check opt-out. I agree that having the flag enables relevant use cases.
Issue 616267 has been merged into this issue.
Blockedon: 729565
Blocking: 342266
Cc: jonesmi@google.com
Owner: ----
Status: Available (was: Assigned)
Summary: Deny swarming task triggering if no bot can fulfill the task (was: Disallow swarming task triggering if no bot can full the task)
Mergedinto: 781021
Status: Duplicate (was: Available)
Will be implemented as part of issue 781021.
