Issue 812886


Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Feature

Blocking:
issue 781021
issue 812860




Swarming: When polling an expired TaskToRun, cancel it immediately

Project Member Reported by mar...@chromium.org, Feb 15 2018

Issue description

When the TaskToRun global index is stale, _validate_task_async() may have to loop over many expired items.

While doing so, it should set the negative cache entry so that other polling loops are less affected, which should improve scanning speed.

That is, call set_lookup_cache(task_key, False) when logging the expired task.
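
A minimal sketch of the proposal, for illustration only. Only the call set_lookup_cache(task_key, False) comes from this report; the dict-backed cache, lookup_cache_allows(), scan_candidates(), the (task_key, expiration_ts) tuples and the strict "<" expiry comparison are all assumptions made for the example, not the actual Swarming code.

```python
import datetime

# Hypothetical in-process stand-in for the shared lookup cache. The real
# cache backend is not described in this issue.
_lookup_cache = {}


def set_lookup_cache(task_key, is_available_to_schedule):
    """Records whether task_key is still worth looking at (False = skip)."""
    _lookup_cache[task_key] = is_available_to_schedule


def lookup_cache_allows(task_key):
    """Returns False only when a negative entry exists for task_key."""
    return _lookup_cache.get(task_key, True)


def scan_candidates(candidates, now):
    """Returns (runnable_keys, expired_count) for one polling loop.

    candidates is a list of (task_key, expiration_ts) pairs, a simplified
    stand-in for TaskToRun entities returned by a stale index.
    """
    runnable = []
    expired = 0
    for task_key, expiration_ts in candidates:
        if not lookup_cache_allows(task_key):
            # Another polling loop already flagged this key; skip it cheaply.
            continue
        if expiration_ts < now:
            # Assumed behavior: flag the expired item so other loops skip it.
            set_lookup_cache(task_key, False)
            expired += 1
            continue
        runnable.append(task_key)
    return runnable, expired


now = datetime.datetime(2018, 2, 15, 12, 0, 0)
hour = datetime.timedelta(hours=1)
candidates = [('stale', now - hour), ('fresh', now + hour)]
first_pass = scan_candidates(candidates, now)
second_pass = scan_candidates(candidates, now)
```

In this sketch the second pass skips the 'stale' key through the negative entry and never compares its timestamp again.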
 

Comment 1 by mar...@chromium.org, Feb 15 2018

I realized it wouldn't help:
- Since TaskToRun.expiration_ts is already stored in the TaskToRun entity, which is fetched anyway, there's no advantage to using the negative cache.
- Using an expiration_ts filter on the ndb.Query would require a composite index, which would have even worse staleness.

What can be done is simply to remove the logging entry, as it has some overhead and no value; the number of expired tasks is still logged at the end.

Comment 2 by mar...@chromium.org, Feb 16 2018

Summary: Swarming: When polling an expired TaskToRun, cancel it immediately (was: Swarming: better handle state TaskToRun global index with expired items)
Changing the title to reflect a better idea. Basically, as the bots poll, they will expire tasks concurrently, which accelerates the cleanup of stale tasks.
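
A rough sketch of that idea, under stated assumptions: FakeTask, poll_for_task(), the 'PENDING' and 'EXPIRED' state strings and the strict "<" comparison are hypothetical stand-ins invented for the example. They do not reflect the real TaskToRun model or the scheduler code.

```python
import datetime


class FakeTask(object):
    """Simplified stand-in for a TaskToRun entity; not the real ndb model."""

    def __init__(self, name, expiration_ts):
        self.name = name
        self.expiration_ts = expiration_ts
        self.state = 'PENDING'


def poll_for_task(queue, now):
    """Returns the first runnable task, expiring stale ones along the way."""
    for task in queue:
        if task.state != 'PENDING':
            # Already expired by an earlier poll; nothing left to do.
            continue
        if task.expiration_ts < now:
            # Expire inline instead of waiting for a separate cleanup pass.
            task.state = 'EXPIRED'
            continue
        return task
    return None


now = datetime.datetime(2018, 2, 16, 12, 0, 0)
hour = datetime.timedelta(hours=1)
queue = [
    FakeTask('stale-1', now - hour),
    FakeTask('stale-2', now - 2 * hour),
    FakeTask('live', now + hour),
]
picked = poll_for_task(queue, now)
states = [t.state for t in queue]
```

Each poll pays a small cost to expire what it walks past, so later polls find those tasks already handled.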

Comment 3 by mar...@chromium.org, Feb 21 2018

Blocking: 781021

Comment 4 by mar...@chromium.org, Feb 21 2018

Labels: -Type-Bug Type-Feature
Owner: mar...@chromium.org
Status: Assigned (was: Available)

Comment 5 by bugdroid1@chromium.org, Mar 20 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/a98580c6c313b4e5677c87f5bbb1feaaa6f67c98

commit a98580c6c313b4e5677c87f5bbb1feaaa6f67c98
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Tue Mar 20 14:00:44 2018

Swarming: expire tasks as they are polled

This means polling is slower, because bot poll handlers are "wasting" time
expiring tasks inline, but in practice it's "faster" for three reasons:
- An expired task is added to the negative cache, so other bots skip it.
- Tasks are expired faster, which is necessary for TaskProperties fallback.
- It reduces the load on the cron job, which can overflow since it's not expiring
  that many tasks per second. This is a problem when a lot of tasks expire at
  once or a buildup occurs due to "force_bots_to_sleep_and_not_run_task" being
  used.

Includes a unit test to confirm it actually works.

R=vadimsh@chromium.org

Bug: 812886
Change-Id: I2e15a39af0273982cf360a514b137ccb0ed9111c
Reviewed-on: https://chromium-review.googlesource.com/927644
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/a98580c6c313b4e5677c87f5bbb1feaaa6f67c98/appengine/swarming/server/task_scheduler.py
[modify] https://crrev.com/a98580c6c313b4e5677c87f5bbb1feaaa6f67c98/appengine/swarming/server/task_scheduler_test.py
[modify] https://crrev.com/a98580c6c313b4e5677c87f5bbb1feaaa6f67c98/appengine/swarming/server/task_to_run.py
[modify] https://crrev.com/a98580c6c313b4e5677c87f5bbb1feaaa6f67c98/appengine/swarming/server/task_to_run_test.py

Comment 6 by mar...@chromium.org, Mar 22 2018

Status: Fixed (was: Assigned)

Comment 7 by efoo@chromium.org, Jun 2 2018

Labels: cit-pm-73
