
Issue 863175


Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Feature




Swarming: increase bot processes priority

Project Member Reported by mar...@chromium.org, Jul 12

Issue description

Bot processes normally shouldn't use any significant CPU or disk, and should use as little network as possible, while the task is running.

That said, we have some workloads (ChromeOS build) that are so overwhelming that they can hang the bot processes (mainly task_runner) for several minutes, causing significant issues.

AI:
- Have the bot processes* run at a slightly elevated priority compared to the task priority, so they get a chance to send task updates.

In practice, only task_runner sends updates, so it is the primary target. Still, bot_main and run_isolated do cleanup on abnormal failure, so they need enough run time to handle failure modes.
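
One way to get the relative priority difference described above without needing elevated privileges is to lower the priority of the task instead of raising the bot's. The following is only a minimal sketch under that assumption, not the actual Swarming change: it is POSIX-only, and the names run_task_deprioritized and nice_increment are hypothetical. Raising the bot's own priority (a negative nice value) would typically require root.

```python
import os
import subprocess
import sys


def current_niceness():
    """Returns this process's niceness (POSIX only); higher means lower priority."""
    return os.nice(0)


def run_task_deprioritized(cmd, nice_increment=5):
    """Runs cmd with its niceness raised by nice_increment relative to the caller.

    Hypothetical helper: the caller (standing in for a bot process such as
    task_runner) keeps its own niceness, so it is scheduled ahead of the task
    when the machine is saturated.
    """
    def _lower_priority():
        # Runs in the child between fork() and exec(), so only the task is
        # affected; the parent's niceness is unchanged.
        os.nice(nice_increment)

    proc = subprocess.run(
        cmd, preexec_fn=_lower_priority, stdout=subprocess.PIPE, check=False)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # The child reports its own niceness so the difference can be observed.
    code, out = run_task_deprioritized(
        [sys.executable, "-c", "import os; print(os.nice(0))"])
    print("parent niceness:", current_niceness())
    print("task niceness:", out.decode().strip())
```

Niceness only influences CPU scheduling; a task that saturates disk or memory could still starve the bot, so this would be a partial mitigation at best.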
 
Project Member

Comment 1 by bugdroid1@chromium.org, Jul 12

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-py.git/+/658b5dc2c252e6efeed13ed9068ad14ddb7b2673

commit 658b5dc2c252e6efeed13ed9068ad14ddb7b2673
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Jul 12 20:50:53 2018

swarming: increase BOT_PING_TOLERANCE to 6 minutes

The net downside is that if a bot really becomes MIA, it'll take 6.5 minutes on
average to correctly mark the bot as dead.
The net upside is that ChromeOS builds start working again.

R=qyearsley@chromium.org

Bug: 860508,863175
Change-Id: I9dd88c348443da78237479359e24e5e76cdc7aa1
Reviewed-on: https://chromium-review.googlesource.com/1135555
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/658b5dc2c252e6efeed13ed9068ad14ddb7b2673/appengine/swarming/server/task_result.py
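
The 6.5 minute average in the commit message can be reproduced with a small sketch. It assumes the staleness check runs from a cron once per minute, so a bot that stops pinging is noticed on average half a cron interval after the 6 minute tolerance expires. The cron interval and both function names are assumptions made for illustration and are not taken from task_result.py.

```python
import datetime

# Value named in the commit message above.
BOT_PING_TOLERANCE = datetime.timedelta(seconds=6 * 60)
# Assumption: the check that marks bots as dead runs once per minute.
CRON_INTERVAL = datetime.timedelta(seconds=60)


def is_bot_presumed_dead(last_ping, now, tolerance=BOT_PING_TOLERANCE):
    """Hypothetical check: the bot is presumed dead once its last ping is
    older than the tolerance."""
    return now - last_ping > tolerance


def average_detection_delay(tolerance=BOT_PING_TOLERANCE,
                            cron_interval=CRON_INTERVAL):
    """Average time from the last ping until a periodic check notices.

    The tolerance expires at a point uniformly distributed within a cron
    interval, so the expected extra wait is half that interval.
    """
    return tolerance + cron_interval / 2


if __name__ == "__main__":
    print(average_detection_delay())
```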

Related: https://bugs.chromium.org/p/chromium/issues/detail?id=857574 (bumping the priority of the cron that refreshes auth tokens).
