
Issue 673493

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




Add a metric for when an important build-slave doesn't start reasonably soon

Project Member Reported by pprabhu@chromium.org, Dec 12 2016

Issue description

Sometimes a paladin slave doesn't start on time because the dedicated buildslave is down and there aren't enough floating buildslaves.

We'd like to alert on this so the deputy can go reclaim buildslaves.

Since the paladin-master now waits for all important build slaves to start (and we retry if they fail soon enough, iiuc), it should be possible to send a monarch metric with the number of important build slaves that buildbucket failed to schedule entirely.
We can then alert on this metric.

nxia@: Is it reasonable to add such a metric?
 

Comment 1 by nxia@chromium.org, Dec 13 2016

This is doable. We already have logic to check slave status periodically. We can add metrics for builds that stay in 'SCHEDULED' status for a long time.
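
A rough sketch of what that periodic check could compute, for illustration only. The `SlaveBuild` record, the `stuck_important_builds` helper, the config names, and the 10-minute threshold are all hypothetical; none of this is the actual master code or the monarch API. It only shows how to count important slaves that have sat in 'SCHEDULED' for too long, which is the value the metric would report.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class SlaveBuild:
    """Hypothetical record for one slave build requested by the master."""
    config: str             # build config name (illustrative values only)
    important: bool         # whether the master waits on this slave
    status: str             # buildbucket-style status, e.g. 'SCHEDULED' or 'STARTED'
    scheduled_at: datetime  # when the master requested the build


def stuck_important_builds(builds: Iterable[SlaveBuild],
                           now: datetime,
                           threshold: timedelta) -> List[SlaveBuild]:
    """Return important builds still in 'SCHEDULED' for longer than threshold."""
    return [
        b for b in builds
        if b.important
        and b.status == "SCHEDULED"
        and now - b.scheduled_at > threshold
    ]


if __name__ == "__main__":
    now = datetime(2016, 12, 13, 12, 0)
    builds = [
        SlaveBuild("a-paladin", True, "SCHEDULED", now - timedelta(minutes=30)),
        SlaveBuild("b-paladin", True, "STARTED", now - timedelta(minutes=30)),
        SlaveBuild("c-paladin", True, "SCHEDULED", now - timedelta(minutes=2)),
        SlaveBuild("d-paladin", False, "SCHEDULED", now - timedelta(minutes=30)),
    ]
    stuck = stuck_important_builds(builds, now, timedelta(minutes=10))
    # len(stuck) is the number that would be sent as the metric value.
    print(len(stuck), [b.config for b in stuck])
```

An alert could then fire whenever the reported count stays above zero, which would tell the deputy to go reclaim buildslaves.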
Status: Archived (was: Assigned)
Bulk closing Infra>Client>ChromeOS issues untouched in over a year.
