Issue 873346

Starred by 6 users

Issue metadata

Status: Duplicate
Owner:
Closed: Oct 22
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 873754


Participants' hotlists:
chrome-client-infra-monitoring


We should have alerting for bots and builders being offline too long

Project Member Reported by dpranke@chromium.org, Aug 10

Issue description

See bug 872704 for one possible motivating example. 

It looks like we have two related open bugs:

bug 694611, which talks about BuilderOffline, but seems to have morphed into a ClusterFuzz-specific thing.

bug 647805, which talks about "a large number of machines" going offline.

It's possible that if those two bugs were fixed, and if we had good coverage for when a builder had a lot of pending builds, we'd have sufficient coverage to not need anything further.

But, I don't think we have any of those things, and that seems bad.

Filing against Infra>Platform for initial triage. You could argue that maybe there's some Infra>Client work here as well, but this seems like a core part of the platform quality of service.

Thoughts?
 
Cc: mmoss@chromium.org
Blockedon: 873754
IMHO this will be handled as part of the go/cci-monitoring-doc effort; see issue 873754.

Components: Infra>Client>Chrome
Owner: sergeybe...@chromium.org
Status: Assigned (was: Untriaged)
Assigning to sergey for now; feel free to re-assign or mark as Available.

Mergedinto: 873754
Status: Duplicate (was: Assigned)
Merging into issue 873754. We now have alerts for pending builds and for expired tasks, so the conditions described in #0 should now be caught by troopers.

Issue 694611 has been merged into this issue.
