ImportantSlaveNotRunning alert misfired
Reported by jrbarnette@chromium.org, May 30 2018
Issue description
Overnight, the ImportantSlaveNotRunning alert went off. The alert
happened because the jadeite-release builder (a canary slave) had
missed two scheduled runs since the regular 11:00 AM run on 5/29.
The alert was correct that the builder hadn't run, but the message
wasn't relevant: the slave had been turned off in goldeneye and wasn't
meant to run. We should figure out how to keep this alert from firing
when a slave has been deliberately disabled.
Here's the relevant text of the most recent message:
Alert Details
------------------
Description:
An important slave has not run in 1 day.
name: ImportantSlaveNotRunning
current value: 0
threshold: Lt(1) for 3h
alert fields: {, metric:slave_config=jadeite-release,metric:master_config=master-release}
sent at: 2018-05-30 05:49:28
active since: 2018-05-30 02:48:58 (3 hours 0 mins)
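For illustration only, here is a minimal Python sketch of the behavior being asked for. It assumes that "Lt(1) for 3h" means "the metric has stayed below 1 continuously for at least 3 hours", and that the alerting side could somehow learn whether a slave is enabled in goldeneye. The function names and the `enabled` flag are hypothetical; this is not the actual alerting configuration or any real goldeneye API.

```python
from datetime import datetime, timedelta


def active_since(samples, threshold):
    """Return the start of the current run of below-threshold samples.

    samples: list of (timestamp, value) pairs in ascending time order.
    Returns None if the most recent sample is at or above the threshold.
    """
    since = None
    for ts, value in samples:
        if value < threshold:
            if since is None:
                since = ts
        else:
            since = None
    return since


def should_fire(samples, enabled, now, threshold=1, hold=timedelta(hours=3)):
    """Hypothetical check: fire only for enabled slaves (assumed flag)."""
    if not enabled:
        # A slave that was deliberately turned off is not expected to run.
        return False
    since = active_since(samples, threshold)
    return since is not None and now - since >= hold


# Timestamps taken from the alert text above; the first sample is made up.
samples = [
    (datetime(2018, 5, 30, 2, 0, 0), 1),
    (datetime(2018, 5, 30, 2, 48, 58), 0),
    (datetime(2018, 5, 30, 5, 0, 0), 0),
]
now = datetime(2018, 5, 30, 5, 49, 28)
fires_when_enabled = should_fire(samples, enabled=True, now=now)
fires_when_disabled = should_fire(samples, enabled=False, now=now)
```

With the timestamps from the message, the enabled case fires and the disabled case does not, which is the outcome this bug is asking for.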
May 30 2018
Sounds to me like it isn't really an "important" slave. What defines "important"?
Sep 11
Comment 1 by jrbarnette@chromium.org, May 30 2018