The TooManyIdleDuts alert is way too spammy. The alert fires when the count of idle, unlocked DUTs exceeds 36:
def _AddIdleDutsAlert(scope):
  """Adds an alert for when there are too many idle DUTs."""
  (scope.Fetch(_UNTESTABLE_DUTS_METRIC, {'metric:state': 'idle_unlocked'})
   | m.Window(m.Align('8h'))
   | m.GroupBy([], m.Count())
   | m.JoinWithLiteralTable(
       target_schema_name='monarch.acquisitions.Task',
       fields=(),
       streams=[()],
       input_default=0)).AddAlert(
           alert_name='TooManyIdleDuts',
           alert_condition=m.Gt(36),
           alert_trigger_duration='1h',
           alert_remind_interval='1d',
           alert_description=('Too many DUTs are idle, but not locked. '
                              'The scheduler may be failing to return DUTs to '
                              'Ready state.'))
However, our steady-state idle count looks to be right near that threshold, so ordinary fluctuation is probably enough to trip the alert. Separately, something appears to have gone wrong around 6/6; that will be handled elsewhere.
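To illustrate why a threshold that sits on the steady state is noisy, here is a minimal standalone sketch in plain Python. It does not use the monitoring library from the snippet above. The sample counts are made up for illustration, and `count_firings`, `percentile`, and `HYPOTHETICAL_IDLE_COUNTS` are hypothetical names, not part of any existing code. It also simplifies the trigger duration to "N consecutive samples above the threshold", which may not match how the real alerting backend evaluates it.

```python
import math


def count_firings(samples, threshold, trigger_len):
    """Counts runs of at least trigger_len consecutive samples above threshold.

    Each qualifying run is counted once, at the moment it reaches trigger_len.
    """
    firings = 0
    run = 0
    for value in samples:
        if value > threshold:
            run += 1
            if run == trigger_len:
                firings += 1
        else:
            run = 0
    return firings


def percentile(samples, pct):
    """Returns the nearest-rank percentile of samples (pct in 0..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up hourly idle-DUT counts hovering around 36 (NOT real data).
HYPOTHETICAL_IDLE_COUNTS = [35, 37, 36, 38, 34, 37, 37, 35, 36, 38, 39, 33]

if __name__ == '__main__':
    # Compare how often the current threshold and a looser one would fire.
    for threshold in (36, 40):
        fired = count_firings(HYPOTHETICAL_IDLE_COUNTS, threshold, 1)
        print('threshold=%d firings=%d' % (threshold, fired))
    # A high percentile of recent history is one possible starting point
    # for a threshold with some headroom above steady state.
    print('p95 =', percentile(HYPOTHETICAL_IDLE_COUNTS, 95))
```

Running something like this against real history for the metric would show how many firings a candidate threshold would have produced, which is one way to choose a value with headroom above steady state.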
Attachment (deleted): idleduts.png, 44.1 KB
Comment 1 by cra...@chromium.org, Jun 11 2018