
Issue 877988


Issue metadata

Status: Available
Owner: ----
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




For PuppetVersionSkewTooHigh, need a table of affected servers

Reported by jrbarnette@chromium.org, Aug 27

Issue description

Recently, this alert started firing:

Alert Details
------------------
Description:
At least 1 prod server puppet configuration is too far out of date.

name: PuppetVersionSkewTooHigh
current value: 62.216666666674428
threshold: Gt(24) for 3h
alert fields: {, }


As it turns out, there _is_ a playbook (hooray!). Unfortunately,
it starts out like this:
> Check the Puppet dashboard. Some servers have fallen behind the rest running Puppet.

(obviously)
and then this:

> SSH into one of the affected servers and run:

The instructions don't make it clear how to identify the "affected servers".
The dashboard does show two graphs entitled "Failing resources" and "Config
versions"; both graphs purport to list server hostnames, but there's no
real explanation of which graph lists the servers I should care about.

Moreover, even if I knew which graph mattered, a graph is the wrong presentation
for the information:  What's needed is a table showing the "affected servers".
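
To illustrate the kind of table meant here, the following is a minimal sketch only. It assumes the per-server skew values behind the alert can be exported as a mapping from hostname to skew, and that the alert value is measured in hours; neither is confirmed by the alert or the playbook. The hostnames and the function names `affected_servers` and `format_table` are hypothetical.

```python
from typing import Dict, List, Tuple

# Taken from the alert's "Gt(24)"; treating the unit as hours is an assumption.
SKEW_THRESHOLD_HOURS = 24.0


def affected_servers(skew_by_host: Dict[str, float],
                     threshold: float = SKEW_THRESHOLD_HOURS) -> List[Tuple[str, float]]:
    """Return (hostname, skew) pairs strictly above the threshold, worst first."""
    rows = [(host, skew) for host, skew in skew_by_host.items() if skew > threshold]
    # Sort by descending skew, then by hostname for a stable order.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


def format_table(rows: List[Tuple[str, float]]) -> str:
    """Render the rows as a plain-text table with a header line."""
    width = max([len("Hostname")] + [len(host) for host, _ in rows])
    lines = [f"{'Hostname':<{width}}  Skew (h)", f"{'-' * width}  --------"]
    lines += [f"{host:<{width}}  {skew:8.1f}" for host, skew in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data; real values would come from the monitoring system.
    sample = {
        "server-a.example": 62.2,
        "server-b.example": 3.0,
        "server-c.example": 24.0,
    }
    print(format_table(affected_servers(sample)))
```

With the sample data, only `server-a.example` is listed: a skew of exactly 24 does not satisfy the alert's greater-than condition.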

 
