For PuppetVersionSkewTooHigh, need a table of affected servers
Reported by jrbarnette@chromium.org, Aug 27
Issue description
Recently, this alert started firing:
Alert Details
------------------
Description:
At least 1 prod server puppet configuration is too far out of date.
name: PuppetVersionSkewTooHigh
current value: 62.216666666674428
threshold: Gt(24) for 3h
alert fields: {, }
As it turns out, there _is_ a playbook (hooray!). Unfortunately,
it starts out like this:
> Check the Puppet dashboard. Some servers have fallen behind the rest running Puppet.
(obviously)
and then this:
> SSH into one of the affected servers and run:
The instructions don't make it clear how to identify the "affected servers".
The dashboard does show two graphs entitled "Failing resources" and "Config
versions"; both graphs purport to list server hostnames, but there's no
real explanation of which graph lists the servers I should care about.
Moreover, even if I knew which graph mattered, a graph is the wrong presentation
for this information: what's needed is a table listing the "affected servers".
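
As a rough illustration of the requested table, here is a minimal Python sketch. It assumes the monitoring data can be exported as a mapping from hostname to configuration skew, and that the alert value is measured in hours; neither is confirmed by the alert text. The function names and the input format are hypothetical, not part of any existing dashboard or Puppet API.

```python
from typing import Dict, List, Tuple

# Assumption: the alert's "Gt(24)" threshold is expressed in hours.
THRESHOLD_HOURS = 24.0


def affected_servers(
    skew_hours: Dict[str, float], threshold: float = THRESHOLD_HOURS
) -> List[Tuple[str, float]]:
    """Return (hostname, skew) pairs strictly above the threshold, worst first."""
    rows = [(host, skew) for host, skew in skew_hours.items() if skew > threshold]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def format_table(rows: List[Tuple[str, float]]) -> str:
    """Render the rows as a plain-text table with a header line."""
    lines = [f"{'hostname':<30} {'hours behind':>12}"]
    for host, skew in rows:
        lines.append(f"{host:<30} {skew:>12.1f}")
    return "\n".join(lines)
```

Calling `format_table(affected_servers(data))` on such a mapping would give one row per server over the threshold, which is the list the playbook's "SSH into one of the affected servers" step needs.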