
Issue 820241

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
EstimatedDays: ----
NextAction: 2019-07-09
OS: ----
Pri: 3
Type: Feature




puppet dashboard not showing failing resources

Project Member Reported by pprabhu@chromium.org, Mar 8 2018

Issue description

https://viceroy.corp.google.com/chromeos/puppet

See issue 817645 for context. I hit this problem a few times when trying to deploy the new SSH keys.

I expected to see failing puppet runs on this dashboard. In this case, it was critical that all devservers receive the new public keys before rotating keys, but the dashboard didn't help me determine that.

What gives?
 
A puppet run only fails outright if manifest compilation fails. Otherwise you only get a failing resource, which might not be easily visible, especially since we still get flaky failures (I believe it's apt failing to grab the lock).

I don't think I fully grasp the context, so correct me if I'm wrong.
Yep, the ask here is for a dashboard that would let us detect when we start failing more resources.

An easier partial solution would be a dashboard that shows a sudden spike in the number of failed resources. This happens when a very basic step fails (in my case, something in profiles/base), because it causes a lot of dependent packages to be skipped. So, if we tracked the total number of packages failed + skipped, we'd see a sudden spike.

But there are also failure modes where an important but terminal package fails. Detecting this would require us to clearly flag _any_ failed packages if they start failing consistently.

So:
(1) A dashboard for # skipped + failed packages changing suddenly.
(2) A dashboard for # packages failing consistently (i.e., no pass in ~4 hours?)
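
A rough sketch of how the two checks above could be computed, assuming each puppet run on a host is summarized as a set of failed and a set of skipped resource names. `RunSummary`, `spike_detected`, `consistently_failing`, and the thresholds (`factor`, `min_jump`, the 4-hour window) are hypothetical names and values for illustration, not part of Puppet or the existing dashboard. "Failing consistently" is interpreted here as "failed in every run inside the window".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical per-run summary for one host; not a real Puppet schema."""
    finished: datetime
    failed: frozenset   # names of resources that failed in this run
    skipped: frozenset  # names of resources skipped in this run


def spike_detected(runs, factor=3.0, min_jump=5):
    """Check (1): does the latest failed+skipped count jump above the baseline?

    The baseline is the median count over all earlier runs. Both thresholds
    are illustrative guesses and would need tuning against real data.
    """
    ordered = sorted(runs, key=lambda r: r.finished)
    if len(ordered) < 2:
        return False
    counts = [len(r.failed) + len(r.skipped) for r in ordered]
    baseline = median(counts[:-1])
    latest = counts[-1]
    return latest - baseline >= min_jump and latest > factor * max(baseline, 1)


def consistently_failing(runs, now, window=timedelta(hours=4)):
    """Check (2): resources that failed in every run within the window."""
    recent = [r for r in runs if now - window <= r.finished <= now]
    if not recent:
        return set()
    result = set(recent[0].failed)
    for r in recent[1:]:
        result &= r.failed
    return result


if __name__ == "__main__":
    start = datetime(2018, 3, 8, 12, 0)
    # Four quiet runs, 30 minutes apart, each with one flaky failure.
    history = [
        RunSummary(start + timedelta(minutes=30 * i),
                   frozenset({"Package[foo]"}), frozenset())
        for i in range(4)
    ]
    # A fifth run where a base step fails and ten dependents are skipped.
    bad = RunSummary(
        start + timedelta(minutes=120),
        frozenset({"Package[foo]", "File[base]"}),
        frozenset(f"Package[dep{i}]" for i in range(10)),
    )
    print(spike_detected(history + [bad]))
    print(consistently_failing(history + [bad], now=bad.finished))
```

The spike check flags the broad "something in profiles/base broke" case, while the consistency check catches a single leaf resource that never recovers, which a count-based spike would miss.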


Labels: -Type-Bug Hotlist-Fixit Type-Feature
Status: Available (was: Untriaged)
Cc: pho...@chromium.org
Issue 820242 has been merged into this issue.
Labels: Pri-3
NextAction: 2019-07-09
Downgrading P2s that haven't been modified in more than 6 months, which have no component or owner.
