
Issue 866058

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jul 30
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Add a "number of new jobs with hosts" graph to shards dashboard

Project Member Reported by pprabhu@chromium.org, Jul 20

Issue description

The "job pending durations too large" omen has been firing for weeks now: http://shortn/_nxkpBNoLq6

I've personally been ignoring it. And guess what: at least partially, it was pointing to a real problem, issue 864227.

The problem with this omen is that there are no obvious next steps. We should add a graph of median job pending duration to the shards dashboard: https://viceroy.corp.google.com/chromeos/capacity_health 

The top 5 shards with the longest pending durations are a useful metric to know:
- either we've overloaded the shards and need to distribute load better
- or something is wrong with the shard and it isn't processing jobs as it should
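
As a rough illustration of the "top 5 shards" idea, here is a minimal Python sketch. It assumes pending durations are already available as (shard, seconds) pairs. The function name, the input shape, and the shard names are hypothetical and not taken from autotest or the dashboard.

```python
from collections import defaultdict
from statistics import median


def top_shards_by_median_pending(pending_samples, n=5):
    """Return the n shards with the largest median pending duration.

    pending_samples: iterable of (shard_name, pending_seconds) pairs.
    This input shape is an assumption made for illustration only.
    """
    by_shard = defaultdict(list)
    for shard, seconds in pending_samples:
        by_shard[shard].append(seconds)

    # One median per shard, then sort descending and keep the first n.
    medians = {shard: median(values) for shard, values in by_shard.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical sample data: shard-b has far longer waits than the others.
samples = [
    ("shard-a", 10), ("shard-a", 30), ("shard-a", 20),
    ("shard-b", 100), ("shard-b", 300),
    ("shard-c", 5),
]
top_two = top_shards_by_median_pending(samples, n=2)
```

A shard that stays at the top of this ranking is a candidate for either of the two explanations above; the ranking alone does not say which one applies.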
 
Labels: -Chase-Pending Chase
Owner: zamorzaev@chromium.org
Status: Assigned (was: Untriaged)
We found a dashboard that _might_ be doing something like this: https://viceroy.corp.google.com/chromeos/capacity_health#_VG_7D_0PTt3

But it does not show the impact of issue 864227 http://shortn/_7w6lNefAzb
Looks like the existing metric known_jobs_durations does not capture timeouts. It only looks at the incomplete jobs present on the shard at a given moment: https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/scheduler/shard/shard_client.py?q=known_jobs_durations&l=334
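To illustrate why a snapshot-style metric can miss timeouts, here is a minimal Python sketch. It is not the code in shard_client.py: the Job class, its fields, and snapshot_pending_durations are hypothetical names, and the model assumes a timed-out job simply stops counting as incomplete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: int
    created_at: float
    # Set when the job completes or is aborted by a timeout (assumed model).
    finished_at: Optional[float] = None


def snapshot_pending_durations(jobs, now):
    """Ages of the jobs that are still incomplete at time `now`."""
    return [
        now - job.created_at
        for job in jobs
        if job.created_at <= now
        and (job.finished_at is None or job.finished_at > now)
    ]


jobs = [
    Job(job_id=1, created_at=0.0, finished_at=100.0),  # timed out at t=100
    Job(job_id=2, created_at=140.0),                   # still waiting
]

before_timeout = snapshot_pending_durations(jobs, now=50.0)
after_timeout = snapshot_pending_durations(jobs, now=150.0)
```

In this sketch, a sample taken at t=150 only sees job 2, which is 10 seconds old. The 100 seconds that job 1 waited before timing out no longer appear anywhere in the metric.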
Added an alert on the number of new jobs with hosts per shard and a corresponding dashboard at https://viceroy.corp.google.com/chromeos/capacity_health#_VG_c0_J4Kyq
Status: Fixed (was: Assigned)
Summary: Add a "number of new jobs with hosts" graph to shards dashboard (was: Add a "median job pending duration" graph to shards dashboard)
