
Issue 878940

Starred by 1 user

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




[Findit] Metrics: gather duration metrics for each analysis

Project Member Reported by chanli@chromium.org, Aug 29

Issue description

Add CumulativeDistributionMetric metrics to track:
- pending time of heuristic analyses for consistent failures
- running time of heuristic analyses for consistent failures
- pending time of swarming task
- running time of swarming task
- pending time of try job
- running time of try job
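
To make the pending/running split concrete, here is a minimal stdlib-only sketch. It is not the real monitoring library's API: `DurationDistribution`, `record_durations`, the bucket bounds, and the metric names in the usage note are all hypothetical stand-ins for a cumulative distribution metric.

```python
import bisect
from collections import defaultdict

# Hypothetical bucket upper bounds in seconds; not taken from Findit.
BUCKET_BOUNDS = [1, 5, 15, 60, 300, 900, 3600]


class DurationDistribution:
    """Stdlib stand-in for a cumulative distribution metric (assumption)."""

    def __init__(self, name):
        self.name = name
        # One histogram per field combination; the last slot is overflow.
        self._buckets = defaultdict(lambda: [0] * (len(BUCKET_BOUNDS) + 1))
        self._sums = defaultdict(float)

    def add(self, seconds, **fields):
        key = tuple(sorted(fields.items()))
        # bisect_left puts a value equal to a bound into that bound's bucket.
        self._buckets[key][bisect.bisect_left(BUCKET_BOUNDS, seconds)] += 1
        self._sums[key] += seconds

    def get(self, **fields):
        key = tuple(sorted(fields.items()))
        return list(self._buckets[key]), self._sums[key]


def record_durations(pending, running, requested, started, completed, **fields):
    """Splits one lifetime into pending (queue) time and running time."""
    pending.add(started - requested, **fields)
    running.add(completed - started, **fields)
```

For example, with `pending = DurationDistribution('try_job/pending_seconds')` and a matching `running` metric, calling `record_durations(pending, running, requested=100.0, started=130.0, completed=730.0)` records 30 seconds of pending time and 600 seconds of running time. The same pair of metrics could be created for heuristic analyses and swarming tasks.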

Add graphs that display the 95th- and 99th-percentile durations, and add alerts for spikes in those durations.
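
As a rough illustration of what those graphs and alerts would compute, the sketch below derives a nearest-rank percentile from raw duration samples and flags a spike against a baseline. The function names and the 2x threshold are assumptions for illustration; a real monitoring backend would normally derive percentiles from the bucketed distribution and define alert rules in its own configuration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError('no samples')
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def is_spike(current, baseline, ratio=2.0):
    """True when the current value exceeds `ratio` times a positive baseline."""
    return baseline > 0 and current > ratio * baseline
```

An alert could then compare the 95th percentile of the latest window with the same percentile over a longer trailing period, for example `is_spike(percentile(last_hour, 95), percentile(last_week, 95))`, where both sample lists are hypothetical inputs.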


 
I think we already have a pending-time metric for try jobs. You may want to talk to robertocn@ about how these metrics are set up.

A follow-up question: if those alerts fire, what actions are expected? If we can't do anything, the alert might not be useful.
The existing pending try-job metric comes from buildbucket, so we may be able to see a change in the pending time but cannot easily tell what causes it.

The new try-job pending metric I plan to add will include more information, such as master and builder, so it might be easier to spot the causes.
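
A small sketch of why the extra fields help: once each sample carries master and builder, pending time can be grouped per pair to see which one drives a spike. The `slowest_builders` helper and the record layout are hypothetical, not part of Findit.

```python
from collections import defaultdict


def slowest_builders(records, top=3):
    """Ranks (master, builder) pairs by mean pending time, slowest first.

    `records` is an iterable of (master, builder, pending_seconds) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of seconds, count]
    for master, builder, pending_seconds in records:
        entry = totals[(master, builder)]
        entry[0] += pending_seconds
        entry[1] += 1
    means = {key: total / count for key, (total, count) in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:top]
```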

I also think alerts need to be actionable, so I'll think more about that part.
Roberto, what do you say about the try-job pending alert? We used to have one, and then deleted it in favor of the buildbucket one, IIRC.
