
Issue 770939

Starred by 1 user

Issue metadata

Status: WontFix
Owner: ----
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Feature




Implement cron job to detect hung analyses.

Project Member Reported by robert...@chromium.org, Oct 2 2017

Issue description

We should also trigger an alert if an analysis fails to make progress within 24 hours (due to resource starvation, an infinite loop in the logic, external failures, etc.).

To achieve this, we could add an hourly cron job that checks all analyses in progress and records the culprit range of each one. If an analysis's culprit range hasn't changed in 24 hours, we can consider that analysis hung.
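A minimal sketch of the proposed cron check, for illustration only. It assumes a culprit range can be represented as a (lower, upper) pair of commit positions and keeps the per-analysis snapshots in a plain dict; in a real deployment they would live in persistent storage between cron runs. The names `check_for_hung_analyses`, `Snapshot`, and `HUNG_THRESHOLD` are hypothetical and not taken from Findit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical representation: the (lower, upper) commit positions that
# still bound the suspected culprit for an in-progress analysis.
CulpritRange = Tuple[int, int]

# Threshold from the proposal: no progress for 24 hours means "hung".
HUNG_THRESHOLD = timedelta(hours=24)


@dataclass
class Snapshot:
    """Last culprit range recorded for an analysis and when it was first seen."""
    culprit_range: CulpritRange
    unchanged_since: datetime


def check_for_hung_analyses(
    in_progress: Dict[str, CulpritRange],
    snapshots: Dict[str, Snapshot],
    now: datetime,
    threshold: timedelta = HUNG_THRESHOLD,
) -> List[str]:
    """Intended to run on every (hourly) cron tick.

    Updates `snapshots` in place and returns the ids of analyses whose
    culprit range has not changed for at least `threshold`.
    """
    hung = []
    for analysis_id, current_range in in_progress.items():
        snapshot = snapshots.get(analysis_id)
        if snapshot is None or snapshot.culprit_range != current_range:
            # New analysis, or the range narrowed: progress was made, so
            # restart the clock.
            snapshots[analysis_id] = Snapshot(current_range, now)
        elif now - snapshot.unchanged_since >= threshold:
            hung.append(analysis_id)
    # Drop snapshots for analyses that are no longer in progress.
    for analysis_id in list(snapshots):
        if analysis_id not in in_progress:
            del snapshots[analysis_id]
    return hung


if __name__ == "__main__":
    snaps: Dict[str, Snapshot] = {}
    start = datetime(2017, 10, 2)
    check_for_hung_analyses({"a1": (100, 200)}, snaps, start)
    # Same range a day later, so "a1" is reported as hung.
    print(check_for_hung_analyses({"a1": (100, 200)}, snaps,
                                  start + timedelta(hours=24)))
```

Because the clock is reset only when the culprit range changes, an analysis that keeps narrowing its range is never flagged, while one stuck on the same range (for any of the reasons listed above) is flagged on the first cron run at or after the 24-hour mark; the alerting side is left out here.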

 

Comment 1 by st...@chromium.org, Oct 2 2017

It could also be achieved by adding a ts_mon metric in the "finalized" method of the current pipelines, which is called even if the pipeline times out, etc.

If it is for try-jobs, we could also add the ts_mon metric in the callback of the timeout task that Findit schedules itself.

Would this cover what you want to monitor, or be easier to implement?

Comment 2 by st...@chromium.org, Oct 2 2017

Status: Assigned (was: Untriaged)
If the purpose is to automatically rerun the analyses, that might be a different story, but we are not there yet.
Cc: robert...@chromium.org
Labels: -Pri-1 Pri-3
Owner: ----
Status: Untriaged (was: Assigned)
Status: WontFix (was: Untriaged)
We have a different approach for this.
