
Issue 878190


Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




job_aborter sometimes doesn't clean up jobs

Reported by jrbarnette@chromium.org, Aug 28

Issue description

Host chromeos15-row2-rack2-host2 is stuck in "Repair Failed" state.  Scheduling new work on it has no effect.

 
job_aborter is supposed to bring the job back to a good state, however.  The logs indicate that it did so, and poking the job so that job_aborter processes it again marks the job failed as expected.

I'm not sure why job_aborter doesn't work sometimes; I can only assume it's somehow racing with the scheduler or shard client.  It's become quite clear that the scheduler does not properly create transactions when it's fiddling with the database.
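
To illustrate the kind of race suspected here, below is a minimal sketch of a status update that is safe against a concurrent writer. It is not the job_aborter or scheduler code: the `jobs` table, its `status` values, and the helpers `make_db` and `mark_failed_if_running` are all hypothetical, and it uses an in-memory SQLite database purely so the example is self-contained. The idea is to do the check and the write in one statement inside a transaction, rather than reading the status first and writing it back later.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with one running job.

    The schema is hypothetical and exists only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO jobs (id, status) VALUES (1, 'Running')")
    conn.commit()
    return conn


def mark_failed_if_running(conn, job_id):
    """Move a job from 'Running' to 'Failed' in a single atomic step.

    Returns True if this call changed the row, False if the job was
    no longer 'Running' (for example, another writer got there first).
    """
    # The connection context manager commits on success and rolls back
    # if an exception is raised inside the block.
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'Failed' "
            "WHERE id = ? AND status = 'Running'",
            (job_id,),
        )
        return cur.rowcount == 1
```

Because the `WHERE` clause carries the expected old status, a second caller that loses the race gets `False` back instead of silently overwriting whatever the first caller wrote.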
Status: Available (was: Untriaged)
Summary: job_aborter sometimes doesn't clean up jobs (was: chromeos15-row2-rack2-host2 is stuck in "Repair Failed")
I don't have a viable action item for the underlying bug, but fixed this host.
