job_aborter sometimes doesn't clean up jobs
Reported by jrbarnette@chromium.org, Aug 28
Issue description
Host chromeos15-row2-rack2-host2 is stuck in "Repair Failed" state. Scheduling new work on it has no effect.
Sep 11
job_aborter is supposed to bring the job back to a good state, however. The logs indicate that it did so, and poking the job so that job_aborter processes it again marks the job failed as expected. I'm not sure why job_aborter sometimes doesn't work; I can only assume it's somehow racing with the scheduler or the shard client. It's become quite clear that the scheduler does not properly use transactions when it modifies the database.
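To illustrate the kind of race suspected here, below is a minimal sketch of a transactional, conditional status update. It is not the actual job_aborter or scheduler code: the `jobs` table, its columns, the status strings, and the `abort_job_if_running` helper are all hypothetical, and it uses Python's standard `sqlite3` module rather than the real database backend.

```python
import sqlite3


def abort_job_if_running(conn, job_id):
    """Mark a job Failed only if it is still Running.

    Hypothetical helper: the table and status names are assumptions
    made for illustration, not the real schema.
    Returns True if this call changed the row.
    """
    # "with conn" wraps the statement in a transaction: it commits on
    # success and rolls back if an exception is raised.
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'Failed' "
            "WHERE id = ? AND status = 'Running'",
            (job_id,),
        )
        # The WHERE clause makes the update conditional, so a second
        # writer that already changed the status is not overwritten.
        return cur.rowcount == 1


# Usage with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO jobs VALUES (1, 'Running')")
conn.commit()

first = abort_job_if_running(conn, 1)   # row was Running, so it changes
second = abort_job_if_running(conn, 1)  # row is already Failed, no change
status = conn.execute("SELECT status FROM jobs WHERE id = 1").fetchone()[0]
```

The point of the sketch is only that a writer which checks the current state and updates it in a single transaction cannot silently clobber, or be clobbered by, a concurrent writer. Whether that matches the real failure mode here is an assumption.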
Sep 11
I don't have a viable action item for the underlying bug, but I fixed this host.
Comment 1 by ayatane@chromium.org, Sep 11