Aborting a job stuck in Gathering stage failed to release DUT
Issue description
DUT host page: http://cautotest/afe/#tab_id=view_host&object_id=2273
The DUT has not run any chromeos-test scheduled jobs since March 14th. It is currently stuck at a Verify job: https://screenshot.googleplex.com/UD8nemwygY8
Mar 31 2016
Ping! Any pointers to the root cause of this board being unused?
Apr 1 2016
Even manually scheduled re-runs are hanging at Queued status.
Apr 1 2016
Apr 4 2016
The host is still stuck in the Verify queue.
Apr 4 2016
Apr 5 2016
The host was stuck in a job in the Gathering stage (job id: 56644426). After I manually updated that HQE with:
update afe_host_queue_entries set active=0,complete=1,status="Aborted" where job_id=56644426;
the host was released. It seems that there is a bug in the scheduler: for a job stuck in the Gathering stage, abort failed to release the DUT. The original state of the HQE was as follows:
mysql> select * from afe_host_queue_entries where job_id = 56644426;
+----------+----------+---------+-----------+-----------+--------+----------+---------+----------------------------+-----------------+---------+---------------------+-------------+
| id       | job_id   | host_id | status    | meta_host | active | complete | deleted | execution_subdir           | atomic_group_id | aborted | started_on          | finished_on |
+----------+----------+---------+-----------+-----------+--------+----------+---------+----------------------------+-----------------+---------+---------------------+-------------+
| 56974368 | 56644426 |    2273 | Gathering |     53004 |      1 |        0 |       0 | chromeos1-row1-rack4-host5 |            NULL |       1 | 2016-03-14 04:55:16 | NULL        |
+----------+----------+---------+-----------+-----------+--------+----------+---------+----------------------------+-----------------+---------+---------------------+-------------+
We should be able to reproduce the issue by modifying an HQE's DB entry to that state.
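To make the reproduction idea concrete, here is a minimal sketch of the database side only. It assumes a simplified, hypothetical schema in an in-memory SQLite database. The real AFE table is MySQL and has more columns, such as meta_host and execution_subdir. The helper names load_stuck_entry, find_stuck_aborts and force_abort are hypothetical. The sketch treats aborted=1, active=1, complete=0 as the signature of an abort that never released its host. That matches the row shown in this comment, but it is an assumption and not the scheduler's actual check.

```python
import sqlite3

# Hypothetical, simplified stand-in for the AFE MySQL table: only the columns
# needed to show the stuck state are modeled, and SQLite replaces MySQL.
SCHEMA = """
CREATE TABLE afe_host_queue_entries (
    id INTEGER PRIMARY KEY,
    job_id INTEGER,
    host_id INTEGER,
    status TEXT,
    active INTEGER,
    complete INTEGER,
    aborted INTEGER,
    started_on TEXT,
    finished_on TEXT
)
"""


def load_stuck_entry(conn):
    """Insert an HQE in the state reported for job 56644426 (hypothetical helper)."""
    conn.execute(
        "INSERT INTO afe_host_queue_entries "
        "(id, job_id, host_id, status, active, complete, aborted, started_on, finished_on) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (56974368, 56644426, 2273, "Gathering", 1, 0, 1, "2016-03-14 04:55:16", None),
    )


def find_stuck_aborts(conn):
    """Return (job_id, host_id, status) for entries that were aborted but are
    still active and incomplete, i.e. presumably still holding their host."""
    return conn.execute(
        "SELECT job_id, host_id, status FROM afe_host_queue_entries "
        "WHERE aborted = 1 AND active = 1 AND complete = 0"
    ).fetchall()


def force_abort(conn, job_id):
    """Apply the same manual update used as the workaround in this comment."""
    conn.execute(
        "UPDATE afe_host_queue_entries "
        "SET active = 0, complete = 1, status = 'Aborted' WHERE job_id = ?",
        (job_id,),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    load_stuck_entry(conn)
    print(find_stuck_aborts(conn))  # [(56644426, 2273, 'Gathering')]
    force_abort(conn, 56644426)
    print(find_stuck_aborts(conn))  # []
```

This only models the stored state and the manual workaround. It does not exercise the scheduler's abort path, which is where the bug is suspected. A real reproduction would still need to seed this row in a test AFE database and watch whether the scheduler releases the host.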
Apr 5 2016
peach_pit board DUT chromeos1-row1-rack4-host5 is running tests now: http://cautotest/afe/#tab_id=view_host&object_id=2273
Apr 6 2016
Apr 6 2016
Reopening for a root-cause fix.
Apr 11 2016
Apr 26 2016
Apr 27 2017
This issue has been available for more than 365 days, and should be re-evaluated. Please re-triage this issue. The Hotlist-Recharge-Cold label is applied for tracking purposes, and should not be removed after re-triaging the issue. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Mar 14 2018
This bug is very old, is Untriaged, and has no owner. If it is still relevant, reopen as Untriaged or open a new bug.
Comment 1 by ka...@chromium.org, Mar 30 2016