
Issue 599238

Starred by 1 user

Issue metadata

Status: Archived
Owner: ----
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




Abort a job stuck in Gather stage failed to release dut

Project Member Reported by ka...@chromium.org, Mar 30 2016

Issue description

DUT host page: http://cautotest/afe/#tab_id=view_host&object_id=2273

The DUT has not run chromeos-test scheduled jobs since March 14th.

It is currently stuck at a Verify job: https://screenshot.googleplex.com/UD8nemwygY8


 

Comment 1 by ka...@chromium.org, Mar 30 2016

Could something be wrong with shard chromeos-server10? I don't see much going on at http://chromeos-server10.hot.corp.google.com/afe/#tab_id=job_list&state_filter=all&type_filter=all

Comment 2 by ka...@chromium.org, Mar 31 2016

Cc: helenzhang@chromium.org shrawan@chromium.org sontis@chromium.org
Labels: -Pri-2 Pri-1
Ping!
Any pointers to the root cause of this board being unused?

Comment 3 by ka...@chromium.org, Apr 1 2016

Even manually scheduled re-runs are hanging at Queued status.
Labels: -Hardware-Lab Infra-ChromeOS
The host is still stuck in the Verify queue.

Comment 6 by ka...@chromium.org, Apr 4 2016

Cc: xixuan@chromium.org
Owner: dshi@chromium.org

Comment 7 by dshi@chromium.org, Apr 5 2016

Labels: -Pri-1 starter Pri-2
Owner: ----
Status: Available (was: Untriaged)
Summary: Abort a job stuck in Gather stage failed to release dut (was: peach_pit DUT at chromeos1-row1-rack4-host5 is not running tests)
The host was stuck in a job in the Gathering stage (job id: 56644426).
I manually updated that hqe:
update afe_host_queue_entries set active=0,complete=1,status="Aborted" where job_id=56644426;

After that update, the host was released.

It seems that there is a bug in the scheduler: when a job is stuck in the Gathering stage, aborting it fails to release the DUT.
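
To check whether other entries are in the same condition, one could look for hqes that carry the aborted flag but are still marked active and not complete. The following is only a minimal sketch, not scheduler code: it uses an in-memory SQLite table as a stand-in for the MySQL afe_host_queue_entries table, keeps only a subset of the columns shown in this report, and the helper name find_aborted_but_active and the second "healthy" row are hypothetical.

```python
import sqlite3

# Stand-in schema: a subset of the afe_host_queue_entries columns from this report.
SCHEMA = """
CREATE TABLE afe_host_queue_entries (
    id INTEGER PRIMARY KEY,
    job_id INTEGER,
    host_id INTEGER,
    status TEXT,
    active INTEGER,
    complete INTEGER,
    aborted INTEGER
)
"""


def find_aborted_but_active(conn):
    """Return (id, job_id, host_id, status) for hqes flagged aborted yet still active."""
    cur = conn.execute(
        "SELECT id, job_id, host_id, status FROM afe_host_queue_entries "
        "WHERE aborted = 1 AND active = 1 AND complete = 0 "
        "ORDER BY id"
    )
    return cur.fetchall()


def demo():
    """Load one stuck and one healthy entry, then run the check."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        # The stuck entry from this report.
        (56974368, 56644426, 2273, "Gathering", 1, 0, 1),
        # A made-up healthy entry for contrast (hypothetical values).
        (1, 100, 7, "Completed", 0, 1, 0),
    ]
    conn.executemany(
        "INSERT INTO afe_host_queue_entries VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    return find_aborted_but_active(conn)
```

Against the real database the same WHERE clause should apply unchanged, but that is an assumption based on the column names in this report rather than something verified against the schema.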

The original state of the hqe is as follows:
mysql> select * from  afe_host_queue_entries where job_id = 56644426;
+----------+----------+---------+-----------+-----------+--------+----------+---------+----------------------------+-----------------+---------+---------------------+-------------+
| id       | job_id   | host_id | status    | meta_host | active | complete | deleted | execution_subdir           | atomic_group_id | aborted | started_on          | finished_on |
+----------+----------+---------+-----------+-----------+--------+----------+---------+----------------------------+-----------------+---------+---------------------+-------------+
| 56974368 | 56644426 |    2273 | Gathering |     53004 |      1 |        0 |       0 | chromeos1-row1-rack4-host5 |            NULL |       1 | 2016-03-14 04:55:16 | NULL        |
+----------+----------+---------+-----------+-----------+--------+----------+---------+----------------------------+-----------------+---------+---------------------+-------------+

We should be able to reproduce the issue by modifying an hqe's db entry to match that state.
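
As a rough illustration of that state and of the manual workaround, here is a sketch that models only the database side. It assumes an in-memory SQLite table in place of the MySQL one, with a reduced column set; the function names make_stuck_hqe, force_abort, and hqe_state are hypothetical. It does not exercise the scheduler, so it cannot confirm the bug by itself.

```python
import sqlite3


def make_stuck_hqe(conn):
    """Create a stand-in table and insert an hqe in the state shown in this report."""
    conn.execute(
        "CREATE TABLE afe_host_queue_entries ("
        "id INTEGER PRIMARY KEY, job_id INTEGER, host_id INTEGER, status TEXT, "
        "active INTEGER, complete INTEGER, aborted INTEGER)"
    )
    conn.execute(
        "INSERT INTO afe_host_queue_entries VALUES (?, ?, ?, ?, ?, ?, ?)",
        (56974368, 56644426, 2273, "Gathering", 1, 0, 1),
    )


def force_abort(conn, job_id):
    """Apply the manual update from this report; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE afe_host_queue_entries "
        "SET active = 0, complete = 1, status = 'Aborted' WHERE job_id = ?",
        (job_id,),
    )
    return cur.rowcount


def hqe_state(conn, job_id):
    """Return (status, active, complete, aborted) for the first hqe of job_id."""
    return conn.execute(
        "SELECT status, active, complete, aborted "
        "FROM afe_host_queue_entries WHERE job_id = ?",
        (job_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_stuck_hqe(conn)
    print("before:", hqe_state(conn, 56644426))
    force_abort(conn, 56644426)
    print("after: ", hqe_state(conn, 56644426))
```

For an actual reproduction, the stuck row would have to be set up in a test AFE database and the scheduler run against it to see whether the abort releases the host.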

Comment 8 by ka...@chromium.org, Apr 5 2016

The peach_pit board DUT chromeos1-row1-rack4-host5 is running tests now: http://cautotest/afe/#tab_id=view_host&object_id=2273

Comment 9 by ka...@chromium.org, Apr 6 2016

Status: Verified (was: Available)
Status: Available (was: Verified)
Re-opening for root-cause fix
Labels: Hotlist-Fixit
Components: Infra>Client>ChromeOS
Labels: -Infra-ChromeOS

Comment 13 by sheriffbot@chromium.org, Apr 27 2017

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been available for more than 365 days, and should be re-evaluated. Please re-triage this issue.
The Hotlist-Recharge-Cold label is applied for tracking purposes, and should not be removed after re-triaging the issue.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Status: Archived (was: Untriaged)
This bug is very old, is Untriaged, and has no owner. If it is still relevant, reopen it as Untriaged or open a new bug.
