Issue 812284

Issue metadata

Status: Duplicate
Owner:
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Scheduler struggling to abort jobs over and over: "Aborting HQE: ... for job: ... and host: ..."

Project Member Reported by pho...@chromium.org, Feb 14 2018

Issue description

The scheduler seems to be attempting (and failing?) to abort the same jobs over and over; the same six HQEs show up after every "Drones refreshed." line in the excerpt below.

Strangely, "Aborting HQE" doesn't show up in autotest's or lucifer's codebase. What code is creating this log message?

09:49:05 INFO | Drones refreshed.
09:49:05 INFO | Aborting HQE: 176407434, for job: 175993934 and host: chromeos2-row2-rack10-host4 has status:Gathering [active,aborted]
09:49:05 INFO | Aborting HQE: 177016601, for job: 176602279 and host: chromeos2-row2-rack10-host5 has status:Gathering [active,aborted]
09:49:05 INFO | Aborting HQE: 177016607, for job: 176602285 and host: chromeos2-row1-rack10-host3 has status:Gathering [active,aborted]
09:49:05 INFO | Aborting HQE: 177018675, for job: 176604351 and host: chromeos2-row2-rack10-host6 has status:Gathering [active,aborted]
09:49:05 INFO | Aborting HQE: 177018679, for job: 176604356 and host: chromeos2-row1-rack10-host7 has status:Gathering [active,aborted]
09:49:05 INFO | Aborting HQE: 177018681, for job: 176604358 and host: chromeos2-row1-rack10-host5 has status:Gathering [active,aborted]
09:49:05 INFO | 50 running processes. 0 added this tick.
09:49:05 INFO | Invoking drone refresh.
09:49:05 INFO | (Worker.localhost) starting.
09:49:05 INFO | Running drone_utility on localhost
09:49:05 INFO | (Task Queue) Waiting for drone_manager.refresh_queue.localhost
09:49:05 INFO | (Worker.localhost) finished.
09:49:05 INFO | (Task Queue) All threads have returned, clearing map.
09:49:05 INFO | Drones refreshed.
09:49:05 INFO | Aborting HQE: 176407434, for job: 175993934 and host: chromeos2-row2-rack10-host4 has status:Gathering [active,aborted]
09:49:06 INFO | Aborting HQE: 177016601, for job: 176602279 and host: chromeos2-row2-rack10-host5 has status:Gathering [active,aborted]
09:49:06 INFO | Aborting HQE: 177016607, for job: 176602285 and host: chromeos2-row1-rack10-host3 has status:Gathering [active,aborted]
09:49:06 INFO | Aborting HQE: 177018675, for job: 176604351 and host: chromeos2-row2-rack10-host6 has status:Gathering [active,aborted]
09:49:06 INFO | Aborting HQE: 177018679, for job: 176604356 and host: chromeos2-row1-rack10-host7 has status:Gathering [active,aborted]
09:49:06 INFO | Aborting HQE: 177018681, for job: 176604358 and host: chromeos2-row1-rack10-host5 has status:Gathering [active,aborted]
09:49:06 INFO | 52 running processes. 0 added this tick.
09:49:06 INFO | Invoking drone refresh.
09:49:06 INFO | (Worker.localhost) starting.
09:49:06 INFO | Running drone_utility on localhost
09:49:06 INFO | (Worker.localhost) finished.
09:49:06 INFO | (Task Queue) Waiting for drone_manager.refresh_queue.localhost
09:49:06 INFO | (Task Queue) All threads have returned, clearing map.
09:49:06 INFO | Drones refreshed.
09:49:06 INFO | Aborting HQE: 176407434, for job: 175993934 and host: chromeos2-row2-rack10-host4 has status:Gathering [active,aborted]
09:49:06 INFO | Aborting HQE: 177016601, for job: 176602279 and host: chromeos2-row2-rack10-host5 has status:Gathering [active,aborted]
09:49:06 INFO | Aborting HQE: 177016607, for job: 176602285 and host: chromeos2-row1-rack10-host3 has status:Gathering [active,aborted]
09:49:06 INFO | Aborting HQE: 177018675, for job: 176604351 and host: chromeos2-row2-rack10-host6 has status:Gathering [active,aborted]
09:49:06 INFO | Aborting HQE: 177018679, for job: 176604356 and host: chromeos2-row1-rack10-host7 has status:Gathering [active,aborted]
09:49:06 INFO | Aborting HQE: 177018681, for job: 176604358 and host: chromeos2-row1-rack10-host5 has status:Gathering [active,aborted]
09:49:06 INFO | 52 running processes. 0 added this tick.
09:49:06 INFO | Invoking drone refresh.
09:49:06 INFO | (Worker.localhost) starting.
09:49:06 INFO | Running drone_utility on localhost
09:49:06 INFO | (Task Queue) Waiting for drone_manager.refresh_queue.localhost
09:49:07 INFO | (Worker.localhost) finished.
09:49:07 INFO | (Task Queue) All threads have returned, clearing map.
09:49:07 INFO | Drones refreshed.
09:49:07 INFO | Aborting HQE: 176407434, for job: 175993934 and host: chromeos2-row2-rack10-host4 has status:Gathering [active,aborted]
09:49:07 INFO | Aborting HQE: 177016601, for job: 176602279 and host: chromeos2-row2-rack10-host5 has status:Gathering [active,aborted]
09:49:07 INFO | Aborting HQE: 177016607, for job: 176602285 and host: chromeos2-row1-rack10-host3 has status:Gathering [active,aborted]
09:49:07 INFO | Aborting HQE: 177018675, for job: 176604351 and host: chromeos2-row2-rack10-host6 has status:Gathering [active,aborted]
09:49:07 INFO | Aborting HQE: 177018679, for job: 176604356 and host: chromeos2-row1-rack10-host7 has status:Gathering [active,aborted]
09:49:07 INFO | Aborting HQE: 177018681, for job: 176604358 and host: chromeos2-row1-rack10-host5 has status:Gathering [active,aborted]
09:49:07 INFO | Finished: Special Task 45350 (host chromeos6-row1-rack3-host17, task Reset, time 2018-02-14 09:48:11) (active)
09:49:07 INFO | forgetting pidfile /usr/local/autotest/results/hosts/chromeos6-row1-rack3-host17/45350-reset/20181402094811/.autoserv_execute
09:49:07 INFO | ResetTask finished with success=True
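
To confirm that the same HQEs are being retried rather than new ones arriving, the abort lines can be tallied per HQE. The following is a minimal sketch, not part of autotest or lucifer: the names `ABORT_RE`, `count_aborts`, and `SAMPLE` are hypothetical, and the regular expression assumes the log line format shown in the excerpt above.

```python
import re
from collections import Counter

# Assumed format, taken from the "Aborting HQE" lines quoted in this report.
ABORT_RE = re.compile(
    r"Aborting HQE: (?P<hqe>\d+), for job: (?P<job>\d+) "
    r"and host: (?P<host>\S+) has status:(?P<status>\S+)"
)


def count_aborts(lines):
    """Count abort log lines per (hqe_id, job_id, host) triple."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            key = (int(match["hqe"]), int(match["job"]), match["host"])
            counts[key] += 1
    return counts


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = """\
09:49:05 INFO | Drones refreshed.
09:49:05 INFO | Aborting HQE: 176407434, for job: 175993934 and host: chromeos2-row2-rack10-host4 has status:Gathering [active,aborted]
09:49:05 INFO | Aborting HQE: 177016601, for job: 176602279 and host: chromeos2-row2-rack10-host5 has status:Gathering [active,aborted]
09:49:05 INFO | 50 running processes. 0 added this tick.
09:49:05 INFO | Aborting HQE: 176407434, for job: 175993934 and host: chromeos2-row2-rack10-host4 has status:Gathering [active,aborted]
""".splitlines()

if __name__ == "__main__":
    # Print the most frequently aborted entries first.
    for (hqe, job, host), n in count_aborts(SAMPLE).most_common():
        print(f"HQE {hqe} (job {job}, host {host}): {n} abort lines")
```

Running `count_aborts` over a full scheduler log instead of `SAMPLE` should show whether a small, fixed set of HQEs accounts for all of the abort lines.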
 

Comment 1 by pho...@chromium.org, Feb 15 2018

Mergedinto: 812848
Status: Duplicate (was: Untriaged)