
Issue 881491

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




Deleting a board from a shard leads to aborting completed HQEs

Project Member Reported by pprabhu@chromium.org, Sep 6

Issue description

I removed board:falco from cros-full-0004.mtv for issue 880913

pprabhu@pprabhu:chromiumos$ atest shard remove_board -l board:falco cros-full-0004.mtv.corp.google.com
None


The DUTs were taking a long time to move to the master. It turns out the shard is aborting old HQEs for each host: HQEs from months ago that belong to jobs that have already completed.

e.g.:

chromeos-test@cros-full-0004:~$ tail /usr/local/autotest/logs/shard_client.latest 
09/06 12:35:33.557 INFO |            models:0759|   Deleting and aborting hqe chromeos6-row1-rack9-host5/183451434 (183888007)...
09/06 12:35:33.559 INFO |            models:0762|   ... done with hqe chromeos6-row1-rack9-host5/183451434 (183888007).
09/06 12:35:33.559 INFO |            models:0759|   Deleting and aborting hqe chromeos6-row1-rack9-host5/183451439 (183888012)...
09/06 12:35:33.560 INFO |            models:0762|   ... done with hqe chromeos6-row1-rack9-host5/183451439 (183888012).
09/06 12:35:33.560 INFO |            models:0759|   Deleting and aborting hqe chromeos6-row1-rack9-host5/183451459 (183888032)...
09/06 12:35:33.561 INFO |            models:0762|   ... done with hqe chromeos6-row1-rack9-host5/183451459 (183888032).
09/06 12:35:33.562 INFO |            models:0759|   Deleting and aborting hqe chromeos6-row1-rack9-host5/183451463 (183888036)...
09/06 12:35:33.563 INFO |            models:0762|   ... done with hqe chromeos6-row1-rack9-host5/183451463 (183888036).
09/06 12:35:33.563 INFO |            models:0759|   Deleting and aborting hqe chromeos6-row1-rack9-host5/183451483 (183888056)...
09/06 12:35:33.564 INFO |            models:0762|   ... done with hqe chromeos6-row1-rack9-host5/183451483 (183888056).
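
To get a feel for how much of this work is happening per host, the log lines above can be tallied with a short script. This is only a rough sketch: the regular expression is derived from the lines shown here, `count_aborts_per_host` is a made-up helper name, and the number in parentheses is assumed to be the HQE id (the number after the slash matches the job id in the AFE URL).

```python
import re
from collections import Counter

# Matches the "Deleting and aborting hqe <host>/<job id> (<id>)" lines above.
# Assumption: the parenthesized number is the HQE id.
HQE_LINE = re.compile(
    r"Deleting and aborting hqe (?P<host>\S+)/(?P<job_id>\d+) \((?P<hqe_id>\d+)\)"
)


def count_aborts_per_host(lines):
    """Return a Counter mapping hostname to the number of abort lines seen."""
    counts = Counter()
    for line in lines:
        match = HQE_LINE.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts


# Three lines copied from the log excerpt; only two are "Deleting" lines.
sample = [
    "09/06 12:35:33.557 INFO |            models:0759|   Deleting and aborting hqe chromeos6-row1-rack9-host5/183451434 (183888007)...",
    "09/06 12:35:33.559 INFO |            models:0762|   ... done with hqe chromeos6-row1-rack9-host5/183451434 (183888007).",
    "09/06 12:35:33.559 INFO |            models:0759|   Deleting and aborting hqe chromeos6-row1-rack9-host5/183451439 (183888012)...",
]
aborts = count_aborts_per_host(sample)
```

Feeding it the lines of shard_client.latest instead of `sample` would give a per-host count of HQEs being deleted and aborted.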

Looking at one of the jobs: http://cros-full-0004.mtv.corp.google.com/afe/#tab_id=view_job&object_id=183451434
This is a completed job from March.

I don't know the implications of this incorrect aborting of HQEs. Likely nothing, since we haven't seen the world fall apart every time we moved a board like this?
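
For illustration, the behavior I would have expected is roughly the following: skip HQEs that are already complete and abort only the rest. This is a hypothetical sketch, not the actual code in models.py; `QueueEntry` and `abort_incomplete` are made-up names standing in for the real HQE model and the shard cleanup path.

```python
from dataclasses import dataclass


@dataclass
class QueueEntry:
    """Hypothetical stand-in for a host queue entry (HQE); not the autotest model."""
    hqe_id: int
    complete: bool
    aborted: bool = False


def abort_incomplete(entries):
    """Abort only entries that have not completed; return the ids that were aborted."""
    aborted_ids = []
    for entry in entries:
        if entry.complete:
            # Completed HQEs are left untouched, so no time is spent on them.
            continue
        entry.aborted = True
        aborted_ids.append(entry.hqe_id)
    return aborted_ids
```

With a filter like this, a host with months of completed history would presumably move back to the master much faster, since only its outstanding HQEs would need to be touched.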
 
Cc: gu...@chromium.org
Labels: Hotlist-Deputy
Status: Available (was: Untriaged)
I think the main issue is the increased time for removing the board
Labels: -Hotlist-Deputy
Summary: Deleting a board from a shard leads to aborting completed HQEs (was: deleteing a board from a shard leads to aborting completed HQEs)
Not a hot deputy issue
