
Issue 813811

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Cleanup jobs running regularly in between tests

Reported by jrbarnette@chromium.org, Feb 20 2018

Issue description

At least some boards, some of the time, are regularly running 'cleanup'
jobs in addition to 'reset' jobs.  Here's a short history from an eve
BVT pool DUT:
chromeos6-row3-rack11-host3
    2018-02-20 08:26:34  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack11-host3/151945-reset/
    2018-02-20 08:25:38  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack11-host3/151939-cleanup/
    2018-02-20 08:24:46  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178028843-chromeos-test/
    2018-02-20 08:24:19  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack11-host3/151934-reset/
    2018-02-20 08:23:23  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack11-host3/151926-cleanup/
    2018-02-20 08:22:13  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178028821-chromeos-test/

Same story, but for bob:
chromeos2-row8-rack11-host14
    2018-02-20 08:17:11  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row8-rack11-host14/216248-reset/
    2018-02-20 08:16:31  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row8-rack11-host14/216240-cleanup/
    2018-02-20 08:07:31  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178022107-chromeos-test/
    2018-02-20 08:07:03  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row8-rack11-host14/216170-reset/
    2018-02-20 08:06:26  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos2-row8-rack11-host14/216161-cleanup/
    2018-02-20 07:57:43  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178022065-chromeos-test/


I'm not sure how widespread this symptom is, but I've seen it
anecdotally in the past.  The fact that I was able to see it for two
randomly selected DUTs makes me think that it's _very_ widespread.

The cleanup jobs take extra time, and are likely seriously straining our DUT
capacity, which will lead to tests being dropped or aborted.

 
Passing to this week's primary deputy.

Cc: ayatane@chromium.org
The decision about whether to run a cleanup job falls to the
scheduler, and is based on the return status from the test job.
These days, I think lucifer is involved in that process.

I did a spot check on hana and bob DUTs in the CQ pool.  The story there
is slightly different.  Here's a sample for bob:

chromeos6-row4-rack13-host11
    2018-02-20 08:42:24  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack13-host11/216453-cleanup/
    2018-02-20 08:41:24  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178098433-chromeos-test/
    2018-02-20 08:35:24  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack13-host11/216383-provision/
    2018-02-20 05:25:37  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack13-host11/214879-reset/
    2018-02-20 05:24:09  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178065861-chromeos-test/
    2018-02-20 05:23:38  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack13-host11/214861-reset/
    2018-02-20 05:21:38  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178065845-chromeos-test/
    2018-02-20 05:21:04  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack13-host11/214837-reset/
    2018-02-20 05:19:05  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178065827-chromeos-test/
    2018-02-20 05:18:32  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack13-host11/214808-reset/
    2018-02-20 05:16:33  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178065819-chromeos-test/
    2018-02-20 05:16:02  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack13-host11/214788-reset/
    2018-02-20 05:13:10  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178065811-chromeos-test/
    2018-02-20 05:05:49  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack13-host11/214692-cleanup/
    2018-02-20 05:04:52  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178059999-chromeos-test/
    2018-02-20 04:58:42  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack13-host11/214633-provision/

The hana DUTs I checked were similar.

The key feature is the sequence of "provision", "cleanup", and then only
"reset" afterward (not "cleanup and reset").
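For anyone who wants to check other DUTs for the same pattern, here is a rough sketch that classifies the lines of a history listing like the ones above and counts which special task follows each test job. The function names (`job_kind`, `parse_history`, `tasks_after_tests`) are made up for illustration and are not part of autotest; the only assumption about the input is the "date time status URL" layout shown in this thread.

```python
import re
from collections import Counter

# Special-task log URLs end in ".../hosts/<hostname>/<id>-<kind>/".
# Only the three kinds that appear in this thread are recognized.
TASK_RE = re.compile(r"/hosts/[^/]+/\d+-(reset|cleanup|provision)/$")

SAMPLE = """\
chromeos6-row3-rack11-host3
    2018-02-20 08:26:34  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack11-host3/151945-reset/
    2018-02-20 08:25:38  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack11-host3/151939-cleanup/
    2018-02-20 08:24:46  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178028843-chromeos-test/
    2018-02-20 08:24:19  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack11-host3/151934-reset/
    2018-02-20 08:23:23  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack11-host3/151926-cleanup/
    2018-02-20 08:22:13  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/178028821-chromeos-test/
"""


def job_kind(url):
    """Return 'reset', 'cleanup' or 'provision'; anything else counts as a test job."""
    match = TASK_RE.search(url)
    return match.group(1) if match else "test"


def parse_history(text):
    """Turn a listing into (timestamp, status, kind) tuples, oldest first."""
    events = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # hostname header or blank line
        date, time, status, url = parts
        events.append((f"{date} {time}", status, job_kind(url)))
    events.sort()  # "YYYY-MM-DD HH:MM:SS" strings sort chronologically
    return events


def tasks_after_tests(events):
    """Count the kind of entry that immediately follows each test job."""
    counts = Counter()
    for prev, cur in zip(events, events[1:]):
        if prev[2] == "test":
            counts[cur[2]] += 1
    return counts


print(dict(tasks_after_tests(parse_history(SAMPLE))))  # {'cleanup': 2}
```

On the eve history from the description, both test jobs are followed by a cleanup rather than going straight to reset, which is the symptom reported here.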

Cc: nxia@chromium.org
Owner: ayatane@chromium.org
Passing to Allen for investigation as the new scheduler expert. ;>

Status: Started (was: Assigned)
Stupid type error

https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/927686
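The thread does not say what the type error actually was, so the following is only a hypothetical illustration of how this class of bug can produce the symptom, not the code from `handlers.py`. It assumes a made-up rule, "run cleanup only when the test job's exit status is nonzero", and shows how a status that arrives as a string instead of an int would make every test look like a failure.

```python
def needs_cleanup(exit_status):
    """Hypothetical rule: run cleanup only when the test job failed."""
    return exit_status != 0


def needs_cleanup_coerced(exit_status):
    """Same rule, but normalizes the status to int before comparing."""
    return int(exit_status) != 0


print(needs_cleanup(0))             # False: clean exit, reset is enough
print(needs_cleanup("0"))           # True: the str "0" never equals the int 0
print(needs_cleanup_coerced("0"))   # False: coercion restores the intended result
```

A mismatch like this raises no exception, so the only visible sign would be extra cleanup jobs in the host history.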

Comment 7 by bugdroid1@chromium.org, Feb 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/84fc40f03c54b064cab25a8b059ed9d997b6bac1

commit 84fc40f03c54b064cab25a8b059ed9d997b6bac1
Author: Allen Li <ayatane@chromium.org>
Date: Thu Feb 22 22:28:14 2018

[autotest] Fix type error

BUG= chromium:813811 
TEST=None

Change-Id: I0cbd760c9a598be54b28ffb9b7e3abedbc4b961a
Reviewed-on: https://chromium-review.googlesource.com/927686
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/84fc40f03c54b064cab25a8b059ed9d997b6bac1/venv/lucifer/handlers.py

Status: Fixed (was: Started)
I think this is fixed, but it still needs to be verified.
