
Issue 879217


Issue metadata

Status: Verified
Owner:
Closed: Aug 31
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug




reef-chrome-pfq fails because reef BVT pool has been migrated to skylab

Reported by jrbarnette@chromium.org, Aug 30

Issue description

The reef-chrome-pfq builder is failing.  Here's the most recent
example:
    https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936778980884660160

This is the relevant error message:
      NotEnoughDutsError: Not enough DUTs for board: reef, pool: bvt; required: 1, found: 0

Checking the BVT pool supply, you see this:
    $ atest host list -b board:reef,pool:bvt | awk '{print $1}'
    Host
    chromeos6-row4-rack9-host11-migrated-do-not-use
    chromeos6-row4-rack9-host13-migrated-do-not-use
    chromeos6-row4-rack9-host6-migrated-do-not-use
    chromeos6-row4-rack9-host12-migrated-do-not-use
    chromeos6-row4-rack9-host14-migrated-do-not-use
    chromeos6-row4-rack10-host13-migrated-do-not-use
    chromeos6-row4-rack10-host11-migrated-do-not-use
    chromeos6-row3-rack10-host1-migrated-do-not-use
    chromeos6-row3-rack10-host15-migrated-do-not-use
    chromeos6-row4-rack10-host12-migrated-do-not-use
    chromeos6-row4-rack9-host19-migrated-do-not-use
    chromeos6-row3-rack12-host3-migrated-do-not-use
    chromeos6-row3-rack12-host5-migrated-do-not-use
    chromeos6-row3-rack12-host7-migrated-do-not-use
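The `found: 0` in the error follows from this listing: every reef host in pool:bvt carries the `-migrated-do-not-use` suffix, so presumably none of them can be scheduled through Autotest. Below is a minimal sketch of that counting logic, assuming (hypothetically) that a host counts as usable only when its name lacks the suffix. The real cbuildbot/Autotest DUT check is not shown in this thread and may also look at host status and locks; `parse_host_column`, `usable_duts`, and `check_pool` are illustrative names, not chromite functions.

```python
MIGRATED_SUFFIX = "-migrated-do-not-use"


class NotEnoughDutsError(Exception):
    """Illustrative stand-in for the error seen in the builder log."""


def parse_host_column(output):
    """Parse `atest host list ... | awk '{print $1}'` output.

    The first non-empty line is the "Host" header, so it is skipped.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if lines and lines[0] == "Host":
        return lines[1:]
    return lines


def usable_duts(hostnames):
    # Hypothetical rule: a host renamed with the migration suffix now
    # belongs to Skylab and must not be handed out by Autotest.
    return [h for h in hostnames if not h.endswith(MIGRATED_SUFFIX)]


def check_pool(board, pool, hostnames, required=1):
    """Raise NotEnoughDutsError if fewer than `required` hosts are usable."""
    found = len(usable_duts(hostnames))
    if found < required:
        raise NotEnoughDutsError(
            "Not enough DUTs for board: %s, pool: %s; required: %d, found: %d"
            % (board, pool, required, found))
    return found


if __name__ == "__main__":
    listing = """Host
chromeos6-row4-rack9-host11-migrated-do-not-use
chromeos6-row3-rack12-host7-migrated-do-not-use
"""
    try:
        check_pool("reef", "bvt", parse_host_column(listing))
    except NotEnoughDutsError as e:
        print(e)
```

Run against the (truncated) listing, the sketch reproduces the builder's message with `found: 0`.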

The pool has been migrated to skylab, but the builder still goes to the
regular Autotest services.

P0 - This is blocking Chrome uprev.

 
Cc: ihf@chromium.org
Cc: ayatane@chromium.org
+Allen: It will be useful to have Allen pick up this bug as a dry run before xixuan@ goes on leave.
Pick one of two:

[1] Roll back reef-bvt to Autotest (I'm not sure whether xixuan@ has migrated DUTs back before, so there may be some bugs in the scripts. Deal with those as they arise...)
[2] Check what suites chrome-pfq runs. Likely the coverage isn't greater than CQ and release put together. In that case, migrate chrome-pfq to use SkylabHWTest for reef.

My gut feeling is we'll have to do [1] first for this P0, then do [2], because having all pools on Skylab for the board is necessary for sane DUT management.
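Option [2] comes down to a set question: if every suite chrome-pfq runs is already run by CQ or release, moving chrome-pfq to SkylabHWTest loses no coverage. A small sketch of that check, assuming the suite lists for each builder are known; only `bvt-tast-android-pfq` comes from this thread, and the other suite names are hypothetical placeholders.

```python
def uncovered_suites(pfq_suites, cq_suites, release_suites):
    """Return the PFQ suites that neither CQ nor release already runs."""
    return set(pfq_suites) - (set(cq_suites) | set(release_suites))


if __name__ == "__main__":
    pfq = {"bvt-tast-android-pfq"}
    cq = {"placeholder-cq-suite"}            # hypothetical suite name
    release = {"placeholder-release-suite"}  # hypothetical suite name
    print(sorted(uncovered_suites(pfq, cq, release)))
```

An empty result would mean the move is coverage-neutral; a non-empty one lists the suites that would need Skylab support before chrome-pfq can rely on CQ and release coverage.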
Owner: ayatane@chromium.org
Cc: xixuan@chromium.org
hit enter too soon...
So rollback it is.  Are there any tools available?
Do we support bvt-tast-android-pfq in skylab? Maybe we can first try migrating chrome-pfq to skylab and see whether it fails.

We can only roll back part of the BVT DUTs, since reef-release is now running in skylab.
Rollback link: https://sites.google.com/a/google.com/chromeos/for-team-members/infrastructure/lab-tools/lab-tools-for-skylab
https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1196832; I'll let Allen decide whether to move.
N.B. We need to choose what we do based on what can be safely done
the fastest.  This is blocking Chrome uprev, and we're trying to
branch for R70.  Unblocking the PFQ is the top priority.

I guess we can try it, so we know what doesn't work if it fails.
Let's roll back first (there might be rough corners there too, so it might take some time).

Later, we can try running a suite independently of what the builder does (we can run it on the suites pool in skylab).
btw, this is all the rollback tooling does, in case it misbehaves:

- Deletes all these hosts from infra_internal/skylab_inventory/data/skylab/... (also removes references to the uuids from drones in there)
- Renames the DUTs from X-migrated-do-not-use to X via 'atest' in the AFE.
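The two rollback steps can be sketched against an in-memory model of the inventory. This is a hypothetical model: the real data lives in files under infra_internal/skylab_inventory/data/skylab/... and the rename goes through `atest` in the AFE, neither of which is reproduced here. `Inventory`, `original_name`, `rollback`, and the dict/set layout are illustrative assumptions, and the sketch assumes the inventory stores hostnames without the migration suffix.

```python
from dataclasses import dataclass, field

MIGRATED_SUFFIX = "-migrated-do-not-use"


@dataclass
class Inventory:
    # Hypothetical stand-in for the Skylab inventory:
    # duts maps DUT uuid -> hostname; drones maps drone name -> set of uuids.
    duts: dict = field(default_factory=dict)
    drones: dict = field(default_factory=dict)


def original_name(afe_name):
    """Strip the migration suffix: X-migrated-do-not-use -> X."""
    if afe_name.endswith(MIGRATED_SUFFIX):
        return afe_name[: -len(MIGRATED_SUFFIX)]
    return afe_name


def rollback(inventory, hostnames, afe_hosts):
    """Roll `hostnames` back from the Skylab model to the AFE model.

    `afe_hosts` is a hypothetical set of AFE hostnames, modified in place.
    Returns the uuids removed from the inventory.
    """
    uuids = {u for u, h in inventory.duts.items() if h in hostnames}
    # Step 1: delete the DUTs and drop their uuids from every drone.
    for u in uuids:
        del inventory.duts[u]
    for served in inventory.drones.values():
        served -= uuids  # in-place set difference
    # Step 2: rename X-migrated-do-not-use back to X in the AFE.
    for afe_name in list(afe_hosts):
        name = original_name(afe_name)
        if name != afe_name and name in hostnames:
            afe_hosts.remove(afe_name)
            afe_hosts.add(name)
    return uuids
```

Modeling the two steps separately mirrors the list above: if the tooling misbehaves partway, you can check which half (inventory cleanup or AFE rename) actually happened.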


Comment 14 by bugdroid1@chromium.org, Aug 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/59cbc10725c53176338df377932f75428fc3cea2

commit 59cbc10725c53176338df377932f75428fc3cea2
Author: Xixuan Wu <xixuan@chromium.org>
Date: Thu Aug 30 18:43:44 2018

cbuildbot: Move reef-chrome-pfq to skylab.

BUG= chromium:879217 
TEST=None

Change-Id: Iacca3ae63d16b801df3954d98755b41c77972e83
Reviewed-on: https://chromium-review.googlesource.com/1196832
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/59cbc10725c53176338df377932f75428fc3cea2/config/chromeos_config.py
[modify] https://crrev.com/59cbc10725c53176338df377932f75428fc3cea2/config/config_dump.json

bvt-tast-android-pfq runs a single test, tast.mustpass-android.
The PFQ schedule seems out of whack.  Anyway, I'm going to try running that single test and seeing if it passes.  Since the next scheduled PFQ run is hours away (I think), that gives me some room to experiment.
Labels: Hotlist-Deputy
Cc: xiy...@chromium.org
<sigh> And it only took until Thursday for me to see some
good news.  :-(

Kudos!
:)
