reef-chrome-pfq fails because reef BVT pool has been migrated to skylab
Reported by jrbarnette@chromium.org, Aug 30
Issue description
The reef-chrome-pfq builder is failing. Here's the most recent
example:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936778980884660160
This is the relevant error message:
NotEnoughDutsError: Not enough DUTs for board: reef, pool: bvt; required: 1, found: 0
Checking the BVT pool supply, you see this:
$ atest host list -b board:reef,pool:bvt | awk '{print $1}'
Host
chromeos6-row4-rack9-host11-migrated-do-not-use
chromeos6-row4-rack9-host13-migrated-do-not-use
chromeos6-row4-rack9-host6-migrated-do-not-use
chromeos6-row4-rack9-host12-migrated-do-not-use
chromeos6-row4-rack9-host14-migrated-do-not-use
chromeos6-row4-rack10-host13-migrated-do-not-use
chromeos6-row4-rack10-host11-migrated-do-not-use
chromeos6-row3-rack10-host1-migrated-do-not-use
chromeos6-row3-rack10-host15-migrated-do-not-use
chromeos6-row4-rack10-host12-migrated-do-not-use
chromeos6-row4-rack9-host19-migrated-do-not-use
chromeos6-row3-rack12-host3-migrated-do-not-use
chromeos6-row3-rack12-host5-migrated-do-not-use
chromeos6-row3-rack12-host7-migrated-do-not-use
The pool has been migrated to skylab, but the builder still goes to the
regular Autotest services.
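The supply check above can be sketched as a small filter over the `atest host list` output. This is only an illustration: `usable_hosts` and `MIGRATED_SUFFIX` are hypothetical names, and it assumes that a host carrying the `-migrated-do-not-use` suffix is one Autotest can no longer schedule on.

```python
from typing import List

# Suffix seen on every reef BVT host in the listing above.
MIGRATED_SUFFIX = "-migrated-do-not-use"


def usable_hosts(atest_output: str) -> List[str]:
    """Return hostnames from the listing that do not carry the migrated suffix.

    Assumes one hostname per line, with an optional "Host" header line,
    as produced by the awk pipeline in the issue description.
    """
    hosts = []
    for line in atest_output.splitlines():
        name = line.strip()
        if not name or name == "Host":
            continue  # skip blank lines and the column header
        if name.endswith(MIGRATED_SUFFIX):
            continue  # assumed to be owned by skylab now
        hosts.append(name)
    return hosts
```

Run against the listing above, every host is filtered out, which is consistent with the "required: 1, found: 0" error.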
P0 - This is blocking the Chrome uprev.
,
Aug 30
+Allen: It will be useful to have Allen pick up this bug as dry-run before xixuan@ goes on leave.
,
Aug 30
Pick one of two:
[1] Roll back reef-bvt to Autotest. (I'm not sure whether xixuan@ has been migrating DUTs back, so there may be some bugs in the scripts. Deal with those as they arise...)
[2] Check what suites chrome-pfq runs. Likely the coverage isn't greater than CQ and release put together. In that case, migrate chrome-pfq to use SkylabHWTest for reef.
My gut feeling is we'll have to do [1] first for this P0, then do [2], because having all pools on Skylab for the board is necessary for sane DUT management for the board.
,
Aug 30
,
Aug 30
hit enter too soon...
,
Aug 30
hmm, pfq has an extra stage: bvt-tast-android-pfq, https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936978615453664640
,
Aug 30
So rollback it is. Are there any tools available?
,
Aug 30
Do we support bvt-tast-android-pfq in skylab? Maybe we can first try migrating chrome-pfq to skylab to see whether it fails. We can only roll back part of the BVTs, as reef-release is now running in skylab. Rollback link: https://sites.google.com/a/google.com/chromeos/for-team-members/infrastructure/lab-tools/lab-tools-for-skylab
,
Aug 30
https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1196832, will let Allen decide whether to move.
,
Aug 30
N.B. We need to choose what we do based on what can be safely done the fastest. This is blocking Chrome uprev, and we're trying to branch for R70. Unblocking the PFQ is the top priority.
,
Aug 30
I guess we can try it, so we know what doesn't work if it fails.
,
Aug 30
Let's roll back first (there might be rough corners there too, so it might need some time). We can later try running a suite independent of what the builder does (we can run it on the suites pool in skylab).
,
Aug 30
btw, this is all the rollback tooling does, in case it misbehaves:
- Deletes all these hosts from infra_internal/skylab_inventory/data/skylab/... (also removes references to the uuids from drones in there)
- Renames the DUTs from X-migrated-do-not-use to X via 'atest' in the AFE.
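The rename step can be sketched as below, in case someone needs to sanity-check the tooling by hand. This is not the actual rollback script: `original_hostname` and `planned_renames` are hypothetical helpers that only compute the old/new name pairs. They do not touch the skylab inventory or call 'atest'.

```python
from typing import List, Tuple

# Suffix the migration appends to AFE hostnames (taken from the listing above).
MIGRATED_SUFFIX = "-migrated-do-not-use"


def original_hostname(name: str) -> str:
    """Strip the migrated suffix, returning the name unchanged if it is absent."""
    if name.endswith(MIGRATED_SUFFIX):
        return name[: -len(MIGRATED_SUFFIX)]
    return name


def planned_renames(hosts: List[str]) -> List[Tuple[str, str]]:
    """Return (current_name, restored_name) pairs for hosts that need renaming."""
    return [
        (name, original_hostname(name))
        for name in hosts
        if name.endswith(MIGRATED_SUFFIX)
    ]
```

For example, `planned_renames(["chromeos6-row4-rack9-host11-migrated-do-not-use"])` yields a single pair whose second element is `chromeos6-row4-rack9-host11`. Hosts without the suffix are skipped, so the helper is safe to re-run after a partial rollback.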
,
Aug 30
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/59cbc10725c53176338df377932f75428fc3cea2
commit 59cbc10725c53176338df377932f75428fc3cea2
Author: Xixuan Wu <xixuan@chromium.org>
Date: Thu Aug 30 18:43:44 2018
cbuildbot: Move reef-chrome-pfq to skylab.
BUG= chromium:879217
TEST=None
Change-Id: Iacca3ae63d16b801df3954d98755b41c77972e83
Reviewed-on: https://chromium-review.googlesource.com/1196832
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>
[modify] https://crrev.com/59cbc10725c53176338df377932f75428fc3cea2/config/chromeos_config.py
[modify] https://crrev.com/59cbc10725c53176338df377932f75428fc3cea2/config/config_dump.json
,
Aug 30
bvt-tast-android-pfq runs a single test, tast.mustpass-android
,
Aug 30
The PFQ schedule seems out of whack. Anyway, I'm going to try running that single test and seeing if it passes. Since the next scheduled PFQ run is hours away (I think), that gives me some room to experiment.
,
Aug 30
,
Aug 31
,
Aug 31
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8936730010422581104 Heck yes
,
Aug 31
<sigh> And it only took until Thursday for me to see some good news. :-(
,
Aug 31
Kudos! :)
Comment 1 by ihf@chromium.org
, Aug 30