
Issue 824581 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Reconsider provision labels for arc-presubmit jobs

Reported by jrbarnette@chromium.org, Mar 22 2018

Issue description

This is a follow-up to an incident described in bug 812467.

Jobs sent to arc-presubmit run devices through a provisioning
procedure that's quite different from regular flows:
  * The DUT installs a regular release build.
  * Root FS verification is removed.
  * A new Android build then overwrites the current build in
    the Root FS.
  * The DUT is then labeled with a non-standard cros-version
    label: instead of "cros-version:<buildpath>", the label is
    "cros-version:<buildpath>-cheetsth".
  * The DUT is also labeled with a "cheets-version" label.
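
The labeling in the last two steps can be sketched as follows. This is
an illustration only, not the lab's actual code: the function names are
hypothetical, and the value format of the "cheets-version" label is an
assumption, since only the "cros-version" format is given above.

```python
from typing import List

# Suffix quoted in the description above.
CHEETS_SUFFIX = "-cheetsth"


def arc_presubmit_labels(cros_build_path: str, cheets_build: str) -> List[str]:
    """Return the two labels a DUT carries after arc-presubmit provisioning.

    The "cheets-version:<build>" value format is an assumption.
    """
    return [
        f"cros-version:{cros_build_path}{CHEETS_SUFFIX}",
        f"cheets-version:{cheets_build}",
    ]


def is_standard_cros_version(label: str) -> bool:
    """True only for a plain "cros-version:<buildpath>" label."""
    return label.startswith("cros-version:") and not label.endswith(CHEETS_SUFFIX)
```

With a made-up build path such as "example-release/R1-2.0.0", the first
label comes out as "cros-version:example-release/R1-2.0.0-cheetsth",
which is_standard_cros_version() rejects.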

There are reasons for these steps.
  * It's desirable to keep the arc-presubmit cycle time low, and
    the steps above were believed to be faster than the alternatives.
  * The non-standard cros-version labeling is a necessary consequence
    of altering the root FS.  DUTs in this state shouldn't be
    recognized as running a standard Chrome OS build.

But, the process has some awkwardness:
  * Guaranteeing proper provisioning requires both the special
    cros-version and the cheets-version labels be requested as
    dependencies; it's not valid to request only one or the other.
    That is, the labeling scheme mainly works because there's only
    one client, the arc-presubmit builder.
  * The non-standard cros-version label forces a special case exception
    in post-provisioning sanity checks, as described in bug 812467.
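
The "both or neither" rule in the first point could be enforced with a
check along these lines. This is a sketch under assumptions: the function
name is hypothetical, and nothing in this bug says such a check exists
today.

```python
from typing import Iterable


def requires_arc_provisioning(dependencies: Iterable[str]) -> bool:
    """Return True if the job asks for the arc-presubmit provisioning flow.

    Raises ValueError when only one of the two required labels is
    requested, since that combination can't guarantee proper provisioning.
    """
    deps = list(dependencies)
    has_special_cros = any(
        d.startswith("cros-version:") and d.endswith("-cheetsth") for d in deps
    )
    has_cheets = any(d.startswith("cheets-version:") for d in deps)
    if has_special_cros != has_cheets:
        raise ValueError(
            "the special cros-version label and the cheets-version label "
            "must be requested together"
        )
    return has_special_cros
```

A job requesting only a plain "cros-version:<buildpath>" label passes the
check and returns False; a job requesting just one of the two special
labels is rejected up front instead of being silently mis-provisioned.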

The best solution would be to have a single builder that produced a
pre-combined CrOS build+ARC image, and installed that via the
standard CrOS provisioning flow.

The process on the builder should be something like this:
  * Download the desired CrOS release image.
  * Loop mount the image, and replace the Android bits with the
    new bits to be tested.
  * Re-sign the image, and upload to storage.
That process ought to be time-competitive with a process that requires
two large image downloads plus a reboot.
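
The ordering of those builder steps can be sketched as below. Every
callable is a hypothetical placeholder: this bug doesn't specify the
tools for downloading, loop mounting, re-signing, or uploading, so they
are passed in rather than guessed at.

```python
from typing import Callable


def build_combined_image(
    cros_release: str,
    android_build: str,
    *,
    download: Callable[[str], str],
    swap_android: Callable[[str, str], None],
    resign: Callable[[str], None],
    upload: Callable[[str], str],
) -> str:
    """Run the proposed builder steps in order; return the upload location."""
    # 1. Download the desired CrOS release image to a local path.
    image_path = download(cros_release)
    # 2. Loop mount the image and replace the Android bits.
    swap_android(image_path, android_build)
    # 3. Re-sign the modified image.
    resign(image_path)
    # 4. Upload to storage so standard provisioning can install it.
    return upload(image_path)
```

Because the result is an ordinary image in storage, the DUT would only
need the standard "cros-version:<buildpath>" label after provisioning.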

 
Cc: rohi...@chromium.org
Richard, do you have a candidate owner for this bug?
Thanks for the detailed write up!

Sorry if I have missed any prior discussions as I just returned from paternity leave.

It seems introducing a builder in the workflow will introduce some delay in ARC++ presubmit, but on the other hand, these DUTs are part of the presubmit pool and won't be reused by any other tests in their existing state.

So, is this proposal only meant to ensure that all provisioning flows follow a standard procedure, with no awkwardness? Given the test-time overhead of introducing a builder, plus the builder's future maintenance, is fixing post-provisioning verification more viable?
> It seems introducing a builder in the workflow [ ... ]

The proposal doesn't require introducing a builder.  It _does_
require introducing a new build step in whatever code is building
for arc-presubmit.  The cost of that new build step is likely to
be offset by making provisioning faster in the test lab.

As for why make the change:  Our provisioning system is riddled
with too many special cases.  Those special cases make it inevitable
that from time to time, innocent improvements will break the special
cases.  That's exactly what happened in bug 812467.  So, we can choose
to tolerate that kind of event from time to time, or we can clean up.

I've become quite intolerant in my old age, so my vote is for the
cleanup.

Comment 5 by sbasi@chromium.org, Mar 22 2018

Cc: davidjames@chromium.org
David, what would it take to introduce the build step Richard is describing to a TreeHugger workplan? Is it even possible? I remember hearing Android builds aren't allowed to use network/internet resources, but TreeHugger might have looser requirements, right?
Cc: -davidjames@chromium.org
Owner: davidjames@chromium.org
David, can you please take a look? 
Status: Assigned (was: Untriaged)
Owner: akes...@chromium.org
Just saw this bug. Aviv, is this something that your team (or the boulder team?) is still looking to do? We don't do network during the build itself, but that shouldn't preclude us from solving this (e.g. we can put the network activity somewhere else). If you'd like to take a look at this as a joint project let's schedule a meeting and discuss.


Cc: jclinton@google.com vapier@chromium.org
Sounds to me like a custom chromeos builder with some logic that knows how to download and package a new candidate arc build into a chromeos build is the right way to accomplish this (and would also be more in line with normal lab provisioning flow).

I think this falls into the category of self-service work that the build team could assist on but wouldn't want to own. +vapier (build) +jclinton (CI)
Owner: vapier@chromium.org
-> vapier for comment
i don't really understand the ARC/lab image flow that's being described.  if these aren't CrOS builds already, and the CI is on the Android side, i'm not sure having two parallel CIs trying to communicate with each other is less complicated.

if the code for kicking off the provision/image process is on the treehugger side, then clearly it has some networking/API access ?

plus we're hoping to move ARC++ to a DLC flow in which case nothing would be in the rootfs, so any image changes we work on would all get thrown out once the DLC conversion happens right ?
So the ARC++/lab flow here is that TreeHugger builds the Android system image and then kicks off a runsuite job in the Chrome OS Lab with 2 builds:
1) The LKGB Chrome OS build.
2) The ARC++ presubmit build.

Then provision flashes the LKGB build and runs a special ARC++ tool to replace the Android system image with the presubmit one.

We added new labels for this flow to work properly, and I believe Richard wanted to move away from this model ~1 year ago.

This is running tests today for the ARC++ team and they do rely on this flow, but I've heard that they may be switching to virtual device testing sometime in 2019.
i can't speak to the implications on the lab side, but seems like we should stick with the status quo pending DLC migration (as that would also completely change the flow and make this discussion moot iiuc) ?  i'm not sure how painful it is for the lab today to keep this working.
What is the DLC flow?
Cc: uekawa@chromium.org satorux@chromium.org
the mid/long term plan is for ARC++ to not live in the rootfs at all.  instead it'd be in a disk image that gets downloaded on the fly to the stateful partition and then mounted from there.

so in that flow, we'd be able to provision a DUT with the normal image, drop the ARC++ image into the stateful partition, and have the system use that directly.  no need to modify the rootfs.

DLC is a general mechanism for this sort of thing (download+verify+mount arbitrary components).  i don't know the exact schedule, but iiuc, it's "any day now", and then hopefully ARC++ can start investigating how to leverage it.
