
Issue 709663 link

Starred by 4 users

Issue metadata

Status: Verified
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Feature




generate au_control.tar.bz2 file for rerun of AU tests, despite HWTest(sanity) failure

Project Member Reported by dchan@google.com, Apr 7 2017

Issue description


The au_control.tar.bz2 file is missing on a few platforms on M59. An example is below.

The error looks like

DownloaderException: Could not find au_control.tar.bz2 in Google Storage at gs://chromeos-image-archive/chell-release/R59-9413.0.0
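For triage, it can help to pull the artifact name, builder, and version out of an error line like the one above. The following is a minimal sketch, assuming the message always follows the shape shown in this report; the function and type names are hypothetical and not part of autotest.

```python
import re
from typing import NamedTuple, Optional


class MissingArtifact(NamedTuple):
    """Fields extracted from a 'Could not find ... in Google Storage' error."""
    artifact: str
    bucket: str
    builder: str
    version: str


# Assumed message shape, taken from the single example in this report.
_PATTERN = re.compile(
    r"Could not find (?P<artifact>\S+) in Google Storage at "
    r"gs://(?P<bucket>[^/\s]+)/(?P<builder>[^/\s]+)/(?P<version>[^/\s]+)"
)


def parse_missing_artifact(message: str) -> Optional[MissingArtifact]:
    """Return the parsed fields, or None if the message has a different shape."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return MissingArtifact(**match.groupdict())
```

Applied to the error above, this would give the builder as chell-release and the version as R59-9413.0.0, which is enough to group failures by board.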

Here is the list of affected platforms:

Chell: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=110066910
quawks: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=110066911
reks: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=110066914


This is preventing the release team from rerunning the AU tests.


See https://bugs.chromium.org/p/chromium/issues/detail?id=674690#c11
and https://bugs.chromium.org/p/chromium/issues/detail?id=674690#c12 

This is a feature request to always generate au_control.tar.bz2.
 
Labels: current-issue
Owner: ----
Status: Untriaged (was: Assigned)
This is not in the deputy queue. Adding to triage queue.
Cc: pprabhu@chromium.org abod...@chromium.org akes...@chromium.org dchan@chromium.org dhadd...@chromium.org
 Issue 709538  has been merged into this issue.

Comment 4 by dchan@google.com, Apr 20 2017

Labels: -Type-Bug Type-Feature
Hi, any update on this?
It would help to include links to the builds that generated (or were expected to generate) these files.

Comment 6 by dchan@google.com, Apr 21 2017

+dhaddock, can you provide more info on how au_control is generated?
I imagine this feature request would cover point 1 

I will file a separate bug for point 2
Filed  issue 714277  for 2
Issue 714322 has been merged into this issue.

Comment 11 by dchan@google.com, Apr 22 2017

Labels: bvttriage
Any way we can up the priority of this one?

We are seeing this failure on R58 stable and it means we cannot verify a bunch of the AU tests in the lab.

Also adding the string "500 Internal Server Error" to this bug, since that is the prominent error in the log and it makes this bug easier to find.

Comment 13 by dchan@google.com, May 12 2017

Cc: xixuan@chromium.org ayatane@chromium.org dshi@chromium.org
Labels: OS-Chrome
Adding a few people from the infra deputy rotation to see if we can fix this soon. Without this fix, it is very difficult to determine the health of AU in the stable channel.

Comment 14 by dshi@chromium.org, May 12 2017

Components: Infra>Client>ChromeOS
See also  issue 714277  for an error that also contributes to the problem sometimes. 
Owner: akes...@chromium.org
Status: Assigned (was: Untriaged)
Aviv, do you know who could look into this?
Cc: dgarr...@chromium.org jrbarnette@chromium.org
Summary: generate au_control.tar.bz2 file for rerun of AU tests, despite HWTest(sanity) failure (was: Always generate au_control.tar.bz2 file for rerun of AU tests.)
Reading back to #7, it sounds like there are two root causes. Cause #2 is already filed as Issue 714277, so let's make this bug about cause #1.

+dgarrett +jrbarnette Is it the desired behavior that we skip generating the au control files if the sanity suite fails?
Status: Available (was: Assigned)
The answer is arguable.

I argue that a build which fails the sanity test should be considered 'bad' and thrown away. The big drawback is that we can't produce any 'good' builds during a lab outage.

I've been working in the background on this proposal, which would fix this class of problem by clearly defining which builds can be salvaged by re-running tests, and which ones can't.

go/cros-split-release-build-test

Comment 20 by dchan@google.com, May 26 2017

Please open go/cros-split-release-build-test to all @google.com.
Sorry about that, done.

Comment 22 by dchan@google.com, May 26 2017

Thanks Don, great doc, I am all for it.  

In the meantime, we are delaying releases and causing heartache when we have to perform manual verification on some of the boards. We can't rerun the tests because of this missing tar file.

I would vote for full steam ahead with your proposal and generate the au_control file now.

Comment 23 by dchan@google.com, May 31 2017

Ping! Any ETA for making au_control available?
Owner: ----
Are you still regularly getting builds that fail the sanity suite? If so, maybe we should focus on that aspect of the problem, or remove the sanity suite from canaries. (What purpose does it really serve there? It only tests that a server-side job can be created; it doesn't do any build-specific testing work.)

Comment 25 by dchan@google.com, May 31 2017

Yes, we are. Even if only one board has this problem, we have to run the test manually to ensure it can still AU. Is there a technical roadblock that prevents us from always generating au_control.tar.bz2?
With our build system, we mostly have to decide to ignore an error, or stop the build right away.
I think that for the bug as originally filed, this is a won't fix. But I need to polish up that design doc and follow up with that work.
Status: WontFix (was: Available)

Comment 29 by dchan@google.com, May 31 2017

Status: Available (was: WontFix)
Regarding c#26, I think either one is fine. If we stop the build, we should delete all artifacts.

Currently we end up with a build that seems to install and work, and on our end we have endless debates about whether to push the build or not. Most likely we push.

I think even with the work in your design doc, we will have the same problem.

As far as I can tell, we basically ignore the error on our end, so I think it is safe to do so in the build system as well.




One of the points in my doc was that if the build fails, we mark the build as unreleasable. That means you should always have all artifacts, or none.
Having all artifacts or none is good, however we may want to isolate having all artifacts from any interaction with the lab if possible. 

If we start missing a percentage of our build artifacts due to lab flakes there will be sadness. 

Comment 32 by dchan@google.com, Jun 1 2017

+1 for all or nothing, we should be able to do this NOW.

Lab flakes should be fixed ASAP. We should not release any build when we don't have any test results.
According to the doc, a 'pass' build would require the sanity suite to succeed. All other test suites would be run, but the results would be ignored by the build.

GE should still publish the test results to help TPMs/test teams make release decisions, and tests could be rerun if needed.

However failures with compile, VMTest, sanity suite, signing, etc, would mark the build part as failed, and we'd consider it unreleasable.
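The rule described in this comment can be sketched roughly as follows. This is an illustration only, not code from the design doc or the builders: the stage names are hypothetical labels, and the blocking set lists just the four stages named above (the comment's "etc" implies there are more).

```python
from typing import Dict

# Stages whose failure marks the build as unreleasable, per the comment above.
# Hypothetical labels; the real list is longer and uses different identifiers.
BLOCKING_STAGES = frozenset({"compile", "vmtest", "sanity", "signing"})


def build_is_releasable(stage_results: Dict[str, bool]) -> bool:
    """Return True if no blocking stage failed.

    Results from non-blocking stages (all other test suites) are ignored
    here; they would still be published for release decisions. A blocking
    stage that is absent from the results is not treated as a failure in
    this sketch.
    """
    return all(
        passed
        for stage, passed in stage_results.items()
        if stage in BLOCKING_STAGES
    )
```

Under this rule, a failing bvt run would not affect the result, while a failing sanity suite would mark the build as unreleasable.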

I'll go work on that doc today, and set up a meeting after it's ready.
The only test in suite:sanity is dummy_PassServer.

Why are we running this suite at all on the canaries?
To ensure that the image can correctly install.

In theory, it blocks all other testing, so that we don't break the lab with wildly invalid images that can't boot.

Comment 36 by aut...@google.com, Jun 2 2017

Labels: -current-issue
Labels: -Pri-1 Pri-2
Owner: dgarr...@chromium.org
Project Member

Comment 38 by sheriffbot@chromium.org, Jul 20 2017

Labels: Hotlist-Google
What is the reason that we have an additional NPO autoupdate_EndToEndTest test in the au suite anyway? We already get a bunch of runs of the test in the paygen_au* suite. 


We used to use N Plus One to prove that we could update away from the current build. Arguably, the most important update test.

However, the NPO tests required generating goofy special images and all sorts of things that were really expensive, so we've stopped using them, and have only partially cleaned up the code that supported them.

Instead we generate N2N (N to N) for testing the same thing. This requires no new images, and is a very small delta. Much cheaper for builders, signers, and long term storage.

It does require that our update tests validate that we really moved from one OS partition to another; otherwise, an update that fails and rolls back could be treated as successful.
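The check described here could look something like the sketch below. It assumes the test has already recorded the root device before and after the update; how those strings are obtained is left out, and the function name and the device names in the usage note are hypothetical.

```python
def update_switched_partition(root_before: str, root_after: str) -> bool:
    """Return True if the root device changed across the update.

    In an N2N update the version string is identical before and after, so
    the version alone cannot distinguish a successful update from one that
    failed and rolled back. Comparing root devices can.
    """
    before = root_before.strip()
    after = root_after.strip()
    if not before or not after:
        raise ValueError("root device must be a non-empty string")
    return before != after
```

For example, a move from /dev/sda3 to /dev/sda5 would count as a real update, while booting back into /dev/sda3 would be treated as a rollback.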

In general, it should be safe to remove NPO code. The only exception is that old builds (years old now) may still have NPO images or payloads, and we should quietly ignore them instead of treating the old artifacts as corrupt.

There should no longer be any need to install an NPO image or payload into the lab.
Project Member

Comment 42 by bugdroid1@chromium.org, Sep 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/8f9b1e6bfd3bda3fc922034cbd2f3c89ba7bc485

commit 8f9b1e6bfd3bda3fc922034cbd2f3c89ba7bc485
Author: David Haddock <dhaddock@chromium.org>
Date: Fri Sep 15 07:24:02 2017

Remove staging of control files for au suite.

We no longer need to stage control files as part of the au suite. The
npo test case that this test suite controls is covered as part of the
N2N tests in paygen_au* suites.

This change will remove a lot of manual work required by the test team
to run powerwash tests manually when the sanity stage on the builder
fails and the control file is not generated.

BUG= chromium:758307 
BUG= chromium:764038 
BUG= chromium:709663 
TEST=None

Change-Id: I88a9bf7266c5615deb221d4adb50e5fbf4977d25
Reviewed-on: https://chromium-review.googlesource.com/665366
Commit-Ready: David Haddock <dhaddock@chromium.org>
Tested-by: David Haddock <dhaddock@chromium.org>
Reviewed-by: David Haddock <dhaddock@chromium.org>

[modify] https://crrev.com/8f9b1e6bfd3bda3fc922034cbd2f3c89ba7bc485/test_suites/control.au

Status: Fixed (was: Available)
Project Member

Comment 44 by bugdroid1@chromium.org, Sep 15 2017

Labels: merge-merged-release-R62-9901.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/24874293ea89dcce5af71f7d5da74d483e4f4dc4

commit 24874293ea89dcce5af71f7d5da74d483e4f4dc4
Author: David Haddock <dhaddock@chromium.org>
Date: Fri Sep 15 23:09:34 2017

Remove staging of control files for au suite.

We no longer need to stage control files as part of the au suite. The
npo test case that this test suite controls is covered as part of the
N2N tests in paygen_au* suites.

This change will remove a lot of manual work required by the test team
to run powerwash tests manually when the sanity stage on the builder
fails and the control file is not generated.

BUG= chromium:758307 
BUG= chromium:764038 
BUG= chromium:709663 
TEST=None

Change-Id: I88a9bf7266c5615deb221d4adb50e5fbf4977d25
Reviewed-on: https://chromium-review.googlesource.com/665366
Commit-Ready: David Haddock <dhaddock@chromium.org>
Tested-by: David Haddock <dhaddock@chromium.org>
Reviewed-by: David Haddock <dhaddock@chromium.org>
(cherry picked from commit 8f9b1e6bfd3bda3fc922034cbd2f3c89ba7bc485)
Reviewed-on: https://chromium-review.googlesource.com/669901
Trybot-Ready: David Haddock <dhaddock@chromium.org>

[modify] https://crrev.com/24874293ea89dcce5af71f7d5da74d483e4f4dc4/test_suites/control.au

Project Member

Comment 45 by bugdroid1@chromium.org, Sep 18 2017

Labels: merge-merged-release-R61-9765.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/eb06fa4395f1cf3bd8d3b930308d8f977a7bdd9a

commit eb06fa4395f1cf3bd8d3b930308d8f977a7bdd9a
Author: David Haddock <dhaddock@chromium.org>
Date: Mon Sep 18 18:27:41 2017

Remove staging of control files for au suite.

We no longer need to stage control files as part of the au suite. The
npo test case that this test suite controls is covered as part of the
N2N tests in paygen_au* suites.

This change will remove a lot of manual work required by the test team
to run powerwash tests manually when the sanity stage on the builder
fails and the control file is not generated.

BUG= chromium:758307 
BUG= chromium:764038 
BUG= chromium:709663 
TEST=None

Change-Id: I88a9bf7266c5615deb221d4adb50e5fbf4977d25
Reviewed-on: https://chromium-review.googlesource.com/665366
Commit-Ready: David Haddock <dhaddock@chromium.org>
Tested-by: David Haddock <dhaddock@chromium.org>
Reviewed-by: David Haddock <dhaddock@chromium.org>
(cherry picked from commit 8f9b1e6bfd3bda3fc922034cbd2f3c89ba7bc485)
Reviewed-on: https://chromium-review.googlesource.com/671186
Trybot-Ready: David Haddock <dhaddock@chromium.org>

[modify] https://crrev.com/eb06fa4395f1cf3bd8d3b930308d8f977a7bdd9a/test_suites/control.au

Status: Verified (was: Fixed)
Project Member

Comment 47 by bugdroid1@chromium.org, Sep 19 2017

Labels: merge-merged-release-R60-9592.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/cc274348acbbad5bde4480bca36fca6ea8c77981

commit cc274348acbbad5bde4480bca36fca6ea8c77981
Author: David Haddock <dhaddock@chromium.org>
Date: Tue Sep 19 01:38:10 2017

Remove staging of control files for au suite.

We no longer need to stage control files as part of the au suite. The
npo test case that this test suite controls is covered as part of the
N2N tests in paygen_au* suites.

This change will remove a lot of manual work required by the test team
to run powerwash tests manually when the sanity stage on the builder
fails and the control file is not generated.

BUG= chromium:758307 
BUG= chromium:764038 
BUG= chromium:709663 
TEST=None

Change-Id: I88a9bf7266c5615deb221d4adb50e5fbf4977d25
Reviewed-on: https://chromium-review.googlesource.com/665366
Commit-Ready: David Haddock <dhaddock@chromium.org>
Tested-by: David Haddock <dhaddock@chromium.org>
Reviewed-by: David Haddock <dhaddock@chromium.org>
(cherry picked from commit 8f9b1e6bfd3bda3fc922034cbd2f3c89ba7bc485)
Reviewed-on: https://chromium-review.googlesource.com/671153
Trybot-Ready: David Haddock <dhaddock@chromium.org>

[modify] https://crrev.com/cc274348acbbad5bde4480bca36fca6ea8c77981/test_suites/control.au

Labels: FixedByAURewrite
