generate au_control.tar.bz2 file for rerun of AU tests, despite HWTest(sanity) failure
Issue description

au_control.tar.bz2 is missing on a few platforms on M59. Below is an example. The error looks like:

DownloaderException: Could not find au_control.tar.bz2 in Google Storage at gs://chromeos-image-archive/chell-release/R59-9413.0.0

Here is the list of platforms:
chell: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=110066910
quawks: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=110066911
reks: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=110066914

This is preventing the release team from rerunning the AU tests. See https://bugs.chromium.org/p/chromium/issues/detail?id=674690#c11 and https://bugs.chromium.org/p/chromium/issues/detail?id=674690#c12

This is the FR to always generate au_control.tar.bz2.
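
For anyone triaging a report like this, the failing lookup can be reproduced by hand. The following is a minimal sketch, not the autotest downloader itself. It assumes the archive layout visible in the error message (`<board>-release/<version>/` under the `chromeos-image-archive` bucket); the function names are hypothetical. The existence check shells out to `gsutil -q stat`, which exits non-zero when the object is missing, and needs read access to the bucket.

```python
import subprocess

# Bucket and file name are taken from the DownloaderException text above.
ARCHIVE_BUCKET = "gs://chromeos-image-archive"
AU_CONTROL_NAME = "au_control.tar.bz2"


def au_control_url(board: str, version: str) -> str:
    """Build the Google Storage URL where au_control.tar.bz2 is expected.

    Assumes the <board>-release/<version> layout seen in the error message.
    """
    return f"{ARCHIVE_BUCKET}/{board}-release/{version}/{AU_CONTROL_NAME}"


def au_control_exists(board: str, version: str) -> bool:
    """Return True if the artifact exists (requires gsutil and bucket access)."""
    result = subprocess.run(
        ["gsutil", "-q", "stat", au_control_url(board, version)],
        check=False,
    )
    return result.returncode == 0
```

For the chell example, `au_control_url("chell", "R59-9413.0.0")` yields the path the downloader could not find.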
,
Apr 7 2017
This is not in the deputy queue. Adding to triage queue.
,
Apr 7 2017
Issue 709538 has been merged into this issue.
,
Apr 20 2017
Hi, any update on this?
,
Apr 21 2017
Would help to include links to the builds that generated (or were expected to generate) these.
,
Apr 21 2017
+dhaddock, can you provide more info on how au_control is generated.
,
Apr 21 2017
From what I can tell there are two causes of this problem:

1. The builder fails at the sanity stage, so it doesn't run the au stage (and the au_control.tar.bz2 file doesn't get generated). Some examples of this:
https://uberchromegw.corp.google.com/i/chromeos_release/builders/caroline-release%20release-R58-9334.B/builds/47
https://uberchromegw.corp.google.com/i/chromeos_release/builders/kevin-release%20release-R58-9334.B/builds/48
https://uberchromegw.corp.google.com/i/chromeos_release/builders/buddy-release%20release-R59-9460.B/builds/2

2. The au stage fails with the error "CRITICAL:root:no test configurations generated, nothing to do". Sample failing builds:
https://uberchromegw.corp.google.com/i/chromeos/builders/kefka-release/builds/1048
https://uberchromegw.corp.google.com/i/chromeos_release/builders/nyan_kitty-release%20release-R59-9460.B/builds/2
Compared to a passing build:
https://uberchromegw.corp.google.com/i/chromeos_release/builders/nyan_kitty-release%20release-R59-9460.B/builds/3
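
The two causes can be told apart mechanically when looking at a build with a missing au_control.tar.bz2. This is a rough sketch under assumptions: the inputs (whether the sanity stage passed, and the au stage log text if the stage ran) and all names are hypothetical, and only the error string is taken from the comment above.

```python
from enum import Enum
from typing import Optional


class MissingCause(Enum):
    SANITY_FAILED = 1    # cause 1: sanity failed, so the au stage never ran
    NO_TEST_CONFIGS = 2  # cause 2: au stage ran but generated no test configs


# Error text quoted from the failing au stage.
NO_CONFIGS_ERROR = "no test configurations generated, nothing to do"


def classify_missing_au_control(
    sanity_passed: bool, au_stage_log: Optional[str]
) -> Optional[MissingCause]:
    """Classify why au_control.tar.bz2 is missing; None if neither cause fits."""
    if not sanity_passed:
        return MissingCause.SANITY_FAILED
    if au_stage_log is not None and NO_CONFIGS_ERROR in au_stage_log:
        return MissingCause.NO_TEST_CONFIGS
    return None
```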
,
Apr 21 2017
I imagine this feature request would cover point 1. I will file a separate bug for point 2.
,
Apr 21 2017
Filed issue 714277 for 2
,
Apr 21 2017
Issue 714322 has been merged into this issue.
,
Apr 22 2017
,
May 11 2017
Any way we can up the priority of this one? We are seeing this failure on R58 stable, and it means we cannot verify a bunch of the AU tests in the lab. Also adding the string "500 Internal Server Error" to this bug, as that is the prominent error one sees in the log, so we can more easily locate this bug.
,
May 12 2017
Adding a few people from infra deputy to see if we can fix this soon. Without this fix it is very difficult to determine the health of AU in the stable channel.
,
May 12 2017
,
May 12 2017
See also issue 714277 for an error that also contributes to the problem sometimes.
,
May 25 2017
Aviv, do you know who could look into this?
,
May 26 2017
Reading back to #7, it sounds like there are 2 root causes. Cause #2 is already filed as Issue 714277. Let's make this about cause #1. +dgarrett +jrbarnette Is it the desired behavior that we skip generating the au control files if the sanity suite fails?
,
May 26 2017
,
May 26 2017
The answer is arguable. I argue that a build which fails the sanity test should be considered 'bad' and thrown away. The big drawback is that we can't produce any 'good' builds during a lab outage. In the background I've been working on this proposal, which would fix this class of problem by clearly defining which builds can be salvaged by re-running tests, and which ones can't: go/cros-split-release-build-test
,
May 26 2017
please open go/cros-split-release-build-test to all @google.com
,
May 26 2017
Sorry about that, done.
,
May 26 2017
Thanks Don, great doc, I am all for it. In the meantime, we are delaying releases and causing heartache when we have to perform manual verification on some of the boards. We can't rerun the tests due to this missing tar file. I would vote for full steam ahead with your proposal, and for generating the au_control file now.
,
May 31 2017
Ping! Any ETA for making au_control available?
,
May 31 2017
Are you still regularly getting builds that fail sanity suite? If so, maybe we should focus on that aspect of the problem. Or remove sanity suite from canaries (what purpose do they really serve there? they only test that a server side job can be created, they don't do any build-specific testing work)
,
May 31 2017
Yes we are. Even if only one board has this problem, we have to run the tests manually to ensure it can still AU. Is there a technical roadblock that prevents us from always generating au_control.tar.bz2?
,
May 31 2017
With our build system, we mostly have to decide to ignore an error, or stop the build right away.
,
May 31 2017
I think that for the bug as originally filed, this is a won't fix. But I need to polish up that design doc and follow up with that work.
,
May 31 2017
,
May 31 2017
Regarding c#26, I think one or the other is fine. If we stop the build, we should delete all artifacts. Currently we end up with a build that seems to install and work, and on our end we have endless debate about whether to push the build or not... most likely we push. I think even with the work in your design doc, we will have the same problem. So as far as I can tell we basically ignore the error on our end, and I think it is safe to do so in the build system.
,
Jun 1 2017
One of the points in my doc was that if the build fails, we mark the build as unreleasable. That means you should always have all artifacts, or none.
,
Jun 1 2017
Having all artifacts or none is good, however we may want to isolate having all artifacts from any interaction with the lab if possible. If we start missing a percentage of our build artifacts due to lab flakes there will be sadness.
,
Jun 1 2017
+1 for all or nothing, we should be able to do this NOW. Lab flakes should be fixed ASAP. We should not release any build when we don't have any test results.
,
Jun 1 2017
According to the doc, a 'pass' build would require the sanity suite to succeed. All other test suites would be run, but the results would be ignored by the build. GE should still publish the test results to help TPMs/test teams make release decisions, and tests could be rerun if needed. However, failures with compile, VMTest, sanity suite, signing, etc. would mark the build part as failed, and we'd consider it unreleasable. I'll go work on that doc today, and set up a meeting after it's ready.
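
The pass/fail rule described in this comment can be sketched as a small predicate. This is only an illustration of the proposal as summarized here, not code from the design doc or the builders: the stage names and the function are hypothetical, and the blocking list covers just the stages named above (the comment ends with "etc.", so the real list is presumably longer).

```python
from typing import Mapping

# Stages whose failure marks the build unreleasable, as named in the comment.
# Assumption: this list is incomplete and the names are illustrative.
BLOCKING_STAGES = ("compile", "VMTest", "sanity", "signing")


def build_is_releasable(stage_passed: Mapping[str, bool]) -> bool:
    """Return True only if every blocking stage ran and passed.

    Results of any other suite are ignored for this decision; they would
    still be published so that test teams can rerun them if needed.
    """
    return all(stage_passed.get(stage, False) for stage in BLOCKING_STAGES)
```

Under this rule a failing non-blocking suite leaves the build releasable, while a sanity failure (or a blocking stage that never ran) does not.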
,
Jun 1 2017
The only test in suite:sanity is dummy_PassServer. Why are we running this suite at all on the canaries?
,
Jun 1 2017
To ensure that the image can correctly install. In theory, it blocks all other testing, so that we don't break the lab with wildly invalid images that can't boot.
,
Jun 2 2017
,
Jun 12 2017
,
Jul 20 2017
,
Aug 17 2017
What is the reason that we have an additional NPO autoupdate_EndToEndTest test in the au suite anyway? We already get a bunch of runs of the test in the paygen_au* suite.
,
Aug 17 2017
We used to use N Plus One to prove that we could update away from the current build. Arguably, the most important update test. However, the NPO tests required generating goofy special images and all sorts of things that were really expensive, so we've stopped using them, and have only partially cleaned up the code that supported them. Instead we generate N2N (N to N) for testing the same thing. This requires no new images, and is a very small delta. Much cheaper for builders, signers, and long term storage. It does require that our update tests validate that we really moved from one OS partition to another; otherwise an update that fails by rolling back could be treated as successful.
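
Because an N2N update installs the same version the device is already running, a version check alone cannot distinguish a real update from a rollback. A minimal sketch of the partition check, assuming the test records the active root partition before and after the update; the function name and the device paths in the usage note are illustrative, and this is not the actual autoupdate_EndToEndTest logic.

```python
def n2n_update_verified(root_before: str, root_after: str) -> bool:
    """Return True only if the device now boots from a different root partition.

    Raises ValueError when either value is missing, so that an unknown
    partition is never silently counted as a successful update.
    """
    if not root_before or not root_after:
        raise ValueError("root partition must be known before and after the update")
    return root_before != root_after
```

For example, `n2n_update_verified("/dev/sda3", "/dev/sda5")` is true, while a rollback that leaves the device on `/dev/sda3` is reported as a failure.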
,
Aug 17 2017
In general, it should be safe to remove NPO code. The only exception is that old builds (years old now) may still have NPO images or payloads, and we should quietly ignore them instead of treating the old artifacts as corrupt. There should no longer be any need to install an NPO image or payload into the lab.
,
Sep 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/8f9b1e6bfd3bda3fc922034cbd2f3c89ba7bc485 commit 8f9b1e6bfd3bda3fc922034cbd2f3c89ba7bc485 Author: David Haddock <dhaddock@chromium.org> Date: Fri Sep 15 07:24:02 2017 Remove staging of control files for au suite. We no longer need to stage control files as part of the au suite. The npo test case that this test suite controls is covered as part of the N2N tests in paygen_au* suites. This change will remove a lot of manual work required by the test team to run powerwash tests manually when the sanity stage on the builder fails and the control file is not generated. BUG= chromium:758307 BUG= chromium:764038 BUG= chromium:709663 TEST=None Change-Id: I88a9bf7266c5615deb221d4adb50e5fbf4977d25 Reviewed-on: https://chromium-review.googlesource.com/665366 Commit-Ready: David Haddock <dhaddock@chromium.org> Tested-by: David Haddock <dhaddock@chromium.org> Reviewed-by: David Haddock <dhaddock@chromium.org> [modify] https://crrev.com/8f9b1e6bfd3bda3fc922034cbd2f3c89ba7bc485/test_suites/control.au
,
Sep 15 2017
,
Sep 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/24874293ea89dcce5af71f7d5da74d483e4f4dc4 commit 24874293ea89dcce5af71f7d5da74d483e4f4dc4 Author: David Haddock <dhaddock@chromium.org> Date: Fri Sep 15 23:09:34 2017 Remove staging of control files for au suite. We no longer need to stage control files as part of the au suite. The npo test case that this test suite controls is covered as part of the N2N tests in paygen_au* suites. This change will remove a lot of manual work required by the test team to run powerwash tests manually when the sanity stage on the builder fails and the control file is not generated. BUG= chromium:758307 BUG= chromium:764038 BUG= chromium:709663 TEST=None Change-Id: I88a9bf7266c5615deb221d4adb50e5fbf4977d25 Reviewed-on: https://chromium-review.googlesource.com/665366 Commit-Ready: David Haddock <dhaddock@chromium.org> Tested-by: David Haddock <dhaddock@chromium.org> Reviewed-by: David Haddock <dhaddock@chromium.org> (cherry picked from commit 8f9b1e6bfd3bda3fc922034cbd2f3c89ba7bc485) Reviewed-on: https://chromium-review.googlesource.com/669901 Trybot-Ready: David Haddock <dhaddock@chromium.org> [modify] https://crrev.com/24874293ea89dcce5af71f7d5da74d483e4f4dc4/test_suites/control.au
,
Sep 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/eb06fa4395f1cf3bd8d3b930308d8f977a7bdd9a commit eb06fa4395f1cf3bd8d3b930308d8f977a7bdd9a Author: David Haddock <dhaddock@chromium.org> Date: Mon Sep 18 18:27:41 2017 Remove staging of control files for au suite. We no longer need to stage control files as part of the au suite. The npo test case that this test suite controls is covered as part of the N2N tests in paygen_au* suites. This change will remove a lot of manual work required by the test team to run powerwash tests manually when the sanity stage on the builder fails and the control file is not generated. BUG= chromium:758307 BUG= chromium:764038 BUG= chromium:709663 TEST=None Change-Id: I88a9bf7266c5615deb221d4adb50e5fbf4977d25 Reviewed-on: https://chromium-review.googlesource.com/665366 Commit-Ready: David Haddock <dhaddock@chromium.org> Tested-by: David Haddock <dhaddock@chromium.org> Reviewed-by: David Haddock <dhaddock@chromium.org> (cherry picked from commit 8f9b1e6bfd3bda3fc922034cbd2f3c89ba7bc485) Reviewed-on: https://chromium-review.googlesource.com/671186 Trybot-Ready: David Haddock <dhaddock@chromium.org> [modify] https://crrev.com/eb06fa4395f1cf3bd8d3b930308d8f977a7bdd9a/test_suites/control.au
,
Sep 18 2017
,
Sep 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/cc274348acbbad5bde4480bca36fca6ea8c77981 commit cc274348acbbad5bde4480bca36fca6ea8c77981 Author: David Haddock <dhaddock@chromium.org> Date: Tue Sep 19 01:38:10 2017 Remove staging of control files for au suite. We no longer need to stage control files as part of the au suite. The npo test case that this test suite controls is covered as part of the N2N tests in paygen_au* suites. This change will remove a lot of manual work required by the test team to run powerwash tests manually when the sanity stage on the builder fails and the control file is not generated. BUG= chromium:758307 BUG= chromium:764038 BUG= chromium:709663 TEST=None Change-Id: I88a9bf7266c5615deb221d4adb50e5fbf4977d25 Reviewed-on: https://chromium-review.googlesource.com/665366 Commit-Ready: David Haddock <dhaddock@chromium.org> Tested-by: David Haddock <dhaddock@chromium.org> Reviewed-by: David Haddock <dhaddock@chromium.org> (cherry picked from commit 8f9b1e6bfd3bda3fc922034cbd2f3c89ba7bc485) Reviewed-on: https://chromium-review.googlesource.com/671153 Trybot-Ready: David Haddock <dhaddock@chromium.org> [modify] https://crrev.com/cc274348acbbad5bde4480bca36fca6ea8c77981/test_suites/control.au
,
Sep 19 2017
Comment 1 by pprabhu@chromium.org, Apr 7 2017 Owner: ----