Paygen failing in Gale: can't find stateful.tgz
Reported by jrbarnette@chromium.org, Sep 16 2016
Issue description
Gale Beta builds have been failing since R54-8743.25.0 on Sep 12.
Although other problems have occurred on some of the builds,
all of them include a Paygen failure with an error like this:
PayloadTestError: cannot find source stateful.tgz for testing chromeos_8743.30.0_gale_beta-channel_full_test.bin-d89313d44866c044291fc6a132473dd6
Looking at the error message, it seems the missing stateful.tgz
belongs to the source build, which should not be 8743.30.0. Alas,
the error message doesn't name that source build, so I don't know
exactly where the problem is.
,
Sep 16 2016
Paygen thinks that 8743.19.0 is the current beta channel release for Gale.
22:33:40: INFO: Previous, non-FSI, builds considered:
22:33:40: INFO: 1: Build definition (board='gale', version=u'8743.19.0', channel='beta-channel')
However, there is no branch release build for .19 (the builds skip from .18 to .20, no idea why).
And when I look at the release artifacts for 8743.19.0, a huge chunk of them are missing, including anything to show that we successfully generated or tested payloads.
gsutil ls -l gs://chromeos-releases/beta-channel/gale/8743.19.0/
359 2016-09-08T16:39:49Z gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-recovery-R54-8743.19.0-gale.instructions
1469 2016-09-08T16:44:26Z gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-recovery-R54-8743.19.0-gale.instructions.json
58076048 2016-09-08T16:39:50Z gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-recovery-R54-8743.19.0-gale.tar.xz
133634468 2016-09-08T16:39:52Z gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-test-R54-8743.19.0-gale.tar.xz
5828395 2016-09-08T16:39:51Z gs://chromeos-releases/beta-channel/gale/8743.19.0/au-generator.zip
64759064 2016-09-08T16:44:20Z gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin
1176 2016-09-08T16:44:23Z gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.json
90 2016-09-08T16:44:22Z gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.md5
62521955 2016-09-08T16:44:24Z gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.zip
128 2016-09-08T16:44:25Z gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.zip.json
gs://chromeos-releases/beta-channel/gale/8743.19.0/payloads/
So... where did these build artifacts get generated, and how did a broken build get pushed?
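The listing above has no stateful.tgz, which is the file Paygen tripped over. A pre-flight check along these lines could flag the gap before a build is pushed. This is a minimal bash sketch, not existing tooling: the function name `check_release_artifacts` and the `REQUIRED_ARTIFACTS` list are hypothetical, and the list holds only stateful.tgz because that is the one file this thread confirms Paygen needs.

```bash
#!/usr/bin/env bash
# Sketch only: report release artifacts missing from a `gsutil ls` listing.
# ASSUMPTION: REQUIRED_ARTIFACTS contains only stateful.tgz, the file Paygen
# complained about; the full set Paygen needs is not spelled out here.
REQUIRED_ARTIFACTS=(stateful.tgz)

# Reads `gsutil ls` (or `gsutil ls -l`) output on stdin, prints
# "missing: <name>" for each required artifact not found, and returns 1
# if anything is missing (0 otherwise).
check_release_artifacts() {
  local -a urls
  local name url found missing=0
  mapfile -t urls  # one listing line per array element
  for name in "${REQUIRED_ARTIFACTS[@]}"; do
    found=0
    for url in "${urls[@]}"; do
      # The quoted "$name" is matched literally, so '.' is not a wildcard;
      # the artifact must be the final path component of the line.
      if [[ "$url" == */"$name" ]]; then
        found=1
        break
      fi
    done
    if (( ! found )); then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `gsutil ls gs://chromeos-releases/beta-channel/gale/8743.19.0/ | check_release_artifacts`. Run against the listing above, it would report stateful.tgz as missing and return non-zero.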
,
Sep 16 2016
Also, can we confirm that .19 was actually pushed to the gale beta channel?
,
Sep 16 2016
.19 is currently live on beta channel for gale. I know that up until recently, our TPM (Kurt) was signing our beta builds manually, since we needed beta images earlier than normal. Unsure if this is related.
,
Sep 16 2016
Almost certainly. Manual signing isn't really supported, and is generally a bad idea. I'm pretty sure you could have just updated the signing instructions for that board on a newer branch to address this in a more reliable way. Do you know where the build that was signed came from?
,
Sep 16 2016
It did not come off an official release builder.
,
Sep 16 2016
If we can find where the build came from, and the rest of its build artifacts, we can probably copy the stateful.tgz file to where it's supposed to be:
gs://chromeos-releases/beta-channel/gale/8743.19.0/stateful.tgz
That should fix the release builders and give us the ability to do future beta-channel releases for gale.
,
Sep 16 2016
,
Sep 17 2016
Do you mean the first build in the M54 branch that we signed to Beta manually? 8743.3.0 was the first build in the M54 branch that we signed to Beta. It was originally dev-channel signed, and this is what was used for signing:
~/chromiumos/crostools ./channelsign $VERSION $FROM $TO $BOARD
,
Sep 17 2016
"8743.19.0" is the current Gale beta release, but we have an invalid set of release artifacts for it. That is breaking our ability to do fugure beta channel builds/releases for the board. The offical release builders have no record of building 8743.19.0 (for any channel). So, where/how was it built? PS: channelsign is unowned, partially broken, and generally unsafe.
,
Sep 19 2016
Sorry, I was OOO Thurs/Fri.
Here is the record of the build: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/release/builds/5989
I did use channelsign on 8743.19.0 to go from dev to beta, and then kicked the branch builder with:
~/depot_tools/cbuildbot --remote --buildbot --branch=release-R54-8743.B gale-release
> PS: channelsign is unowned, partially broken, and generally unsafe.
We will need to be ahead of the Chrome OS schedule for M55 Gale as well. It sounds like we should plan to update the signing instructions for our board instead of using channelsign. We have also used channelsign in cases where our stable RC was a few builds behind when Chrome OS started signing stable, and we needed to retroactively sign it stable for release.
,
Sep 19 2016
Correction to #11, the order was:
1) Kicked a branch build that included a revert merge that we needed:
~/depot_tools/cbuildbot --remote --buildbot --branch=release-R54-8743.B gale-release
Output here: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/release/builds/5989
2) Signed it beta with channelsign.
3) Generated the payloads:
~/depot_tools/cbuildbot --remote --channel beta-channel --version 8743.19.0 gale-payloads
,
Sep 19 2016
re #2 - how is 8743.19.0 a broken build? It is green for the applicable tests here: https://cros-goldeneye.corp.google.com/jetstream/console/listBuild?milestone=54#/details Our automated testing passed and manual QA passed: https://buganizer.corp.google.com/issues/31346848
,
Sep 19 2016
dgarrett@ how do we get Gale builds passing again?
,
Sep 19 2016
Oh... you pushed artifacts from a tryjob? That's also not really a supported path. To fix this, find the build artifacts from the tryjob, and copy stateful.tgz to where it should have been put automatically. gs://chromeos-releases/beta-channel/gale/8743.19.0/stateful.tgz
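A minimal bash sketch of that copy, assuming the tryjob's archive layout matches the chromeos-image-archive location quoted later in this thread (gale-release/R54-8743.19.0). `stateful_urls`, `restore_stateful`, and `DRY_RUN` are hypothetical names; the gsutil calls themselves are standard (`cp -n` skips an existing destination object, `-q stat` exits non-zero if the object is absent). It copies bucket-to-bucket rather than through a local download.

```bash
#!/usr/bin/env bash
# Sketch only: restore a missing stateful.tgz from the build archive into
# the release bucket. ASSUMPTION: the archive layout
# gs://chromeos-image-archive/<board>-release/R<milestone>-<version>/
# matches the tryjob location quoted in this thread. Function names and
# DRY_RUN are hypothetical.

# Print the source (archive) URL, then the destination (release) URL.
stateful_urls() {
  local board=$1 milestone=$2 version=$3 channel=$4
  echo "gs://chromeos-image-archive/${board}-release/R${milestone}-${version}/stateful.tgz"
  echo "gs://chromeos-releases/${channel}/${board}/${version}/stateful.tgz"
}

# Copy stateful.tgz bucket-to-bucket, then confirm it landed.
# Usage: restore_stateful BOARD MILESTONE VERSION CHANNEL
# With DRY_RUN=1, the gsutil commands are printed instead of executed.
restore_stateful() {
  local src dst
  local -a run=()
  { read -r src; read -r dst; } < <(stateful_urls "$@")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    run=(echo)
  fi
  # -n (no-clobber): never overwrite an existing destination object.
  "${run[@]}" gsutil cp -n "$src" "$dst" || return
  # -q stat exits 0 if the object exists and 1 if it does not.
  "${run[@]}" gsutil -q stat "$dst"
}
```

For example, `DRY_RUN=1 restore_stateful gale 54 8743.19.0 beta-channel` prints the `gsutil cp -n` and `gsutil -q stat` commands for the 8743.19.0 paths without touching either bucket.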
,
Sep 19 2016
Also, did that paygen tryjob succeed or fail? It really should have failed because of the missing stateful.tgz file.
,
Sep 19 2016
> Also, did that paygen tryjob succeed or fail? It really
> should have failed because of the missing stateful.tgz file.
There are no logs for any of the stages in the trybot build,
so we can't see all that happened. However, the Paygen stage
is green. The Paygen did kick off AU tests for the builds:
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=76148118
The runs did nothing (successfully), because there are no
'gale' DUTs in the test lab.
,
Sep 19 2016
I filed crbug.com/648325 to myself about paygen being able to pass with partial artifacts. So... back to #14. Were you able to find the artifacts from the tryjob that was pushed to beta users?
,
Sep 19 2016
Looking now. Yes, the try job succeeded. I forwarded the email to both dgarrett@ and jrbarnette@.
,
Sep 19 2016
To be sure, I should copy stateful.tgz from: https://pantheon.corp.google.com/storage/browser/chromeos-image-archive/gale-release/R54-8743.19.0 to gs://chromeos-releases/beta-channel/gale/8743.19.0/ Correct?
,
Sep 19 2016
Done.
jkurtw@jkurtw:~$ gsutil cp ~/Desktop/stateful.tgz gs://chromeos-releases/beta-channel/gale/8743.19.0/
Copying file:///usr/local/google/home/jkurtw/Desktop/stateful.tgz [Content-Type=application/x-tar]...
| [1 files][ 73.8 MiB/ 73.8 MiB]
Operation completed over 1 objects/73.8 MiB.
jkurtw@jkurtw:~$ gsutil ls gs://chromeos-releases/beta-channel/gale/8743.19.0/
gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-recovery-R54-8743.19.0-gale.instructions
gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-recovery-R54-8743.19.0-gale.instructions.json
gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-recovery-R54-8743.19.0-gale.tar.xz
gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-test-R54-8743.19.0-gale.tar.xz
gs://chromeos-releases/beta-channel/gale/8743.19.0/au-generator.zip
gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin
gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.json
gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.md5
gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.zip
gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.zip.json
gs://chromeos-releases/beta-channel/gale/8743.19.0/stateful.tgz
gs://chromeos-releases/beta-channel/gale/8743.19.0/payloads/
jkurtw@jkurtw:~$
Also, re: #17, we don't rely on Chrome OS AU testing in the lab. We do AU testing manually on our side.
,
Sep 19 2016
8743.34.0 is building now. Will see what happens...
,
Sep 20 2016
All Jetstream builds failed HW Sanity... is there a lab issue?
,
Sep 20 2016
The HW Sanity failure isn't just Jetstream; it's global. I've filed bug 648505 .
,
Sep 20 2016
... Meanwhile, looking at the Paygen phase for Gale, it _does_ seem to have gotten past the 'stateful.tgz' symptom. Chances are once the HW Sanity failure is taken care of, Gale's Paygen stage will go green once more.
,
Sep 20 2016
gale-release passed, including Paygen. This is fixed. https://uberchromegw.corp.google.com/i/chromeos/builders/gale-release/builds/520
,
Sep 20 2016
ToT was never failing, because of this, right? The M54 builds still failed due to HW Sanity. https://uberchromegw.corp.google.com/i/chromeos_release/builders/gale-release%20release-R54-8743.B/builds/22
,
Sep 21 2016
Branch builds still marked failed https://uberchromegw.corp.google.com/i/chromeos_release/builders/gale-release%20release-R54-8743.B/builds/23
,
Sep 21 2016
22:59:58: ERROR: gs://chromeos-releases/beta-channel/gale/8743.24.0/stateful.tgz does not exist.
Same problem for .24. I'll copy the file over.
,
Sep 21 2016
Ok, copied it over for .24. I think that our live beta version moved before stateful.tgz in .19 was used for the last build.
,
Sep 21 2016
Have you been generating a lot of builds via tryjobs? If so, why? Forcing a release build via a tryjob is a bad idea, and if you are unlucky it can break a build for all of the release builders (you are uprevving version numbers, which isn't safe if multiple people do it).
,
Sep 21 2016
Yes, I was kicking Gale builds manually, initially, because the FSI milestone field in GoldenEye was not set. This was not ever set for one of our boards previously, but started mattering recently: https://groups.google.com/a/google.com/forum/?utm_medium=email&utm_source=footer#!msg/chromeos-infra-discuss/aD9_VFL_o0o/6vnzQphcDAAJ
,
Sep 27 2016
Builder is green, please reopen if this is not truly fixed.
,
Oct 7 2016
,
Oct 10 2016
,
Nov 19 2016
,
Nov 29 2016
This seems to be happening again https://uberchromegw.corp.google.com/i/chromeos/builders/arkham-release/builds/615
,
Nov 29 2016
> This seems to be happening again
> https://uberchromegw.corp.google.com/i/chromeos/builders/arkham-release/builds/615
I'm not convinced that this failure is the same as the previous failure. In any event, we shouldn't recycle this bug. Please file a new bug for the new failure.
,
Mar 4 2017
,
Apr 17 2017
,
May 30 2017
,
Aug 1 2017
,
Oct 14 2017
Comment 1 by hyehia@chromium.org, Sep 16 2016