
Issue 647695


Issue metadata

Status: Archived
Owner: ----
Closed: Nov 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Paygen failing in Gale: can't find stateful.tgz

Reported by jrbarnette@chromium.org, Sep 16 2016

Issue description

Gale Beta builds have been failing since R54-8743.25.0 on Sep 12.
Although other problems have occurred on some of the builds,
all of them include a Paygen failure with an error like this:
    PayloadTestError: cannot find source stateful.tgz for testing chromeos_8743.30.0_gale_beta-channel_full_test.bin-d89313d44866c044291fc6a132473dd6

Looking at the error message, it seems like the problem with
the missing stateful.tgz is for the source build, which should
not be 8743.30.0.  Alas, the error message didn't bother to name
that source build, so I don't know where the problem is, exactly.
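
For reference, the name in that error can be split into its fields, which shows that it describes only the target build (8743.30.0) and says nothing about the source. The sketch below is illustrative Python, not paygen's code: the pattern is inferred from the single payload name quoted above, and `PayloadName` and `parse_payload_name` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class PayloadName(NamedTuple):
    version: str
    board: str
    channel: str
    kind: str
    suffix: str


# Assumption: this layout is inferred from the one name in the error message.
# It is not the official naming rule and does not cover other payload types.
_PAYLOAD_RE = re.compile(
    r"^chromeos_(?P<version>\d+\.\d+\.\d+)"
    r"_(?P<board>[^_]+)"
    r"_(?P<channel>[a-z]+-channel)"
    r"_(?P<kind>.+)\.bin-(?P<suffix>[0-9a-f]+)$"
)


def parse_payload_name(name: str) -> Optional[PayloadName]:
    """Split a payload file name into its fields, or return None if it does not match."""
    match = _PAYLOAD_RE.match(name)
    if match is None:
        return None
    return PayloadName(**match.groupdict())
```

Running this on the name from the error yields version 8743.30.0, board gale, and channel beta-channel, all of which belong to the build under test.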

 

Comment 1 by hyehia@chromium.org, Sep 16 2016

Cc: bhthompson@google.com dchan@chromium.org dhadd...@chromium.org
+ a few folks to help
Paygen thinks that 8743.19.0 is the current beta channel release for Gale.

22:33:40: INFO: Previous, non-FSI, builds considered:
22:33:40: INFO:   1: Build definition (board='gale', version=u'8743.19.0', channel='beta-channel')


However, there is no branch release build for .19 (the builds skip from .18 to .20, no idea why).

And when I look at the release artifacts for 8743.19.0, a huge chunk of them are missing, including anything to show that we successfully generated or tested payloads.

gsutil ls -l gs://chromeos-releases/beta-channel/gale/8743.19.0/
       359  2016-09-08T16:39:49Z  gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-recovery-R54-8743.19.0-gale.instructions
      1469  2016-09-08T16:44:26Z  gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-recovery-R54-8743.19.0-gale.instructions.json
  58076048  2016-09-08T16:39:50Z  gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-recovery-R54-8743.19.0-gale.tar.xz
 133634468  2016-09-08T16:39:52Z  gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-test-R54-8743.19.0-gale.tar.xz
   5828395  2016-09-08T16:39:51Z  gs://chromeos-releases/beta-channel/gale/8743.19.0/au-generator.zip
  64759064  2016-09-08T16:44:20Z  gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin
      1176  2016-09-08T16:44:23Z  gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.json
        90  2016-09-08T16:44:22Z  gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.md5
  62521955  2016-09-08T16:44:24Z  gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.zip
       128  2016-09-08T16:44:25Z  gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.zip.json
                                 gs://chromeos-releases/beta-channel/gale/8743.19.0/payloads/
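
A gap like this can be spotted mechanically from the listing. The sketch below is a minimal illustration in Python, not part of paygen: `listed_names` and `missing_artifacts` are hypothetical helpers, and the required set defaults to just stateful.tgz because that is the only file this bug names as missing.

```python
from typing import Iterable, List


def listed_names(listing: str, prefix: str) -> List[str]:
    """Return object names, relative to prefix, found in `gsutil ls` or `gsutil ls -l` output."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        url = fields[-1]  # the URL is the last column in both output formats
        # Skip unrelated lines and the bare prefix itself.
        if url.startswith(prefix) and len(url) > len(prefix):
            names.append(url[len(prefix):])
    return names


def missing_artifacts(
    listing: str,
    prefix: str,
    required: Iterable[str] = ("stateful.tgz",),  # assumption: only this file is checked
) -> List[str]:
    """Return the required names that do not appear in the listing."""
    present = set(listed_names(listing, prefix))
    return [name for name in required if name not in present]
```

Fed the listing above with the prefix gs://chromeos-releases/beta-channel/gale/8743.19.0/, `missing_artifacts` reports stateful.tgz as absent.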

So... where did these build artifacts get generated, and how did a broken build get pushed?

Also, can we confirm that .19 was actually pushed to the gale beta channel?
.19 is currently live on beta channel for gale.

I know that up until recently, our TPM (Kurt) was signing our beta builds manually, since we needed beta images earlier than normal. Unsure if this is related.
Almost certainly. Manual signing isn't really supported, and is generally a bad idea. I'm pretty sure you could have just updated the signing instructions for that board on a newer branch to address this in a more reliable way.

Do you know where the build that was signed came from?
It did not come off an official release builder.
If we can find where the build came from, and the rest of its build artifacts, we can probably copy the stateful.tgz file to where it's supposed to be:

  gs://chromeos-releases/beta-channel/gale/8743.19.0/stateful.tgz

That should fix the release builders, and give us the ability to do future releases on beta channel for gale.
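
To make the expected location concrete, here is a small Python sketch that builds the path from its parts. It assumes the layout is bucket/channel/board/version, which is read off the path quoted above and is not taken from the release tooling; `stateful_uri` is a hypothetical helper.

```python
def stateful_uri(channel: str, board: str, version: str,
                 bucket: str = "chromeos-releases") -> str:
    """Build the location where stateful.tgz is expected for a released build.

    Assumption: the layout gs://<bucket>/<channel>/<board>/<version>/ is
    inferred from the single path quoted in this bug.
    """
    return f"gs://{bucket}/{channel}/{board}/{version}/stateful.tgz"
```

For the build in question, `stateful_uri("beta-channel", "gale", "8743.19.0")` gives the path shown above.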

Cc: hyehia@chromium.org

Comment 9 by hyehia@chromium.org, Sep 17 2016

Cc: jkurtw@chromium.org
Do you mean the first build on the M54 branch that we signed to Beta manually?

8743.3.0 was the first build on the M54 branch that we signed to Beta; it was originally signed for the dev channel.

This is what was used for signing:
~/chromiumos/crostools
./channelsign $VERSION $FROM $TO $BOARD

"8743.19.0" is the current Gale beta release, but we have an invalid set of release artifacts for it. That is breaking our ability to do future beta channel builds/releases for the board.

The official release builders have no record of building 8743.19.0 (for any channel). So, where/how was it built?

PS: channelsign is unowned, partially broken, and generally unsafe.

Comment 11 by jkurtw@google.com, Sep 19 2016

Sorry, I was OOO Thurs/Fri

Here is the record of the build
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/release/builds/5989

I did use channelsign on 8743.19.0 to go from dev to beta, and then kicked the branch builder with:
~/depot_tools/cbuildbot --remote --buildbot --branch=release-R54-8743.B gale-release

>PS: channelsign is unowned, partially broken, and generally unsafe.
We will need to be ahead of the Chrome OS schedule for M55 Gale as well. Sounds like we should plan to update the signing instructions for our board instead of using channelsign. We have also used channelsign in cases where our stable RC is a few builds behind the point where Chrome OS started signing stable, and we need to retroactively sign it stable for release.

Comment 12 by jkurtw@google.com, Sep 19 2016

Correction to #11, the order was
1) Kicked a branch build that included a revert merge that we needed
~/depot_tools/cbuildbot --remote --buildbot --branch=release-R54-8743.B gale-release

Output here:
https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/release/builds/5989

2) Signed it beta with channelsign

3) Generated the payloads
~/depot_tools/cbuildbot --remote --channel beta-channel --version 8743.19.0 gale-payloads

Comment 13 by jkurtw@google.com, Sep 19 2016

re #2 - how is 8743.19.0 a broken build? It is green for the applicable tests here:
https://cros-goldeneye.corp.google.com/jetstream/console/listBuild?milestone=54#/details

Our automated testing passed and manual QA passed:
https://buganizer.corp.google.com/issues/31346848

Comment 14 by jkurtw@google.com, Sep 19 2016

dgarrett@ how do we get Gale builds passing again?
Oh... you pushed artifacts from a tryjob? That's also not really a supported path. 

To fix this, find the build artifacts from the tryjob, and copy stateful.tgz to where it should have been put automatically.

  gs://chromeos-releases/beta-channel/gale/8743.19.0/stateful.tgz
Also, did that paygen tryjob succeed or fail? It really should have failed because of the missing stateful.tgz file.
> Also, did that paygen tryjob succeed or fail? It really
> should have failed because of the missing stateful.tgz file.


There are no logs for any of the stages in the trybot build,
so we can't see all that happened.  However, the Paygen stage
is green.  The Paygen did kick off AU tests for the builds:
    http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=76148118
The runs did nothing (successfully), because there are no
'gale' DUTs in the test lab.

I filed crbug.com/648325 to myself about paygen being able to pass with partial artifacts.

So... back to #14. Were you able to find the artifacts from the tryjob that was pushed to beta users?


Comment 19 by jkurtw@google.com, Sep 19 2016

Looking now. 

Yes, the try job succeeded. I forwarded the email to both dgarrett@ and jrbarnette@.

Comment 20 by jkurtw@google.com, Sep 19 2016

To be sure, I should copy stateful.tgz from:
https://pantheon.corp.google.com/storage/browser/chromeos-image-archive/gale-release/R54-8743.19.0

to 

gs://chromeos-releases/beta-channel/gale/8743.19.0/

Correct?

Comment 21 by jkurtw@google.com, Sep 19 2016

Done
jkurtw@jkurtw:~$ gsutil cp ~/Desktop/stateful.tgz gs://chromeos-releases/beta-channel/gale/8743.19.0/
Copying file:///usr/local/google/home/jkurtw/Desktop/stateful.tgz [Content-Type=application/x-tar]...
| [1 files][ 73.8 MiB/ 73.8 MiB]                                                
Operation completed over 1 objects/73.8 MiB.                                     
jkurtw@jkurtw:~$ gsutil ls gs://chromeos-releases/beta-channel/gale/8743.19.0/
gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-recovery-R54-8743.19.0-gale.instructions
gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-recovery-R54-8743.19.0-gale.instructions.json
gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-recovery-R54-8743.19.0-gale.tar.xz
gs://chromeos-releases/beta-channel/gale/8743.19.0/ChromeOS-test-R54-8743.19.0-gale.tar.xz
gs://chromeos-releases/beta-channel/gale/8743.19.0/au-generator.zip
gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin
gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.json
gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.md5
gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.zip
gs://chromeos-releases/beta-channel/gale/8743.19.0/chromeos_8743.19.0_gale_recovery_beta-channel_premp.bin.zip.json
gs://chromeos-releases/beta-channel/gale/8743.19.0/stateful.tgz
gs://chromeos-releases/beta-channel/gale/8743.19.0/payloads/
jkurtw@jkurtw:~$ 


Also, re: #17, we don't rely on Chrome OS AU testing in the lab. We do AU testing manually on our side.

Comment 22 by jkurtw@google.com, Sep 19 2016

8743.34.0 is building now. Will see what happens...

Comment 23 by jkurtw@google.com, Sep 20 2016

All Jetstream builds failed HW Sanity... is there a lab issue?
The HW Sanity failure isn't just Jetstream; it's global.

I've filed bug 648505.

... Meanwhile, looking at the Paygen phase for Gale, it _does_
seem to have gotten past the 'stateful.tgz' symptom.  Chances are
once the HW Sanity failure is taken care of, Gale's Paygen stage
will go green once more.

Status: Fixed (was: Available)
gale-release passed, including Paygen. This is fixed.

https://uberchromegw.corp.google.com/i/chromeos/builders/gale-release/builds/520

Comment 27 by jkurtw@google.com, Sep 20 2016

ToT was never failing because of this, right? The M54 builds still failed due to HW Sanity.

https://uberchromegw.corp.google.com/i/chromeos_release/builders/gale-release%20release-R54-8743.B/builds/22

Comment 28 by jkurtw@google.com, Sep 21 2016

Status: Available (was: Fixed)
Branch builds still marked failed
https://uberchromegw.corp.google.com/i/chromeos_release/builders/gale-release%20release-R54-8743.B/builds/23

Comment 29 by jkurtw@google.com, Sep 21 2016

22:59:58: ERROR: gs://chromeos-releases/beta-channel/gale/8743.24.0/stateful.tgz does not exist.

Same problem for .24. I'll copy the file over.

Comment 30 by jkurtw@google.com, Sep 21 2016

Ok, copied it over for .24. I think that our live beta version moved before stateful.tgz in .19 was used for the last build.
Have you been generating a lot of builds via tryjobs? If so, why?

Forcing a release build via a tryjob is a bad idea, and if you are unlucky it can break a build for all of the release builders (you are uprevving version numbers, which isn't safe if multiple people do it).

Comment 32 by jkurtw@google.com, Sep 21 2016

Yes, I was kicking Gale builds manually, initially, because the FSI milestone field in GoldenEye was not set. This was not ever set for one of our boards previously, but started mattering recently:
https://groups.google.com/a/google.com/forum/?utm_medium=email&utm_source=footer#!msg/chromeos-infra-discuss/aD9_VFL_o0o/6vnzQphcDAAJ

Comment 33 by sbasi@chromium.org, Sep 27 2016

Status: Fixed (was: Available)
Builder is green, please reopen if this is not truly fixed.
Labels: VerifyIn-55

Comment 35 by dchan@chromium.org, Oct 10 2016

Labels: -VerifyIn-55

Comment 36 by dchan@google.com, Nov 19 2016

Labels: VerifyIn-56
Status: Available (was: Fixed)
This seems to be happening again
https://uberchromegw.corp.google.com/i/chromeos/builders/arkham-release/builds/615

Status: Fixed (was: Available)
> This seems to be happening again
> https://uberchromegw.corp.google.com/i/chromeos/builders/arkham-release/builds/615

I'm not convinced that this failure is the same as the previous
failure.  In any event, we shouldn't recycle this bug.  Please
file a new bug for the new failure.

Comment 39 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 40 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59

Comment 41 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 43 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)
