
Issue 773513


Issue metadata

Status: Fixed
Closed: Oct 2017
Pri: 1
Type: Bug




Android PFQ is failing: FAILURE [failure_reason] INVALID_BUILD_DEFINITION

Reported by khmel@chromium.org, Oct 10 2017

Issue description

betty-arc64-nyc-android-pfq: [status] COMPLETED [result] FAILURE [failure_reason] INVALID_BUILD_DEFINITION 

https://uberchromegw.corp.google.com/i/chromeos/builders/master-nyc-android-pfq/builds/1116


 
The short-term fix is to mark that build config as "experimental", so that errors caused by the missing builder are ignored.

I need to look deeper to see why the new builder wasn't created. After that, we'll need to do another waterfall restart. It's not possible to do that until tomorrow morning.


Until we can get rid of buildbot, the flow for bringing up a new builder is:

0) Create the build config without adding it to the master or the waterfall, and run tryjobs.
1) Create the new build config as an experimental builder.
2) Request a waterfall restart.
3) Check that the new build column (i.e., the new builder) is working.
4) Mark the builder as important.
5) Request a low-priority waterfall restart. This is only needed to remove "experimental" from the build column name.


Step 0 isn't required, but good practice.
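To illustrate why the ordering above matters, here is a minimal Python sketch. It assumes a simplified lifecycle in which only a waterfall restart creates a build column, and marking a builder important before that column exists is treated as an error. `BuilderState` and its methods are hypothetical and are not part of chromite or buildbot.

```python
from dataclasses import dataclass


@dataclass
class BuilderState:
    """Hypothetical model of a new builder's lifecycle (not a chromite API)."""
    name: str
    in_config: bool = False
    experimental: bool = False
    has_build_column: bool = False
    important: bool = False

    def add_config(self, experimental: bool = True) -> None:
        # Step 1: land the build config, flagged as experimental.
        self.in_config = True
        self.experimental = experimental

    def waterfall_restart(self) -> None:
        # Steps 2 and 5: in this model, a buildbot restart creates a
        # build column for every builder present in the config.
        if self.in_config:
            self.has_build_column = True

    def mark_important(self) -> None:
        # Step 4: only safe once the build column exists; otherwise the
        # master fails when it tries to schedule the builder.
        if not self.has_build_column:
            raise RuntimeError(
                f"{self.name}: no build column yet; restart the waterfall first"
            )
        self.experimental = False
        self.important = True


if __name__ == "__main__":
    b = BuilderState("betty-arc64-nyc-android-pfq")
    b.add_config(experimental=True)
    b.waterfall_restart()
    b.mark_important()
    print(b)
```

Calling `mark_important()` before `waterfall_restart()` raises, which mirrors the failure in this bug: the config claimed an important builder that had no build column.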


khmel, can you own marking the builder as experimental, and updating whatever docs/process you followed?

I'll figure out why the builder doesn't exist.

Comment 3 by khmel@chromium.org, Oct 10 2017

Tried to make it experimental here:
https://cros-goldeneye.corp.google.com/chromeos/console/buildConfigBoard?name=betty-arc64

However, this option is disabled for me (I'm not sure this is the correct place to do it).


Comment 4 by norvez@chromium.org, Oct 11 2017

I'm very confused. Last week betty-arc64 was part of both Chrome and Android PFQ and meant to be stable (https://bugs.chromium.org/p/chromium/issues/detail?id=769808). Why was the builder removed?
Cc: bhthompson@chromium.org
Ah....

You landed https://crrev.com/c/690744; it was reverted (probably because it would have broken the Android PFQ), and then you cherry-picked it forward on the same CL.

That doesn't actually work, but I would have expected more errors than you got. If I look for the SHA1 where you cherry-picked it forward (2a520e21fa0e7208889e1b67ef9fe491c1cbf03f), that hash doesn't exist in chromite git history.

So... looking at the TOT build config, these builders exist on the chromeos waterfall:
  betty-arc64-nyc-android-pfq (baremetal)
  betty-arc64-paladin (baremetal)
  betty-arc64-release (baremetal)

However, the buildbot build columns only exist for:
  betty-arc64-paladin (baremetal)
  betty-arc64-release (baremetal)

Since the android-pfq builder doesn't exist on the waterfall, the master's attempts to schedule it fail, which kills the PFQ run. I've marked the builder as experimental, which will allow the nyc-android-pfq to pass until the build column is created (hopefully tomorrow morning). We can then mark the builder as important, and its results will count.

It's not currently listed on the Chrome PFQ at all, and will need to go through the same process there.
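Here is a minimal Python sketch of the config/waterfall mismatch described above. It assumes a simplified model of the master in which a slave with no build column fails the whole run unless that slave is experimental. The builder names come from this comment. `missing_columns` and `pfq_run_passes` are hypothetical helpers, not chromite or buildbot APIs, and the real master schedules many more slaves than this.

```python
# Builders listed in the TOT build config (names from this comment).
CONFIG_BUILDERS = {
    "betty-arc64-nyc-android-pfq",
    "betty-arc64-paladin",
    "betty-arc64-release",
}

# Build columns that actually exist on the buildbot waterfall.
WATERFALL_COLUMNS = {
    "betty-arc64-paladin",
    "betty-arc64-release",
}


def missing_columns(config: set[str], columns: set[str]) -> list[str]:
    """Builders that are configured but have no build column yet."""
    return sorted(config - columns)


def pfq_run_passes(slaves: list[str], columns: set[str], experimental: set[str]) -> bool:
    """Hypothetical model of the master's scheduling step.

    Scheduling a slave without a build column fails. If that slave is
    important, the failure kills the run; if it is experimental, the
    failure is ignored.
    """
    for slave in slaves:
        if slave not in columns and slave not in experimental:
            return False
    return True


if __name__ == "__main__":
    print(missing_columns(CONFIG_BUILDERS, WATERFALL_COLUMNS))
    # prints ['betty-arc64-nyc-android-pfq']
    slaves = ["betty-arc64-nyc-android-pfq"]
    print(pfq_run_passes(slaves, WATERFALL_COLUMNS, experimental=set()))
    # prints False
    print(pfq_run_passes(slaves, WATERFALL_COLUMNS, experimental={"betty-arc64-nyc-android-pfq"}))
    # prints True
```

Marking the builder experimental does not create the column. It only stops the missing column from failing the run, which is why a waterfall restart is still needed.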




PS: To re-land a CL with Repo / GoB, you can upload it again, but you need to delete the "Change-Id" line from the commit message. That causes it to get a new Id, so GoB treats it as an all-new CL.

    Change-Id: Ifaf66e31ed300fdce6f58a750665a25958c3faaf
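Here is a minimal Python sketch of that cleanup step. It assumes the standard Gerrit trailer format, which is `Change-Id: I` followed by 40 hex digits, as in the line above. `strip_change_id` is a hypothetical helper, and the sample subject line is made up. After you amend the commit with the cleaned message, Gerrit's commit-msg hook should add a fresh Change-Id.

```python
import re

# Gerrit Change-Id trailer: "Change-Id: I" followed by 40 hex digits.
CHANGE_ID_RE = re.compile(r"Change-Id: I[0-9a-f]{40}\s*")


def strip_change_id(message: str) -> str:
    """Return the commit message with any Change-Id trailer lines removed."""
    kept = [line for line in message.splitlines() if not CHANGE_ID_RE.fullmatch(line)]
    # Drop the blank line(s) left at the end and finish with one newline.
    return "\n".join(kept).rstrip() + "\n"


if __name__ == "__main__":
    original = (
        "Example subject line\n"
        "\n"
        "Change-Id: Ifaf66e31ed300fdce6f58a750665a25958c3faaf\n"
    )
    print(strip_change_id(original), end="")
    # prints "Example subject line"
```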

PS: If you think this experimental/waterfall-restart process is a pain, I strongly agree. That's why we are trying to move away from buildbot to swarming builds with an all-new build UI.
Cc: kinaba@chromium.org

Comment 9 by norvez@chromium.org, Oct 11 2017

IIRC I used the Reland feature of Gerrit and it got a new Change-Id: https://chromium-review.googlesource.com/c/chromiumos/chromite/+/701421
Maybe quoting the original commit message confused things?

I'm not sure what happened; I'm still extremely confused. Looking at the PFQ results on GE (https://cros-goldeneye.corp.google.com/chromeos/console/listAndroidPfqBuild#/), betty-arc64-nyc-android-pfq was instantiated and running fine until the run at 10:36 this morning. What changed this morning? A waterfall restart with a different config?
There was a waterfall restart this morning.

If the builder disappeared during that restart....... was the config somehow removed/re-added or something?

I failed to update the chromite pins correctly while performing the restart, so no pin changes were made at all. I didn't roll them backwards in time or anything.
After a lot of debugging, new builders were correctly created during the build this morning.

HOWEVER, the "Android PFQ" link at the top of the waterfall does not include all Android PFQ builders.

betty-arc64 exists on the waterfall, and should be building like any other build slave.

https://uberchromegw.corp.google.com/i/chromeos/builders/betty-arc64-nyc-android-pfq
I've just started a new Android PFQ build to see if things work correctly.
Status: Fixed (was: Untriaged)
