New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 800562 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 807029



Sign in to add a comment

Swarming builds don't have a buildbot build_number.

Project Member Reported by dgarr...@chromium.org, Jan 9 2018

Issue description

This representative build:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8957835531229180704

Has this failure:

************************************************************
** Start Stage BuildStart - Tue, 09 Jan 2018 15:16:34 -0800 (PST)
** 
** The first stage to run.
** 
**   This stage writes a few basic metadata values that are known at the start of
**   build, and inserts the build into the database, if appropriate.
************************************************************
Preconditions for the stage successfully met. Beginning to execute stage...
15:16:34: INFO: Running cidb query on pid 6145, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7fc2eaef4e10>
15:16:34: ERROR: Error: (IntegrityError) (1062, "Duplicate entry '1-lumpy-compile-only-pre-cq--0' for key 'buildbot_generation'") 'INSERT INTO `buildTable` (master_build_id, buildbot_generation, builder_name, waterfall, build_number, build_config, bot_hostname, start_time, important, buildbucket_id) VALUES (%s, %s, %s, %s, %s, %s, %s, CURRENT_TIMESTAMP, %s, %s)' (None, 1, u'lumpy-compile-only-pre-cq', '', 0, u'lumpy-compile-only-pre-cq', 'swarm-cros-0.c.chromeos-bot.internal', True, '8957835531229180704')
 If the buildbucket_id to insert is duplicated to the buildbucket_id of an old build and the old build was canceled because of a waterfall master restart, please ignore this error. Else, the error needs more investigation. More context:  crbug.com/679974  and  crbug.com/685889 
15:16:34: ERROR: <class 'sqlalchemy.exc.IntegrityError'>: (IntegrityError) (1062, "Duplicate entry '1-lumpy-compile-only-pre-cq--0' for key 'buildbot_generation'") 'INSERT INTO `buildTable` (master_build_id, buildbot_generation, builder_name, waterfall, build_number, build_config, bot_hostname, start_time, important, buildbucket_id) VALUES (%s, %s, %s, %s, %s, %s, %s, CURRENT_TIMESTAMP, %s, %s)' (None, 1, u'lumpy-compile-only-pre-cq', '', 0, u'lumpy-compile-only-pre-cq', 'swarm-cros-0.c.chromeos-bot.internal', True, '8957835531229180704')
Traceback (most recent call last):
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/failures_lib.py", line 229, in wrapped_functor
    return functor(*args, **kwargs)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/cbuildbot/stages/report_stages.py", line 284, in PerformStage
    raise e
IntegrityError: (IntegrityError) (1062, "Duplicate entry '1-lumpy-compile-only-pre-cq--0' for key 'buildbot_generation'") 'INSERT INTO `buildTable` (master_build_id, buildbot_generation, builder_name, waterfall, build_number, build_config, bot_hostname, start_time, important, buildbucket_id) VALUES (%s, %s, %s, %s, %s, %s, %s, CURRENT_TIMESTAMP, %s, %s)' (None, 1, u'lumpy-compile-only-pre-cq', '', 0, u'lumpy-compile-only-pre-cq', 'swarm-cros-0.c.chromeos-bot.internal', True, '8957835531229180704')

15:16:34: INFO: Translating result <class 'sqlalchemy.exc.IntegrityError'>: (IntegrityError) (1062, "Duplicate entry '1-lumpy-compile-only-pre-cq--0' for key 'buildbot_generation'") 'INSERT INTO `buildTable` (master_build_id, buildbot_generation, builder_name, waterfall, build_number, build_config, bot_hostname, start_time, important, buildbucket_id) VALUES (%s, %s, %s, %s, %s, %s, %s, CURRENT_TIMESTAMP, %s, %s)' (None, 1, u'lumpy-compile-only-pre-cq', '', 0, u'lumpy-compile-only-pre-cq', 'swarm-cros-0.c.chromeos-bot.internal', True, '8957835531229180704')
Traceback (most recent call last):
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/failures_lib.py", line 229, in wrapped_functor
    return functor(*args, **kwargs)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/cbuildbot/stages/report_stages.py", line 284, in PerformStage
    raise e
IntegrityError: (IntegrityError) (1062, "Duplicate entry '1-lumpy-compile-only-pre-cq--0' for key 'buildbot_generation'") 'INSERT INTO `buildTable` (master_build_id, buildbot_generation, builder_name, waterfall, build_number, build_config, bot_hostname, start_time, important, buildbucket_id) VALUES (%s, %s, %s, %s, %s, %s, %s, CURRENT_TIMESTAMP, %s, %s)' (None, 1, u'lumpy-compile-only-pre-cq', '', 0, u'lumpy-compile-only-pre-cq', 'swarm-cros-0.c.chromeos-bot.internal', True, '8957835531229180704')
 to fail.
 
I'm going to guess that:

1-lumpy-compile-only-pre-cq--0

Is a combination of buildbot-generation, builder_name, waterfall, build_number

Which means this is breaking because we populate the buildbot build_number with a hard coded value of 0.
What assumptions would be broken by using the buildbucket id there?
Or.... adjust the index in question to include buildbucket id.
The index enforces that (builder_name, build_number) is unique. Sounds like something here caused build_number (which was a buildbot concept) to either be reset to 0, or to simply stop incrementing at all.

If it's the former and we can't un-reset it, then we can bump the "buildbot-generation" number by 1, it's hard coded in chromite. This was a release valve I put in place for precisely this situation.

If it's the latter and sequential build_number is not going to be a meaningful concept at all, then it needs some more thought on how we want to migrate away from using build_number.
Currently, "build_number" is ALWAYS 0 for swarming builds. There is no similar concept, except for the buildbucket id, or swarming id.

My belief is that this is an issue for CIDB interaction, version strings, and (indirectly) artifact URLs.

For swarming builds, they could also be replaced with CIDB id's and/or with buildbucket ids. CIDB id's are ordered which is nice when looking at build artifacts in GS. But they create a problem creating the build table entry, since you need to create an entry to allocate a CIDB id.

buildbucket id's create an issue because they are non-sequential as well as not formatted as integers (and are really long strings).

I'm also fairly certain that the stats scripts use timestamps for ordering, but I'm not 100%.

My thinking is to modify the UNIQUE KEY first, then work through the other issues. The need to do so is called out in my roadmap docs.
Project Member

Comment 7 by bugdroid1@chromium.org, Jan 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/452339af383f081bbf48a0c0cb8d5660460ba5c3

commit 452339af383f081bbf48a0c0cb8d5660460ba5c3
Author: Don Garrett <dgarrett@google.com>
Date: Fri Jan 12 03:37:37 2018

cidb: Add buildbucket_id to build_table index.

Swarming builds don't have a buildbot build_number. So... they don't
appear to be unique in the the build table unique index. Add
buildbucket_id to that index.

BUG= chromium:800562 
TEST=cidb_integration_test

Change-Id: I7ffdae1f3d46510d624fb5af86c3113f43bb702f
Reviewed-on: https://chromium-review.googlesource.com/858579
Commit-Ready: Don Garrett <dgarrett@chromium.org>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/452339af383f081bbf48a0c0cb8d5660460ba5c3/cidb/schema.dump
[add] https://crrev.com/452339af383f081bbf48a0c0cb8d5660460ba5c3/cidb/migrations/00064_add_buildbucket_to_build_table_index.sql

That schema migration ran in substantially < 1 second on the debug db (~40k rows).

If the cost is linear with the database size, this means that prod (~2.2 million rows) will take < 1 minute.
I created a clone of prod CIDB and it finshed in < 14 seconds.

Still, it's Friday before a 3 day weekend. Waiting until Monday to apply in prod.
Summary: Swarming builds don't have a buildbot build_number. (was: Swarming builds failing with CIDB error.)
Builds are now working, but the unique IDs used for builds in a variety of places aren't always unique.

https://ci.chromium.org/p/chromeos/builds/b8957224010844851056

This may affect CIDB behavior (needs investigation), may affect build version strings, and certainly affect build artifact URLs.
Project Member

Comment 11 by bugdroid1@chromium.org, Jan 18 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/1c6f6d536bbaad829a610332f1f8dbf7a576eb8e

commit 1c6f6d536bbaad829a610332f1f8dbf7a576eb8e
Author: Don Garrett <dgarrett@google.com>
Date: Thu Jan 18 22:44:46 2018

cidb: Remove start/ending_build_number.

build_number is based on a buildbot concept that doesn't exist for
swarming builds. Remove the use of it for build history searches.

BUG= chromium:800562 
TEST=run_tests

Change-Id: Ib5e43dabe066f20bd4425ebcbe5512b243815386
Reviewed-on: https://chromium-review.googlesource.com/869193
Commit-Ready: Don Garrett <dgarrett@chromium.org>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/1c6f6d536bbaad829a610332f1f8dbf7a576eb8e/scripts/summarize_build_stats.py
[modify] https://crrev.com/1c6f6d536bbaad829a610332f1f8dbf7a576eb8e/lib/fake_cidb.py
[modify] https://crrev.com/1c6f6d536bbaad829a610332f1f8dbf7a576eb8e/lib/cidb.py
[modify] https://crrev.com/1c6f6d536bbaad829a610332f1f8dbf7a576eb8e/scripts/som_alerts_dispatcher.py

Project Member

Comment 12 by bugdroid1@chromium.org, Jan 19 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/8cd2a1c6b37480959bd001f932904b74b1f1d2b2

commit 8cd2a1c6b37480959bd001f932904b74b1f1d2b2
Author: Don Garrett <dgarrett@google.com>
Date: Fri Jan 19 08:29:46 2018

cbuildbot_run: Adjust GetVersion to use CIDB Id.

build_number is a buildbot concept which doesn't exist for swarming
builds (and defaults to 0 there). Switch to CIDB ids for the same
purpose.

This needs unittests and more thought, but uploading now to allow
tryjobs.

BUG= chromium:800562 
TEST=run_tests

Change-Id: I4850d1d2438786c6e688db90926f66a047eed6d4
Reviewed-on: https://chromium-review.googlesource.com/869450
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/8cd2a1c6b37480959bd001f932904b74b1f1d2b2/cbuildbot/cbuildbot_run.py
[modify] https://crrev.com/8cd2a1c6b37480959bd001f932904b74b1f1d2b2/cbuildbot/cbuildbot_run_unittest.py

Project Member

Comment 13 by bugdroid1@chromium.org, Jan 19 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/10ace85c7a3e6c42cbc0805da3379fe70be70d68

commit 10ace85c7a3e6c42cbc0805da3379fe70be70d68
Author: Dan Erat <derat@chromium.org>
Date: Fri Jan 19 19:46:57 2018

Revert "cbuildbot_run: Adjust GetVersion to use CIDB Id."

This reverts commit 8cd2a1c6b37480959bd001f932904b74b1f1d2b2.

Reason for revert: Apparently breaking provisioning on
informational Chrome PFQ builders:  https://crbug.com/803854 

Original change's description:
> cbuildbot_run: Adjust GetVersion to use CIDB Id.
> 
> build_number is a buildbot concept which doesn't exist for swarming
> builds (and defaults to 0 there). Switch to CIDB ids for the same
> purpose.
> 
> This needs unittests and more thought, but uploading now to allow
> tryjobs.
> 
> BUG= chromium:800562 
> TEST=run_tests
> 
> Change-Id: I4850d1d2438786c6e688db90926f66a047eed6d4
> Reviewed-on: https://chromium-review.googlesource.com/869450
> Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
> Tested-by: Don Garrett <dgarrett@chromium.org>
> Reviewed-by: Ningning Xia <nxia@chromium.org>

Bug:  chromium:800562 
Change-Id: If865e002f1f0e17ce47707655f6df6d38dd05dfe
Reviewed-on: https://chromium-review.googlesource.com/876243
Commit-Queue: Dan Erat <derat@chromium.org>
Tested-by: Dan Erat <derat@chromium.org>
Trybot-Ready: Dan Erat <derat@chromium.org>
Reviewed-by: Lann Martin <lannm@chromium.org>

[modify] https://crrev.com/10ace85c7a3e6c42cbc0805da3379fe70be70d68/cbuildbot/cbuildbot_run.py
[modify] https://crrev.com/10ace85c7a3e6c42cbc0805da3379fe70be70d68/cbuildbot/cbuildbot_run_unittest.py

To fix the version testing, you'll need to fix this method:
    https://cs.corp.google.com/chromeos_internal/src/third_party/autotest/files/server/hosts/cros_host.py?l=1414

The change is along these lines:
  * Create a new function in lsbrelease_utils.py that can read
    CHROMEOS_RELEASE_BUILDER_PATH.
  * Plumb the lsrelease_utils function into a method in CrosHost.
  * Drop the current complicated check in favor of a check that
    the value of the "cros-version" label is equal to the value
    returned by the new method (that is, don't call utils.version_match()).

For reference, a modern /etc/lsb-release with CHROMEOS_RELEASE_BUILDER_PATH
looks like this:
    CHROMEOS_RELEASE_APPID={3C46DD14-D42A-9F77-E2CB-7D83422F5B73}
    CHROMEOS_BOARD_APPID={3C46DD14-D42A-9F77-E2CB-7D83422F5B73}
    CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
    DEVICETYPE=CHROMEBOX
    CHROMEOS_RELEASE_BUILDER_PATH=tricky-tot-chrome-pfq-informational/R65-10321.0.0-c2223837
    GOOGLE_RELEASE=10321.0.0
    CHROMEOS_DEVSERVER=
    CHROMEOS_RELEASE_BOARD=tricky
    CHROMEOS_RELEASE_BUILD_NUMBER=10321
    CHROMEOS_RELEASE_BRANCH_NUMBER=0
    CHROMEOS_RELEASE_CHROME_MILESTONE=65
    CHROMEOS_RELEASE_PATCH_NUMBER=0
    CHROMEOS_RELEASE_TRACK=testimage-channel
    CHROMEOS_RELEASE_DESCRIPTION=10321.0.0 (Official Build) dev-channel tricky test
    CHROMEOS_RELEASE_BUILD_TYPE=Official Build
    CHROMEOS_RELEASE_NAME=Chrome OS
    CHROMEOS_RELEASE_VERSION=10321.0.0
    CHROMEOS_AUSERVER=https://tools.google.com/service/update2

Project Member

Comment 15 by bugdroid1@chromium.org, Feb 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/b9f3580ea0c8a766d8fcf42d00abbf51558f49b0

commit b9f3580ea0c8a766d8fcf42d00abbf51558f49b0
Author: Don Garrett <dgarrett@google.com>
Date: Tue Feb 13 19:08:43 2018

[autotest] Adjust DUT version labels verification.

Compare lsbrelease's CHROMEOS_RELEASE_BUILD_PATH againt the version
label to see if the correct version is installed, instead of
CHROMEOS_RELEASE_VERSION. A simple string compare makes the lab
slightly more robust against changes to the format of that string.

This does NOT affect the provision flow and verifications done there.

BUG= chromium:800562 
TEST=None

Change-Id: Ia98c62744d1cbbacc22cb10500ae0e31b6e61755
Reviewed-on: https://chromium-review.googlesource.com/907326
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Tested-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/b9f3580ea0c8a766d8fcf42d00abbf51558f49b0/client/common_lib/lsbrelease_utils_unittest.py
[modify] https://crrev.com/b9f3580ea0c8a766d8fcf42d00abbf51558f49b0/server/hosts/cros_host.py
[modify] https://crrev.com/b9f3580ea0c8a766d8fcf42d00abbf51558f49b0/client/common_lib/lsbrelease_utils.py

Project Member

Comment 16 by bugdroid1@chromium.org, Feb 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/68134125081027f1ae98be03a8e5b882ac24a39a

commit 68134125081027f1ae98be03a8e5b882ac24a39a
Author: Don Garrett <dgarrett@google.com>
Date: Thu Feb 15 08:10:53 2018

cbuildbot_run: Adjust GetVersion to use CIDB Id.

build_number is a buildbot concept which doesn't exist for swarming
builds (and defaults to 0 there). Switch to CIDB ids for the same
purpose.

Builder version strings for assorted unofficial builders (including
tryjobs) will change formats from:

  chell-incremental/R66-10401.0.0-b4390
->
  chell-incremental/R66-10401.0.0-c123456

Hopefully, there won't be anyone left trying to explicitly parse those
strings.

This was previously landed as CL:869450, and reverted as CL:876243.

Relanding, now that the lab has been adjusted to accept the new build
version string format, and "cros buildresult" is available for the
toolchain team.

BUG= chromium:800562 
TEST=run_tests

Change-Id: I7e007f1c660c22efce704233792c9716aca6c3f5
Reviewed-on: https://chromium-review.googlesource.com/917601
Commit-Ready: Don Garrett <dgarrett@chromium.org>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/68134125081027f1ae98be03a8e5b882ac24a39a/cbuildbot/cbuildbot_run.py
[modify] https://crrev.com/68134125081027f1ae98be03a8e5b882ac24a39a/cbuildbot/cbuildbot_run_unittest.py

Project Member

Comment 17 by bugdroid1@chromium.org, Feb 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/ab252592101b21ba15a2372f0998ed56fb1f047c

commit ab252592101b21ba15a2372f0998ed56fb1f047c
Author: Don Garrett <dgarrett@chromium.org>
Date: Thu Feb 15 18:27:22 2018

Revert "cbuildbot_run: Adjust GetVersion to use CIDB Id."

This reverts commit 68134125081027f1ae98be03a8e5b882ac24a39a.

Reason for revert: This CL works, but breaks toolchain team scripts.

A new tool "cros buildresult" was built to let the toolchain scripts be less dependent on how we run builds, but has a significant bug which prevents them from using it yet.

Reverting until the new tool works for them.

BUG= chromium:807029 

Original change's description:
> cbuildbot_run: Adjust GetVersion to use CIDB Id.
> 
> build_number is a buildbot concept which doesn't exist for swarming
> builds (and defaults to 0 there). Switch to CIDB ids for the same
> purpose.
> 
> Builder version strings for assorted unofficial builders (including
> tryjobs) will change formats from:
> 
>   chell-incremental/R66-10401.0.0-b4390
> ->
>   chell-incremental/R66-10401.0.0-c123456
> 
> Hopefully, there won't be anyone left trying to explicitly parse those
> strings.
> 
> This was previously landed as CL:869450, and reverted as CL:876243.
> 
> Relanding, now that the lab has been adjusted to accept the new build
> version string format, and "cros buildresult" is available for the
> toolchain team.
> 
> BUG= chromium:800562 
> TEST=run_tests
> 
> Change-Id: I7e007f1c660c22efce704233792c9716aca6c3f5
> Reviewed-on: https://chromium-review.googlesource.com/917601
> Commit-Ready: Don Garrett <dgarrett@chromium.org>
> Tested-by: Don Garrett <dgarrett@chromium.org>
> Reviewed-by: Ningning Xia <nxia@chromium.org>

Bug:  chromium:800562 
Change-Id: I833e2324439c6eaf750f47005130f02470336d4d
Reviewed-on: https://chromium-review.googlesource.com/922521
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Commit-Queue: Don Garrett <dgarrett@chromium.org>
Tested-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/ab252592101b21ba15a2372f0998ed56fb1f047c/cbuildbot/cbuildbot_run.py
[modify] https://crrev.com/ab252592101b21ba15a2372f0998ed56fb1f047c/cbuildbot/cbuildbot_run_unittest.py

Blockedon: 807029
Project Member

Comment 19 by bugdroid1@chromium.org, Mar 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/8e3708c8093dd77a78e8425d0d23f79fc8779f1e

commit 8e3708c8093dd77a78e8425d0d23f79fc8779f1e
Author: Don Garrett <dgarrett@google.com>
Date: Sat Mar 10 06:59:24 2018

cbuildbot_run: Adjust GetVersion to use CIDB Id.

A similar CL:869450 was submitted and reverted previously because
assorted lab code could not handle the change in the ChromeOS build
version string from *-bXXX to *-cXXX.

I've given up trying to make the lab more flexible, so this CL keeps
the *-bXXX, but changes the meaning of XXX from builder run number to
CIDB id.

This is still expected to cause issues for the toolchain team until
chromium:807029 is fixed.

BUG= chromium:800562 
TEST=run_tests
CQ-DEPEND=CL:954052

Change-Id: I1132f8e2618314f2cdcc8705756394d275f9ce51
Reviewed-on: https://chromium-review.googlesource.com/951915
Commit-Ready: Don Garrett <dgarrett@chromium.org>
Tested-by: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/8e3708c8093dd77a78e8425d0d23f79fc8779f1e/cbuildbot/cbuildbot_run.py
[modify] https://crrev.com/8e3708c8093dd77a78e8425d0d23f79fc8779f1e/cbuildbot/cbuildbot_run_unittest.py

Status: Fixed (was: Started)

Sign in to add a comment