buildstart stage failing with IntegrityError
Project Member Reported by firstname.lastname@example.org, Sep 18
Chrome Version: ToT
OS: Chrome OS

All toolchain builders failed with:

@@@STEP_LINK@Builder documentation@http://www.chromium.org/chromium-os/build/builder-overview#TOC-Continuous@@@
10:01:42: INFO: Running cidb query on pid 19869, repr(query) starts with 'SELECT NOW()'
10:01:42: INFO: Running cidb query on pid 19869, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7f846a485590>
10:01:42: ERROR: Error: (IntegrityError) (1062, "Duplicate entry '8968103418167764992' for key 'buildbucket_id_index'") 'INSERT INTO `buildTable` (master_build_id, buildbot_generation, builder_name, waterfall, build_number, build_config, bot_hostname, start_time, deadline, important, buildbucket_id) VALUES (%s, %s, %s, %s, %s, %s, %s, CURRENT_TIMESTAMP, %s, %s, %s)' (1860550, 1, 'amd64-llvm-next-toolchain', 'chromeos', 388, u'amd64-llvm-next-toolchain', 'cros-beefy272-c2.c.chromeos-bot.internal', datetime.datetime(2017, 9, 19, 6, 53, 24), True, '8968103418167764992')
If the buildbucket_id to insert is duplicated to the buildbucket_id of an old build and the old build was canceled because of a waterfall master restart, please ignore this error. Else, the error needs more investigation. More context: crbug.com/679974 and crbug.com/685889

What steps will reproduce the problem?

Here is an example:
https://chromegw.corp.google.com/i/chromeos/builders/arm64-llvm-next-toolchain/builds/387/steps/BuildStart/logs/stdio
and
https://chromegw.corp.google.com/i/chromeos/builders/amd64-llvm-next-toolchain/builds/388/steps/BuildStart/logs/stdio

All the toolchain builders failed with that this morning.
https://chromegw.corp.google.com/i/chromeos/waterfall?builder=master-toolchain&builder=amd64-llvm-next-toolchain&builder=arm-llvm-next-toolchain&builder=arm64-llvm-next-toolchain&titles=off&reload=30

I see the error message refers to this issue: crbug.com/679974
And I can see that the previous iteration of the builders was "interrupted" (purple color), so maybe I should be ignoring this error.
But it does not sound right to ignore it. Yesterday's build was "interrupted" and today's build failed because of this error. I don't think this should be expected behavior; it is just another day of testing that was not done. So does every interrupt on one day mean a failure on the next day? Assigning to the Sheriff (akeshet) for clarification.
I believe there was a waterfall restart this morning. That may have caused buildbot to "forget" about one of its previous ongoing builds, which could have caused this issue. If this happens again on the next build, it warrants further investigation. Otherwise I believe it should resolve on its own.
We can close this now; it was a corner case.
This issue occurred again. All paladins failed or stopped with an exception. The previous master-paladin build failed to clean up (https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/16679), which may be the reason. Here is an example of a failed build: https://uberchromegw.corp.google.com/i/chromeos/builders/beaglebone-paladin/builds/15071 I will keep an eye on it to see whether it is a flaky failure or not.
Re #5, yes it's a flaky failure.
This happens (rarely) when buildbot forgets about a previous build and re-uses its buildbot #.