Chromium Perf FYI console is down |
||||||
Issue descriptionSteps to repro: navigate to https://uberchromegw.corp.google.com/i/chromium.perf.fyi/console Screenshot:
,
Jul 15 2016
,
Jul 15 2016
error log:
File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py", line 542, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/status/web/console.py", line 810, in got_changes
revisions)
File "/home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/status/web/console.py", line 445, in getAllBuildsForRevision
revisions)
File "/home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/status/web/console.py", line 370, in getBuildsForRevision
build, revision)
File "/home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/status/web/console.py", line 389, in getChangeForBuild
changes.sort(key=self.comparator.getSortingKey())
File "/home/chrome-bot/buildbot/build/scripts/master/gitiles_poller.py", line 121, in <lambda>
return lambda c: self.sha1_lookup.__getitem__(c.revision)
exceptions.KeyError: u'f449a7f4828a5610b1e21b8d2e378555fa59ca7b'
,
Jul 15 2016
this is an interesting bug to analyze: gitiles_poller's storage of revision order is "changes" postgres table https://cs.chromium.org/chromium/build/scripts/master/gitiles_poller.py?q=gitiles_poller.py&sq=package:chromium&l=89 buildbot cleans up that table periodically on restart https://cs.chromium.org/chromium/build/third_party/buildbot_8_4p1/buildbot/db/changes.py?q=prunechanges+file:%5Ebuild/third_party/buildbot_8_4p1/buildbot/&sq=package:chromium&l=264 , which happened few times recently the "changeHorizon" value for master.chromium.perf.fyi seems to be 1000 https://cs.chromium.org/chromium/build/scripts/master/master_utils.py?q=buildHorizo+file:%5Ebuild/&sq=package:chromium&l=435 but current # of rows in the changes table is 3106. On other masters the value is 3000, so either I cannot find where did does it configure 3000 (more likely), or 2106 rows were created since last prune. out of 3000+ changes 2617 are created by scheduling buildbucket builds: chromiumperffyi=> select count(*) from changes where category = 'buildbucket'; count ------- 2617 (1 row) note that this correlates with the fact that perf team recently switched to buildbucket for scheduling builds There is a lot of changes for the same revision: chromiumperffyi=> select count(*) from (select distinct revision from changes where category = 'buildbucket') as distinct_revisions; count ------- 885 (1 row) This happens when a build scheduler does not specify a globally unique id for a change: see "If id and revision are specified, buildbot master will search for an existing buildbot change prior creating a new one." in https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/master/buildbucket/README.md#Build-parameters I think the best way to go is to solve the root cause, so I will think if I can avoid the unique change id parameter or change the way perf builders schedule changes
,
Jul 15 2016
ftr we didn't experience this before because buildbucket was used only on tryservers which don't use console view
,
Jul 15 2016
some of builds where pushed through trigger recipe module, not buildbucket module (not by auto_bisect): see step "trigger" in https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/11098 The trigger module also uses buildbucket behind the scenes, but copies change of the triggering buidlbot build to the buildbucket build being triggered. If that happens across masters (the case), there is no way to get a globally unique change id. A simpler solution might be as follows: if a master is not a tryserver (the case), treat revision url (revlink column) as a unique identifier of the change. This will prevent creation of duplicate rows in changes table this relies on the fact that revlink is present: chromiumperffyi=> select count(*) from changes where revlink is null or revlink = ''; count ------- 0 (1 row)
,
Jul 15 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build.git/+/5ff472dc7cd21a25d968e4a7d901000bee38304e commit 5ff472dc7cd21a25d968e4a7d901000bee38304e Author: nodir <nodir@chromium.org> Date: Fri Jul 15 19:04:10 2016 master.chromium.perf.fyi: order console by time gitiles revision comparator assumes that all rows in the changes table were created by it, which is not the case with buildbucket, so we are switching to a more generic comparator there is a possibility that it may cause some weird presentation of console view, in particular some rows in the console view may look like rev deadbeef rev badcoffee rev deadbeef because changes created by buidlbucket module have timestamp equal to the build creation time R=dnj@chromium.org BUG= 626019 Review-Url: https://codereview.chromium.org/2150093003 [modify] https://crrev.com/5ff472dc7cd21a25d968e4a7d901000bee38304e/masters/master.chromium.perf.fyi/master.cfg
,
Jul 15 2016
The following revision refers to this bug: https://chrome-internal.googlesource.com/infradata/master-manager.git/+/21b50cd0138c82a2f2988b88f3b39c1dc13a07c2 commit 21b50cd0138c82a2f2988b88f3b39c1dc13a07c2 Author: nodir <nodir@google.com> Date: Fri Jul 15 19:22:02 2016
,
Jul 15 2016
The following revision refers to this bug: https://chrome-internal.googlesource.com/infradata/master-manager.git/+/21b50cd0138c82a2f2988b88f3b39c1dc13a07c2 commit 21b50cd0138c82a2f2988b88f3b39c1dc13a07c2 Author: nodir <nodir@google.com> Date: Fri Jul 15 19:22:02 2016
,
Jul 15 2016
console works now, but now some changes are displayed more than once: fc28338d5f07 qyearsley@chromium.org 7bc4a4651fcd skia-deps-roller@chromium.org e0d4bda9d598 dtseng@chromium.org 4f4fe6bcbcbf thestig@chromium.org 37d40efe1477 dgrogan@chromium.org dc0e4f4f0b1b ryansturm@chromium.org 1da12030f6c1 juncai@chromium.org dc0e4f4f0b1b ryansturm@chromium.org eaae3bb1ac1b zmo@chromium.org f8d97238d977 jbudorick@chromium.org 5ccd88af4b33 tomhudson@google.com 7fc01f2b3c63 malaykeshav@chromium.org bd27a25399b2 ortuno@chromium.org 7fc01f2b3c63 malaykeshav@chromium.org bd27a25399b2 ortuno@chromium.org 7fc01f2b3c63 malaykeshav@chromium.org bd27a25399b2 ortuno@chromium.org 7fc01f2b3c63 malaykeshav@chromium.org bd27a25399b2 ortuno@chromium.org especially when switching to merged view on the bot. ----------- this also made me realize that buildbucket is being used on a master that merges build requests, but that's explicitly not supported by buildbucket: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/master/buildbucket/README.md#Limitations it is not supported because buildbucket allows scheduling arbitrary builds and it does not make sense to merge them (like in this case), as opposed waterfalls where order-of-builds=order-of-changes=order-of-commits
,
Jul 15 2016
here are particular analysis of build cr-buildbucket.appspot.com/b/9007148542888926368 triggered by https://uberchromegw.corp.google.com/i/chromium.perf/builders/Win%20x64%20Builder/builds/10916 it was picked up by master.chromium.perf.fyi: 2016-07-14 08:39:58-0700 [-] [buildbucket] Scheduling build 9007148542888926368 (Win 7 Intel GPU Perf (Xeon))... 2016-07-14 08:39:58-0700 [-] added buildset 21313 to database (build requests: {u'Win 7 Intel GPU Perf (Xeon)': 23988}) where 23988 is build request id (brid) chromiumperffyi=> select * from builds where brid=23988; id | number | brid | start_time | finish_time -------+--------+-------+------------+------------- 23095 | 1092 | 23988 | 1468522342 | 1468553645 (1 row) which means the associated build is https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/Win%207%20Intel%20GPU%20Perf%20(Xeon)/builds/1092 but it has different buildbucket build id: 9007136728446330032 this causes the following error in the master log: twistd.log.2-2016-07-15 08:43:45-0700 [-] [buildbucket] Sending heartbeats for 459 leases twistd.log.2-2016-07-15 08:43:51-0700 [-] [buildbucket] buildbucket response contains an error: "" (reason BUILD_IS_COMPLETED) twistd.log.2:2016-07-15 08:43:51-0700 [-] [buildbucket] Canceling build request for build "9007148542888926368" twistd.log.2-2016-07-15 08:43:51-0700 [-] Unhandled error in Deferred: twistd.log.2-2016-07-15 08:43:51-0700 [-] Unhandled Error twistd.log.2- Traceback (most recent call last): twistd.log.2- Failure: twisted.internet.defer.FirstError: FirstError[#5, [Failure instance: Traceback: <class 'buildbot.db.buildrequests.NotClaimedError'>: twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:388:errback twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:455:_startRunCallbacks twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:542:_runCallbacks twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:1076:gotResult twistd.log.2- --- <exception caught here> --- twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:1018:_inlineCallbacks twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/failure.py:349:throwExceptionIntoGenerator twistd.log.2- /home/chrome-bot/buildbot/build/scripts/master/buildbucket/integration.py:495:send twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:1018:_inlineCallbacks twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/failure.py:349:throwExceptionIntoGenerator twistd.log.2- /home/chrome-bot/buildbot/build/scripts/master/buildbucket/buildbot_gateway.py:50:cancel twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:866:_deferGenerator twistd.log.2- /home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/process/buildrequest.py:175:cancelBuildRequest twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:845:getResult twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/threadpool.py:207:_worker twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/context.py:59:callWithContext twistd.log.2- /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/context.py:37:callWithContext twistd.log.2- /home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/db/pool.py:112:thd twistd.log.2- /home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/db/buildrequests.py:350:thd twistd.log.2- ]]
,
Jul 15 2016
I don't understand why builds had to be triggered from master.chromium.perf to master.chromium.perf.fyi if the poll the same source (chromium/src.git). This triggers all these problems. dtu, do you know anything about this?
,
Jul 15 2016
They trigger cross-master so we don't need to maintain a set of perf builders on chromium.perf.fyi.
,
Jul 15 2016
I've discovered https://cs.chromium.org/chromium/build/masters/master.chromium.perf.fyi/master.cfg?l=91 which seems to mean that "Win Clang Builder" is the only builder that got scheduled on changes polled from gitiles which means console view on this master is not very useful in general, and charliea@ filed this bug only because he noticed it is broken, not because it is important that it works. charliea@, is this correct? Maybe we can just disable console view on this master and make waterfall landing page?
,
Jul 15 2016
talked to dtu@ and charliea@: - master.chromium.perf triggers builds on master.chromium.perf.fyi because the latter does not have builders, only testers - we need build merging on fyi because it does not have enough resources to run tests on each build. Some builders on master.chromium.perf run builds on each change
,
Jul 16 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build.git/+/26b2357c57a92ccf59fca1304785b796bab105fa commit 26b2357c57a92ccf59fca1304785b796bab105fa Author: nodir <nodir@chromium.org> Date: Sat Jul 16 00:45:30 2016 buildbucket: support unique change urls historically buildbucket was used for tryservers where changes are CLs and a unique change id must be provided to correctly lookup a row in the buildbot changes table. If unique id is not specified, a row in changes table is added each time. master.chromium.perf uses buildbucket differently. It polls chromium changes from gitiles, compiles chromium and triggers testers on fyi. The changes in the triggered builds are same valid chromium commits polled from gitiles, as opposed to CLs in tryserver. In this scenario a commit URL identifies a change, so use it to lookup a row in the changes table It also fixes changestore tests R=dnj@chromium.org BUG= 626019 Review-Url: https://codereview.chromium.org/2157703002 [modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/masters/master.chromium.perf.fyi/master_site_config.py [modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/buildbucket/__init__.py [modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/buildbucket/buildbot_gateway.py [modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/buildbucket/changestore.py [modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/buildbucket/integration.py [modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/buildbucket/unittests/changestore_test.py [modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/buildbucket/unittests/integration_test.py [modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/master_utils.py
,
Jul 16 2016
The following revision refers to this bug: https://chrome-internal.googlesource.com/infradata/master-manager.git/+/d722c2bece5df7f586e6d35c16afcdadc2f512a9 commit d722c2bece5df7f586e6d35c16afcdadc2f512a9 Author: nodir <nodir@google.com> Date: Sat Jul 16 00:50:49 2016
,
Jul 16 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build.git/+/725c854816b7177d0e25c4840e708f47ac6775cc commit 725c854816b7177d0e25c4840e708f47ac6775cc Author: nodir <nodir@chromium.org> Date: Sat Jul 16 00:57:50 2016 Revert of master.chromium.perf.fyi: order console by time (patchset #1 id:1 of https://codereview.chromium.org/2150093003/ ) Reason for revert: reverting in favor of https://codereview.chromium.org/2157703002 Original issue's description: > master.chromium.perf.fyi: order console by time > > gitiles revision comparator assumes that all rows in the changes table > were created by it, which is not the case with buildbucket, so we are > switching to a more generic comparator > > there is a possibility that it may cause some weird presentation of > console view, in particular some rows in the console view may look like > > rev deadbeef > rev badcoffee > rev deadbeef > > because changes created by buidlbucket module have timestamp equal > to the build creation time > > R=dnj@chromium.org > BUG= 626019 > > Committed: https://chromium.googlesource.com/chromium/tools/build/+/5ff472dc7cd21a25d968e4a7d901000bee38304e TBR=dnj@chromium.org,dnj@google.com # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 626019 Review-Url: https://codereview.chromium.org/2154043002 [modify] https://crrev.com/725c854816b7177d0e25c4840e708f47ac6775cc/masters/master.chromium.perf.fyi/master.cfg
,
Jul 16 2016
The CL in #16 worked as intended: I've manually triggered https://cr-buildbucket.appspot.com/b/9007018974782666992 using buildbucket API. It caused a buildbot build request on It got properly scheduled on https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/Mac%20Test%20Retina%20Perf builder, but a new row was not added to the changes table the console view will not look weird from now on, it won't break (because buildbucket won't create a lot of duplicate rows)
,
Jul 18 2016
Re build request merging: current situation is "fine" and I've documented it in https://codereview.chromium.org/2158963002 |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by charliea@chromium.org
, Jul 15 2016Labels: Infra-Troopers