New issue
Advanced search Search tips

Issue 626019 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jul 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug



Sign in to add a comment

Chromium Perf FYI console is down

Project Member Reported by charliea@chromium.org, Jul 6 2016

Issue description

Steps to repro: navigate to https://uberchromegw.corp.google.com/i/chromium.perf.fyi/console

Screenshot: 
 
aziQnOHzBsY.png
35.3 KB View Download
Components: -Infra>Client Infra
Labels: Infra-Troopers
Infra trooper, would you mind help triaging this?

Comment 2 by no...@chromium.org, Jul 15 2016

Owner: no...@chromium.org
Status: Started (was: Untriaged)

Comment 3 by no...@chromium.org, Jul 15 2016

error log:

          File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py", line 542, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/status/web/console.py", line 810, in got_changes
            revisions)
          File "/home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/status/web/console.py", line 445, in getAllBuildsForRevision
            revisions)
          File "/home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/status/web/console.py", line 370, in getBuildsForRevision
            build, revision)
          File "/home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/status/web/console.py", line 389, in getChangeForBuild
            changes.sort(key=self.comparator.getSortingKey())
          File "/home/chrome-bot/buildbot/build/scripts/master/gitiles_poller.py", line 121, in <lambda>
            return lambda c: self.sha1_lookup.__getitem__(c.revision)
        exceptions.KeyError: u'f449a7f4828a5610b1e21b8d2e378555fa59ca7b'

Comment 4 by no...@chromium.org, Jul 15 2016

Cc: robert...@chromium.org
this is an interesting bug to analyze:

gitiles_poller's storage of revision order is "changes" postgres table https://cs.chromium.org/chromium/build/scripts/master/gitiles_poller.py?q=gitiles_poller.py&sq=package:chromium&l=89

buildbot cleans up that table periodically on restart
https://cs.chromium.org/chromium/build/third_party/buildbot_8_4p1/buildbot/db/changes.py?q=prunechanges+file:%5Ebuild/third_party/buildbot_8_4p1/buildbot/&sq=package:chromium&l=264
, which happened few times recently

the "changeHorizon" value for master.chromium.perf.fyi seems to be 1000
https://cs.chromium.org/chromium/build/scripts/master/master_utils.py?q=buildHorizo+file:%5Ebuild/&sq=package:chromium&l=435
but current # of rows in the changes table is 3106. On other masters the value is 3000, so either I cannot find where did does it configure 3000 (more likely), or 2106 rows were created since last prune.

out of 3000+ changes 2617 are created by scheduling buildbucket builds:
chromiumperffyi=> select count(*) from changes where category = 'buildbucket';
 count 
-------
  2617
(1 row)

note that this correlates with the fact that perf team recently switched to buildbucket for scheduling builds

There is a lot of changes for the same revision:

chromiumperffyi=> select count(*) from (select distinct revision from changes where category = 'buildbucket') as distinct_revisions;
 count 
-------
   885
(1 row)

This happens when a build scheduler does not specify a globally unique id for a change: see "If id and revision are specified, buildbot master will search for an existing buildbot change prior creating a new one." in https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/master/buildbucket/README.md#Build-parameters

I think the best way to go is to solve the root cause, so I will think if I can avoid the unique change id parameter or change the way perf builders schedule changes

Comment 5 by no...@chromium.org, Jul 15 2016

ftr we didn't experience this before because buildbucket was used only on tryservers which don't use console view

Comment 6 by no...@chromium.org, Jul 15 2016

some of builds where pushed through trigger recipe module, not buildbucket module (not by auto_bisect):
see step "trigger" in https://build.chromium.org/p/chromium.perf/builders/Win%20x64%20Builder/builds/11098

The trigger module also uses buildbucket behind the scenes, but copies change of the triggering buidlbot build to the buildbucket build being triggered. If that happens across masters (the case), there is no way to get a globally unique change id.

A simpler solution might be as follows:

if a master is not a tryserver (the case), treat revision url (revlink column) as a unique identifier of the change. This will prevent creation of duplicate rows in changes table

this relies on the fact that revlink is present:
chromiumperffyi=> select count(*) from changes where revlink is null or revlink = '';
 count 
-------
     0
(1 row)

Project Member

Comment 7 by bugdroid1@chromium.org, Jul 15 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/5ff472dc7cd21a25d968e4a7d901000bee38304e

commit 5ff472dc7cd21a25d968e4a7d901000bee38304e
Author: nodir <nodir@chromium.org>
Date: Fri Jul 15 19:04:10 2016

master.chromium.perf.fyi: order console by time

gitiles revision comparator assumes that all rows in the changes table
were created by it, which is not the case with buildbucket, so we are
switching to a more generic comparator

there is a possibility that it may cause some weird presentation of
console view, in particular some rows in the console view may look like

rev deadbeef
rev badcoffee
rev deadbeef

because changes created by buidlbucket module have timestamp equal
to the build creation time

R=dnj@chromium.org
BUG= 626019 

Review-Url: https://codereview.chromium.org/2150093003

[modify] https://crrev.com/5ff472dc7cd21a25d968e4a7d901000bee38304e/masters/master.chromium.perf.fyi/master.cfg

Project Member

Comment 8 by bugdroid1@chromium.org, Jul 15 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager.git/+/21b50cd0138c82a2f2988b88f3b39c1dc13a07c2

commit 21b50cd0138c82a2f2988b88f3b39c1dc13a07c2
Author: nodir <nodir@google.com>
Date: Fri Jul 15 19:22:02 2016

Project Member

Comment 9 by bugdroid1@chromium.org, Jul 15 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager.git/+/21b50cd0138c82a2f2988b88f3b39c1dc13a07c2

commit 21b50cd0138c82a2f2988b88f3b39c1dc13a07c2
Author: nodir <nodir@google.com>
Date: Fri Jul 15 19:22:02 2016

Comment 10 by no...@chromium.org, Jul 15 2016

console works now, but now some changes are displayed more than once:

fc28338d5f07	qyearsley@chromium.org	
7bc4a4651fcd	skia-deps-roller@chromium.org	
e0d4bda9d598	dtseng@chromium.org	
4f4fe6bcbcbf	thestig@chromium.org	
37d40efe1477	dgrogan@chromium.org	
dc0e4f4f0b1b	ryansturm@chromium.org	
1da12030f6c1	juncai@chromium.org	
dc0e4f4f0b1b	ryansturm@chromium.org	
eaae3bb1ac1b	zmo@chromium.org	
f8d97238d977	jbudorick@chromium.org	
5ccd88af4b33	tomhudson@google.com	
7fc01f2b3c63	malaykeshav@chromium.org	
bd27a25399b2	ortuno@chromium.org	
7fc01f2b3c63	malaykeshav@chromium.org	
bd27a25399b2	ortuno@chromium.org	
7fc01f2b3c63	malaykeshav@chromium.org	
bd27a25399b2	ortuno@chromium.org	
7fc01f2b3c63	malaykeshav@chromium.org	
bd27a25399b2	ortuno@chromium.org

especially when switching to merged view on the bot.


-----------

this also made me realize that buildbucket is being used on a master that merges build requests, but that's explicitly not supported by buildbucket:
https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/master/buildbucket/README.md#Limitations
it is not supported because buildbucket allows scheduling arbitrary builds and it does not make sense to merge them (like in this case), as opposed waterfalls where order-of-builds=order-of-changes=order-of-commits

Comment 11 by no...@chromium.org, Jul 15 2016

here are particular analysis of build cr-buildbucket.appspot.com/b/9007148542888926368 triggered by https://uberchromegw.corp.google.com/i/chromium.perf/builders/Win%20x64%20Builder/builds/10916

it was picked up by master.chromium.perf.fyi:
2016-07-14 08:39:58-0700 [-] [buildbucket] Scheduling build 9007148542888926368 (Win 7 Intel GPU Perf (Xeon))...
2016-07-14 08:39:58-0700 [-] added buildset 21313 to database (build requests: {u'Win 7 Intel GPU Perf (Xeon)': 23988})

where 23988 is build request id (brid)

chromiumperffyi=> select * from builds where brid=23988;
  id   | number | brid  | start_time | finish_time 
-------+--------+-------+------------+-------------
 23095 |   1092 | 23988 | 1468522342 |  1468553645
(1 row)

which means the associated build is https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/Win%207%20Intel%20GPU%20Perf%20(Xeon)/builds/1092

but it has different buildbucket build id: 9007136728446330032

this causes the following error in the master log:
twistd.log.2-2016-07-15 08:43:45-0700 [-] [buildbucket] Sending heartbeats for 459 leases
twistd.log.2-2016-07-15 08:43:51-0700 [-] [buildbucket] buildbucket response contains an error: "" (reason BUILD_IS_COMPLETED)
twistd.log.2:2016-07-15 08:43:51-0700 [-] [buildbucket] Canceling build request for build "9007148542888926368"
twistd.log.2-2016-07-15 08:43:51-0700 [-] Unhandled error in Deferred:
twistd.log.2-2016-07-15 08:43:51-0700 [-] Unhandled Error
twistd.log.2-   Traceback (most recent call last):
twistd.log.2-   Failure: twisted.internet.defer.FirstError: FirstError[#5, [Failure instance: Traceback: <class 'buildbot.db.buildrequests.NotClaimedError'>: 
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:388:errback
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:455:_startRunCallbacks
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:542:_runCallbacks
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:1076:gotResult
twistd.log.2-   --- <exception caught here> ---
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:1018:_inlineCallbacks
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/failure.py:349:throwExceptionIntoGenerator
twistd.log.2-   /home/chrome-bot/buildbot/build/scripts/master/buildbucket/integration.py:495:send
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:1018:_inlineCallbacks
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/failure.py:349:throwExceptionIntoGenerator
twistd.log.2-   /home/chrome-bot/buildbot/build/scripts/master/buildbucket/buildbot_gateway.py:50:cancel
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:866:_deferGenerator
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/process/buildrequest.py:175:cancelBuildRequest
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:845:getResult
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/threadpool.py:207:_worker
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/context.py:59:callWithContext
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/context.py:37:callWithContext
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/db/pool.py:112:thd
twistd.log.2-   /home/chrome-bot/buildbot/build/third_party/buildbot_8_4p1/buildbot/db/buildrequests.py:350:thd
twistd.log.2-   ]]

Comment 12 by no...@chromium.org, Jul 15 2016

Cc: dtu@chromium.org
I don't understand why builds had to be triggered from master.chromium.perf to master.chromium.perf.fyi if the poll the same source (chromium/src.git). This triggers all these problems.

dtu, do you know anything about this?

Comment 13 by dtu@chromium.org, Jul 15 2016

They trigger cross-master so we don't need to maintain a set of perf builders on chromium.perf.fyi.

Comment 14 by no...@chromium.org, Jul 15 2016

I've discovered https://cs.chromium.org/chromium/build/masters/master.chromium.perf.fyi/master.cfg?l=91
which seems to mean that "Win Clang Builder" is the only builder that got scheduled on changes polled from gitiles

which means console view on this master is not very useful in general, and charliea@ filed this bug only because he noticed it is broken, not because it is important that it works. charliea@, is this correct? Maybe we can just disable console view on this master and make waterfall landing page?

Comment 15 by no...@chromium.org, Jul 15 2016

talked to dtu@ and charliea@:

- master.chromium.perf triggers builds on master.chromium.perf.fyi because the latter does not have builders, only testers
- we need build merging on fyi because it does not have enough resources to run tests on each build. Some builders on master.chromium.perf run builds on each change


Project Member

Comment 16 by bugdroid1@chromium.org, Jul 16 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/26b2357c57a92ccf59fca1304785b796bab105fa

commit 26b2357c57a92ccf59fca1304785b796bab105fa
Author: nodir <nodir@chromium.org>
Date: Sat Jul 16 00:45:30 2016

buildbucket: support unique change urls

historically buildbucket was used for tryservers where changes are CLs
and a unique change id must be provided to correctly lookup a row in the
buildbot changes table. If unique id is not specified, a row in changes
table is added each time.

master.chromium.perf uses buildbucket differently. It polls chromium
changes from gitiles, compiles chromium and triggers testers on fyi. The
changes in the triggered builds are same valid chromium commits polled
from gitiles, as opposed to CLs in tryserver.

In this scenario a commit URL identifies a change, so use it to lookup a row in
the changes table

It also fixes changestore tests

R=dnj@chromium.org
BUG= 626019 

Review-Url: https://codereview.chromium.org/2157703002

[modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/masters/master.chromium.perf.fyi/master_site_config.py
[modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/buildbucket/__init__.py
[modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/buildbucket/buildbot_gateway.py
[modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/buildbucket/changestore.py
[modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/buildbucket/integration.py
[modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/buildbucket/unittests/changestore_test.py
[modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/buildbucket/unittests/integration_test.py
[modify] https://crrev.com/26b2357c57a92ccf59fca1304785b796bab105fa/scripts/master/master_utils.py

Project Member

Comment 17 by bugdroid1@chromium.org, Jul 16 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager.git/+/d722c2bece5df7f586e6d35c16afcdadc2f512a9

commit d722c2bece5df7f586e6d35c16afcdadc2f512a9
Author: nodir <nodir@google.com>
Date: Sat Jul 16 00:50:49 2016

Project Member

Comment 18 by bugdroid1@chromium.org, Jul 16 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/725c854816b7177d0e25c4840e708f47ac6775cc

commit 725c854816b7177d0e25c4840e708f47ac6775cc
Author: nodir <nodir@chromium.org>
Date: Sat Jul 16 00:57:50 2016

Revert of master.chromium.perf.fyi: order console by time (patchset #1 id:1 of https://codereview.chromium.org/2150093003/ )

Reason for revert:
reverting in favor of https://codereview.chromium.org/2157703002

Original issue's description:
> master.chromium.perf.fyi: order console by time
>
> gitiles revision comparator assumes that all rows in the changes table
> were created by it, which is not the case with buildbucket, so we are
> switching to a more generic comparator
>
> there is a possibility that it may cause some weird presentation of
> console view, in particular some rows in the console view may look like
>
> rev deadbeef
> rev badcoffee
> rev deadbeef
>
> because changes created by buidlbucket module have timestamp equal
> to the build creation time
>
> R=dnj@chromium.org
> BUG= 626019 
>
> Committed: https://chromium.googlesource.com/chromium/tools/build/+/5ff472dc7cd21a25d968e4a7d901000bee38304e

TBR=dnj@chromium.org,dnj@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 626019 

Review-Url: https://codereview.chromium.org/2154043002

[modify] https://crrev.com/725c854816b7177d0e25c4840e708f47ac6775cc/masters/master.chromium.perf.fyi/master.cfg

Comment 19 by no...@chromium.org, Jul 16 2016

Status: Fixed (was: Started)
The CL in #16 worked as intended:

I've manually triggered https://cr-buildbucket.appspot.com/b/9007018974782666992 using buildbucket API. It caused a buildbot build request on 

It got properly scheduled on https://uberchromegw.corp.google.com/i/chromium.perf.fyi/builders/Mac%20Test%20Retina%20Perf builder, but a new row was not added to the changes table

the console view will not look weird from now on, it won't break (because buildbucket won't create a lot of duplicate rows)

Comment 20 by no...@chromium.org, Jul 18 2016

Cc: -robert...@chromium.org
Re build request merging: current situation is "fine" and I've documented it in https://codereview.chromium.org/2158963002

Sign in to add a comment