Project: chromium Issues People Development process History Sign in
New issue
Advanced search Search tips
Issue 160709 Try Server <-> Rietveld integration is flaky and is dropping jobs
Starred by 11 users Project Member Reported by mar...@chromium.org, Nov 13, 2012 Back to list
Status: Fixed
Owner: serg...@chromium.org
Closed: Sep 2014
Cc: serg...@chromium.org, phajdan.jr@chromium.org, cmp@chromium.org, sergeybe...@chromium.org
Components:
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 153761


Sign in to add a comment
What steps will reproduce the problem?
1. Trigger a try job with Rietveld, either with git cl try, with the Web UI or from the CQ.

What is the expected result?
The Try Server polls Rietveld every 10 seconds and triggers a try job corresponding to the request.

What happens instead of that?
The Try Server ignores the trigger and the "try job box" on Rietveld stays gray. Also the Try Server keeps a query cursor so it never hear about this request again. Because of the presence of the request, it's impossible to trigger a second try job for the same builder, so the CQ is blocked by that.


Details
https://code.google.com/p/rietveld/source/browse/codereview/views_chromium.py?name=chromium#740
saves the query cursor so that the next query for pending try jobs ignore the ones requested before, so a try job is not triggered twice.

https://code.google.com/p/rietveld/source/browse/codereview/views_chromium.py?name=chromium#460
ignored try job requests for a try builder already pending. Problably just disabling this code would be sufficient in fact.

http://git.chromium.org/gitweb/?p=chromium/tools/build.git;a=blob;f=scripts/master/try_job_rietveld.py;h=11b5d5b4c8dc6ac9fe69659f0459b07205054523;hb=HEAD#l149
reuses the cursor received by the last request to /get_pending_try_patchsets, so if processing failed for example because of bug 153761, the try builder can't be triggered anymore.
 
Labels: TaskForce-GreenTree
Comment 2 by cmp@chromium.org, Mar 6, 2013
Cc: -nsylvain@chromium.org
Project Member Comment 3 by bugdroid1@chromium.org, Mar 10, 2013
Labels: -Area-Build Build
Issue 355108 has been merged into this issue.
See also issue 357677.
Labels: Infra
Issue 357854 has been merged into this issue.
Cc: sergeybe...@chromium.org cmp@chromium.org phajdan.jr@chromium.org
Issue 362815 has been merged into this issue.
Comment 9 by pgervais@chromium.org, May 12, 2014
Issue 371831 has been merged into this issue.
Cc: serg...@chromium.org
Owner: serg...@chromium.org
Status: Started
sergiyb@ is working on a new Rietveld poller for the try server. 

From discussion with jrobbins@ we have determined that is is blocking Rietveld's transition to NDB.

For simplicity, the new poller can fetch only the first page of pending jobs with limit=1000, and not use the cursor at all. Together with local cache and deduplication logic, this is enough to guarantee that all the jobs will eventually be scheduled without duplicates.

The only exception is if the master is restarted while still having pending jobs. Then pending jobs may be scheduled twice. An interim solution could be to flush all pending queues from the master before restart, and rely on the new poller to re-queue them from Rietveld.
FWIW I don't think we can deploy something that requires manual flushing of queues on master restart. Please save state to a file that allows us to avoid this.
Comment 12 by jrobb...@google.com, May 20, 2014
On the Rietveld side, I committed a change to that should prevent under-filling of pagination pages in get_pending_try_patchsets.


Regarding flushing old jobs on restart, some duplication on certain occasions is not too bad.  In fact, having 2x the number of all jobs sometimes probably would not be too bad as long as they clear out over time.  The thing to really avoid is unbounded duplication of jobs.
The best way, in my opinion, is to query buildbot from Rietveld poller on the set of pending builds and populate the cache from that. I'm pretty sure there is an API for that, hopefully documented on buildbot.net.

I'd argue we shouldn't be keeping another copy of the same state on disk, since it may go out of sync with the master itself, and will not sync up even after master restart. The master already remembers the state in its persistent DB, so let's just reuse it.

In particular, it is the same problem with saving the cursor right now. If the cursor happens to be bad (as it was last Friday), master restart would require knowing about the cursor (nobody did except me), and remembering to delete the file before restart to reset it.
Project Member Comment 14 by bugdroid1@chromium.org, May 30, 2014
------------------------------------------------------------------
r273804 | sergiyb@chromium.org | 2014-05-30T11:51:52.621174Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/masters/master.tryserver.v8/master.cfg?r1=273804&r2=273803&pathrev=273804

Fixed config for the tryserver.v8

BUG= 160709 
R=phajdan.jr@chromium.org

Review URL: https://codereview.chromium.org/303393002
-----------------------------------------------------------------
Project Member Comment 16 by bugdroid1@chromium.org, Jun 2, 2014
------------------------------------------------------------------
r274199 | sergiyb@chromium.org | 2014-06-02T10:18:18.862717Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/masters/master.tryserver.chromium/master.cfg?r1=274199&r2=274198&pathrev=274199
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/masters/master.tryserver.chromium.gpu/master.cfg?r1=274199&r2=274198&pathrev=274199
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/masters/master.tryserver.nacl/master.cfg?r1=274199&r2=274198&pathrev=274199
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/masters/master.tryserver.blink/master.cfg?r1=274199&r2=274198&pathrev=274199

Revert of Enabled new RietveldPollerWithCache on remaining masters (https://codereview.chromium.org/308733006/)

Reason for revert:
New poller doesn't initialize well in the production.

Original issue's description:
> Enabled new RietveldPollerWithCache on remaining masters
> 
> BUG= 160709 
> R=phajdan.jr@chromium.org
> 
> Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=274190

TBR=phajdan.jr@chromium.org
NOTREECHECKS=true
NOTRY=true
BUG= 160709 

Review URL: https://codereview.chromium.org/309763004
-----------------------------------------------------------------
Project Member Comment 17 by bugdroid1@chromium.org, Jun 2, 2014
------------------------------------------------------------------
r274227 | sergiyb@chromium.org | 2014-06-02T13:53:48.278140Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/master/try_job_rietveld.py?r1=274227&r2=274226&pathrev=274227

Added more logging

BUG= 160709 
R=phajdan.jr@chromium.org

Review URL: https://codereview.chromium.org/309013004
-----------------------------------------------------------------
Project Member Comment 18 by bugdroid1@chromium.org, Jun 2, 2014
------------------------------------------------------------------
r274229 | sergiyb@chromium.org | 2014-06-02T14:02:10.360032Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/masters/master.tryserver.chromium/master.cfg?r1=274229&r2=274228&pathrev=274229

Enabled new RietveldPollerWithCache on tryserver.chromium

R=phajdan.jr@chromium.org
BUG= 160709 

Review URL: https://codereview.chromium.org/305343003
-----------------------------------------------------------------
Project Member Comment 19 by bugdroid1@chromium.org, Jun 2, 2014
------------------------------------------------------------------
r274230 | sergiyb@chromium.org | 2014-06-02T14:21:43.929443Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/master/try_job_rietveld.py?r1=274230&r2=274229&pathrev=274230

Added number of builddicts and buildsets to the logging

BUG= 160709 
R=phajdan.jr@chromium.org

Review URL: https://codereview.chromium.org/310593005
-----------------------------------------------------------------
Project Member Comment 20 by bugdroid1@chromium.org, Jun 2, 2014
------------------------------------------------------------------
r274234 | sergiyb@chromium.org | 2014-06-02T14:32:57.654011Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/masters/master.tryserver.chromium/master.cfg?r1=274234&r2=274233&pathrev=274234

Revert of Enabled new RietveldPollerWithCache on tryserver.chromium (https://codereview.chromium.org/305343003/)

Reason for revert:
Change was done to find out what takes so long when initializing new poller in production. Reverting as planned.

Original issue's description:
> Enabled new RietveldPollerWithCache on tryserver.chromium
> 
> R=phajdan.jr@chromium.org
> BUG= 160709 
> 
> Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=274229

TBR=phajdan.jr@chromium.org
NOTREECHECKS=true
NOTRY=true
BUG= 160709 

Review URL: https://codereview.chromium.org/308123009
-----------------------------------------------------------------
Project Member Comment 21 by bugdroid1@chromium.org, Jun 3, 2014
------------------------------------------------------------------
r274427 | sergiyb@chromium.org | 2014-06-03T08:08:40.545511Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/third_party/buildbot_8_4p1/buildbot/test/unit/test_db_buildsets.py?r1=274427&r2=274426&pathrev=274427
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/third_party/buildbot_8_4p1/buildbot/db/buildsets.py?r1=274427&r2=274426&pathrev=274427
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/third_party/buildbot_8_4p1/README.chromium?r1=274427&r2=274426&pathrev=274427

Backported getRecentBuildsets from buildbot 0.8.8 (to use it in RietveldPollerWithCache)

Based on https://github.com/buildbot/buildbot/commit/1ee6d421be2ea814c11757263eb43152f8c3928e#diff-1

R=sergeyberezin@chromium.org, stip@chromium.org, phajdan.jr@chromium.org
BUG= 160709 

Review URL: https://codereview.chromium.org/302283005
-----------------------------------------------------------------
Project Member Comment 22 by bugdroid1@chromium.org, Jun 3, 2014
------------------------------------------------------------------
r274428 | sergiyb@chromium.org | 2014-06-03T08:24:48.906944Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/master/try_job_rietveld.py?r1=274428&r2=274427&pathrev=274428
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/master/unittests/try_job_rietveld_test.py?r1=274428&r2=274427&pathrev=274428

Limited number of buildsets that can be used to initialize RietveldPollerWithCache's cache

Depends on https://codereview.chromium.org/302283005/

BUG= 160709 
R=phajdan.jr@chromium.org

Review URL: https://codereview.chromium.org/305423002
-----------------------------------------------------------------
Project Member Comment 23 by bugdroid1@chromium.org, Jun 3, 2014
------------------------------------------------------------------
r274429 | sergiyb@chromium.org | 2014-06-03T08:28:16.809536Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/masters/master.tryserver.chromium/master.cfg?r1=274429&r2=274428&pathrev=274429

Enabled new RietveldPollerWithCache on tryserver.chromium

BUG= 160709 
R=phajdan.jr@chromium.org

Review URL: https://codereview.chromium.org/314663002
-----------------------------------------------------------------
Project Member Comment 24 by bugdroid1@chromium.org, Jun 3, 2014
------------------------------------------------------------------
r274555 | sergiyb@chromium.org | 2014-06-03T17:09:34.221220Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/masters/master.tryserver.blink/master.cfg?r1=274555&r2=274554&pathrev=274555
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/masters/master.tryserver.chromium.gpu/master.cfg?r1=274555&r2=274554&pathrev=274555
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/masters/master.tryserver.nacl/master.cfg?r1=274555&r2=274554&pathrev=274555

Enabled RietveldPollerWithCache on remaining tryserver masters

BUG= 160709 
R=phajdan.jr@chromium.org

Review URL: https://codereview.chromium.org/315673003
-----------------------------------------------------------------
Blockedon: chromium:380370
Project Member Comment 27 by bugdroid1@chromium.org, Jun 17, 2014
------------------------------------------------------------------
r277906 | luqui@chromium.org | 2014-06-17T23:49:14.023175Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/master/try_job_rietveld.py?r1=277906&r2=277905&pathrev=277906

Re-add _RietveldPoller to try_job_rietveld module

Removed by https://codereview.chromium.org/330283006, which broke presubmit

BUG= 160709 

Review URL: https://codereview.chromium.org/339183005
-----------------------------------------------------------------
Project Member Comment 28 by bugdroid1@chromium.org, Jun 20, 2014
------------------------------------------------------------------
r278819 | sergiyb@chromium.org | 2014-06-20T21:06:21.657521Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/master/try_job_rietveld.py?r1=278819&r2=278818&pathrev=278819

Factored out calls to client.getPage to factor out common constants (timeout and buildbot name) and to allow easy override in subclasses

BUG= 160709 
R=stip@google.com, phajdan@google.com, pgervais@google.com

Review URL: https://codereview.chromium.org/334013004
-----------------------------------------------------------------
Project Member Comment 29 by bugdroid1@chromium.org, Jul 1, 2014
The following revision refers to this bug:
  http://goto.ext.google.com/viewvc/chrome-internal?view=rev&revision=56300

------------------------------------------------------------------
r56300 | sergiyb@google.com | 2014-07-01T17:24:58.223135Z

-----------------------------------------------------------------
Project Member Comment 30 by bugdroid1@chromium.org, Jul 1, 2014
The following revision refers to this bug:
  http://goto.ext.google.com/viewvc/chrome-internal?view=rev&revision=56322

------------------------------------------------------------------
r56322 | luqui@google.com | 2014-07-01T23:52:50.193865Z

-----------------------------------------------------------------
Project Member Comment 31 by bugdroid1@chromium.org, Jul 2, 2014
The following revision refers to this bug:
  http://goto.ext.google.com/viewvc/chrome-internal?view=rev&revision=56330

------------------------------------------------------------------
r56330 | luqui@google.com | 2014-07-02T00:26:42.255801Z

-----------------------------------------------------------------
Project Member Comment 32 by bugdroid1@chromium.org, Jul 17, 2014
------------------------------------------------------------------
r283797 | sergiyb@chromium.org | 2014-07-17T16:39:16.622264Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/tools/build/scripts/master/try_job_rietveld.py?r1=283797&r2=283796&pathrev=283797

Removed old poller as it is not used anymore

BUG= 160709 
R=sergeyberezin@chromium.org, agable@chromium.org

Review URL: https://codereview.chromium.org/395183002
-----------------------------------------------------------------
Looks like this can be closed now, right?
Status: Fixed
Sign in to add a comment