New issue
Advanced search Search tips

Issue 889005 link

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 886575
Owner:
Closed: Oct 2
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 886575



Sign in to add a comment

Gatekeeper sometimes misses to close the tree

Project Member Reported by machenb...@chromium.org, Sep 25

Issue description

E.g.:
https://ci.chromium.org/p/v8/builders/luci.v8.ci/V8%20Linux%20-%20presubmit/27673

Should have been closed by:
https://ci.chromium.org/buildbot/chromium.gatekeeper/Chromium%20Gatekeeper/1277379
(+/- one build)

But was not closed.

The step should be included in the config:
https://cs.chromium.org/chromium/build/scripts/slave/v8_gatekeeper/tree_closers.json?l=117

Maybe milo doesn't give gatekeeper a list of all builders, such that the * isn't resolved right?
 
Components: Infra>Client>V8 Infra>Sheriffing>Gatekeeper
Investigated a bit using http://shortn/_KRf2WlbL3e and the line from the log:
INFO:root:fetching https://luci-milo.appspot.com/prpc/milo.Buildbot/GetCompressedMasterJSON with {"exclude_deprecated": true, "name": "client.v8"}

The result includes the builder for which the tree wasn't closed:
V8 Linux - presubmit

But the log above doesn't seem to fetch builds for the presubmit bot. Bot sure yet why...
*Not
Now gatekeeper did not close for:
https://ci.chromium.org/p/v8/builders/luci.v8.ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20MSAN/23024

Should hvae been closed by:
https://ci.chromium.org/buildbot/chromium.gatekeeper/Chromium%20Gatekeeper/1277428
(+/- 1 build)
Also here, the msan builder seems not to be in the logs. I'd expected a line for it.
Labels: -Pri-2 Pri-1
Higher prio, since this misses several important tree closures each day.
Analysis update:
Gatekeeper caches the last seen builds in db. The call to GetBuildbotBuildJSON is only made for ongoing builds. So it's expected to not see it in the logs all the time for all bots.
Cc: zhangtiff@chromium.org seanmccullough@chromium.org
CCing gatekeeper owners.
I think to go on with this we should make gatekeeper a bit more verbose. We should print which builds milo reports as ongoing in the GetCompressedMasterJSON. Maybe there are some builds missing already there.

In case the builds are there, then there'd be a bug in gatekeepers logic to check if the last known build is in local DB or if there's an ongoing build that needs fetching via GetBuildbotBuildJSON.
Was this v8 builder recently migrated to LUCI?

Yes, all V8 builders migrated to LUCI. For most it seems to work. But some are strangely ignored at times.
Project Member

Comment 11 by bugdroid1@chromium.org, Sep 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/e8726a8591280dce0dcfe913db02aeab08f08e34

commit e8726a8591280dce0dcfe913db02aeab08f08e34
Author: Michael Achenbach <machenbach@chromium.org>
Date: Thu Sep 27 09:37:04 2018

[Gatekeeper] Add more logging to debug

Bug:  889005 
Change-Id: I9d1993ce952beac20492b306ce6c60819d68ba7f
Reviewed-on: https://chromium-review.googlesource.com/1248602
Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org>
Commit-Queue: Michael Achenbach <machenbach@chromium.org>

[modify] https://crrev.com/e8726a8591280dce0dcfe913db02aeab08f08e34/scripts/slave/build_scan.py

The new debug output is not yet conclusive. Here https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gatekeeper%2FChromium_Gatekeeper%2F1277936%2F%2B%2Frecipes%2Fsteps%2Fgatekeeper%3A_v8-tree-closers%2F0%2Fstdout the presubmit builder has a new build (27729).

But at the time that I checked it already had 2 newer builds, which don't show up in the subsequent gatekeeper runs. E.g. https://ci.chromium.org/p/v8/builders/luci.v8.ci/V8%20Linux%20-%20presubmit/27731 finished at 2018-09-27 12:28 PM (CEST) and should show up latest in https://ci.chromium.org/buildbot/chromium.gatekeeper/Chromium%20Gatekeeper/1277943 which ran at 2018-09-27 12:44 PM (CEST). But in the gatekeeper runs 1277937 until 1277943 it is not there.

We probably need more debug output confirming the raw data coming from milo. Most likely it is sometimes incomplete or so.
Project Member

Comment 13 by bugdroid1@chromium.org, Sep 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/bf164e8f81308d5417281afc60a0278baf8e8561

commit bf164e8f81308d5417281afc60a0278baf8e8561
Author: Michael Achenbach <machenbach@chromium.org>
Date: Thu Sep 27 12:11:08 2018

[Gatekeeper] Add more debug logging

Bug:  889005 
Change-Id: Id9264eed6f89f2eba60255b333d6cff2b07ae414
Reviewed-on: https://chromium-review.googlesource.com/1249025
Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org>
Commit-Queue: Michael Achenbach <machenbach@chromium.org>

[modify] https://crrev.com/bf164e8f81308d5417281afc60a0278baf8e8561/scripts/slave/build_scan.py

Found an inconsistency that looks like a gatekeeper bug indeed. Compare the following subsequent runs:

https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gatekeeper%2FChromium_Gatekeeper%2F1277970%2F%2B%2Frecipes%2Fsteps%2Fgatekeeper%3A_v8-tree-closers%2F0%2Fstdout
https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gatekeeper%2FChromium_Gatekeeper%2F1277971%2F%2B%2Frecipes%2Fsteps%2Fgatekeeper%3A_v8-tree-closers%2F0%2Fstdout

Lets look at the log output for the presubmit builder in both runs:
Run 1:
INFO:root:builder: V8 Linux - presubmit, current builds: [27734], cached builds: [27733, ...]
INFO:root:builder: V8 Linux - presubmit, new builds: [27734]
INFO:root:fetching https://luci-milo.appspot.com/prpc/milo.Buildbot/GetBuildbotBuildJSON with {"build_num": 27734, "exclude_deprecated": true, "builder": "V8 Linux - presubmit", "master": "client.v8"}

Run 2:
INFO:root:builder: V8 Linux - presubmit, current builds: [], cached builds: [27735, 27734, ...]
INFO:root:builder: V8 Linux - presubmit, new builds: []

The data from milo is consistent. Between run 1 and run 2 the presubmit run 27734 finished and also 27735 started and finished.

In the first run, gatekeeper correctly fetches data from milo up to build 27734. In the second run it should fetch data for 27735 (including 27734 which was previously still ongoing), but it doesn't.

---

Guesses:
1. There's a simple logic bug in the find_new_builds method.
2. The DB gets corrupted between runs, e.g. another gatekeeper instance running? Or the fields in the DB being overwritten with data from the other gatekeeper trees?
Project Member

Comment 15 by bugdroid1@chromium.org, Sep 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/41e6509b8fce221ffe975d05ac874c82074cf152

commit 41e6509b8fce221ffe975d05ac874c82074cf152
Author: Michael Achenbach <machenbach@chromium.org>
Date: Thu Sep 27 14:01:09 2018

[Gatekeeper] Add temporary debug output

TBR=sergiyb@chromium.org

Bug:  889005 
Change-Id: I5529fcb8397290dd1e85172ce0312ee82dfabbe9
Reviewed-on: https://chromium-review.googlesource.com/1249027
Reviewed-by: Michael Achenbach <machenbach@chromium.org>
Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org>
Commit-Queue: Michael Achenbach <machenbach@chromium.org>

[modify] https://crrev.com/41e6509b8fce221ffe975d05ac874c82074cf152/scripts/slave/build_scan.py

Analysis comparing two consecutive runs:
https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gatekeeper%2FChromium_Gatekeeper%2F1278228%2F%2B%2Frecipes%2Fsteps%2Fgatekeeper%3A_v8-tree-closers%2F0%2Fstdout
and
https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gatekeeper%2FChromium_Gatekeeper%2F1278229%2F%2B%2Frecipes%2Fsteps%2Fgatekeeper%3A_v8-tree-closers%2F0%2Fstdout

1. run
INFO:root:Build db state for V8 Linux - presubmit: {27752: BuildDBBuild(finished=True, succeeded=False, triggered={})}
INFO:root:builder: V8 Linux - presubmit, current builds: [], cached builds: [27752, ...]
INFO:root:builder: V8 Linux - presubmit, new builds: []

2. run
INFO:root:Build db state for V8 Linux - presubmit: {27753: BuildDBBuild(finished=True, succeeded=False, triggered={})}
INFO:root:builder: V8 Linux - presubmit, current builds: [], cached builds: [27753, ...]
INFO:root:builder: V8 Linux - presubmit, new builds: []

In the second run the build 27753 appears in the DB without ever been added as new build. Because of this build 27753 is omitted by gatekeeper.

Maybe there's another logic in gatekeeper somewhere that writes to the db?
Project Member

Comment 17 by bugdroid1@chromium.org, Sep 28

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/ae7380bc5c6546126f680d9664e02c9fe7a45fcf

commit ae7380bc5c6546126f680d9664e02c9fe7a45fcf
Author: Michael Achenbach <machenbach@chromium.org>
Date: Fri Sep 28 12:36:14 2018

[Gatekeeper] Print build-db before and after each run for debugging

Bug:  889005 
Change-Id: I14668b0afe47cb66a43e57fe839c388a19b924b8
Reviewed-on: https://chromium-review.googlesource.com/1251501
Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org>
Commit-Queue: Michael Achenbach <machenbach@chromium.org>

[modify] https://crrev.com/ae7380bc5c6546126f680d9664e02c9fe7a45fcf/scripts/slave/recipe_modules/gatekeeper/api.py

Blockedon: 886575
Owner: martiniss@chromium.org
Status: Assigned (was: Untriaged)
The last debug output makes things clear. Looks like the fix from  issue 886575  is still not good enough.

I added steps printing build-db before and after a gatekeeper run if it exists. It only prints the after-version. Hence, build-db doesn't exist and gatekeeper runs each time with an empty DB. Guess the cache doesn't cache it?

E.g.:
https://ci.chromium.org/buildbot/chromium.gatekeeper/Chromium%20Gatekeeper/1278260
http://shortn/_av1vNfPATM
I have the feeling the builder cache implementation in remote_run is broken. Maybe the builder should be migrated to luci (fast)? On luci it seems to work.
Mergedinto: 886575
Status: Duplicate (was: Assigned)
Project Member

Comment 21 by bugdroid1@chromium.org, Oct 5

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/03670d67c499cc5197ad133a53853d92a954aae1

commit 03670d67c499cc5197ad133a53853d92a954aae1
Author: Michael Achenbach <machenbach@chromium.org>
Date: Fri Oct 05 07:02:29 2018

Revert "[Gatekeeper] Add temporary debug output"

This reverts commit 41e6509b8fce221ffe975d05ac874c82074cf152.

Reason for revert: Bug was fixed.

Original change's description:
> [Gatekeeper] Add temporary debug output
> 
> TBR=sergiyb@chromium.org
> 
> Bug:  889005 
> Change-Id: I5529fcb8397290dd1e85172ce0312ee82dfabbe9
> Reviewed-on: https://chromium-review.googlesource.com/1249027
> Reviewed-by: Michael Achenbach <machenbach@chromium.org>
> Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org>
> Commit-Queue: Michael Achenbach <machenbach@chromium.org>

TBR=machenbach@chromium.org,sergiyb@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug:  889005 
Change-Id: Ide04899efcbeb673c25455cf1dd410f4d1f51f63
Reviewed-on: https://chromium-review.googlesource.com/c/1263977
Reviewed-by: Michael Achenbach <machenbach@chromium.org>
Commit-Queue: Michael Achenbach <machenbach@chromium.org>

[modify] https://crrev.com/03670d67c499cc5197ad133a53853d92a954aae1/scripts/slave/build_scan.py

Project Member

Comment 22 by bugdroid1@chromium.org, Oct 5

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/518e3bda744bcea6618d770b8f73448440d4e77d

commit 518e3bda744bcea6618d770b8f73448440d4e77d
Author: Michael Achenbach <machenbach@chromium.org>
Date: Fri Oct 05 07:37:22 2018

[Gatekeeper] Update comment after bugfix

Bug:  889005 
Change-Id: I452fb16b2da604b3ad8f7ebfde10e7c4761a3d6d
Reviewed-on: https://chromium-review.googlesource.com/c/1263978
Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org>
Commit-Queue: Michael Achenbach <machenbach@chromium.org>

[modify] https://crrev.com/518e3bda744bcea6618d770b8f73448440d4e77d/scripts/slave/recipe_modules/gatekeeper/api.py

Sign in to add a comment