Issue metadata
Sign in to add a comment
|
Gatekeeper sometimes misses to close the tree |
||||||||||||||||||||||||
Issue descriptionE.g.: https://ci.chromium.org/p/v8/builders/luci.v8.ci/V8%20Linux%20-%20presubmit/27673 Should have been closed by: https://ci.chromium.org/buildbot/chromium.gatekeeper/Chromium%20Gatekeeper/1277379 (+/- one build) But was not closed. The step should be included in the config: https://cs.chromium.org/chromium/build/scripts/slave/v8_gatekeeper/tree_closers.json?l=117 Maybe milo doesn't give gatekeeper a list of all builders, such that the * isn't resolved right?
,
Sep 25
Investigated a bit using http://shortn/_KRf2WlbL3e and the line from the log: INFO:root:fetching https://luci-milo.appspot.com/prpc/milo.Buildbot/GetCompressedMasterJSON with {"exclude_deprecated": true, "name": "client.v8"} The result includes the builder for which the tree wasn't closed: V8 Linux - presubmit But the log above doesn't seem to fetch builds for the presubmit bot. Bot sure yet why...
,
Sep 25
*Not
,
Sep 25
Now gatekeeper did not close for: https://ci.chromium.org/p/v8/builders/luci.v8.ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20MSAN/23024 Should hvae been closed by: https://ci.chromium.org/buildbot/chromium.gatekeeper/Chromium%20Gatekeeper/1277428 (+/- 1 build) Also here, the msan builder seems not to be in the logs. I'd expected a line for it.
,
Sep 25
Higher prio, since this misses several important tree closures each day.
,
Sep 26
Analysis update: Gatekeeper caches the last seen builds in db. The call to GetBuildbotBuildJSON is only made for ongoing builds. So it's expected to not see it in the logs all the time for all bots.
,
Sep 26
CCing gatekeeper owners.
,
Sep 26
I think to go on with this we should make gatekeeper a bit more verbose. We should print which builds milo reports as ongoing in the GetCompressedMasterJSON. Maybe there are some builds missing already there. In case the builds are there, then there'd be a bug in gatekeepers logic to check if the last known build is in local DB or if there's an ongoing build that needs fetching via GetBuildbotBuildJSON.
,
Sep 26
Was this v8 builder recently migrated to LUCI?
,
Sep 27
Yes, all V8 builders migrated to LUCI. For most it seems to work. But some are strangely ignored at times.
,
Sep 27
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/e8726a8591280dce0dcfe913db02aeab08f08e34 commit e8726a8591280dce0dcfe913db02aeab08f08e34 Author: Michael Achenbach <machenbach@chromium.org> Date: Thu Sep 27 09:37:04 2018 [Gatekeeper] Add more logging to debug Bug: 889005 Change-Id: I9d1993ce952beac20492b306ce6c60819d68ba7f Reviewed-on: https://chromium-review.googlesource.com/1248602 Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org> Commit-Queue: Michael Achenbach <machenbach@chromium.org> [modify] https://crrev.com/e8726a8591280dce0dcfe913db02aeab08f08e34/scripts/slave/build_scan.py
,
Sep 27
The new debug output is not yet conclusive. Here https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gatekeeper%2FChromium_Gatekeeper%2F1277936%2F%2B%2Frecipes%2Fsteps%2Fgatekeeper%3A_v8-tree-closers%2F0%2Fstdout the presubmit builder has a new build (27729). But at the time that I checked it already had 2 newer builds, which don't show up in the subsequent gatekeeper runs. E.g. https://ci.chromium.org/p/v8/builders/luci.v8.ci/V8%20Linux%20-%20presubmit/27731 finished at 2018-09-27 12:28 PM (CEST) and should show up latest in https://ci.chromium.org/buildbot/chromium.gatekeeper/Chromium%20Gatekeeper/1277943 which ran at 2018-09-27 12:44 PM (CEST). But in the gatekeeper runs 1277937 until 1277943 it is not there. We probably need more debug output confirming the raw data coming from milo. Most likely it is sometimes incomplete or so.
,
Sep 27
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/bf164e8f81308d5417281afc60a0278baf8e8561 commit bf164e8f81308d5417281afc60a0278baf8e8561 Author: Michael Achenbach <machenbach@chromium.org> Date: Thu Sep 27 12:11:08 2018 [Gatekeeper] Add more debug logging Bug: 889005 Change-Id: Id9264eed6f89f2eba60255b333d6cff2b07ae414 Reviewed-on: https://chromium-review.googlesource.com/1249025 Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org> Commit-Queue: Michael Achenbach <machenbach@chromium.org> [modify] https://crrev.com/bf164e8f81308d5417281afc60a0278baf8e8561/scripts/slave/build_scan.py
,
Sep 27
Found an inconsistency that looks like a gatekeeper bug indeed. Compare the following subsequent runs: https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gatekeeper%2FChromium_Gatekeeper%2F1277970%2F%2B%2Frecipes%2Fsteps%2Fgatekeeper%3A_v8-tree-closers%2F0%2Fstdout https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gatekeeper%2FChromium_Gatekeeper%2F1277971%2F%2B%2Frecipes%2Fsteps%2Fgatekeeper%3A_v8-tree-closers%2F0%2Fstdout Lets look at the log output for the presubmit builder in both runs: Run 1: INFO:root:builder: V8 Linux - presubmit, current builds: [27734], cached builds: [27733, ...] INFO:root:builder: V8 Linux - presubmit, new builds: [27734] INFO:root:fetching https://luci-milo.appspot.com/prpc/milo.Buildbot/GetBuildbotBuildJSON with {"build_num": 27734, "exclude_deprecated": true, "builder": "V8 Linux - presubmit", "master": "client.v8"} Run 2: INFO:root:builder: V8 Linux - presubmit, current builds: [], cached builds: [27735, 27734, ...] INFO:root:builder: V8 Linux - presubmit, new builds: [] The data from milo is consistent. Between run 1 and run 2 the presubmit run 27734 finished and also 27735 started and finished. In the first run, gatekeeper correctly fetches data from milo up to build 27734. In the second run it should fetch data for 27735 (including 27734 which was previously still ongoing), but it doesn't. --- Guesses: 1. There's a simple logic bug in the find_new_builds method. 2. The DB gets corrupted between runs, e.g. another gatekeeper instance running? Or the fields in the DB being overwritten with data from the other gatekeeper trees?
,
Sep 27
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/41e6509b8fce221ffe975d05ac874c82074cf152 commit 41e6509b8fce221ffe975d05ac874c82074cf152 Author: Michael Achenbach <machenbach@chromium.org> Date: Thu Sep 27 14:01:09 2018 [Gatekeeper] Add temporary debug output TBR=sergiyb@chromium.org Bug: 889005 Change-Id: I5529fcb8397290dd1e85172ce0312ee82dfabbe9 Reviewed-on: https://chromium-review.googlesource.com/1249027 Reviewed-by: Michael Achenbach <machenbach@chromium.org> Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org> Commit-Queue: Michael Achenbach <machenbach@chromium.org> [modify] https://crrev.com/41e6509b8fce221ffe975d05ac874c82074cf152/scripts/slave/build_scan.py
,
Sep 28
Analysis comparing two consecutive runs: https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gatekeeper%2FChromium_Gatekeeper%2F1278228%2F%2B%2Frecipes%2Fsteps%2Fgatekeeper%3A_v8-tree-closers%2F0%2Fstdout and https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gatekeeper%2FChromium_Gatekeeper%2F1278229%2F%2B%2Frecipes%2Fsteps%2Fgatekeeper%3A_v8-tree-closers%2F0%2Fstdout 1. run INFO:root:Build db state for V8 Linux - presubmit: {27752: BuildDBBuild(finished=True, succeeded=False, triggered={})} INFO:root:builder: V8 Linux - presubmit, current builds: [], cached builds: [27752, ...] INFO:root:builder: V8 Linux - presubmit, new builds: [] 2. run INFO:root:Build db state for V8 Linux - presubmit: {27753: BuildDBBuild(finished=True, succeeded=False, triggered={})} INFO:root:builder: V8 Linux - presubmit, current builds: [], cached builds: [27753, ...] INFO:root:builder: V8 Linux - presubmit, new builds: [] In the second run the build 27753 appears in the DB without ever been added as new build. Because of this build 27753 is omitted by gatekeeper. Maybe there's another logic in gatekeeper somewhere that writes to the db?
,
Sep 28
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/ae7380bc5c6546126f680d9664e02c9fe7a45fcf commit ae7380bc5c6546126f680d9664e02c9fe7a45fcf Author: Michael Achenbach <machenbach@chromium.org> Date: Fri Sep 28 12:36:14 2018 [Gatekeeper] Print build-db before and after each run for debugging Bug: 889005 Change-Id: I14668b0afe47cb66a43e57fe839c388a19b924b8 Reviewed-on: https://chromium-review.googlesource.com/1251501 Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org> Commit-Queue: Michael Achenbach <machenbach@chromium.org> [modify] https://crrev.com/ae7380bc5c6546126f680d9664e02c9fe7a45fcf/scripts/slave/recipe_modules/gatekeeper/api.py
,
Sep 28
The last debug output makes things clear. Looks like the fix from issue 886575 is still not good enough. I added steps printing build-db before and after a gatekeeper run if it exists. It only prints the after-version. Hence, build-db doesn't exist and gatekeeper runs each time with an empty DB. Guess the cache doesn't cache it? E.g.: https://ci.chromium.org/buildbot/chromium.gatekeeper/Chromium%20Gatekeeper/1278260 http://shortn/_av1vNfPATM
,
Sep 28
I have the feeling the builder cache implementation in remote_run is broken. Maybe the builder should be migrated to luci (fast)? On luci it seems to work.
,
Oct 2
,
Oct 5
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/03670d67c499cc5197ad133a53853d92a954aae1 commit 03670d67c499cc5197ad133a53853d92a954aae1 Author: Michael Achenbach <machenbach@chromium.org> Date: Fri Oct 05 07:02:29 2018 Revert "[Gatekeeper] Add temporary debug output" This reverts commit 41e6509b8fce221ffe975d05ac874c82074cf152. Reason for revert: Bug was fixed. Original change's description: > [Gatekeeper] Add temporary debug output > > TBR=sergiyb@chromium.org > > Bug: 889005 > Change-Id: I5529fcb8397290dd1e85172ce0312ee82dfabbe9 > Reviewed-on: https://chromium-review.googlesource.com/1249027 > Reviewed-by: Michael Achenbach <machenbach@chromium.org> > Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org> > Commit-Queue: Michael Achenbach <machenbach@chromium.org> TBR=machenbach@chromium.org,sergiyb@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 889005 Change-Id: Ide04899efcbeb673c25455cf1dd410f4d1f51f63 Reviewed-on: https://chromium-review.googlesource.com/c/1263977 Reviewed-by: Michael Achenbach <machenbach@chromium.org> Commit-Queue: Michael Achenbach <machenbach@chromium.org> [modify] https://crrev.com/03670d67c499cc5197ad133a53853d92a954aae1/scripts/slave/build_scan.py
,
Oct 5
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/518e3bda744bcea6618d770b8f73448440d4e77d commit 518e3bda744bcea6618d770b8f73448440d4e77d Author: Michael Achenbach <machenbach@chromium.org> Date: Fri Oct 05 07:37:22 2018 [Gatekeeper] Update comment after bugfix Bug: 889005 Change-Id: I452fb16b2da604b3ad8f7ebfde10e7c4761a3d6d Reviewed-on: https://chromium-review.googlesource.com/c/1263978 Reviewed-by: Sergiy Byelozyorov <sergiyb@chromium.org> Commit-Queue: Michael Achenbach <machenbach@chromium.org> [modify] https://crrev.com/518e3bda744bcea6618d770b8f73448440d4e77d/scripts/slave/recipe_modules/gatekeeper/api.py |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by machenb...@chromium.org
, Sep 25