LUCI-{GateKeeper,FindIt,SOM} integration does not work for new LUCI builders |
||||||||
Issue description
LUCI-{Gatekeeper,FindIt,SOM} integration is currently entirely based on Buildbot API Emulation which currently uses migration app for {(master, builder):bucket} mapping. If a new LUCI builder is created, migration app doesn't see it (because there is nothing to migrate, it is not on Buildbot), so GK/FindIt/SOM do not see it either.
context: Dart cannot use GK to send blame emails (cannot find the bug)
,
Feb 2 2018
both look OKish to me, with slight preference for 1) because it accumulates tech debt in a migration app which we'll delete at once anyway.
,
Feb 7 2018
Dirk, I have two CLs that fix the problem for new *builders*, but not for new *masters*. How OK is that?
,
Feb 7 2018
That sounds fine. I have no expectation that we'll want new masters in the near future.
,
Feb 7 2018
I meant that if we spin up a new LUCI bucket with no associated master, integrations won't work
,
Feb 7 2018
Yes, that is what I understood you to mean.
,
Feb 7 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/2eb4ff0688a7b63c49b6edaa71c66d8d323107f3 commit 2eb4ff0688a7b63c49b6edaa71c66d8d323107f3 Author: Nodir Turakulov <nodir@google.com> Date: Wed Feb 07 23:36:09 2018 [luci-migration] add JSON response format in master handler Add JSON response format to master handler to be able to read LUCI bucket of a buildbot master. Bug: 808193 Change-Id: I737f2310381ad9b55c3b30d2f9de36a1926de6a6 Reviewed-on: https://chromium-review.googlesource.com/907375 Reviewed-by: Andrii Shyshkalov <tandrii@chromium.org> Commit-Queue: Nodir Turakulov <nodir@chromium.org> [modify] https://crrev.com/2eb4ff0688a7b63c49b6edaa71c66d8d323107f3/go/src/infra/appengine/luci-migration/app/master.go
,
Feb 9 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-go.git/+/665a380f6bbd1cf51460a4b8f82708f1d8f9c6bd commit 665a380f6bbd1cf51460a4b8f82708f1d8f9c6bd Author: Nodir Turakulov <nodir@google.com> Date: Fri Feb 09 02:44:54 2018 [milo] use new migration app endpoint to read bucket of a master Use the new endpoint added in https://chromium-review.googlesource.com/#/c/infra/infra/+/907375 This solves crbug.com/808193 for new builders, but not new masters. Bug: 808193 Change-Id: I5627195468d2077a78d5a95d13a5a402b4c79c30 Reviewed-on: https://chromium-review.googlesource.com/907376 Reviewed-by: Andrii Shyshkalov <tandrii@chromium.org> Commit-Queue: Nodir Turakulov <nodir@chromium.org> [modify] https://crrev.com/665a380f6bbd1cf51460a4b8f82708f1d8f9c6bd/milo/buildsource/buildbot/buildstore/build.go [modify] https://crrev.com/665a380f6bbd1cf51460a4b8f82708f1d8f9c6bd/milo/buildsource/buildbot/buildstore/emulation.go [modify] https://crrev.com/665a380f6bbd1cf51460a4b8f82708f1d8f9c6bd/milo/buildsource/buildbot/buildstore/query.go
,
Feb 15 2018
Are the changes above in production, yet? Or is there additional configuration required to enable this? I would have expected to get a mail for this build but didn't: https://ci.chromium.org/p/dart/builds/b8954509610657029808
,
Feb 21 2018
looks like changes above were not sufficient. GK discovers builders by calling https://ci.chromium.org/rpcexplorer/services/milo.Buildbot/GetCompressedMasterJSON?request={%20%20%20%20%22name%22:%20%22client.dart%22} but it contains only builders present on Buildbot master. Milo also does not know about the new builder.
,
Feb 21 2018
,
Feb 21 2018
I've uploaded https://chromium-review.googlesource.com/c/infra/luci/luci-go/+/929663 but perhaps it wouldn't really work: we cannot inject all luci.foo.bar builders in a master that is being migrated to luci.foo.bar because that might be not the only master. Otherwise, for example, all luci.chromium.try builders will show up on tryserver.chromium.linux, including windows ones. I am thinking we will have to resort to explicit whitelisting of builders in milo config. tandrii, hinoka, WDYT?
,
Feb 26 2018
talked to tandrii@ offline. We don't want another manually-maintained config, but use what I've started in comment 12, except will include properties in the swarmbucket response and will filter builders by mastername property when emulating a master.
,
Feb 27 2018
hinoka agreed to take over
,
Feb 27 2018
,
Feb 27 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-go.git/+/2e4df7a137bc82ed1b26d3db248b8908316c238b commit 2e4df7a137bc82ed1b26d3db248b8908316c238b Author: Nodir Turakulov <nodir@google.com> Date: Tue Feb 27 20:19:55 2018 [milo] Export getting swarmbucket builders API. Expose a buildbucket package function that returns swarmbucket builders to be used in the next CL that will include swarmbucket-only builders in emulated GetCompressedMasterJSON responses. Rename GetAllBuilders functions to CIService because they return a ui.CIService. Add memcaching for loading swarmbucket builders. Cache key includes current identity so that access control is preserved. The cache hit rate is expected to be high for service accounts calling Buildbot API. Bug: 808193 Change-Id: I127070b120a172767a21f19b5fa527815716491e Reviewed-on: https://chromium-review.googlesource.com/929663 Commit-Queue: Nodir Turakulov <nodir@chromium.org> Reviewed-by: Ryan Tseng <hinoka@chromium.org> [modify] https://crrev.com/2e4df7a137bc82ed1b26d3db248b8908316c238b/milo/buildsource/buildbot/builder.go [modify] https://crrev.com/2e4df7a137bc82ed1b26d3db248b8908316c238b/milo/buildsource/buildbot/master_test.go [modify] https://crrev.com/2e4df7a137bc82ed1b26d3db248b8908316c238b/milo/buildsource/buildbucket/buckets.go [modify] https://crrev.com/2e4df7a137bc82ed1b26d3db248b8908316c238b/milo/frontend/view_search.go
,
Feb 28 2018
,
Mar 2 2018
Issue 817988 has been merged into this issue.
,
Mar 5 2018
,
Mar 5 2018
The Gatekeeper step for dart is failing with a 500 Error now, is this connected? https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gatekeeper%2FChromium_Gatekeeper%2F1178881%2F%2B%2Frecipes%2Fsteps%2Fgatekeeper%3A_dart%2F0%2Fstdout Also, when I attempt to go here: http://chrome-build-extract.appspot.com/get_master/client.dart I get a 500 error (not sure if that is supposed to work for the Dart masters).
,
Mar 5 2018
Looking
,
Mar 5 2018
Root cause is found, new version was rolled back in the meantime. Fix incoming.
,
Mar 6 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-go.git/+/cc792213bfde15e423ba7f8b9fbbbc1e61d844bc commit cc792213bfde15e423ba7f8b9fbbbc1e61d844bc Author: Ryan Tseng <hinoka@google.com> Date: Tue Mar 06 00:00:22 2018 [milo] Don't return builds without build numbers in emulated mode Bug: 808193 Change-Id: Ia8c54c312600b391d5e9c81b7affaefe5fd20966 Reviewed-on: https://chromium-review.googlesource.com/949770 Commit-Queue: Ryan Tseng <hinoka@chromium.org> Reviewed-by: Nodir Turakulov <nodir@chromium.org> [modify] https://crrev.com/cc792213bfde15e423ba7f8b9fbbbc1e61d844bc/milo/buildsource/buildbot/buildstore/build.go [modify] https://crrev.com/cc792213bfde15e423ba7f8b9fbbbc1e61d844bc/milo/buildsource/buildbot/buildstore/buildbucket.go [modify] https://crrev.com/cc792213bfde15e423ba7f8b9fbbbc1e61d844bc/milo/buildsource/buildbot/buildstore/query.go
,
Mar 6 2018
Now it should work.
,
Mar 6 2018
I still can't verify that this is fixed: This build should have triggered a blamelist notification: https://ci.chromium.org/p/dart/builds/b8952821537349105888 The builder vm-kernel-linux-release-simdbc64 is in http://chrome-build-extract.appspot.com/get_master/client.dart The user on the blamelist claims not to have received any mail (I did check his inbox myself to make sure he didn't just miss it) and I can't see any evidence of Gatekeeper reacting to the failure in the "dart" step in the Gatekeeper logs from that time (builds #1179174 - #1179176 on https://ci.chromium.org/buildbot/chromium.gatekeeper/Chromium%20Gatekeeper/).
,
Mar 6 2018
GK requires build numbers to work. Buildbot API assumes build numbers, so API emulation requires build numbers too and ignores builds without numbers. You may have to enable them in your cr-buildbucket.cfg, at least until we have a better alternative. You may like them, though.
,
Mar 6 2018
After adding build numbers, GK became aware of the Luci builders, but still fails: https://ci.chromium.org/buildbot/chromium.gatekeeper/Chromium%20Gatekeeper/1179561 Traceback (most recent call last): File "/mnt/data/b/build/scripts/slave/gatekeeper_ng.py", line 1008, in <module> sys.exit(main(sys.argv[1:])) File "/mnt/data/b/build/scripts/slave/gatekeeper_ng.py", line 998, in main simulate) File "/mnt/data/b/build/scripts/slave/gatekeeper_ng.py", line 694, in notify_failures build_data['revisions'] = [x['revision'] for x in build_data['changes']] TypeError: 'NoneType' object is not iterable A build that triggered the failure in GK: https://ci.chromium.org/p/dart/builders/luci.dart.ci/vm-kernel-precomp-win-release-x64/1
,
Mar 7 2018
GK seems to have recovered from the failure above, but now GK fails in a different error:
Traceback (most recent call last):
File "/mnt/data/b/build/scripts/slave/gatekeeper_ng.py", line 1008, in <module>
sys.exit(main(sys.argv[1:]))
File "/mnt/data/b/build/scripts/slave/gatekeeper_ng.py", line 998, in main
simulate)
File "/mnt/data/b/build/scripts/slave/gatekeeper_ng.py", line 738, in notify_failures
submit_email(email_app_url, build_data, secret, simulate)
File "/mnt/data/b/build/scripts/slave/gatekeeper_ng.py", line 491, in submit_email
with closing(logging_urlopen(req)) as f:
File "/mnt/data/b/build/scripts/slave/gatekeeper_ng.py", line 62, in logging_urlopen
return urllib2.urlopen(url, *args, **kwargs)
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 410, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 448, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
Failed build: https://ci.chromium.org/p/dart/builders/luci.dart.ci/vm-kernel-win-release-x64/12
GK: https://ci.chromium.org/buildbot/chromium.gatekeeper/Chromium%20Gatekeeper/1180111
,
Mar 7 2018
#28 looks like an issue during sending submit_email, which is unrelated Luci/Milo. I suggest filing a new GK bug.
,
Mar 7 2018
Ok, I'll file a separate bug. I reported it here because it worked fine before this fix for the Buildbot builders (and it still does). Only the Luci builders trigger the errors in GK when failing.
,
Mar 7 2018
I filed bug #819690. |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by no...@chromium.org
, Feb 1 2018Status: Assigned (was: Untriaged)
options 1) change migration app to expose new API get_luci_bucket_of(master) and change Milo to use that instead of the current get_builder_info(master, builder), where the response includes luci bucket. Then it does not matter if new builder exists on a master, or even if the master exists. The only thing that matters is that the master is defined in migration app's config. This solves Dart's problem. 2) change API emulation to add a second source of {(master, builder):bucket} mapping, e.g. # luci-milo.cfg ... consoles, etc... buildbot { master { name: "tryserver.chromium.linux" luci_bucket: "luci.chromium.try" } }