CQ triggers too many builds on active CLs
Issue description
In other words, CQ appears not to recognize builds it triggered earlier, and so keeps triggering new ones.
,
Dec 19
The most recent CQ release was around 16:15, but it's not clear when puppet actually deployed it (to be investigated later), so for now it has been reverted. The last buildbucket release was at 17:15 and has also been rolled back. CQ triggering became too high around the time of the buildbucket release (see screenshot).
,
Dec 19
The graph above used a window size of 1h; with a 2m window we can clearly see that the excessive triggering is over (internal URL http://shortn/_LRmLaoWsp2).
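To illustrate why the 1h graph still looked bad after the triggering had stopped, here is a minimal sketch of a trailing-window count. The event data is hypothetical (a made-up burst, not taken from the real graphs), and `trailing_count` is an illustrative helper, not part of any monitoring tool.

```python
from datetime import datetime, timedelta


def trailing_count(events, now, window):
    """Count events whose timestamp falls in (now - window, now]."""
    start = now - window
    return sum(1 for t in events if start < t <= now)


# Hypothetical data: 600 triggers, one every 3 seconds starting at 17:15,
# and nothing at all after the burst ends (about 17:45).
burst_start = datetime(2018, 12, 18, 17, 15)
events = [burst_start + timedelta(seconds=3 * i) for i in range(600)]

# Look at the metric at 18:00, well after the burst has ended.
now = datetime(2018, 12, 18, 18, 0)
hourly = trailing_count(events, now, timedelta(hours=1))
two_min = trailing_count(events, now, timedelta(minutes=2))

print("1h window:", hourly)
print("2m window:", two_min)
```

With a 1h window the whole burst is still inside the window at 18:00, while the 2m window already reports zero, which matches the observation that the narrower window shows the recovery sooner.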
,
Dec 19
From the CQ log, it appears that around 17:30 the most frequent message looked like this:
2018-12-18 17:28:32.655 UTC-8 [pid:43064 tid:140478196455168 infra_internal.services.cq.buildbucket_util:515] Skipping bucket result 8926752504619894192 for issue 1379183 patchset 4: not required builder.
In other news, jbudorick@ has sent a PSA.
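For anyone repeating this kind of log triage, a rough sketch of tallying the most frequent message shapes follows. The regular expression is inferred only from the single log line quoted above, so the field layout is an assumption; `parse_line`, `normalize` and `tally` are hypothetical helper names, not CQ code.

```python
import re
from collections import Counter

# Layout inferred from the one quoted log line; treat it as an assumption.
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC-8 "
    r"\[pid:(?P<pid>\d+) tid:(?P<tid>\d+) (?P<module>[\w.]+):(?P<line>\d+)\] "
    r"(?P<msg>.*)$"
)

SAMPLE = (
    "2018-12-18 17:28:32.655 UTC-8 [pid:43064 tid:140478196455168 "
    "infra_internal.services.cq.buildbucket_util:515] Skipping bucket result "
    "8926752504619894192 for issue 1379183 patchset 4: not required builder."
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def normalize(msg):
    """Replace digit runs so messages differing only by IDs group together."""
    return re.sub(r"\d+", "<N>", msg)


def tally(lines):
    """Count messages by (module, source line, normalized text)."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields is not None:
            key = (fields["module"], fields["line"], normalize(fields["msg"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for key, n in tally([SAMPLE]).most_common(5):
        print(n, key)
```

Grouping by the normalized text means that lines for different build IDs, issues and patchsets collapse into one bucket, which is what makes the dominant message stand out.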
,
Dec 19
Purging all scheduled builds with tag user_agent:cq from luci.chromium.try via buildbucket's delete_many_builds.
,
Dec 19
Issue 916358 has been merged into this issue.
,
Dec 19
I've checked more CQ logs:
1. The last push was at: 2018-12-18 16:39:06.760 UTC-8 [pid:43064 tid:140482652575552 infra_internal.services.cq.cq:176] The Commit Queue is going to commit stuff.
2. The revert of that push hasn't reached prod yet, so the cause is 100% the buildbucket push.
,
Dec 19
Purging all scheduled builds from luci.chromium.try regardless of tags.
,
Dec 19
Ran the same thing on a bunch of other buckets with a large number of pending builds (e.g., for v8: https://apis-explorer.appspot.com/apis-explorer/?base=https://cr-buildbucket.appspot.com/_ah/api#p/buildbucket/v1/buildbucket.delete_many_builds?bucket=luci.v8.try&status=SCHEDULED&_h=8 ). Downgrading to Pri1, but I'll continue monitoring the reduction of actual pending build counts.
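As a convenience when repeating this for several buckets, the explorer links can be generated rather than hand-edited. This is only a sketch that mirrors the URL shape quoted above; the trailing `_h=8` is left out on the assumption that it is explorer UI state, and `delete_many_builds_explorer_url` is a hypothetical helper name. It only builds links to the explorer form and sends no request.

```python
EXPLORER = "https://apis-explorer.appspot.com/apis-explorer/"
API_BASE = "https://cr-buildbucket.appspot.com/_ah/api"
METHOD = "buildbucket.delete_many_builds"


def delete_many_builds_explorer_url(bucket, status="SCHEDULED"):
    """Build an API explorer link shaped like the one quoted in this thread."""
    return (
        f"{EXPLORER}?base={API_BASE}"
        f"#p/buildbucket/v1/{METHOD}"
        f"?bucket={bucket}&status={status}"
    )


if __name__ == "__main__":
    # Buckets named in this thread; extend the list as needed.
    for bucket in ["luci.chromium.try", "luci.v8.try"]:
        print(delete_many_builds_explorer_url(bucket))
```

Since the call deletes builds, it is worth double-checking the bucket and status in the explorer form before executing it.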
,
Dec 19
I think the buildbucket backend is having a hard time chewing through the backlog of things to delete:
Expected Future, received <class 'google.appengine.api.apiproxy_stub_map.UserRPC'>: <google.appengine.api.apiproxy_stub_map.UserRPC object at 0x2a77fd1ef350> (/base/alloc/tmpfs/dynamic_runtimes/python27g/d22767677e9aa897/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py:1552)
Traceback (most recent call last):
  File "third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
  File "third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
    rv = self.router.dispatch(request, response)
  File "third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "third_party/webapp2-2.5.2/webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "appengine/ext/deferred/deferred.py", line 318, in post
    self.run_from_request()
  File "appengine/ext/deferred/deferred.py", line 313, in run_from_request
    run(self.request.body)
  File "appengine/ext/deferred/deferred.py", line 155, in run
    return func(*args, **kwds)
  File "service.py", line 684, in _task_delete_many_builds
    q.map(del_if_unchanged, keys_only=True)
  File "appengine/ext/ndb/utils.py", line 160, in positional_wrapper
    return wrapped(*args, **kwds)
  File "appengine/ext/ndb/query.py", line 1190, in map
    **q_options).get_result()
  File "appengine/ext/ndb/tasklets.py", line 383, in get_result
    self.check_success()
  File "appengine/ext/ndb/tasklets.py", line 624, in _finish
    result = [r.get_result() for r in self._results]
  File "appengine/ext/ndb/tasklets.py", line 383, in get_result
    self.check_success()
  File "appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
    value = gen.throw(exc.__class__, exc, tb)
  File "service.py", line 671, in del_if_unchanged
    if (yield txn(key)): # pragma: no branch
  File "appengine/ext/ndb/tasklets.py", line 430, in _help_tasklet_along
    value = gen.send(val)
  File "appengine/ext/ndb/context.py", line 1029, in transaction
    result = yield result
  File "appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
    value = gen.throw(exc.__class__, exc, tb)
  File "service.py", line 666, in txn
    yield futs
  File "appengine/ext/ndb/tasklets.py", line 496, in _help_tasklet_along
    mfut.add_dependent(subfuture)
  File "appengine/ext/ndb/tasklets.py", line 648, in add_dependent
    raise TypeError('Expected Future, received %s: %r' % (type(fut), fut))
TypeError: Expected Future, received <class 'google.appengine.api.apiproxy_stub_map.UserRPC'>: <google.appengine.api.apiproxy_stub_map.UserRPC object at 0x2a77fd1ef350>
,
Dec 19
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra/+/01045b09928f9967a6e687ee6a128838a5211b6a

commit 01045b09928f9967a6e687ee6a128838a5211b6a
Author: Nodir Turakulov <nodir@google.com>
Date: Wed Dec 19 02:52:22 2018

[buildbucket] Make cancel_task_transactionally_async a tasklet

cancel_task_transactionally_async currently returns a UserRPC but yield [multipleFutures] does not like that. Make it return a future.

R=tandrii@chromium.org
Bug: 916359
Change-Id: I37059bd5f089319173938d877459e570665f8cc9
Reviewed-on: https://chromium-review.googlesource.com/c/1383376
Commit-Queue: Nodir Turakulov <nodir@chromium.org>
Commit-Queue: Vadim Shtayura <vadimsh@chromium.org>
Auto-Submit: Nodir Turakulov <nodir@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>
Cr-Commit-Position: refs/heads/master@{#19661}

[modify] https://crrev.com/01045b09928f9967a6e687ee6a128838a5211b6a/appengine/cr-buildbucket/swarming/swarming.py
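For readers unfamiliar with the failure, here is a toy model of the type check that produced the TypeError. It does not use the real ndb or App Engine classes: `Future`, `UserRPC`, `MultiFuture`, `wait_all` and `as_future` are simplified stand-ins written for illustration. The actual fix made the function a tasklet; the eager wrapper below is only an assumption-laden approximation of "return a future instead of an RPC handle".

```python
class Future:
    """Toy stand-in for an ndb-style future (not the real class)."""

    def __init__(self):
        self._done = False
        self._result = None

    def set_result(self, value):
        self._result = value
        self._done = True

    def get_result(self):
        if not self._done:
            raise RuntimeError("result is not ready")
        return self._result


class UserRPC:
    """Toy stand-in for an RPC handle; note it is NOT a Future subclass."""

    def __init__(self, value):
        self._value = value

    def get_result(self):
        return self._value


class MultiFuture(Future):
    """Collects several futures, rejecting anything that is not a Future."""

    def __init__(self):
        super().__init__()
        self._dependents = []

    def add_dependent(self, fut):
        if not isinstance(fut, Future):
            raise TypeError("Expected Future, received %s: %r" % (type(fut), fut))
        self._dependents.append(fut)

    def get_result(self):
        return [f.get_result() for f in self._dependents]


def wait_all(items):
    """Rough analogue of yielding a list of futures from a tasklet."""
    mfut = MultiFuture()
    for item in items:
        mfut.add_dependent(item)
    return mfut.get_result()


def as_future(rpc):
    """Wrap an RPC handle so that callers only ever see a Future."""
    fut = Future()
    fut.set_result(rpc.get_result())
    return fut


if __name__ == "__main__":
    try:
        wait_all([UserRPC("cancelled")])
    except TypeError as e:
        print("fails:", e)
    print("works:", wait_all([as_future(UserRPC("cancelled"))]))
```

The point of the sketch is that a mixed list fails as a whole: one non-future item is enough to break the wait, which is consistent with the delete task failing inside its transaction.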
,
Dec 19
Chromium appears to be almost back to normal, with only a limited backlog on the longest-running builders, which should clear within 1-2 hours.
,
Dec 19
Sent an updated PSA. Checked other projects; the situation is similar.
,
Dec 19
Postmortem TBD
,
Dec 19
Issue 916336 has been merged into this issue.
Issue 916375 has been merged into this issue.
Issue 916378 has been merged into this issue.
Issue 916383 has been merged into this issue.
Issue 916384 has been merged into this issue.
Issue 916385 has been merged into this issue.
,
Dec 20
Postmortem: go/chops-pm-110
Comment 1 by tandrii@chromium.org
, Dec 19