Buildbot pubsub pusher broken on many masters
Issue description

What steps will reproduce the problem?
(1) Open https://luci-milo.appspot.com/buildbot/chromium.infra/infra-continuous-mac-10.9-64/
(2) Open https://build.chromium.org/p/chromium.infra/builders/infra-continuous-mac-10.9-64
(3) Compare list of builds.

What is the expected result?
List of builds should be the same.

What happens instead?
Milo does not have last 16 builds from Buildbot. See captured screenshots attached.
,
Aug 4 2017
This is not happening on chromium.webkit, which was restarted yesterday, but it is also happening on chromium.goma, chromium.chromedriver, and probably a few more masters.
,
Aug 4 2017
Here is the timestamp of the first instance
2017-08-03 10:45:13-0700 [-] Failed to retrieve access token: {
"error" : "invalid_grant",
"error_description" : "Invalid JWT Signature."
}
2017-08-03 10:45:13-0700 [-] RPC "('projects', 'topics', 'publish')" failed: Traceback (most recent call last):
File "/home/chrome-bot/buildbot/build/scripts/master/deferred_resource.py", line 304, in _retry
res = yield threads.deferToThreadPool(reactor, self._pool, call)
File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/threadpool.py", line 242, in _worker
result = context.call(ctx, function, *args, **kwargs)
File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/context.py", line 59, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/context.py", line 37, in callWithContext
return func(*args,**kw)
File "/home/chrome-bot/buildbot/build/scripts/master/deferred_resource.py", line 371, in single_call
self._th_local.credentials.refresh(self._th_local.http)
File "/home/chrome-bot/buildbot/build/third_party/oauth2client/oauth2client/client.py", line 558, in refresh
self._refresh(http.request)
File "/home/chrome-bot/buildbot/build/third_party/oauth2client/oauth2client/client.py", line 727, in _refresh
self._do_refresh_request(http_request)
File "/home/chrome-bot/buildbot/build/third_party/oauth2client/oauth2client/client.py", line 789, in _do_refresh_request
raise AccessTokenRefreshError(error_msg)
AccessTokenRefreshError: invalid_grant: Invalid JWT Signature.
2017-08-03 10:45:13-0700 [-] PubSub: Failed to push: [Failure instance: Traceback: <class 'oauth2client.client.AccessTokenRefreshError'>: invalid_grant: Invalid JWT Signature.
/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:392:errback
/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:459:_startRunCallbacks
/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:547:_runCallbacks
/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:1081:gotResult
--- <exception caught here> ---
/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:1023:_inlineCallbacks
/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/failure.py:349:throwExceptionIntoGenerator
/home/chrome-bot/buildbot/build/scripts/master/deferred_resource.py:319:_retry
]
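To find the first instance on other masters, a small script can scan a master log for the first matching line. This is only a sketch: it assumes log lines carry the same timestamp prefix as the excerpt above, and the function name `first_failure` and the marker string default are illustrative, not part of Buildbot.

```python
import re
from datetime import datetime

# Matches the timestamp prefix seen in the excerpt above, e.g.
# "2017-08-03 10:45:13-0700 [-] Failed to retrieve access token: {"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})([+-]\d{4}) \[.*?\] (.*)$"
)

# Assumption: this message marks the start of the breakage.
MARKER = "Failed to retrieve access token"


def first_failure(lines, marker=MARKER):
    """Return a timezone-aware datetime for the first line whose message
    contains `marker`, or None if no such line exists."""
    for line in lines:
        m = LINE_RE.match(line)
        if m and marker in m.group(3):
            return datetime.strptime(
                m.group(1) + m.group(2), "%Y-%m-%d %H:%M:%S%z"
            )
    return None
```

Passing an open log file (for example `first_failure(open(path))`, where `path` is whatever log the master writes) would give the time the pusher first failed on that master, which can then be compared against the time of the key revocation.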
,
Aug 4 2017
Well crap, this has been happening to a lot more than master1 masters: https://pantheon.corp.google.com/datastore/entities/query?project=luci-milo&ns=&kind=buildbotMasterEntry&filter=8%2FModified%7CDT%7CLT%7C24%2F2017-08-04T04:09:29.000Z
,
Aug 4 2017
Issue 752259 has been merged into this issue.
,
Aug 4 2017
Affected masters: chromeos.branch chromeos.chrome chromeos.continuous chromeos.infra chromium chromium.android chromium.android.fyi chromium.chrome chromium.chromedriver chromium.chromiumos chromium.gatekeeper chromium.goma chromium.gpu chromium.gpu.fyi chromium.infra chromium.infra.codesearch chromium.infra.cron chromium.linux chromium.lkgr chromium.mac chromium.memory chromiumos chromiumos.chromium chromium.perf chromium.perf.fyi chromium.swarm chromium.tools.build chromium.webrtc chromium.webrtc.fyi chromium.win client.arc client.arc.release client.arc.tryserver client.art client.boringssl client.catapult client.cdm client.chromeoffice.try client.dart client.dart.fyi client.dart.internal client.dart.packages client.drmemory client.dynamorio client.goma client.gyp client.libyuv client.mojo client.nacl client.nacl.ports client.nacl.sdk client.nacl.toolchain client.pdfium client.syzygy client.v8 client.v8.branches client.v8.chromium client.v8.clusterfuzz client.v8.fyi client.v8.official client.v8.ports client.wasm.llvm client.webrtc client.webrtc.branches client.webrtc.fyi client.webrtc.perf internal.bling.tryserver internal.client.clank internal.client.clank_experimental internal.client.clank_qa internal.client.clank_tot internal.client.cronet internal.client.kitchensync internal.client.v8 internal.client.webrtc internal.gatekeeper internal.infra.codesearch internal.infra.cron internal.tryserver.clankium official.android official.android.continuous official.desktop.continuous official.diffs official.gatekeeper official.infra.cron tryserver.blink tryserver.chromium.android tryserver.chromium.angle tryserver.chromium.chromiumos tryserver.chromium.mac tryserver.chromium.perf tryserver.chromium.win tryserver.client.catapult tryserver.client.custom_tabs_client tryserver.client.mojo tryserver.client.pdfium tryserver.client.syzygy tryserver.libyuv tryserver.nacl tryserver.webrtc
,
Aug 4 2017
Restarting all masters except official now (inc. ChromeOS, since it's before 10am)
,
Aug 4 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/master-manager/+/ca1bceea830dc05cdc1cfa54875f7faa4085edbb
commit ca1bceea830dc05cdc1cfa54875f7faa4085edbb
Author: Ryan Tseng <hinoka@google.com>
Date: Fri Aug 04 16:28:24 2017
,
Aug 4 2017
Probable cause: crbug.com/697545 (key intentionally revoked around 10am Thu). If the original refresh token was invalidated and a new one was then swapped onto the disk, buildbot keeps using the old token it already loaded and doesn't pick up the new one until a restart.
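One way to avoid needing a restart would be to re-read the credentials from disk when a refresh fails and retry once. The following is a rough sketch of that idea, not what deferred_resource.py does: `RefreshError` stands in for oauth2client's `AccessTokenRefreshError`, and `ReloadingCredentials` and `load_from_disk` are hypothetical names. The only thing taken from the traceback above is that the credentials object has a `refresh(http)` method.

```python
class RefreshError(Exception):
    """Stand-in for oauth2client.client.AccessTokenRefreshError."""


class ReloadingCredentials(object):
    """Wraps a credentials object and reloads it from disk on refresh failure.

    `load_from_disk` is a hypothetical zero-argument callable that reads the
    key file and returns a fresh credentials object with a refresh(http)
    method.
    """

    def __init__(self, load_from_disk):
        self._load = load_from_disk
        self._creds = load_from_disk()

    def refresh(self, http):
        try:
            self._creds.refresh(http)
        except RefreshError:
            # The in-memory key may have been revoked and replaced on disk.
            # Re-read it and retry exactly once; a second failure propagates
            # to the caller.
            self._creds = self._load()
            self._creds.refresh(http)
```

Retrying only once keeps a genuinely bad key on disk from turning into a retry loop; the second failure still surfaces in the log as it does today.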
,
Aug 4 2017
Only official masters and chromium.perf are stuck now, lowering to P1
,
Aug 4 2017
In the future, can you cc the sheriffs for the masters being restarted?
,
Aug 4 2017
Will do. I sent the PSA out to chrome-team@ but I can explicitly include sheriffs next time.
,
Aug 7 2017
Issue 752949 has been merged into this issue.
Comment 1 by hinoka@chromium.org, Aug 4 2017
Thanks for the report, I'm investigating. It looks like something bad happened on Master1:
2017-08-04 08:48:12-0700 [-] Unhandled Error
Traceback (most recent call last):
File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py", line 392, in errback
self._startRunCallbacks(fail)
File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py", line 459, in _startRunCallbacks
self._runCallbacks()
File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py", line 547, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py", line 1081, in gotResult
_inlineCallbacks(r, g, deferred)
--- <exception caught here> ---
File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py", line 1023, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/failure.py", line 349, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/chrome-bot/buildbot/build/scripts/master/deferred_resource.py", line 319, in _retry
raise ex
oauth2client.client.AccessTokenRefreshError: invalid_grant: Invalid JWT Signature.