
Issue 752459

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 752632




Buildbot pubsub pusher broken on many masters

Project Member Reported by serg...@chromium.org, Aug 4 2017

Issue description

What steps will reproduce the problem?
(1) Open https://luci-milo.appspot.com/buildbot/chromium.infra/infra-continuous-mac-10.9-64/
(2) Open https://build.chromium.org/p/chromium.infra/builders/infra-continuous-mac-10.9-64
(3) Compare list of builds.

What is the expected result?
List of builds should be the same.

What happens instead?
Milo is missing the last 16 builds that Buildbot shows. See the attached screenshots.
 
Attachments:
buildbot.png (199 KB)
milo.png (289 KB)

Thanks for the report, I'm investigating.

It looks like something bad happened on Master1:

2017-08-04 08:48:12-0700 [-] Unhandled Error
        Traceback (most recent call last):
          File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py", line 392, in errback
            self._startRunCallbacks(fail)
          File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py", line 459, in _startRunCallbacks
            self._runCallbacks()
          File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py", line 547, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py", line 1081, in gotResult
            _inlineCallbacks(r, g, deferred)
        --- <exception caught here> ---
          File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py", line 1023, in _inlineCallbacks
            result = result.throwExceptionIntoGenerator(g)
          File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/failure.py", line 349, in throwExceptionIntoGenerator
            return g.throw(self.type, self.value, self.tb)
          File "/home/chrome-bot/buildbot/build/scripts/master/deferred_resource.py", line 319, in _retry
            raise ex
        oauth2client.client.AccessTokenRefreshError: invalid_grant: Invalid JWT Signature.
Summary: Buildbot pubsub pusher broken on many master1 masters (was: LUCI for chromium.infra/infra-continuous-mac-10.9-64 is behind Buildbot by 16 builds)
This does not happen to chromium.webkit, which was restarted yesterday, but is also happening to chromium.goma/chromedriver, and probably a few more masters.
Labels: -Pri-1 Pri-0
Owner: hinoka@chromium.org
Status: Started (was: Untriaged)
Here is the timestamp of the first instance:

2017-08-03 10:45:13-0700 [-] Failed to retrieve access token: {
          "error" : "invalid_grant",
          "error_description" : "Invalid JWT Signature."
        }
2017-08-03 10:45:13-0700 [-] RPC "('projects', 'topics', 'publish')" failed: Traceback (most recent call last):
          File "/home/chrome-bot/buildbot/build/scripts/master/deferred_resource.py", line 304, in _retry
            res = yield threads.deferToThreadPool(reactor, self._pool, call)
          File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/threadpool.py", line 242, in _worker
            result = context.call(ctx, function, *args, **kwargs)
          File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/context.py", line 59, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/context.py", line 37, in callWithContext
            return func(*args,**kw)
          File "/home/chrome-bot/buildbot/build/scripts/master/deferred_resource.py", line 371, in single_call
            self._th_local.credentials.refresh(self._th_local.http)
          File "/home/chrome-bot/buildbot/build/third_party/oauth2client/oauth2client/client.py", line 558, in refresh
            self._refresh(http.request)
          File "/home/chrome-bot/buildbot/build/third_party/oauth2client/oauth2client/client.py", line 727, in _refresh
            self._do_refresh_request(http_request)
          File "/home/chrome-bot/buildbot/build/third_party/oauth2client/oauth2client/client.py", line 789, in _do_refresh_request
            raise AccessTokenRefreshError(error_msg)
        AccessTokenRefreshError: invalid_grant: Invalid JWT Signature.

2017-08-03 10:45:13-0700 [-] PubSub: Failed to push: [Failure instance: Traceback: <class 'oauth2client.client.AccessTokenRefreshError'>: invalid_grant: Invalid JWT Signature.
        /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:392:errback
        /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:459:_startRunCallbacks
        /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:547:_runCallbacks
        /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:1081:gotResult
        --- <exception caught here> ---
        /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/internet/defer.py:1023:_inlineCallbacks
        /home/chrome-bot/buildbot/build/third_party/twisted_10_2/twisted/python/failure.py:349:throwExceptionIntoGenerator
        /home/chrome-bot/buildbot/build/scripts/master/deferred_resource.py:319:_retry
        ]

Summary: Buildbot pubsub pusher broken on many masters (was: Buildbot pubsub pusher broken on many master1 masters)
Well crap, this has been happening to a lot more than master1 masters:

https://pantheon.corp.google.com/datastore/entities/query?project=luci-milo&ns=&kind=buildbotMasterEntry&filter=8%2FModified%7CDT%7CLT%7C24%2F2017-08-04T04:09:29.000Z
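The query above filters buildbotMasterEntry entities in luci-milo by a Modified timestamp older than 2017-08-04T04:09:29Z, which is how the stale masters were found. A minimal sketch of the same check follows, assuming the entries have already been fetched into a dict of master name to last-modified time. The function name and the sample timestamps are hypothetical and only illustrate the comparison; they are not real data from this incident.

```python
from datetime import datetime, timezone


def stale_masters(entries, cutoff):
    """Return the names of masters last modified strictly before cutoff."""
    return sorted(name for name, modified in entries.items() if modified < cutoff)


# Cutoff taken from the Datastore filter in the URL above.
cutoff = datetime(2017, 8, 4, 4, 9, 29, tzinfo=timezone.utc)

# Illustrative timestamps only (assumptions, not values from Datastore).
entries = {
    "chromium.webkit": datetime(2017, 8, 4, 15, 0, 0, tzinfo=timezone.utc),
    "chromium.infra": datetime(2017, 8, 3, 17, 45, 0, tzinfo=timezone.utc),
}

print(stale_masters(entries, cutoff))
```

A master that is still pushing updates has a recent Modified value and is excluded; anything older than the cutoff is a candidate for a restart.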
Cc: zhangtiff@chromium.org seanmccullough@chromium.org
Issue 752259 has been merged into this issue.
Affected masters:

chromeos.branch
chromeos.chrome
chromeos.continuous
chromeos.infra
chromium
chromium.android
chromium.android.fyi
chromium.chrome
chromium.chromedriver
chromium.chromiumos
chromium.gatekeeper
chromium.goma
chromium.gpu
chromium.gpu.fyi
chromium.infra
chromium.infra.codesearch
chromium.infra.cron
chromium.linux
chromium.lkgr
chromium.mac
chromium.memory
chromiumos
chromiumos.chromium
chromium.perf
chromium.perf.fyi
chromium.swarm
chromium.tools.build
chromium.webrtc
chromium.webrtc.fyi
chromium.win
client.arc
client.arc.release
client.arc.tryserver
client.art
client.boringssl
client.catapult
client.cdm
client.chromeoffice.try
client.dart
client.dart.fyi
client.dart.internal
client.dart.packages
client.drmemory
client.dynamorio
client.goma
client.gyp
client.libyuv
client.mojo
client.nacl
client.nacl.ports
client.nacl.sdk
client.nacl.toolchain
client.pdfium
client.syzygy
client.v8
client.v8.branches
client.v8.chromium
client.v8.clusterfuzz
client.v8.fyi
client.v8.official
client.v8.ports
client.wasm.llvm
client.webrtc
client.webrtc.branches
client.webrtc.fyi
client.webrtc.perf
internal.bling.tryserver
internal.client.clank
internal.client.clank_experimental
internal.client.clank_qa
internal.client.clank_tot
internal.client.cronet
internal.client.kitchensync
internal.client.v8
internal.client.webrtc
internal.gatekeeper
internal.infra.codesearch
internal.infra.cron
internal.tryserver.clankium
official.android
official.android.continuous
official.desktop.continuous
official.diffs
official.gatekeeper
official.infra.cron
tryserver.blink
tryserver.chromium.android
tryserver.chromium.angle
tryserver.chromium.chromiumos
tryserver.chromium.mac
tryserver.chromium.perf
tryserver.chromium.win
tryserver.client.catapult
tryserver.client.custom_tabs_client
tryserver.client.mojo
tryserver.client.pdfium
tryserver.client.syzygy
tryserver.libyuv
tryserver.nacl
tryserver.webrtc
Restarting all masters except official now (inc. ChromeOS, since it's before 10am)
Project Member

Comment 9 by bugdroid1@chromium.org, Aug 4 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager/+/ca1bceea830dc05cdc1cfa54875f7faa4085edbb

commit ca1bceea830dc05cdc1cfa54875f7faa4085edbb
Author: Ryan Tseng <hinoka@google.com>
Date: Fri Aug 04 16:28:24 2017

probable cause: crbug.com/697545 (key intentionally revoked around 10am Thu)

If the original refresh token was invalidated and a new one was then written to disk, Buildbot keeps using the old in-memory token and does not load the new one until it is restarted.
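One way to avoid needing a restart in this situation is to re-read the credential file when a refresh fails. The sketch below is not how deferred_resource.py or oauth2client behave; it only illustrates the idea using the standard library. `ReloadingCredentials`, `RefreshError`, `fake_refresh`, and the JSON key layout are all hypothetical names introduced for the example.

```python
import json
import os
import tempfile


class RefreshError(Exception):
    """Stand-in for an auth failure such as invalid_grant."""


class ReloadingCredentials:
    """Caches a key loaded from disk and re-reads it when a refresh fails."""

    def __init__(self, path, refresh_fn):
        self._path = path
        # Hypothetical callable: takes the key, returns an access token,
        # and raises RefreshError if the key is rejected.
        self._refresh_fn = refresh_fn
        self._key = self._load()

    def _load(self):
        with open(self._path) as f:
            return json.load(f)

    def refresh(self):
        try:
            return self._refresh_fn(self._key)
        except RefreshError:
            # The cached key may have been revoked; check what is on disk now.
            new_key = self._load()
            if new_key == self._key:
                raise  # Nothing new on disk, so a retry would fail the same way.
            self._key = new_key
            return self._refresh_fn(self._key)


def fake_refresh(key):
    """Simulated token endpoint that rejects the revoked key."""
    if key["id"] == "revoked":
        raise RefreshError("invalid_grant: Invalid JWT Signature.")
    return "token-for-" + key["id"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "key.json")
    with open(path, "w") as f:
        json.dump({"id": "revoked"}, f)
    creds = ReloadingCredentials(path, fake_refresh)  # loads the old key
    with open(path, "w") as f:
        json.dump({"id": "rotated"}, f)  # key is swapped while the process runs
    token = creds.refresh()

print(token)
```

The reload happens only on failure, so the normal path costs nothing extra, and an unchanged file still surfaces the original error instead of looping.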
Labels: -Pri-0 Pri-1
Only official masters and chromium.perf are stuck now, lowering to P1
Cc: kbr@chromium.org sunn...@chromium.org
In the future, can you cc the sheriffs for the masters being restarted?
Will do. I sent the PSA out to chrome-team@, but I can explicitly include sheriffs next time.
Blockedon: 752632

Comment 15 by d...@chromium.org, Aug 7 2017

Issue 752949 has been merged into this issue.
Status: Fixed (was: Started)
