Sheriff-o-matic complains about stale masters and offline builders and displays stale data
Issue description
Most of the day today, the SOM has been complaining about stale masters and offline builders, even though the builders appear to be otherwise healthy. It is also not noticing that the problem with the webkit builders has been resolved and some of the builders cycled green. There was a different, unrelated problem that caused some further failures, but that also has been fixed - in any case, the builders are still complaining about the old fixed problem and didn't notice the new failure at all (I only noticed it because I was checking builder status manually).
,
Oct 14 2016
The master data is stale. There's a problem with how we're storing the data in a caching layer we have.
,
Oct 14 2016
Moving discussion here...
Milo is having trouble storing builds from chromium.fyi and chromium.perf because the raw data is over 6MB and the compressed data is over 1MB, which exceeds the 1MB datastore limit.
Example from chromium.fyi:
Length of json data: 7385135
Length of gzipped data: 1121537
My current theory is that there are so many pending builds that they push the data over the limit. Right now the code sends at most 75 pending builds per builder. I'll reduce this to 25 to see if it helps.
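A minimal sketch of the size check and the trimming described above. The `builders` and `pendingBuildStates` field names, the helper names, and the exact limit constant are assumptions for illustration; the real layout in pubsub_json_status_push.py may differ.

```python
import gzip
import json

DATASTORE_LIMIT = 1024 * 1024  # assumed: 1 MiB per stored entity
MAX_PENDING = 25               # per-builder cap proposed in the comment


def gzipped_size(master):
    """Return (raw_bytes, gzipped_bytes) for the JSON-encoded master."""
    raw = json.dumps(master).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


def trim_pending(master, limit=MAX_PENDING):
    """Return a copy of master with at most `limit` pending builds per builder."""
    trimmed = dict(master)
    trimmed["builders"] = {
        # Hypothetical field name; missing lists are treated as empty.
        name: dict(builder,
                   pendingBuildStates=builder.get("pendingBuildStates", [])[:limit])
        for name, builder in master.get("builders", {}).items()
    }
    return trimmed


def fits_in_datastore(master):
    """True if the gzipped JSON is within the assumed datastore limit."""
    return gzipped_size(master)[1] <= DATASTORE_LIMIT
```

With the chromium.fyi numbers above, 1121537 gzipped bytes is larger than 1048576, so that master would fail this check until enough pending builds are trimmed.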
,
Oct 14 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build.git/+/363fb29ae42f5a475c3b93c857d0eed5ea58588d
commit 363fb29ae42f5a475c3b93c857d0eed5ea58588d
Author: hinoka <hinoka@chromium.org>
Date: Fri Oct 14 01:52:49 2016
Pubsub: Restrict full pending builds states to 25 per builder (from 75)
BUG= 655863
Review-Url: https://codereview.chromium.org/2422503002
[modify] https://crrev.com/363fb29ae42f5a475c3b93c857d0eed5ea58588d/scripts/master/pubsub_json_status_push.py
,
Oct 14 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/master-manager.git/+/1e31d26a87655c0c5deb8ece029f7091c5278fb2
commit 1e31d26a87655c0c5deb8ece029f7091c5278fb2
Author: hinoka <hinoka@google.com>
Date: Fri Oct 14 02:01:05 2016
,
Oct 14 2016
Should be fixed (for now)
>>> import json, urllib
>>> o = json.load(urllib.urlopen('http://chrome-build-extract.appspot.com/get_master/chromium.perf?json=true'))
>>> o['created']
u'2016-10-14T03:35:54.266699Z'
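The manual check above can be turned into a small freshness test. This is only a sketch: the `is_stale` helper and the 10-minute threshold are hypothetical, and it assumes the `created` field always uses the timestamp format shown in the output above.

```python
from datetime import datetime, timedelta, timezone


def is_stale(created, now=None, max_age=timedelta(minutes=10)):
    """Return True if the `created` timestamp is older than max_age.

    `created` is expected in the form '2016-10-14T03:35:54.266699Z' (UTC).
    The 10-minute default is an arbitrary, assumed threshold.
    """
    created_at = datetime.strptime(
        created, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at > max_age
```

Passing `now` explicitly keeps the check deterministic, for example when comparing against the time the master JSON was fetched.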
,
Oct 14 2016
This may happen again if we add like 30 more builders to chromium.perf, and each builder has 25 or more pending builds. But I'd expect the master to topple over way before that happens.
,
Oct 14 2016
Can we get some monitoring on those datastore insert failures?
,
Oct 14 2016
Should it be a ts_mon metric that sends master insertion events tagged with "success"/"failure"?
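A rough sketch of what counting insertion events tagged "success"/"failure" could look like. It deliberately uses a plain in-memory counter rather than the ts_mon API, and `InsertMetrics`, `store_master`, and the `put` callable are hypothetical names for illustration only.

```python
from collections import Counter


class InsertMetrics:
    """In-memory stand-in for a counter metric keyed by (master, status)."""

    def __init__(self):
        self._counts = Counter()

    def record(self, master, ok):
        status = "success" if ok else "failure"
        self._counts[(master, status)] += 1

    def count(self, master, status):
        return self._counts[(master, status)]


def store_master(name, blob, put, metrics):
    """Call put(name, blob) and record whether the insert succeeded.

    `put` is any callable that raises on failure (for example, when the
    payload is over the datastore size limit). Failures are re-raised
    after being counted.
    """
    try:
        put(name, blob)
    except Exception:
        metrics.record(name, ok=False)
        raise
    metrics.record(name, ok=True)
```

An alert on a rising failure count for a given master would have surfaced the chromium.fyi and chromium.perf insert failures without someone checking builder status by hand.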
,
Oct 19 2016
The following revision refers to this bug:
https://chromium.googlesource.com/external/github.com/luci/luci-go.git/+/39c1f5c6da051287a0e84f27c6f611181fecb925
commit 39c1f5c6da051287a0e84f27c6f611181fecb925
Author: hinoka <hinoka@google.com>
Date: Wed Oct 19 22:39:39 2016
Milo: Pubsub - Trim out pending build states if there are more than 25 per builder
BUG= 655863
Review-Url: https://chromiumcodereview.appspot.com/2421713003
[modify] https://crrev.com/39c1f5c6da051287a0e84f27c6f611181fecb925/milo/appengine/buildbot/pubsub.go
,
Oct 20 2016
,
Oct 20 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infra/infra_internal.git/+/12dcae3b065d4b21435a894be863aee471bc4e1c
commit 12dcae3b065d4b21435a894be863aee471bc4e1c
Author: hinoka <hinoka@google.com>
Date: Thu Oct 20 21:59:26 2016
,
Nov 4 2016
Stability patches have landed; this should be fixed.
,
Nov 18 2016
The following revision refers to this bug:
https://chromium.googlesource.com/external/github.com/luci/luci-go.git/+/7d92eccc01a9b20b65de213753701e35658ae57f
commit 7d92eccc01a9b20b65de213753701e35658ae57f
Author: hinoka <hinoka@google.com>
Date: Fri Nov 18 23:04:24 2016
Milo: Add ts_mon metrics for master json datastore success
BUG= 655863
Review-Url: https://codereview.chromium.org/2418063002
[modify] https://crrev.com/7d92eccc01a9b20b65de213753701e35658ae57f/milo/appengine/buildbot/pubsub.go
Comment 1 by dsansome@chromium.org
, Oct 14 2016