Alert if important PFQ and release builders don't run for X number of hours
Issue description

On my deputy shifts, I've handled issues from our client teams pointing out that some release builder hasn't run in days. Example: issue 819357.

The root cause is that the buildslave dies for some reason, and because we have a 1:1 mapping between these builders and the buildslaves, the builder is left with no slave to run on.

We should add viceroy alerts for the important PFQ / Android PFQ / release builders not running at all for, say, 1 day. This situation doesn't arise for paladins, where we'd notice immediately.
Mar 12 2018
1) Master could own this metric and increment an "important board didn't run" counter.
2) Master could export an "important build existence" metric, which we could join on the build run metric (this is analogous to the prod role metric).
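A minimal sketch of option 2, to make the join concrete. It assumes the existence metric boils down to a set of important builder names and the build run metric to a last-run timestamp per builder. The function name and the builder names are hypothetical, and this is not the chromite or monitoring implementation.

```python
from datetime import datetime, timedelta


def find_stale_builders(important_builders, last_run_times, now, max_age):
    """Return important builders with no run within max_age of now.

    important_builders: set of builder names (the "existence" side).
    last_run_times: dict of builder name -> datetime of last run (the "run" side).
    """
    stale = []
    for builder in sorted(important_builders):
        last_run = last_run_times.get(builder)  # None if no run was ever reported
        if last_run is None or now - last_run > max_age:
            stale.append(builder)
    return stale


# Hypothetical data: one healthy builder, one that stopped 3 days ago,
# and one that exists in config but never reported a run.
now = datetime(2018, 3, 12, 12, 0)
important = {"samus-release", "eve-release", "veyron-android-pfq"}
last_runs = {
    "samus-release": now - timedelta(hours=3),
    "eve-release": now - timedelta(days=3),
}

stale = find_stale_builders(important, last_runs, now, timedelta(days=1))
print(stale)
```

The point of joining against the existence set is that a builder whose slave died reports nothing at all, so it can only be caught by comparing against the list of builders that should exist.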
Mar 12 2018
Mar 14 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/0ac72f67b0730bf4255f555b13d7ee34960e6f68

commit 0ac72f67b0730bf4255f555b13d7ee34960e6f68
Author: Aviv Keshet <akeshet@chromium.org>
Date: Wed Mar 14 21:21:49 2018

completion_stages: add a has_important_slave metric to master completion

BUG=chromium:819419
TEST=None

Change-Id: I5d9ecda99040ba134ab5d013d7997a99038bc327
Reviewed-on: https://chromium-review.googlesource.com/961279
Commit-Ready: Aviv Keshet <akeshet@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Paul Hobbs <phobbs@google.com>

[modify] https://crrev.com/0ac72f67b0730bf4255f555b13d7ee34960e6f68/cbuildbot/stages/completion_stages.py
Mar 16 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/b8e7c2d9a192a0e7c441c6f34284fea6ef68dcd5

commit b8e7c2d9a192a0e7c441c6f34284fea6ef68dcd5
Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Fri Mar 16 22:50:01 2018

Roll src/third_party/chromite/ 3b75c9d82..3ad8f333d (31 commits)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/3b75c9d82ebf..3ad8f333d567

$ git log 3b75c9d82..3ad8f333d --date=short --no-merges --format='%ad %ae %s'
2018-03-16 dgarrett Revert "Reland "pre_cq_launcher: Swarming for chromeos-infra-puppet-pre-cq.""
2018-03-16 dgarrett Reland "pre_cq_launcher: Swarming for chromeos-infra-puppet-pre-cq."
2018-03-14 ayatane autotest-pre-cq: Remove builder and stage [2/2]
2018-03-16 dgarrett Revert "pre_cq_launcher: Swarming for chromeos-infra-puppet-pre-cq."
2018-03-15 dgarrett chromeos_config: Move fuzzer builds into new bucket.
2018-03-16 dgarrett Revert "commands: RunBranchUtilTest -> RunLocalTryjob"
2018-03-13 dgarrett pre_cq_launcher: Swarming for chromeos-infra-puppet-pre-cq.
2018-02-07 dgarrett commands: RunBranchUtilTest -> RunLocalTryjob
2018-03-14 dgarrett cbuildbot_run: Switch more build links to Legoland.
2018-03-13 dgarrett swarming_lib: Remove SWARMING_TASK_ID from cmds.
2018-03-08 dgarrett moblab_vm_unitest: Fix lint issues.
2018-03-14 ihf chromeos_config: add more arcnext experimental coverage.
2018-03-14 ayatane autotest-pre-cq: Remove this [1/2]
2018-03-14 norvez chromeos_config: remove dead code
2018-03-09 dgarrett summarize_build_stats: Add blank line at beginning.
2018-01-09 dgarrett cros tryjob: Remove buildbot URL generation.
2017-09-14 craigb image_test: Remove check that kernel is not ELF.
2018-03-15 ihf Revert "chromeos_config: temporarily mark eve-arcnext-paladin experimental"
2018-03-15 ihf Revert "chromeos_config: temporarily experimental eve-arcnext-mst-android-pfq"
2018-03-13 lhchavez chromeos_config: Add betty-arcnext builder config
2018-03-13 achuith cbuildbot: Add missing files to index.
2018-03-13 akeshet completion_stages: add a has_important_slave metric to master completion
2018-03-13 dgarrett precq-launcher: Start using Legoland build details page.
2018-03-08 dgarrett chromite-pre-cq: Disable CidbIntegrationTest.
2018-03-14 akeshet chromeos_config: temporarily experimental eve-arcnext-mst-android-pfq
2018-03-13 akeshet chromeos_config: temporarily mark eve-arcnext-paladin experimental
2018-03-12 haddowk [chromite] Make guado_moblab important again
2018-03-13 chrome-bot Update config settings by config-updater.
2018-03-12 gmeinke chromium-config: replace cros_config_host_py in chromite
2018-03-12 yunlian Enable ThinLTO on all AMD64 boards.
2018-03-12 achuith cbuildbot: Log timing of GenerateUploadJSON.

Created with:
roll-dep src/third_party/chromite
BUG=821930, 822517 , 821615 ,None,821618,821227,None,821664,821930,None,815377,747385,461595,821664,821664,811989,819419,821618,820305,821664,821664,819017,813442,707803,811989

The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.

TBR=chrome-os-gardeners@chromium.org

Change-Id: Ib6aaddf338307e994865a092ecb322a432148692
Reviewed-on: https://chromium-review.googlesource.com/967273
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#543855}

[modify] https://crrev.com/b8e7c2d9a192a0e7c441c6f34284fea6ef68dcd5/DEPS
Mar 19 2018
1 more CL to go
Mar 26 2018
Alert expected this week.
Mar 29 2018
This looks like a good "how many times has important slave run in 1 day" counter. It appears to also work for 0-values. http://shortn/_zFmy2bvS87
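For illustration, a rough sketch of why the 0-value behavior matters for this kind of counter. It assumes run events arrive as (builder, timestamp) pairs and that the list of important builders is known separately. All names and data here are hypothetical, and the sketch does not reproduce the actual monitoring query.

```python
from datetime import datetime, timedelta


def runs_per_builder(important_builders, run_events, now, window=timedelta(days=1)):
    """Count runs per important builder inside the trailing window.

    Every important builder is seeded with 0 so that a builder with no
    events still shows up in the result and can trigger an alert.
    """
    counts = {builder: 0 for builder in important_builders}
    for builder, timestamp in run_events:
        age = now - timestamp
        if builder in counts and timedelta(0) <= age <= window:
            counts[builder] += 1
    return counts


# Hypothetical events: one builder ran twice today, one last ran 2 days ago.
now = datetime(2018, 3, 29, 12, 0)
events = [
    ("samus-release", now - timedelta(hours=2)),
    ("samus-release", now - timedelta(hours=20)),
    ("eve-release", now - timedelta(days=2)),
]
counts = runs_per_builder({"samus-release", "eve-release"}, events, now)
silent = sorted(b for b, n in counts.items() if n == 0)
print(counts, silent)
```

Without the explicit zero entries, a builder that never ran would simply be absent from the result rather than reported with a count of 0.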
Mar 30 2018
+jclinton FYI, I'm adding this alert in https://critique.corp.google.com/#review/191011875
Mar 30 2018
Apr 2 2018
Comment 1 by pprabhu@chromium.org, Mar 7 2018