Alert on inventory run failures.
Reported by jrbarnette@chromium.org, Apr 23 2018
Issue description
We have new metrics and a new dashboard that depend on the inventory
runs completing more-or-less reliably once every 8 hours:
https://viceroy.corp.google.com/chromeos/untestable?duration=8d
If a run fails to complete, the data for that dashboard will show
empty tables. We need an alert that will fire when an inventory run
fails to produce metrics.
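
A minimal sketch of the staleness condition such an alert could check, assuming a timestamp of the last successful run is available. The 8-hour interval comes from the description above; the grace period and all names here are hypothetical and are not taken from the actual alert configuration.

```python
import time

# Runs are expected roughly once every 8 hours (from the issue description).
RUN_INTERVAL_SECONDS = 8 * 60 * 60
# Hypothetical slack so that a slow but healthy run does not fire the alert.
GRACE_SECONDS = 60 * 60


def inventory_is_stale(last_success_timestamp, now=None):
    """Return True if no successful run landed within the expected window."""
    if now is None:
        now = time.time()
    age = now - last_success_timestamp
    return age > RUN_INTERVAL_SECONDS + GRACE_SECONDS
```

An alert built on this condition fires on missing data rather than on an explicit failure signal, so it also covers the case where the run never starts.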
,
Apr 26 2018
,
Apr 30 2018
Possibly mine, but the assignment needs ratification.
,
Apr 30 2018
sounds related directly to your other work
,
May 7 2018
,
May 7 2018
Add service liveness metrics to lab_inventory (cl stack): https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1048608
,
May 9 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/51ad14e50d98a2b5d608cb10b8e1a6a38ac05ea8

commit 51ad14e50d98a2b5d608cb10b8e1a6a38ac05ea8
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed May 09 21:39:56 2018

lab_inventory: Let exceptions escape main()

Exceptions from the lab_inventory script should be allowed to escape
main so that callers can correctly handle the failure case.

BUG= chromium:835941
TEST=None

Change-Id: Ia75cda1c032cca31a5827cf56aeff2e564513515
Reviewed-on: https://chromium-review.googlesource.com/1048605
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/51ad14e50d98a2b5d608cb10b8e1a6a38ac05ea8/site_utils/lab_inventory.py
,
May 9 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/6b48edefd74f028d5ffba82007cd4aaa743af476

commit 6b48edefd74f028d5ffba82007cd4aaa743af476
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed May 09 21:39:57 2018

lab_inventory: Let --debug imply --debug-metrics

It is natural to not actually report metrics when lab_inventory is run
with --debug. This reduces some complexity in the script that was only
needed to support the weird use case where someone wants to run with
--debug, but still report metrics.

BUG= chromium:835941
TEST=None

Change-Id: Ieb3752d917737627173faee99d9b4087de68ea59
Reviewed-on: https://chromium-review.googlesource.com/1048606
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/6b48edefd74f028d5ffba82007cd4aaa743af476/site_utils/lab_inventory.py
,
May 9 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/58728f40fe2a952416efb40012ccafb87c755742

commit 58728f40fe2a952416efb40012ccafb87c755742
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed May 09 21:39:57 2018

lab_inventory: Flush metrics even in case of errors

This ensures that we will not drop metrics on the floor when exceptions
happen.

BUG= chromium:835941
TEST=None

Change-Id: Icbcb5e52e48b3eed4e5122906aab3772b844932f
Reviewed-on: https://chromium-review.googlesource.com/1048607
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/58728f40fe2a952416efb40012ccafb87c755742/site_utils/lab_inventory.py
,
May 9 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/b69a6cc8cedebdc2f1eaca6c45c8d71668aae694

commit b69a6cc8cedebdc2f1eaca6c45c8d71668aae694
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed May 09 21:39:58 2018

lab_inventory: Report service liveness and duration metrics

BUG= chromium:835941
TEST=Run with --debug

Change-Id: I9f925584facbe5e55ecb5268b47ddbfbe63bcdc9
Reviewed-on: https://chromium-review.googlesource.com/1048608
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Richard Barnette <jrbarnette@google.com>

[modify] https://crrev.com/b69a6cc8cedebdc2f1eaca6c45c8d71668aae694/site_utils/lab_inventory.py
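
The commits above describe a general pattern: let failures propagate to the caller, but always record a liveness tick and a duration, and always flush. The sketch below illustrates that pattern only. It is not the code from these CLs; `FakeMetrics`, `run_with_liveness`, and the metric names are hypothetical stand-ins, and the real script uses its own metrics library.

```python
import time


class FakeMetrics(object):
    """Hypothetical in-memory stand-in for a real metrics client."""

    def __init__(self):
        self.points = []
        self.flushed = False

    def record(self, name, value):
        self.points.append((name, value))

    def flush(self):
        self.flushed = True


def run_with_liveness(work, metrics, clock=time.time):
    """Run work(), reporting tick, success, and duration even on failure."""
    start = clock()
    success = False
    try:
        work()
        success = True
    finally:
        # No except clause: any exception still escapes to the caller,
        # but the metrics are recorded and flushed first.
        metrics.record('tick', 1)
        metrics.record('success', success)
        metrics.record('duration_seconds', clock() - start)
        metrics.flush()
```

With a tick emitted on every run, an alert can distinguish "the run failed" (tick present, success false) from "the run never happened" (no tick at all).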
,
May 11 2018
dashboard created with the tick metrics: cr/196286013
,
May 14 2018
pending final review for alerts
,
May 16 2018
Alerts landed in staging. I'll promote to prod once they fire a few times.
Comment 1 by jrbarnette@chromium.org
, Apr 23 2018