handle infra failure alerts better
Issue description

[from cit-sheriffing thread]

The right way to address infra failures in SoM might be:
- (Re)add a trooper-oriented tab that shows infra failures (perhaps grouped by tree, but in a single list).
- Then for sheriff tabs, add some indication of the *existence* of infra failures. It shouldn't crowd out the list of alerts that they can actually act on, though. Perhaps a single "There are X infra failures now affecting your tree. Troopers should be working on them. Click here for more details" expando somewhere conspicuous but not obstructive.
- And no "Show infra failures" checkbox.

Infra failures affect two different groups of oncallers who have somewhat overlapping responsibilities and needs for awareness, but different methods and abilities to fix things. A single checkbox is probably just too binary for this problem.
,
Sep 1 2016
,
Oct 5 2016
,
Oct 6 2016
Issue 653239 has been merged into this issue.
,
Oct 6 2016
I am interested in working on this, though I have been wondering what the best way is to pull all infra failures from the different trees. Should we just pull the alerts JSON for every tree and then parse through it on the client side? Or should we potentially change the way things are stored in the backend so this can be filtered from that end?
,
Oct 6 2016
That's a good question. We don't currently parse the alerts json on the SoM server side. It just stores the alerts as raw json data, so we don't have an easy way to construct an infra-only alerts feed on the server today. Short term, it might make more sense for the client to grab all of the trees' alerts when rendering /trooper, and filter to just show the infra failures by tree. Longer term, it would be better to issue one single request from the client for all infra failures across trees, but that'll require a lot more work on the backend.
,
Oct 18 2016
I am interested in working on this.
,
Oct 18 2016
Awesome! How about a short design doc and some mockups of the UI? Use those to hash out answers to some open questions:
- Whether to do the infra alerts filtering on the server or on the client.
- How the UI will change for sheriffs and troopers. For sheriffs: just link to infra alerts on the trooper page, or do progressive disclosure in-page? For troopers: visually group infra alerts by tree on the trooper page, or show one unified list somehow ordered by severity, number of builders affected, etc.?
,
Oct 18 2016
Sure, writing a mini design doc sounds good to me. I have a few rough initial ideas that I'll post here if that's alright:
- Implement the Trooper tab on the frontend the same way as a new tree named "Trooper" and take advantage of things like playbook linking and a Chrome Infra logo for the tree (and maybe infra-status.appspot.com added to the list of status apps?).
- The bug queue view is reused as the trooper queue for the trooper tab. It is adjusted to sort bugs by priority and maybe display a bit of extra info, but otherwise has few frontend changes. The main change will be in the backend which, when a request is made for the "trooper" bug queue, will look up the trooper-queue query on the Monorail API to get a list of bugs.
- Infra failures are shown in the Trooper tab and sorted by the tree they came from.
- Other trees will still show infra failures.

It would not be difficult to have the frontend just pull all alerts from every tree and filter for only infra failures. However, I think this is a feature we are not in a hurry to get out, so it would probably be better to try to do it the "good" way. How feasible/desirable would it be to restructure the Datastore storage for alerts to actually format the data based on the JSON format? If we did that, we would be able to look up alerts by whether they are infra failures.
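To make the "sort bugs by priority" idea concrete, here is a minimal Go sketch. It is not the SoM implementation: the `bug` struct, the `priority` and `sortByPriority` helpers, and the assumption that priority is carried in a `Pri-N` label are all illustrative, and the sample data is made up.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// bug is a hypothetical, trimmed-down issue record; a real Monorail
// response carries many more fields.
type bug struct {
	ID     int
	Title  string
	Labels []string
}

// unknownPriority makes bugs without a parsable Pri-N label sort last.
const unknownPriority = 99

// priority extracts N from a "Pri-N" label (an assumed label convention).
func priority(b bug) int {
	for _, l := range b.Labels {
		if strings.HasPrefix(l, "Pri-") {
			if n, err := strconv.Atoi(strings.TrimPrefix(l, "Pri-")); err == nil {
				return n
			}
		}
	}
	return unknownPriority
}

// sortByPriority orders bugs in place, most urgent first. The sort is
// stable, so bugs with equal priority keep their original relative order.
func sortByPriority(bugs []bug) {
	sort.SliceStable(bugs, func(i, j int) bool {
		return priority(bugs[i]) < priority(bugs[j])
	})
}

func main() {
	// Made-up sample queue.
	queue := []bug{
		{ID: 1, Title: "slow bot", Labels: []string{"Pri-2"}},
		{ID: 2, Title: "unlabeled request"},
		{ID: 3, Title: "master down", Labels: []string{"Infra-Troopers", "Pri-0"}},
	}
	sortByPriority(queue)
	for _, b := range queue {
		fmt.Printf("P%d #%d %s\n", priority(b), b.ID, b.Title)
	}
}
```

Sorting on the server keeps the frontend bug queue element unchanged; the same comparison could equally live in the client if the queue is reused for other trees with different ordering needs.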
,
Oct 19 2016
Regarding the server-side construction of the feed: Restructuring the JSON format to fit into the datastore is going to be a lot of work. It's necessary and desirable, but we shouldn't block this work on that. A third option would be to have the server pull all of the current trees' alerts from the datastore, parse the JSON blobs on the server, and then filter for infra failures before returning the infra failure feed. You should be able to query datastore for all trees' alerts and, for each, parse AlertsJSON.Content into an infra/monitoring/messages.AlertsSummary struct. Then select the infra failures from there.
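The third option could look roughly like the Go sketch below. It is only an illustration: `alert` and `alertsSummary` are simplified local stand-ins for the real infra/monitoring/messages types, the `type` field and its `"infra-failure"` value are assumptions, and the datastore query is replaced by an in-memory map from tree name to raw JSON blob.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// alert and alertsSummary are hypothetical, simplified stand-ins for the
// types in infra/monitoring/messages; the field names are assumptions.
type alert struct {
	Key   string `json:"key"`
	Title string `json:"title"`
	Type  string `json:"type"`
}

type alertsSummary struct {
	Alerts []alert `json:"alerts"`
}

// infraFailureType is an assumed marker value for infra failure alerts.
const infraFailureType = "infra-failure"

// infraFailuresByTree parses each tree's raw alerts blob (the equivalent of
// AlertsJSON.Content) and keeps only the infra failures, grouped by tree.
// Trees with no infra failures are omitted from the result.
func infraFailuresByTree(blobs map[string][]byte) (map[string][]alert, error) {
	out := make(map[string][]alert)
	for tree, blob := range blobs {
		var summary alertsSummary
		if err := json.Unmarshal(blob, &summary); err != nil {
			return nil, fmt.Errorf("parsing alerts for tree %q: %v", tree, err)
		}
		for _, a := range summary.Alerts {
			if a.Type == infraFailureType {
				out[tree] = append(out[tree], a)
			}
		}
	}
	return out, nil
}

func main() {
	// Made-up blobs standing in for what a datastore query would return.
	blobs := map[string][]byte{
		"chromium": []byte(`{"alerts":[
			{"key":"a1","title":"compile failed","type":"build-failure"},
			{"key":"a2","title":"bot_update failed","type":"infra-failure"}]}`),
		"v8": []byte(`{"alerts":[]}`),
	}
	infra, err := infraFailuresByTree(blobs)
	if err != nil {
		panic(err)
	}
	// Print trees in a deterministic order.
	trees := make([]string, 0, len(infra))
	for t := range infra {
		trees = append(trees, t)
	}
	sort.Strings(trees)
	for _, t := range trees {
		for _, a := range infra[t] {
			fmt.Printf("%s: %s (%s)\n", t, a.Title, a.Key)
		}
	}
}
```

Grouping by tree in the result leaves the choice between a per-tree view and a single unified list to whatever renders the feed.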
,
Oct 31 2016
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra.git/+/43af24ed68d16c83f4227d961dfe452db3820f0c commit 43af24ed68d16c83f4227d961dfe452db3820f0c Author: Tiffany Zhang <zhangtiff@google.com> Date: Fri Oct 28 23:19:45 2016 SoM: Make bug queue labels more general + some setup for Trooper tab. BUG= 637006 Change-Id: I435175a61f5c066a11734305bd20c772cc7faa64 Reviewed-on: https://chromium-review.googlesource.com/404938 Commit-Queue: Tiffany Zhang <zhangtiff@chromium.org> Reviewed-by: Sean McCullough <seanmccullough@chromium.org> [modify] https://crrev.com/43af24ed68d16c83f4227d961dfe452db3820f0c/go/src/infra/appengine/sheriff-o-matic/cron.yaml [modify] https://crrev.com/43af24ed68d16c83f4227d961dfe452db3820f0c/go/src/infra/appengine/sheriff-o-matic/elements/som-tree-status.html [modify] https://crrev.com/43af24ed68d16c83f4227d961dfe452db3820f0c/go/src/infra/appengine/sheriff-o-matic/main.go
,
Nov 1 2016
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra.git/+/3e2fb96235a9498bcaef9e287c8d2751c418422b commit 3e2fb96235a9498bcaef9e287c8d2751c418422b Author: Stephen Martinis <martiniss@chromium.org> Date: Mon Oct 31 23:25:00 2016 Update relnotes for SOM BUG= 655234 , 655286 , 637006 Change-Id: I914008d6cb9e0c684ceaa53552b57e4700149f38 Reviewed-on: https://chromium-review.googlesource.com/405827 Reviewed-by: Tiffany Zhang <zhangtiff@chromium.org> Reviewed-by: Sean McCullough <seanmccullough@chromium.org> Commit-Queue: Stephen Martinis <martiniss@chromium.org> [modify] https://crrev.com/3e2fb96235a9498bcaef9e287c8d2751c418422b/go/src/infra/appengine/sheriff-o-matic/RELNOTES.md
,
Nov 4 2016
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra.git/+/e2f26edb637565f39cdd4c936e26b4305d4cf0c4 commit e2f26edb637565f39cdd4c936e26b4305d4cf0c4 Author: Tiffany Zhang <zhangtiff@google.com> Date: Fri Nov 04 18:04:44 2016 SoM: Set up server for trooper tab. BUG= 637006 Change-Id: Ib3ff3e65bb9c89ddcada896f2b9d8d14613d21a0 Reviewed-on: https://chromium-review.googlesource.com/406681 Commit-Queue: Tiffany Zhang <zhangtiff@chromium.org> Reviewed-by: Sean McCullough <seanmccullough@chromium.org> [modify] https://crrev.com/e2f26edb637565f39cdd4c936e26b4305d4cf0c4/go/src/infra/appengine/sheriff-o-matic/cron.yaml [modify] https://crrev.com/e2f26edb637565f39cdd4c936e26b4305d4cf0c4/go/src/infra/appengine/sheriff-o-matic/elements/som-bug-queue.html [modify] https://crrev.com/e2f26edb637565f39cdd4c936e26b4305d4cf0c4/go/src/infra/appengine/sheriff-o-matic/main.go [modify] https://crrev.com/e2f26edb637565f39cdd4c936e26b4305d4cf0c4/go/src/infra/appengine/sheriff-o-matic/main_test.go
,
Nov 15 2016
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra.git/+/fe955134226cf50a39b73dfab21c5753f08ef692 commit fe955134226cf50a39b73dfab21c5753f08ef692 Author: Tiffany Zhang <zhangtiff@google.com> Date: Sat Nov 12 00:42:25 2016 SoM: Polish for trooper tab. BUG= 637006 Change-Id: Ie2ba670c222c563e9cb7772904db0ba393396768 Reviewed-on: https://chromium-review.googlesource.com/410038 Reviewed-by: Sean McCullough <seanmccullough@chromium.org> Commit-Queue: Tiffany Zhang <zhangtiff@chromium.org> [modify] https://crrev.com/fe955134226cf50a39b73dfab21c5753f08ef692/go/src/infra/appengine/sheriff-o-matic/cron.yaml [modify] https://crrev.com/fe955134226cf50a39b73dfab21c5753f08ef692/go/src/infra/appengine/sheriff-o-matic/elements/som-app.html [modify] https://crrev.com/fe955134226cf50a39b73dfab21c5753f08ef692/go/src/infra/appengine/sheriff-o-matic/elements/som-bug-queue.html [modify] https://crrev.com/fe955134226cf50a39b73dfab21c5753f08ef692/go/src/infra/appengine/sheriff-o-matic/elements/som-drawer.html [modify] https://crrev.com/fe955134226cf50a39b73dfab21c5753f08ef692/go/src/infra/appengine/sheriff-o-matic/main.go [modify] https://crrev.com/fe955134226cf50a39b73dfab21c5753f08ef692/go/src/infra/appengine/sheriff-o-matic/main_test.go [modify] https://crrev.com/fe955134226cf50a39b73dfab21c5753f08ef692/go/src/infra/appengine/sheriff-o-matic/test/som-app-test.html
,
Nov 22 2016
,
Nov 24 2016
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra.git/+/1da8369a532291f8bcb2fbec66c0f314643d3ad8 commit 1da8369a532291f8bcb2fbec66c0f314643d3ad8 Author: Tiffany Zhang <zhangtiff@google.com> Date: Wed Nov 23 22:47:47 2016 SoM: Fix trooper bug query. BUG= 637006 Change-Id: Ife105d84669efd92d69ded3e30ff78470e26e6bd Reviewed-on: https://chromium-review.googlesource.com/414291 Reviewed-by: Stephen Martinis <martiniss@chromium.org> Reviewed-by: Sean McCullough <seanmccullough@chromium.org> Commit-Queue: Tiffany Zhang <zhangtiff@chromium.org> [modify] https://crrev.com/1da8369a532291f8bcb2fbec66c0f314643d3ad8/go/src/infra/appengine/sheriff-o-matic/som/bugqueue.go
,
Dec 7 2016
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra.git/+/a585e41ff38a29d20f80ed067294c5bd50b7ffc8 commit a585e41ff38a29d20f80ed067294c5bd50b7ffc8 Author: Tiffany Zhang <zhangtiff@google.com> Date: Wed Dec 07 20:19:43 2016 SoM: Enable caching for trooper queue by getting bugs in two steps. BUG= 637006 Change-Id: I617e8187a51f2aafae036d69eae0994c426af2b3 Reviewed-on: https://chromium-review.googlesource.com/417106 Reviewed-by: Sean McCullough <seanmccullough@chromium.org> Commit-Queue: Tiffany Zhang <zhangtiff@chromium.org> [modify] https://crrev.com/a585e41ff38a29d20f80ed067294c5bd50b7ffc8/go/src/infra/appengine/sheriff-o-matic/cron.yaml [modify] https://crrev.com/a585e41ff38a29d20f80ed067294c5bd50b7ffc8/go/src/infra/appengine/sheriff-o-matic/elements/som-bug-queue.html [modify] https://crrev.com/a585e41ff38a29d20f80ed067294c5bd50b7ffc8/go/src/infra/appengine/sheriff-o-matic/som/bugqueue.go [modify] https://crrev.com/a585e41ff38a29d20f80ed067294c5bd50b7ffc8/go/src/infra/appengine/sheriff-o-matic/som/main.go [modify] https://crrev.com/a585e41ff38a29d20f80ed067294c5bd50b7ffc8/go/src/infra/appengine/sheriff-o-matic/som/main_test.go [modify] https://crrev.com/a585e41ff38a29d20f80ed067294c5bd50b7ffc8/go/src/infra/appengine/sheriff-o-matic/test/som-bug-queue-test.html
,
Dec 13 2016
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra.git/+/b988aa67aac4701be960c74854c3d8c619a68461 commit b988aa67aac4701be960c74854c3d8c619a68461 Author: Tiffany Zhang <zhangtiff@google.com> Date: Tue Dec 13 23:36:19 2016 SoM: Fix getting owned trooper bugs + make trooper page title count based on trooper queue. BUG= 637006 Change-Id: I1395bea87b2b30536c82aa2d88864f9e71036f3c Reviewed-on: https://chromium-review.googlesource.com/419778 Commit-Queue: Tiffany Zhang <zhangtiff@chromium.org> Reviewed-by: Sean McCullough <seanmccullough@chromium.org> [modify] https://crrev.com/b988aa67aac4701be960c74854c3d8c619a68461/go/src/infra/appengine/sheriff-o-matic/elements/som-app.html [modify] https://crrev.com/b988aa67aac4701be960c74854c3d8c619a68461/go/src/infra/appengine/sheriff-o-matic/elements/som-bug-queue.html [modify] https://crrev.com/b988aa67aac4701be960c74854c3d8c619a68461/go/src/infra/appengine/sheriff-o-matic/test/som-bug-queue-test.html
,
Jan 27 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra.git/+/8f84797d9922a3d6f2253f4c1bc36993847d39f1 commit 8f84797d9922a3d6f2253f4c1bc36993847d39f1 Author: Tiff Zhang <zhangtiff@google.com> Date: Fri Jan 27 22:17:41 2017 SoM: Add bug priority category headers + more whitespace adjustments. BUG= 637006 BUG= 669254 Change-Id: Ie9ec0a48fc4a3c973c908dc0f36e18dd7fcd36f5 Reviewed-on: https://chromium-review.googlesource.com/433819 Commit-Queue: Tiffany Zhang <zhangtiff@chromium.org> Reviewed-by: Sean McCullough <seanmccullough@chromium.org> [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/elements/som-alert-item/som-alert-item.html [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/elements/som-app/som-app.html [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/elements/som-bug-queue/som-bug-queue.html [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/elements/som-bug-queue/som-bug-queue.js [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/elements/som-extension-build-failure/som-extension-build-failure.html [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/elements/som-swarming-bots/som-swarming-bots.html [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/test/som-bug-queue-test.html
,
Jun 8 2017
I think I should go ahead and close this, since we now have a trooper page and the remaining work on this bug is mostly polish. Some other related requests are better covered in more specific bugs.
,
Mar 8 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/8f84797d9922a3d6f2253f4c1bc36993847d39f1 commit 8f84797d9922a3d6f2253f4c1bc36993847d39f1 Author: Tiff Zhang <zhangtiff@google.com> Date: Fri Jan 27 22:29:45 2017 SoM: Add bug priority category headers + more whitespace adjustments. BUG= 637006 BUG= 669254 Change-Id: Ie9ec0a48fc4a3c973c908dc0f36e18dd7fcd36f5 Reviewed-on: https://chromium-review.googlesource.com/433819 Commit-Queue: Tiffany Zhang <zhangtiff@chromium.org> Reviewed-by: Sean McCullough <seanmccullough@chromium.org> [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/elements/som-alert-item/som-alert-item.html [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/elements/som-bug-queue/som-bug-queue.js [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/elements/som-app/som-app.html [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/elements/som-extension-build-failure/som-extension-build-failure.html [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/elements/som-swarming-bots/som-swarming-bots.html [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/test/som-bug-queue-test.html [modify] https://crrev.com/8f84797d9922a3d6f2253f4c1bc36993847d39f1/go/src/infra/appengine/sheriff-o-matic/elements/som-bug-queue/som-bug-queue.html |
Comment 1 by sheriffbot@chromium.org
, Aug 12 2016