[LUCI-Feedback] Distinguishing new test failures from the same old failures
Issue description

At Dart, we're implementing a first-class test results workflow. Our systems can compare against the previous test results and against approved test failures that shouldn't turn builders green. We'd like to be able to notify developers when a builder has a new test failure, even if the builder is already red.

We currently use edge triggering with luci-notify, so developers know when builders turn red, but they are not notified about any additional test failures. Likewise, the Milo console shows red builds, where usually only the first build in a red column is interesting, but other interesting builds that add additional regressions can be hidden in a red column.

Buildbot supported the 'Failed Again' outcome, which Milo can still display, but the new LUCI recipes system doesn't let us use that color. If 'Failed Again' were supported, our recipe could readily determine whether there are new regressions and make the builder red, or make it orange ('failed again') if only the same test failures were seen. luci-notify could then be put in level-triggered mode and mail on every red build while ignoring the orange ones. Developers would then be notified when additional tests break even if the builder is already red, and it would be easy to spot interesting (red) builds on the Milo console.

Some extensibility with additional colors could also be useful. For instance, we have deflaking and could use a dark green color to signify a build that had flakes that were ignored. This is less important.

This feature isn't critical to us shipping first-class test results, but it will improve how well the workflow works and give our developers a better experience.
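To make the request concrete, here is a minimal sketch of the decision the recipe would like to express if a 'Failed Again' outcome were available. All names here are illustrative and hypothetical, not part of any real recipe API; the only assumption is that failure sets can be compared with the baseline described above.

```python
def classify_build(current_failures, baseline_failures):
    """Decide a build outcome by comparing failure sets.

    current_failures: set of test names failing in this build.
    baseline_failures: set of approved / previously seen failures.
    Returns 'RED' for new regressions, 'FAILED_AGAIN' (orange) when
    only already-known failures occurred, and 'GREEN' otherwise.
    """
    new_regressions = current_failures - baseline_failures
    if new_regressions:
        return "RED"           # level-triggered luci-notify mails on this
    if current_failures:
        return "FAILED_AGAIN"  # orange: same old failures, no mail
    return "GREEN"

# A build failing a known test plus a new one is a regression:
assert classify_build({"old_test", "new_test"}, {"old_test"}) == "RED"
# Only the known failure again: orange, not red.
assert classify_build({"old_test"}, {"old_test"}) == "FAILED_AGAIN"
```

With this split, luci-notify in level-triggered mode would mail on every RED build and skip FAILED_AGAIN, which is exactly the notification behavior described above.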
Dec 20
Would the recipe talk to some external service to retrieve the previous state?
Dec 20
Yes, we store our test results in cloud storage, and the recipe compares the current test results with a baseline. For the CQ, the baseline is the previous CI results, used to check for new regressions; for the CI, it is a set of approved test failures, used to check for any unapproved failures.
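The CQ/CI distinction above can be sketched as choosing a different baseline per mode and then taking a set difference. This is an illustrative sketch only; the function and parameter names are hypothetical, and fetching results from cloud storage is elided.

```python
def select_baseline(mode, previous_ci_results, approved_failures):
    """Pick the comparison baseline for a build (hypothetical helper).

    CQ builds compare against the previous CI results to find new
    regressions; CI builds compare against approved failures to find
    unapproved ones. Both arguments are sets of test names already
    fetched from cloud storage.
    """
    if mode == "cq":
        return previous_ci_results
    if mode == "ci":
        return approved_failures
    raise ValueError("unknown mode: %s" % mode)

def unexpected_failures(current_failures, baseline):
    # Failures not explained by the baseline are the interesting ones.
    return current_failures - baseline

# CI example: one approved failure, one genuinely new failure.
baseline = select_baseline("ci", set(), {"known_failure"})
assert unexpected_failures({"known_failure", "new_failure"}, baseline) == {"new_failure"}
```

Whether `unexpected_failures` is empty is what would drive the red versus orange ('failed again') choice requested in this issue.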
Dec 21
Adding representatives of other teams. How do you solve this problem today, or how would you need it to be solved?
Dec 21
(monorail on phone is hard)
Jan 7
Browser infrastructure doesn't have a good way to handle this case. As designed, I believe SoM is supposed to obviate the need for this for the sheriff use case, though I'm not sure how well that works in practice (and it only addresses the sheriff use case, of course). I do think this would be a useful capability.
Comment 1 by sortie@google.com, Dec 20