Findit filed too many bugs for different tests of the same root cause |
||||||||
Issue descriptionConcern raised that Findit filed too many bugs for different tests of the same root cause. Example https://bugs.chromium.org/p/chromium/issues/detail?id=900392#c8 Initial investigation shows that this is triggered by the new logic to file bugs for flakes hidden by "retry with patch".
,
Oct 30
Thanks. What does that mean for issue 900392 , though? Is that test actually failing somewhere, or was it a false positive? If it's failing, is there a way to get to the log where it happened?
,
Oct 30
Looking at the steps from https://cr-buildbucket.appspot.com/rpcexplorer/services/buildbucket.v2.Builds/GetBuild?request={%20%20%20%20%22id%22:%20%228931228707859609232%22,%20%20%20%20%22fields%22:%20%22id,status,builder,steps%22}, which is for https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win10_chromium_x64_rel_ng/127008 The 'interactive_ui_tests (with patch) on Windows-10-15063' step is actually failing. And one task containing the failed test is https://chromium-swarm.appspot.com/task?id=40de7b7b21f33910&refresh=10&show_raw=1 Hope this information helps.
,
Oct 30
Thanks! That gave a lead.
,
Oct 31
Issue 900507 has been merged into this issue.
,
Oct 31
I'm very concerned about FindIt filing individual bugs for huge numbers of tests with the same underlying failure type. In this case, there were at least 174 tests that failed on a global_browser_set_up_function_ assertion (see https://bugs.chromium.org/p/chromium/issues/detail?id=900392#c8). We have now seen at least 6 bugs get duped into either this or issue 900392 , all demanding that the individual tests get disabled. That seems like very much the wrong thing to do in a case like this. It is counter-productive to have sheriffs and engineers disable a wide swath of tests and reduce coverage for regressions when an underlying issue (either infra problems or a test framework problem like issue 711256) is to blame. Can FindIt be improved to notice the common failures across tests in cases like this to suggest a different course of action? (CC'ing gab@ and jam@ from issue 711256 for FYI, but I'm not sure who to CC or assign to on the FindIt side.)
,
Oct 31
dpranke@: Do you know who might be able to look at the FindIt overly-broad-disabling problem in comment 6?
,
Oct 31
Hi creis@, thank you for raising this concern. I'll start to look into this issue now.
,
Oct 31
Many thanks creis@ for the feedback! lijeffrey@ is working on auto-deduplicate flake bugs, and to file one bug per culprit/root cause that could have caused many flaky tests. This is a trade-off between filing bugs first for manual disabling to reduce the impacts on other developers and waiting for all culprit-finding analyses to complete before filing bugs. We'd like to hear your feedback too. The one-pager doc is https://docs.google.com/document/d/15vt2RcW6kRr8VS8No3YwXJhcZsjO4UfV88l7I3Qkphs/edit?ts=5bd8a28e As of the current policy for flaky tests, they should be disabled asap, and the culprit should be reverted instead if one is identified. We could also discuss how to adjust this policy to make it more reasonable to handle more scenarios.
,
Oct 31
,
Oct 31
,
Oct 31
Fantastic! I'm glad there's some work in progress to spot common culprits rather than mass-disabling tests, and I'm looking forward to that rolling out. I think it's worth adjusting the current policy a bit, if there's a way to officially exempt a test from being disabled when the cause is known to be unrelated to the test (and may affect multiple tests). More specifically, the current text (e.g., in issue 900392 ) is: "Unless the culprit CL is found and reverted, please disable this test first within 30 minutes then find an appropriate owner." I think we should expand that to cover the case that a known issue is affecting unrelated tests. Otherwise we'll end up with a lot of disabled tests that will (most likely) never get re-enabled, while ignoring the root cause. See issue 826599 for another case where we ran into exactly this issue. One other thing that may help on this front: is it possible to include the text of the test failure in the bug report (e.g., like what I posted in https://bugs.chromium.org/p/chromium/issues/detail?id=900392#c6)? Right now it's hard to search for bugs with the same root cause because most people don't put the test output in the bug itself. See all the bugs duped to issue 900392 as an example: it's not obvious why they're dupes unless you click deep into the reports. (In fact, you can't even do that on issue 900392 because of issue 900148 .) Thanks for your work on FindIt! It's definitely a useful tool overall, and hopefully cases like this can get smoothed out soon.
,
Nov 1
Many thanks again for all the feedback! They are very helpful! Re grouping flaky tests together: we will try to make more progress here, but this is a difficult problem with edge cases. * Difficult to make a good trade-off between timely bug filing (+kbr@ who wants to file bug first) and time-consuming analysis (for flakes, the regression range is unknown and is usually 100s of commits * Some flakes are not reproducible, and analyses can't identify the root cause -- culprit * Long-standing flakes: the flaky tests exist for a long time, and the culprit can't be identified (currently we look back 5000 commits in flake analysis, we could extend the range here, but there is no guarantee) One quick and most likely reliable solution here could be: file bug by the suite name instead of full test names. Example: for https://screenshot.googleplex.com/BvVGYnqkA4T.png, we file a bug for __main__.InstallerTest.* instead of for individual tests. WDYT? Re adding test failure logs to the bug report, I have wanted to add the feature for long, but we have other higher priority. I filed a separate bug 901018. We'd like to hear your feedback on the details, but let's continue the discussion there. Once details are settled down, we could make a small design and add it.
,
Nov 2
Another idea is: if a set of tests always fail together, we group them together and file one bug. A short doc on this proposal: https://docs.google.com/document/d/17yIXFBl5wEcxdiekWjp7CKIwhuoXh3aJxLA7cDHmxHc/edit?usp=sharing
,
Nov 12
Thanks Charlie for raising this issue. I agree that we should avoid mass disabling a broad set of unrelated tests because of a systemic flake (and that they will sadly rarely be re-enabled when fixed). I've complained about this in the past. There currently isn't a way to even ask the CQ to not retry (to identify flakes in a potentially dangerous change to a core system). At the moment the CL introducing flakiness makes it to the waterfall thanks to retries and then starts making tests fail. At least today : FindIt eventually spots it (assuming it's Chromium based) but I'm never confident it didn't leave a scattered set of tests disabled by overzealous sheriffs who took action upon "the test being flaky" (we've seen this in the past).
,
Nov 12
When unsure about the root cause, I think it's better to group more bugs together than not. It's much easier for sheriffs and devs to separate a single bug into many (and the FindIt UI can help there) than it it is to coalesce a hundred different bugs.
,
Dec 10
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/157fb0feab79011860102bb0c69ed3c8f02aab8d commit 157fb0feab79011860102bb0c69ed3c8f02aab8d Author: Chan <chanli@chromium.org> Date: Mon Dec 10 17:49:22 2018 [Findit] Add a new issue_generator FlakeDetectionGroupIssueGenerator. Bug: 900409 Change-Id: I88b516d6b95d295ac8304a653226a83aefc6b208 Reviewed-on: https://chromium-review.googlesource.com/c/1344854 Commit-Queue: Chan Li <chanli@chromium.org> Reviewed-by: Shuotao Gao <stgao@chromium.org> Reviewed-by: Yuke Liao <liaoyuke@chromium.org> Reviewed-by: Jeffrey Li <lijeffrey@chromium.org> Cr-Commit-Position: refs/heads/master@{#19447} [modify] https://crrev.com/157fb0feab79011860102bb0c69ed3c8f02aab8d/appengine/findit/services/issue_constants.py [modify] https://crrev.com/157fb0feab79011860102bb0c69ed3c8f02aab8d/appengine/findit/services/test/issue_generator_test.py [modify] https://crrev.com/157fb0feab79011860102bb0c69ed3c8f02aab8d/appengine/findit/services/test/flake_issue_util_test.py [modify] https://crrev.com/157fb0feab79011860102bb0c69ed3c8f02aab8d/appengine/findit/services/issue_generator.py
,
Dec 10
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/08e387cab43a0bb6c475b8a3917b5714517322f1 commit 08e387cab43a0bb6c475b8a3917b5714517322f1 Author: Chan <chanli@chromium.org> Date: Mon Dec 10 18:05:22 2018 [Findit] Group detected flakes before report to monorail. Group flakes by: 1. FlakeIssue if some flakes link to the same open issue 2. Heuristic if flakes don't link to issues or link to closed issues. Heuristic rules are: a. same luci_project, normalized_step_name and test_suite_name b. occurred in the same builds Bug: 900409 Change-Id: If3a23be2920ec722f08585a09600e893986d8733 Reviewed-on: https://chromium-review.googlesource.com/c/1351403 Commit-Queue: Chan Li <chanli@chromium.org> Reviewed-by: Yuke Liao <liaoyuke@chromium.org> Reviewed-by: Jeffrey Li <lijeffrey@chromium.org> Cr-Commit-Position: refs/heads/master@{#19448} [modify] https://crrev.com/08e387cab43a0bb6c475b8a3917b5714517322f1/appengine/findit/model/flake/test/flake_test.py [modify] https://crrev.com/08e387cab43a0bb6c475b8a3917b5714517322f1/appengine/findit/services/flake_issue_util.py [modify] https://crrev.com/08e387cab43a0bb6c475b8a3917b5714517322f1/appengine/findit/model/flake/flake.py [modify] https://crrev.com/08e387cab43a0bb6c475b8a3917b5714517322f1/appengine/findit/services/test/flake_issue_util_test.py
,
Dec 13
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/e1211d050dc8a631a4909d6702a9366339d0666b commit e1211d050dc8a631a4909d6702a9366339d0666b Author: Chan <chanli@chromium.org> Date: Thu Dec 13 00:26:05 2018 [Findit] Flake Detection: Report detected flakes to monorail in group. After having flake groups, create/update monorail bugs for each group. Bug: 900409 Change-Id: I845e6b099085e86e749db40089c529acb5a3bf87 Reviewed-on: https://chromium-review.googlesource.com/c/1350933 Commit-Queue: Chan Li <chanli@chromium.org> Reviewed-by: Yuke Liao <liaoyuke@chromium.org> Reviewed-by: Shuotao Gao <stgao@chromium.org> Reviewed-by: Jeffrey Li <lijeffrey@chromium.org> Cr-Commit-Position: refs/heads/master@{#19536} [modify] https://crrev.com/e1211d050dc8a631a4909d6702a9366339d0666b/appengine/findit/handlers/flake/detection/test/detect_flakes_test.py [modify] https://crrev.com/e1211d050dc8a631a4909d6702a9366339d0666b/appengine/findit/services/flake_issue_util.py [modify] https://crrev.com/e1211d050dc8a631a4909d6702a9366339d0666b/appengine/findit/services/monorail_util.py [modify] https://crrev.com/e1211d050dc8a631a4909d6702a9366339d0666b/appengine/findit/services/test/flake_issue_util_test.py [modify] https://crrev.com/e1211d050dc8a631a4909d6702a9366339d0666b/appengine/findit/handlers/flake/detection/detect_flakes.py |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by liaoyuke@chromium.org
, Oct 30