New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 900409 link

Starred by 1 user

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: ----

Blocked on:
issue 900148



Sign in to add a comment

Findit filed too many bugs for different tests of the same root cause

Project Member Reported by creis@chromium.org, Oct 30

Issue description

Blockedon: 900148
Hi creis@,

Thanks for filing the bug, I think this is due to  crbug.com/900148 : 'Milo reports spurious step results for completed builds'. I think FindIt is still WAI as it doesn't reply on the UI (milo).
Thanks.  What does that mean for  issue 900392 , though?  Is that test actually failing somewhere, or was it a false positive?  If it's failing, is there a way to get to the log where it happened?
Thanks!  That gave a lead.
 Issue 900507  has been merged into this issue.
Cc: jam@chromium.org gab@chromium.org
I'm very concerned about FindIt filing individual bugs for huge numbers of tests with the same underlying failure type.  

In this case, there were at least 174 tests that failed on a global_browser_set_up_function_ assertion (see https://bugs.chromium.org/p/chromium/issues/detail?id=900392#c8).  We have now seen at least 6 bugs get duped into either this or  issue 900392 , all demanding that the individual tests get disabled.

That seems like very much the wrong thing to do in a case like this.  It is counter-productive to have sheriffs and engineers disable a wide swath of tests and reduce coverage for regressions when an underlying issue (either infra problems or a test framework problem like issue 711256) is to blame.

Can FindIt be improved to notice the common failures across tests in cases like this to suggest a different course of action?

(CC'ing gab@ and jam@ from issue 711256 for FYI, but I'm not sure who to CC or assign to on the FindIt side.)
Cc: dpranke@chromium.org
dpranke@: Do you know who might be able to look at the FindIt overly-broad-disabling problem in comment 6?
Owner: chanli@chromium.org
Status: Started (was: Unconfirmed)
Hi creis@, thank you for raising this concern. I'll start to look into this issue now.
Summary: Findit filed too many bugs for different tests of the same root cause (was: [Findit] Flake Detection - Wrong result for SitePerProcessInteractiveBrowserTest.DocumentHasFocus)
Many thanks creis@ for the feedback!

lijeffrey@ is working on auto-deduplicate flake bugs, and to file one bug per culprit/root cause that could have caused many flaky tests.
This is a trade-off between filing bugs first for manual disabling to reduce the impacts on other developers and waiting for all culprit-finding analyses to complete before filing bugs.
We'd like to hear your feedback too. The one-pager doc is https://docs.google.com/document/d/15vt2RcW6kRr8VS8No3YwXJhcZsjO4UfV88l7I3Qkphs/edit?ts=5bd8a28e

As of the current policy for flaky tests, they should be disabled asap, and the culprit should be reverted instead if one is identified.
We could also discuss how to adjust this policy to make it more reasonable to handle more scenarios.
Labels: -Test-Findit-Wrong
Description: Show this description
Fantastic!  I'm glad there's some work in progress to spot common culprits rather than mass-disabling tests, and I'm looking forward to that rolling out.

I think it's worth adjusting the current policy a bit, if there's a way to officially exempt a test from being disabled when the cause is known to be unrelated to the test (and may affect multiple tests).  More specifically, the current text (e.g., in  issue 900392 ) is:
"Unless the culprit CL is found and reverted, please disable this test first
within 30 minutes then find an appropriate owner."

I think we should expand that to cover the case that a known issue is affecting unrelated tests.  Otherwise we'll end up with a lot of disabled tests that will (most likely) never get re-enabled, while ignoring the root cause.  See  issue 826599  for another case where we ran into exactly this issue.

One other thing that may help on this front: is it possible to include the text of the test failure in the bug report (e.g., like what I posted in https://bugs.chromium.org/p/chromium/issues/detail?id=900392#c6)?  Right now it's hard to search for bugs with the same root cause because most people don't put the test output in the bug itself.  See all the bugs duped to  issue 900392  as an example: it's not obvious why they're dupes unless you click deep into the reports.  (In fact, you can't even do that on  issue 900392  because of  issue 900148 .)

Thanks for your work on FindIt!  It's definitely a useful tool overall, and hopefully cases like this can get smoothed out soon.
Cc: jparent@chromium.org
Many thanks again for all the feedback! They are very helpful!

Re grouping flaky tests together: we will try to make more progress here, but this is a difficult problem with edge cases.
* Difficult to make a good trade-off between timely bug filing (+kbr@ who wants to file bug first) and time-consuming analysis (for flakes, the regression range is unknown and is usually 100s of commits
* Some flakes are not reproducible, and analyses can't identify the root cause -- culprit
* Long-standing flakes: the flaky tests exist for a long time, and the culprit can't be identified (currently we look back 5000 commits in flake analysis, we could extend the range here, but there is no guarantee)

One quick and most likely reliable solution here could be: file bug by the suite name instead of full test names.
Example: for https://screenshot.googleplex.com/BvVGYnqkA4T.png, we file a bug for __main__.InstallerTest.* instead of for individual tests.
WDYT?


Re adding test failure logs to the bug report, I have wanted to add the feature for long, but we have other higher priority.
I filed a separate bug 901018. We'd like to hear your feedback on the details, but let's continue the discussion there.
Once details are settled down, we could make a small design and add it.
Another idea is: if a set of tests always fail together, we group them together and file one bug. A short doc on this proposal: https://docs.google.com/document/d/17yIXFBl5wEcxdiekWjp7CKIwhuoXh3aJxLA7cDHmxHc/edit?usp=sharing


Thanks Charlie for raising this issue. I agree that we should avoid mass disabling a broad set of unrelated tests because of a systemic flake (and that they will sadly rarely be re-enabled when fixed).

I've complained about this in the past. There currently isn't a way to even ask the CQ to not retry (to identify flakes in a potentially dangerous change to a core system). At the moment the CL introducing flakiness makes it to the waterfall thanks to retries and then starts making tests fail.

At least today : FindIt eventually spots it (assuming it's Chromium based) but I'm never confident it didn't leave a scattered set of tests disabled by overzealous sheriffs who took action upon "the test being flaky" (we've seen this in the past).
When unsure about the root cause, I think it's better to group more bugs together than not. It's much easier for sheriffs and devs to separate a single bug into many (and the FindIt UI can help there) than it it is to coalesce a hundred different bugs.
Project Member

Comment 18 by bugdroid1@chromium.org, Dec 10

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/08e387cab43a0bb6c475b8a3917b5714517322f1

commit 08e387cab43a0bb6c475b8a3917b5714517322f1
Author: Chan <chanli@chromium.org>
Date: Mon Dec 10 18:05:22 2018

[Findit] Group detected flakes before report to monorail.

Group flakes by:
1. FlakeIssue if some flakes link to the same open issue
2. Heuristic if flakes don't link to issues or link to closed issues. Heuristic rules are:
  a. same luci_project, normalized_step_name and test_suite_name
  b. occurred in the same builds

Bug: 900409
Change-Id: If3a23be2920ec722f08585a09600e893986d8733
Reviewed-on: https://chromium-review.googlesource.com/c/1351403
Commit-Queue: Chan Li <chanli@chromium.org>
Reviewed-by: Yuke Liao <liaoyuke@chromium.org>
Reviewed-by: Jeffrey Li <lijeffrey@chromium.org>
Cr-Commit-Position: refs/heads/master@{#19448}
[modify] https://crrev.com/08e387cab43a0bb6c475b8a3917b5714517322f1/appengine/findit/model/flake/test/flake_test.py
[modify] https://crrev.com/08e387cab43a0bb6c475b8a3917b5714517322f1/appengine/findit/services/flake_issue_util.py
[modify] https://crrev.com/08e387cab43a0bb6c475b8a3917b5714517322f1/appengine/findit/model/flake/flake.py
[modify] https://crrev.com/08e387cab43a0bb6c475b8a3917b5714517322f1/appengine/findit/services/test/flake_issue_util_test.py

Sign in to add a comment