
Issue 840074

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




False positive for flaky test CancelResumedDownload

Project Member Reported by st...@chromium.org, May 5 2018

Issue description

Comment 1 by st...@chromium.org, May 13 2018

Labels: Test-Findit-Wrong

Comment 2 by st...@chromium.org, May 13 2018

Labels: -Findit-Incorrect-Result

Comment 3 by bugdroid1@chromium.org, May 24 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/b4c0dcb9d49688898db4c86b9cc3f59f2e7babb3

commit b4c0dcb9d49688898db4c86b9cc3f59f2e7babb3
Author: Jeffrey Li <lijeffrey@chromium.org>
Date: Thu May 24 22:48:24 2018

[Findit] Flake Analyzer - Avoid false positives with better confidence score

1. Raise the bar for stable --> flaky points by requiring 2+ fully stable
   points (100% or 0%, instead of 98%) before considering a point for 70%
   confidence, to avoid filing bugs or notifying culprits that are unlikely
   to be correct. Requiring 2 stable points in a row is still strict enough
   to avoid most false positives without bailing out unnecessarily.
2. If the data point is proposed to have 70% confidence, use max(.7, steppiness)
   so that culprits are not all scored at exactly 100% or 70%, which looks silly.
3. Relax the requirement of 3+ fully-stable --> flaky points to just 2 for
   notifying culprits, since bugs are still filed regardless. Filing bugs and
   sending notifications should follow similar criteria, since filed bugs are
   usually assigned back to the CL owner anyway for an initial investigation.
4. Fall back to steppiness in all other cases.

With this change, a culprit is assigned a 0.7+ confidence score only when it is
preceded by 2+ fully-stable points, so a few more false negatives may be
observed, but the trade-off is a large reduction in false positives.
Historically, many analyses showed pass rate patterns of 99% -> 100% -> 75% that
led to incorrect results, whereas patterns of 100% -> 100% -> 75% were much more
reliable. In the former case, steppiness is now the primary scoring mechanism and
assigns a lower confidence score; in the latter, a score of 0.7+ is used.
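
The scoring rule described above can be sketched roughly as follows. This is
an illustrative sketch only, not the code from the commit: the names
confidence_score, is_fully_stable, MIN_STABLE_POINTS and BASE_CONFIDENCE are
hypothetical, steppiness is taken as a value already computed by the caller,
and a "flaky" point is assumed to be any point that is not fully stable.

```python
from typing import Sequence

# Hypothetical constants for illustration; the real values live in Findit's
# flake_constants.py and may be named and structured differently.
MIN_STABLE_POINTS = 2    # fully-stable points required before the culprit
BASE_CONFIDENCE = 0.7    # floor used when the stable --> flaky pattern holds


def is_fully_stable(pass_rate: float) -> bool:
    """A point is fully stable only at exactly 0% or 100% pass rate."""
    return pass_rate in (0.0, 1.0)


def confidence_score(pass_rates_before: Sequence[float],
                     culprit_pass_rate: float,
                     steppiness: float) -> float:
    """Score a suspected culprit.

    pass_rates_before: pass rates of the data points preceding the suspected
        culprit, oldest first.
    culprit_pass_rate: pass rate at the suspected culprit.
    steppiness: precomputed steppiness score in [0, 1] (assumed input).
    """
    recent = pass_rates_before[-MIN_STABLE_POINTS:]
    preceded_by_stable_run = (
        len(recent) == MIN_STABLE_POINTS
        and all(is_fully_stable(rate) for rate in recent))

    # Assumption: the culprit point counts as flaky if it is not fully stable.
    if preceded_by_stable_run and not is_fully_stable(culprit_pass_rate):
        # Items 1 and 2: at least 0.7, higher if steppiness says so.
        return max(BASE_CONFIDENCE, steppiness)

    # Item 4: fall back to steppiness in all other cases.
    return steppiness
```

With a made-up steppiness of 0.4, the pattern 99% -> 100% -> 75% keeps the
steppiness score, because 99% is not fully stable, while 100% -> 100% -> 75%
is raised to the 0.7 floor.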

Bug: 840413, 840074

Change-Id: I57fb0e40fb60d39b2e5b44c018c7b030f9c080fc
Reviewed-on: https://chromium-review.googlesource.com/1069832
Commit-Queue: Jeffrey Li <lijeffrey@chromium.org>
Reviewed-by: Shuotao Gao <stgao@chromium.org>

[modify] https://crrev.com/b4c0dcb9d49688898db4c86b9cc3f59f2e7babb3/appengine/findit/services/flake_failure/culprit_util.py
[modify] https://crrev.com/b4c0dcb9d49688898db4c86b9cc3f59f2e7babb3/appengine/findit/services/flake_failure/pass_rate_util.py
[modify] https://crrev.com/b4c0dcb9d49688898db4c86b9cc3f59f2e7babb3/appengine/findit/services/flake_failure/confidence_score_util.py
[modify] https://crrev.com/b4c0dcb9d49688898db4c86b9cc3f59f2e7babb3/appengine/findit/services/flake_failure/data_point_util.py
[modify] https://crrev.com/b4c0dcb9d49688898db4c86b9cc3f59f2e7babb3/appengine/findit/services/flake_failure/test/culprit_util_test.py
[modify] https://crrev.com/b4c0dcb9d49688898db4c86b9cc3f59f2e7babb3/appengine/findit/services/flake_failure/test/confidence_score_util_test.py
[modify] https://crrev.com/b4c0dcb9d49688898db4c86b9cc3f59f2e7babb3/appengine/findit/services/flake_failure/flake_constants.py
