
Issue 874228


Issue metadata

Status: Fixed
Closed: Aug 22
Pri: 1




[Findit] Flake Analyzer - 70% confidence set for 100% -> 99.8% passing

Project Member Reported by tandrii@chromium.org, Aug 14

Issue description

I suspect it's because I've erased a bunch of commits, which might have made it into the database anyway.
Owner: lijeffrey@chromium.org
Status: Assigned (was: Unconfirmed)
Summary: [Findit] Flake Analyzer - 70% confidence set for 100% -> 99.8% passing (was: [Findit] Flake Analyzer - Wrong result for http/tests/media/media-source/mediasource-duration.html)
This should in no way have been 70% confidence. Findit just detects a transition from 100% stable to anything flaky and labels it 70% confidence, which worked OK when the flakiness threshold was 98%. However, we now use 99.9999% as the flakiness threshold, so the confidence score needs to be redesigned slightly, possibly with some statistical analysis, to avoid false positives like this one and the bugs that get logged because of them.
Can we make a quick change to get rid of the hard-coded 70% for now?
Labels: -Pri-0 Pri-1
Auto actions (bug filing, updating bugs, notifying culprits) are temporarily disabled until a better confidence scoring mechanism is in place.

Setting pri back to 1
Project Member

Comment 7 by bugdroid1@chromium.org, Aug 22

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/2d86ca27796637f369b47a6fc957e280093681fb

commit 2d86ca27796637f369b47a6fc957e280093681fb
Author: Jeffrey Li <lijeffrey@chromium.org>
Date: Wed Aug 22 18:20:22 2018

[Findit] Flake Analyzer - Implementing statistical analysis for confidence score

Low-flakiness cases can cause a lot of false positives, because statistically
the "stable" point preceding the low-flaky point can be a fluke. For example,
a test with a 0.9975 pass rate still has a 36.7% chance of passing 400 iterations
which will yield a lot of false positives.
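
The 36.7% figure above can be reproduced with a short calculation. This is only an illustrative sketch: it assumes each iteration passes independently with the same probability, and the function name is hypothetical rather than taken from Findit.

```python
def chance_of_all_passing(pass_rate, iterations):
    """Probability that every one of `iterations` runs passes.

    Assumes runs are independent and share the same pass rate.
    """
    return pass_rate ** iterations


# A test with a 0.9975 pass rate, run 400 times.
probability = chance_of_all_passing(0.9975, 400)
print(f"{probability:.1%}")  # 36.7%
```

In other words, roughly one in three such "100% stable" points would really be a low-flakiness point that happened to pass every run.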

1. Use the Wilson Score Confidence Interval to identify a range of likely pass rates
   that a supposedly flaky test can have.
2. Identify the possible ranges of the "stable" point and "flaky" point, using the
   "flaky" point's pass rate as the input p value, alpha as 0.001 for 99.9%
   confidence that the true pass rate is indeed within that interval, and the number
   of iterations the stable point ran to produce its supposed 100% pass rate.
3. If there is any overlap in the 2 ranges, then there is a statistically significant
   chance that the culprit is a false positive as the stable point is unreliable.
4. Assign a very low "confidence score" of the analysis for such cases, so the calling
   code can bail out of performing auto actions. Note here, "confidence score" still
   refers to Findit's scoring mechanism on what to do with the culprit, and is not yet
   the same as "confidence" in pure statistics, though that is where we would like to
   head.
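
The steps above can be sketched roughly as follows. This is not the code from the commit: the function names are hypothetical, and the way the two ranges are built (a Wilson interval around the flaky pass rate and another around a 1.0 pass rate, both using the stable point's iteration count) is one reading of the description.

```python
import math
from statistics import NormalDist


def wilson_interval(pass_rate, iterations, alpha=0.001):
    """Wilson score confidence interval for an observed pass rate."""
    # Two-sided critical value; alpha=0.001 corresponds to 99.9% confidence.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    z2 = z * z
    denominator = 1 + z2 / iterations
    center = (pass_rate + z2 / (2 * iterations)) / denominator
    half_width = z * math.sqrt(
        pass_rate * (1 - pass_rate) / iterations
        + z2 / (4 * iterations ** 2)) / denominator
    # Clamp to [0, 1] to absorb floating-point drift at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


def ranges_overlap(first, second):
    """True if two closed (lower, upper) ranges share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


def stable_point_is_unreliable(flaky_pass_rate, stable_iterations,
                               alpha=0.001):
    """True if the stable and flaky pass-rate ranges overlap (assumed rule)."""
    flaky_range = wilson_interval(flaky_pass_rate, stable_iterations, alpha)
    stable_range = wilson_interval(1.0, stable_iterations, alpha)
    return ranges_overlap(flaky_range, stable_range)


# Illustrative inputs only: a 99.8% flaky point after a 400-iteration
# stable point, then a clearly flaky 50% point for contrast.
print(stable_point_is_unreliable(0.998, 400))  # True
print(stable_point_is_unreliable(0.5, 400))    # False
```

Under these assumptions, a caller would assign a very low confidence score whenever `stable_point_is_unreliable` returns True, so that auto actions are skipped for that culprit.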

Bug: 874228
Change-Id: I6d2e8b6ee864a68353c9d449adfd86e7e5dd2ac4
Reviewed-on: https://chromium-review.googlesource.com/1182191
Commit-Queue: Jeffrey Li <lijeffrey@chromium.org>
Reviewed-by: David Tu <dtu@chromium.org>

[add] https://crrev.com/2d86ca27796637f369b47a6fc957e280093681fb/appengine/findit/dto/float_range.py
[modify] https://crrev.com/2d86ca27796637f369b47a6fc957e280093681fb/appengine/findit/services/flake_failure/confidence_score_util.py
[modify] https://crrev.com/2d86ca27796637f369b47a6fc957e280093681fb/appengine/findit/services/flake_failure/confidence.py
[modify] https://crrev.com/2d86ca27796637f369b47a6fc957e280093681fb/appengine/findit/services/flake_failure/test/pass_rate_util_test.py
[add] https://crrev.com/2d86ca27796637f369b47a6fc957e280093681fb/appengine/findit/services/math_util.py
[add] https://crrev.com/2d86ca27796637f369b47a6fc957e280093681fb/appengine/findit/services/test/math_util_test.py
[add] https://crrev.com/2d86ca27796637f369b47a6fc957e280093681fb/appengine/findit/libs/math/test/statistics_test.py
[modify] https://crrev.com/2d86ca27796637f369b47a6fc957e280093681fb/appengine/findit/services/flake_failure/test/confidence_score_util_test.py
[add] https://crrev.com/2d86ca27796637f369b47a6fc957e280093681fb/appengine/findit/libs/math/statistics.py
[modify] https://crrev.com/2d86ca27796637f369b47a6fc957e280093681fb/appengine/findit/services/flake_failure/flake_constants.py

Status: Fixed (was: Assigned)
