
Issue 669549

Starred by 2 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 667794
issue 667836



Hotlists containing this issue:
speed-bisect



Bisect - Try to bail out and report that a test was too noisy

Project Member Reported by simonhatch@chromium.org, Nov 29 2016

Issue description

In some bisects the values are incredibly noisy, bi-modal, etc. and we still try to continue ahead and bisect anyway.

Here are some examples:

https://bugs.chromium.org/p/chromium/issues/detail?id=669184#c7
https://bugs.chromium.org/p/chromium/issues/detail?id=667836



Copying my comment from issue 667836:

It feels like it should have bailed at some point and said that it couldn't reproduce the regression, but the initial run actually did happen to produce a clear regression:

Here are the values from Gathering Reference Values:

{
  "result": {
    "U": 2, 
    "p": 0.001947527585946629, 
    "significance": "REJECT"
  }, 
  "sampleA": [
    7674880, 
    7994368, 
    7932928, 
    8346624, 
    7887872, 
    7658496, 
    8158208, 
    8084480
  ], 
  "sampleB": [
    4197376, 
    7789568, 
    3419136, 
    3685376, 
    3226624, 
    3738624, 
    3398656, 
    3195904
  ]
}
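The U and p values above come from a Mann-Whitney U test. As a sanity check, the same numbers can be reproduced with a normal approximation plus continuity correction (an assumption about how compare_samples derives its p-value, which happens to match numerically here; `mann_whitney_u` below is an illustrative reimplementation, not the actual tool):

```python
import math

def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test, normal approximation with
    continuity correction. Illustrative only."""
    n_a, n_b = len(sample_a), len(sample_b)
    # U for sample A: count pairs where a < b (ties count half).
    u_a = sum(1.0 if a < b else 0.5 if a == b else 0.0
              for a in sample_a for b in sample_b)
    u = min(u_a, n_a * n_b - u_a)
    mean_u = n_a * n_b / 2.0
    sigma_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (abs(u - mean_u) - 0.5) / sigma_u   # continuity correction
    p = math.erfc(z / math.sqrt(2.0))       # two-sided p-value
    return u, p

sample_a = [7674880, 7994368, 7932928, 8346624,
            7887872, 7658496, 8158208, 8084480]
sample_b = [4197376, 7789568, 3419136, 3685376,
            3226624, 3738624, 3398656, 3195904]
u, p = mann_whitney_u(sample_a, sample_b)
print(u, p)  # U = 2.0, p ~ 0.00195, i.e. the "REJECT" above
```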

Other commits weren't as clear, though, and the data almost seems bi-modal. Here are the values from 432242:

  "sampleA": [
    3215360, 
    8011776, <----
    8192000, <----
    7970816, <----
    8355840, <----
    7929856, <----
    3375104, 
    8110080, <----
    6689792, 
    8184832, <----
    3216384, 
    3314688, 
    3286016, 
    3175424, 
    3216384, 
    3294208, 
    7770112, <----
    7655424, <----
    3473408, 
    3309568, 
    3387392, 
    3469312, 
    3497984, 
    3190784, 
    3563520, 
    8122368, <----
    3338240
  ], 
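One cheap way to flag a distribution like that would be a largest-gap heuristic on the sorted samples. This is only a sketch of the idea; `looks_bimodal` and its thresholds are hypothetical, not anything the bisect bot currently does:

```python
def looks_bimodal(values, gap_ratio=2.0, min_cluster=3):
    """Heuristic sketch: sort the samples and check whether one gap
    dwarfs all the others, splitting the data into two non-trivial
    clusters. gap_ratio and min_cluster are arbitrary illustration
    thresholds."""
    if len(values) < 2 * min_cluster:
        return False
    v = sorted(values)
    gaps = [b - a for a, b in zip(v, v[1:])]
    biggest = max(gaps)
    others = sorted(gaps)[:-1]
    split = gaps.index(biggest) + 1  # size of the lower cluster
    return (biggest > gap_ratio * max(others)
            and min(split, len(v) - split) >= min_cluster)

# The sampleA values from 432242 above: a ~3.2M cluster and a ~8M cluster.
sample_a = [3215360, 8011776, 8192000, 7970816, 8355840, 7929856,
            3375104, 8110080, 6689792, 8184832, 3216384, 3314688,
            3286016, 3175424, 3216384, 3294208, 7770112, 7655424,
            3473408, 3309568, 3387392, 3469312, 3497984, 3190784,
            3563520, 8122368, 3338240]
print(looks_bimodal(sample_a))  # -> True
```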


One interesting thing to note is that by the time the bisect had finished, it had expanded the number of tests for the "good" revision:

  "sampleA": [
    7674880, 
    7994368, 
    7932928, 
    8346624, 
    7887872, 
    7658496, 
    8158208, 
    8084480,
    3453952, 
    3245056, 
    3294208, 
    3249152, 
    3166208, 
    3297280, 
    3268608, 
    8044544, 
    3338240, 
    3174400
  ], 


If you re-test those via compare_samples, you don't have a clear regression anymore:

./tracing/bin/compare_samples ~/tmp/fake_metric_compare_samples1.json ~/tmp/fake_metric_compare_samples2.json Fake/Score --chartjson
{"sampleA":[7674880,7994368,7932928,8346624,7887872,7658496,8158208,8084480,3453952,3245056,3294208,3249152,3166208,3297280,3268608,8044544,3338240,3174400],"sampleB":[4197376,7789568,3419136,3685376,3226624,3738624,3398656,3195904],"result":{"U":58,"p":0.4532547047537364,"significance":"NEED_MORE_DATA"}}


I wonder if we could do something like re-comparing previous runs as more samples are added; if we end up with a different answer, maybe bail and report that the test is too noisy?
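A rough sketch of that idea, assuming a compare function that returns compare_samples-style verdicts ("REJECT" / "FAIL_TO_REJECT" / "NEED_MORE_DATA"). `should_bail_as_noisy` and `toy_compare` are hypothetical names for illustration, not proposed APIs:

```python
import statistics

def should_bail_as_noisy(reference, runs, compare):
    """runs is a list of cumulative sample lists for the same revision
    (each entry extends the previous as more repeats are added). If the
    verdict against the reference samples ever flips, the revision's
    data is too noisy to trust."""
    verdicts = [compare(reference, samples) for samples in runs]
    decided = [v for v in verdicts if v != "NEED_MORE_DATA"]
    return len(set(decided)) > 1

def toy_compare(ref, samples, threshold=0.2):
    """Toy stand-in for compare_samples: REJECT if the medians differ
    by more than `threshold` (fractional)."""
    diff = abs(statistics.median(ref) - statistics.median(samples))
    return "REJECT" if diff > threshold * statistics.median(ref) \
        else "FAIL_TO_REJECT"

ref = [8.0, 8.1, 7.9, 8.2]
runs = [[4.0, 4.1, 3.9],                        # first repeats: clear regression
        [4.0, 4.1, 3.9, 8.0, 8.1, 8.2, 8.1]]    # more repeats: verdict flips
print(should_bail_as_noisy(ref, runs, toy_compare))  # -> True
```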

 
Blocking: 667836
Blocking: 667794
Components: Speed>Bisection
