Bisect - Doesn't do well on regressions where values are mostly the same
Issue description

From crbug.com/681662. You can see all the values are nearly identical, save one massive outlier (which is the regression):

{
  "result": {
    "U": 210,
    "p": 0.34090381592813035,
    "significance": "FAIL_TO_REJECT"
  },
  "sampleA": [2294840, 2294840, 2294840, 2294840, 2294840, 2294840, 2294840,
              2294840, 2294840, 2294840, 2294840, 2294840, 2294840, 2294840,
              2294840, 2294840, 2294840, 2294840, 2294840, 21259320, 2294840],
  "sampleB": [2294840, 2294840, 2294840, 2294840, 2294840, 2294840, 2294840,
              2294840, 2294840, 2294840, 2294840, 2294840, 2294840, 2294840,
              2294840, 2294840, 2294840, 2294840, 2294840, 2294840, 2294840]
}
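For illustration, the reported U and p values can be reproduced with a small pure-Python Mann-Whitney U test (normal approximation with tie and continuity corrections). This is only a sketch, not the dashboard's actual implementation. It shows why the test fails here: with 41 of the 42 values tied, the rank statistic barely moves no matter how large the single outlier is, so the test cannot reject the null hypothesis.

```python
import math
from collections import Counter

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test: normal approximation with tie and
    continuity corrections. Returns (U, p)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    combined = sorted(a + b)
    # Assign the average rank to each run of tied values (1-based ranks).
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and combined[j] == combined[i]:
            j += 1
        rank_of[combined[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1..j
        i = j
    r1 = sum(rank_of[v] for v in a)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2.0
    # Variance of U, reduced by the tie-correction term.
    ties = sum(t ** 3 - t for t in Counter(combined).values())
    sigma = math.sqrt(n1 * n2 / 12.0 * (n + 1 - ties / (n * (n - 1))))
    z = (abs(u - mu) - 0.5) / sigma  # continuity correction
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
    return u, p

# The samples from the issue: one outlier in sampleA, everything else tied.
sample_a = [2294840] * 19 + [21259320, 2294840]
sample_b = [2294840] * 21
u, p = mann_whitney_u(sample_a, sample_b)
print(u, p)  # U = 210.0, p ≈ 0.341: FAIL_TO_REJECT despite the ~10x outlier
```

A single outlier shifts one rank, not one value's magnitude, which is exactly why a rank test is robust to outliers and therefore blind to this kind of regression.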
Jan 23 2017
I think we can also become confident about the result if we run lots of iterations. Letting the user continue to rerun the test would help.
Jan 23 2017
Yeah, true. I think these kinds of outlier-style regressions really won't be handled well until we're using Pinpoint and we can guide or change the test.
Feb 24 2017
Two ideas to fix this:

1. What about a bisect that uses the standard deviation of the results rather than the mean of the results?
2. What if we let the user of the bisect tool decide how confident the bisect needs to be? I would normally want my bisect to continue on even when it is not very confident, and then I would want to see the data myself at the end. The default value could be the same, but for users who know what they are doing, being able to adjust a p value would fix this problem.
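A rough sketch of idea 1 (plain Python here, not the actual bisect code): a dispersion-based comparison sees this outlier immediately, even though the rank-based test in the issue description does not.

```python
import statistics

# Same samples as in the issue description.
sample_a = [2294840] * 19 + [21259320, 2294840]
sample_b = [2294840] * 21

# The sample standard deviation is dominated by the single outlier,
# while the all-tied sample has zero spread.
stdev_a = statistics.stdev(sample_a)
stdev_b = statistics.stdev(sample_b)
print(stdev_a)  # about 4.1 million
print(stdev_b)  # 0.0
```

Idea 2 amounts to exposing the significance threshold that p is compared against as a user-configurable parameter instead of a fixed default.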
Comment 1 by dtu@chromium.org, Jan 23 2017