Pinpoint has a `comparison_mode` API argument that adjusts the sensitivity of the bisection, trading off the smallest regression it can detect against hardware usage. For example, bisecting a regression that goes from 0% to 100% failing requires only 5 repeats, but bisecting a smaller regression that goes from 0% to 20% failing requires 69 repeats.
Currently, the default for functional bisects is 1.0, which detects solid failures fine but doesn't work for flaky test failures. We'd like the user to be able to adjust it between 0.1 and 1.0.
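
To illustrate why a smaller regression needs more repeats, here is a rough sketch. It does not use Pinpoint's actual statistics, which aren't described here. It assumes a one-sided Fisher's exact test, a significance threshold of 0.01, and an idealized run in which the "before" side never fails and the "after" side fails at exactly the given rate. The names `fisher_one_sided_p` and `min_repeats` are made up for this example. Because the test and threshold are assumptions, the counts it produces for small regressions will not necessarily match the figures quoted above; the point is only the trend.

```python
import math


def fisher_one_sided_p(n: int, k: int) -> float:
    """One-sided Fisher's exact p-value for the most extreme outcome.

    With n runs per side and k failures in total, this is the chance that
    all k failures land on the "after" side if both sides were identical.
    """
    return math.comb(n, k) / math.comb(2 * n, k)


def min_repeats(failure_rate: float, alpha: float = 0.01, max_n: int = 1000) -> int:
    """Smallest number of runs per side at which an idealized regression from
    0% failing to `failure_rate` failing becomes significant at `alpha`.

    Assumption: the "after" side shows exactly floor(failure_rate * n)
    failures, so real, noisy runs would typically need more repeats.
    """
    for n in range(1, max_n + 1):
        # Small epsilon guards against float rounding just below an integer.
        k = math.floor(failure_rate * n + 1e-9)
        if k > 0 and fisher_one_sided_p(n, k) < alpha:
            return n
    raise ValueError("no repeat count up to max_n reaches significance")


if __name__ == "__main__":
    for rate in (1.0, 0.5, 0.2, 0.1):
        print(f"0% -> {rate:.0%} failing: {min_repeats(rate)} repeats per side")
```

Under these assumptions the required repeat count rises as the failure rate falls, which is the trade-off the adjustable setting would expose to the user.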
Comment 1 by benhenry@google.com, Jan 11