It sounds like you're suggesting we want one set of logic for `with patch` (retry a test until it passes or retries N times) and another set for `without patch` (retry the test N times or until it fails).
That seems like a good idea, but we don't have a way of doing that in the test harnesses today, so we'd have to wire that through everywhere.
I'm also not positive what the different cases that we need to support are (which combinations of failure handling, test filtering, and retrying), so this probably needs some exhaustive thinking-through to make sure the behavior is sensible.
> so we'd have to wire that through everywhere.
Yes.
> we don't have a way of doing that in the test harnesses today
gtest has --gtest_repeat, which does what we want.
Other harnesses will need to be modified to support this feature on a case-by-case basis. My solution allows us to incrementally add this behavior to test harnesses as necessary.
> I'm also not positive what the different cases that we need to support are (which combinations of failure handling, test filtering, and retrying), so this probably needs some exhaustive thinking-through to make sure the behavior is sensible.
If/when we modify test harnesses other than gtest, we'll need to make sure that the new flag we add has the appropriate behavior when combined with filters+failures. This will need to be tackled on a case-by-case basis.
> gtest has --gtest_repeat, which does what we want.
I'm not sure what the logic for handling failures is in that case; you'd probably either want the step to return a 1 if any test failed (maybe it does this already) , or you'd have to parse the test results to figure out how to interpret them. Which is doable, obviously.
Most of the recipe layer parsing logic already exists -- it just needed some plumbing, see:
https://chromium-review.googlesource.com/c/chromium/tools/build/+/1265655
I think we're on the same page in terms of the expected behavior -- so if we want to discuss implementation details, let's do so on the CL?
Sure. I'm fine with all of the changes you're discussing here, I just wanted to point out some of the stuff you'd have to handle. But it sounds like you're on top of it :).
> Will the 'retry without patch' roll to the tot and re-compile like 'retry with patch'? I'm assuming not because I don't see the point.
'retry without patch' will deapply the patch and recompile [same as right now]. The only difference is that we will run tests a fixed number of times and look for any failures, rather than up to N times, looking for any success.
> 'retry without patch' will deapply the patch and recompile [same as right now]. The only difference is that we will run tests a fixed number of times and look for any failures, rather than up to N times, looking for any success.
I don't think I understand. The deapply patch and recompile are already done in the 'without patch' step, if 'retry without patch' happens right after 'without patch', can't we just reuse the artifact from 'without patch' step?
I see, makes sense now, thanks for explaining.
I think this is a good idea, but I'm a little bit worried that the 'inconsistency' between with patch and without patch might cause confusions and make the build results hard for developers to understand, so it might worth adding some documentations for the recent changes, such as a md doc.
erikchen@: I hope this new change won't break Findit's Flake Detector. Would you mind following up with liaoyuke@ and chanli@ to make sure Findit is adapted to this change before the CL lands?
I have 2 questions:
1. With this change, will we run more iterations in 'without patch' to catch flakes?
a. If yes, would that increase the time overhead?
b. If no, how much percent of flakes we can catch?
2. With the change on the 'without patch', do we still need the new 'retry with patch' step? I feel they'll serve the same purpose (deflake) so maybe we'll only need one of them.
> a. If yes, would that increase the time overhead?
Yes, but minimally. The cost of rerunning tests in a test_suite is O(seconds), and we are only rerunning failed tests.
> With the change on the 'without patch', do we still need the new 'retry with patch' step? I feel they'll serve the same purpose (deflake) so maybe we'll only need one of them.
'retry with patch' catches certain types of infra failures [e.g. ADB failure on android bot] and reduces their impact on flakiness.
retrying in 'without patch' reduces the impact of flaky tests on ToT.
They are currently both needed, as they reduce flakiness from different sources.
IIUC, this means after the change to 'without patch', 'retry with patch' will be most used to mitigate the impact of infra failures rather than the tests are flaky themselves?
If so, I feel at Findit side, we should not care occurrences in 'retry with patch' as much as we do currently or even ignore them (seems not the test's fault)? stgao@, liaoyuke@ wdyt?
I was insufficiently precise in my previous response. More details.
> IIUC, this means after the change to 'without patch', 'retry with patch' will be most used to mitigate the impact of infra failures rather than the tests are flaky themselves?
'retry with patch' currently exists to mitigate two types of flakiness:
* infra failures
* bugs in test runner retry logic.
This is independent from the proposed change to 'without patch', which is intended to detect if a test is flaky on tip of tree.
> If so, I feel at Findit side, we should not care occurrences in 'retry with patch' as much as we do currently or even ignore them (seems not the test's fault)?
That is a really good question. Right now, I'm still trying to fix bugs in test runner try logic, e.g. see: https://bugs.chromium.org/p/chromium/issues/detail?id=889036. Right now Find-It does not distinguish between a test that flakes because the test implementation is flaky, and a test that flakes because the test runner has bugs. Should it?
The conversation above is starting to diverge from the original intended purpose, which is unblocking this CL: https://chromium-review.googlesource.com/c/chromium/tools/build/+/1265655
liaoyuke, chanli: This CL adds logic to chromium tests to handle multiple retries on 'retry without patch'. Do you have any concerns with regards to the current functionality of Findit and this CL?
After an offline discussion between liaoyuke@ and me, we don't need to block your change.
Basically after your change is landed, we expect fewer flakes on our dashboard (because this change should prevent many false rejections) but our functionality should not be good.
Comment 1 by dpranke@chromium.org
, Oct 6