Rerun steps without patch upon INFRA FAILURE
Issue description

Currently, we have many builds failing that are incorrectly marked as INFRA_FAILURE. One reason for this is that when a step run with a patch is marked as INFRA_FAILURE (due to an execution timeout, for example), we do not rerun that step without the patch. With such a rerun, the possible outcomes would be:

1. w/ patch: infra-failure, w/o patch: infra-failure --> infra failure
2. w/ patch: infra-failure, w/o patch: pass --> patch/flaky test failure
3. w/ patch: pass, w/o patch: infra-failure --> (flaky) infra failure

The problem with this approach is that flaky infrastructure failures may occasionally be marked as patch failures. Anecdotally, case 2 is more likely to be a patch failure or a flaky test. Regardless, it's worth kicking it off to a human to inspect and decide what to do next. The question is, which human?
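For illustration only, the three cases above could be written as a small lookup table. This is a rough sketch, not the recipe code: the names `Outcome`, `DECISIONS`, and `classify` are made up for this example, and combinations not listed in the description (such as pass/pass) are deliberately left unclassified.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical step outcomes; only the two used in the description."""
    PASS = "pass"
    INFRA_FAILURE = "infra-failure"


# Keys are (outcome with patch, outcome without patch).
# Values are the labels from the three cases in the issue description.
DECISIONS = {
    (Outcome.INFRA_FAILURE, Outcome.INFRA_FAILURE): "infra failure",
    (Outcome.INFRA_FAILURE, Outcome.PASS): "patch/flaky test failure",
    (Outcome.PASS, Outcome.INFRA_FAILURE): "(flaky) infra failure",
}


def classify(with_patch, without_patch):
    """Return the label for a pair of outcomes, or None if not covered."""
    return DECISIONS.get((with_patch, without_patch))
```

Case 2 is the one that would need a human to look at it, since a flaky infra failure on the patched run is indistinguishable here from a real patch problem.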
Oct 23 2016
Thanks for looking into this, but I am not sure I fully understand the issue:

1) Do we retry infra failures or not? According to the second sentence, it looks like we don't, while cases 1 and 2 suggest that we do.
2) I am not quite sure I understand why infra failures would be marked as patch failures. Do you mean patch application failures or simply test failures?

IMHO, infra failures should not be re-run without the patch because a patch rarely has anything to do with infrastructure. OTOH, if we are talking about timeouts of the test steps, then perhaps we can track this type of failure differently. It could indeed be a patch that causes some test to be extremely slow. AFAIK, Pawel has recently been proposing to add per-step timeouts... perhaps he can also offer a comment here.
Jan 17 2017
Closing this because it's been too long without progress. Let's reopen if it comes up again.
Comment 1 by katthomas@google.com, Oct 21 2016