Issue 658358

Starred by 0 users

Issue metadata

Status: Archived
Owner: ----
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug

Blocking:
issue 649391



Rerun steps without patch upon INFRA FAILURE

Project Member Reported by katthomas@chromium.org, Oct 21 2016

Issue description

Currently, many failing builds are incorrectly marked as INFRA_FAILURE. One reason is that when a step run with a patch ends in INFRA_FAILURE (due to an execution timeout, for example), we do not rerun that step without the patch. Rerunning it would give the following outcomes:

1. w/ patch: infra-failure, w/o patch: infra-failure --> infra failure
2. w/ patch: infra-failure, w/o patch: pass --> patch/flaky test failure
3. w/ patch: pass, w/o patch: infra-failure --> (flaky) infra failure

The problem with this is that flaky infrastructure failures may occasionally be marked as patch failures (case 2). Anecdotally, such a result is more likely to be a real patch failure or a flaky test. Regardless, it's worth handing off to a human to inspect and decide what to do next. The question is, which human?
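
The three outcomes listed above can be sketched as a small decision function. This is only an illustration of the proposed mapping, not existing recipe code: `StepResult`, `classify_rerun`, and the verdict strings are hypothetical names chosen for this sketch, and any combination outside the three listed cases is simply reported as not covered.

```python
from enum import Enum


class StepResult(Enum):
    """Hypothetical step outcomes; names are assumptions for this sketch."""
    PASS = "pass"
    FAILURE = "failure"
    INFRA_FAILURE = "infra-failure"


def classify_rerun(with_patch: StepResult, without_patch: StepResult) -> str:
    """Map the (with patch, without patch) outcomes to a verdict string."""
    infra = StepResult.INFRA_FAILURE
    if with_patch is infra and without_patch is infra:
        # Case 1: fails the same way regardless of the patch.
        return "infra failure"
    if with_patch is infra and without_patch is StepResult.PASS:
        # Case 2: only the patched run failed; needs a human to inspect.
        return "patch/flaky test failure"
    if with_patch is StepResult.PASS and without_patch is infra:
        # Case 3: only the unpatched run failed.
        return "(flaky) infra failure"
    # Combinations not listed in the issue description.
    return "not covered"
```

For example, `classify_rerun(StepResult.INFRA_FAILURE, StepResult.PASS)` returns the case 2 verdict, which is the ambiguous one that would be routed to a person.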
 
Cc: jbudorick@chromium.org stip@chromium.org
Issue 653304 has been merged into this issue.
Cc: phajdan.jr@chromium.org
Thanks for looking into this, but I am not sure I fully understand the issue:

1) Do we retry infra failures or not? According to the second sentence, it looks like we don't, while cases 1 and 2 suggest that we do.

2) I am not quite sure I understand why infra failures would be marked as patch failures. Do you mean patch application failures or simply test failures?

IMHO, infra failures should not be re-run without the patch, because a patch rarely has anything to do with infrastructure. OTOH, if we are talking about timeouts of the test steps, then perhaps we can track this type of failure differently. It could indeed be a patch that causes some test to be extremely slow. AFAIK, Pawel has recently been proposing to add per-step timeouts... perhaps he can also offer a comment here.
Status: Archived (was: Untriaged)
Closing this because it's been too long without progress. Let's reopen if it comes up again. 
