
Issue 779147


Issue metadata

Status: Duplicate
Merged: issue 672348
Owner: ----
Closed: Oct 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 672348

Blocking:
issue 730067




edgar-paladin: Single provision job failure led to suite failure

Project Member Reported by pprabhu@chromium.org, Oct 27 2017

Issue description

https://luci-milo.appspot.com/buildbot/chromeos/edgar-paladin/1224

One provision failed for an unknown reason. There are no logs from repair because we had to do a USB install, so we have no way to find out why:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/151996677-chromeos-test/chromeos4-row12-rack8-host15/

stuff happens!

But we are supposed to be resilient to a single provision failure. This provision monopolized the test it was running for 40 minutes, so the test was never retried, which failed the suite.

That is an infra bug with high impact: an uncontrollable, low percentage of provision failures is leading to build failures.
 
Labels: Chase-Pending
The suite timeline clearly shows the problem: https://viceroy.corp.google.com/chromeos/suite_details?job_id=151996566

Test logs show the problem:

We ran provision from 9:15 to 9:35 (that's when the second attempt failed).
For the next 20+ minutes we kept trying to SSH into the DUT, blocking on 'master ssh connection' 5 times for a total of 10 + 4 * 1 = 14 minutes.
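
For reference, a rough Python sketch of the time accounting above. The function name and the split into one 10-minute wait followed by four 1-minute waits are assumptions read off the arithmetic in this comment, not values taken from the autotest code.

```python
from datetime import timedelta


def ssh_blocking_minutes(first_wait_min, later_wait_min, attempts):
    """Total minutes spent blocked on the master ssh connection.

    Hypothetical helper: assumes the first attempt blocks for
    first_wait_min and every later attempt for later_wait_min.
    """
    if attempts < 1:
        return 0
    return first_wait_min + (attempts - 1) * later_wait_min


# Provision ran from 9:15 to 9:35, expressed as offsets from midnight.
provision = timedelta(hours=9, minutes=35) - timedelta(hours=9, minutes=15)
provision_minutes = int(provision.total_seconds() // 60)

# Five blocking attempts: 10 + 4 * 1 minutes.
ssh_minutes = ssh_blocking_minutes(first_wait_min=10, later_wait_min=1, attempts=5)

print(provision_minutes, ssh_minutes)
```

Under these assumptions, the SSH blocking alone accounts for 14 of the 20+ minutes spent after provision failed.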

The ask on this bug is to find the specific stupidities committed in this failure, and eliminate them.
Blocking: 730067
I've filed bugs in the past about inefficiencies in the DUT-fail-to-repair cycle.
But I'm restricting this to just the time spent after provision fails, because that directly translates to a single provision failure leading to a build failure, which has high impact.
Blockedon: 672348
Cc: ayatane@chromium.org
Based on the description here, it sounds like this will be addressed by up-front-provision-suite.
Mergedinto: 672348
Status: Duplicate (was: Untriaged)
