
Issue 592815

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Improve failure messages for "provision Failure" auto-generated failures

Project Member Reported by steve...@chromium.org, Mar 8 2016

Issue description

Searching for "provision Failure" returns a very large number of autofiled issues:

https://bugs.chromium.org/p/chromium/issues/list?can=2&q=%22provision%20Failure%20%22&sort=-status&colspec=ID%20Pri%20M%20Status%20Owner%20Summary%20Modified

We need to track down the root cause(s) of these failures and:
a) Change the description so that, if there is more than one cause, the specific cause is exposed in the summary.
b) File issues and fix the top root causes.

 
Owner: shuqianz@chromium.org
This is more or less exactly what Charlene is working on.
Blocking: 591628
Cc: afakhry@chromium.org
Labels: -Build-PFQ-Code-Yellow OS-Chrome
Linking to issue 591628 and removing the top-level Code-Yellow label.

Cc: xixuan@chromium.org shuqianz@chromium.org
Issue 323385 has been merged into this issue.
Blocking: 589367
Cc: jen...@chromium.org
Another provision failure occurred today on the trick-pfq builder due to an ssh timeout; it is not clear whether it is a legitimate failure or just an infra flake: https://bugs.chromium.org/p/chromium/issues/detail?id=589367#c31
I don't quite understand what this bug is for. The reason for the provision failure is already listed in the bug. The root cause is not always easy to detect. The bug already lists all the useful links developers need to find the root cause.

For example, an ssh timeout error could be a network flake or something wrong on the DUT side. I don't think it is feasible to expose the *root cause*; the purpose of the bug is to help developers find the root cause.
Historically this symptom has been a major pain point for Gardeners, which is why it was identified as something we need to improve and/or triage better.

5 weeks ago we had >100 pfq failures with "provision Failure" in the Summary:

https://bugs.chromium.org/p/chromium/issues/list?can=2&q=pfq+%22provision+Failure%22+opened%3Etoday-35+opened%3C%3Dtoday-28&sort=-modified+pri&colspec=ID+Pri+M+Status+Owner+Summary+OS+Modified+autofiled&x=m&y=releaseblock&cells=ids

At that failure rate, "the root cause is not easily detected" is a huge problem.

Last week we only had 8 pfq failures with that symptom, so I think we may have addressed much of this issue with other fixes.

If that trend continues (i.e., the occurrence of the symptom remains low), we can lower the priority on this or resolve it as WontFix.

Labels: Build-PFQ-Failures
Owner: xixuan@chromium.org
Reassigning to xixuan@, since she is working on redesigning the devserver workflow, which will change the provision process. Xixuan, can you try to improve the provisioning error message when you implement your design for the devserver?
Yep, it's in the plan and I'll try my best.
Blocking: 615436
Blocking: -589367
Blocking: -615436 -591628
Status: WontFix (was: Assigned)
The new provision framework is deployed, and it is now more convenient to diagnose provision failures, so closing this bug for now.
