
Issue 602304

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: Nov 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




UnitTest failures in pfq-informational are not causing the builder to fail

Project Member Reported by steve...@chromium.org, Apr 11 2016

Issue description

Starting here:
https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/tricky-tot-chrome-pfq-informational/builds/339

We started to see this output in the unit tests:

WARNING: The following packages failed once or more,
but succeeded upon retry. This might indicate incorrect
dependencies.
  chromeos-base/vpn-manager-0.0.1-r1197
  chromeos-base/libchromeos-ui-0.0.1-r200
@@@STEP_WARNINGS@@@

However, these were not individual test failures; the test phase itself is failing mysteriously:

vpn-manager-0.0.1-r1197:  * ERROR: chromeos-base/vpn-manager-0.0.1-r1197::chromiumos failed (test phase):
vpn-manager-0.0.1-r1197:  *   (no error message)

This started showing up in the PFQ often enough to prevent any successful runs in the last several days.

We need to get more information from this type of failure and cause the informational builder to fail so that we can catch these earlier.

 
I was able to reproduce this locally. I am going to look for recent chromite changes and see if I can bisect this.

From what I can tell, chromite is working as intended: it retries ebuilds and unit tests at most once per ebuild.
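The retry-once behavior described above, and how it produces the "succeeded upon retry" warning quoted in the issue description, can be sketched as follows. This is a minimal illustration, not chromite's actual code: `run_with_one_retry`, `format_retry_warning`, and the `run_tests` callable are hypothetical names assumed for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical: a callable that runs one package's unit tests and returns
# True on success. This is NOT chromite's real interface.
TestRunner = Callable[[str], bool]


def run_with_one_retry(packages: List[str],
                       run_tests: TestRunner) -> Tuple[List[str], List[str]]:
    """Run each package's tests, retrying a failed package at most once.

    Returns (failed, flaky): packages that failed both attempts, and
    packages that failed once but passed on the retry.
    """
    failed: List[str] = []
    flaky: List[str] = []
    for pkg in packages:
        if run_tests(pkg):
            continue  # Passed on the first attempt.
        # First attempt failed: retry exactly once.
        if run_tests(pkg):
            flaky.append(pkg)
        else:
            failed.append(pkg)
    return failed, flaky


def format_retry_warning(flaky: List[str]) -> str:
    """Build a warning shaped like the one quoted in the issue."""
    if not flaky:
        return ""
    lines = [
        "WARNING: The following packages failed once or more,",
        "but succeeded upon retry. This might indicate incorrect",
        "dependencies.",
    ]
    lines += [f"  {pkg}" for pkg in flaky]
    lines.append("@@@STEP_WARNINGS@@@")
    return "\n".join(lines)
```

Note that under this policy a package whose test phase fails on the first attempt but passes on the second only yields a step warning, which is why the failure described here did not turn the builder red.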

Is the bug that you don't want these retries, or is it to find the root cause of the unit test flake?
Cc: jdufault@chromium.org
+jdufault, apparently this is getting triggered by this change:
https://chromium-review.googlesource.com/#/c/335250/

akeshet@ - could we skip the retry if the failure was not due to a particular test failing? That is, in this case all tests passed, but the test phase itself failed because it did not clean up properly (or for some other reason) and we detected that. When that happens, I think we would like the builder to fail.
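The proposal above (retry only when specific tests failed, and treat any other test-phase failure as a hard failure) amounts to a small classification rule. Here is a minimal sketch, assuming a hypothetical `TestPhaseResult` summary that records the phase's exit code and the names of any individual tests reported as failing; neither name comes from chromite.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TestPhaseResult:
    """Hypothetical summary of one package's test phase."""
    exit_code: int
    failed_tests: List[str] = field(default_factory=list)


def should_retry(result: TestPhaseResult) -> bool:
    """Retry only when the phase failed because specific tests failed.

    If the phase failed but no individual test is reported as failing
    (e.g. the harness detected leftover processes after all tests passed),
    return False so the failure is surfaced instead of retried away.
    """
    return result.exit_code != 0 and bool(result.failed_tests)
```

For the vpn-manager failure in this issue ("failed (test phase): (no error message)" with no failing test reported), this rule would return False, so the builder would fail on the first attempt.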

Comment 6 by vapier@chromium.org, Apr 11 2016

failure to clean up properly is a test problem and should be flagged as such
So, to be clear, there are two separate issues here:
a) The bug causing the unit tests to fail, reverted for now (comment #5).
b) The fact that this failure did not cause the informational builder to fail, making it harder to identify.

(a) is what revealed (b), but this issue should track (b).
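For issue (b), one way to make the informational builder fail is to decide the unit-test step status from both the hard failures and the packages that only passed on retry. The sketch below is hypothetical: `StepStatus`, `unit_test_step_status`, and the `informational` flag are illustrative names, not chromite's API.

```python
from enum import Enum
from typing import List


class StepStatus(Enum):
    SUCCESS = "success"
    WARNING = "warning"
    FAILURE = "failure"


def unit_test_step_status(failed: List[str], flaky: List[str],
                          informational: bool) -> StepStatus:
    """Pick the UnitTest step status (hypothetical policy).

    failed: packages that failed every attempt.
    flaky:  packages that failed once but passed on retry.
    """
    if failed:
        return StepStatus.FAILURE
    if flaky:
        # On the informational builder, a package that only passed on
        # retry fails the step so regressions like this one surface early.
        return StepStatus.FAILURE if informational else StepStatus.WARNING
    return StepStatus.SUCCESS
```

With this policy both builders still retry the same way; only how a retried-then-passed package is reported differs.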

I'm not actually sure that we should repeat unit tests at all; those really, really shouldn't be flaky. However, if we do, we should only do so if an individual test fails.

Also, since this is failing semi-consistently on the PFQ, are we not repeating the unit tests on the PFQ? Whatever we do, it should be consistent.


Looks like we are retrying unit tests on the pfq https://uberchromegw.corp.google.com/i/chromeos/builders/tricky-chrome-pfq/builds/1743/steps/UnitTest/logs/stdio

Perhaps the PFQ builder environment makes the flake more likely.
It doesn't actually appear flaky on the continuous builder: it
fails consistently the first time, but not the second time. It's not clear why
it sometimes fails the second time on the PFQ.
CPU load on the builder is probably a factor.
Project Member

Comment 11 by bugdroid1@chromium.org, Apr 11 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/98e1516a40e126465e2cd7a10d65fdfb1610de70

commit 98e1516a40e126465e2cd7a10d65fdfb1610de70
Author: Jacob Dufault <jdufault@chromium.org>
Date: Mon Apr 11 18:15:09 2016

Revert "common-mk: Kill any auxiliary child processes after the child terminates."

The commit introduced some flaky failures, reverting to remove the
flakiness.

This reverts commit fd24e9b9796336cb7506e17e8719b9395ec308bc.

BUG= chromium:602304 

Change-Id: I03c24f86a8fb2fef804e2ae2c7aab11e61ec7f64
Reviewed-on: https://chromium-review.googlesource.com/338151
Commit-Ready: Jacob Dufault <jdufault@chromium.org>
Tested-by: Jacob Dufault <jdufault@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/98e1516a40e126465e2cd7a10d65fdfb1610de70/common-mk/platform2_test.py
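The reverted change is titled "Kill any auxiliary child processes after the child terminates." As a rough illustration of that general idea only, here is a sketch using a POSIX process group; it is an assumption about one possible approach and is not the code from commit fd24e9b9796336cb7506e17e8719b9395ec308bc or from platform2_test.py. `run_and_reap_group` is a hypothetical name.

```python
import os
import signal
import subprocess
from typing import Sequence


def run_and_reap_group(argv: Sequence[str]) -> int:
    """Run argv in its own process group, then kill whatever is left in it.

    Returns the child's exit code. Descendants that stayed in the child's
    process group and are still running afterwards are sent SIGKILL.
    """
    # start_new_session=True makes the child a session and process-group
    # leader, so its pgid equals its pid and descendants inherit that pgid.
    proc = subprocess.Popen(list(argv), start_new_session=True)
    returncode = proc.wait()
    try:
        # The group can outlive its leader; kill any remaining members.
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        # No processes left in the group: the child cleaned up after itself.
        pass
    return returncode
```

Descendants that call `setsid()` or `setpgid()` leave the group and escape this cleanup, and any test harness that treats leftover processes as an error has to distinguish that from real test failures, which is the situation discussed in this issue.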

Project Member

Comment 12 by bugdroid1@chromium.org, Apr 11 2016

Labels: merge-merged-release-R51-8172.B
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/a6fd3dab187237d0bb509abd0e28a95effd00466

commit a6fd3dab187237d0bb509abd0e28a95effd00466
Author: Jacob Dufault <jdufault@chromium.org>
Date: Mon Apr 11 18:15:09 2016

Revert "common-mk: Kill any auxiliary child processes after the child terminates."

The commit introduced some flaky failures, reverting to remove the
flakiness.

This reverts commit fd24e9b9796336cb7506e17e8719b9395ec308bc.

BUG= chromium:602304 

Change-Id: I03c24f86a8fb2fef804e2ae2c7aab11e61ec7f64
Reviewed-on: https://chromium-review.googlesource.com/338151
Commit-Ready: Jacob Dufault <jdufault@chromium.org>
Tested-by: Jacob Dufault <jdufault@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>
(cherry picked from commit 98e1516a40e126465e2cd7a10d65fdfb1610de70)
Reviewed-on: https://chromium-review.googlesource.com/338172
Reviewed-by: Ilja Friedel <ihf@chromium.org>
Tested-by: Ilja Friedel <ihf@chromium.org>

[modify] https://crrev.com/a6fd3dab187237d0bb509abd0e28a95effd00466/common-mk/platform2_test.py

Labels: Build-PFQ-Failures
Status: Archived (was: Assigned)
