
Issue 753950

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Add VM test coverage for more kernels

Reported by jrbarnette@chromium.org, Aug 9 2017

Issue description

Currently, we only perform VM testing for betty, which tests only a
single (recent) kernel version.

Recently, a CL broke the CQ for an extended time because it used a
kernel feature present in v4.6, but not in various earlier versions
such as 3.8 or 3.10 (see bug 753838).

We should create configs that will allow VM testing against older
kernel versions, and then get those configs into the pre-CQ, so that
problems like this will be caught earlier, and more reliably.

 
Cc: norvez@chromium.org

Comment 2 by nxia@chromium.org, Aug 9 2017

what would be the candidate build configs for the vmtests?
> what would be the candidate build configs for the vmtests?

Candidate configs don't exist.  This bug is a request to create
them.

Cc: bhthompson@chromium.org
FWIW there was a conscious decision a few months ago to only have 1 target for VMTests (see discussion in Issue 710629) because the tradeoff between the cost of supporting more VM-specific codepaths for various kernels and the benefit of the extra test coverage didn't look good.
> [ ... ] and the benefit of the extra test coverage didn't look good.

To be clear, the lack of the extra test coverage contributed directly to
an 18 hour CQ outage.  Reading comments on the problem CL, it sounds like
this isn't even the first time that this particular kernel difference has
caused trouble.  That feels like enough additional evidence to justify
revisiting the cost/benefit analysis.

Owner: snanda@chromium.org

Comment 7 by snanda@chromium.org, Aug 15 2017

Cc: dgarr...@chromium.org jrbarnette@chromium.org nxia@chromium.org ihf@chromium.org davidjames@chromium.org marc...@chromium.org dgreid@chromium.org
Adding folks from issue 710629 here so we have common audience.

Aviv, as discussed, can we dig up some data that helps justify adding VM test coverage for additional kernels?

The implementation would likely entail adding per-kernel betty-like board overlays instead of going back to per hardware board VM tests as was the case in the past.
We have occasional outages that better PreCQ kernel coverage would catch. My gut feeling is that they are rare, but expensive to recover from (2-3 days of CQ outage), where rare means every 2-3 months. There have also been some disruptions where kernel bugs were flaky, but might have passed PreCQ anyway.

What frequency of outage would be enough to justify the work?

If the answer is "once a year", don't bother investigating, just do the work. If the answer is "once a week", just skip the work.

I estimate about 1 major outage per month that would be caught by broader vmtest. Looking at recent summary emails for bad CLs that I believe are vmtest-catchable...

August: https://chromium-review.googlesource.com/c/602882 cryptohome change that breaks on some kernels. 8+ CQ failures. See Issue 753838.

July: https://chromium-review.googlesource.com/c/565632 bad minijail uprev, broke hwtest. Seems VM-catchable. 4 CQ failures, 80 false rejections.

June: https://chromium-review.googlesource.com/c/421984 bad kernel 3.18 CL breaks boot on several boards. See Issue 731253. (Not certain it is VM-catchable, though.)

June: https://chromium-review.googlesource.com/c/514822 openssh uprev breaks sshd on DUTs, causing hwtest to break.




Comment 10 by ihf@chromium.org, Aug 15 2017

For what it's worth, the last (June) openssh issue was an intermittent failure and was caught by betty once.
And https://chromium-review.googlesource.com/c/421984 is not catchable by VMs.
Cc: -davidjames@chromium.org
Status: Archived (was: Untriaged)
