
Issue 805770

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug
okr

Blocked on:
issue 730729




sanity HWTest on release builders is unreliable: causes test team / TPM manual work

Project Member Reported by dhadd...@chromium.org, Jan 25 2018

Issue description

Builders regularly fail the sanity phase. 
When this happens, none of the test suites are run: bvt-cq, bvt-inline, or the out-of-band (OOB) suites.

However, the test team has learned to ignore this failing sanity phase signal. We just kick off the test suites ourselves and release the build anyway. 

On the most recent stable build that we released we had two boards fail sanity:
https://luci-milo.appspot.com/buildbot/chromeos_release/daisy_skate-release%20release-R63-10032.B/67
https://luci-milo.appspot.com/buildbot/chromeos_release/kefka-release%20release-R63-10032.B/67

On the most recent beta build that we released we had 15+ boards fail sanity:
https://cros-goldeneye.corp.google.com/chromeos/console/monitorRelease?releaseName=M64-BETA-CHROMEOS-7

As is, a failing sanity phase just gives the TPMs and test team more work, because we have to kick off the suites ourselves.

Can we kick off suites anyway if the sanity phase fails, or is there a better signal we can use that tells us "This build will not even boot if you tried it, so there is no point in kicking off the test suites"?

 

Comment 1 by jkop@chromium.org, Jan 25 2018

Owner: jkop@chromium.org
Status: Assigned (was: Untriaged)
Assigning to myself to investigate and discuss the options, and which purposes the sanity test is and isn't meant to serve.
Blockedon: 730729
Summary: sanity HWTest on release builders is unreliable: causes test team / TPM manual work (was: Failing sanity stage on builder isnt a helpful signal and just causes test team manual work. )
I think the stated problem is important, but I don't think the stated solution is right.

Sanity test is supposed to do exactly that: Check if the build will even boot.
If sanity tests are failing too often, and hence getting ignored, the direction we want to go is to make them a real signal, and keep the signal healthy.

Towards this:
- cros-infra should create signals from metrics around release sanity test failures, preferably per-channel.
- cros-infra should ensure that infra errors do not cause sanity failures too often.

My spot checking of the failures above showed that they were all timeouts.

Also, our failure mode in case of HWTest timeout is stupid: issue 730729
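
To make the distinction discussed here concrete, the following is a minimal sketch, assuming the sanity stage could report whether it failed outright or merely timed out. The names `SanityOutcome` and `should_run_suites` are hypothetical and are not part of any existing Chrome OS infra API; this is only an illustration of the policy, not a description of how the builders work.

```python
from enum import Enum


class SanityOutcome(Enum):
    """Hypothetical outcomes of the sanity HWTest stage."""
    PASSED = "passed"
    FAILED = "failed"        # the build was exercised and did not boot
    TIMED_OUT = "timed_out"  # no verdict reached; possibly an infra problem


def should_run_suites(outcome: SanityOutcome) -> bool:
    """Return True if follow-up suites (e.g. bvt-inline) should be scheduled.

    Assumption: only a real failure blocks the suites. A timeout says
    nothing about whether the build boots, so it is treated as
    inconclusive and the suites still run.
    """
    return outcome is not SanityOutcome.FAILED
```

Under this assumed policy a timed-out sanity stage would no longer require anyone to kick off the suites by hand, while a real boot failure would still stop them.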
Labels: -M-66 OKR
I don't think that M-66 label is agreed upon. Putting into OKR bucket though.

Comment 4 by jkop@chromium.org, Jan 26 2018

Status: Available (was: Assigned)

Comment 5 by jkop@chromium.org, Jan 26 2018

Owner: ----
Labels: bvttriage
Cc: ka...@chromium.org
Labels: -OKR okr
+Kalin: this would help with your OOB problem too 

Comment 8 by ka...@chromium.org, Mar 23 2018

Thanks David, 
My problem is similar, but differs a little (or more) because:
- A few other TEs and I have dedicated high-touch pools of lab hosts that are not part of the general lab pools
- I want to run my suites on my pools not just on green builds, but also when the build is red and the bvt suites have run

Can there be a solution where lab pools dedicated to OOB suites are served by the scheduler even when the build is red (and the bvt suites have started)?
