coral-release fails HWTests - DUTs look really unstable
Issue description
https://luci-milo.appspot.com/buildbot/chromeos/coral-release/
The failures are not always the same (sometimes provisioning, sometimes instability in the tests), but they are so frequent that this is probably a symptom of a generic instability.
Various failures to provision (device not coming back up from reboot, not rebooting, ...): https://luci-milo.appspot.com/buildbot/chromeos/coral-release/779
Some provisioning errors, mixed with very flaky tests: https://luci-milo.appspot.com/buildbot/chromeos/coral-release/778
Assigning to vineeths@, who has agreed to help find an owner.
,
Feb 26 2018
I've never found anything that wasn't infra related. Since coral runs 13 models, its probability of getting hit by transient failures goes up by 13x.
,
Mar 1 2018
coral seems to fail provisioning at a pretty high rate (6-7%). Compounded with the fact that we run so many models this means that a lot of runs are going to be knocked out by this alone. I'll do some more digging here.
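To put a rough number on how a per-DUT provision failure rate compounds across models, here is a minimal sketch. It assumes each run provisions one DUT per model, that the 13 provisions fail independently, and that all share the same failure rate; none of that is confirmed by the lab data, and the function name is purely illustrative.

```python
def p_run_hit(p_single: float, n_models: int) -> float:
    """Probability that at least one of n_models provisions fails.

    Assumption: failures are independent and share the same rate p_single.
    """
    return 1.0 - (1.0 - p_single) ** n_models


if __name__ == "__main__":
    # 6-7% is the coral provision failure rate quoted above; 13 is the model count.
    for p in (0.06, 0.07):
        hit = p_run_hit(p, 13)
        print(f"per-DUT failure {p:.0%}: run affected with probability {hit:.1%}")
```

Under these assumptions the simple "13x" estimate (13 * p) is an upper bound on the real figure, but either way a large share of runs would lose at least one model to provisioning alone.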
,
Mar 7 2018
Provision failure percentage is hovering around 5% recently: https://viceroy.corp.google.com/chromeos/provision?board=coral&breakdowns=build_type&build_type=&delta_window=60m&devserver=&duration=8d&dut=&groups=Provision&percentile=90&prior_alpha=0.1&prior_beta=1.9&refresh=-1&repository_behavior=DO_NOT_SKIP&success=&topstreams=5&type=Provision&use_precomputation=1
Contrast with the generic failure rate, which is often <2%: https://viceroy.corp.google.com/chromeos/provision?groups=Provision&breakdowns=build_type&board=&build_type=&success=&devserver=&dut=&topstreams=5&delta_window=60m&duration=8d&percentile=90&prior_alpha=0.1&prior_beta=1.9&refresh=-1&repository_behavior=DO_NOT_SKIP&type=Provision&use_precomputation=1
->ejcaruso did you get anywhere with your investigation?
,
Mar 7 2018
Not yet, I've been sidetracked by last-minute modemfwd changes. I'll do more investigation tomorrow.
,
Mar 10 2018
,
Mar 30 2018
,
Apr 2 2018
Issue 826903 has been merged into this issue.
,
Apr 2 2018
Coral is still failing pretty widely. Is there a way forward that isn't blocked on the logging feature request at Issue 819882?
,
Apr 3 2018
Are there any logs of specific machines failing? I see mention of Cr50 failure on the other bug. Cr50 update will fail by rollback protection if requested to downrev (like if you test a build from master, then try to test an M65 build). I'd expect this to fail gracefully but maybe there's some reporting bug there. Coral is one of the first devices to see regular Cr50 updates so it's plausible that this could be a factor.
,
Apr 23 2018
cros-cts-te update, M66
GoldenEye: https://cros-goldeneye.corp.google.com/chromeos/console/listBuild?boards=coral&milestone=66&chromeOsVersion=&chromeVersion=&startTimeFrom=&startTimeTo=&token=ALeBcqHQClKOIJ0kZLo_KHX3A2iB%3A1524476144609#/
https://luci-milo.appspot.com/buildbot/chromeos_release/coral-release%20release-R66-10452.B/
BuildPackages issue: https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos_release%2Fcoral-release_release-R66-10452.B%2F40%2F%2B%2Frecipes%2Fsteps%2FBuildPackages__afdo_use_%2F0%2Fstdout
SyncChrome issue: https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos_release%2Fcoral-release_release-R66-10452.B%2F41%2F%2B%2Frecipes%2Fsteps%2FSyncChrome%2F0%2Fstdout
,
Apr 27 2018
coral-release/R68-10621.0.0/bvt-inline/provision_AutoUpdate.double (195246198-chromeos-test)
https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=195246198
debug log: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/195246198-chromeos-test/chromeos6-row3-rack23-host15/debug
Comment 1 by vineeths@chromium.org, Feb 23 2018
Owner: shapiroc@chromium.org