HWTest Stage was executed, but not logged by builder
Reported by jrbarnette@chromium.org, Mar 5 2017
Issue description
See these three canary builds for R58-9334.0.0:
https://uberchromegw.corp.google.com/i/chromeos/builders/reef-release/builds/894
https://uberchromegw.corp.google.com/i/chromeos/builders/snappy-release/builds/425
https://uberchromegw.corp.google.com/i/chromeos/builders/pyro-release/builds/440
None of those pages report running "HWTest [sanity]". Yet, the Autotest AFE
shows that the step ran and created jobs:
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=104560485
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=104561242
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=104559561
This build also failed "HWTest [sanity]" similarly, but the failure was
reported properly:
https://uberchromegw.corp.google.com/i/chromeos/builders/ninja-release/builds/908
I suspect the reason for the logging difference is related to the Paygen
failures.
Mar 15 2017
<sigh> Trying to answer the question in c#1, I went digging
a bit more. The answer is that the Paygen failures and the
lack of HWTest logging are a fabulous example of this:
https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation
I went and looked at the cases that were missing the HWTest
logging. In all cases, the build run time was about 7 hours,
45 minutes. I'm going to guess that this means that the builds
were timing out. Most likely, that was because DUTs were constantly
failing provision, and tests were constantly retrying. That
led lots of stuff to take too long, leading to timeouts.
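A minimal sketch of that check, assuming the hard builder timeout is close to the 7 hour 45 minute run time seen above. The timeout value, the tolerance, and the build names and durations are all hypothetical, for illustration only:

```python
from datetime import timedelta

# Assumption: the builder's hard timeout is about the run time observed
# in the affected builds. The real configured value is not stated here.
ASSUMED_BUILDER_TIMEOUT = timedelta(hours=7, minutes=45)
TOLERANCE = timedelta(minutes=10)  # hypothetical slack around the timeout


def looks_like_timeout(run_time, timeout=ASSUMED_BUILDER_TIMEOUT,
                       tolerance=TOLERANCE):
    """Return True if run_time is within tolerance of the assumed timeout."""
    return abs(run_time - timeout) <= tolerance


# Made-up durations, not taken from the builds linked in this issue.
durations = {
    "build-a": timedelta(hours=7, minutes=46),
    "build-b": timedelta(hours=5, minutes=12),
}

suspects = [name for name, d in durations.items() if looks_like_timeout(d)]
print(suspects)
```

Builds flagged this way are only candidates: a run time near the timeout is consistent with being killed, but does not prove it.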
It's a well-understood, hard-to-fix problem that builder timeouts
lead to undiagnosable build failures, starting with the basic problem
that the builder can't even say it was killed by the timeout.
We should dig a bit more: I'm not satisfied with an answer that says
"we can't make the failure more debuggable", not because I don't
believe that, but because I still need to be convinced that the
failure can't be made rarer:
* Why did the provisioning job failures not lead to suite failure
sooner?
* Why were the HWTest phases allowed to run so long that the
hard buildbot timeouts kicked in?
I think a while back we increased the timeouts for HWTest phases.
Possibly, we should revisit that choice.
Mar 15 2017
The HWTest stage suites should time out before the builder timeout is hit, IF they started on time. However, if they started late because something else was slow, their timers might still be running when the builder timeout is hit.
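The timing argument above can be sketched as simple arithmetic. This is an illustration only, not the builder's actual logic: the function names, the 4 hour suite timeout, the 7 hour 45 minute builder timeout, and the 10 minute margin are all assumptions.

```python
from datetime import timedelta

# Hypothetical values; the real configured timeouts are not given here.
BUILDER_TIMEOUT = timedelta(hours=7, minutes=45)
SUITE_TIMEOUT = timedelta(hours=4)


def suite_outlives_builder(stage_start, suite_timeout, builder_timeout):
    """True if the suite's own timer would still be running when the
    builder's hard timeout fires. stage_start is time since build start."""
    return stage_start + suite_timeout > builder_timeout


def clamped_suite_timeout(stage_start, suite_timeout, builder_timeout,
                          margin=timedelta(minutes=10)):
    """One possible mitigation: shrink the suite timeout so it expires
    `margin` before the builder timeout, never going below zero."""
    remaining = builder_timeout - stage_start - margin
    return max(timedelta(0), min(suite_timeout, remaining))


on_time = timedelta(hours=3)  # stage started on schedule
late = timedelta(hours=5)     # stage delayed by earlier slow stages

print(suite_outlives_builder(on_time, SUITE_TIMEOUT, BUILDER_TIMEOUT))
print(suite_outlives_builder(late, SUITE_TIMEOUT, BUILDER_TIMEOUT))
print(clamped_suite_timeout(late, SUITE_TIMEOUT, BUILDER_TIMEOUT))
```

With these assumed numbers, the on-time stage finishes its timer inside the builder window, while the late stage would be cut off by the builder first. Clamping the suite timeout to the remaining builder time is one way a late stage could still fail on its own terms and report results.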
Mar 21 2017
Known issue that builder timeouts won't generate logs.
Comment 1 by autumn@chromium.org, Mar 15 2017