autotest: hardware validation test on DUT failure
Issue description
DUT hardware sometimes fails, and each failure currently causes tree flakes until someone diagnoses it manually. We should have a hardware diagnostic that runs before or after repair to check for common problems, including battery failure, power failure, port failure, storage failure, memory failure, missing components, and thermal failure.
,
Jul 17
,
Jul 17
> [ ... ] a hardware diagnostic that runs before or after repair [ ... ]
Doing the work _during_ the repair task might be a better choice, as a
way to simplify the scheduler.
However, doing the work only during repair might not be the best
or most effective choice. Just because we ran repair doesn't mean that
the hardware is suspect. Also, there's no guarantee that devices with hardware
problems will predictably wind up in repair tasks; they may simply fail
certain tests while reliably passing all verification checks.
An alternative proposal for how to schedule the diagnostics is here:
https://docs.google.com/document/d/1zIvIqwRbRtF2HP2a9pPti6dXMeq5ejHviFxsMsp_SWw/edit#heading=h.bfnmwg8natdi
The basic idea there is that every DUT should run the hardware
diagnostic tests, and should re-run them whenever test results seem
too stale.
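As a rough illustration of the "re-run when results are too stale" idea, here is a minimal sketch. The names `needs_diagnostics` and `MAX_RESULT_AGE` and the 7-day threshold are assumptions made up for this example; neither this thread nor the linked proposal is being quoted here, and nothing below is an existing autotest API.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical threshold: the thread does not say how old "too stale" is.
MAX_RESULT_AGE = timedelta(days=7)


def needs_diagnostics(last_run: Optional[datetime],
                      now: datetime,
                      max_age: timedelta = MAX_RESULT_AGE) -> bool:
    """Return True if a DUT should (re-)run hardware diagnostics.

    A DUT with no recorded run always needs one; otherwise it needs one
    once the last result is older than max_age.
    """
    if last_run is None:
        return True
    return now - last_run > max_age
```

A scheduler could call this for each DUT when picking its next task, which keeps the decision independent of whether the DUT ever lands in repair.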
,
Jul 17
More broadly, we should probably split this work into two loosely coupled pieces:
1) Write (and maintain) diagnostics for Chrome hardware.
2) Set up a system that ensures that we run the diagnostics, and that
failures are reported and acted on.
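To make the split concrete, here is a minimal sketch of how the two pieces could stay loosely coupled: the diagnostics are plain callables, and a separate runner executes them and reports failures. Everything here (`run_diagnostics`, `report`, the check names, the `notify` callback) is hypothetical and only illustrates the shape; it is not existing autotest code.

```python
from typing import Callable, Dict, List

# Hypothetical check signature: returns True when the component looks healthy.
Check = Callable[[], bool]


def run_diagnostics(checks: Dict[str, Check]) -> List[str]:
    """Piece 2, part one: run every check and return the names that failed.

    A check that raises is counted as a failure rather than aborting the run,
    so one broken diagnostic cannot hide the results of the others.
    """
    failures = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failures.append(name)
    return failures


def report(hostname: str, failures: List[str],
           notify: Callable[[str], None]) -> bool:
    """Piece 2, part two: send one message per DUT if anything failed."""
    if failures:
        notify("%s: hardware diagnostics failed: %s"
               % (hostname, ", ".join(sorted(failures))))
    return bool(failures)
```

With this shape, piece 1 is just the dictionary of checks (battery, storage, memory, and so on), and it can be grown or maintained without touching the runner or the reporting path.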
,
Jul 25
The first step here is to build out such diagnostics, if they don't exist already. This would be helpful even if the deputy needed to run them manually to check if a DUT is problematic. nsanders, are you the right person to move this forward?
,
Jul 25
Yes, I think so.
,
Aug 8
Comment 1 by yusukes@chromium.org
, Jun 1 2018