peach_pit-chrome-pfq fails due to "Not enough DUTs for board"
Issue description
peach_pit-chrome-pfq fails due to "Not enough DUTs for board"
https://luci-milo.appspot.com/buildbot/chromeos/peach_pit-chrome-pfq/5655

Triggered task: peach_pit-chrome-pfq/R69-10738.0.0-rc3-provision
chromeos-golo-server2-251: 3dce9a2ede514110 3
Autotest instance created: cautotest-prod
TestLabException: Not enough DUTs for board: peach_pit, pool: bvt; required: 4, found: 3
Traceback (most recent call last):
  File "/usr/local/autotest/site_utils/run_suite.py", line 1990, in _run_task
    return _run_suite(options)
  File "/usr/local/autotest/site_utils/run_suite.py", line 1726, in _run_suite
    options.skip_duts_check)
  File "/usr/local/autotest/site_utils/diagnosis_utils.py", line 330, in check_dut_availability
    hosts=hosts)
NotEnoughDutsError: Not enough DUTs for board: peach_pit, pool: bvt; required: 4, found: 3
Will return from run_suite with status: INFRA_FAILURE
May 31 2018
There was a temporary testing glitch caused by a bug in Chrome OS:
$ dut-status -m peach_pit -p bvt -u '2018-05-31 04:40:00' -f -d 2 | grep repair
2018-05-31 03:33:42 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack10-host6/1187498-repair/
2018-05-31 03:35:30 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack10-host20/1187505-repair/
2018-05-31 03:43:19 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack11-host6/1187535-repair/
2018-05-31 03:45:23 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack11-host13/1187541-repair/
2018-05-31 03:37:43 OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack11-host20/1187518-repair/
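The repair lines above can be tallied mechanically. What follows is a minimal sketch, assuming every line has the `DATE TIME STATUS URL` layout shown and that the hostname is the path segment after `/results/hosts/`; `repairing_hosts` is a hypothetical helper, not part of `dut-status`.

```python
import re

# Hypothetical helper: assumes the "DATE TIME STATUS URL" layout shown above,
# with the hostname as the path segment right after /results/hosts/.
_REPAIR_RE = re.compile(r"/results/hosts/([^/]+)/\d+-repair/")


def repairing_hosts(lines):
    """Return the set of hostnames that had a repair task in the given lines."""
    hosts = set()
    for line in lines:
        match = _REPAIR_RE.search(line)
        if match:
            hosts.add(match.group(1))
    return hosts


BASE = "http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/"
sample = [
    "2018-05-31 03:33:42 OK " + BASE + "chromeos6-row2-rack10-host6/1187498-repair/",
    "2018-05-31 03:35:30 OK " + BASE + "chromeos6-row2-rack10-host20/1187505-repair/",
    "2018-05-31 03:43:19 OK " + BASE + "chromeos6-row2-rack11-host6/1187535-repair/",
    "2018-05-31 03:45:23 OK " + BASE + "chromeos6-row2-rack11-host13/1187541-repair/",
    "2018-05-31 03:37:43 OK " + BASE + "chromeos6-row2-rack11-host20/1187518-repair/",
]
hosts = repairing_hosts(sample)
print(len(hosts))  # 5
```

Collecting hostnames into a set means a DUT that was repaired twice in the window is counted once.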
At least three DUTs were actively repairing after a failure when the
PFQ requested its test suite. DUTs under repair are counted as
unavailable, which left the bvt pool one DUT short of the four the suite
required; that is what caused the error.
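The check that failed compares the number of usable DUTs in the pool against a required minimum. A minimal sketch of that comparison follows, assuming a DUT is usable only when it is not repairing. The `Dut` type, its status strings, `require_available_duts`, and `InsufficientDutsError` are all illustrative; the real check is `check_dut_availability` in `diagnosis_utils.py` (per the traceback), and this sketch does not reproduce it.

```python
from dataclasses import dataclass


@dataclass
class Dut:
    hostname: str
    status: str  # assumed values, e.g. "Ready" or "Repairing"


class InsufficientDutsError(Exception):
    """Illustrative stand-in for the NotEnoughDutsError seen in the log."""


def require_available_duts(board, pool, duts, required):
    """Return the DUTs not under repair; raise if fewer than `required` remain."""
    available = [d for d in duts if d.status != "Repairing"]
    if len(available) < required:
        raise InsufficientDutsError(
            f"Not enough DUTs for board: {board}, pool: {pool}; "
            f"required: {required}, found: {len(available)}"
        )
    return available


# Hypothetical six-DUT pool with three DUTs in repair.
pool_duts = [Dut(f"dut{i}", "Repairing" if i < 3 else "Ready") for i in range(6)]
try:
    require_available_duts("peach_pit", "bvt", pool_duts, required=4)
except InsufficientDutsError as err:
    print(err)  # Not enough DUTs for board: peach_pit, pool: bvt; required: 4, found: 3
```

Under this model, a DUT that is minutes away from finishing repair counts the same as a dead one, which is why a burst of repairs is enough to fail the suite.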
May 31 2018
These are the test jobs that failed:
https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=204247244
https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=204260738
https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=204260741
https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=204260735
https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=204247254
In every case, Chrome crashed many times. The crashes then forced a
repair for one of two reasons:
* Most DUTs ran out of disk space to store any more crash data.
* Presumably on all of the DUTs, Chrome didn't stay up.
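The two repair triggers above can be modeled as a simple predicate. This is a toy sketch, not the autotest repair logic: the `DutState` fields, the `min_free_bytes` threshold, and treating Chrome's state as a single boolean are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class DutState:
    free_disk_bytes: int  # space left for crash data (assumed field)
    chrome_running: bool  # whether Chrome stayed up (assumed field)


def repair_reasons(state, min_free_bytes):
    """Toy model: list the reasons a DUT in `state` would be sent to repair."""
    reasons = []
    if state.free_disk_bytes < min_free_bytes:
        reasons.append("out of disk space for crash data")
    if not state.chrome_running:
        reasons.append("Chrome did not stay up")
    return reasons


# Hypothetical numbers: 50 MiB free against an assumed 100 MiB floor.
print(repair_reasons(DutState(50 * 2**20, False), min_free_bytes=100 * 2**20))
```

Either condition alone is enough to send a DUT to repair, which matches the thread's point that most DUTs hit both.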
May 31 2018
The crashes that led to the repairs were caused by a known bug in the R68 beta branch for peach_pit; a fix needs to be cherry-picked.
May 31 2018
The crashes behind this problem were caused by bug 845429.
Jun 2 2018
I'm not prepared to suggest that we change the "are there enough DUTs" check. Absent making that check smarter, the only fix I see is to cherry-pick the fix for whatever is making peach_pit fail in Beta. This isn't the bug for cherry-picking the fix, so we're done here.
Comment 1 by jrbarnette@chromium.org, May 31 2018