consider scheduling strategies for test retries that avoid inter-suite competition
Issue description
Example canary build: https://uberchromegw.corp.google.com/i/chromeos/builders/lulu-release/builds/2248

Symptom: One of the non-paygen HWTest suites times out, in this case HWTest [bvt-arc].

Root cause: The bvt-arc suite was kicked off ~1 hour before paygen. Both suites had a 3-hour timeout. Some tests in the bvt-arc suite failed. By the time they failed and were requeued, the paygen tests had already been scheduled, so the retries ended up behind the paygen tests. The paygen tests took more than 2 hours to finish (successfully), pushing the retries beyond the 3-hour limit allowed for bvt-arc.

This is clearly shown by the two suite timelines:

The early bvt-arc suite: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/suiteDetails?suiteId=205881545
Note that only the tests up to ~22:30 actually belong to this suite; the later tests do not. This is a bug in how suite_timeline reporting works.

The interjecting paygen suite: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/suiteDetails?suiteId=205898669
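The root cause is a queueing effect: a retry is ordered behind whatever was enqueued before it, regardless of how close its own suite is to timing out. The sketch below is a minimal model under stated assumptions, not a description of the lab scheduler or quotascheduler: a single shared DUT, non-preemptive scheduling, and hypothetical durations (thirteen 10-minute paygen tests enqueued at minute 60, one 20-minute bvt-arc retry enqueued at minute 70). It contrasts FIFO ordering with earliest-deadline-first (EDF), a swapped-in strategy where each job's deadline is its suite's start time plus the 3-hour timeout. All names (`Job`, `simulate`, `fifo_key`, `edf_key`, `missed_deadlines`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

SUITE_TIMEOUT_MIN = 180  # both suites had a 3-hour timeout


@dataclass(frozen=True)
class Job:
    """One test run waiting for a DUT (hypothetical model)."""
    name: str
    suite: str
    enqueue_min: int         # minute the job enters the queue
    duration_min: int        # minutes the job occupies the single DUT
    suite_deadline_min: int  # suite start time + SUITE_TIMEOUT_MIN


def fifo_key(job: Job) -> Tuple:
    # First come, first served: earlier enqueue time wins.
    return (job.enqueue_min, job.name)


def edf_key(job: Job) -> Tuple:
    # Earliest deadline first: the suite closest to timing out wins.
    return (job.suite_deadline_min, job.enqueue_min, job.name)


def simulate(jobs: List[Job], key: Callable[[Job], Tuple]) -> Dict[str, int]:
    """Non-preemptive, single-worker simulation; returns finish minute per job."""
    pending = sorted(jobs, key=lambda j: j.enqueue_min)
    ready: List[Job] = []
    finish: Dict[str, int] = {}
    now = 0
    while pending or ready:
        # Admit every job that has arrived by `now`.
        while pending and pending[0].enqueue_min <= now:
            ready.append(pending.pop(0))
        if not ready:
            now = pending[0].enqueue_min  # idle until the next arrival
            continue
        job = min(ready, key=key)  # the scheduling policy decides here
        ready.remove(job)
        now += job.duration_min
        finish[job.name] = now
    return finish


def missed_deadlines(jobs: List[Job], finish: Dict[str, int]) -> List[str]:
    # Jobs that finished after their suite's timeout expired.
    return sorted(j.name for j in jobs if finish[j.name] > j.suite_deadline_min)


# bvt-arc starts at minute 0; paygen starts ~1 hour later, at minute 60.
BVT_ARC_DEADLINE = 0 + SUITE_TIMEOUT_MIN
PAYGEN_DEADLINE = 60 + SUITE_TIMEOUT_MIN

JOBS = [
    Job(f"paygen-{i:02d}", "paygen", 60, 10, PAYGEN_DEADLINE) for i in range(1, 14)
] + [Job("bvt-arc-retry", "bvt-arc", 70, 20, BVT_ARC_DEADLINE)]

if __name__ == "__main__":
    for label, key in (("FIFO", fifo_key), ("EDF", edf_key)):
        finish = simulate(JOBS, key)
        print(label, "retry finishes at", finish["bvt-arc-retry"],
              "missed:", missed_deadlines(JOBS, finish))
```

With these assumed numbers, FIFO finishes the retry at minute 210, past the bvt-arc deadline of 180, which mirrors the timeout in this issue. EDF runs the retry as soon as the current paygen test ends (finishing at minute 90), and the last paygen test still completes at minute 210, inside its 240-minute deadline. A real policy would also need to bound how long a later suite can be starved, which this sketch ignores.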

Jun 6 2018
Another instance of this from yesterday: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/suiteDetails?suiteId=206025308

Jun 7 2018
#1 sgtm

Nov 2
Does qscheduler address or mitigate this problem? Or is this a test-planning issue?

Nov 26
Possible feature request (FR) for quotascheduler. Low priority.

Comment 1 by pprabhu@chromium.org, Jun 6 2018
Owner: pprabhu@chromium.org
Status: Assigned (was: Untriaged)