[Autotests not running] nocturne not running various OOB test suites as expected
Issue description
chromeos-infra, I need your help in checking why the following suites are not running on nocturne as scheduled:
bluetooth_sanity - should run nightly on tot https://cs.corp.google.com/chromeos_public/infra/suite_scheduler/configs/suite_scheduler.ini?l=176
wifi_endtoend - should run nightly on tot https://cs.corp.google.com/chromeos_public/infra/suite_scheduler/configs/suite_scheduler.ini?l=616
wifi_matfunc - should run nightly on tot https://cs.corp.google.com/chromeos_public/infra/suite_scheduler/configs/suite_scheduler.ini?l=594
wifi_perf - should run nightly on tot https://cs.corp.google.com/chromeos_public/infra/suite_scheduler/configs/suite_scheduler.ini?l=627
From this stainless link, it looks like the wifi_* suites have been getting scheduled but do not run on most days. For bluetooth_sanity, it looks like it's not even getting scheduled on a nightly basis.
https://stainless.corp.google.com/search?view=matrix&row=test&col=build&first_date=2018-10-04&last_date=2018-10-31&suite=bluetooth_sanity%7Cwifi_matfunc%7Cwifi_endtoend%7Cwifi_perf&model=%5Enocturne%24&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
Can someone on chromeos-infra please take a look or point me to where I can look to see what the issue might be?
,
Nov 1
I don't think it's simply a lack of duts: there are 10 nocturne duts in the "Ready" state in the pool:suites right now. I'm looking into it.
,
Nov 2
,
Nov 2
I didn't pay attention earlier: these suites use pool:wificell, which currently has a single DUT, chromeos15-row4-rack11-host3, and that DUT has been failing repair for a couple of days.
,
Nov 2
Maybe this is merely a question of corrupted pool labels? (why should any pool have just one DUT?)
,
Nov 2
This is not an auto-managed pool and judging by the name "wificell" the duts require a very special placement/treatment.
,
Nov 2
Correct, there is only one nocturne DUT in the wificell pool, and the device has been down since yesterday. I have asked the lab folks to power cycle it. The issue reported here has been happening for a while, though; it is not related to the recent repair-failed state.
,
Nov 2
Is a single dut enough for the throughput of that pool?
,
Nov 5
It's possible. Is there a way to find out how long each test takes to run on a given device? Looking at viceroy, nocturne in wificell is rarely in the idle state (link below). Based on the tests running on this device, I don't think this should be the case. So if there is an easy way to find out how long each of these tests takes to run, that will help in determining whether we need more units.
https://viceroy.corp.google.com/chromeos/dut_utilization?board=&model=nocturne&pool=wificell&status=Ready&is_locked=False&topstreams=100&build_config=kevin-paladin&build_number=&duration=15d&experimental=False&mdb_role=chrome-infra&milestone_version=&refresh=-1&scheduler_host=cros-full-0036&sentinel_host=chromeos-server156&staging_master=chromeos-staging-master2&waterfall=chromeos
Here are all the suites (with # of tests) running on this DUT.
https://stainless.corp.google.com/search?view=matrix&row=suite&col=queued_date&first_date=2018-10-09&last_date=2018-11-05&suite=%5Ebluetooth%7C%5Ewifi&model=%5Enocturne%24&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
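One way to reason about the "do we need more units" question, sketched in Python. This is only an illustration under stated assumptions: the `TestRun` record, its field names, and the 80% target utilization are hypothetical, and the start and finish timestamps would have to be exported from whatever job history is available (nothing here calls a real stainless, viceroy, or AFE API).

```python
import math
from dataclasses import dataclass


@dataclass
class TestRun:
    """Hypothetical record of one test execution on a DUT."""
    name: str
    started_s: float   # epoch seconds when the test started
    finished_s: float  # epoch seconds when the test finished


def per_test_minutes(runs):
    """Return {test name: mean runtime in minutes} over the given runs."""
    totals = {}
    for run in runs:
        total, count = totals.get(run.name, (0.0, 0))
        totals[run.name] = (total + (run.finished_s - run.started_s), count + 1)
    return {name: total / count / 60.0 for name, (total, count) in totals.items()}


def duts_needed(nightly_minutes, window_hours=24.0, target_utilization=0.8):
    """Estimate how many DUTs are needed to finish the nightly workload.

    nightly_minutes is the total runtime of everything scheduled per night.
    target_utilization (an assumed value) leaves headroom for repairs,
    provisioning, and retries.
    """
    budget = window_hours * 60.0 * target_utilization
    return max(1, math.ceil(nightly_minutes / budget))
```

Summing the values returned by `per_test_minutes` for every test in the scheduled suites gives `nightly_minutes`; if `duts_needed` returns more than 1, a single DUT cannot keep up even when it is healthy.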
,
Nov 5
Looks like lots of jobs ran in the last 2-3 days:
https://stainless.corp.google.com/search?view=matrix&row=test&col=queued_date&first_date=2018-10-23&last_date=2018-11-05&suite=bluetooth_sanity%7Cwifi_matfunc%7Cwifi_endtoend%7Cwifi_perf&model=%5Enocturne%24&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
Also, looking at the shard jobs for some of the jobs that didn't run earlier, I think it is indeed down to just that one DUT being repeatedly in the repair_failed state. The shard can't find a DUT to match, so the job stays queued and times out.
http://chromeos-skunk-3.mtv.corp.google.com/afe/#tab_id=view_job&object_id=255479121
http://chromeos-skunk-3.mtv.corp.google.com/afe/#tab_id=view_job&object_id=255461689
The bluetooth_nightly jobs were still not scheduled until today, and I can't explain that.
,
Nov 5
Ah, bluetooth_sanity is actually scheduled weekly:
http://shortn/_SzqlkokTgP
That's because the nightly suite's whitelist doesn't have nocturne:
https://cs.corp.google.com/chromeos_public/infra/suite_scheduler/configs/suite_scheduler.ini?l=184
So, WAI.
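The whitelist check described above can be sketched as a small script. This is a rough illustration, not the scheduler's actual logic: the section names, the `run_on`, `suite`, and `boards` keys, the placeholder board names, and the rule that a missing `boards` key means "all boards" are all assumptions made for the example, and the real suite_scheduler.ini may differ.

```python
import configparser

# Hypothetical fragment in the spirit of suite_scheduler.ini.
# Section names, keys, and board names are made up for illustration.
SAMPLE_CONFIG = """
[BluetoothSanityNightly]
run_on: nightly
suite: bluetooth_sanity
boards: board_a, board_b

[BluetoothSanityWeekly]
run_on: weekly
suite: bluetooth_sanity
"""


def schedules_for(config_text, suite, board):
    """Return the run_on values of every entry that would schedule suite on board."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    matches = []
    for section in parser.sections():
        if parser.get(section, "suite", fallback=None) != suite:
            continue
        boards = parser.get(section, "boards", fallback=None)
        # Assumption: an entry without a boards whitelist applies to every board.
        if boards is None or board in [b.strip() for b in boards.split(",")]:
            matches.append(parser.get(section, "run_on", fallback="unknown"))
    return matches


print(schedules_for(SAMPLE_CONFIG, "bluetooth_sanity", "nocturne"))
```

With the sample fragment, a board that is absent from the nightly whitelist is matched only by the weekly entry, which is the behavior observed for nocturne.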
,
Nov 5
pprabhu@ thanks for taking a look. You are right about nocturne missing from the nightly suite whitelist. I just found that out today as well. Any input on "Is there a way to find out how long each test takes to run on a given device?" ?
,
Nov 7
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/infra/suite_scheduler/+/c546e402a27f5d1665706fb48b3ae38cf332d52c

commit c546e402a27f5d1665706fb48b3ae38cf332d52c
Author: Shijin Abraham <shijinabraham@google.com>
Date: Wed Nov 07 14:34:20 2018

Add nocturne to bluetooth_sanity
Remove no_delay from Wifi_Release

BUG= chromium:900766
TEST=None

Change-Id: If5f7d5918f243c4aaabc1277824986c4c79e97f8
Reviewed-on: https://chromium-review.googlesource.com/1318098
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Shijin Abraham <shijinabraham@google.com>
Reviewed-by: Harpreet Grewal <harpreet@chromium.org>

[modify] https://crrev.com/c546e402a27f5d1665706fb48b3ae38cf332d52c/configs/suite_scheduler.ini
Comment 1 by akeshet@google.com, Oct 31
Status: Assigned (was: Untriaged)