
Issue 828371


Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




NotEnoughDutsError: eve-paladin

Project Member Reported by mnissler@chromium.org, Apr 3 2018

Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/eve-paladin/builds/2800

************************************************************
** Start Stage HWTest [provision] - Tue, 03 Apr 2018 02:29:31 -0700 (PDT)
** 
** Stage that runs tests in the Autotest lab.
************************************************************
02:29:31: INFO: Created cidb engine bot@130.211.191.11 for pid 15550
02:29:31: INFO: Running cidb query on pid 15550, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7fd4c51ef1d0>
02:29:31: INFO: Waiting up to forever for payloads and test artifacts ...
Preconditions for the stage successfully met. Beginning to execute stage...
02:39:18: INFO: Running cidb query on pid 15550, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7fd4c45bc5d0>
02:39:18: INFO: Re-run swarming_cmd to avoid buildbot salency check.
02:39:18: INFO: RunCommand: /b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmp6YRMN5/tmpOK9dm0/temp_summary.json --raw-cmd --task-name eve-paladin/R67-10541.0.0-rc3-provision --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 9000 --io-timeout 9000 --hard-timeout 9000 --expiration 1200 '--tags=priority:CQ' '--tags=suite:provision' '--tags=build:eve-paladin/R67-10541.0.0-rc3' '--tags=task_name:eve-paladin/R67-10541.0.0-rc3-provision' '--tags=board:eve' -- /usr/local/autotest/site_utils/run_suite.py --build eve-paladin/R67-10541.0.0-rc3 --board eve --suite_name provision --pool cq --file_bugs False --priority CQ --timeout_mins 90 --retry True --max_retries 5 --minimum_duts 4 --suite_args "{u'num_required': 1}" --offload_failures_only False --job_keyvals "{'cidb_build_stage_id': 75336202L, 'cidb_build_id': 2439493, 'datastore_parent_key': ('Build', 2439493, 'BuildStage', 75336202L)}" --test_args "{'fast': 'True'}" -c
02:39:22: WARNING: Exception is not retriable return code: 3; command: /b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmp6YRMN5/tmpOK9dm0/temp_summary.json --raw-cmd --task-name eve-paladin/R67-10541.0.0-rc3-provision --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 9000 --io-timeout 9000 --hard-timeout 9000 --expiration 1200 '--tags=priority:CQ' '--tags=suite:provision' '--tags=build:eve-paladin/R67-10541.0.0-rc3' '--tags=task_name:eve-paladin/R67-10541.0.0-rc3-provision' '--tags=board:eve' -- /usr/local/autotest/site_utils/run_suite.py --build eve-paladin/R67-10541.0.0-rc3 --board eve --suite_name provision --pool cq --file_bugs False --priority CQ --timeout_mins 90 --retry True --max_retries 5 --minimum_duts 4 --suite_args "{u'num_required': 1}" --offload_failures_only False --job_keyvals "{'cidb_build_stage_id': 75336202L, 'cidb_build_id': 2439493, 'datastore_parent_key': ('Build', 2439493, 'BuildStage', 75336202L)}" --test_args "{'fast': 'True'}" -c
Priority was reset to 100
Triggered task: eve-paladin/R67-10541.0.0-rc3-provision
chromeos-golo-server2-158: 3ca3b1893186e010 3
  Autotest instance created: cautotest-prod
  TestLabException: Not enough DUTs for board: eve, pool: cq; required: 4, found: 3
  Traceback (most recent call last):
    File "/usr/local/autotest/site_utils/run_suite.py", line 2034, in _run_task
      return _run_suite(options)
    File "/usr/local/autotest/site_utils/run_suite.py", line 1775, in _run_suite
      options.skip_duts_check)
    File "/usr/local/autotest/site_utils/diagnosis_utils.py", line 330, in check_dut_availability
      hosts=hosts)
  NotEnoughDutsError: Not enough DUTs for board: eve, pool: cq; required: 4, found: 3
  Will return from run_suite with status: INFRA_FAILURE
cmd=['/b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py', 'run', '--swarming', 'chromeos-proxy.appspot.com', '--task-summary-json', '/tmp/cbuildbot-tmp6YRMN5/tmpOK9dm0/temp_summary.json', '--raw-cmd', '--task-name', u'eve-paladin/R67-10541.0.0-rc3-provision', '--dimension', 'os', 'Ubuntu-14.04', '--dimension', 'pool', 'default', '--print-status-updates', '--timeout', '9000', '--io-timeout', '9000', '--hard-timeout', '9000', '--expiration', '1200', u'--tags=priority:CQ', u'--tags=suite:provision', u'--tags=build:eve-paladin/R67-10541.0.0-rc3', u'--tags=task_name:eve-paladin/R67-10541.0.0-rc3-provision', u'--tags=board:eve', '--', '/usr/local/autotest/site_utils/run_suite.py', '--build', u'eve-paladin/R67-10541.0.0-rc3', '--board', u'eve', '--suite_name', u'provision', '--pool', u'cq', '--file_bugs', 'False', '--priority', 'CQ', '--timeout_mins', '90', '--retry', 'True', '--max_retries', '5', '--minimum_duts', '4', '--suite_args', "{u'num_required': 1}", '--offload_failures_only', 'False', '--job_keyvals', "{'cidb_build_stage_id': 75336202L, 'cidb_build_id': 2439493, 'datastore_parent_key': ('Build', 2439493, 'BuildStage', 75336202L)}", '--test_args', "{'fast': 'True'}", '-c']
Autotest instance created: cautotest-prod
TestLabException: Not enough DUTs for board: eve, pool: cq; required: 4, found: 3
Traceback (most recent call last):
  File "/usr/local/autotest/site_utils/run_suite.py", line 2034, in _run_task
    return _run_suite(options)
  File "/usr/local/autotest/site_utils/run_suite.py", line 1775, in _run_suite
    options.skip_duts_check)
  File "/usr/local/autotest/site_utils/diagnosis_utils.py", line 330, in check_dut_availability
    hosts=hosts)
NotEnoughDutsError: Not enough DUTs for board: eve, pool: cq; required: 4, found: 3
Will return from run_suite with status: INFRA_FAILURE
02:39:22: INFO: No json dump found, no HWTest results to report
02:39:22: INFO: Running cidb query on pid 15550, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7fd4c45b9e10>
02:39:22: ERROR: ** HWTest did not complete due to infrastructure issues (code 3) **
02:39:22: INFO: Translating result ** HWTest did not complete due to infrastructure issues (code 3) ** to fail.
02:39:22: INFO: Running cidb query on pid 15550, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7fd4c4590410>
02:39:22: INFO: Running cidb query on pid 15550, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7fd4c45907d0>
************************************************************
** Finished Stage HWTest [provision] - Tue, 03 Apr 2018 02:39:22 -0700 (PDT)
************************************************************
02:39:23: ERROR: BaseException in _RunParallelStages <class 'chromite.lib.failures_lib.TestLabFailure'>: ** HWTest did not complete due to infrastructure issues (code 3) **
Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 442, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/generic_stages.py", line 701, in Run
    self.PerformStage()
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/test_stages.py", line 323, in PerformStage
    raise cmd_result.to_raise
TestLabFailure: ** HWTest did not complete due to infrastructure issues (code 3) **
Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/cbuildbot/builders/generic_builders.py", line 120, in _RunParallelStages
    parallel.RunParallelSteps(steps)
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 679, in RunParallelSteps
    return [queue.get_nowait() for queue in queues]
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 676, in RunParallelSteps
    pass
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 562, in ParallelTasks
    raise BackgroundFailure(exc_infos=errors)
BackgroundFailure: <class 'chromite.lib.failures_lib.TestLabFailure'>: ** HWTest did not complete due to infrastructure issues (code 3) **
Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 442, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/generic_stages.py", line 701, in Run
    self.PerformStage()
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/test_stages.py", line 323, in PerformStage
    raise cmd_result.to_raise
TestLabFailure: ** HWTest did not complete due to infrastructure issues (code 3) **

02:39:23: INFO: Running cidb query on pid 15032, repr(query) starts with <sqlalchemy.sql.expression.Select at 0x7fd4c4595650; Select object>
02:39:23: INFO: Running cidb query on pid 15032, repr(query) starts with <sqlalchemy.sql.expression.Select at 0x7fd4c4580b10; Select object>
02:29:31: INFO: Created cidb engine bot@130.211.191.11 for pid 15052
02:29:31: INFO: Running cidb query on pid 15052, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7fd4bfa5c9d0>
 
$ ../third_party/autotest/files/cli/atest host list -b board:eve --unlocked | ../third_party/autotest/files/contrib/count_labels -p
     12 bvt
      2 chameleon
      2 chameleon_audio_stable
      6 cq
      1 cr50_stress
      1 crosperf
     18 cts
      1 faft_flashrom
      2 faft-test
      1 performance
      1 stress
      1 stress2
     29 suites


This shows 6 DUTs in the cq pool, so hopefully we're good again. I'll monitor and close this if the next run is green.
Status: Available (was: Untriaged)
Failed again with the same complaint about too few DUTs (3 found vs. 4 required): https://uberchromegw.corp.google.com/i/chromeos/builders/eve-paladin/builds/2801

Infra needs to take a closer look.
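
For readers following along, here is a minimal sketch of why a pool with 6 labeled DUTs can still fail a `--minimum_duts 4` check: the label count includes hosts that are not working (the balance_pool output in this thread reports 3 of the 6 as broken). This is not the autotest implementation. The function name, the exception class, the `(hostname, is_working)` data shape, and the hostnames are all assumptions made for illustration; only the error message format is copied from the log above.

```python
class PoolTooSmallError(Exception):
    """Hypothetical error for a pool with too few working DUTs."""


def require_working_duts(hosts, board, pool, minimum_duts):
    """Return the number of working hosts, or raise if below minimum_duts.

    hosts: list of (hostname, is_working) pairs (assumed shape).
    """
    found = sum(1 for _hostname, is_working in hosts if is_working)
    if found < minimum_duts:
        raise PoolTooSmallError(
            'Not enough DUTs for board: %s, pool: %s; required: %d, found: %d'
            % (board, pool, minimum_duts, found))
    return found


# Six hosts carry the cq label, but only three are working.
# Hostnames are placeholders, not real lab machines.
cq_hosts = [('host-a', True), ('host-b', True), ('host-c', True),
            ('host-d', False), ('host-e', False), ('host-f', False)]

try:
    require_working_duts(cq_hosts, board='eve', pool='cq', minimum_duts=4)
    message = None
except PoolTooSmallError as error:
    message = str(error)
```

Counting labels alone, as the count_labels pipeline does, would report 6 for this list, which is why the earlier check looked healthy while the suite still failed.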
Cc: zhengpan@chromium.org jkop@chromium.org mnissler@chromium.org smbar...@chromium.org
Labels: -Pri-3 Pri-1
Owner: akes...@chromium.org
Assigning to this week's infra deputy, and including the secondary deputy and the sheriffs.
Status: Fixed (was: Available)
$ balance_pool cq eve --force-rebalance
eve cq pool: Target of 6 is above minimum.

Balancing ['model:eve'] cq pool:
Total 6 DUTs, 3 working, 3 broken, 0 reserved.
Target is 6 working DUTs; grow pool by 3 DUTs.
['model:eve'] suites pool has 22 spares available for balancing pool cq
['model:eve'] cq pool will return 3 broken DUTs, leaving 0 still in the pool.
Transferring 3 DUTs from cq to suites.
Updating host: chromeos6-row4-rack11-host12.
Removing labels ['pool:cq'] from host chromeos6-row4-rack11-host12
Adding labels ['pool:suites'] to host chromeos6-row4-rack11-host12
Updating host: chromeos6-row4-rack11-host13.
Removing labels ['pool:cq'] from host chromeos6-row4-rack11-host13
Adding labels ['pool:suites'] to host chromeos6-row4-rack11-host13
Updating host: chromeos6-row4-rack11-host18.
Removing labels ['pool:cq'] from host chromeos6-row4-rack11-host18
Adding labels ['pool:suites'] to host chromeos6-row4-rack11-host18
Transferring 3 DUTs from suites to cq.
Updating host: chromeos6-row4-rack11-host7.
Removing labels ['pool:suites'] from host chromeos6-row4-rack11-host7
Adding labels ['pool:cq'] to host chromeos6-row4-rack11-host7
Updating host: chromeos6-row4-rack11-host15.
Removing labels ['pool:suites'] from host chromeos6-row4-rack11-host15
Adding labels ['pool:cq'] to host chromeos6-row4-rack11-host15
Updating host: chromeos6-row4-rack12-host3.
Removing labels ['pool:suites'] from host chromeos6-row4-rack12-host3
Adding labels ['pool:cq'] to host chromeos6-row4-rack12-host3
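
The arithmetic in the output above can be sketched as follows. This is a simplified model, not balance_pool's actual logic: it assumes broken DUTs are swapped out one-for-one for spares from the suites pool, and the function name and returned keys are hypothetical.

```python
def plan_rebalance(total, working, target_working, spares):
    """Plan a one-for-one swap of broken DUTs for spares (simplified model)."""
    broken = total - working
    # How many more working DUTs the pool needs to reach its target.
    shortfall = max(0, target_working - working)
    # Cannot bring in more DUTs than the spare pool can supply.
    incoming = min(shortfall, spares)
    # Return one broken DUT for each spare brought in.
    outgoing = min(broken, incoming)
    return {
        'broken': broken,
        'shortfall': shortfall,
        'incoming': incoming,
        'outgoing': outgoing,
        'working_after': working + incoming,
        'total_after': total - outgoing + incoming,
    }


# Numbers taken from the balance_pool output above.
plan = plan_rebalance(total=6, working=3, target_working=6, spares=22)
```

With these inputs the plan moves 3 DUTs from cq to suites and 3 from suites to cq, which matches the two "Transferring 3 DUTs" lines and leaves the pool at 6 DUTs, all working.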
