Issue 874991

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Aug 16
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



guado_moblab-paladin failed with "No hosts found for board:cyan in pool:"

Project Member Reported by jwer...@chromium.org, Aug 16

Issue description

Failing build run: https://luci-milo.appspot.com/buildbot/chromeos/guado_moblab-paladin/10156

Log excerpt from autoserv.DEBUG:

08-16-2018 [03:12:23] Start collecting test results and dump them to json.
Suite job          [ PASSED ]
dummy_PassServer   [ PASSED ]
provision          [ FAILED ]
provision            FAIL: Failure in build R68-10718.74.0: command execution error
dummy_PassServer   [ PASSED ]

Suite timings:
Downloads started at 2018-08-16 02:55:52
Payload downloads ended at 2018-08-16 02:55:57
Suite started at 2018-08-16 02:56:08
Artifact downloads ended (at latest) at 2018-08-16 02:56:09
Testing started at 2018-08-16 02:56:16
Testing ended at 2018-08-16 03:11:11


Links to test logs:
Suite job http://localhost/tko/retrieve_logs.cgi?job=/results/1-moblab/
dummy_PassServer http://localhost/tko/retrieve_logs.cgi?job=/results/2-moblab/
provision http://localhost/tko/retrieve_logs.cgi?job=/results/3-moblab/
dummy_PassServer http://localhost/tko/retrieve_logs.cgi?job=/results/4-moblab/


08-16-2018 [03:12:24] Gathering timing stats for the suite job.

 08-16-2018 [03:12:24] Attempting to display pool info: 
No hosts found for board:cyan in pool:
Reason: Provisioning failed.

 08-16-2018 [03:12:24] Output below this line is for buildbot consumption:
@@@STEP_LINK@[Test-Logs]: provision: FAIL: Failure in build R68-10718.74.0: command execution error@http://localhost/tko/retrieve_logs.cgi?job=/results/3-moblab/@@@
Will return from run_suite with status: INFRA_FAILURE
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/client/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/server/site_tests/moblab_RunSuite/moblab_RunSuite.py", line 80, in run_once
    result = host.run_as_moblab(cmd, timeout=run_suite_timeout_s)
  File "/usr/local/autotest/server/hosts/moblab_host.py", line 150, in run_as_moblab
    return self.run(command, **kwargs)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 323, in run
    return self.run_very_slowly(*args, **kwargs)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 312, in run_very_slowly
    ssh_failure_retry_ok)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 262, in _run
    raise error.AutoservRunError("command execution error", result)
AutoservRunError: command execution error
* Command: 
    /usr/bin/ssh -a -x   -o Protocol=2 -o StrictHostKeyChecking=no -o
    UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o
    ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4
    -l root -p 22 chromeos2-row1-rack8-host1 "export LIBC_FATAL_STDERR_=1; if
    type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\"
    \"server[stack::run_once|run_as_moblab|run] -> ssh_run(su - moblab -c
    '/usr/local/autotest/site_utils/run_suite.py --pool='' --board=cyan
    --build=cyan-release/R68-10718.74.0 --suite_name=dummy_server --retry=True
    --max_retries=1')\";fi; su - moblab -c
    '/usr/local/autotest/site_utils/run_suite.py --pool='' --board=cyan
    --build=cyan-release/R68-10718.74.0 --suite_name=dummy_server --retry=True
    --max_retries=1'"
Exit status: 3
Duration: 992.704529047

I don't know what this means, but Xixuan said it's a lab problem.
 
Cc: haddowk@chromium.org
Owner: mattmallett@chromium.org
Assigning to the on-call Peeler. And yes, it is a lab/hardware issue.
Status: Assigned (was: Untriaged)
I looked at this earlier this morning when I got in. It looked like the Moblab failed to provision and went into Repairing status. I just got back from the lab checking up on the Moblab devices, and this one seems to be OK now.

These lines from the log seem to point to a provision failure:

08/16 03:02:48.909 DEBUG|          ssh_host:0301| Running (ssh) 'None --noreboot cyan-release/R68-10718.74.0 http://192.168.231.1:8080/static' from 'run_update|_install_update|_install_via_quick_provision|_run|run|run_very_slowly'
08/16 03:02:48.945 ERROR|             utils:0286| [stderr] bash: None: command not found
08/16 03:02:48.989 ERROR|       autoupdater:0889| quick-provision script failed; will fall back to update_engine.
Traceback (most recent call last):
  File "/usr/local/autotest/server/cros/autoupdater.py", line 882, in _install_via_quick_provision
    self._run(command)
  File "/usr/local/autotest/server/cros/autoupdater.py", line 356, in _run
    return self.host.run(cmd, *args, **kwargs)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 323, in run
    return self.run_very_slowly(*args, **kwargs)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 312, in run_very_slowly
    ssh_failure_retry_ok)
  File "/usr/local/autotest/server/hosts/ssh_host.py", line 262, in _run
    raise error.AutoservRunError("command execution error", result)
AutoservRunError: command execution error
* Command: 
    /usr/bin/ssh -a -x  -o ControlPath=/tmp/_autotmp_WxmY_Nssh-master/socket
    -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
    -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o
    ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22
    192.168.231.100 "export LIBC_FATAL_STDERR_=1; if type \"logger\" >
    /dev/null 2>&1; then logger -tag \"autotest\"
    \"server[stack::_install_via_quick_provision|_run|run] -> ssh_run(None
    --noreboot cyan-release/R68-10718.74.0
    http://192.168.231.1:8080/static)\";fi; None --noreboot cyan-
    release/R68-10718.74.0 http://192.168.231.1:8080/static"
Exit status: 127
Duration: 0.0682458877563

stderr:
bash: None: command not found

Link to log: https://stainless.corp.google.com/browse/chromeos-autotest-results/228004837-chromeos-test/chromeos2-row1-rack8-host1/
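
For anyone reading the excerpt above: the command that ran on the DUT starts with the literal word `None`, which is what you get when a Python `None` is formatted into a command string. Exit status 127 is the shell's usual "command not found" status, which matches the stderr line. Below is a minimal sketch of how that can happen and how a guard would catch it earlier. It is not the actual `autoupdater.py` code; the helper names `build_provision_command` and `build_provision_command_checked` are hypothetical, and only the build name and URL are taken from the log.

```python
def build_provision_command(script_path, build, static_url):
    """Hypothetical helper: format a quick-provision style command line."""
    # '%s' % None produces the string 'None', so a missing script path
    # silently becomes the command name instead of raising an error.
    return '%s --noreboot %s %s' % (script_path, build, static_url)


def build_provision_command_checked(script_path, build, static_url):
    """Same as above, but refuse to build a command without a script path."""
    if not script_path:
        raise ValueError('quick-provision script path is not set')
    return build_provision_command(script_path, build, static_url)


BUILD = 'cyan-release/R68-10718.74.0'
STATIC_URL = 'http://192.168.231.1:8080/static'

# Reproduces the shape of the command seen in the log.
bad_command = build_provision_command(None, BUILD, STATIC_URL)
print(bad_command)

# With the guard, the problem surfaces before anything is sent over ssh.
try:
    build_provision_command_checked(None, BUILD, STATIC_URL)
except ValueError as e:
    print('refused: %s' % e)
```

Whether the real fix belongs in the caller or in whatever looks up the script path is a separate question; the sketch only shows why the log says `bash: None: command not found`.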
Status: Fixed (was: Assigned)
