master-paladin builds 16481 & 16482 failed due to several paladins failing HWTest with "not enough DUTs" or "Name or service not known"
Issue description

Actually, builds 16464-16481 have all failed; the last pass was build 16463. A spot check shows that almost all [*] [**] of these builds failed in HWTest due to "** HWTest did not complete due to infrastructure issues (code 3) **" and/or "** HWTest failed (code 1) **" on a wide variety of different slave paladins.

[*] https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/16469 failed with "Could not submit nshai:*463656:*5c11d1a1, error: CL:*463656 was modified while the CQ was in the middle of testing it. Patch set 16 was uploaded."

[**] https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/16467 and https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/16466 failed due to betty-arc64-paladin: "The VMTest (attempt 2) stage failed: ** VMTests failed with code 1 **" - which is issue 769808.

For example, https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/16481 [stdout]:

hana-paladin: The HWTest [bvt-arc] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **
kevin-paladin: The HWTest [bvt-inline] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **
lumpy-paladin: The HWTest [bvt-inline] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **
peach_pit-paladin: The HWTest [bvt-inline] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **
tidus-paladin: The HWTest [bvt-cq] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **

Looking at just hana-paladin:
https://luci-milo.appspot.com/buildbot/chromeos/hana-paladin/1106
https://uberchromegw.corp.google.com/i/chromeos/builders/hana-paladin/builds/1106/steps/HWTest%20%5Bbvt-arc%5D/logs/stdio

NotEnoughDutsError: Not enough DUTs for board: hana, pool: cq; required: 4, found: 2

$ dut-status -b hana -p cq
hostname                     S   last checked         URL
chromeos6-row4-rack2-host1   NO  2017-10-04 08:14:50  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack2-host1/1593047-repair/
chromeos6-row4-rack2-host6   NO  2017-10-04 08:12:35  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack2-host6/1593028-repair/
chromeos6-row4-rack3-host17  NO  2017-10-04 08:07:30  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack3-host17/1592974-repair/
chromeos6-row4-rack3-host10  OK  2017-10-03 17:10:50  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack3-host10/1584624-repair/
chromeos6-row3-rack3-host15  OK  2017-10-03 17:11:33  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack3-host15/1584627-cleanup/
chromeos6-row3-rack3-host17  NO  2017-10-04 08:08:16  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack3-host17/1592985-repair/
chromeos6-row3-rack3-host21  OK  2017-10-04 05:06:04  http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row3-rack3-host21/1591455-repair/

Yup, 4 of the 7 DUTs are failing repair.
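For context, the NotEnoughDutsError above is essentially a pre-flight head count: the lab counts usable DUTs in the (board, pool) combination and aborts the HWTest suite when that count falls below the suite's minimum. A minimal sketch of that check (hypothetical helper and parameter names; the real logic lives in the lab scheduler, not shown here):

```python
class NotEnoughDutsError(Exception):
    """Raised when a (board, pool) has too few usable DUTs."""


def check_dut_availability(statuses, board, pool, required):
    """Schematic pre-flight check behind the failure above.

    statuses maps hostname -> 'OK'/'NO', as printed by dut-status;
    the suite aborts when usable DUTs < required.  This is a
    hypothetical helper, not the real scheduler code.
    """
    found = sum(1 for s in statuses.values() if s == 'OK')
    if found < required:
        raise NotEnoughDutsError(
            'Not enough DUTs for board: %s, pool: %s; required: %d, found: %d'
            % (board, pool, required, found))
    return found
```

With the dut-status table above (3 OK, 4 NO) and a required count of 4, this check raises exactly the kind of error the builders reported.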
Looking at chromeos6-row4-rack2-host1:
http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row4-rack2-host1/1593047-repair/

10/04 08:15:28.558 INFO | servo_host:0458| servo host has the following duts: ['chromeos6-row4-rack2-host1', 'chromeos6-row4-rack2-host7', 'chromeos6-row4-rack2-host3', 'chromeos6-row4-rack2-host5', 'chromeos6-row4-rack2-host9', 'chromeos6-row4-rack2-host13', 'chromeos6-row4-rack2-host11', 'chromeos6-row4-rack2-host15', 'chromeos6-row4-rack2-host17', 'chromeos6-row4-rack2-host19', 'chromeos6-row4-rack2-host21']
10/04 08:15:28.559 INFO | servo_host:0460| servo host has multiple duts, scheduling synchronized reboot
10/04 08:15:31.070 ERROR|control_file_gette:0214| Unable to search directory /usr/local/autotest/site-packages/devserver/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/host/lib for control files.
10/04 08:15:31.072 ERROR|control_file_gette:0214| Unable to search directory /usr/local/autotest/site-packages/devserver/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib/venv/devserver_lib for control files.
...
10/04 08:20:52.493 ERROR|control_file_gette:0214| Unable to search directory /usr/local/autotest/containers/test_146772387_1507129356_32501 for control files.
10/04 08:20:52.493 ERROR|control_file_gette:0214| Unable to search directory /usr/local/autotest/containers/test_141107022_1505131533_3651 for control files.
10/04 08:20:52.828 ERROR| repair:0332| Failed: servo host software is up-to-date
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 329, in _verify_host
    self.verify(host)
  File "/usr/local/autotest/server/hosts/servo_repair.py", line 28, in verify
    host.update_image(wait_for_update=False)
  File "/usr/local/autotest/server/hosts/servo_host.py", line 550, in update_image
    status, current_build_number = self._check_for_reboot(updater)
  File "/usr/local/autotest/server/hosts/servo_host.py", line 462, in _check_for_reboot
    self.schedule_synchronized_reboot(dut_list, afe)
  File "/usr/local/autotest/server/hosts/servo_host.py", line 421, in schedule_synchronized_reboot
    control_file = getter.get_control_file_contents_by_name(test)
  File "/usr/local/autotest/server/cros/dynamic_suite/control_file_getter.py", line 152, in get_control_file_contents_by_name
    path = self.get_control_file_path(test_name)
  File "/usr/local/autotest/server/cros/dynamic_suite/control_file_getter.py", line 136, in get_control_file_path
    raise error.ControlFileNotFound(test_name + ' is not unique.')
ControlFileNotFound: servohost_Reboot is not unique.
10/04 08:20:52.830 INFO | server_job:0214| FAIL ---- verify.update timestamp=1507130452 localtime=Oct 04 08:20:52 servohost_Reboot is not unique.
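The "is not unique" failure is worth unpacking: the control-file getter searches the autotest tree for a control file matching the test name and refuses to pick one when it finds more than one match. The endlessly repeating venv/devserver_lib/... search paths in the log above suggest a symlinked directory cycle making the same control file reachable via multiple paths. A minimal sketch of that lookup behavior (hypothetical names, not the actual control_file_getter.py code):

```python
import os


class ControlFileNotFound(Exception):
    """Raised when a control file cannot be uniquely resolved."""


def get_control_file_path(root, test_name):
    """Find the single control file for test_name under root.

    Sketch of the lookup that fails above: every directory named
    after the test that contains a 'control' file is collected, and
    the lookup refuses to guess when there is more than one match.
    Following symlinks (as with a venv/devserver_lib/venv/... cycle)
    can make one control file appear several times.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        if 'control' in filenames and os.path.basename(dirpath) == test_name:
            matches.append(os.path.join(dirpath, 'control'))
    if not matches:
        raise ControlFileNotFound(test_name + ' not found.')
    if len(matches) > 1:
        raise ControlFileNotFound(test_name + ' is not unique.')
    return matches[0]
```

Under this reading, the repair flow dies not because servohost_Reboot is missing, but because the corrupted search path makes it look duplicated.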
10/04 08:20:52.851 INFO | repair:0105| Skipping this operation: All host verification checks pass
10/04 08:20:52.851 DEBUG| repair:0106| The following dependencies failed:
10/04 08:20:52.851 DEBUG| repair:0108| servo host software is up-to-date
10/04 08:20:52.852 DEBUG| abstract_ssh:0915| Terminated tunnel, pid 26263
10/04 08:20:52.852 ERROR| servo_host:0859| Servo repair failed for chromeos6-row4-rack2-labstation1
Traceback (most recent call last):
  File "/usr/local/autotest/server/hosts/servo_host.py", line 853, in create_servo_host
    newhost.repair()
  File "/usr/local/autotest/server/hosts/servo_host.py", line 621, in repair
    self._repair_strategy.repair(self, silent)
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 703, in repair
    self._verify_root._verify_host(host, silent)
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 326, in _verify_host
    self._verify_dependencies(host, silent)
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 199, in _verify_dependencies
    self._verify_list(host, self._dependency_list, silent)
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 188, in _verify_list
    raise AutoservVerifyDependencyError(self, failures)
AutoservVerifyDependencyError: servohost_Reboot is not unique.
...
10/04 08:22:06.927 INFO | repair:0327| Verifying this condition: The most recent AU attempt on this DUT succeeded
10/04 08:22:06.945 DEBUG| ssh_host:0296| Running (ssh) 'test -f /var/tmp/provision_failed' from '_repair_host|_verify_list|_verify_host|verify|run|run_very_slowly'
10/04 08:22:07.308 ERROR| repair:0332| Failed: The most recent AU attempt on this DUT succeeded
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 329, in _verify_host
    self.verify(host)
  File "/usr/local/autotest/server/hosts/cros_repair.py", line 194, in verify
    'Last AU on this DUT failed')
AutoservVerifyError: Last AU on this DUT failed
10/04 08:22:07.310 INFO | server_job:0214| FAIL ---- verify.good_au timestamp=1507130527 localtime=Oct 04 08:22:07 Last AU on this DUT failed
10/04 08:22:07.310 INFO | repair:0105| Attempting this repair action: Re-install the stable firmware via servo
10/04 08:22:07.311 DEBUG| repair:0106| Repairing because these triggers failed:
10/04 08:22:07.311 DEBUG| repair:0108| The most recent AU attempt on this DUT succeeded
10/04 08:22:07.311 INFO | server_job:0214| START ---- repair.firmware timestamp=1507130527 localtime=Oct 04 08:22:07
10/04 08:22:07.312 ERROR| repair:0449| Repair failed: Re-install the stable firmware via servo
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 447, in _repair_host
    self.repair(host)
  File "/usr/local/autotest/server/hosts/cros_firmware.py", line 150, in repair
    host.hostname)
AutoservRepairError: Firmware repair is not applicable to host chromeos6-row4-rack2-host1.
10/04 08:22:07.313 INFO | server_job:0214| FAIL ---- repair.firmware timestamp=1507130527 localtime=Oct 04 08:22:07 Firmware repair is not applicable to host chromeos6-row4-rack2-host1.
10/04 08:22:07.313 INFO | server_job:0214| END FAIL ---- repair.firmware timestamp=1507130527 localtime=Oct 04 08:22:07
10/04 08:22:07.314 INFO | repair:0327| Verifying this condition: The host should not be in dev mode
....
10/04 08:29:32.324 ERROR| utils:2739| Will raise error TimeoutError() due to unexpected return: ''
10/04 08:29:32.326 DEBUG| utils:0212| Running 'ssh 100.115.185.227 'curl "http://100.115.185.227:8082/kill_au_proc?pid=5493&host_name=100.115.132.155"''
10/04 08:29:33.229 DEBUG| dev_server:2189| Exception raised on auto_update attempt #2:
Traceback (most recent call last):
  File "/home/chromeos-test/chromiumos/src/platform/dev/cros_update.py", line 262, in TriggerAU
    self._RootfsUpdate(chromeos_AU)
  File "/home/chromeos-test/chromiumos/src/platform/dev/cros_update.py", line 173, in _RootfsUpdate
    cros_updater.UpdateRootfs()
  File "/home/chromeos-test/chromiumos/chromite/lib/auto_updater.py", line 727, in UpdateRootfs
    raise RootfsUpdateError(error_msg % e)
RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
10/04 08:29:33.230 DEBUG| dev_server:2195| Please see error details in log /usr/local/autotest/results/hosts/chromeos6-row4-rack2-host1/1593047-repair/20170410081449/autoupdate_logs/CrOS_update_100.115.132.155_5493.log
10/04 08:29:33.235 ERROR| repair:0449| Repair failed: Powerwash and then re-install the stable build via AU
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 447, in _repair_host
    self.repair(host)
  File "/usr/local/autotest/server/hosts/cros_repair.py", line 473, in repair
    super(PowerWashRepair, self).repair(host)
  File "/usr/local/autotest/server/hosts/cros_repair.py", line 454, in repair
    afe_utils.machine_install_and_update_labels(host, repair=True)
  File "/usr/local/autotest/server/afe_utils.py", line 124, in machine_install_and_update_labels
    *args, **dargs)
  File "/usr/local/autotest/server/hosts/cros_host.py", line 815, in machine_install_by_devserver
    force_original=force_original)
  File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 2255, in auto_update
    error_msg % (host_name, error_list[0]))
DevServerException: CrOS auto-update failed for host chromeos6-row4-rack2-host1: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
10/04 08:29:33.238 INFO | server_job:0214| FAIL ---- repair.powerwash timestamp=1507130973 localtime=Oct 04 08:29:33 CrOS auto-update failed for host chromeos6-row4-rack2-host1: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
10/04 08:29:33.238 INFO | server_job:0214| END FAIL ---- repair.powerwash timestamp=1507130973 localtime=Oct 04 08:29:33
10/04 08:29:33.239 INFO | repair:0105| Attempting this repair action: Reinstall from USB using servo
10/04 08:29:33.239 DEBUG| repair:0106| Repairing because these triggers failed:
10/04 08:29:33.240 DEBUG| repair:0108| The most recent AU attempt on this DUT succeeded
...
10/04 08:29:33.240 INFO | server_job:0214| START ---- repair.usb timestamp=1507130973 localtime=Oct 04 08:29:33
10/04 08:29:33.240 ERROR| repair:0449| Repair failed: Reinstall from USB using servo
Traceback (most recent call last):
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 447, in _repair_host
    self.repair(host)
  File "/usr/local/autotest/server/hosts/cros_repair.py", line 491, in repair
    '%s has no servo support.' % host.hostname)
AutoservRepairError: chromeos6-row4-rack2-host1 has no servo support.
10/04 08:29:33.241 INFO | server_job:0214| FAIL ---- repair.usb timestamp=1507130973 localtime=Oct 04 08:29:33 chromeos6-row4-rack2-host1 has no servo support.
10/04 08:29:33.242 INFO | server_job:0214| END FAIL ---- repair.usb timestamp=1507130973 localtime=Oct 04 08:29:33
10/04 08:29:33.242 INFO | repair:0327| Verifying this condition: The host should have valid HWID and Serial Number
10/04 08:29:33.258 DEBUG| ssh_host:0296| Running (ssh) 'crossystem hwid' from '_verify_dependencies|_verify_list|_verify_host|verify|run|run_very_slowly'
10/04 08:29:33.613 DEBUG| utils:0280| [stderr] mux_client_request_session: read from master failed: Broken pipe
10/04 08:29:34.358 DEBUG| utils:0299| [stdout] HANA D3A-D7G-A6A-E2Q-A37
...
10/04 08:29:36.196 DEBUG| repair:0106| The following dependencies failed:
10/04 08:29:36.197 DEBUG| repair:0108| The most recent AU attempt on this DUT succeeded
10/04 08:29:36.198 ERROR| repair:0043| Repair failed due to Exception.
Traceback (most recent call last):
  File "/usr/local/autotest/server/control_segments/repair", line 38, in repair
    target.repair()
  File "/usr/local/autotest/server/hosts/cros_host.py", line 1216, in repair
    self._repair_strategy.repair(self)
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 703, in repair
    self._verify_root._verify_host(host, silent)
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 326, in _verify_host
    self._verify_dependencies(host, silent)
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 199, in _verify_dependencies
    self._verify_list(host, self._dependency_list, silent)
  File "/usr/local/autotest/client/common_lib/hosts/repair.py", line 188, in _verify_list
    raise AutoservVerifyDependencyError(self, failures)
AutoservVerifyDependencyError: Last AU on this DUT failed
10/04 08:29:36.198 INFO | server_job:0214| END FAIL ---- repair timestamp=1507130976 localtime=Oct 04 08:29:36
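Reading the repair log as a whole: autotest's verify/repair framework runs a set of verifiers, then attempts each repair action only when one of its trigger verifiers failed, and declares the host repaired only when re-running the triggers passes. Here every action in the chain (firmware, powerwash, USB install) failed, so the DUT stayed down. A schematic of that flow, assuming simplified hypothetical names (the real logic is in client/common_lib/hosts/repair.py):

```python
class RepairStrategy:
    """Schematic of the autotest verify/repair flow seen above.

    verifiers: list of (description, check) where check(host) -> bool.
    repair_actions: list of (description, fix, trigger_descriptions);
    an action runs only if one of its triggers is currently failing,
    and failures are re-verified after each fix attempt.
    """

    def __init__(self, verifiers, repair_actions):
        self._verifiers = verifiers
        self._actions = repair_actions

    def _failed(self, host):
        # Descriptions of every verifier that currently fails.
        return [desc for desc, check in self._verifiers if not check(host)]

    def repair(self, host, log):
        failures = self._failed(host)
        for desc, fix, triggers in self._actions:
            if not failures:
                break  # everything verified; host is repaired
            if not (set(triggers) & set(failures)):
                continue  # none of this action's triggers failed
            log.append('Attempting this repair action: ' + desc)
            try:
                fix(host)
            except Exception as e:
                log.append('Repair failed: %s (%s)' % (desc, e))
                continue  # fall through to the next action
            failures = self._failed(host)
        if failures:
            raise RuntimeError('Repair failed; still failing: %s' % failures)
```

In the log above, the AU verifier failure triggered firmware repair ("not applicable"), then powerwash (rootfs update failed), then USB install ("no servo support"), leaving the final AutoservVerifyDependencyError.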
master-paladin/16482 failed in a similar way: https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/16482

caroline-paladin/builds/1445    bvt-arc     NotEnoughDutsError: Not enough DUTs for board: caroline, pool: cq; required: 4, found: 3
hana-paladin/builds/1107        bvt-arc     NotEnoughDutsError: Not enough DUTs for board: hana, pool: cq; required: 4, found: 3
kevin-paladin/builds/2619       bvt-inline  NotEnoughDutsError: Not enough DUTs for board: kevin, pool: cq; required: 4, found: 3
peach_pit-paladin/builds/17260  bvt-inline  NotEnoughDutsError: Not enough DUTs for board: peach_pit, pool: cq; required: 4, found: 0
tidus-paladin/builds/984        bvt-cq      NotEnoughDutsError: Not enough DUTs for board: tidus, pool: cq; required: 4, found: 3

reef-paladin/builds/3849 HWTest bvt-inline was actually purple; status reason: step was interrupted.

lumpy-paladin/builds/29819 bvt-inline had a slightly different symptom:

provision [ FAILED ]
provision FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack7-host19: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
provision [ FAILED ]
provision FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack8-host17: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
provision [ FAILED ]
provision FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack7-host17: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
provision [ FAILED ]
provision FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack7-host12: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
provision [ FAILED ]
provision FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack8-host3: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
provision [ FAILED ]
provision FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack8-host10: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
provision [ FAILED ]
provision FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack8-host10: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
provision [ FAILED ]
provision FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack7-host19: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
provision [ FAILED ]
provision FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack8-host3: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
provision [ FAILED ]
provision FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack7-host17: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
provision [ FAILED ]
provision FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack7-host12: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
provision [ FAILED ]
provision FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack8-host17: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)
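The recurring "Update failed with unexpected update status: UPDATE_STATUS_IDLE" message comes from the rootfs-update step polling the DUT's update engine and finding it back in the idle state instead of the expected post-update state. A rough sketch of that polling loop, under the assumption (from the error text, not the chromite source) that landing in UPDATE_STATUS_IDLE after an update has been triggered is treated as a silent failure; all names here are illustrative:

```python
import time


class RootfsUpdateError(Exception):
    """Raised when the rootfs update does not complete as expected."""


def wait_for_rootfs_update(get_status, timeout=600, interval=10):
    """Poll the update engine until the rootfs update completes.

    get_status is a callable returning the current status string
    (e.g. from 'update_engine_client --status' on the DUT).  The
    update is complete when the engine reports it needs a reboot;
    this sketch treats a return to UPDATE_STATUS_IDLE as failure,
    matching the error message in the logs above.
    """
    waited = 0
    while waited < timeout:
        status = get_status()
        if status == 'UPDATE_STATUS_UPDATED_NEED_REBOOT':
            return
        if status == 'UPDATE_STATUS_IDLE':
            raise RootfsUpdateError(
                'Update failed with unexpected update status: %s' % status)
        time.sleep(interval)
        waited += interval
    raise RootfsUpdateError('Timed out waiting for update to complete')
```

That would explain why the same error shows up both in DUT repair (the powerwash attempt above) and in provision across many lumpy hosts: the update engine on those DUTs is aborting and dropping back to idle.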
Oct 9 2017
Comment 1 by djkurtz@chromium.org, Oct 4 2017