guado_moblab paladin is failing HWTest; DUTs in Repair Failed state
Issue description

Builds have been failing consistently since Sunday, for example: https://luci-milo.appspot.com/buildbot/chromeos/guado_moblab-paladin/4690

Snippet of the log that contains the failure:

moblab_DummyServerSuite ABORT: Timed out, did not run.

Suite timings:
Downloads started at 2017-01-09 07:51:16
Payload downloads ended at 2017-01-09 07:51:25
Suite started at 2017-01-09 07:51:51
Artifact downloads ended (at latest) at 2017-01-09 07:51:55
Testing started at 2017-01-09 09:54:24
Testing ended at 2017-01-09 09:54:24

Links to test logs:
Suite job http://cautotest/tko/retrieve_logs.cgi?job=/results/95313367-chromeos-test/
moblab_DummyServerSuite http://cautotest/tko/retrieve_logs.cgi?job=/results/95313367-chromeos-test/

Attempting to display pool info: cq
host: chromeos2-row1-rack8-host1, status: Repair Failed, locked: False
diagnosis: Failed repair
labels: ['bluetooth', 'power:AC_only', 'storage:ssd', 'hw_video_acc_enc_h264', 'hw_jpeg_acc_dec', 'hw_video_acc_vp8', 'hw_video_acc_h264', 'board:guado_moblab', 'hw_video_acc_vp9', 'cts_abi_x86', 'cts_abi_arm', 'guado_moblab', 'sku:guado_intel_broadwell_i3_4Gb', 'variant:guado', 'os:moblab', 'phase:PVT', 'pool:cq', 'cros-version:guado_moblab-paladin/R57-9163.0.0-rc1']
Last 10 jobs within 2:18:00:
247733 Repair started on: 2017-01-09 09:45:16 status FAIL
247725 Verify started on: 2017-01-09 09:43:50 status FAIL
247708 Repair started on: 2017-01-09 09:15:11 status FAIL
247701 Verify started on: 2017-01-09 09:13:43 status FAIL
247685 Repair started on: 2017-01-09 08:45:05 status FAIL
247677 Verify started on: 2017-01-09 08:43:37 status FAIL
247663 Repair started on: 2017-01-09 08:14:56 status FAIL
247656 Verify started on: 2017-01-09 08:13:30 status FAIL
247644 Repair started on: 2017-01-09 07:44:46 status FAIL

Looking at https://viceroy.corp.google.com/chromeos/dut_health?board=guado_moblab, the number of working managed DUTs has dropped to 0 and the number of broken managed DUTs has gone up.
Jan 9 2017
Jan 9 2017
John, any thoughts on the failure reported in c#1?
Jan 9 2017
Unfortunately, I'm no longer working on ChromeOS. However, going by the error message, I believe the cause is invalid RPM entries, invalid DNS entries for the servos, or a combination of both.
Jan 9 2017
Filed b/34174600 with the eng lab team.
Jan 10 2017
Is this keeping the tree throttled? Raising to Pri-0; please change it back if I am wrong.
Jan 10 2017
The real problem was a network switch that was down. Fixed!
Jan 14 2017
Issue 681257 has been merged into this issue.
Mar 4 2017
Apr 17 2017
May 30 2017
Aug 1 2017
Oct 14 2017
Comment 1 by snanda@chromium.org, Jan 9 2017

I looked at two of the failing guado_moblab hosts: chromeos2-row1-rack8-host1 and chromeos2-row1-rack8-host1. The repair jobs for both of them are failing with:

Power cycling chromeos2-row1-rack8-host1-servo failed: Client call exception: <Fault 1: "<class 'rpm_infrastructure_exception.RPMInfrastructureException'>:('Could not determine POE hostname for %s. Please check the servo-interface mapping file.', 'chromeos2-row1-rack8-host1-servo')">