
Issue metadata

Status: Archived
Closed: Jan 2017
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug


guado_moblab paladin is failing HWTest; DUTs in repair failed state

Project Member Reported by, Jan 9 2017

Issue description

Builds have been failing consistently since Sunday, for example:

Snippet of the log that contains the failure:

  moblab_DummyServerSuite     ABORT: Timed out, did not run.
  Suite timings:
  Downloads started at 2017-01-09 07:51:16
  Payload downloads ended at 2017-01-09 07:51:25
  Suite started at 2017-01-09 07:51:51
  Artifact downloads ended (at latest) at 2017-01-09 07:51:55
  Testing started at 2017-01-09 09:54:24
  Testing ended at 2017-01-09 09:54:24
  Links to test logs:
  Suite job http://cautotest/tko/retrieve_logs.cgi?job=/results/95313367-chromeos-test/
  moblab_DummyServerSuite http://cautotest/tko/retrieve_logs.cgi?job=/results/95313367-chromeos-test/
  Attempting to display pool info: cq
  host: chromeos2-row1-rack8-host1, status: Repair Failed, locked: False diagnosis: Failed repair
  labels: ['bluetooth', 'power:AC_only', 'storage:ssd', 'hw_video_acc_enc_h264', 'hw_jpeg_acc_dec', 'hw_video_acc_vp8', 'hw_video_acc_h264', 'board:guado_moblab', 'hw_video_acc_vp9', 'cts_abi_x86', 'cts_abi_arm', 'guado_moblab', 'sku:guado_intel_broadwell_i3_4Gb', 'variant:guado', 'os:moblab', 'phase:PVT', 'pool:cq', 'cros-version:guado_moblab-paladin/R57-9163.0.0-rc1']
  Last 10 jobs within 2:18:00:
  247733 Repair started on: 2017-01-09 09:45:16 status FAIL
  247725 Verify started on: 2017-01-09 09:43:50 status FAIL
  247708 Repair started on: 2017-01-09 09:15:11 status FAIL
  247701 Verify started on: 2017-01-09 09:13:43 status FAIL
  247685 Repair started on: 2017-01-09 08:45:05 status FAIL
  247677 Verify started on: 2017-01-09 08:43:37 status FAIL
  247663 Repair started on: 2017-01-09 08:14:56 status FAIL
  247656 Verify started on: 2017-01-09 08:13:30 status FAIL
  247644 Repair started on: 2017-01-09 07:44:46 status FAIL
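To make the snippet above easier to read, here is a minimal sketch that tallies the Verify/Repair results and measures how long the suite waited before testing started. The log lines are copied from the snippet; the helper names (`summarize_jobs`, `event_time`) are made up for illustration and are not part of autotest.

```python
import re
from collections import Counter
from datetime import datetime

# Lines copied from the log snippet above (leading indentation dropped).
LOG = """\
Suite started at 2017-01-09 07:51:51
Testing started at 2017-01-09 09:54:24
247733 Repair started on: 2017-01-09 09:45:16 status FAIL
247725 Verify started on: 2017-01-09 09:43:50 status FAIL
247708 Repair started on: 2017-01-09 09:15:11 status FAIL
247701 Verify started on: 2017-01-09 09:13:43 status FAIL
247685 Repair started on: 2017-01-09 08:45:05 status FAIL
247677 Verify started on: 2017-01-09 08:43:37 status FAIL
247663 Repair started on: 2017-01-09 08:14:56 status FAIL
247656 Verify started on: 2017-01-09 08:13:30 status FAIL
247644 Repair started on: 2017-01-09 07:44:46 status FAIL
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S"
JOB_RE = re.compile(
    r"^(\d+) (\w+) started on: (\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) status (\w+)$"
)


def summarize_jobs(log):
    """Count special-task jobs by (task type, status), e.g. ('Repair', 'FAIL')."""
    counts = Counter()
    for line in log.splitlines():
        match = JOB_RE.match(line.strip())
        if match:
            counts[(match.group(2), match.group(4))] += 1
    return counts


def event_time(log, label):
    """Return the timestamp from a '<label> at <timestamp>' line."""
    prefix = label + " at "
    for line in log.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return datetime.strptime(line[len(prefix):], TS_FORMAT)
    raise ValueError("no line starting with %r" % prefix)


job_counts = summarize_jobs(LOG)
wait = event_time(LOG, "Testing started") - event_time(LOG, "Suite started")

for (task, status), count in sorted(job_counts.items()):
    print(task, status, count)
print("Waited before testing:", wait)
```

Every listed Verify and Repair job failed, and the suite sat for just over two hours between "Suite started" and "Testing started", which matches the "Timed out, did not run" abort.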

Working managed DUTs have dropped to 0; broken managed DUTs have gone up.

I looked at two of the failing guado_moblab hosts -- chromeos2-row1-rack8-host1 & chromeos2-row1-rack8-host1.  The repair jobs for both of them are failing with:

Power cycling chromeos2-row1-rack8-host1-servo failed: Client call exception: <Fault 1: "<class 'rpm_infrastructure_exception.RPMInfrastructureException'>:('Could not determine POE hostname for %s. Please check the servo-interface mapping file.', 'chromeos2-row1-rack8-host1-servo')">
John, any thoughts on the failure being reported in c#1?
Unfortunately, I'm no longer taking on additional ChromeOS work.

However, going off the error message, I believe the cause is RPM entries, DNS entries, or both for the servos that are no longer valid.
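One quick way to test the DNS half of that guess is to check whether the servo hostname from the error message still resolves. This is only a sketch: the `resolves` helper is hypothetical, it assumes it is run from a machine inside the lab network, and it says nothing about the RPM or servo-interface mapping entries.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves via DNS, False on a lookup failure.

    `resolver` is injectable so the check can be exercised without a network.
    """
    try:
        resolver(hostname, None)
    except socket.gaierror:
        return False
    return True
```

For example, `resolves("chromeos2-row1-rack8-host1-servo")` returning False would point at a stale or missing DNS entry, while True would shift suspicion to the mapping file or the RPM side.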
Filed b/34174600 to the eng lab team.
Labels: -Pri-1 Pri-0
Is this keeping the tree throttled?  Raising to pri-0, please change back if I am wrong.
Status: Fixed
The real problem was a network switch that was down. Fixed!
Issue 681257 has been merged into this issue.

Comment 9 by, Mar 4 2017

Labels: VerifyIn-58

Comment 10 by, Apr 17 2017

Labels: VerifyIn-59

Comment 11 by, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61
Status: Archived
