Project: chromium
Starred by 2 users
Status: Archived
Owner:
Closed: Jan 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug



guado_moblab paladin is failing HWTest; DUTs in repair failed state
Reported by rspangler@chromium.org, Jan 9 2017
Builds have been failing consistently since Sunday, for example:

https://luci-milo.appspot.com/buildbot/chromeos/guado_moblab-paladin/4690

Snippet of the log that contains the failure:

  moblab_DummyServerSuite     ABORT: Timed out, did not run.
  
  Suite timings:
  Downloads started at 2017-01-09 07:51:16
  Payload downloads ended at 2017-01-09 07:51:25
  Suite started at 2017-01-09 07:51:51
  Artifact downloads ended (at latest) at 2017-01-09 07:51:55
  Testing started at 2017-01-09 09:54:24
  Testing ended at 2017-01-09 09:54:24
  
  
  Links to test logs:
  Suite job http://cautotest/tko/retrieve_logs.cgi?job=/results/95313367-chromeos-test/
  moblab_DummyServerSuite http://cautotest/tko/retrieve_logs.cgi?job=/results/95313367-chromeos-test/
  
  
  
  Attempting to display pool info: cq
  host: chromeos2-row1-rack8-host1, status: Repair Failed, locked: False diagnosis: Failed repair
  labels: ['bluetooth', 'power:AC_only', 'storage:ssd', 'hw_video_acc_enc_h264', 'hw_jpeg_acc_dec', 'hw_video_acc_vp8', 'hw_video_acc_h264', 'board:guado_moblab', 'hw_video_acc_vp9', 'cts_abi_x86', 'cts_abi_arm', 'guado_moblab', 'sku:guado_intel_broadwell_i3_4Gb', 'variant:guado', 'os:moblab', 'phase:PVT', 'pool:cq', 'cros-version:guado_moblab-paladin/R57-9163.0.0-rc1']
  Last 10 jobs within 2:18:00:
  247733 Repair started on: 2017-01-09 09:45:16 status FAIL
  247725 Verify started on: 2017-01-09 09:43:50 status FAIL
  247708 Repair started on: 2017-01-09 09:15:11 status FAIL
  247701 Verify started on: 2017-01-09 09:13:43 status FAIL
  247685 Repair started on: 2017-01-09 08:45:05 status FAIL
  247677 Verify started on: 2017-01-09 08:43:37 status FAIL
  247663 Repair started on: 2017-01-09 08:14:56 status FAIL
  247656 Verify started on: 2017-01-09 08:13:30 status FAIL
  247644 Repair started on: 2017-01-09 07:44:46 status FAIL
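
The job history above can be tallied mechanically when triaging several hosts at once. This is a minimal sketch, assuming the line layout is exactly what appears in this log excerpt; `JOB_LINE` and `tally_jobs` are hypothetical names for illustration and are not part of autotest.

```python
import re
from collections import Counter

# Matches lines shaped like the ones in the pool info dump, e.g.:
#   247733 Repair started on: 2017-01-09 09:45:16 status FAIL
# The layout is an assumption taken from this one log excerpt.
JOB_LINE = re.compile(
    r"^\s*(?P<job_id>\d+)\s+(?P<kind>\w+) started on: "
    r"(?P<started>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"status (?P<status>\w+)\s*$"
)


def tally_jobs(log_text):
    """Count (job kind, status) pairs among the job history lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = JOB_LINE.match(line)
        if match:  # silently skip lines that are not job history entries
            counts[(match["kind"], match["status"])] += 1
    return counts


sample = """\
  247733 Repair started on: 2017-01-09 09:45:16 status FAIL
  247725 Verify started on: 2017-01-09 09:43:50 status FAIL
  247708 Repair started on: 2017-01-09 09:15:11 status FAIL
"""

for (kind, status), count in sorted(tally_jobs(sample).items()):
    print(kind, status, count)
```

A host whose tally contains only FAIL entries for both Verify and Repair, as here, is stuck in the verify/repair loop rather than failing intermittently.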

According to https://viceroy.corp.google.com/chromeos/dut_health?board=guado_moblab, the number of working managed DUTs has dropped to 0, and the number of broken managed DUTs has gone up.

 
I looked at two of the failing guado_moblab hosts -- chromeos2-row1-rack8-host1 & chromeos2-row1-rack8-host1.  The repair jobs for both of them are failing with:

Power cycling chromeos2-row1-rack8-host1-servo failed: Client call exception: <Fault 1: "<class 'rpm_infrastructure_exception.RPMInfrastructureException'>:('Could not determine POE hostname for %s. Please check the servo-interface mapping file.', 'chromeos2-row1-rack8-host1-servo')">
Cc: jrbarnette@chromium.org
Cc: johndhong@chromium.org
John, any thoughts on the failure being reported in c#1?
Cc: -johndhong@chromium.org haoweiw@chromium.org englab-sys-cros@google.com
Unfortunately, I'm no longer taking on additional ChromeOS work.

However, going off the error message, I believe the cause is either one or a combination of RPM and/or DNS entries for the servos that are no longer valid.
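
The DNS half of that theory can be checked quickly from a machine on the lab network. This is a minimal sketch, not part of the lab tooling: `unresolvable_hosts` is a hypothetical helper, and the only real API it relies on is the standard library's `socket.gethostbyname`. It says nothing about the RPM or servo-interface mapping side.

```python
import socket


def unresolvable_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames that `resolve` cannot look up.

    `resolve` defaults to a real DNS lookup; it is a parameter only so the
    check can be exercised without network access.
    """
    missing = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(name)
    return missing


def fake_resolve(name):
    """Stand-in resolver for illustration: every '-servo' name fails."""
    if name.endswith("-servo"):
        raise socket.gaierror("name or service not known")
    return "192.0.2.1"  # documentation-only address (RFC 5737)


hosts = ["chromeos2-row1-rack8-host1", "chromeos2-row1-rack8-host1-servo"]
print(unresolvable_hosts(hosts, resolve=fake_resolve))
```

Calling `unresolvable_hosts(hosts)` without the `resolve` argument performs real lookups. An empty result would point away from DNS and toward the RPM mapping or the network path itself.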
Filed b/34174600 to the eng lab team.
Labels: -Pri-1 Pri-0
Is this keeping the tree throttled? Raising to Pri-0; please change back if I am wrong.
Status: Fixed
The real problem was a network switch that was down. Fixed!
Issue 681257 has been merged into this issue.
Comment 9 by dchan@google.com, Mar 4 2017
Labels: VerifyIn-58
Labels: VerifyIn-59
Labels: VerifyIn-60
Labels: VerifyIn-61
Comment 13 by dchan@chromium.org, Oct 14
Status: Archived