
Issue 852809


Issue metadata

Status: Fixed
Closed: Jun 2018
Pri: 3
Type: Bug




Coral HWTests are broken due to lack of DUTs.

Project Member Reported by cra...@chromium.org, Jun 14 2018

Issue description

This morning, coral-paladin was failing with:
"""
NotEnoughDutsError: Not enough DUTs for board: coral, pool: cq; required: 4, found: 0
"""
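The message compares what the suite requires against what the pool could supply. As a rough illustration only (this is not the actual autotest code; `check_pool`, the `Dut` record, and the rule for what counts as "usable" are all assumptions made for this sketch), the check amounts to counting unlocked, working hosts:

```python
from collections import namedtuple

# Hypothetical host record; field names are illustrative, not autotest's schema.
Dut = namedtuple("Dut", "hostname status locked")


class NotEnoughDutsError(Exception):
    """Raised when a pool cannot supply the DUTs a suite needs."""


def check_pool(duts, board, pool, required):
    """Return the number of usable DUTs, or raise if there are too few."""
    # Assumption: a DUT is usable only if it is unlocked and in Ready state.
    found = sum(1 for d in duts if not d.locked and d.status == "Ready")
    if found < required:
        raise NotEnoughDutsError(
            "Not enough DUTs for board: %s, pool: %s; required: %d, found: %d"
            % (board, pool, required, found))
    return found
```

Under that assumption, a pool whose hosts are all Ready but locked still counts as having zero DUTs, which matches the "found: 0" above.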

It looks to me like all the DUTs are locked or broken:

craigb@seastorm [~/lab-tools-bin 10:16:34]
$ atest host list -b model:astronaut,pool:suites | cut -b1-130
Host                         Status         Shard                                 Locked  Lock Reason          Locked by  Platform
chromeos2-row4-rack3-host8   Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack4-host1   Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack4-host3   Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack4-host4   Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack4-host5   Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack3-host1   Repair Failed  chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack3-host2   Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack3-host3   Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack3-host6   Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack4-host11  Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack4-host7   Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack4-host12  Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack4-host21  Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack4-host16  Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack4-host20  Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos2-row4-rack4-host18  Ready          chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronau
chromeos6-row6-rack3-host1   Verifying      chromeos-skunk-4.mtv.corp.google.com  False                        None       astronau
chromeos6-row6-rack3-host3   Verifying      chromeos-skunk-4.mtv.corp.google.com  False                        None       astronau
chromeos6-row6-rack3-host15  Verifying      chromeos-skunk-4.mtv.corp.google.com  False                        None       astronau
chromeos6-row6-rack3-host17  Verifying      chromeos-skunk-4.mtv.corp.google.com  False                        None       astronau
chromeos6-row6-rack3-host8   Verifying      chromeos-skunk-4.mtv.corp.google.com  False                        None       astronau
chromeos6-row6-rack3-host12  Verifying      chromeos-skunk-4.mtv.corp.google.com  False                        None       astronau
craigb@seastorm [~/lab-tools-bin 10:16:40]
$ atest host list -b model:astronaut,pool:cq | cut -b1-130
Host                         Status  Shard                                 Locked  Lock Reason          Locked by  Platform   Labe
chromeos2-row4-rack3-host10  Ready   chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronaut  boar
chromeos2-row4-rack4-host2   Ready   chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronaut  boar
chromeos2-row4-rack4-host6   Ready   chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronaut  boar
chromeos2-row4-rack4-host9   Ready   chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronaut  boar
chromeos2-row4-rack4-host15  Ready   chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronaut  boar
chromeos2-row4-rack4-host13  Ready   chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronaut  boar
chromeos2-row4-rack4-host17  Ready   chromeos-skunk-4.mtv.corp.google.com  True    decomissioning DVTs  afaris     astronaut  boar

It looks to me like afaris@ has locked most of the DUTs in the suites pool and all of the DUTs in the cq pool.
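For anyone who wants to tally this rather than eyeball it, here is a minimal sketch that parses the fixed-width table and counts hosts by lock state and status. It assumes the columns are aligned under the header names, as they appear in the listings above; `parse_table` and `summarize` are names made up for this example and are not part of atest.

```python
import re
from collections import Counter


def parse_table(text):
    """Parse fixed-width `atest host list` style output into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0]
    # Header names may contain single spaces ("Lock Reason"), while columns
    # are separated by two or more spaces, so match space-joined tokens.
    cols = [(m.group(0), m.start())
            for m in re.finditer(r"\S+(?: \S+)*", header)]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(cols):
            # Each cell runs from its header offset to the next header offset.
            end = cols[i + 1][1] if i + 1 < len(cols) else None
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


def summarize(rows):
    """Count hosts keyed by (Locked, Status)."""
    return Counter((row["Locked"], row["Status"]) for row in rows)
```

Slicing by header offset rather than splitting on whitespace keeps empty "Lock Reason" cells and multi-word values such as "Repair Failed" in the right column. Counting the suites listing above by hand gives 16 locked hosts and 6 unlocked ones, all of the unlocked ones in Verifying.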

jrbanette@ said something about PVT or DVT boards. Maybe we're replacing the older board versions with newer ones.

Alexis, can we have a handful of boards back so we can run the CQ again?


 

Comment 1 by la...@chromium.org, Jun 14 2018

coral-paladin has been marked experimental for the CQ.

Comment 2 by cra...@chromium.org, Jun 14 2018

Cc: mwolfram@google.com
Also relevant:
https://b.corp.google.com/issues/72357922

Comment 3 by flyboy@chromium.org, Jun 14 2018

Components: -Infra Infra>Client>ChromeOS
Status: Assigned (was: Untriaged)

Comment 4 by cra...@chromium.org, Jun 14 2018

I'm on chat with mwortham@ who shared this doc:
https://docs.google.com/spreadsheets/d/1OcIPknYDPdY2iTYjHnj8xh4LGxhPBHfYJfB0CTMwRjI/edit#gid=0

It seems the doc the PMs are working from indicates that there are no astronauts in CQ. However, the boards that are in the CQ pool are coral/astronauts:
$ atest host list -b board:coral,pool:cq |cut -b1-140
Host                         Status  Shard                                 Locked  Lock Reason  Locked by  Platform   Labels
chromeos2-row4-rack3-host10  Ready   chromeos-skunk-4.mtv.corp.google.com  False                None       astronaut  board:coral, os:cros, 
chromeos2-row4-rack4-host2   Ready   chromeos-skunk-4.mtv.corp.google.com  False                None       astronaut  board:coral, bluetooth
chromeos2-row4-rack4-host6   Ready   chromeos-skunk-4.mtv.corp.google.com  False                None       astronaut  board:coral, bluetooth
chromeos2-row4-rack4-host9   Ready   chromeos-skunk-4.mtv.corp.google.com  False                None       astronaut  board:coral, bluetooth
chromeos2-row4-rack4-host15  Ready   chromeos-skunk-4.mtv.corp.google.com  False                None       astronaut  board:coral, bluetooth
chromeos2-row4-rack4-host13  Ready   chromeos-skunk-4.mtv.corp.google.com  False                None       astronaut  board:coral, hw_jpeg_a
chromeos2-row4-rack4-host17  Ready   chromeos-skunk-4.mtv.corp.google.com  False                None       astronaut  board:coral, hw_video_

Comment 5 by cra...@chromium.org, Jun 14 2018

(afaris@ just unlocked the DUTs in the lab until we get this sorted)

Cc: jkop@chromium.org englab-sys-cros@google.com
We'll work on a plan to be more gentle with the newer revision DUTs
Components: -Infra>Client>ChromeOS Infra>Client>ChromeOS>Test

Comment 8 by cra...@chromium.org, Jun 14 2018

Much appreciated, johndhong!
Not sure how but the pool allocations for the coral boards got all messed up...

Going through and fixing them...
> Not sure how but the pool allocations for the coral boards got all messed up...
>
> Going through and fixing them...

Wait, what?  Describe what you think is messed up, but please don't
change anything without consultation.

Well, I went through and fixed all the pool:cq allocations for coral boards.

Before
johndhong@phobrz:~$ cat /tmp/atest_host_list.txt | grep board:coral | count_labels -p
     75 bvt
      1 cellular
      1 chameleon_hdmi_stable
      4 chameleon_hdmi_unstable
      7 cq

Now
johndhong@phobrz:~$ atest host list -b board:coral | count_labels -p
     58 bvt
      1 cellular
      3 chameleon
      1 chameleon_hdmi_stable
      1 chameleon_hdmi_unstable
     42 cq

Safe to say for now we should have more than enough coral boards for CQ :)
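`count_labels` is a lab tool whose behavior is not shown in this thread. Assuming `-p` simply tallies each host's `pool:` label, a comparable count could be sketched as follows (`count_pools` and the comma-separated label format are assumptions based on the Labels column in the listings above):

```python
from collections import Counter


def count_pools(label_lists):
    """Count hosts per pool from per-host, comma-separated label strings."""
    counts = Counter()
    for labels in label_lists:
        for label in labels.split(","):
            label = label.strip()
            # Only labels of the form "pool:<name>" contribute to the tally.
            if label.startswith("pool:"):
                counts[label[len("pool:"):]] += 1
    return counts
```

Running a tally like this before and after a reallocation makes shifts such as the cq pool going from 7 to 42 hosts easy to spot.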
I am stopping any coral pool allocation changes until negotiations are done.
Status: Fixed (was: Assigned)
A postmortem is being prepared here:

https://docs.google.com/document/d/17XvqNxIDXrLrtS0dfo_WTyyousiJMcqL5m3HnmqlM9c/edit
