
Issue 635806

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Aug 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




[provisioning failures] The servod configuration is missing for servo boards of pool faft-test-au of veyron_mickey

Project Member Reported by cywang@chromium.org, Aug 9 2016

Issue description

The faft-test-au pool for veyron_mickey is seeing a storm of provisioning failures. The provisioning job runs the 'Rebuilding the servo object.' step, which tries to connect to servod and fails. It turns out that the servod config is missing on the servo boards. The job then tries to reboot the servo, which also fails: the servo does not come back within 90 seconds.

Richard,

  I am not sure how we manage the servod config_9999. I manually updated /var/lib/servod/config_9999 on chromeos4-row3-rack10-host15, and a provisioning task has now completed successfully.
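A minimal sketch of how the state of that file could be checked before provisioning. The path comes from the comment above; the function name, the status strings, and the assumption that a usable config contains a non-empty `BOARD=` line (as the servo listing later in this thread suggests) are illustrative only and are not taken from the lab tooling.

```python
import os

# Path quoted in the comment above.
DEFAULT_CONFIG = "/var/lib/servod/config_9999"


def servod_config_status(path=DEFAULT_CONFIG):
    """Classify a servod config file as 'missing', 'empty', 'no-board' or 'ok'.

    Assumption: the file holds KEY=VALUE lines and a configured servo
    has a BOARD=<name> entry. Hypothetical helper, not part of servod.
    """
    if not os.path.isfile(path):
        return "missing"
    with open(path) as config:
        lines = [line.strip() for line in config if line.strip()]
    if not lines:
        return "empty"
    has_board = any(
        line.startswith("BOARD=") and len(line) > len("BOARD=")
        for line in lines
    )
    return "ok" if has_board else "no-board"
```

Anything other than "ok" would mean the servo still needs the one-time configuration step before firmware provisioning is attempted.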

 
Summary: [provisioning failures] The servod configuration is missing for servo boards of pool faft-test-au of veyron_mickey (was: The servo boards for pool faft-test-au of veyron_mickey failed to boot in time during provisioning)
Cc: kevcheng@chromium.org ayatane@chromium.org
Were the failures in firmware provisioning?  Provisioning a
Chrome OS image shouldn't fail for this reason.  However,
provisioning firmware with an unconfigured servo has never worked.

The preferred manual configuration procedure for a servo on
beaglebone is now this command:
    start servod PORT=9999 BOARD=...

My hope was that the old command should also work:
    start servod BOARD=...

kevcheng@ can say whether the old command would work, but the
new one definitely will.

There's ongoing work that I hope to have done by next week that
will mean that we automatically configure servo in a variety of
contexts, including (I expect) firmware provisioning.

I just tried 'start servod BOARD=...' and it worked for me (created config_9999).
I just checked, too.  On beaglebone, this command will work to
configure an unconfigured beaglebone:
    start servod BOARD=veyron_mickey

Running FAFT tests without doing this initial, one-time configuration step
isn't supported, and has never worked.
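As a rough sketch, the two command forms quoted in this thread can be assembled like this. The helper name and its parameters are hypothetical; only the `start servod PORT=9999 BOARD=...` and `start servod BOARD=...` forms come from the comments above, and the code builds the argument list without running anything.

```python
def servod_start_command(board, port=9999, include_port=True):
    """Build the argument list for the one-time servo configuration command.

    include_port=True gives the preferred form with PORT=...;
    include_port=False gives the older form that only passes BOARD=...
    Hypothetical helper for illustration; it does not execute the command.
    """
    if not board or any(ch.isspace() for ch in board):
        raise ValueError("board must be a non-empty name without whitespace")
    args = ["start", "servod"]
    if include_port:
        args.append("PORT=%d" % port)
    args.append("BOARD=%s" % board)
    return args


if __name__ == "__main__":
    # Preferred form, then the older form, for the board discussed here.
    print(" ".join(servod_start_command("veyron_mickey")))
    print(" ".join(servod_start_command("veyron_mickey", include_port=False)))
```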

Looking at the faft-test-au pool, I see many of the hosts are in
Destiny.  Do you know when/how these DUTs got installed, and who
did the work?  The standard deployment procedure automatically
runs the servo configuration step.

Cc: haoweiw@chromium.org
+haowei@ to comment on the who, when, and how of deployment
for the FAFT pool DUTs.  Especially, on these five mickey DUTs:
    chromeos4-row3-rack10-host10
    chromeos4-row3-rack10-host12
    chromeos4-row3-rack10-host14
    chromeos4-row3-rack10-host15
    chromeos4-row3-rack10-host16

Comment 6 by cywang@google.com, Aug 9 2016

This issue also happened on the same pool for veyron_jaq. I am not sure whether more servos are impacted, since the failure rate jumped up around the same time; see https://pcon.corp.google.com/p#cywang/CROS%20infra%20health?duration=3w.

Comment 7 by cywang@chromium.org, Aug 10 2016

After I manually updated config_9999 on the FAFT pool yesterday, the failure rate dropped to under 5 percent.

Please examine how those servo boards were configured; we believe more servo boards are affected.
veyron_mickey_provisioning_failure_rate.png

Comment 8 by cywang@chromium.org, Aug 12 2016

The root cause is mainly that servos are running different versions, even within the same pool 'faft-test-au'. Here is a list of servo boards, showing each one's build and the contents of config_9999 (if it exists) after I manually started servod:

Question: do we have a plan to upgrade all servo boards to the same Chrome OS version?


=== chromeos4-row6-rack1-host13-servo lulu ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=lulu
=== chromeos4-row6-rack11-host11-servo veyron_mighty ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=veyron_mighty
=== chromeos4-row6-rack11-host12-servo veyron_mighty ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=veyron_mighty
=== chromeos4-row8-rack3-host14-servo candy ===
CHROMEOS_RELEASE_CHROME_MILESTONE=47
CHROMEOS_RELEASE_VERSION=7394.0.0
=== chromeos2-row24-rack2-host11-servo samus ===
CHROMEOS_RELEASE_CHROME_MILESTONE=53
CHROMEOS_RELEASE_VERSION=8368.0.0
=== chromeos2-row24-rack3-host11-servo samus ===
CHROMEOS_RELEASE_CHROME_MILESTONE=53
CHROMEOS_RELEASE_VERSION=8368.0.0
=== chromeos2-row24-rack3-host13-servo samus ===
CHROMEOS_RELEASE_CHROME_MILESTONE=53
CHROMEOS_RELEASE_VERSION=8368.0.0
=== chromeos4-row2-rack4-host13-servo tricky ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=tricky
=== chromeos4-row2-rack4-host11-servo tricky ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=tricky
=== chromeos4-row10-rack7-host11-servo auron_yuna ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=auron_yuna
=== chromeos4-row10-rack6-host13-servo auron_yuna ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=auron_yuna
=== chromeos4-row9-rack9-host11-servo veyron_minnie ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=veyron_minnie
=== chromeos4-row9-rack9-host13-servo veyron_minnie ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=veyron_minnie
=== chromeos4-row2-rack11-host11-servo mccloud ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=mccloud
=== chromeos4-row2-rack11-host12-servo mccloud ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=mccloud
=== chromeos4-row2-rack11-host13-servo mccloud ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=mccloud
=== chromeos4-row8-rack5-host13-servo auron_paine ===
CHROMEOS_RELEASE_CHROME_MILESTONE=53
CHROMEOS_RELEASE_VERSION=8368.0.0
=== chromeos4-row8-rack4-host11-servo auron_paine ===
CHROMEOS_RELEASE_CHROME_MILESTONE=53
CHROMEOS_RELEASE_VERSION=8489.0.0
=== chromeos4-row8-rack4-host13-servo auron_paine ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=auron_paine
=== chromeos1-row2-rack11-host4-servo veyron_minnie ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=veyron_minnie
=== chromeos4-row3-rack4-host15-servo ninja ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=ninja
=== chromeos4-row4-rack6-host10-servo veyron_jaq ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=veyron_jaq
=== chromeos4-row4-rack6-host13-servo veyron_jaq ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=veyron_jaq
=== chromeos4-row4-rack6-host22-servo veyron_jaq ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=veyron_jaq
=== chromeos4-row4-rack7-host12-servo veyron_jaq ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=veyron_jaq
=== chromeos4-row5-rack5-host15-servo gandof ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=gandof
=== chromeos4-row4-rack11-host11-servo veyron_speedy ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=veyron_speedy
=== chromeos4-row3-rack5-host12-servo rikku ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=rikku
=== chromeos4-row3-rack5-host13-servo rikku ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=rikku
=== chromeos4-row3-rack5-host10-servo rikku ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=rikku
=== chromeos4-row3-rack6-host16-servo rikku ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=rikku
=== chromeos4-row3-rack6-host13-servo rikku ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=rikku
=== chromeos4-row3-rack7-host12-servo guado ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=guado
=== chromeos4-row3-rack8-host10-servo guado ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=guado
=== chromeos4-row3-rack8-host11-servo guado ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=guado
=== chromeos4-row6-rack2-host11-servo lulu ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=lulu
=== chromeos4-row6-rack2-host13-servo lulu ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=lulu
=== chromeos4-row5-rack7-host11-servo gandof ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=gandof
=== chromeos4-row5-rack7-host13-servo gandof ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=gandof
=== chromeos4-row12-rack2-host13-servo gnawty ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=gnawty
=== chromeos4-row9-rack9-host14-servo veyron_minnie ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=veyron_minnie
=== chromeos4-row3-rack9-host10-servo ninja ===
CHROMEOS_RELEASE_CHROME_MILESTONE=54
CHROMEOS_RELEASE_VERSION=8638.0.0
BOARD=ninja
=== chromeos4-row3-rack10-host11-servo veyron_mickey ===
CHROMEOS_RELEASE_CHROME_MILESTONE=53
CHROMEOS_RELEASE_VERSION=8489.0.0
=== chromeos4-row12-rack11-host15-servo cyan-cheets ===

All servos in the lab are expected to be running 8638.0.0.
Any servo _not_ running that build has an error that's
causing it to fail AU.
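
A minimal sketch of how a listing in the format posted above could be scanned for servos that are off the expected build or show no BOARD entry. The parsing rules are inferred from the pasted text ("=== host board ===" headers followed by KEY=VALUE lines); the function names and reason strings are hypothetical and this is not the lab's actual tooling.

```python
import re

# Build that all servos are expected to run, per the comment above.
EXPECTED_VERSION = "8638.0.0"

# Matches section headers such as "=== <servo-host> <board> ===".
HEADER = re.compile(r"^=== (\S+) (\S+) ===$")


def parse_servo_listing(text):
    """Split the listing into one record per servo host."""
    servos = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = {"host": match.group(1),
                       "board": match.group(2),
                       "fields": {}}
            servos.append(current)
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current["fields"][key] = value
    return servos


def find_problem_servos(servos, expected=EXPECTED_VERSION):
    """Return (host, reasons) pairs for servos that need attention."""
    problems = []
    for servo in servos:
        fields = servo["fields"]
        reasons = []
        version = fields.get("CHROMEOS_RELEASE_VERSION")
        if version is None:
            reasons.append("no build reported")
        elif version != expected:
            reasons.append("build " + version)
        if "BOARD" not in fields:
            reasons.append("no BOARD in config")
        if reasons:
            problems.append((servo["host"], reasons))
    return problems
```

Run against the listing above, this would flag entries such as the candy and samus servos, which report older builds and show no BOARD line.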

Status: WontFix (was: Untriaged)
I checked all hosts in faft pools.  Only three have servos that are both
working and out-of-date.  None have run tests since 7/10, at the earliest.

Closing for lack of an actionable problem.

Yes. After manually starting servod (and making sure each config_9999 is not empty), the failure rate of several boards went down over the weekend, as shown in the attachment.
failure_rate_reduced.png
