
Issue 600479


Issue metadata

Status: Verified
Closed: Apr 2016
OS: Chrome
Pri: 0




Daisy_paladin & guado_moblab-paladin don't start

Reported by xixuan@chromium.org, Apr 4 2016

Issue description

Labels: -Pri-1 Pri-0
CQ has been failing consistently. Upping to P0.
build139-m2, build116-m2, build143-m2 are offline.
Cc: chrome-troopers@google.com
trooper@
Owner: bpastene@chromium.org
Status: Assigned (was: Untriaged)
The chromeos master isn't even loading for me right now. Investigating.
It's in the middle of booting up. Someone must've restarted it manually. We'll just have to wait until it's done.

Comment 6 by d...@chromium.org, Apr 4 2016

This is actually really bad. The master is in the middle of its daily cycle and we're trying to debug it. Any idea who decided to restart it? That sort of thing needs to be coordinated with the Infra trooper.
Issue 600526 has been merged into this issue.
Cc: aaboagye@chromium.org
The master is continuously loading builds into memory. Either someone keeps restarting it manually, or something is constantly asking the master to load all these pages.
We're going to try reverting https://chromereviews.googleplex.com/387507013 and see if that helps. It might be that the floating-builder algorithm goes wonky if one of the slaves is offline.
Looks like the revert fixed it. The master has been responding well ever since.
Alright, I'm going to reopen the tree then.
The waterfall still claims that several important builders are offline (updating here soon). Closing tree.
Builders that are offline and their corresponding config names, according to https://uberchromegw.corp.google.com/i/chromeos/buildslaves :

build107-m2    daisy-chromium-pfq
build111-m2    [paladin float]
build116-m2    guado_moblab-paladin
build125-m2    x86-alex-chrome-pfq
build139-m2    daisy-paladin
build143-m2    x86-mario-paladin
build149-m2    x86-generic-chromium-pfq
build158-m2    amd64-generic-chromium-pfq
build183-m2    arm-generic_freon-chromium-pfq
build243-m2    lakitu-paladin
build259-m2    daisy_skate-chrome-pfq
build294-m2    peach_pit-chrome-pfq

Hmm. Now a different set of build slaves are offline. Is this just some sort of rolling slave reboot?
Sorry, I was just watching the master and seeing if it would restart the slaves.

Currently, they all look idle to me. Is that not the case?
On the waterfall, all the CQ slaves do show as "idle", but on the buildslave list (https://uberchromegw.corp.google.com/i/chromeos/buildslaves) I still see multiple "Not Connected" buildslaves.
Ah, I see. Another question: on guado_moblab-paladin, for example, I see that there are 2 build slaves. One is connected and the other is offline. Do both need to be connected for the CQ to run?
No, only one is required, but if one is offline that often indicates a problem. We also have a limited number of backup floats, which are shared, so if two primaries are offline I think we are already hosed.

Comment 21 by d...@chromium.org, Apr 5 2016

Status: Fixed (was: Assigned)
The waterfall has been restarted, the 6 dead slaves have been replaced, and it is back online.
There are 2 other dead slaves shown on the public waterfall:

build85-m2
build91-m2

Could those be brought back up or replaced as well?
build85-m2 is up and running

build91-m2 is being looked at:
https://bugs.chromium.org/p/chromium/issues/detail?id=600599
Great, thanks!

Comment 25 by bugdroid1@chromium.org, Apr 5 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/2bad7ba7bb69dbd9d6acbb4104d809ff8e2c1515

commit 2bad7ba7bb69dbd9d6acbb4104d809ff8e2c1515
Author: dnj@chromium.org <dnj@chromium.org>
Date: Tue Apr 05 01:08:31 2016

CrOS: Replace broken slaves on ChromiumOS.

NOPRESUBMIT=true
TBR=bpastene@chromium.org
BUG= chromium:600479 
TEST=None

Review URL: https://codereview.chromium.org/1862513002

git-svn-id: svn://svn.chromium.org/chrome/trunk/tools/build@299690 0039d316-1c4b-4281-b951-d872f2087c98

[modify] https://crrev.com/2bad7ba7bb69dbd9d6acbb4104d809ff8e2c1515/masters/master.chromiumos/slave_pool.json
[modify] https://crrev.com/2bad7ba7bb69dbd9d6acbb4104d809ff8e2c1515/masters/master.chromiumos/slaves.cfg

Comment 26 by d...@chromium.org, Apr 5 2016

Just finished, everything should be good to go now.
Alright, reopening tree now then.
Labels: VerifyIn-51
Components: Infra>Labs
Labels: -Infra-Labs
Status: Verified (was: Fixed)
Bulk verified
