
Issue 698336

Starred by 5 users

Issue metadata

Status: Archived
Owner:
Closed: Apr 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Shortage of Trybot Buildslaves

Project Member Reported by dgarr...@chromium.org, Mar 3 2017

Issue description

We appear to be running out of tryjob build slaves for no-vmtest-pre-cq, and precq tryjobs are timing out.

These are GCE builders, so more can be acquired quickly. See go/cros-gce-bots.

We are also using every single physical tryjob builder, so might need more capacity there, which requires tickets to the Golo team and will take longer.

We currently have capacity for ~20 new GCE builders, but will soon have a lot more capacity.
 
Labels: -Pri-2 Pri-1
lakitu failed due to a GCE problem.

https://luci-milo.appspot.com/buildbot/chromeos/lakitu-paladin/5743

Is it related?
No, that's different. The lakitu tests use Lakitu owned GCE instances that we don't manage at all.

At a quick glance, that error is either a production GCE issue or the builder catching a bad CL that only affects lakitu.
Cc: -nxia@chromium.org xixuan@chromium.org
Owner: nxia@chromium.org
Since we're in better shape, it's not that urgent now. After discussion, I will pass this bug to the next deputy and learn from them how to make these changes.
Labels: -current-issue

Comment 6 by nxia@chromium.org, Mar 10 2017

Status: Started (was: Untriaged)

Comment 7 by bugdroid1@chromium.org, Mar 10 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/infra_internal/+/49bf77a6fc2fcd930df4c094294343fa8d5c23c5

commit 49bf77a6fc2fcd930df4c094294343fa8d5c23c5
Author: Ningning Xia <nxia@google.com>
Date: Fri Mar 10 01:26:36 2017


Comment 8 by bugdroid1@chromium.org, Mar 10 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/infra_internal/+/49bf77a6fc2fcd930df4c094294343fa8d5c23c5

commit 49bf77a6fc2fcd930df4c094294343fa8d5c23c5
Author: Ningning Xia <nxia@google.com>
Date: Fri Mar 10 01:26:36 2017


Comment 9 by bugdroid1@chromium.org, Mar 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/f6b2cad8b240fb023ba51208c9d3166e0822263f

commit f6b2cad8b240fb023ba51208c9d3166e0822263f
Author: Ningning Xia <nxia@google.com>
Date: Fri Mar 10 02:08:56 2017

Add new GCEs to the Pre-CQ slave pool.

BUG= 698336 
TEST=None

Change-Id: Id30a62efe8c3a5a2229ee814e6dbf7d1f0a116b8
Reviewed-on: https://chromium-review.googlesource.com/452667
Reviewed-by: Ryan Tseng <hinoka@chromium.org>
Commit-Queue: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/f6b2cad8b240fb023ba51208c9d3166e0822263f/masters/master.chromiumos.tryserver/slave_pool.json
[modify] https://crrev.com/f6b2cad8b240fb023ba51208c9d3166e0822263f/masters/master.chromiumos.tryserver/slaves.cfg

Comment 10 by nxia@chromium.org, Mar 10 2017

Status: Fixed (was: Started)
Look at the stats at: https://viceroy.corp.google.com/chrome_infra/Buildbot/overview_v2?duration=7d&job=master.chromiumos.tryserver&master=master.chromiumos.tryserver&refresh=-1

It shows a couple of things that don't make sense to me:
1. Pending builds prior to the restart today around 1pm were stuck high despite idle builders.
2. The connected slave ratio went up from 95 to 100 despite 10 new devices being added. https://viceroy.corp.google.com/chrome_infra/Buildbot/per_master?duration=1d&hostname=build217-m2&master=master.chromiumos.tryserver&refresh=-1 shows 122 slaves because it's double counting build215-m2, build230-m2, which matches https://uberchromegw.corp.google.com/i/chromiumos.tryserver/buildslaves
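
To illustrate the double-counting point in item 2, here is a minimal Python sketch of how a connected-slave ratio could be computed over unique hostnames, so that a slave listed twice is only counted once. The function name `connected_ratio` and the sample lists are hypothetical; this is not how the Viceroy dashboard or Buildbot actually computes the number.

```python
from collections import Counter

def connected_ratio(configured, connected):
    """Compute the connected ratio over unique slave names.

    Returns (unique configured count, unique connected count,
    ratio in percent, sorted list of names configured more than once).
    """
    counts = Counter(configured)
    # Names that appear more than once in the configured list.
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    unique_configured = set(counts)
    # Only count connected slaves that are actually configured.
    unique_connected = set(connected) & unique_configured
    if unique_configured:
        ratio = 100.0 * len(unique_connected) / len(unique_configured)
    else:
        ratio = 0.0
    return len(unique_configured), len(unique_connected), ratio, duplicates

# Hypothetical sample data: two slaves are listed twice, one is not connected.
configured = ["build215-m2", "build230-m2", "build215-m2",
              "build230-m2", "build214-m2"]
connected = ["build215-m2", "build230-m2"]

total, up, ratio, dupes = connected_ratio(configured, connected)
print(total, up, round(ratio, 1), dupes)
```

With a naive count the sample would report five configured slaves; counting unique names reports three and also lists which names were duplicated.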

Comment 12 by nxia@chromium.org, Mar 10 2017

Re #11

1. The waterfall was restarted at 10:30am today, but it did not come back up until it was fixed at 11:30 pm. See crbug.com/700252

Status: Available (was: Fixed)
I just had a change from 2:30pm fail on pre-CQ:
https://chromium-review.googlesource.com/#/c/452004/

Comment 14 by nxia@chromium.org, Mar 10 2017

Re #13,

caroline-pre-cq uses baremetal bots to run VMTest; it doesn't run on GCEs.
That's an excellent point. That suggests we also have a shortage in the "PreCQ" pool, not just the "PreCQ (GCE)" pool that was just increased.

Adding physical builders requires a ticket to CrOps lab team.

Also, that pool has one builder offline, "build214-m2", which should probably be fixed.
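
Since the shortage may affect the "PreCQ" and "PreCQ (GCE)" pools differently, a per-pool summary helps show which pool is actually out of capacity. The sketch below is a minimal Python illustration under assumed inputs: the record layout (`name`, `pool`, `state`), the state values, and the `example-*` builder names are all hypothetical and do not reflect the real slave_pool.json or slaves.cfg formats.

```python
from collections import defaultdict

def summarize_pools(builders):
    """Group builders by pool and count total, idle, and offline ones."""
    summary = defaultdict(lambda: {"total": 0, "idle": 0, "offline": []})
    for builder in builders:
        pool = summary[builder["pool"]]
        pool["total"] += 1
        if builder["state"] == "offline":
            pool["offline"].append(builder["name"])
        elif builder["state"] == "idle":
            pool["idle"] += 1
    return dict(summary)

# Hypothetical sample data; only build214-m2 being offline comes from the thread.
builders = [
    {"name": "build214-m2", "pool": "PreCQ", "state": "offline"},
    {"name": "example-baremetal-1", "pool": "PreCQ", "state": "busy"},
    {"name": "example-gce-1", "pool": "PreCQ (GCE)", "state": "idle"},
]

summary = summarize_pools(builders)
for pool_name, stats in sorted(summary.items()):
    print(pool_name, stats)
```

A pool with zero idle builders and a non-empty offline list is the one to look at first, whether the fix is repairing a host or requesting more machines.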

Comment 16 by nxia@chromium.org, Mar 11 2017

Cc: nxia@chromium.org keta...@chromium.org dgarr...@chromium.org gkihumba@chromium.org bhthompson@chromium.org akes...@chromium.org
Issue 628751 has been merged into this issue.

Comment 17 by nxia@chromium.org, Mar 13 2017

Cc: -nxia@chromium.org
Owner: xixuan@chromium.org
I have used up the IPs and deployed 10 GCEs. Passing this to the current deputy to keep an eye on the idle rate.
Status: Fixed (was: Available)

Comment 19 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 21 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)
