Shortage of Trybot Buildslaves
Issue description
We appear to be running out of tryjob build slaves for no-vmtest-pre-cq, and precq tryjobs are timing out. These are GCE builders, so more can be acquired quickly. See go/cros-gce-bots. We are also using every single physical tryjob builder, so we might need more capacity there, which requires tickets to the Golo team and will take longer. We currently have capacity for ~20 new GCE builders, but will soon have a lot more capacity.
,
Mar 3 2017
No, that's different. The lakitu tests use Lakitu-owned GCE instances that we don't manage at all. At a quick glance, that error is either a production GCE issue or the test catching a bad CL that only affects lakitu.
,
Mar 3 2017
Since we're in better shape, it's not that urgent now. After discussion, I will pass this bug to the next deputy and learn from them how to make these changes.
,
Mar 7 2017
,
Mar 9 2017
Still happening: https://chromium-review.googlesource.com/#/c/447822/
,
Mar 10 2017
,
Mar 10 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/infra/infra_internal/+/49bf77a6fc2fcd930df4c094294343fa8d5c23c5 commit 49bf77a6fc2fcd930df4c094294343fa8d5c23c5 Author: Ningning Xia <nxia@google.com> Date: Fri Mar 10 01:26:36 2017
,
Mar 10 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/f6b2cad8b240fb023ba51208c9d3166e0822263f commit f6b2cad8b240fb023ba51208c9d3166e0822263f Author: Ningning Xia <nxia@google.com> Date: Fri Mar 10 02:08:56 2017 Add new GCEs to the Pre-CQ slave pool. BUG= 698336 TEST=None Change-Id: Id30a62efe8c3a5a2229ee814e6dbf7d1f0a116b8 Reviewed-on: https://chromium-review.googlesource.com/452667 Reviewed-by: Ryan Tseng <hinoka@chromium.org> Commit-Queue: Ningning Xia <nxia@chromium.org> [modify] https://crrev.com/f6b2cad8b240fb023ba51208c9d3166e0822263f/masters/master.chromiumos.tryserver/slave_pool.json [modify] https://crrev.com/f6b2cad8b240fb023ba51208c9d3166e0822263f/masters/master.chromiumos.tryserver/slaves.cfg
,
Mar 10 2017
,
Mar 10 2017
Looking at the stats at: https://viceroy.corp.google.com/chrome_infra/Buildbot/overview_v2?duration=7d&job=master.chromiumos.tryserver&master=master.chromiumos.tryserver&refresh=-1
This shows a couple of things that don't make sense to me:
1. Pending builds prior to today's restart around 1pm were stuck high despite idle builders.
2. The connected slave ratio went up from 95 to 100 despite 10 new devices being added.
https://viceroy.corp.google.com/chrome_infra/Buildbot/per_master?duration=1d&hostname=build217-m2&master=master.chromiumos.tryserver&refresh=-1 shows 122 slaves because it's double counting build215-m2 and build230-m2, which matches https://uberchromegw.corp.google.com/i/chromiumos.tryserver/buildslaves
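To sanity-check a slave count like the 122 above, one option is to compare the raw number of entries against the number of distinct hostnames. The sketch below is only an illustration: the `summarize_slaves` helper and the sample list are hypothetical (including the made-up `build216-m2` entry), and it assumes the hostnames have already been copied out of the buildslaves page by hand.

```python
from collections import Counter


def summarize_slaves(hostnames):
    """Return (raw_count, unique_count, duplicated_names) for a list of hostnames."""
    counts = Counter(hostnames)
    # Any hostname listed more than once inflates the raw count.
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    return len(hostnames), len(counts), duplicated


# Hypothetical sample, not the real slave list.
sample = ["build215-m2", "build216-m2", "build230-m2", "build215-m2", "build230-m2"]
raw, unique, dupes = summarize_slaves(sample)
print(raw, unique, dupes)
```

If the raw and unique counts differ, the duplicated names show which slaves are being counted twice.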
,
Mar 10 2017
Re #11, item 1: The waterfall was restarted at 10:30am today, but it didn't come back up and wasn't fixed until 11:30 pm. See crbug.com/700252
,
Mar 10 2017
I just had a change from 2:30pm fail on pre-CQ: https://chromium-review.googlesource.com/#/c/452004/
,
Mar 10 2017
Re #13, caroline-pre-cq uses baremetal bots to run VMTest; it doesn't run on GCEs.
,
Mar 10 2017
That's an excellent point. That suggests we also have a shortage in the "PreCQ" pool, not just the "PreCQ (GCE)" pool that was just increased. Adding physical builders requires a ticket to the CrOps lab team. Also, that pool has one builder offline, "build214-m2", which should probably be fixed.
,
Mar 11 2017
Issue 628751 has been merged into this issue.
,
Mar 13 2017
I have used up the IPs and deployed 10 GCEs. Passing to the current deputy to keep an eye on the idle rate.
,
Apr 3 2017
,
May 30 2017
,
Aug 1 2017
,
Jan 22 2018
Comment 1 by xixuan@chromium.org
, Mar 3 2017