Issue 808298

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 777642



lots of offline bots in the win_msvc_cq pool, need to add more

Project Member Reported by dpranke@chromium.org, Feb 2 2018

Issue description

In theory the win_msvc_cq pool on tryserver.chromium.win is supposed to have 92 bots in it, but it looks like ~15 of them are permanently offline. We need them (or their replacements) back, and probably should add another ~10-20 or so bots on top of that.

This graph -- http://shortn/_mCK2ZoMPCP -- shows the usage over the past day, with win-msvc-dbg at 25%. I've posted https://crrev.com/c/898350 to bump it to 50%, which should give us a better sense of how many more bots we might need.

The bots that appear to be offline (from https://ci.chromium.org/buildbot/tryserver.chromium.win/win-msvc-dbg/ ):

vm612-m4
vm627-m4
vm632-m4
vm636-m4
vm638-m4
vm715-m4
vm717-m4
vm753-m4
vm755-m4
vm764-m4
vm895-m4
vm896-m4
vm950-m4
vm951-m4
vm952-m4

 
Components: Infra>Labs
Labels: -Pri-3 Pri-1
Blocking: 777642
Looks like power-cycling with vmpower brought back vm627-m4, vm632-m4, vm715-m4, vm717-m4, vm753-m4, and vm895-m4, leaving the other nine:

vm612-m4
vm636-m4
vm638-m4
vm755-m4
vm764-m4
vm896-m4
vm950-m4
vm951-m4
vm952-m4
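
The remaining list is just the original offline set minus the bots that came back after power-cycling. A minimal sketch to double-check it, assuming only the hostnames quoted in this issue (the variable names are illustrative, not from any infra tool):

```python
# Bots reported offline in the issue description.
offline = [
    "vm612-m4", "vm627-m4", "vm632-m4", "vm636-m4", "vm638-m4",
    "vm715-m4", "vm717-m4", "vm753-m4", "vm755-m4", "vm764-m4",
    "vm895-m4", "vm896-m4", "vm950-m4", "vm951-m4", "vm952-m4",
]

# Bots that came back after power-cycling.
recovered = {
    "vm627-m4", "vm632-m4", "vm715-m4",
    "vm717-m4", "vm753-m4", "vm895-m4",
}

# Preserve the original ordering while dropping the recovered bots.
still_offline = [bot for bot in offline if bot not in recovered]

print(len(still_offline))
for bot in still_offline:
    print(bot)
```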

I expect Labs probably needs to take it from here.

Comment 4 by d...@chromium.org, Feb 2 2018

Owner: d...@chromium.org
Status: Assigned (was: Untriaged)
vm95{0..2}-m4 look to be double allocated, as they're 10.9.5 Mac VMs:

$ botmap.py 2>/dev/null | grep vm95'[0-2]'-m4
vm950-m4                                win       buildbot            master.tryserver.chromium.win
vm950-m4                                mac       swarming            chromium-swarm.appspot.com
vm951-m4                                win       buildbot            master.tryserver.chromium.win
vm951-m4                                mac       swarming            chromium-swarm.appspot.com
vm952-m4                                win       buildbot            master.tryserver.chromium.win
vm952-m4                                mac       swarming            chromium-swarm.appspot.com

vm950-m4 os x 10.9.5 (13f1096)
vm951-m4 os x 10.9.5 (13f1096)
vm952-m4 os x 10.9.5 (13f1096)

I'm going to assume that making these Swarming VMs was a mistake, so I'll redeploy them.
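
For anyone wanting to check for this kind of double allocation across the whole map rather than by grepping for specific hosts, here is a rough sketch. It assumes botmap.py output is four whitespace-separated columns (hostname, OS, system, master) as in the lines above; the function name `find_double_allocated` is hypothetical and not part of botmap.py.

```python
from collections import defaultdict


def find_double_allocated(botmap_output):
    """Return {hostname: [(os, system, master), ...]} for hosts listed more than once.

    Assumes each line has exactly four whitespace-separated fields,
    matching the botmap.py output quoted in this issue.
    """
    entries = defaultdict(list)
    for line in botmap_output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines or anything in an unexpected shape
        host, os_name, system, master = fields
        entries[host].append((os_name, system, master))
    return {host: rows for host, rows in entries.items() if len(rows) > 1}


# The first two rows are copied from the output above; the third row is
# illustrative only, added to show a host that is allocated once.
sample = """\
vm950-m4   win   buildbot   master.tryserver.chromium.win
vm950-m4   mac   swarming   chromium-swarm.appspot.com
vm612-m4   win   buildbot   master.tryserver.chromium.win
"""

dupes = find_double_allocated(sample)
for host, rows in sorted(dupes.items()):
    print(host, rows)
```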

On the other bots, the buildslave process starts and connects to the master, then exits immediately. I'm going to try re-bootstrapping those.

Comment 5 by d...@chromium.org, Feb 2 2018

Bots in #3 are now all reconnected.

Do we still want to expand this pool?
Cc: d...@chromium.org
Owner: dpranke@chromium.org
Status: Started (was: Assigned)
The CL to bump up the load hasn't landed yet, so I can't say. I'll get that landed, see how it goes, and report back if we still want to expand.

Thanks!
vadimsh@ found that win{1508..1526}-c4 are unused (see bug 75620 #c72), so that'll help, too.
Owner: d...@chromium.org
@dba - at a guess, it looks like we'll probably need 15 more bots in addition to the 18 listed in #c7 (which were actually slave{1508..1526}-c4, from bug 756270). Is that doable?
Looks like 15{09,10,11} are already in use, so maybe 20 more instead ...
Status: Fixed (was: Started)
Actually, looks like the bots in this pool are all currently win7 ESX VMs. Mixing those with win10 GCE VMs can't be a good idea.

I think instead I'm going to close this out, and move the win-msvc-dbg bot over to the win10_gce pool, which looks like it has plenty of capacity.