Issue 719767

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 719618
issue 720124



Create a findit bot pool in swarmbucket

Project Member Reported by robert...@chromium.org, May 8 2017

Issue description

In preparation for moving our tryserver bots to swarming, we need to have a logical swarming pool established.
 
Blocking: 720124

Comment 2 by st...@chromium.org, May 9 2017

Cc: chanli@chromium.org lijeffrey@chromium.org mar...@chromium.org
Labels: -Pri-3 Pri-2
Status: Assigned (was: Untriaged)

Comment 3 by st...@chromium.org, May 9 2017

Cc: vadimsh@chromium.org
vadimsh@, would you mind advising how to proceed here?

Findit needs dedicated beefy bots to speed up compilation.
Currently we have a 32-core GCE Linux VM and other beefy physical Mac/Windows machines in the labs.
Cc: smut@chromium.org
+ Sana for the MP part.

tl;dr: We can safely proceed with the non-GCE part. The GCE part carries some risk, since this is new. Sana (or I, probably) can help there.

Longer version:

For non-GCE, ask labs to allocate Mac/Win bots for you. The result of this process would be a bunch of machines (e.g. "buildXYZ-b4") connected to Swarming in pool:Chrome (the default one). They can be moved into a separate pool by declaring a new bot_group in bots.cfg (see for example https://chrome-internal.googlesource.com/infradata/config/+/535c8b21a389d22dbc412be26646080778e9a969/configs/chromium-swarm/bots.cfg#453).

In particular, manually allocated bots can be listed via the bot_id property (bot_id: "buildXYZ-b4").
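
A minimal sketch of what such a bot_group entry might look like in bots.cfg, assuming the pool is named ChromeFindIt (the name used later in this thread) and that a bot's pool is assigned through a dimensions field. The bot name is the placeholder from above, and any other fields the real config needs are elided:

```
bot_group {
  # Placeholder name from this comment; list each manually allocated bot here.
  bot_id: "buildXYZ-b4"
  # Assumption: the pool is assigned via a "pool:<name>" dimension.
  dimensions: "pool:ChromeFindIt"
  ...
}
```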

For GCE bots it will be more future-friendly to use Machine Provider (it's the thing we currently use to manage all tryserver GCE bots; it automatically launches, respawns, and shuts down VMs, cool stuff). It is pretty new and we haven't used it for different machine configurations before, so there may be some bumps (it should work, in theory):
1. Declare a new beefy VM template in https://chrome-internal.googlesource.com/infradata/config/+/master/configs/gce-backend/templates.cfg (by copy-pasting an existing one and adjusting instance class, disk size, etc.). I think it also needs a new dimension to distinguish it from non-beefy VMs, e.g. dimensions: "size:beefy" (or something).
2. Declare a new VM manager that uses this beefy template in https://chrome-internal.googlesource.com/infradata/config/+/master/configs/gce-backend/managers.cfg
3. Tell Swarming to lease a bunch of beefy VMs from Machine Provider and put them in your pool. In bots.cfg:
bot_group {
  machine_type {
    name: "findit-beefy-gce"
    description: "Beefy Trusty instances for pool:ChromeFindIt."
    early_release_secs: 7200
    lease_duration_secs: 86400
    mp_dimensions: "os_family:LINUX"
    mp_dimensions: "linux_flavor:UBUNTU"
    mp_dimensions: "os_version:14.04"
    mp_dimensions: "size:beefy"
    target_size: 2
  }
  ...
}

(Hm, I think we will also need to declare a "size:small" dimension for all non-beefy machines first, otherwise beefy machines could get allocated for non-beefy purposes).
Also, FYI in case you aren't aware of this already: there are some issues with compilation on Swarming. See https://bugs.chromium.org/p/chromium/issues/detail?id=706224 (and the entire bug tree there). nodir@ is actively working on fixing this.

Comment 6 by s...@google.com, May 10 2017

Machine Provider would be great and I'm happy to help set up MP bots for findit, but we recycle bots every 24 hours and don't have a solution to persist the cache, so every 24 hours you may get a long sync and compile (if you even do compilation).

Regarding size:beefy and size:small, Vadim's right, but we can express this in terms of num_cpus, memory_gb, or disk_gb, which is all information MP already has. There is no need to introduce a new dimension, since GCE Backend automatically computes values for these dimensions. We would just need to modify bots.cfg to request, say, num_cpus:8 for pool:Chrome and num_cpus:16 for pool:ChromeFindIt.
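
As a rough sketch of that suggestion, the machine_type from Vadim's example could request a CPU count instead of a custom size dimension. This assumes num_cpus can be passed through mp_dimensions in the same "key:value" form as the other entries, which has not been verified here; all other values are copied from the earlier example:

```
bot_group {
  machine_type {
    name: "findit-beefy-gce"
    description: "Beefy Trusty instances for pool:ChromeFindIt."
    early_release_secs: 7200
    lease_duration_secs: 86400
    mp_dimensions: "os_family:LINUX"
    mp_dimensions: "linux_flavor:UBUNTU"
    mp_dimensions: "os_version:14.04"
    # Assumption: takes the place of the proposed "size:beefy" dimension.
    mp_dimensions: "num_cpus:16"
    target_size: 2
  }
  ...
}
```

The machine_type for pool:Chrome would then request num_cpus:8 in the same way, so neither pool could lease the other's VM size.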
Unfortunately we really need to keep the caches around, as our aim is to reduce the latency of finding the culprit.

Until there is a solution for this, can we get hardware from labs and assign it to swarming the same way as Mac/Win bots?
Re #5: yes, we are aware, and that's why we are piloting this with Linux rather than Windows.
ok :(

Ask labs to allocate a beefy GCE bot for Swarming. It will have a name like "swarm*-c*", and can be added to a bot_group directly by specifying bot_id (just like golo or labs bots).

Comment 10 by bugdroid1@chromium.org, May 12 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/3e24b32440f797af57bad371f95efe1f05dfa9cc

commit 3e24b32440f797af57bad371f95efe1f05dfa9cc
Author: Roberto Carrillo <robertocn@google.com>
Date: Fri May 12 21:52:20 2017

Comment 11 by bugdroid1@chromium.org, May 12 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/3e24b32440f797af57bad371f95efe1f05dfa9cc

commit 3e24b32440f797af57bad371f95efe1f05dfa9cc
Author: Roberto Carrillo <robertocn@google.com>
Date: Fri May 12 21:52:20 2017

Comment 12 by bugdroid1@chromium.org, May 12 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/config/+/3e24b32440f797af57bad371f95efe1f05dfa9cc

commit 3e24b32440f797af57bad371f95efe1f05dfa9cc
Author: Roberto Carrillo <robertocn@google.com>
Date: Fri May 12 21:52:20 2017

Status: Fixed (was: Assigned)