Create a findit bot pool in swarmbucket
Issue description
In preparation for moving our tryserver bots to swarming, we need to have a logical swarming pool established.
,
May 9 2017
,
May 9 2017
vadimsh@, would you mind advising how to proceed here? Findit needs dedicated beefy bots to speed up compile. Currently we have a 32-core GCE Linux VM and other beefy physical Mac/Windows machines in the labs.
,
May 9 2017
+ Sana for the MP part.

tl;dr: We can safely proceed with the non-GCE part. The GCE part carries some risk, since this is new. Sana (or probably I) can help there.

Longer version:

For non-GCE, ask labs to allocate you Mac/Win bots. The result of this process will be a bunch of machines (e.g. "buildXYZ-b4") connected to Swarming in pool:Chrome (that's the default one). They can be moved into a separate pool by declaring a new bot_group in bots.cfg (see for example https://chrome-internal.googlesource.com/infradata/config/+/535c8b21a389d22dbc412be26646080778e9a969/configs/chromium-swarm/bots.cfg#453). In particular, manually allocated bots can be listed via the bot_id property (bot_id: "buildXYZ-b4").

For GCE bots it will be more future-friendly to use Machine Provider (it's the thing we currently use to manage all tryserver GCE bots; it automatically launches, respawns, and shuts down VMs, cool stuff). It is pretty new and we haven't used it for different machine configurations before, so there may be some bumps (it should work, in theory):

1. Declare a new beefy VM template in https://chrome-internal.googlesource.com/infradata/config/+/master/configs/gce-backend/templates.cfg (by copy-pasting an existing one and adjusting instance class, disk size, etc.). I think it also needs a new dimension to distinguish it from non-beefy VMs, e.g. dimensions: "size:beefy" (or something).
2. Declare a new VM manager that uses this beefy template in https://chrome-internal.googlesource.com/infradata/config/+/master/configs/gce-backend/managers.cfg
3. Tell Swarming to lease a bunch of beefy VMs from Machine Provider and put them in your pool. In bots.cfg:

    bot_group {
      machine_type {
        name: "findit-beefy-gce"
        description: "Beefy Trusty instances for pool:ChromeFindIt."
        early_release_secs: 7200
        lease_duration_secs: 86400
        mp_dimensions: "os_family:LINUX"
        mp_dimensions: "linux_flavor:UBUNTU"
        mp_dimensions: "os_version:14.04"
        mp_dimensions: "size:beefy"
        target_size: 2
      }
      ...
    }

(Hm, I think we will also need to declare a "size:small" dimension for all non-beefy machines first, otherwise beefy machines could get allocated for non-beefy purposes.)
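The concern in that last parenthetical can be illustrated with a small sketch. It assumes, as a simplification, that a lease request matches a machine whenever every dimension in the request is also present on the machine with the same value. The function names and the sample dimension lists are hypothetical and are not Machine Provider code.

```python
def parse_dimensions(pairs):
    """Turn ["key:value", ...] strings into a {key: value} dict."""
    return dict(p.split(":", 1) for p in pairs)


def matches(request, machine):
    """Assumed rule: every requested dimension must equal the machine's value."""
    wanted = parse_dimensions(request)
    have = parse_dimensions(machine)
    return all(have.get(key) == value for key, value in wanted.items())


# Hypothetical machine: a beefy VM carrying the proposed "size:beefy" dimension.
beefy_vm = ["os_family:LINUX", "os_version:14.04", "size:beefy"]

# A request that says nothing about size, and one that asks for "size:small".
generic_request = ["os_family:LINUX", "os_version:14.04"]
small_request = generic_request + ["size:small"]

print(matches(generic_request, beefy_vm))  # True: the beefy VM can be taken
print(matches(small_request, beefy_vm))    # False: size:small excludes it
```

Under that assumed rule, a request without any size dimension is satisfied by a beefy VM, which is why the non-beefy requests would need their own "size:small" constraint.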
,
May 9 2017
Also FYI, in case you aren't aware of this already: there are some issues with compilation on Swarming. See https://bugs.chromium.org/p/chromium/issues/detail?id=706224 (and the entire bug tree there). nodir@ is actively working on fixing this.
,
May 10 2017
Machine Provider would be great and I'm happy to help set up MP bots for findit, but we recycle bots every 24 hours and don't have a solution to persist the cache, so every 24 hours you may get a long sync and compile (if you even do compilation).

Regarding size:beefy and size:small, Vadim's right, but we can express this in terms of num_cpus, memory_gb, or disk_gb, all of which is information MP already has. There is no need to introduce a new dimension; GCE Backend automatically computes values for these dimensions. We would just need to modify bots.cfg to request, say, num_cpus:8 for pool:Chrome and num_cpus:16 for pool:ChromeFindIt.
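As a rough sketch of the num_cpus idea, the snippet below keeps one hypothetical request per pool and checks which pool a machine would satisfy. It assumes requests are matched by exact equality on each requested dimension; `POOL_REQUESTS`, `pools_for`, and `to_mp_dimensions` are illustrative names, and only the num_cpus values 8 and 16 come from the comment above.

```python
# Hypothetical per-pool lease requests. Only the num_cpus values are taken
# from the suggestion above; everything else is illustrative.
POOL_REQUESTS = {
    "Chrome": {"os_family": "LINUX", "num_cpus": 8},
    "ChromeFindIt": {"os_family": "LINUX", "num_cpus": 16},
}


def pools_for(machine):
    """Return pools whose request the machine satisfies (assumed exact match)."""
    return sorted(
        pool
        for pool, request in POOL_REQUESTS.items()
        if all(machine.get(key) == value for key, value in request.items())
    )


def to_mp_dimensions(request):
    """Render a request as "key:value" strings, the form used for mp_dimensions."""
    return [f"{key}:{value}" for key, value in request.items()]


# A 16-CPU machine whose num_cpus is assumed to be computed by the backend.
print(pools_for({"os_family": "LINUX", "num_cpus": 16}))
print(to_mp_dimensions(POOL_REQUESTS["ChromeFindIt"]))
```

Because each pool asks for a different num_cpus value, a given machine satisfies at most one of the two requests, so no extra size dimension is needed in this sketch.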
,
May 10 2017
Unfortunately we really need to keep the caches around, as our aim is to reduce the latency of finding the culprit. Until there is a solution for this, can we get hardware from labs and assign it to swarming the same way as the Mac/Win bots?
,
May 10 2017
Re #5: yes, we are aware, and that's why we are piloting this with Linux rather than Windows.
,
May 10 2017
ok :( Ask labs to allocate a beefy GCE bot for Swarming. It will have a name like "swarm*-c*", and can be added to a bot_group directly by specifying bot_id (just like golo or labs bots).
,
May 12 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/infradata/config/+/3e24b32440f797af57bad371f95efe1f05dfa9cc commit 3e24b32440f797af57bad371f95efe1f05dfa9cc Author: Roberto Carrillo <robertocn@google.com> Date: Fri May 12 21:52:20 2017
,
May 16 2017
Comment 1 by robert...@chromium.org
, May 9 2017