Add new "PreCQ" dimension to some percentage of Chrome OS Swarming machine pool
Issue description

Today, we have a uniform pool of machines that we schedule Swarming tasks against, with enough capacity for all of the peak load that we might have. We would like to schedule more PreCQ jobs than we will have capacity for, without starving other work (CQ, Release, Tryjob, Firmware, etc.). We are doing this because we want to increase PreCQ board coverage before we have new machines to fully cover it, and we want to use Buildbucket's FIFO to manage the queuing.

To accomplish this, we need to add a new dimension, "role:precq", to all of the machines that we use for PreCQ today, and then update the Buildbucket PreCQ job to require that role. Once we've done that, PreCQ jobs will only run on up to the number of machines that we have marked with this role, but other jobs can freely use machines in the PreCQ pool opportunistically. If these other jobs starving PreCQ becomes an issue, we can move PreCQ to a higher scheduler priority. I have started on this.
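A minimal sketch of the matching rule this plan relies on: a Swarming task is only scheduled on a bot whose dimensions include every dimension the task requests. The bot ids and the helper `eligible_bots` are hypothetical and only illustrate the idea; this is not the Swarming API.

```python
from typing import Dict, List, Set

# Hypothetical bot inventory: bot id -> dimension key -> set of values.
BOTS: Dict[str, Dict[str, Set[str]]] = {
    "bot-1": {"pool": {"ChromeOS"}, "role": {"precq"}},
    "bot-2": {"pool": {"ChromeOS"}, "role": {"precq"}},
    "bot-3": {"pool": {"ChromeOS"}},  # no role dimension at all
}


def eligible_bots(task_dims: Dict[str, str],
                  bots: Dict[str, Dict[str, Set[str]]] = BOTS) -> List[str]:
    """Return ids of bots that carry every dimension the task requests."""
    return sorted(
        bot_id
        for bot_id, dims in bots.items()
        if all(value in dims.get(key, set()) for key, value in task_dims.items())
    )


# A PreCQ task requires the role, so it is capped at the labelled bots.
precq_task = {"pool": "ChromeOS", "role": "precq"}
# A task that does not ask for a role can land on any bot in the pool.
other_task = {"pool": "ChromeOS"}

print(eligible_bots(precq_task))
print(eligible_bots(other_task))
```

In this sketch the PreCQ task is limited to the two labelled bots, while the task without a role requirement can use all three, which is the opportunistic sharing described above.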
,
May 16 2018
Per Don, we have 210 Swarming builders right now, of which 80 came from moving PreCQ builders over to Swarming. We're looking to put an upper limit on PreCQ consumption, so 100 seems like a good guess. I'm going to label 100 hosts as role:precq.
,
Jun 5 2018
Current configuration: https://chrome-internal.googlesource.com/chromeos/manifest-internal/+/infra/config
What luci-config sees: https://luci-config.appspot.com/#/projects/chromeos
LUCI docs on bots: https://github.com/luci/luci-py/blob/master/appengine/swarming/doc/User-Guide.md
and on creating a new Swarming pool: https://chrome-internal.git.corp.google.com/infra/infra_internal/+/HEAD/doc/luci/new_swarming_pool.md#
Example bots.cfg from chromium-swarm: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/bots.cfg
Our existing CrOS Swarming bot configuration: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chrome-swarming/bots.cfg#110
,
Jun 18 2018
Per the email, splitting this into 100 bots for "role:precq" and 70 bots for "role:tryjob". For simplicity, the CQ jobs will always be able to trump any other jobs since they can run on all 210 bots. We could slice and dice this further if we so choose. In the future, using Machine Provider will allow us to set dimensions at machine allocation time, avoiding the use of bot names for identification. This also allows us to move some builder configs to a single location. -- Mike
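As a rough illustration of the split above, the sketch below labels 100 of 210 bots as precq and 70 as tryjob. The bot ids are hypothetical, and treating the remaining bots as having no role is an assumption made for the example rather than something stated in this bug.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

TOTAL_BOTS = 210
# Role sizes taken from the comment above.
ROLE_COUNTS: List[Tuple[str, int]] = [("precq", 100), ("tryjob", 70)]


def assign_roles(total: int = TOTAL_BOTS,
                 role_counts: List[Tuple[str, int]] = ROLE_COUNTS
                 ) -> Dict[str, Optional[str]]:
    """Map hypothetical bot ids to a role, or None when no role is assigned."""
    bot_ids = [f"bot-{i:03d}" for i in range(1, total + 1)]
    roles: Dict[str, Optional[str]] = {}
    start = 0
    for role, count in role_counts:
        # Label the next contiguous slice of bots with this role.
        for bot_id in bot_ids[start:start + count]:
            roles[bot_id] = role
        start += count
    # Assumption: whatever is left carries no role dimension.
    for bot_id in bot_ids[start:]:
        roles[bot_id] = None
    return roles


roles = assign_roles()
print(Counter(roles.values()))
```

Under these assumptions 210 - 100 - 70 = 40 bots carry no role, and a job that requests no role dimension (such as CQ in the comment above) can still land on any of the 210.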
,
Jun 25 2018
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/config/+/d14f3fb5b98cfa7a646d7113f2c2a3b26ba912da

commit d14f3fb5b98cfa7a646d7113f2c2a3b26ba912da
Author: Mike Nichols <mikenichols@chromium.org>
Date: Mon Jun 25 23:09:06 2018
,
Jun 25 2018
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/config/+/bd9af87f8675132146849ed0a35046810ea55cb7

commit bd9af87f8675132146849ed0a35046810ea55cb7
Author: Mike Nichols <mikenichols@google.com>
Date: Mon Jun 25 23:18:06 2018
,
Jun 26 2018
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/config/+/6e71c00fe7521b7fa6461373b1351edeefa956b1

commit 6e71c00fe7521b7fa6461373b1351edeefa956b1
Author: Mike Nichols <mikenichols@chromium.org>
Date: Tue Jun 26 17:15:17 2018
,
Jun 26 2018
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/manifest-internal/+/2a09b439239959ab7c7b35846b58082caa659165

commit 2a09b439239959ab7c7b35846b58082caa659165
Author: Mike Nichols <mikenichols@chromium.org>
Date: Tue Jun 26 17:16:00 2018
,
Jun 26 2018
The bot configurations are now in place:

PreCQ: 100 bots
TryJob: 70 bots
Prod: 209 bots
Experimental: 1 bot

We'll continue to adjust these pools as additional capacity is added and as the buildbot migration to Swarming completes. -- Mike
,
Jun 26 2018
We need to add role:precq to our PreCQ Launcher Buildbucket requests in order to unblock issue 842068. It should be a one-line change. I didn't have a separate bug for that. We could track that here or open another bug.
,
Jun 26 2018
I don't think this is needed. The benefit of associating the builders with mixins that carry the roles is that we can associate builders with bot groups without needing to update any calling tasks.

PreCQ bots: https://chrome-swarming.appspot.com/botlist?c=id&c=os&c=task&c=status&c=pool&c=role&f=pool%3AChromeOS&f=role%3Aprecq&l=210&s=id%3Aasc
PreCQ task after change: https://chrome-swarming.appspot.com/task?id=3e55f31d46468a10&refresh=10&request_detail=true

-- Mike
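A small sketch of why callers need no change under this approach: when the role lives in a mixin attached to the builder definition, a request that only names the builder picks up the role dimension automatically. The mixin names, the "CQ" builder entry, and `resolve_dimensions` are hypothetical; the real configuration lives in the LUCI config files linked earlier and may be structured differently.

```python
from typing import Dict, List

# Hypothetical mixins: named bundles of "key:value" dimensions.
MIXINS: Dict[str, List[str]] = {
    "chromeos-pool": ["pool:ChromeOS"],
    "precq-role": ["role:precq"],
}

# Hypothetical builder definitions that reference mixins by name.
BUILDERS: Dict[str, Dict[str, List[str]]] = {
    "PreCQ": {"mixins": ["chromeos-pool", "precq-role"]},
    "CQ": {"mixins": ["chromeos-pool"]},
}


def resolve_dimensions(builder_name: str) -> List[str]:
    """Collect the dimensions a builder inherits from its mixins, in order."""
    dims: List[str] = []
    for mixin_name in BUILDERS[builder_name]["mixins"]:
        for dim in MIXINS[mixin_name]:
            if dim not in dims:  # skip duplicates across mixins
                dims.append(dim)
    return dims


# The caller only names the builder; the role comes from configuration.
print(resolve_dimensions("PreCQ"))
print(resolve_dimensions("CQ"))
```

Changing which bots a builder may use then becomes a configuration edit to the mixin or builder entry, with no change to the code that submits the request.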
,
Jun 26 2018
Hrm, yeah, I see now: it looks like you submit a Buildbucket request with Builder name "PreCQ" and role:precq is attached to that. BTW, can we update the owners of the bot pools to the whole team? https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chrome-swarming/bots.cfg#123 It affects the ability to click buttons on the Swarming UI.
,
Jun 26 2018
I believe the permissions are covered by another bug, but I will keep this open until I can confirm. -- Mike
,
Jul 9
The permissions are outside the scope of the bots.cfg changes made during this initiative. The latest changes have been submitted. Marking this part as complete. -- Mike
Comment 1 by jclinton@chromium.org
, May 16 2018