Add capacity for non-GPU-testing Mac10.12 VMs for swarming
Issue description
Background: We want to start supporting Mac10.12 for various things (bug 624049), including tests run on the chromium.webkit waterfall (bug 697971), and we'll want to run those on swarming. Context: See http://crbug.com/697971#c14. maruel@ or dpranke@, do you know how much capacity should be added? How many non-GPU VMs are available for swarming for the other Mac versions?
Mar 9 2017
FTR the 10.12 bots also run iOS tests, and the GPU pool is simply anything that's bare metal (Minis and MacBook Pros mostly). On the labs side, we'd like to continue skewing towards more bare metal than VMs for Mac, as it's more cost-effective at the current time. That's not to say we won't deploy VMs, as that's still a platform our customers use, but to me the tests should be able to run on either configuration if possible.
Mar 9 2017
@dba - bare metal is fine for most tests; I don't think there's any reason things have to be VMs. I guess the main concern is that everything in a single pool is homogeneous, so that we don't potentially get different results depending on which bot a task runs on (e.g., mini vs. mbp). Last I looked we had a ton of spare capacity in the 10.12 pool, so if we do want to share it we can just remove the `-d gpu none` flag, and I'd guess this'll just work fine.
Mar 9 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/chrome-golo/chrome-golo/+/af19d4131c5a081bef0d179657ef613d9df0b6ba commit af19d4131c5a081bef0d179657ef613d9df0b6ba Author: Bryce Albritton <dba@google.com> Date: Thu Mar 09 21:22:48 2017
Mar 9 2017
M-A filed issue 700053 to possibly split out the GPU pool into its own set of bots. I'm inclined to agree with splitting the machines out so that we don't end up overloading the pool with non-GPU-related tasks, which is most likely going to happen as we move the fleet over from 10.9. For now, I've deployed 30 10.12 VMs, so any 10.12 job that either doesn't specify a GPU dimension or requires 'gpu: none' should end up there (I've already seen some iOS jobs hit them). @dpranke - I've actually wondered about the fact that some jobs don't dictate hardware type (Mini, Pro, or MacBook Pro), and whether that would become an issue in the future. Right now GPU jobs just get their flavor of hardware from the GPU they require (as it stands, all 3 flavors of hardware have different GPUs in our fleet). Nothing else seems to really care at this point; everything else just looks for the OS (or Xcode) version to match.
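To make the dimension discussion above concrete, here is a minimal sketch of how a task's requested dimensions could select bots. It assumes the usual Swarming rule that a bot is eligible when it advertises every key/value pair the task asks for. The bot names and the `gpu-a` / `gpu-b` values are hypothetical placeholders, not real fleet data, and the `os`/`gpu` keys simply mirror the `os:Mac-10.12 gpu:none` notation used in this thread.

```python
# Hypothetical inventory: bot name -> {dimension key: set of advertised values}.
# All names and GPU labels below are illustrative assumptions.
BOTS = {
    "vm-1": {"os": {"Mac", "Mac-10.12"}, "gpu": {"none"}},
    "mini-1": {"os": {"Mac", "Mac-10.12"}, "gpu": {"gpu-a"}},
    "mbp-1": {"os": {"Mac", "Mac-10.12"}, "gpu": {"gpu-b"}},
}


def matches(bot_dims, task_dims):
    """Return True if the bot advertises every key/value the task requests."""
    return all(
        value in bot_dims.get(key, set()) for key, value in task_dims.items()
    )


def eligible_bots(task_dims, bots=BOTS):
    """Return the sorted names of bots that can run a task with these dimensions."""
    return sorted(name for name, dims in bots.items() if matches(dims, task_dims))


if __name__ == "__main__":
    # Pinned to VMs: only bots advertising gpu:none qualify.
    print(eligible_bots({"os": "Mac-10.12", "gpu": "none"}))
    # No gpu dimension: VMs and bare metal are all eligible.
    print(eligible_bots({"os": "Mac-10.12"}))
```

Under these assumptions, a job that omits the gpu dimension is eligible for every 10.12 bot, bare metal included, which is the overloading concern behind issue 700053.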
Mar 9 2017
I don't think we've really worked out the best ways to manage resource allocation, and so these are all good questions and points. We need to keep working on this going forward. Thanks for setting up the VMs!
Mar 13 2017
dpranke@: to answer your question from #1: smut@, dba@ and I collaborated to merge ~3 groups of Mac Minis to a single consistent image, in order to have one large pool that could be used for multiple purposes, not just GPU testing. Our group was hoping to run some larger GPU test suites (the webgl2_conformance_tests) against more Chromium CLs using that set of machines, but I'm not sure that would be cost effective. As it stands, there is a good amount of available capacity there. Other teams should certainly spawn jobs on those machines. I commented on Issue 700053; we should work together to figure out the best strategy there.
Mar 17 2017
Alright! Is it correct that the next step is to revert https://codereview.chromium.org/2738933002, re-enabling swarming for Mac10.12, and this issue is considered fixed?
Mar 17 2017
I would say so: there's now capacity for os:Mac-10.12 gpu:none, and according to #7 it would also be OK to omit the gpu dimension entirely and allow those jobs to spawn on non-VMs as well.
Mar 20 2017
We should be careful not to spawn the Blink layout tests on the really expensive hardware for which there isn't enough capacity, like the AMD-based MacBook Pros. I agree at this point that we should probably migrate the GPU bots to a separate pool to avoid accidentally moving tests to the physical hardware when that wasn't intended. I asked a question on Issue 700053 and will work with the Labs and Swarming teams to make this transition.
Mar 27 2017
At this point non-GPU capacity has been spawned, meaning the Labs work has been done. I'm going to close this out. Looks like issue 700053 can be used for the discussion surrounding splitting out the more expensive hardware into its own GPU pool.
Comment 1 by dpranke@chromium.org, Mar 9 2017
Status: Available (was: Unconfirmed)