buildbot slaves have enough disk space to run linux_deterministic builder
Issue description: To run a builder that confirms build determinism, it needs to keep the results of two clobber Chromium builds so it can compare them. To deploy the linux_deterministic builder, we need to make sure the slaves have enough disk space for it.
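The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the actual recipe used by the deterministic builder: the function names `hash_tree` and `diff_builds` are hypothetical, and it assumes that "deterministic" means every file in the two output trees is byte-for-byte identical.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch,
            # a real tool would hash large binaries in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(root).as_posix()] = digest
    return digests


def diff_builds(first_dir, second_dir):
    """Return sorted relative paths that are missing from one tree or differ."""
    first = hash_tree(first_dir)
    second = hash_tree(second_dir)
    all_paths = first.keys() | second.keys()
    return sorted(p for p in all_paths if first.get(p) != second.get(p))
```

An empty list from `diff_builds` would mean the two builds match. Both output trees have to be on disk at the same time for this check, which is why the builder needs roughly twice the disk space of a single build.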
,
Jan 20 2017
@estaab, sergeyberezin - how easy is it to figure out if we can afford an additional linux release build on the main linux cq pool?
,
Jan 21 2017
As easy as this: https://goto.google.com/ocnrq Looks like *on average* machines in this pool use 75% of their disk space. They are 500GB disks, so we only have 125GB left, which I'd be nervous to use up for anything else - we need a safety buffer. I'd go for a separate pool for linux_deterministic, just to be safe. But keep in mind that tryserver.chromium.linux is already running ~600 slaves, and historically it's dangerously close to the master's breaking point. I'd look into enabling logdog-only mode for this master before we load it up with more busy slaves.
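The arithmetic in this comment (500GB disks at 75% average usage leave 125GB) can be written as a small check. This is only a sketch: the extra build size and the safety buffer passed to `fits` are hypothetical inputs, not figures from this thread, and `measured_used_fraction` simply reads local disk usage with the standard library.

```python
import shutil


def free_gb(disk_gb, used_fraction):
    """Free space in GB for a disk of disk_gb that is used_fraction full."""
    return disk_gb * (1.0 - used_fraction)


def fits(disk_gb, used_fraction, extra_gb, buffer_gb):
    """True if extra_gb of new data still leaves at least buffer_gb free."""
    return free_gb(disk_gb, used_fraction) - extra_gb >= buffer_gb


def measured_used_fraction(path="."):
    """Fraction of the filesystem holding `path` that is currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total
```

With the numbers from the comment, `free_gb(500, 0.75)` gives 125.0. Whether an extra release build fits then depends on the build size and the buffer one is willing to accept, both of which would have to be measured.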
,
Jan 23 2017
I think it is fine to wait until logdog is ready. After that, how can I get a separate pool of builders?
,
Jan 23 2017
To carve out a separate pool, just update the slaves-to-builders assignments in the master's config: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.tryserver.chromium.linux/slaves.cfg#11 and request to restart the master at http://go/bugatrooper .
,
Jan 24 2017
Actually, I think we're well under the master's capacity at the moment, as we're only running ~300 concurrent builds at peak the past few days. But, I agree that removing the logging will hopefully give us a fair amount of headroom on the master, too. I'm guessing that we'll probably need 60-100 machines in the pool to handle the load, given that we'll be doing two full compiles per build. We can take those from the existing 600. @sergeyberezin - if the new pool is only serving one builder, we should trivially have enough disk space per builder, right? Do you know if we have any rough heuristics of disk space needed per builder (possibly per different config variants like debug/release, regular/official, linux/android, etc.), or how we might get such numbers?
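One way to sanity-check a pool-size guess like the 60-100 machines above is a back-of-the-envelope capacity estimate. The sketch below is not how that figure was derived in this thread; the function `machines_needed` and all of its inputs (build arrival rate, compile time, target utilization) are hypothetical and would need to be replaced with measured values.

```python
import math


def machines_needed(builds_per_hour, minutes_per_compile,
                    compiles_per_build=2, target_utilization=0.75):
    """Estimate pool size from load, leaving headroom via target_utilization."""
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in (0, 1]")
    # Machine-hours of work arriving per hour of wall-clock time.
    busy_machines = builds_per_hour * compiles_per_build * minutes_per_compile / 60.0
    return math.ceil(busy_machines / target_utilization)
```

For example, with made-up inputs of 30 builds per hour and 60 minutes per compile, `machines_needed(30, 60)` returns 80. Doubling the compiles per build doubles the machine count, which is the effect the comment alludes to.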
,
Jan 24 2017
Just so I'm caught up, we're going to compare two builds for all chromium CLs and have failures block commits? How often do we expect this new builder to catch bad CLs? I notice https://build.chromium.org/p/chromium.fyi/console?category=deterministic is red for the past 200 builds each, do we need that to be green first? I just want to make sure we're getting good value out of doing this. :)
,
Jan 24 2017
Re: #6 Are you suggesting that I get a certain number of bots from Infra Labs? Re: #8 Let me focus on the Linux deterministic builder, because Mac and Windows have known issues. Once it becomes stable, I expect it to catch a bad CL less than once a month. https://goto.google.com/zfzjz The build was deterministic for two months between 2016-10-11 and 2017-01-12. Yes, making it green should be done before making it run in the CQ. I have already filed a bug for the non-deterministic build on Linux. https://bugs.chromium.org/p/chromium/issues/detail?id=678903
,
Jan 24 2017
Yes, we expect the builder to catch CLs that break determinism and it has in the past. And yes, it looks like the Linux builder is currently broken and we'd need to fix that. As to how often we catch failures and whether that's enough to put it in the CQ, that's a good question and something we need to come up with real guidelines for. We may end up wanting to move the waterfall builder from the fyi waterfall to the main waterfall but *not* put it on the CQ. Re: comment #9 - I was suggesting that we just take some of the machines from the existing linux_cq pool out, since there are more machines in that pool than we currently need. However, Erik is raising a good point that maybe we shouldn't do anything in the CQ at all at the moment. I'll think more about this and update this bug again tomorrow.
,
Jan 24 2017
Deterministic build regressions are relatively rare, but on the other hand it's annoying for devs to have only post-commit checks, which force an (otherwise) unnecessary revert.
,
Jan 24 2017
Yes, it is. On the other hand, every builder we have on the CQ imposes both hardware and operational costs, so we need to figure out the right balance here (as I was saying in paragraph #3 of comment #10).
,
Jan 25 2017
Yes, if this costs us frequent annoyance through false rejections and CQ cycle time for a relatively rare annoyance through an occasional rollback I don't think it's worth it. Let's figure out the balance. And we should definitely start with a sheriffed waterfall builder so we don't have a red CQ and green tree.
,
Feb 3 2017
FYI, the non-deterministic build on the Linux release builder has been fixed. https://uberchromegw.corp.google.com/i/chromium.fyi/builders/Linux%20deterministic If I understand dpranke's and estaab's suggestions correctly, you mean: 1. integrate the deterministic builder into continuous integration, and do not make it a presubmit check (maruel might have a different opinion?) 2. I can borrow some buildbot slaves from linux_cq? 3. use a different pool. Is my understanding correct?
,
Feb 3 2017
1. +1 to defining a waterfall builder first (in fact, our tryserver builders are mirrors of the waterfall builders, so there is really no other way) 2. linux_cq (tryserver) slaves live on a different network from the waterfall. So you'd need to request a new slave for the waterfall. 3. If / when we get to add a tryserver builder, we'll likely need a separate pool due to disk space constraints. Waterfall already has separate slaves (pools of size 1) per each builder.
,
Feb 6 2017
Re: #15 Please correct me if I misunderstand. 1. might mean adding a new builder to the tryserver, right? For 2. and 3., could you advise me on how to calculate how many builders are enough for this? I am going to use the same recipe that runs https://uberchromegw.corp.google.com/i/chromium.fyi/builders/Linux%20deterministic The builder name will be: https://chromium.googlesource.com/chromium/tools/build.git/+/master/scripts/slave/recipes/swarming/deterministic_build.py#90
,
Feb 7 2017
I think you can just move your existing "Linux Deterministic" builder from chromium.fyi to chromium.linux. You already have the "linux_deterministic_rel" optional tryserver, so I don't think anything needs to change for that (apart from updating the entry in trybots.py when you move the other builder). Does that make sense?
,
Feb 7 2017
I think so. However, I am also an owner of the issue https://bugs.chromium.org/p/chromium/issues/detail?id=644641. I do not want to add a new precise builder to chromium.linux. Let me ask for yet another buildbot slave with trusty: https://bugs.chromium.org/p/chromium/issues/detail?id=689380
,
Feb 10 2017
Now the buildbot slave has been converted to trusty. I have updated the builder name in https://chromium-review.googlesource.com/c/416511. Will you review this?
,
Feb 12 2018
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Feb 12 2018
Is this issue actually resolved? (it appears so from the comments). Can we close it?
,
Feb 21 2018
Please reopen if needed.
Comment 1 by yyanagisawa@chromium.org
, Jan 19 2017