chromium.memory main waterfall doesn't have dcheck coverage |
||
Issue descriptionSee parent bug 627373. There were a number of failures that together led to a large outage. One of the failures was that a bad cl slipped through the CQ because there was a bug in gn ( bug 627667 ). While that'll get fixed, another bug is that once the cl landed there was no breakage on the main waterfall. This is because the asan bots didn't have asserts turned on. Normally for the other bots on the waterfall that run tests, we have both debug and release coverage. This is necessary so that if an assert is triggering in committed code and it's breaking the CQ, this should be visible on the main waterfall (which helps to quickly isolate which revision range was the culprit). We have two choices, and I leave it up to memory sheriffs to decide: -add debug bots on chromium.memory -turn on dchecks on the existing builders I suspect the latter should be sufficient for now. Assigning to today's memory sheriff.
,
Jul 13 2016
I'm ok with adding the dchecks. The only concerns are MSan and Dr.Memory bots, but we sure can make an experiment.
,
Jul 13 2016
I'm specifically talking about chromium.memory, which is on the main waterfall: https://build.chromium.org/p/chromium.memory/console I'm not talking about the memory FYI waterfall. The reason this is needed for the bots on the main waterfall is because they're matched on the CQ, but with asserts. So if the CQ hits asserts, but the main waterfall doesn't, then the CQ could be hosed while the main waterfall is green (i.e. parent bug). The reason this doesn't matter for FYI bots (i.e. dr memory/msan) is that they're not on the CQ.
,
Jul 13 2016
jam: Yes, we are on the same page here. I see https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_asan_rel_ng/builds/192761 as an example, and it has dcheck_always_on = true, and symbol_level = 1 as well. I agree the trybots should match the chromium.memory bots. So let's go with the trybot config. The chromium.memory Linux tester takes 25 minutes, of which 8 was for updates/extraction. The try job I pointed to took 38 minutes, of which 10 is for updates/compilation. Assuming all else is apples to apples, the test time will go from 17 minutes to 28? So overall increase of 25 minutes to 36. That's not too bad.
,
Jul 13 2016
There have been a few other cases I've seen in the GYP->GN migration where we've been bitten by having DCHECKs on trybots but no debug continuous builder, particularly in memory-related builders. In most of the cases I've seen thus far, I've turned off the DCHECKs on the trybots. It might be easier to just do that everywhere for now to be consistent, and then to consider enabling DCHECKs on both trybot and continuous bot if we think we have spare cycles to fix memory-related issues.
,
Jul 13 2016
I think symbol_level defaults to 1 when sanitizers are used, so just going to leave that alone.
,
Jul 13 2016
Not to nit, but I want to make one thing clear in case I was ambiguous: in general we try to make trybots match main waterfall (in both compile config and tests run). The exception is asserts-in-release, where we add it on trybots for pragmatic reasons. If we decide we don't want debug bots on chromium.memory, that's fine. But to be clear, we're making an exception for that specific waterfall only, and won't enable dchecks for any other waterfall's release builds (since those have debug builds running tests). As an aside, is there a reason to have the builder/tester split anymore now that all tests are swarmed? I picked a recent runs: -we're spending 12 minutes packaging the files on the builder. total runtime is 31 minutes -Linux ASan LSan Tests (1) spends 7 minutes unpacking. total runtime is 15 minutes -Linux ASan Tests (sandboxed) spends 7 minutes unpacking. total runtime is 30 minutes. The second tester can have similar runtime as the first by similarly sharding browser_tests 10x like it, instead of only 5x. Once that's done, it seems it would be faster to trigger the tests on the builder and wait for the swarming tasks rather than to pack the build!
,
Jul 13 2016
(my message crossed with Dirk) In reply to comment 5, removing asserts from asan trybots is something I didn't consider. That's a good idea, since the whole point of CQ is to mirror main waterfall. If the main waterfall doesn't need asserts on asan bots, then the CQ doesn't either. Your suggestion SGTM.
,
Jul 13 2016
I've split off the aside to combine builder/tester into bug 627993, let's discuss it there.
,
Jul 13 2016
I missed your comments and put up a CL to make the chromium.memory do dcheck_always_on: https://codereview.chromium.org/2146183002 - I don't think the potential slowdown is that bad.
,
Jul 13 2016
Either is fine with me (but I defer to you guys). Thanks!
,
Jul 14 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6b0cab359ccca1dc7c563cc42a77015afe9bfa34 commit 6b0cab359ccca1dc7c563cc42a77015afe9bfa34 Author: thestig <thestig@chromium.org> Date: Thu Jul 14 03:33:05 2016 Sync up chromium.memory builder configs with trybots. BUG= 627847 Review-Url: https://codereview.chromium.org/2146183002 Cr-Commit-Position: refs/heads/master@{#405426} [modify] https://crrev.com/6b0cab359ccca1dc7c563cc42a77015afe9bfa34/tools/mb/mb_config.pyl
,
Jul 14 2016
Do we still need a master restart?
,
Jul 14 2016
No. This should now be fixed.
,
Jul 14 2016
Thank you |
||
►
Sign in to add a comment |
||
Comment 1 by thestig@chromium.org
, Jul 13 2016