Turn on dcheck_always_on on the GPU waterfall bots |
||||||
Issue descriptionAfter Chromium's GPU bots were converted from the GPU recipe to the Chromium recipe, I learned that a difference between the tryservers and the waterfall bots was that the tryservers set dcheck_always_on=1, but the waterfall bots do not. The ostensible rationale for this is that the waterfall bots should "test what we ship". That is one point of view, but the counterpoint is that there is (1) a discrepancy between the behavior of the commit queue and waterfall bots; and (2) less rigorous testing coverage on the waterfall bots. For this reason, the GPU waterfall bots will be changed to enable dcheck_always_on. (I would recommend doing this for the rest of the Chromium waterfall bots, too.)
,
Apr 26 2016
The background of this is that: 1) waterfall tests both debug and release -we get assert coverage on debug bots -release don't need it, since debug covers it 2) trybots can't run debug builds because they're too slow, so they're release 3) we don't want changes to land from CQ but then break asserts on debug waterfall bots 4) the compromise is that trybots are release for speed, yet get asserts for coverage Can you expand on what problems you're seeing? The GPU waterfall bots have both debug and release. (to be clear, I disagree we need to change our release waterfall bots)
,
Apr 26 2016
I agree w/ jam@; we shouldn't change the configurations of the main (non-gpu) waterfall bots. I agree that the discrepancy between the CQ and the waterfall can be a problem (certainly for the layout tests), but the reasons John gives to me win out over the reasons you give. That said, I don't care that much about what the GPU bots run with.
,
Apr 26 2016
,
Apr 27 2016
The Release builds expose different race conditions than the Debug builds do. There have been test failures in the past (usually flaky ones) that occurred only on Release builds. Having asserts turned on on Release builds would help diagnosis of at least some of these failures. The tryserver mirroring in the tools/build workspace currently requires all of the builders associated with a tryserver to have the same gyp/gn flags. So for linux_chromium_rel_ng: https://code.google.com/p/chromium/codesearch#chromium/build/scripts/slave/recipe_modules/chromium_tests/trybots.py&q=trybots.py&sq=package:chromium&type=cs&l=265 it's not possible for "GPU Linux Builder" to specify dcheck_always_on=1 without doing the same on "Linux Builder". I'd appreciate it if some solution could be agreed upon -- either let dchecks be turned on on "Linux Builder" et al, or relax the constraints in the chromium_trybot recipe.
,
Apr 27 2016
For the GPU waterfall bots, it's up to GPU team (i.e. you) to pick the most convenient config. The main waterfall (non-gpu bots) should stay the same. If we add dchecks/dlogs there, we change the timing and we could miss bugs that happen in the official release-with-no asserts bots. It's crucial that the release bots on the main waterfall are as close to official as possible, especially since we unfortunately run few tests on official bots (in particular on Windows).
,
Apr 27 2016
,
Apr 27 2016
For this part: > I'd appreciate it if some solution could be agreed upon -- either let > dchecks be turned on on "Linux Builder" et al, or relax the constraints > in the chromium_trybot recipe. We should be able to work out something, particularly since the actual list of GYP_DEFINES/gn args is determined by MB and is usually different anyway. I'm not quite sure what the right thing to do here is but we'll figure out something.
,
Apr 27 2016
Right now we no longer run webgl2 conformance tests on debug configuration because they take more than one hour. Also, some GPUs are on the GPU FYI waterfall but not on CQ. This means if a CL would have triggered a DCHECK on these specific GPUs, we won't be able to catch it at all.
,
Apr 27 2016
> -release don't need it, since debug covers it Except when it doesn't. What harm does it cause to have us verify dchecks in release? Surely us crashing in release bots on the waterfall is always a good signal to receive?
,
Apr 27 2016
@danakj: I think jam@ answered your question in comment #6.
,
Apr 27 2016
Ah thanks I missed that. Arguments about timing seem a bit weird in either direction. Debug vs release has different timing, but races are races. Our bots in release will certainly have different timing than a user's machine as they are much more busy. I'm more worried about tests that are only run in release builds. But I guess GPU is a bit of an exception here, and it sounds like we'll turn on DCHECKs for those bots then?
,
Apr 27 2016
danakj wrote: > I'm more worried about tests that are only run in release builds. > But I guess GPU is a bit of an exception here, and it sounds like > we'll turn on DCHECKs for those bots then? Correct. We should stop arguing about this as we're not actually blocked on anything but me implementing the change that will make this go through. Reassigning to myself accordingly.
,
Apr 27 2016
Awesome, thanks :)
,
Apr 28 2016
I wasn't able to get to this today ... hopefully tomorrow, though.
,
Apr 28 2016
Actually, I think this is perhaps simple, which is to say, we should just not add the dcheck config in the recipes, and simply enable dcheck on the bots in //tools/mb/mb_config.pyl This will mean that the GYP_DEFINES in the expectations (and in runhooks) won't actually match the real GYP_DEFINES, but that's probably okay; now that all of the bots are using MB we should probably start cleaning up all of these things that aren't needed build-side. So, I'll post a src-side CL for this, if people think that's okay.
,
Apr 28 2016
Fix posted: https://codereview.chromium.org/1933533002/ .
,
Apr 30 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/fe42ebe1aa800787216d69be07b52411076592dd commit fe42ebe1aa800787216d69be07b52411076592dd Author: dpranke <dpranke@chromium.org> Date: Sat Apr 30 01:10:13 2016 Enable dcheck on the GPU waterfall bots. Since we can't afford to run the debug versions of the tests on the GPU waterfall bots, we want to run release builds w/ dchecks enabled. Note that we enable dchecks here, but the dcheck chromium config is *not* applied on the build-side recipes. We do that intentionally so that a given trybot can merge the configs from both the main waterfall bots and the gpu bots (which otherwise would have conflicting configs), but this may be confusing. The right fix here is to do away with as many recipe-side configs that set GYP_DEFINES as possible, since the real settings are governed by MB, src-side. R=kbr@chromium.org, phajdan.jr@chromium.org BUG= 606700 Review-Url: https://codereview.chromium.org/1933533002 Cr-Commit-Position: refs/heads/master@{#390828} [modify] https://crrev.com/fe42ebe1aa800787216d69be07b52411076592dd/tools/mb/mb_config.pyl
,
May 2 2016
Thanks, Dirk, this solution sounds fine. Really appreciate your help. Looking at the most recent build here: https://build.chromium.org/p/chromium.gpu/builders/GPU%20Win%20Builder/builds/43744 The "generate_build_files" step indicates that it's setting the dcheck_always_on=1 GYP_DEFINE, which is presumably picked up while running gyp_chromium. Marking verified. Thanks again. |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by kbr@chromium.org
, Apr 26 2016Owner: phajdan.jr@chromium.org