New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 606700 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: May 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Feature

Blocked on:
issue 542370



Sign in to add a comment

Turn on dcheck_always_on on the GPU waterfall bots

Project Member Reported by kbr@chromium.org, Apr 26 2016

Issue description

After Chromium's GPU bots were converted from the GPU recipe to the Chromium recipe, I learned that a difference between the tryservers and the waterfall bots was that the tryservers set dcheck_always_on=1, but the waterfall bots do not.

The ostensible rationale for this is that the waterfall bots should "test what we ship".

That is one point of view, but the counterpoint is that there is (1) a discrepancy between the behavior of the commit queue and waterfall bots; and (2) less rigorous testing coverage on the waterfall bots. For this reason, the GPU waterfall bots will be changed to enable dcheck_always_on. (I would recommend doing this for the rest of the Chromium waterfall bots, too.)

 

Comment 1 by kbr@chromium.org, Apr 26 2016

Labels: -Type-Bug -Pri-2 Infra-CommitQueue Pri-1 Type-Feature
Owner: phajdan.jr@chromium.org
I attempted to do this but changing the chromium_apply_config lists for the builders in chromium_gpu.py and chromium_gpu_fyi.py in tools/build/scripts/slave/recipe_modules/chromium_tests/ require the regular Chromium builders to be changed as well.

Are there any disagreements with this policy change? In my opinion this will only strengthen test coverage.

Pawel, could you please take this and push this policy change through?

Comment 2 by jam@chromium.org, Apr 26 2016

The background of this is that:
1) waterfall tests both debug and release
  -we get assert coverage on debug bots
  -release don't need it, since debug covers it
2) trybots can't run debug builds because they're too slow, so they're release
3) we don't want changes to land from CQ but then break asserts on debug waterfall bots
4) the compromise is that trybots are release for speed, yet get asserts for coverage

Can you expand on what problems you're seeing? The GPU waterfall bots have both debug and release.


(to be clear, I disagree we need to change our release waterfall bots)
I agree w/ jam@; we shouldn't change the configurations of the main (non-gpu) waterfall bots. 

I agree that the discrepancy between the CQ and the waterfall can be a problem (certainly for the layout tests), but the reasons John gives to me win out over the reasons you give.

That said, I don't care that much about what the GPU bots run with.

Comment 4 by aga...@chromium.org, Apr 26 2016

Components: Infra>CQ
Labels: -Infra-CommitQueue

Comment 5 by kbr@chromium.org, Apr 27 2016

The Release builds expose different race conditions than the Debug builds do. There have been test failures in the past (usually flaky ones) that occurred only on Release builds. Having asserts turned on on Release builds would help diagnosis of at least some of these failures.

The tryserver mirroring in the tools/build workspace currently requires all of the builders associated with a tryserver to have the same gyp/gn flags. So for linux_chromium_rel_ng:

https://code.google.com/p/chromium/codesearch#chromium/build/scripts/slave/recipe_modules/chromium_tests/trybots.py&q=trybots.py&sq=package:chromium&type=cs&l=265

it's not possible for "GPU Linux Builder" to specify dcheck_always_on=1 without doing the same on "Linux Builder".

I'd appreciate it if some solution could be agreed upon -- either let dchecks be turned on on "Linux Builder" et al, or relax the constraints in the chromium_trybot recipe.

Comment 6 by jam@google.com, Apr 27 2016

For the GPU waterfall bots, it's up to GPU team (i.e. you) to pick the most convenient config.

The main waterfall (non-gpu bots) should stay the same. If we add dchecks/dlogs there, we change the timing and we could miss bugs that happen in the official release-with-no asserts bots. It's crucial that the release bots on the main waterfall are as close to official as possible, especially since we unfortunately run few tests on official bots (in particular on Windows).
For this part:
> I'd appreciate it if some solution could be agreed upon -- either let
> dchecks be turned on on "Linux Builder" et al, or relax the constraints 
> in the chromium_trybot recipe.

We should be able to work out something, particularly since the actual list of GYP_DEFINES/gn args is determined by MB and is usually different anyway. I'm not quite sure what the right thing to do here is but we'll figure out something.

Comment 9 by zmo@chromium.org, Apr 27 2016

Right now we no longer run webgl2 conformance tests on debug configuration because they take more than one hour. Also, some GPUs are on the GPU FYI waterfall but not on CQ. This means if a CL would have triggered a DCHECK on these specific GPUs, we won't be able to catch it at all.
Cc: danakj@chromium.org
> -release don't need it, since debug covers it

Except when it doesn't. What harm does it cause to have us verify dchecks in release? Surely us crashing in release bots on the waterfall is always a good signal to receive?
@danakj: I think jam@ answered your question in comment #6.
Ah thanks I missed that.

Arguments about timing seem a bit weird in either direction. Debug vs release has different timing, but races are races. Our bots in release will certainly have different timing than a user's machine as they are much more busy.

I'm more worried about tests that are only run in release builds. But I guess GPU is a bit of an exception here, and it sounds like we'll turn on DCHECKs for those bots then?
Owner: dpranke@chromium.org
danakj wrote:
> I'm more worried about tests that are only run in release builds. 
> But I guess GPU is a bit of an exception here, and it sounds like 
> we'll turn on DCHECKs for those bots then?

Correct. We should stop arguing about this as we're not actually blocked on anything but me implementing the change that will make this go through.

Reassigning to myself accordingly.
Awesome, thanks :)
I wasn't able to get to this today ... hopefully tomorrow, though.
Actually, I think this is perhaps simple, which is to say, we should just not add the dcheck config in the recipes, and simply enable dcheck on the bots in //tools/mb/mb_config.pyl

This will mean that the GYP_DEFINES in the expectations (and in runhooks) won't actually match the real GYP_DEFINES, but that's probably okay; now that all of the bots are using MB we should probably start cleaning up all of these things that aren't needed build-side.

So, I'll post a src-side CL for this, if people think that's okay.
Components: Build
Status: Started (was: Assigned)
Fix posted: https://codereview.chromium.org/1933533002/ .
Project Member

Comment 18 by bugdroid1@chromium.org, Apr 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fe42ebe1aa800787216d69be07b52411076592dd

commit fe42ebe1aa800787216d69be07b52411076592dd
Author: dpranke <dpranke@chromium.org>
Date: Sat Apr 30 01:10:13 2016

Enable dcheck on the GPU waterfall bots.

Since we can't afford to run the debug versions of the tests
on the GPU waterfall bots, we want to run release builds w/
dchecks enabled.

Note that we enable dchecks here, but the dcheck chromium config is
*not* applied on the build-side recipes. We do that intentionally
so that a given trybot can merge the configs from both the main
waterfall bots and the gpu bots (which otherwise would have conflicting
configs), but this may be confusing. The right fix here is to do
away with as many recipe-side configs that set GYP_DEFINES as possible,
since the real settings are governed by MB, src-side.

R=kbr@chromium.org, phajdan.jr@chromium.org
BUG= 606700 

Review-Url: https://codereview.chromium.org/1933533002
Cr-Commit-Position: refs/heads/master@{#390828}

[modify] https://crrev.com/fe42ebe1aa800787216d69be07b52411076592dd/tools/mb/mb_config.pyl

Comment 19 by kbr@chromium.org, May 2 2016

Status: Verified (was: Started)
Thanks, Dirk, this solution sounds fine. Really appreciate your help.

Looking at the most recent build here:
https://build.chromium.org/p/chromium.gpu/builders/GPU%20Win%20Builder/builds/43744

The "generate_build_files" step indicates that it's setting the dcheck_always_on=1 GYP_DEFINE, which is presumably picked up while running gyp_chromium.

Marking verified. Thanks again.

Sign in to add a comment