New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 618066 link

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 555273
Owner:
Closed: Jul 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug



Sign in to add a comment

Regression: GN bots building all don't work with analyze

Project Member Reported by sky@chromium.org, Jun 7 2016

Issue description

Here's the patch: https://codereview.chromium.org/2042423002/ . This copies a file. The file isn't referenced in any files. The win_clang thinks it needs to compile all:

{
  "compile_targets": [
    "all"
  ], 
  "status": "Found dependency (all)", 
  "test_targets": []
}
 
Blocking: 617837
Cc: thakis@chromium.org brettw@chromium.org brucedaw...@chromium.org
Components: -Infra Build
Labels: -Pri-3 Proj-GN-Migration OS-Windows Pri-1
Owner: thakis@chromium.org
Summary: win_clang shouldn't be building 'all' (was: Analyze returned targets for file not compiled)
GN/MB doesn't know how to deal with "all", so it doesn't try. See  bug 555273 .

win_clang is mirroring the 'WinClang64 (dbg)' bot on chromium.win

While debug component builds (with minimal symbols) should be substantially faster than release static builds, this still might be a place where we can free up some capacity in the cq_pool of bots.

Currently none of other CQ bots on Windows build all, or even gn_all. If we have extra capacity, IMO we should be using it for the MSVC builds instead of the clang builds (unless clang is maybe substantially faster, but I don't think it is?).


At the very least, while we have capacity issues in the cq_pool on tryserver.chromium.win , we should change that bot to build 'gn_all' instead.

So, I'm going to change the subject of this bug from "Analyze returned targets for file not compile" to "win_clang shouldn't be building 'all'". Otherwise, this bug is just a dup of 555273.

@brucedawson, you can also decide if fixing this should be done before we flip the main bots back to GN.
What's wrong with win_clang building all? It's faster than it needs to be, and having a bot that checks that everything builds seems useful.
thakis@, let me know if comment #1 doesn't answer your question?
It doesn't. I added enough slaves to the pool to pay for win_clang. Building all is useful.
I'm not debating whether building all is useful; I agree that it is. 

And, I agree that we should ideally have the capacity to build 'all' for whatever configs we like as long as the cycle time stays within acceptable limits.

I'm saying that if we only have the capacity to build all for one set of bots, IMO we should be building all w/ msvs rather than clang, unless for some reason clang is substantially faster and thus we can afford clang but not msvs. Do you disagree with that?


Status: WontFix (was: Assigned)
Yes. The MSVC bots need to run tests. win_clang currently doesn't, so it can afford this.

(And adding yet another bot that's msvc compile-only doesn't help with capacity issues. Also, I doubt an all-building msvc bot would find a very different set of issues as a clang-cl all-building bot; if something's broken in all-only it's usually "this doesn't work at all" stuff.)
Status: Assigned (was: WontFix)
I guess you could argue that win_clang gets you x64 coverage, while win_chromium_compile_dbg_ng gets you x86 coverage, and that arguably 'all' is more important to test on x64 than x86. Given my druthers, then, we'd build x64 w/ msvs and x86 w/ clang, but that's quibbling at that point.

However, it's not okay to be building 'all' in the CQ as long as we have capacity problems, period, until  bug 555273  is fixed. And that bug isn't easy to fix, whereas switching to 'gn_all' is trivial and it's a bug that 'all' would build anything that we'd care about (or anything at all, really) that 'gn_all' doesn't build.

AFAIK we don't currently have capacity problems while win_chromium_rel_ng is on GYP, but if we get the cycle time fixed well enough w/ GN but still have capacity issues, I would argue that we will still need to change this.
Status: WontFix (was: Assigned)
We can reconsider then. I too think we currently don't have capacity issues.

The idea to have all slaves in one pool is to smooth out spikes afaik. If someone regresses one bot's cycle time and then tells other bot owners that their bots need to do less, that's in my understanding not what the shared pool is for.

Comment 9 by jam@chromium.org, Jun 9 2016

Cc: jam@chromium.org
Owner: dpranke@chromium.org
Status: Assigned (was: WontFix)
Summary: Regression: GN bots building all don't work with analyze (was: win_clang shouldn't be building 'all')
555273 has more than this specific bug, so I'm reopening it.

https://codereview.chromium.org/2049973002/ is my sample change that didn't touch any source.

win_clang, linux_chromium_clobber_rel_ng, and linux_chromium_rel_ng are all building unnecessarily. Note the capacity issue is only secondary. The main point about having analyze work correctly is to severely discourage folks from sometimes skipping the CQ for cls that shouldn't be running compiles or tests. We have gotten broken builds from these kind of non-source changes landing without CQ coverage.

It has been a lot of work to get analyze working for all recipes, so IMO this is a pretty big regression that bots on the CQ building all with gn don't work.

Comment 10 by jam@chromium.org, Jun 9 2016

Labels: -Pri-1 Pri-0
Bumping up to P0 since analyze is broken for linux_chromium_rel_ng. We are running all tests and compiling all targets for every change. This is a major regression.
Labels: -Pri-0 Pri-1
Status: Started (was: Assigned)
I've posted https://codereview.chromium.org/2055853002/ which flips all of the builders that have matching GN trybots in the CQ that build "all" to build "gn_all" instead. I'll let y'all decide whether that should be landed.


That said, I don't think this should be a P0, and I'm lowering the priority.

The whole point of an SLO is that if you are meeting it, then things are okay. As far as I know, we are currently meeting the SLO, and I have no particular evidence that people are skipping the CQ because things are too slow and the world is breaking as a result.

We've also been building 'all' w/o analyze on Linux for half a year.

Obviously, it would be good to make analyze work with all for GN. It would also be good to make the SLO more aggressive if we can, because faster cycle times are good. 

But, what the priority of those things should be versus everything else we could possibly work on is a different discussion than whether this is a P0 thing that should cause everything else to be dropped.

I'll also work on the idea sky@ suggested in https://bugs.chromium.org/p/chromium/issues/detail?id=555273#c5 to see if we can do better in the 'all' case (or if I can just fix this outright), but I have other broken things on my plate as well that seem to me to be higher priority.

Comment 12 by jam@chromium.org, Jun 9 2016

Labels: -Pri-1 Pri-0
Let's discuss this in person when you get in.

The SLO is not the only thing that matters. i.e. analyze's benefits are not just related to lowering CQ times. Running less tests for changes that don't impact a test also decreases observed flakiness.

Again, getting analyze working correctly (even if it didn't impact SLO) for all bots was a lot of work. Regressing that is undoing a lot of work.
We should fix this but it seems a 6-month-old bug is pretty much not P0, and I don't think needs to block the work we're doing this week.

Comment 14 by jam@chromium.org, Jun 9 2016

I agree doesn't have to be solve by tomorrow, but I put it as P0 because I think it's a big regression and it should be fixed asap. From Dirk's comment on a code review, he says this could be fixed in a day.

AFAIK Windows's reverted GN conversion was also building all, so this bug would have an even bigger effect there because of slower building/linking/tests.
> AFAIK Windows's reverted GN conversion was also building all, 
> so this bug would have an even bigger effect there because of
> slower building/linking/tests.

As we discussed in person, the CQ bots, apart from win_clang, were not building 'all'. win_clang was (and continues to) build all, but is a compile-only bot that is within the SLO, as described above.

Comment 16 by jam@chromium.org, Jun 9 2016

Owner: brettw@chromium.org
Status: Assigned (was: Started)
Brett said he'll take a look.
Blocking: 432967
Labels: -Pri-0 Pri-1
Blocking: -432967 621726
Blocking: -617837 -621726
As suggested by laforge@ and a conversation w/ the monorail folks, I'm going to try tracking GN-Migration related issues by *just* using the Proj-GN-Migration label, and not using blocking/rollup bugs, so that we can use blocking for just tasks that truly need to be completed before other tasks can make progress.
Owner: dpranke@chromium.org
Mergedinto: 555273
Status: Duplicate (was: Assigned)
I'm actually going to close this as a dup of 555273.

Sign in to add a comment