check_gn_headers sometimes passes on CQ but fails on waterfall
Issue description:

https://chromium-review.googlesource.com/c/536293 passed the trybots with no problems but broke the waterfall. check_gn_headers looks like a fairly simple step to run on the bots, so we should just do that.

See https://findit-for-me.appspot.com/waterfall/build-failure?url=https://build.chromium.org/p/chromium.linux/builders/Linux%20Builder%20%28dbg%29/builds/112249
Jun 15 2017
Jun 15 2017
check_gn_headers was run in the CQ for that CL: https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng/builds/479142

However, the check was effectively skipped because step "analyze" skipped compile target "all":

  Running ['/usr/bin/python', '/b/c/b/linux/src/build/check_gn_headers.py', '--out-dir', u'/b/c/b/linux/src/out/Release', '--whitelist', '/b/c/b/linux/src/build/check_gn_headers_whitelist.txt', '--json', '/tmp/tmpYJwHl1', '--verbose'] in '/b/c/b/linux/src' (env: None)
  OUT_DIR looks dirty. You need to build all there.

https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.linux%2Flinux_chromium_rel_ng%2F479142%2F%2B%2Frecipes%2Fsteps%2Fanalyze%2F0%2Fstdout

Having "all" in additional_compile_targets doesn't mean it will always compile all, and making it do so looks a bit wasteful. Issue 112371 discussed having a presubmit check for missing headers, and I think it could be a good complement to check_gn_headers. WDYT?
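For reference, here is roughly how the check can be reproduced locally. This is a minimal sketch based only on the flags visible in the log above (paths adjusted to a local checkout, and the JSON output path is a placeholder), not the actual recipe code:

  # Minimal local reproduction of the bot step, based only on the flags
  # visible in the log above. Assumes it is run from the root of a src
  # checkout, and that 'all' is built first so the output directory is
  # not considered dirty.
  import subprocess

  OUT_DIR = 'out/Release'  # assumed local output directory
  subprocess.check_call(['ninja', '-C', OUT_DIR, 'all'])
  subprocess.check_call([
      'python', 'build/check_gn_headers.py',
      '--out-dir', OUT_DIR,
      '--whitelist', 'build/check_gn_headers_whitelist.txt',
      '--json', '/tmp/check_gn_headers.json',  # placeholder output path
      '--verbose',
  ])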
Jun 15 2017
All the cases where check_gn_headers passed on CQ but failed on waterfall since 2017-06-09:
https://codereview.chromium.org/2898743002
https://codereview.chromium.org/2926453002
https://chromium-review.googlesource.com/c/536293
https://codereview.chromium.org/2930263002
Jun 15 2017
The false negative rate of check_gn_headers in CQ is higher than ideal; right now it's ~4/week. We should lower it to no more than 1/week, otherwise it's a bit disruptive.
Jun 15 2017
Jun 15 2017
Issue 732043 has been merged into this issue.
Jun 15 2017
Jun 15 2017
Another patch where this happened: https://codereview.chromium.org/2930263002/
Jun 15 2017
The main root causes of the false negatives:

1) gn analyze wrongly skips some targets due to issue 661774. This won't get worse now that we have check_gn_headers, and there's a continuing effort to clean it up.

2) gn analyze skips "all" if no .gn or .gni files are changed. I misunderstood how "all" works when planning this: I thought that if any compilation is needed, additional_compile_targets would be added. That's not the case, so the false negative rate is higher than expected.
Jun 15 2017
Of the 4 false negatives in the past week, 3 are due to the handling of "all".

"No dependency":
https://codereview.chromium.org/2898743002

gn analyze skips "all":
https://codereview.chromium.org/2926453002/#ps180001
https://chromium-review.googlesource.com/c/536293
https://codereview.chromium.org/2930263002

For this particular purpose, it might make sense to special-case compile-only bots so that they compile "all" whenever any compilation is needed, regardless of whether build files are touched. This way the extra work is limited to compilation, not tests, and we get higher CQ coverage even in the presence of issue 661774, where compilation could be wrongly skipped. A rough sketch of the decision logic is below. Granted, this may introduce issues I'm not aware of. Would it work?
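To make the proposal concrete, here is a minimal sketch of how a compile-only bot's recipe could decide what to compile. The function name, arguments, and status strings are hypothetical illustrations, not the actual recipe code:

  # Hypothetical illustration of the special-casing proposed above; the
  # real recipe code and analyze output format may differ.
  def targets_to_compile(analyze_status, affected_targets, is_compile_only_bot):
      """Returns the list of compile targets for this try run."""
      if analyze_status == 'No dependency':
          # Nothing in the patch affects any compile target.
          return []
      if is_compile_only_bot:
          # On compile-only bots, build 'all' whenever any compilation is
          # needed, so check_gn_headers sees a complete build and analyze
          # mistakes (issue 661774) can't hide targets.
          return ['all']
      # Normal bots keep the usual analyze behavior: build only what the
      # patch affects.
      return affected_targets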
Jun 15 2017
Step check_gn_headers is causing more harm than good due to these false negatives. If it cannot be improved within days, it should be disabled. Bumped up priority.
Jun 15 2017
The purpose of analyze is to try and make sure any failures you see are limited to your patch, rather than something that might be broken in the underlying tree. Skipping analyze for bots that build `all` would thus be confusing and/or counterproductive. I would rather have analyze working and not do the header check, or explore if we can do the header check some other way.
Jun 15 2017
Oh, I thought the analyze step was just there to decrease resource usage. Thanks for the info. Isolation sounds useful when ToT is broken, especially for tests, but probably not as much for compilation, since compilation errors usually close the tree. That's why I suggested doing this exclusively for compile-only bots, not the normal bots: analyze works as usual for tests, and compilation coverage is higher. Does this sound better?
Jun 16 2017
No, I'm afraid that doesn't help. analyze is particularly important if compilation is broken at tip-of-tree: we don't want the try job to be marked as failed and block the patch from landing if the patch isn't at fault. It is true that we don't want to *commit* the change if compilation is broken at tip-of-tree, but we have other code in the commit queue that enforces that (by checking for tree closure), so the two issues can be kept separate.

I think we probably have to give up on the ninja-based approach you've taken to checking GN headers. After thinking about this more, and talking to Brett to double-check my thinking, I don't think we can make this work by construction. If we want analyze to run on the trybots, and that means we might skip compilation, then the build will never be up to date (i.e., if analyze runs correctly we'll almost never actually build "all"), and your script won't be able to work correctly.

That would mean we could only actually run your script on the waterfall builders. However, we try really hard not to have a test that can only run on the waterfall builders and not the trybots, since there's no way to catch a failure in that test in the CQ. So, in order to keep the test running on just the waterfall builders, the value it adds would have to offset the trouble it causes people when their patches are reverted, and I don't think this check as implemented meets that bar.

I think we've talked in the past about whether we should just grep for the headers added (or removed) in the patch in the BUILD files instead. This is a much coarser check and not guaranteed to be correct, but maybe it's good enough?

I can think of a few other options. One is that we keep this approach, only run the test on the waterfall bots, and decide that this is the right tradeoff (i.e., Brett and I get convinced that we're wrong or are otherwise outvoted). I think in order to do this we'd have to be convinced that a grep-based approach wouldn't be good enough to catch nearly all of the issues.

Another is that we run the check as a step on the waterfall, but change how we handle failures. For example, we could run it on FYI bots and have it send out emails or file bugs or something, instead of treating it as a failure the sheriff has to deal with.

A third is that we somehow change how the trybots run so that we *can* compile all and run the check, but if the compilation fails for a reason that appears unrelated to the patch, and/or analyze would have skipped the compile, we don't fail the tryjob. I'm not sure how we'd even implement this, and it would add some amount of load to the system and probably be kind of complex.

What do you think? I'm sorry I didn't think through these implications earlier, and I think the work you've done is still useful even if we can't figure out a way to make it run automatically.
Jun 16 2017
argh .. hit send too early ... however, I think we need to disable this test step for now while we work on a new plan.
Jun 16 2017
I agree it's better to disable this step for now. Thanks for going through all these implications and bearing with me on these experiments!

Running the ninja-based check synchronously is indeed difficult. The main downside of a grep-based check in presubmit is the possibility of false positives, so it should only be a warning that doesn't block uploading/landing. In terms of accuracy, the false negative rate should be low enough to block most regressions. I'd prefer to have a ground truth, but a heuristic is better than nothing. A rough sketch of what such a presubmit check could look like is below.

I like the idea of running on FYI bots. It can be added to CQ_INCLUDE_TRYBOTS when the whitelist file is touched. Since the whitelist is also in the analyze exclusions, that bot would always build all, so we don't have to complicate how the system works.

When making a CL that specifically fixes missing headers, running the ninja-based check against the whitelist has been quite helpful in my experience. More often than not it caught errors when I thought I had already fixed them. This can be run locally, but for external folks, building all is a very time-consuming process.

Action items:
- Stop check_gn_headers on CQ and waterfall (me, today)
- Add a grep-based check in PRESUBMIT (me, tracked in issue 112371)
- Add check_gn_headers to FYI bots. We don't have a build-all FYI bot readily available, do we? (me, but need some help)
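To make the grep-based idea concrete, here is a minimal sketch of such a presubmit warning, assuming the standard depot_tools PRESUBMIT entry points. The heuristic shown (only look for the header's basename in the sibling BUILD.gn) is a deliberate simplification, not the check that eventually landed:

  # Hypothetical sketch of a grep-based presubmit check: warn when a newly
  # added .h file is not mentioned in the BUILD.gn next to it. This is a
  # heuristic, not ground truth, so it only warns and never blocks.
  import os

  def CheckChangeOnUpload(input_api, output_api):
    added_headers = [
        f.LocalPath() for f in input_api.AffectedFiles(include_deletes=False)
        if f.Action() == 'A' and f.LocalPath().endswith('.h')
    ]
    missing = []
    for header in added_headers:
      build_file = os.path.join(os.path.dirname(header), 'BUILD.gn')
      try:
        with open(build_file) as fh:
          if os.path.basename(header) in fh.read():
            continue
      except IOError:
        pass  # no BUILD.gn in that directory; still warn
      missing.append(header)
    if missing:
      return [output_api.PresubmitPromptWarning(
          'These added headers do not appear in a BUILD.gn file:',
          items=missing)]
    return []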
Jun 16 2017
Another thing that occurred to me is that we could run this on the waterfall bots that have optional trybots but that aren't part of the CQ. E.g., we could run the step on the 'chromium' waterfall, where (I think) all of the bots build 'all' but none of them are in the CQ by default. I think those optional trybots do still run analyze, but arguably there's no real reason for them to do so.
Jun 16 2017
Detected 5 new flakes for test/step "check_gn_headers (with patch)". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyKAsSBUZsYWtlIh1jaGVja19nbl9oZWFkZXJzICh3aXRoIHBhdGNoKQw. This message was posted automatically by the chromium-try-flakes app.
Jun 16 2017
I'm confused. If we run the script on the waterfall bots and some trybots that aren't in the CQ, a culprit CL will deterministically pass the CQ and break the waterfall, right?
Jun 16 2017
Correct. We do actually do that today in other configurations where we have to; for example, we don't run tests in the debug configurations because they're too slow, but we run them on the waterfall. As long as they don't break too often, this is an acceptable tradeoff. We'd have to be sure that check_gn_headers also didn't fail too often.
Jun 16 2017
Hmm, I'm still confused. Last week, check_gn_headers could block some culprit CLs from landing, but some would slip through (false negatives) and break the waterfall. There should be no false positives in that setup after https://codereview.chromium.org/2943043002/. If we do the proposal in #c18, culprit CLs will deterministically pass the CQ and break the waterfall, so the waterfall will break more frequently, which is worse than the situation last week, right? If we have a presubmit, then the breakage with #c18 should be less frequent than last week without a presubmit, but the best case should still be having both the presubmit and check_gn_headers on the CQ, right?
Jun 16 2017
To make sure we're using the same language: is a "false positive" to you that the check failed and it shouldn't have, or that the check didn't fail and it should have?
Jun 16 2017
In my usage above, a false negative is when the check should have failed but didn't, which causes breakage on the waterfall. A false positive is when the check should have passed but didn't.
Jun 16 2017
Yeah, that's what I figured, I use the terms in the opposite way :).

The important difference is that if we run the step on the CQ bots, then sometimes a build will pass on a given builder (say, linux_chromium_rel_ng), and not pass on the matching waterfall bot (Linux Builder / Linux Tests). We really don't want that to ever happen because it's very frustrating and hard to explain for devs. Whereas, if we don't run the step at all in the CQ, but it does run on the builder, it's easier to explain why something got through.

However, since the goal is to reduce the total number of things that get through, it's a fair question as to which tradeoff would be better or worse. And, regardless, if we think we can prevent 90%+ of the failures with a different test that we *could* run in the CQ, I'd rather do that. Does that help?
Jun 16 2017
Thanks for the explanation. So the suggestion in #c18 gives more consistency and is easier to explain to devs, but it would potentially break the waterfall more often. Do we have a build-all FYI bot that's not part of the waterfall?
Jun 16 2017
Probably, but I'd have to check to be sure. If not, we can certainly add one.
Jun 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/d8b1ec3835eee2daddcae4c292f9a87683c20001

commit d8b1ec3835eee2daddcae4c292f9a87683c20001
Author: wychen <wychen@chromium.org>
Date: Fri Jun 16 23:33:16 2017

Revert of Enable checking GN missing headers on CQ and waterfall (patchset #3 id:40001 of https://codereview.chromium.org/2910603002/ )

Reason for revert:
The false negative rate of step check_gn_headers on CQ is higher than expected. Disable it to ease waterfall breakage.
BUG=733500

Original issue's description:
> Enable checking GN missing headers on CQ and waterfall
>
> This relands the rest of crrev.com/2903733004.
>
> Step "check_gn_headers" is enabled on the following bots on CQ:
> - linux_chromium_dbg_ng
> - linux_chromium_rel_ng
>
> And also on the following builders on waterfall:
> - Linux Builder
> - Linux Builder (dbg)
>
> Note that this script can only be added to bots that have "all"
> in "additional_compile_targets".
>
> BUG=661774, 725877
>
> Review-Url: https://codereview.chromium.org/2910603002
> Cr-Commit-Position: refs/heads/master@{#477058}
> Committed: https://chromium.googlesource.com/chromium/src/+/00f2b7ed28e3d1d864943497d6aa865ef1cde5e2

TBR=dpranke@chromium.org
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG=661774, 725877

Review-Url: https://codereview.chromium.org/2940383002
Cr-Commit-Position: refs/heads/master@{#480218}

[modify] https://crrev.com/d8b1ec3835eee2daddcae4c292f9a87683c20001/testing/buildbot/chromium.linux.json
Jun 21 2017
dpranke@, could you tell me which config file to modify, if you know of a build-all FYI bot that's not part of the waterfall, or if you've added one? Thanks!
Jun 21 2017
There are multiple builders on the FYI waterfall that do build all, but they all (unsurprisingly) have different purposes, and we might not want to overload this test onto them.

@thakis - would you be okay with adding a step to the ClangToT builders that checks whether all of the dependencies in the GN files match the dependencies ninja actually found? Failing that, we could maybe add it to the "deterministic" builders.

After having thought about it a bit more, I would rather avoid having to set up an entirely new bot (or bots) just for this one (fairly fast) test.
Jun 21 2017
Seems fairly independent of what the clang bots do and what the people on the sheriff rotation there are comfortable with, so I'm not sure they're a good place for this.
Jun 21 2017
I agree it's independent and not a good fit; the question is, is it an acceptably bad fit? I.e., can you live with it for a while while we evaluate how well this works and whether to keep it on or move it onto the main waterfall?
Jun 21 2017
If it doesn't turn the bot red and our sheriffing rotation isn't supposed to look at it, I wouldn't mind.
Jun 21 2017
One possible compromise: if the script returns 0 even when it sees errors, it won't turn the bot red. It can still write the JSON file saying what's wrong, so we still get useful information. It would be more manual than other FYI bots because you'd have to click into the results to see whether it passed, but I can live with that.
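A minimal sketch of that compromise (a hypothetical wrapper for illustration, not the actual check_gn_headers.py change):

  # Hypothetical wrapper illustrating the compromise above: record the
  # violations in a JSON file for later inspection, but always exit 0 so
  # the FYI bot never turns red because of this step.
  import json
  import sys

  def report_and_succeed(json_path, violations):
      # 'violations' would come from the real header check; here it is
      # just a list of header paths missing from the GN files.
      with open(json_path, 'w') as f:
          json.dump({'missing_headers': sorted(violations)}, f, indent=2)
      if violations:
          print('Found %d missing headers (not failing the step).' % len(violations))
      return 0  # never fail the bot, regardless of the result

  if __name__ == '__main__':
      sys.exit(report_and_succeed('/tmp/check_gn_headers.json', []))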
Jul 14 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2c7c438d81d7b417254cef2b0dc2681c0a166a00

commit 2c7c438d81d7b417254cef2b0dc2681c0a166a00
Author: Wei-Yin Chen (陳威尹) <wychen@chromium.org>
Date: Fri Jul 14 16:29:12 2017

Run check_gn_headers on ClangToTLinux

BUG=733500
Change-Id: Iee053126ab61bdb99ba78bdc74e720ff179febc6
Reviewed-on: https://chromium-review.googlesource.com/567729
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Wei-Yin Chen (陳威尹) <wychen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#486770}

[modify] https://crrev.com/2c7c438d81d7b417254cef2b0dc2681c0a166a00/testing/buildbot/chromium.fyi.json
[modify] https://crrev.com/2c7c438d81d7b417254cef2b0dc2681c0a166a00/testing/scripts/check_gn_headers.py
Jul 14 2017
If we don't turn the bot red even when it sees errors, we can actually run this on CQ and waterfall bots as well, right? Then we can observe how likely it is for a culprit CL to sneak through the CQ, with and without the presubmit check.
Jul 23 2017
yes.
Jul 30
ClangToTLinux has moved to https://ci.chromium.org/buildbot/chromium.clang/ToTLinux/

The latest build, https://ci.chromium.org/buildbot/chromium.clang/ToTLinux/3238 (2018-07-30), contains 406 violations. The earliest successful run on that bot is https://ci.chromium.org/buildbot/chromium.clang/ToTLinux/37 on 2017-10-06, with 57 violations.

The presubmit check landed in https://chromium-review.googlesource.com/1152126 on 2018-07-30. Let's see if this slows down the growth.
Jul 31
I did a quick visualization based on the results from https://ci.chromium.org/buildbot/chromium.clang/ToTLinux/.
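For anyone who wants to reproduce this kind of plot, a rough sketch is below. The (build number, violation count) pairs are made-up placeholders except for the two endpoints quoted in the previous comment; the real data would come from each build's check_gn_headers JSON output:

  # Hypothetical sketch of the violation-count plot. Only the first and
  # last data points (build 37 with 57 violations, build 3238 with 406)
  # come from the numbers above; the rest are placeholders.
  import matplotlib.pyplot as plt

  builds = [37, 500, 1000, 2000, 3238]    # placeholder build numbers
  violations = [57, 150, 250, 350, 406]   # placeholder violation counts

  plt.plot(builds, violations, marker='o')
  plt.xlabel('ToTLinux build number')
  plt.ylabel('check_gn_headers violations')
  plt.title('Missing-header violations over time')
  plt.savefig('gn_header_violations.png')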
Jul 31
Now that the presubmit is there, I'd like to add check_gn_headers to more bots to cover platforms other than Linux. For example:
- chromium.clang/ToTAndroid
- chromium.clang/ToTMac
- chromium.clang/ToTWin
- chromium.clang/ToTiOS

All of these would work like chromium.clang/ToTLinux, only generating the list of missing headers without turning the bots red. Once we have coverage on most platforms, we can observe how fast the violations grow with the presubmit check in place. If that rate is low enough, we could consider adding check_gn_headers as a real checking step (where a failure turns the bot red) in CQ/CI.

According to #c10, the main kinds of CLs that check_gn_headers couldn't correctly catch should now be rejected by the presubmit check, so hopefully check_gn_headers would be much less flaky than last time. What do you think?
Aug 6
I think it's fine to add it to a few more bots. If thakis@ doesn't mind, adding it to the .clang bots is fine by me. However, we also have the "Jumbo" FYI bots now (which we didn't a year ago) and you could run the step on them. We don't have Android/iOS jumbo bots, but I'm not sure if it's really worth setting up new builders just for them. How fast is the step these days? Apart from that, though, I still don't think we can ever run this on the CQ, for the reasons explained in #c15 (the way analyze works).
Aug 15
On https://ci.chromium.org/buildbot/chromium.clang/ToTLinux/, the step takes ~70 seconds. What's the significance of the Jumbo build? It looks like use_jumbo_build shouldn't change the result of check_gn_headers. I guess I'm not picky about where to run this step, as long as we have decent OS coverage.
Aug 15
Regarding the concerns in #c15, I'm glad to observe that the situation is a bit different now that the presubmit script is in place. With the presubmit, the most common error pattern (adding a new header without touching GN files) is eliminated. I've scanned all the culprit CLs since the presubmit landed, and all but one of them touched at least one gn/gni file. The only exception (https://chromium-review.googlesource.com/c/chromium/src/+/1162430) *moved* the headers instead of *adding* them. I'll update the presubmit script to catch this.

Some recent culprit CLs:
https://chromium-review.googlesource.com/c/chromium/src/+/1165723
https://chromium-review.googlesource.com/c/chromium/src/+/1155754
https://chromium-review.googlesource.com/c/chromium/src/+/1116542
https://chromium-review.googlesource.com/c/chromium/src/+/1142749
https://chromium-review.googlesource.com/c/chromium/src/+/1087724

Note that all of them are fairly complicated, and I wouldn't expect anyone to spot the missing-header issue without the help of tools. Given that all of these CLs touched at least one gn or gni file, the analyze step wouldn't skip compiling all. Therefore, unlike last time around, it would now be fairly unlikely for a culprit CL to slip through if the check were turned on in the CQ. Does this make sense?
Aug 15
One correction: the error in https://chromium-review.googlesource.com/c/chromium/src/+/1162430 was actually caught by the PRESUBMIT script. It's possible the script wasn't up to date in the author's workspace.
Aug 16
Ouch, 70s is slow, given that it's something that has to be run directly on the builder (it can't be run as a swarming task).

As far as whether things would skip "all" in analyze goes, analyze is smarter now: it can handle changes to .gn/.gni files and should generally only rebuild the things those changes affect.

- The first CL (1165723) didn't touch a .gn/.gni file. It tripped one of the exclusion rules in //testing/buildbot/trybot_analyze_config.json and so it bypassed analyze. That was probably a bug (I don't know why that line is in the exclusions file), so if it had been fixed, it wouldn't have built all.
- The second CL (1155754) built all because "too many compile targets would've been affected", which is a bit surprising given that change, so that one also probably only built "all" by accident.
- The third CL (1116542) didn't build all; it only built a subset of the targets.
- The fourth CL (1142749) didn't compile anything; analyze skipped the compile (on Linux).
- I unfortunately can't fetch any of the logs from 1087724, so I don't know what that one did.

Besides, you can't just look at the CLs that were broken and that did build all; you have to look at *all* the CLs, and in that case (i.e., most of the time) the build won't be completely up to date, because we haven't been building all. So I still think I'm right that we can't run this on the CQ, but it's possible that I've purged too much of this topic from my head and should take another fresh look at it.

As far as "what's the significance of the Jumbo build" goes, it's another supported build configuration. Jumbo builds preserve the component structure in GN builds, so at first glance I would expect any change that broke Jumbo to also break non-Jumbo builds, header-wise, but it's possible there's some subtlety I'm missing.
Sep 11
Here is another visualization from the results at https://ci.chromium.org/buildbot/chromium.clang/ToTLinux/. The blue vertical line is when the presubmit landed. It looks like the slope is flatter with the presubmit check in place.