//content/test:content_nocompile_tests_run_nocompile flaky on linux-jumbo-rel
Issue description

Hi troopers :)

Jumbo builds seem to be hiccuping on //content/test:content_nocompile_tests_run_nocompile in a flaky manner: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/linux-jumbo-rel

[2170/2276] ACTION //content/test:content_nocompile_tests_run_nocompile(//build/toolchain/linux:clang_x64)
FAILED: gen/content/test/browser_task_traits_unittest_nc.cc
python ../../tools/nocompile_driver.py 4 ../../content/public/browser/browser_task_traits_unittest.nc gen/content/test/browser_task_traits_unittest_nc.cc -- -nostdinc++ -isystem../../buildtools/third_party/libc++/trunk/include -isystem../../buildtools/third_party/libc++abi/trunk/include -std=c++14 -Wall -Werror -Wfatal-errors -Wthread-safety -I../../ -Igen --sysroot ../../build/linux/debian_sid_amd64-sysroot

Is there a way to obtain the output of the failing step from the bot? I have trouble reproducing this locally, but will continue trying. Thanks!
,
Sep 11
What is weird is that we've seen some problems with this particular step on other bots as well (but fixed those before linux-jumbo-rel started flaking) - and there, the command did print some more useful error messages (see bug 882234 ). Looking at nocompile_driver.py, it seems like it only returns a non-zero exit code when a subprocess command fails (apart from exceptions thrown, which should be logged too). And stderr of the failing command should be printed: https://cs.chromium.org/chromium/src/tools/nocompile_driver.py?l=472 I can only assume that stderr is empty for some reason. +ajwong and +wychen who may know more about the script's internals.
,
Sep 11
Removing trooper label. Feel free to reapply if there's something we can do.
,
Sep 11
Is this only flaky on jumbo builds? If so, we might want to look into how jumbo builds are different. If we can't reproduce this locally, we could print resultlog before sys.exit(non-zero) for debugging. Is it possible that some useful information is in stdout and we throw it away? We could keep stdout and also print it at https://cs.chromium.org/chromium/src/tools/nocompile_driver.py?l=469, where the script currently does:

_, stderr = test['proc'].communicate()

These CLs should make nocompile_driver.py failures easier to diagnose, so they can stay even after this particular issue is fixed. NoCompile test is an underused feature in Chromium, and you are the first user outside of //base, so I guess there are some rough edges. Hopefully after these fixes, it is more usable.
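The suggestion above (keep stdout as well as stderr, and print both before exiting non-zero) could look roughly like the sketch below. This is not the code from nocompile_driver.py; `run_and_report` and the toy failing command are hypothetical names made up for illustration, and the sketch is written for Python 3.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Runs cmd and returns (return_code, report).

    The report is empty on success. On failure it contains BOTH stdout and
    stderr, so nothing the child printed is thrown away.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = proc.communicate()  # Keep stdout instead of discarding it.
    report = ""
    if proc.returncode != 0:
        report = "Command failed with return code %d\nstdout:\n%s\nstderr:\n%s" % (
            proc.returncode,
            stdout.decode("utf-8", errors="replace"),
            stderr.decode("utf-8", errors="replace"),
        )
    return proc.returncode, report


# Hypothetical failing child: writes to both streams, then exits with code 3.
failing = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); sys.stderr.write('to stderr'); sys.exit(3)",
]
code, report = run_and_report(failing)
if code != 0:
    print(report)
```

A real driver would call sys.exit(code) after printing the report; that is left out here so the snippet can be run on its own.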
,
Sep 13
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c1f97a8cd41c4c9a5e82592f2e13732cfc6bb141

commit c1f97a8cd41c4c9a5e82592f2e13732cfc6bb141
Author: Eric Seckler <eseckler@chromium.org>
Date: Thu Sep 13 00:35:37 2018

tools: Add more diagnostic output to nocompile_driver.py

Bug: 882852
Change-Id: Ie88e6fceb726cd69963eaed5eef90f71f55b38e4
Reviewed-on: https://chromium-review.googlesource.com/1222313
Reviewed-by: Nico Weber <thakis@chromium.org>
Reviewed-by: Wei-Yin Chen (陳威尹) <wychen@chromium.org>
Commit-Queue: Eric Seckler <eseckler@chromium.org>
Cr-Commit-Position: refs/heads/master@{#590876}

[modify] https://crrev.com/c1f97a8cd41c4c9a5e82592f2e13732cfc6bb141/tools/nocompile_driver.py
,
Sep 13
I'll keep an eye out for more failures on the bot, but it has been running fine for the last two days without any changes to the test.
,
Sep 14
,
Sep 17
Assigning to remove from our triaging queue; mark as Untriaged to get the infra trooper to take a look at this bug.
,
Sep 27
Here's another recent failure: https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8934298114243077232/+/steps/compile/0/stdout

From the log:

[4196/4286] ACTION //content/test:content_nocompile_tests_run_nocompile(//build/toolchain/linux:clang_x64)
FAILED: gen/content/test/browser_task_traits_unittest_nc.cc
python ../../tools/nocompile_driver.py 4 ../../content/public/browser/browser_task_traits_unittest.nc gen/content/test/browser_task_traits_unittest_nc.cc -- -nostdinc++ -isystem../../buildtools/third_party/libc++/trunk/include -isystem../../buildtools/third_party/libc++abi/trunk/include -std=c++14 -Wall -Werror -Wfatal-errors -Wthread-safety -I../../ -Igen --sysroot ../../build/linux/debian_sid_amd64-sysroot
No-compile driver failure with return_code -15. Result log:
TEST(NoCompileBrowserTaskTraitsUnittest): Started 1537986901.797163, Ended 1537987301.102194, Total 399.305031s, Extract 1.120920s, Compile 120.119999s, Process 278.064112s
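For anyone wanting to sanity-check the timings in that result log, here is a small sketch that parses the line and confirms the phases add up to the reported total. `parse_result_log` is a hypothetical helper written for this comment, not part of the driver, and the field layout is assumed from this one log line only.

```python
RESULT_LOG = (
    "TEST(NoCompileBrowserTaskTraitsUnittest): Started 1537986901.797163, "
    "Ended 1537987301.102194, Total 399.305031s, Extract 1.120920s, "
    "Compile 120.119999s, Process 278.064112s"
)


def parse_result_log(line):
    """Splits a result log line into (test_name, {field: seconds})."""
    name, _, fields = line.partition("): ")
    timings = {}
    for field in fields.split(", "):
        key, value = field.split()
        timings[key] = float(value.rstrip("s"))  # Drop the trailing unit.
    return name + ")", timings


name, timings = parse_result_log(RESULT_LOG)
wall_clock = timings["Ended"] - timings["Started"]
phase_sum = timings["Extract"] + timings["Compile"] + timings["Process"]
print(name)
print("wall clock: %.6fs, sum of phases: %.6fs" % (wall_clock, phase_sum))
```

Both numbers agree with the reported Total of about 399.3 seconds, with Extract taking about 1.1 seconds and the rest split between the Compile and Process phases.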
,
Sep 27
According to python documentation, return code -15 means that the process was terminated because it received signal 15 (SIGTERM). The nocompile driver seems to send this when it thinks the compilation has timed out, see [1]. This timeout is currently 60 seconds [2]. Maybe we can increase it, sending a patch [3]. [1] https://cs.chromium.org/chromium/src/tools/nocompile_driver.py?l=386 [2] https://cs.chromium.org/chromium/src/tools/nocompile_driver.py?l=79 [3] https://chromium-review.googlesource.com/c/chromium/src/+/1248601
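To illustrate where a return code of -15 comes from, here is a minimal sketch of a timeout loop that terminates a child process. It is not the driver's actual code; `run_with_timeout` and the sleeping child are made up for the example, and the negative return code convention applies to POSIX systems.

```python
import signal
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout_sec):
    """Runs cmd, sending SIGTERM if it is still running after timeout_sec."""
    proc = subprocess.Popen(cmd)
    deadline = time.time() + timeout_sec
    while proc.poll() is None:
        if time.time() > deadline:
            proc.terminate()  # Sends SIGTERM on POSIX.
            proc.wait()
            break
        time.sleep(0.05)
    return proc.returncode


# Hypothetical slow "compile": a child that just sleeps for 30 seconds.
sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
return_code = run_with_timeout(sleeper, timeout_sec=0.5)
print("return code:", return_code)
print("killed by SIGTERM:", return_code == -signal.SIGTERM)
```

On POSIX, subprocess reports a child killed by signal N as return code -N, which is why a timed-out compile shows up as -15 in the log.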
,
Oct 3
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e056c759fd28d9ea4cea81ab77be309d23e94b12

commit e056c759fd28d9ea4cea81ab77be309d23e94b12
Author: Eric Seckler <eseckler@chromium.org>
Date: Wed Oct 03 14:11:46 2018

tools: Increase timeout of nocompile tests due to test flakiness.

The linux jumbo bot is flaking on content nocompile tests due to the nocompilation tests timing out. This patch increases the timeout to twice what it was before.

Bug: 882852
Change-Id: Ib2bc0023acd8d677ea77eb2769a5f83da39ed0da
Reviewed-on: https://chromium-review.googlesource.com/c/1248601
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Eric Seckler <eseckler@chromium.org>
Cr-Commit-Position: refs/heads/master@{#596199}

[modify] https://crrev.com/e056c759fd28d9ea4cea81ab77be309d23e94b12/tools/nocompile_driver.py
,
Oct 4
,
Oct 11
Hmm, this still seems to time out occasionally even with 120 sec timeout, e.g.: https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8933076481390421392/+/steps/compile/0/stdout Should we increase the timeout further, or are we better off just disabling the tests on jumbo builds?
,
Oct 11
The NextAction date has arrived: 2018-10-11
,
Oct 18
,
Oct 19
What is the jumbo build? When I first wrote this driver, we didn't enable it because it turned out the error-reporting path for gcc was way way slower than the success path. Worse, the variance in time for completion was higher too. If the jumbo build is creating huge translation units, I wonder if that's causing us to perform slowly?
,
Oct 29
Jumbo: https://chromium.googlesource.com/chromium/src/+/HEAD/docs/jumbo.md In short, yes, it does create large translation units, which is probably why the nocompile tests take longer there. I think we'd probably be better off simply disabling the tests on the jumbo bots. I'm not familiar enough with the buildbot configs to know how to best accomplish that though.
,
Dec 4
Looks like this hasn't been flaking anymore recently. Feel free to reopen if this reoccurs.
,
Dec 14
Reopening, since this reoccurred: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/linux-jumbo-rel/10198
,
Dec 19
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/be6feb0aca9d5e0025f3898c01d75ba9db11f1fa

commit be6feb0aca9d5e0025f3898c01d75ba9db11f1fa
Author: Eric Seckler <eseckler@chromium.org>
Date: Wed Dec 19 10:56:40 2018

content: Disable nocompile tests on jumbo builds.

They simply take too long to execute on the FYI bots, and they are covered by the regular builders already.

Bug: 882852
Change-Id: Ibfd67a54fb243e35b49a2b1a877db7bd13b73517
Reviewed-on: https://chromium-review.googlesource.com/c/1378181
Commit-Queue: Eric Seckler <eseckler@chromium.org>
Reviewed-by: Wei-Yin Chen (陳威尹) <wychen@chromium.org>
Cr-Commit-Position: refs/heads/master@{#617794}

[modify] https://crrev.com/be6feb0aca9d5e0025f3898c01d75ba9db11f1fa/content/test/BUILD.gn
,
Dec 19
,
Jan 7
The reason this happens is that nocompile tests run one compile for each fail assertion from what I understand, and then jumbo probably packs a bunch of them together. A real fix is probably to tell jumbo to not jumbo together .nc files. bratell, is that possible to do?
,
Jan 7
It shouldn't be grouping nc files. Only .cpp/.cc/.mm and .c files are (or should be) grouped. I looked at the code and I don't see any chance for any other extensions to get into the jumbo files. Is this goma (i.e. 8 files per jumbo chunk) or not (i.e. 50 files per jumbo chunk)? If it's still a timeout issue hidden somewhere.
,
Jan 7
Oh, it's the normal builder. Not sure why I thought it was some internal builder. So they should be goma, but still using large jumbo chunks since that is needed to catch the problems for non-goma users. 120 seconds should be enough, but it depends on the actual hardware and other factors. And jumbo will bring it much closer to the limit. Chunks of 50 files on average take 5 times longer to compile than 1 file, but I'm sure there are outliers where it's worse.
,
Jan 7
The .nc files get converted to .cc files which are built as a normal test() target: https://cs.chromium.org/chromium/src/build/nocompile.gni?q=nocom&sq=package:chromium&g=0&l=111 Is there a way to not jumbo those?
,
Jan 8
Adding |never_build_jumbo=true| to a target block disables jumbo for it. Typically used when jumbo comes from a template or to disable it for nacl but this seems like an alternative reason. I'm still not sure how they became jumbo though. A |test| target will not expand to jumbo compilation. (I tried that once and it was not possible even if we wanted to.) Is there something else, which does support jumbo compilation, that extracts the sources from the |test| target?
,
Yesterday
(46 hours ago)
Nocompile tests run clang manually via a driver script (see the gni file thakis@ links to in #26). I'm not sure how jumbo builds affect that. But I was pretty certain that the bit that's timing out is the compile initiated by the driver script (as opposed to the compile of the generated "result" cc files that are later packaged into a normal test() target). I'm not sure I'm the best person to look into this further, I'm not an expert in jumbo builds nor nocompile tests :)
,
Yesterday
(39 hours ago)
I've looked at the nocompile tests and there is nothing jumbo there. Just a script that spawns a couple (two) clang processes and waits for them to complete. It could be that parallel jumbo processes slow down the host, but the tests run in hundreds of milliseconds, so that would mean that the computer is completely locked up for two minutes. Unlikely. Another possibility is that the processes fill up the proc stdout/stderr buffers. The code does not read from any process until proc.poll() reports that the process is done. I've not tested this, but I think the buffers are like 64 KB though, so that also seems unlikely. Why would clang suddenly throw out 64 KB of data? Still, it's possible.
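The pipe-buffer theory is easy to try in isolation. The sketch below is not the driver's code: `poll_only` and the noisy child are hypothetical, and the 4 MiB payload is simply chosen to be well above any default pipe buffer size. It shows that a child writing a lot to a piped stdout never finishes while the parent only calls proc.poll(), and completes as soon as the parent drains the pipe.

```python
import subprocess
import sys
import time

PAYLOAD_BYTES = 1 << 22  # 4 MiB, assumed to exceed the OS pipe buffer.
CHILD = "import sys; sys.stdout.write('x' * %d); sys.stdout.flush()" % PAYLOAD_BYTES


def poll_only(seconds):
    """Polls a noisy child without reading its stdout, then drains the pipe.

    Returns (still_running_after_polling, bytes_read, return_code).
    """
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    deadline = time.time() + seconds
    while time.time() < deadline and proc.poll() is None:
        time.sleep(0.05)  # Only polling: nobody is reading the pipe.
    still_running = proc.poll() is None
    out, _ = proc.communicate()  # Draining the pipe unblocks the child.
    return still_running, len(out), proc.returncode


still_running, bytes_read, return_code = poll_only(1.0)
print("still running after polling:", still_running)
print("bytes read once drained:", bytes_read)
print("return code:", return_code)
```

If this is what happens on the bot, the child would sit blocked on write until the timeout fires, which would look the same as a slow compile from the outside. Reading the pipes while waiting (for example via communicate()) avoids the problem.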
,
Today
(19 hours ago)
Comment 1 by bpastene@chromium.org, Sep 11