Issues with unit tests used for code coverage generation
Issue description
Tracking issue for figuring out the tests that should be used for code coverage generation.
Nov 29 2017
I used the following GN flags:

use_clang_coverage = true
is_component_build = false
is_debug = false
use_goma = true

and the following Chromium revision: adb61db19020ed8ecee5e91b1a0ea4c924ae2988 (r508578, the branch base commit for beta M63).

I've successfully built and run the following tests:
1) breakpad_unittests
2) content_unittests
3) mojo_common_unittests
4) sql_unittests
5) unit_tests
6) cc_blink_unittests
7) crypto_unittests
8) pdf_unittests
9) swiftshader_unittests

Below is the list of the tests that I failed to build:
cc_unittests, telemetry_unittests, components_unittests, courgette_unittests, extensions_unittests, gpu_unittests, headless_unittests, audio_unittests, media_unittests, media_blink_unittests, media_mojo_unittests, media_service_unittests, net_unittests, services_unittests, service_manager_unittests, skia_unittests, storage_unittests, blink_platform_unittests, blink_heap_unittests, wtf_unittests, blink_common_unittests, angle_unittests, pdfium_unittests, fileutils_unittests, accessibility_unittests, gfx_unittests, gl_unittests, keyboard_unittests, snapshot_unittests, views_unittests, wm_unittests, url_unittests

The full log is available here: https://paste.googleplex.com/5504336041869312

It seems that I wasn't too smart here: ninja simply stopped the build after a certain number of errors occurred. I should have tried building each target separately. I'll try doing that and will report on how many of the "failures" build successfully after that.
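For reference, a minimal sketch of the corresponding setup commands (the out/Coverage directory name is just my own choice, nothing standard):

$ gn gen out/Coverage --args='use_clang_coverage=true is_component_build=false is_debug=false use_goma=true'
$ ninja -C out/Coverage crypto_unittests pdf_unittests   # and so on for the other targets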
Nov 30 2017
Yes, it was my bad. I expected "ninja A B C D" to build everything it can (e.g. A, C and D, even if B fails to compile), but that was a wrong assumption. After trying to build each target separately, I got failures only for the following tests: telemetry_unittests, extensions_unittests, blink_platform_unittests, fileutils_unittests. All the rest from c#2 built successfully.
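For the record, ninja also has a -k N flag (0 = unlimited) to keep going past failures; alternatively, a shell loop along these lines builds each target separately and records the ones that fail (a sketch only, with the target list abbreviated):

$ for t in telemetry_unittests extensions_unittests blink_platform_unittests; do
    ninja -C out/Coverage "$t" || echo "$t" >> failed_targets.txt
  done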
Nov 30 2017
A bit more detail:
1) telemetry_unittests compiles, but it doesn't produce a binary named "telemetry_unittests": https://cs.chromium.org/chromium/src/chrome/test/BUILD.gn?type=cs&q=telemetry_unittests&sq=package:chromium&l=5242
2) extensions_unittests seems to crash clang: https://paste.googleplex.com/5023383322361856
3) blink_platform_unittests has understandable compilation errors: https://paste.googleplex.com/5150605085507584
4) fileutils_unittests compiles, but it's not a unittests binary: https://cs.chromium.org/chromium/src/third_party/webrtc/test/BUILD.gn?type=cs&q=fileutils_unittests&sq=package:chromium&l=434
Nov 30 2017
+cc Dirk, Hans - these unittests should be running with clang on many buildbots. Do you see any failures there?
Nov 30 2017
The tree would be very red if these tests failed to build with Clang in general. I assume it's a problem that occurs only with the "use_clang_coverage = true" build config? That should be easy to confirm.
Nov 30 2017
Yes, I used the following GN flags:

use_clang_coverage = true
is_component_build = false
is_debug = false
use_goma = true

on the following revision: adb61db19020ed8ecee5e91b1a0ea4c924ae2988 (r508578, the branch base commit for beta M63).

Please disregard my comment 2; the only valid failures I see now are:
- extensions_unittests seems to crash clang: https://paste.googleplex.com/5023383322361856
- blink_platform_unittests has understandable compilation errors: https://paste.googleplex.com/5150605085507584
Dec 1 2017
I tried building at HEAD (#520710) and both targets build successfully with the config from #8. I also tried at the revision in #8 and can reproduce both failures. The Clang crash for extensions_unittests only happens with "use_clang_coverage = true", but the build error for blink_platform_unittests happens without that as well, so that's a bit mysterious. Is there a reason you want to build with an old revision instead of trunk?
Dec 1 2017
>> Is there a reason you want to build with an old revision instead of trunk?

It's only for now. When I started looking for the tests we can run, some of them were failing for me on trunk. That's why I switched to a more stable branch, just to simplify the process of determining the tests to be used.

The plan for coverage generation is to do a sanity check first: run a test without coverage to make sure it's not failing on a given revision. After that, we'll run a coverage-instrumented build on the same revision.
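In command form, a rough sketch of that plan (the output directory names are arbitrary, and the GN args are just the ones mentioned earlier in this thread):

# 1) Sanity check: build and run the target without instrumentation.
$ gn gen out/NoCoverage --args='is_debug=false is_component_build=false'
$ ninja -C out/NoCoverage unit_tests
$ python testing/xvfb.py out/NoCoverage/unit_tests

# 2) On the same revision, build with coverage and run again.
$ gn gen out/Coverage --args='is_debug=false is_component_build=false use_clang_coverage=true'
$ ninja -C out/Coverage unit_tests
$ python testing/xvfb.py out/Coverage/unit_tests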
Dec 1 2017
One more question: how can I control the number of processes spawned by unittests? There is a --test-launcher-jobs=N flag, but it doesn't seem to have any effect for me; the number of processes stays the same. Another option I know of is --single-process-tests, which works but makes test execution too slow for large test suites.

The problem I'm experiencing with large test suites: e.g. components_unittests spawns ~290 processes, and each coverage dump is ~900 MB, so I need at least 300 GB of free space to run that. Another pain point may come when I try to merge those dumps together. It would be great to find some trade-off, e.g. spawn no more than 10 processes or another relatively small number.
Dec 1 2017
--test-launcher-jobs=1 should ensure that no more than one test is running at a time, but I believe the tests are still run in subprocesses in batches. --test-launcher-batch-limit, I think, controls the batch size; I don't know if you can turn batching off completely, or what the impact of that would be, but you could start by setting it to a large value (e.g., 1000 or 10000) and see what happens?
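For example (a sketch; the batch-limit value is just the large guess suggested above, and whether it actually reduces the number of coverage dumps here is unverified):

$ python testing/xvfb.py out/Coverage/components_unittests \
    --test-launcher-jobs=1 --test-launcher-batch-limit=10000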
Dec 14 2017
Another issue I'm experiencing with unit tests (especially large ones, e.g. unit_tests) is described here: https://bugs.llvm.org/show_bug.cgi?id=35665 The root cause of that may not necessarily be in the LLVM code, so I'd appreciate it if you could take a look at the bug description linked above and share any thoughts you may have. Thanks!
Dec 15 2017
Regarding the issue above, it seems to be caused by a null-deref happening while running one of the tests. Folks, can you point me to the bots that build and run tests such as unit_tests? I'm having trouble finding "good" revisions, or else I'm building / running things the wrong way. Taking a look at a bot that does this properly would help me a lot.
Dec 15 2017
FTR, I build tests with the following GN args:

is_debug = false
is_component_build = false
use_clang_coverage = false # true for coverage build
sanitizer_keep_symbols = true # false when I'm not trying to debug any tests

and run tests with:

$ python testing/xvfb.py out/default/unit_tests --test-launcher-jobs=1 --single-process-tests
Dec 15 2017
It seems that "--single-process-tests" breaks things for me. When I run the same test binary without that option, all the tests pass successfully. However, I cannot run tests that way for code coverage generation, as it spawns thousands of processes and needs terabytes of disk space.
Dec 15 2017
A correction to my c#16. The "--single-process-tests" flag indeed breaks things for me, but only because any CHECK failure or null-deref crashes the test process. Even if I run tests without that flag, there are still various crashes happening, but the whole test binary doesn't crash, as each individual test case gets executed inside a separate process.

With that, I can't build & run the tests in a way that makes all of them pass. I'm definitely doing something wrong, as testbots seem to complete all of the tests successfully almost all the time: https://ci.chromium.org/buildbot/chromium.linux/Linux%20Tests%20%28dbg%29%281%29/?limit=200

I'm unassigning myself to avoid spending even more time on desperate attempts to build and run things properly. I'd greatly appreciate it if someone could give me clear instructions on how to build and run the tests (for instance, the unit_tests binary) with a success rate similar to that testbot's. In the meantime, I'll try to clean up my optimization on the LLVM side and also get back to writing some new code for the coverage management system.
Dec 19 2017
Discussed with Yuke offline. TL;DR of the trouble here: I can't successfully run "unit_tests --single-process-tests", it always crashes on something, even when it's built without code coverage instrumentation. I've tried a bunch of different revisions and the following combinations of GN args:

1)
is_debug = false
is_component_build = false
use_clang_coverage = false # true for coverage build
sanitizer_keep_symbols = true # false when I'm not trying to debug any tests

2)
dcheck_always_on = true
ffmpeg_branding = "Chrome"
is_component_build = false
is_debug = false
proprietary_codecs = true
strip_absolute_paths_from_debug_symbols = true
symbol_level = 1
#use_goma = true
Dec 19 2017
Unfortunately, unit_tests doesn't even compile on Mac with enable_clang_coverage=true. I've filed a separate bug to track that and am investigating it: crbug.com/796290
Jan 10 2018
It seems that the %Nm option described here [1] might help with our issue. At least, it does on a small example that spawns 20 processes. I'm trying it on unit_tests now, fingers crossed that it will produce a correct result.

1: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program
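For context, %Nm goes into the LLVM_PROFILE_FILE pattern and makes the instrumented processes merge their counters online into a pool of N .profraw files instead of writing one dump per process. A sketch of what that looks like (the file name and N=8 are just examples, not the exact values I'm using):

$ LLVM_PROFILE_FILE="unit_tests_%8m.profraw" \
    python testing/xvfb.py out/Coverage/unit_tests --test-launcher-jobs=1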
Jan 10 2018
I've also tried the following approach to run all tests in single-process mode (pseudocode; run_in_single_process stands for invoking the test binary):

all_passed_tests = []
while True:
    # Run the test target with --single-process-tests, skipping the tests in
    # |all_passed_tests| via a negative gtest_filter, and collect the names of
    # the tests that passed in this attempt.
    passed_tests, success = run_in_single_process(skip=all_passed_tests)
    if success:
        break
    all_passed_tests.extend(passed_tests)

Basically, the idea is to run the test target in single-process mode in one child process, and if it fails, spawn a new child process and resume from the last failed test.

With the unit_tests target, it takes 37 retries, which generates 37 coverage data dumps. With content_unittests, it takes nearly 400 retries, which is unmanageable.

The conclusion is that this approach won't work.
Jan 11 2018
Just tested out the %1m option mentioned in #24 on Mac using url_unittests and crypto_unittests.

Dump size:
with the %p option: 12 * 5M + 12 * 4M = 108M (24 profraw files are generated, 12 for each target).
with the %1m option: 1 * 5M + 1 * 4M = 9M (only 2 profraw files are generated, 1 for each target).

Report generation time:
with the %p option:
real 0m24.683s
user 0m55.799s
sys 0m9.154s
with the %1m option:
real 0m22.450s
user 0m46.538s
sys 0m7.581s
(%1m is even a little bit faster than %p.)

Correctness:
Please see the attached compare.png file. The reports are almost the same, except for a few lines' difference in base/ and third_party/, and given that it's infeasible to decide which one is "correct", I would conclude that both sets of data are correct and make sense.

So it looks like we're able to get the same data with the "%1m" option, which generates manageable coverage data dumps! Thanks Max, this is a great catch!
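For reference, the "report generation" step measured above is along these lines (a sketch; the profile file names and output directory are placeholders, not necessarily the exact ones used here):

$ llvm-profdata merge -o crypto_unittests.profdata crypto_unittests_*.profraw
$ llvm-cov report out/Coverage/crypto_unittests -instr-profile=crypto_unittests.profdata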
Jan 11 2018
Thanks a lot for testing, Yuke! That indeed sounds very promising, and +1: we don't really know which report is more accurate, so either of them is good enough. I'm running a bunch of tests with %8m now and will merge them and generate a total report after that.
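A sketch of how the per-target profiles could then be combined into one total report (binary and file names are placeholders):

$ llvm-profdata merge -o total.profdata unit_tests_*.profraw crypto_unittests_*.profraw url_unittests_*.profraw
$ llvm-cov report out/Coverage/unit_tests \
    -object out/Coverage/crypto_unittests \
    -object out/Coverage/url_unittests \
    -instr-profile=total.profdata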
Jan 11 2018
OK, this seems to be resolved. I managed to generate a total report from a bunch of different tests: issue 789981.