
Issue 789733

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 789981
issue 795314
issue 796290

Blocking:
issue 759794
issue 784613




Issues with unit tests used for code coverage generation

Project Member Reported by mmoroz@chromium.org, Nov 29 2017

Issue description

Tracking issue for figuring out the tests that should be used for code coverage generation.
 

Comment 1 by mmoroz@chromium.org, Nov 29 2017

Blocking: 759794 784613

Comment 2 by mmoroz@chromium.org, Nov 29 2017

I used the following GN flags:

use_clang_coverage = true
is_component_build = false
is_debug = false
use_goma = true


and the following Chromium revision: adb61db19020ed8ecee5e91b1a0ea4c924ae2988, which is r508578, the branch base commit for beta M63.
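For anyone reproducing this, the build setup might look roughly like the following; this is a sketch assuming a hypothetical output directory out/coverage, with the GN args taken from above:

```shell
# Generate the build directory with the coverage configuration listed above.
gn gen out/coverage --args='use_clang_coverage=true is_component_build=false is_debug=false use_goma=true'
# Build one of the test targets.
autoninja -C out/coverage unit_tests
```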

I've successfully built and run the following tests:

1) breakpad_unittests
2) content_unittests
3) mojo_common_unittests
4) sql_unittests
5) unit_tests
6) cc_blink_unittests
7) crypto_unittests
8) pdf_unittests
9) swiftshader_unittests


Below is the list of the tests that failed to build:

cc_unittests
telemetry_unittests
components_unittests
courgette_unittests
extensions_unittests
gpu_unittests
headless_unittests
audio_unittests
media_unittests
media_blink_unittests
media_mojo_unittests
media_service_unittests
net_unittests
services_unittests
service_manager_unittests
skia_unittests
storage_unittests
blink_platform_unittests
blink_heap_unittests
wtf_unittests
blink_common_unittests
angle_unittests
pdfium_unittests
fileutils_unittests
accessibility_unittests
gfx_unittests
gl_unittests
keyboard_unittests
snapshot_unittests
views_unittests
wm_unittests
url_unittests


The full log is available here: https://paste.googleplex.com/5504336041869312


It seems I wasn't being too careful: ninja simply stopped the build after a certain number of errors occurred. I should have tried to build each target separately. I'll try doing that and will report on how many of the "failures" start to work after that.

Comment 3 by mmoroz@chromium.org, Nov 30 2017

Blockedon: 789981

Comment 4 by mmoroz@chromium.org, Nov 30 2017

Yes, it was my bad. I expected "ninja A B C D" to build everything it can (e.g. A, C and D, even if B fails to compile), but that was a wrong assumption.
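For what it's worth, ninja can be asked to keep going past failures instead of building each target separately; a sketch, assuming a hypothetical out/coverage build directory:

```shell
# ninja's -k N flag keeps building until N jobs have failed;
# -k 0 means "keep going as long as possible" rather than stopping early.
ninja -C out/coverage -k 0 breakpad_unittests content_unittests cc_unittests
```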

After trying to build each target separately, I got failures only for the following tests:

telemetry_unittests
extensions_unittests
blink_platform_unittests
fileutils_unittests


All the rest from c#2 have been built successfully.

Comment 5 by mmoroz@chromium.org, Nov 30 2017

A few more details:

1) telemetry_unittests compiles, but it doesn't produce a binary named "telemetry_unittests": https://cs.chromium.org/chromium/src/chrome/test/BUILD.gn?type=cs&q=telemetry_unittests&sq=package:chromium&l=5242

2) extensions_unittests seems to crash clang: https://paste.googleplex.com/5023383322361856

3) blink_platform_unittests has understandable compilation errors: https://paste.googleplex.com/5150605085507584

4) fileutils_unittests compiles, but it's not a unittests binary: https://cs.chromium.org/chromium/src/third_party/webrtc/test/BUILD.gn?type=cs&q=fileutils_unittests&sq=package:chromium&l=434
Cc: h...@chromium.org dpranke@chromium.org
+cc Dirk, Hans - these unit tests should be running with Clang on many buildbots. Do you see any failures there?

Comment 7 by h...@chromium.org, Nov 30 2017

The tree would be very red if these tests failed to build with Clang in general. I assume it's a problem that occurs only with the "use_clang_coverage = true" build config? That should be easy to confirm.

Comment 8 by mmoroz@chromium.org, Nov 30 2017

Yes, I used the following GN flags:

use_clang_coverage = true
is_component_build = false
is_debug = false
use_goma = true


On the following revision: adb61db19020ed8ecee5e91b1a0ea4c924ae2988 which is r508578 -- branch base commit for beta M63

Please disregard my comment 2, the only valid failures I see now are:

- extensions_unittests seems to crash clang: https://paste.googleplex.com/5023383322361856

- blink_platform_unittests has understandable compilation errors: https://paste.googleplex.com/5150605085507584

Comment 9 by h...@chromium.org, Dec 1 2017

I tried building at HEAD (#520710) and both targets build successfully with the config from #8.

I also tried at the revision in #8 and can reproduce both failures. The Clang crash for extensions_unittests only happens with "use_clang_coverage = true", but the build error for blink_platform_unittests happens without that as well, so that's a bit mysterious.

Is there a reason you want to build with an old revision instead of trunk?

Comment 10 by mmoroz@google.com, Dec 1 2017

>> Is there a reason you want to build with an old revision instead of trunk?

It's only for now. When I started looking for the tests we can run, some of them were failing for me on trunk. This is why I switched to a more stable branch, just to simplify the process of determining the tests to be used.

The plan for coverage generation is to do a sanity check first: run a test without coverage to make sure it's not failing on a given revision. After that, we'll be running a coverage instrumented build on the same revision.
One more question: how can I control the number of processes spawned by unit tests?

There is a --test-launcher-jobs=N flag, but it doesn't seem to have any effect; the number of processes stays the same.

Another option I know of is --single-process-tests, which works, but makes test execution too slow for large test suites.

The problem I'm experiencing with large tests: e.g. components_unittests spawns ~290 processes, and each coverage dump is ~900 MB, so I need at least 300 GB of free space to run it. Another pain point will come when I try to merge those dumps together. It would be great to find some trade-off, e.g. spawning no more than 10 processes or another relatively small number.
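As a back-of-the-envelope check of the disk cost of per-process dumps, using the ~290 processes and ~900 MB figures above (the result is in the same ballpark as the "at least 300 GB" estimate):

```shell
# Estimate total coverage dump size: one dump per spawned test process.
processes=290
dump_mb=900
total_mb=$((processes * dump_mb))
echo "${total_mb} MB (~$((total_mb / 1024)) GB) of free space needed"
```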
--test-launcher-jobs=1 should ensure that no more than one test is running at a time, but I believe the tests are still run in subprocesses in batches. --test-launcher-batch-limit, I think, controls the batch size; I don't know if you can turn batching off completely, or what the impact of that would be, but you could start by setting it to a large value (e.g., 1000 or 10000) and see what happens?
Another issue I'm experiencing with unit tests (especially large ones, e.g. unit_tests) is described here: https://bugs.llvm.org/show_bug.cgi?id=35665

The root cause of that may not necessarily be in the LLVM code, so I'd appreciate it if you could take a look at the bug description linked above and share any thoughts you may have. Thanks!
Regarding the issue above, it seems to be caused by a null-deref happening while running one of the tests.

Folks, can you point me to the bots building and running tests such as unit_tests? I'm having trouble finding "good" revisions, or else I'm building/running things in the wrong way. Taking a look at a bot that does this properly would help me a lot.
FTR, I build tests with the following GN args:

is_debug = false
is_component_build = false
use_clang_coverage = false # true for coverage build
sanitizer_keep_symbols = true # false when I'm not trying to debug any tests

And running tests with:
$ python testing/xvfb.py out/default/unit_tests --test-launcher-jobs=1 --single-process-tests
It seems that "--single-process-tests" breaks things for me. When I run the same test binary without that option, all the tests pass successfully.

However, I cannot run tests that way for code coverage generation, as it spawns thousands of processes and needs terabytes of disk space.
Blockedon: 795314
Cc: mmoroz@chromium.org
Owner: ----
Status: Available (was: Started)
A correction to my c#16. The "--single-process-tests" flag indeed breaks things for me, but only because any CHECK failure or null dereference crashes the test process. Even if I run tests without that flag, there are still various crashes happening, but the whole test binary doesn't crash, as each individual test case gets executed inside a separate process.

With that, I can't build & run tests in a way that makes all the tests pass. I'm definitely doing something wrong, as testbots seem to complete all of the tests successfully almost all the time: https://ci.chromium.org/buildbot/chromium.linux/Linux%20Tests%20%28dbg%29%281%29/?limit=200

I'm unassigning myself to avoid spending even more time on desperate attempts to build and run things properly. I'd greatly appreciate it if someone could help me with clear instructions on how to build and run the tests (for instance, the unit_tests binary) with a success rate similar to the testbot: https://ci.chromium.org/buildbot/chromium.linux/Linux%20Tests%20%28dbg%29%281%29/?limit=200

In the meantime, I'll try to clean up my optimization on the LLVM side and also return to writing some new code for the coverage management system.

Cc: -liaoyuke@chromium.org
Owner: liaoyuke@chromium.org
Discussed with Yuke offline.

TL;DR of the trouble here: I can't successfully run "unit_tests --single-process-tests"; it always crashes on something, even when built without code coverage instrumentation.

I've tried a bunch of different revisions and the following combinations of GN args:

1)

is_debug = false
is_component_build = false
use_clang_coverage = false # true for coverage build
sanitizer_keep_symbols = true # false when I'm not trying to debug any tests


2) 

dcheck_always_on = true
ffmpeg_branding = "Chrome"
is_component_build = false
is_debug = false
proprietary_codecs = true
strip_absolute_paths_from_debug_symbols = true
symbol_level = 1
#use_goma = true

Status: Assigned (was: Available)
Blockedon: 796290
Unfortunately, unit_tests doesn't even compile on Mac with use_clang_coverage=true. I've filed a separate bug to track it, and am working on investigating it.

 crbug.com/796290 
Components: Tools>CodeCoverage
It seems that the %Nm option described here [1] might help with our issue. At least, it does so on a small example that spawns 20 processes. I'm trying it on unit_tests now, fingers crossed that it will produce a correct result.


1: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program
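A sketch of how the %Nm pattern would be used here, assuming a hypothetical out/coverage build and a profraw/ output directory; with %4m, concurrent processes share a pool of at most 4 profile files and merge into them at exit, instead of writing one .profraw per process as with %p:

```shell
# The runtime substitutes %4m with a pool slot shared across processes.
export LLVM_PROFILE_FILE="profraw/unit_tests.%4m.profraw"
out/coverage/unit_tests --test-launcher-jobs=1
# Merge the (few) raw profiles into an indexed profile for reporting.
llvm-profdata merge -sparse profraw/*.profraw -o unit_tests.profdata
```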
I've also tried the following approach to run all tests in single process mode:

all_passed_tests = []
while True:
  # Run the test target with --single-process-tests, skipping the tests in
  # |all_passed_tests| via --gtest_filter.
  success, passed_tests = run_test_target(skip=all_passed_tests)
  if success:
    break
  all_passed_tests.extend(passed_tests)

Basically, the idea is to run test target in single process mode in one child process, and if it fails, we spawn a new child process and resume from the last failed test.

With the unit_tests target, it takes 37 retries, which generates 37 coverage data dumps. With content_unittests, it takes nearly 400 retries, which is unmanageable.

The conclusion is that this approach won't work.
Just tested out the %1m option mentioned in #24 on Mac using url_unittests and crypto_unittests.

Dump size:
  with %p option: 12 * 5M + 12 * 4M = 108M (24 profraw files are generated, 12 for each target).
  with %1m option: 1 * 5M + 1 * 4M = 9M (only 2 profraw files are generated, 1 for each target).

Report generation time:
  with %p option:
    real	0m24.683s
    user	0m55.799s
    sys	        0m9.154s
  With %1m option:
    real	0m22.450s
    user	0m46.538s
    sys	        0m7.581s

(%1m is even a little bit faster than %p)

Correctness:
Please see the attached compare.png file; the two reports are almost the same except for a few lines' difference in base/ and third_party/. Given that it's infeasible to decide which one is "correct", I would conclude that both data sets are correct and make sense.

So it looks like we're able to get the same data with the "%1m" option, which generates manageable coverage data dumps! Thanks Max, this is a great catch!
Attachment: compare.png (211 KB)
Thanks a lot for testing, Yuke! That indeed sounds very promising. And +1: we don't really know which report is more accurate, so either of them is good enough.

I'm running a bunch of tests with %8m now, will merge them and generate a total report after that.
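Merging dumps from several targets and rendering a combined report might look like this sketch (paths hypothetical; llvm-profdata and llvm-cov ship with the Clang toolchain):

```shell
# Merge all raw profiles from the instrumented test runs.
llvm-profdata merge -sparse dumps/*.profraw -o merged.profdata
# Render an HTML report for one of the instrumented binaries.
llvm-cov show out/coverage/unit_tests -instr-profile=merged.profdata \
    -format=html -output-dir=coverage_report
```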


Status: Fixed (was: Assigned)
Ok, this seems to be resolved; I managed to generate a total report from a bunch of different tests: issue 789981.
