
Issue 801387

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 801362

Blocking:
issue 759794




Coverage data differs with %p and %1m option

Project Member Reported by liaoyuke@chromium.org, Jan 12 2018

Issue description

The context is from comment #26 in crbug.com/789733; we need to look into why the data differs. The following is a copy of that comment:

Just tested out the %1m option mentioned in #24 on Mac using url_unittests and crypto_unittests.

Dump size:
  with %p option: 12 * 5M + 12 * 4M = 108M (24 profraw files are generated, 12 for each target).
  with %1m option: 1 * 5M + 1 * 4M = 9M (only 2 profraw files are generated, 1 for each target).

Report generation time:
  with %p option:
    real	0m24.683s
    user	0m55.799s
    sys	        0m9.154s
  With %1m option:
    real	0m22.450s
    user	0m46.538s
    sys	        0m7.581s

(%1m is even a little bit faster than %p)

Correctness:
Please see the attached compare.png file: the two reports are almost the same, except for a few lines of difference in base/ and third_party/. Given that it's infeasible to decide which one is "correct", I would conclude that both sets of data are correct and make sense.

So, it looks like we're able to get the same data with the "%1m" option, which generates manageable coverage data dumps! Thanks Max, this is a great catch!
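For reference, a minimal sketch of the two dump-path patterns being compared (the file names here are hypothetical, not the ones used above):

$ # %p expands to the PID, so every test process writes its own dump:
$ export LLVM_PROFILE_FILE="out/coverage/unittests.%p.profraw"
$ # %1m keeps a pool of 1 file per binary; processes merge into it online:
$ export LLVM_PROFILE_FILE="out/coverage/unittests.%1m.profraw"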
 
compare.png
211 KB View Download
Cc: sadrul@chromium.org
see also https://chromium-review.googlesource.com/956930
Labels: -Pri-3 Pri-1
Owner: liaoyuke@chromium.org
Status: Assigned (was: Available)
Yuke, could you please check what exactly differs in the case of viz_unittests? I'm curious whether %1m, %2m and %4m would show different results.

Feel free to assign back to me, if you don't have time, I can take a look next week.
This happens with other targets too, e.g. base_unittests.
Blocking: 759794
Sadrul, thank you for trying the tool and for reporting that bug. The differences we've seen previously were fairly insignificant (e.g. the screenshot in c#0), but since you've noticed a worse case, there might be a bug in the merge pool in LLVM or somewhere else. We'll investigate that.
sadrul@, do you mind sharing the exact gn args and test command you used?
Status: Started (was: Assigned)

Comment 8 by sadrul@chromium.org, Mar 10 2018

gn args (on linux):
  is_component_build = false
  is_asan = false
  use_goma = true
  is_debug = false
  dcheck_always_on = true
  use_libfuzzer = true
  use_clang_coverage = true

Comment 9 by sadrul@chromium.org, Mar 10 2018

One example with viz_unittests for -f components/viz:
. Results with %4m: http://springfield1.wat.corp.google.com/stuff/viz_unittests/with-4m/file_view_index.html
. Results with %p: http://springfield1.wat.corp.google.com/stuff/viz_unittests/with-p/file_view_index.html

With %4m, see, for example, that the coverage for components/viz/common/gpu/context_cache_controller_unittest.cc is reported as 0%, although if you look at the logs (http://springfield1.wat.corp.google.com/stuff/viz_unittests/with-4m/viz_unittests_output.txt), the tests in that file did run successfully.

With %p, the report shows 100% coverage for that file.

Note also that with %4m, the output report can differ between consecutive runs of the same test executable.
Thanks sadrul@. Weird, I can't reproduce it. At least not when I do all the steps manually:


mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ cat out/coverage/args.gn 
use_clang_coverage = true
ffmpeg_branding = "ChromeOS"
is_component_build = false
is_debug = false
proprietary_codecs = true
strip_absolute_paths_from_debug_symbols = true
use_libfuzzer = true
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ ninja -C out/coverage -j64 viz_unittests
ninja: Entering directory `out/coverage'
[70/70] LINK ./viz_unittests
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ mkdir viz_p
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ mkdir viz_m
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ nano run_chrome_unittests.sh 
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ export LLVM_PROFILE_FILE="viz_p/viz.%p.profraw"
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ time python testing/xvfb.py out/coverage/viz_unittests --test-launcher-jobs=1 > viz_p/log.txt

real	6m49.563s
user	6m14.908s
sys	0m21.676s
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ export LLVM_PROFILE_FILE="viz_m/viz.%m.profraw"
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ time python testing/xvfb.py out/coverage/viz_unittests --test-launcher-jobs=1 > viz_m/log.txt

real	6m44.926s
user	6m15.724s
sys	0m16.588s
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ third_party/llvm-build/Release+Asserts/bin/llvm-profdata merge -sparse viz_p/*.profraw -o viz_p/dump.profdata
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ third_party/llvm-build/Release+Asserts/bin/llvm-profdata merge -sparse viz_m/*.profraw -o viz_m/dump.profdata
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ ls viz_p/ | wc -l
192
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ ls viz_m/ | wc -l
4
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ ls -l viz_*/dump.profdata
-rw-rw-r-- 1 mmoroz mmoroz 17590688 Mar 10 04:13 viz_m/dump.profdata
-rw-rw-r-- 1 mmoroz mmoroz 17590688 Mar 10 04:13 viz_p/dump.profdata
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ time third_party/llvm-build/Release+Asserts/bin/llvm-cov show -format html -o viz_p_report -instr-profile viz_p/dump.profdata out/coverage/viz_unittests components/viz

real	0m3.388s
user	0m30.268s
sys	0m0.448s
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ time third_party/llvm-build/Release+Asserts/bin/llvm-cov show -format html -o viz_m_report -instr-profile viz_m/dump.profdata out/coverage/viz_unittests components/viz

real	0m3.293s
user	0m30.596s
sys	0m0.444s
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ zip -rq viz_p_report.zip viz_p_report
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ zip -rq viz_m_report.zip viz_m_report
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ ls -l viz*.zip
-rw-rw-r-- 1 mmoroz mmoroz 2002744 Mar 10 04:17 viz_m_report.zip
-rw-rw-r-- 1 mmoroz mmoroz 2002743 Mar 10 04:16 viz_p_report.zip
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ ls -l viz_m_report
total 112
drwxrwx--- 3 mmoroz mmoroz     60 Mar 10 04:16 coverage
-rw-rw-r-- 1 mmoroz mmoroz 108326 Mar 10 04:16 index.html
-rw-rw-r-- 1 mmoroz mmoroz   2519 Mar 10 04:16 style.css
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ ls -l viz_p_report
total 112
drwxrwx--- 3 mmoroz mmoroz     60 Mar 10 04:16 coverage
-rw-rw-r-- 1 mmoroz mmoroz 108326 Mar 10 04:16 index.html
-rw-rw-r-- 1 mmoroz mmoroz   2519 Mar 10 04:16 style.css
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ sha1sum viz_*report/index.html
d725b4fca2f7abf9a6356ff23ccca0ca80823853  viz_m_report/index.html
d725b4fca2f7abf9a6356ff23ccca0ca80823853  viz_p_report/index.html
mmoroz@code-coverage-wild:/mnt/ram-disk/chromium/src$ git log -n 1
commit 470ff7ff522a1613d567a6aad3e66c1fb7aeb008
Author: Erik Chen <erikchen@chromium.org>
Date:   Sat Mar 10 01:39:24 2018 +0000

    Tentative fix for LoginHandler leak.
    
    The ownership semantics for LoginHandler are very confusing. The class is
    self-owned, and is only destroyed when LoginHandler::ReleaseSoon() is called.
    But this relies on the assumption that the LoginHandlerView is shown, which isn't
    always the case. This CL adds tracking for whether the LoginHandlerView has been
    shown, and if it has never been shown but CancelAuth() is called, then
    LoginHandler::ReleaseSoon() will also be called.
    
    This relies on the assumption that all codes paths must either show the
    LoginHandlerView or else call CancelAuth().
    
    Ideally, the whole class should be refactored to have saner ownership
    semantics.
    
    Bug:  814334 
    Change-Id: I28085ca948a5629f11e6263153938e5c87f42518
    Reviewed-on: https://chromium-review.googlesource.com/957283
    Reviewed-by: Mustafa Emre Acer <meacer@chromium.org>
    Commit-Queue: Erik Chen <erikchen@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#542305}



viz_p_report.zip
1.9 MB Download
viz_m_report.zip
1.9 MB Download
Yuke, is it possible that we don't pass all *.profraw paths to the llvm-profdata merge command https://cs.chromium.org/chromium/src/tools/code_coverage/coverage.py?q=coverage.py&sq=package:chromium&l=917 ?
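(One quick way to cross-check, with hypothetical paths: count the dumps on disk and compare that against the inputs passed to the merge step, e.g.:

$ ls out/coverage/*.profraw | wc -l
$ third_party/llvm-build/Release+Asserts/bin/llvm-profdata merge -sparse out/coverage/*.profraw -o out/coverage/all.profdata
)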
The index.html does not actually have any real content. Does sha1sum on file_view_index.html instead show a difference?

Also, I am hoping we won't need to require --test-launcher-jobs=1

(btw: I appreciate you taking the time to respond while you are ooo :) Thank you!)
sadrul@, in my case index.html does have real content, as I used the llvm-cov and llvm-profdata tools manually (see all steps in c#10).

I also tried using the script:

$ tools/code_coverage/coverage.py viz_unittests -b out/coverage -o viz_report_via_script_with_p -c 'python testing/xvfb.py out/coverage/viz_unittests --test-launcher-jobs=1' -f components/viz/

and

$ tools/code_coverage/coverage.py viz_unittests -b out/coverage -o viz_report_via_script_as_is -c 'python testing/xvfb.py out/coverage/viz_unittests --test-launcher-jobs=1' -f components/viz/


and obtained the same results again (reports attached):

$ sha1sum viz_report_via_script_*/file_view_index.html
426f7b252f33aa076d28c4cac6d7781884b792a9  viz_report_via_script_as_is/file_view_index.html
426f7b252f33aa076d28c4cac6d7781884b792a9  viz_report_via_script_with_p/file_view_index.html


Finally, I removed the --test-launcher-jobs=1 argument and used the script again:

$ tools/code_coverage/coverage.py viz_unittests -b out/coverage -o viz_report_via_script_as_is_2 -c 'python testing/xvfb.py out/coverage/viz_unittests' -f components/viz/

$ tools/code_coverage/coverage.py viz_unittests -b out/coverage -o viz_report_via_script_with_p_2 -c 'python testing/xvfb.py out/coverage/viz_unittests' -f components/viz/


And obtained the same reports once again (attached with the _2 suffix in the archive names).

$ sha1sum viz_report_via_script_*_2/file_view_index.html
426f7b252f33aa076d28c4cac6d7781884b792a9  viz_report_via_script_as_is_2/file_view_index.html
426f7b252f33aa076d28c4cac6d7781884b792a9  viz_report_via_script_with_p_2/file_view_index.html

At this point I'm getting confused, as I can't reproduce the mismatch you've got.
viz_report_via_script_as_is.zip
6.5 MB Download
viz_report_via_script_with_p.zip
6.5 MB Download
viz_report_via_script_as_is_2.zip
6.5 MB Download
viz_report_via_script_with_p_2.zip
6.5 MB Download
FTR, the arguments I used:

$ cat out/coverage/args.gn 
use_clang_coverage = true
ffmpeg_branding = "ChromeOS"
is_component_build = false
is_debug = false
proprietary_codecs = true
strip_absolute_paths_from_debug_symbols = true
use_libfuzzer = true

I am pretty confused too. From the OP, at least liaoyuke@ also did get different results with %1m and %p. So maybe he will have better luck repro-ing?
btw: As noted in comment #9, running with %4m can give different results. I tried the following, with %4m:

~/devel/c/src$ for ((i=0;i<10;++i)); do python tools/code_coverage/coverage.py viz_unittests -b out/lin -o out/report-viz-4m -c 'python testing/xvfb.py out/lin/viz_unittests' -f components/viz >/dev/null 2>/dev/null; sha1sum out/report-viz-4m/file_view_index.html; rm -rf out/report-viz-4m; done
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-4m/file_view_index.html
55ac6ded7e78d7c17f3a97b233b2e335aa053956  out/report-viz-4m/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-4m/file_view_index.html
dffece49087ce3c6ee3b89c618bdb08cf47c3406  out/report-viz-4m/file_view_index.html
959fb7fcc3c223ff31126cae5f9c8c1d72fc123c  out/report-viz-4m/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-4m/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-4m/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-4m/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-4m/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-4m/file_view_index.html

and with %p:

~/devel/c/src$ for ((i=0;i<10;++i)); do python tools/code_coverage/coverage.py viz_unittests -b out/lin -o out/report-viz-p -c 'python testing/xvfb.py out/lin/viz_unittests' -f components/viz >/dev/null 2>/dev/null; sha1sum out/report-viz-p/file_view_index.html; rm -rf out/report-viz-p; done
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-p/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-p/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-p/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-p/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-p/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-p/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-p/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-p/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-p/file_view_index.html
d431eb57f79401de955443231e431a4e74d15212  out/report-viz-p/file_view_index.html

I think I'm able to repro the issue sadrul@ mentioned: using %4m, the total coverage of components/viz is around 65%; however, with %p, the total coverage increases to around 90%. I'll take a closer look at this today.
And I can also confirm that %4m gives different results, though the difference is not significant.
This is so weird, and I'm getting quite confused. I only repro'ed once, and now it's gone; %4m now always generates the same results as %p.
hmmmm, sadrul@, in #16, out of your 10 runs with the %4m option, 7 runs generated the same results as %p, so I'm guessing this is a flake. I'll try to figure out what exactly caused the difference.
Right. I think it's a timing issue, depending on how the test subprocesses are launched/managed by the test launcher, and ... maybe one subprocess somehow manages to overwrite the content from another subprocess, or something like that.
I couldn't find anything interesting on the Chromium side; I'll dig into llvm-cov.
I think the first thing we need to figure out is which step went wrong: profraw generation or profraw merge.

mmoroz@, do you happen to know any tool to inspect the profraw files?
Yuke, I don't know if there is anything except llvm-profdata. There seems to be an "llvm-profdata show" command (https://llvm.org/docs/CommandGuide/llvm-profdata.html#id4) which may give some insight, I guess.
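For example, something like this should dump per-function counter values from a raw profile (the file name here is hypothetical):

$ third_party/llvm-build/Release+Asserts/bin/llvm-profdata show -all-functions -counts viz_m/viz.1234_0.profraw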

Other than that, I feel that the root cause is either c#21 or some issue with the merge pool. I haven't looked at its implementation yet, but my current hypothesis is that sometimes a process may be locked by the merge pool (i.e. waiting for a .profraw file to become available for writing) for too long, thus getting killed by the test launcher or something else.

I'll take a look tomorrow or later this week.
Re c#23: Yuke, I'm quite confident that "profdata merge" works well, because:

a) it's a relatively simple operation

b) we do "profdata merge" for 250+ fuzz targets and the results are consistent over time


Yes, I agree with the assessment. I tested on the iOS platform, which runs tests in a single process, and the results are also reliable.

I don't think any process was killed by the test launcher, as the output file shows that all tests passed; it's more likely that there is a race that gets the data overwritten.

I'm making very slow progress investigating, as I'm still ramping up on LLVM, but I'll keep working on it. Max, feel free to take this bug when you have time and feel like doing some investigation.
Owner: mmoroz@chromium.org
I guess I can steal this, since Yuke got quite busy with other issues and I'm back to work now.
I'm adding some instrumentation to the part of the LLVM profiler that does the merge when the %Nm specifier is used; we'll see whether it works well or not.
If it's too hard to fix, one workaround we can do is:

Use %9m and set --test-launcher-jobs=9.

I tested locally: tests run as fast as they do without --test-launcher-jobs, and the generated coverage results are correct and stable across 40 runs. 9 intermediate dumps take more disk space than 4, but I think it's acceptable, and given that we delete them after "merge", they won't accumulate.
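A minimal sketch of that workaround (the paths here are hypothetical):

$ export LLVM_PROFILE_FILE="out/coverage/viz.%9m.profraw"
$ python testing/xvfb.py out/coverage/viz_unittests --test-launcher-jobs=9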
Blockedon: 801362
Yuke and I have just noticed that an incorrect result happens in cases where, during the "llvm-profdata merge" step, there are warnings like the following:

...
[2018-03-15 11:21:13,498 INFO] Creating the coverage profile data file
out/2_report-viz-4m_7/viz_unittests.6671870263282226682_0.profraw: pthread_equal: Counter overflow
out/2_report-viz-4m_7/viz_unittests.6671870263282226682_1.profraw: pthread_equal: Counter overflow
out/2_report-viz-4m_7/viz_unittests.6671870263282226682_1.profraw: pthread_equal: Counter overflow
warning: out/2_report-viz-4m_7/viz_unittests.15835931327307552609_2.profraw: Counter overflow
[2018-03-15 11:21:18,341 INFO] Code coverage profile data is created as: out/2_report-viz-4m_7/coverage.profdata
...

I'm also seeing those quite often when running larger unittests.

This particular case is a good one because sometimes none of the .profraw files in the pool has an overflow, and sometimes some of them may have one. That happens because, across different runs, child processes are distributed across the .profraw files differently. So, using a bigger pool as suggested in c#29 may reduce the risk of this occurring, but we would still see the problem, especially with larger tests.
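One way to spot the affected runs (paths hypothetical) is to capture stderr from the merge step and look for the warnings quoted above:

$ third_party/llvm-build/Release+Asserts/bin/llvm-profdata merge -sparse out/report/*.profraw -o out/report/coverage.profdata 2>&1 | grep -i 'counter overflow'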

We have a bug open for the counter overflow warning; I'm jumping on it now.
Status: Fixed (was: Started)
