Investigate code coverage drop between r565281 and r565413 |
||||
Issue descriptionhttps://chromium-coverage.appspot.com/ REVISION REPORT LINE COVERAGE FUNCTION COVERAGE REGION COVERAGE LOGS 565606 Tests and Fuzzers 71.73% (6858349/9561846) 75.29% (594216/789189) 72.31% (9100536/12585312) Link 565471 Tests and Fuzzers 71.38% (6823616/9559813) 75.00% (591782/789054) 71.83% (9037095/12581754) Link 565413 Tests and Fuzzers 70.97% (6783645/9558023) 74.61% (588619/788911) 71.34% (8972083/12575764) Link 565281 Tests and Fuzzers 72.93% (6967170/9553358) 76.23% (601131/788595) 73.85% (9282559/12569935) Link
,
Jun 10 2018
I am seeing counter overflow issues come back https://chromium-coverage.appspot.com/reports/565637/linux/metadata/_bot.log https://chromium-coverage.appspot.com/reports/565572/linux/metadata/_bot.log and now i saw all -0001 to -0005 stuck on build step, going crazy slow didn't complete in 24 hrs. will try different things on -0001 to see if anything makes a difference.
,
Jun 10 2018
revert https://chrome-internal-review.googlesource.com/c/chrome/tools/code-coverage/+/637829 on -0001 and restarting made it complete build step in 4 hrs. So, trying -0002 with https://chrome-internal-review.googlesource.com/c/chrome/tools/code-coverage/+/637829 reverted and -0003 with https://chrome-internal-review.googlesource.com/c/chrome/tools/code-coverage/+/637829 and rebooting at same time to see if -0002 works and -0003 does not work. need to wait a day on -0001 to verify the coverage report finishes.
,
Jun 11 2018
Weird... :( Thanks for reverting, Abhishek!
,
Jun 11 2018
with use_sanitizer_coverage=false, something is stuck on linking. -0003 shows that app_shell_unittests completed. -0002 seems to stuck on app_shell_unittests PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10072 coverag+ 20 0 15.507g 0.013t 9.509g S 193.8 11.0 14:33.87 ld.lld will check tmrw if -0001 and -0002 completed. also, will try locally if app_shell_unittests hang reproduces.
,
Jun 11 2018
Thanks! I've filed a separate issue 851337 for that problem, as it seems to be unrelated to the coverage drop mentioned in c#0. Those reports were generated before I've modified the build scripts with use_sanitizer_coverage=false.
,
Jun 11 2018
Verified locally, so will revert for now. No rush on getting this optimization in, lets see if things stabilize again. I will restart -0003 to -0005 today.
,
Jun 11 2018
Thanks! +1, let's see what's up with coverage numbers before fixing / re-enabling.
,
Jun 11 2018
This is still happening and report from -0001 got quarantined. See https://chromium-coverage.appspot.com/reports/565910/linux/chromium/src/report.html
,
Jun 11 2018
/chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZN4base6subtle12Acquire_LoadEPVKl: Counter overflow /chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE9__is_longEv: Counter overflow /chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZNKSt3__117__compressed_pairINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE5__repES5_E5firstEv: Counter overflow /chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZNSt3__116__to_raw_pointerIKcEEPT_S3_: Counter overflow /chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZNKSt3__122__compressed_pair_elemINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE5__repELi0ELb0EE5__getEv: Counter overflow /chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZN8tcmalloc7MaskPtrEPv: Counter overflow /chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE13__get_pointerEv: Counter overflow /chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZNKSt3__122__compressed_pair_elemINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE5__repELi0ELb0EE5__getEv: Counter overflow /chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZN4base6subtle12Acquire_LoadEPVKl: Counter overflow /chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE9__is_longEv: Counter overflow /chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZN8tcmalloc7MaskPtrEPv: Counter overflow /chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZNKSt3__117__compressed_pairINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE5__repES5_E5firstEv: Counter overflow /chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZNSt3__116__to_raw_pointerIKcEEPT_S3_: Counter overflow /chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE13__get_pointerEv: Counter overflow warning: /chromium/src/dumps/swiftshader_unittests/dump.profdata: Counter overflow from https://chromium-coverage.appspot.com/reports/565910/linux/metadata/_bot.log
,
Jun 11 2018
Similar to those that you mentioned in c#2.
,
Jun 11 2018
Weird that those targets do not have the warnings themselves, only when we merge everything together: - https://chromium-coverage.appspot.com/reports/565910/linux/metadata/services_unittests.log - https://chromium-coverage.appspot.com/reports/565910/linux/metadata/interactive_ui_tests.log
,
Jun 11 2018
Looks like again we have the same functions defined in multiple places, e.g. base::subtle::Acquire_Load has its definition in base/atomicops.h: https://cs.chromium.org/chromium/src/base/atomicops.h?q=base::subtle::Acquire_Load&sq=package:chromium&l=118&dr=Ss and a bunch of implementations in in tcmalloc: https://cs.chromium.org/search/?q=inline+Atomic32+Acquire_Load%5C(volatile+const+Atomic32+%5C*ptr%5C)&type=cs
,
Jun 11 2018
tcmalloc::MaskPtr(void* p) seems to be a better example: https://cs.chromium.org/search/?q=tcmalloc::MaskPtr&sq=package:chromium&type=cs
,
Jun 11 2018
There is a bunch of stuff duplicated in the code coverage dumps produced by browser_tests binary. Feels like some things are compiled and linked in multiple times. See demangled.txt for a list of functions.
,
Jun 12 2018
I still don't understand why these overflow occur. It seems to be different from what we've seen in the past. Last time, we could clearly see duplicated counters for the same symbol as the symbol was also duplicated in the binary: https://bugs.chromium.org/p/chromium/issues/detail?id=801362#c12 This time, I don't see any duplicated counters, e.g.: $ for i in dumps/browser_tests/*.profraw; do cmd="./llvm-profdata show -function=_ZN10MallocHook13InvokeNewHookEPKvm $i"; echo -e "\t\tCMD: $cmd"; bash -c "$cmd"; echo "------------------------------------------------"; done CMD: ./llvm-profdata show -function=_ZN10MallocHook13InvokeNewHookEPKvm dumps/browser_tests/dump.13058978560518452138_0.profraw Counters: _ZN10MallocHook13InvokeNewHookEPKvm: Hash: 0x000029a4912a0491 Counters: 3 Function count: 16628957309809515209 Instrumentation level: Front-end Functions shown: 1 Total functions: 2757344 Maximum function count: 18445607871915869446 Maximum internal block count: 18440217365707751424 ------------------------------------------------ CMD: ./llvm-profdata show -function=_ZN10MallocHook13InvokeNewHookEPKvm dumps/browser_tests/dump.13058978560518452138_1.profraw Counters: _ZN10MallocHook13InvokeNewHookEPKvm: Hash: 0x000029a4912a0491 Counters: 3 Function count: 4431705967091179856 Instrumentation level: Front-end Functions shown: 1 Total functions: 2757344 Maximum function count: 16843187341854785896 Maximum internal block count: 16848097578119396637 ------------------------------------------------ OR $ for i in dumps/browser_tests/*.profraw; do cmd="./llvm-profdata show -function=vdbeFreeOpArray $i"; echo -e "\t\tCMD: $cmd"; bash -c "$cmd"; echo "------------------------------------------------"; done CMD: ./llvm-profdata show -function=vdbeFreeOpArray dumps/browser_tests/dump.13058978560518452138_0.profraw Counters: sqlite3.c:vdbeFreeOpArray: Hash: 0x029211e29d491451 Counters: 4 Function count: 9020269409725594203 Instrumentation level: Front-end Functions shown: 1 Total functions: 2757344 Maximum function count: 18445607871915869446 Maximum internal block count: 18440217365707751424 ------------------------------------------------ CMD: ./llvm-profdata show -function=vdbeFreeOpArray dumps/browser_tests/dump.13058978560518452138_1.profraw Counters: sqlite3.c:vdbeFreeOpArray: Hash: 0x029211e29d491451 Counters: 4 Function count: 3224260846701248 Instrumentation level: Front-end Functions shown: 1 Total functions: 2757344 Maximum function count: 16843187341854785896 Maximum internal block count: 16848097578119396637 ------------------------------------------------
,
Jun 12 2018
I'd expect that duplication occurs when processes merge counters from their memory into corresponding files on disk, but that also doesn't seem to be the case. I've ran 22 testcases with "%p", and noticed weird counters for the affected functions. The values are either 0, 1, 514, 515, 516 for -function=vdbeFreeOpArray. All of that seems like: A) processes use shared memory for counters, so they update values of each other somehow B) counters are not getting initialized for some processes, and contains values from previous processes
,
Jun 12 2018
There wasn't a clang roll in 565281:565413, so the culprit must be on Chromium side: https://chromium.googlesource.com/chromium/src/+log/0ae167265d4..3d7f15c742e
,
Jun 12 2018
https://chromium-coverage.appspot.com/reports/566169/linux/metadata/_bot.log - recent bad report has no overflow warnings while merging into profdata, but yes browser_tests merging shows the same issue https://chromium-coverage.appspot.com/reports/566169/linux/metadata/browser_tests.log warning: dumps/browser_tests/dump.15835948910903662433_1.profraw: Counter overflow warning: dumps/browser_tests/dump.15835948910903662433_2.profraw: Counter overflow
,
Jun 12 2018
Interesting! I'm still trying to understand where such large counters come from in browser_tests. Moving slow, as I have to patch compiler-rt, re-build it, re-link browser_tests, and then run at least a few testcases to get some debug info.
,
Jun 13 2018
It's definitely something new... I've been dumping counters when running with %1m, and the values looked good until some point. They were growing on a reasonable delta (0 - 516) per testcase, but at some point they doubled: ... In-File Counter [pid: 14292] @ 7f86491c1778 holding Value: 217757 SrcDataStart: 1655281 In-File Counter [pid: 14292] @ 7f86491c1780 holding Value: 217757 In-File Counter [pid: 14292] @ 7f863902a778 holding Value: 217757 In-File Counter [pid: 14292] @ 7f86491c1788 holding Value: 3034453 In-File Counter [pid: 14292] @ 7f86491c1790 holding Value: 419986 In-File Counter [pid: 14292] @ 7f863902a788 holding Value: 3034453 In-File Counter [pid: 14292] @ 7f863902a790 holding Value: 419986 Index 0 of 4 In-Memory Counter [pid: 8703] @ 2fb6c510 holding Value: 515 Index 1 of 4 In-Memory Counter [pid: 8703] @ 2fb6c518 holding Value: 515 Index 2 of 4 In-Memory Counter [pid: 8703] @ 2fb6c520 holding Value: 7185 Index 3 of 4 In-Memory Counter [pid: 8703] @ 2fb6c528 holding Value: 994 In-File SrcCountersStart: 7f23e3b9fa40, SrcData->CounterPtr: 2fb6c510, Header->CountersDelta: 2e5107d8 In-File Calculation [pid: 8703], Counters: 7f23e51fb778 In-Sanity-File SrcData: 7f23e092cd70, DstData: 3567e488, SrcData - SrcDataStart: 1655281 In-File Counter [pid: 8703] @ 7f23e51fb778 holding Value: 435514 In-File Counter [pid: 8703] @ 7f23e51fb780 holding Value: 435514 In-File Counter [pid: 8703] @ 7f23e51fb788 holding Value: 6068906 In-File Counter [pid: 8703] @ 7f23e51fb790 holding Value: 839972 ... From that point on, though, it's been growing with a reasonable speed again. Until it suddenly doubled once again, and again: ... Index 0 of 4 In-Memory Counter [pid: 103080] @ 2fb6c510 holding Value: 96 Index 1 of 4 In-Memory Counter [pid: 103080] @ 2fb6c518 holding Value: 96 Index 2 of 4 In-Memory Counter [pid: 103080] @ 2fb6c520 holding Value: 620 Index 3 of 4 In-Memory Counter [pid: 103080] @ 2fb6c528 holding Value: 40 In-File SrcCountersStart: 7f98d42d8a40, SrcData->CounterPtr: 2fb6c510, Header->CountersDelta: 2e5107d8 In-File Calculation [pid: 103080], Counters: 7f98d5934778 In-Sanity-File SrcData: 7f98d1065d70, DstData: 3567e488, SrcData - SrcDataStart: 1655281 In-File Counter [pid: 103080] @ 7f98d5934778 holding Value: 773352 In-File Counter [pid: 103080] @ 7f98d5934780 holding Value: 773354 In-File Counter [pid: 103080] @ 7f98d5934788 holding Value: 10874925 In-File Counter [pid: 103080] @ 7f98d5934790 holding Value: 1512757 Index 0 of 4 In-Memory Counter [pid: 103080] @ 2fb6c510 holding Value: 773448 Index 1 of 4 In-Memory Counter [pid: 103080] @ 2fb6c518 holding Value: 773450 Index 2 of 4 In-Memory Counter [pid: 103080] @ 2fb6c520 holding Value: 10875545 Index 3 of 4 In-Memory Counter [pid: 103080] @ 2fb6c528 holding Value: 1512797 In-File SrcCountersStart: 7f98d42d8a40, SrcData->CounterPtr: 2fb6c510, Header->CountersDelta: 2e5107d8 In-File Calculation [pid: 103080], Counters: 7f98d5934778 In-Sanity-File SrcData: 7f98d1065d70, DstData: 3567e488, SrcData - SrcDataStart: 1655281 In-File Counter [pid: 103080] @ 7f98d5934778 holding Value: 773448 In-File Counter [pid: 103080] @ 7f98d5934780 holding Value: 773450 In-File Counter [pid: 103080] @ 7f98d5934788 holding Value: 10875545 In-File Counter [pid: 103080] @ 7f98d5934790 holding Value: 1512797 Index 0 of 4 In-Memory Counter [pid: 103995] @ 2fb6c510 holding Value: 0 Index 1 of 4 In-Memory Counter [pid: 103995] @ 2fb6c518 holding Value: 0 Index 2 of 4 In-Memory Counter [pid: 103995] @ 2fb6c520 holding Value: 0 Index 3 of 4 In-Memory Counter [pid: 103995] @ 2fb6c528 holding Value: 0 In-File SrcCountersStart: 7f1605f9ea40, SrcData->CounterPtr: 2fb6c510, Header->CountersDelta: 2e5107d8 In-File Calculation [pid: 103995], Counters: 7f16075fa778 In-Sanity-File SrcData: 7f1602d2bd70, DstData: 3567e488, SrcData - SrcDataStart: 1655281 In-File Counter [pid: 103995] @ 7f16075fa778 holding Value: 1546896 In-File Counter [pid: 103995] @ 7f16075fa780 holding Value: 1546900 In-File Counter [pid: 103995] @ 7f16075fa788 holding Value: 21751090 In-File Counter [pid: 103995] @ 7f16075fa790 holding Value: 3025594 Index 0 of 4 In-Memory Counter [pid: 103995] @ 2fb6c510 holding Value: 1546896 Index 1 of 4 In-Memory Counter [pid: 103995] @ 2fb6c518 holding Value: 1546900 Index 2 of 4 In-Memory Counter [pid: 103995] @ 2fb6c520 holding Value: 21751090 Index 3 of 4 In-Memory Counter [pid: 103995] @ 2fb6c528 holding Value: 3025594 In-File SrcCountersStart: 7f1605f9ea40, SrcData->CounterPtr: 2fb6c510, Header->CountersDelta: 2e5107d8 In-File Calculation [pid: 103995], Counters: 7f16075fa778 In-Sanity-File SrcData: 7f1602d2bd70, DstData: 3567e488, SrcData - SrcDataStart: 1655281 In-File Counter [pid: 103995] @ 7f16075fa778 holding Value: 1546896 In-File Counter [pid: 103995] @ 7f16075fa780 holding Value: 1546900 In-File Counter [pid: 103995] @ 7f16075fa788 holding Value: 21751090 In-File Counter [pid: 103995] @ 7f16075fa790 holding Value: 3025594 ... This occurrence is more interesting, as I see that merge code gets executed twice. That could explain why numbers get duplicated.
,
Jun 13 2018
I have a fix idea, fingers crossed...
,
Jun 13 2018
The fix seems to work, but it revealed another problem. From time to time the following issue is happening: BrowserTestBase received signal: Terminated. Backtrace: #0 0x000018672630 base::debug::StackTrace::StackTrace() #1 0x00001a4e28fe content::(anonymous namespace)::DumpStackTraceSignalHandler() #2 0x7fc8dcf76030 <unknown> #3 0x7fc8e3ae62b0 do_fcntl #4 0x7fc8e3ae63a2 __libc_fcntl #5 0x00002e466895 lprofOpenFileEx #6 0x00002e4649ec __llvm_profile_write_file #7 0x7fc8dcf78910 <unknown> #8 0x7fc8dcf7896a exit #9 0x7fc8dcf632b8 __libc_start_main #10 0x0000088c202a __llvm_coverage_mapping Parent failed to complete crash dump. Parent failed to complete crash dump. Parent failed to complete crash dump. It doesn't seem to happen too often on the bot, e.g. I see only one occurrence in https://chromium-coverage.appspot.com/reports/566169/linux/metadata/browser_tests.log, but it's still very bad as it may corrupt the dump file and/or reset the counters in there. I'm testing with %8m now locally (as we do on the bot), but am also looking into ways to prevent SIGTERM from killing the dumping process.
,
Jun 13 2018
Test processes are getting killed with SIGTERM
case SIGTERM:
buffer = "Terminated";
break;
while __llvm_profile_write_file has guard only against SIGKILL.
I've got another idea though, will try as well.
,
Jun 13 2018
,
Jun 13 2018
,
Jun 13 2018
Filed issue 852469 for c#25 and c#26, as I'm not sure if it's a real problem or not.
,
Jun 13 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/611672fbe0b7e44d243692e54e011d48837d111f commit 611672fbe0b7e44d243692e54e011d48837d111f Author: Max Moroz <mmoroz@chromium.org> Date: Wed Jun 13 21:05:51 2018 Code Coverage: Fix concurrent code coverage dumps from content/child processes. Duplication of code coverage counters is bad, as the values grow exponentially, which ends up with the counters being overflown when running large test suites (e.g. browser_tests). Counter Overflow not only causes incorrect coverage to be reported, but also affects the global coverage merge process, which leads to flaky and lower than usual code coverage numbers to be reported. This CL fixes two issues: 1) __llvm_profile_write_file(), which has been used until now, can duplicate code coverage counters when it is called twice or more times. Instead, we should call __llvm_profile_dump() which will not dump the same counters more than once. 2) even though __llvm_profile_dump() prevents us from possible duplications, it is not thread safe, as it is supposed to be called from atexit() handler. Since we have to call it ourselves from various places, and we already do call it from two different sites at least, we still can get duplicated code coverage counters. Thus, we have to ensure thread safety ourselves. Bug: 851230 Change-Id: Ib3bebc8173a8be9b5622f2449da05afad77d0a11 Reviewed-on: https://chromium-review.googlesource.com/1099336 Commit-Queue: Max Moroz <mmoroz@chromium.org> Reviewed-by: Antoine Labour <piman@chromium.org> Cr-Commit-Position: refs/heads/master@{#566983} [modify] https://crrev.com/611672fbe0b7e44d243692e54e011d48837d111f/content/child/child_thread_impl.cc
,
Jun 14 2018
The most recent report looks good: https://chromium-coverage.appspot.com/reports/567015/linux/metadata/browser_tests.log I think we can declare a victory here, but we'll keep an eye on the further reports getting published. |
||||
►
Sign in to add a comment |
||||
Comment 1 by mmoroz@chromium.org
, Jun 9 2018