New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 851230 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

Investigate code coverage drop between r565281 and r565413

Project Member Reported by mmoroz@chromium.org, Jun 9 2018

Issue description

https://chromium-coverage.appspot.com/

REVISION	REPORT	LINE COVERAGE	FUNCTION COVERAGE	REGION COVERAGE	LOGS
565606	Tests and Fuzzers	 71.73% (6858349/9561846)	 75.29% (594216/789189)	 72.31% (9100536/12585312)	Link
565471	Tests and Fuzzers	 71.38% (6823616/9559813)	 75.00% (591782/789054)	 71.83% (9037095/12581754)	Link
565413	Tests and Fuzzers	 70.97% (6783645/9558023)	 74.61% (588619/788911)	 71.34% (8972083/12575764)	Link
565281	Tests and Fuzzers	 72.93% (6967170/9553358)	 76.23% (601131/788595)	 73.85% (9282559/12569935)	Link
 
I would suspect that this could be caused by my recent changes ( issue 850742 ), but apparently it wasn't, e.g. ctrl+F "gn gen" in https://chromium-coverage.appspot.com/reports/565413/linux/metadata/_bot.log demonstrates that old combination of GN flags was used.
I am seeing counter overflow issues come back
https://chromium-coverage.appspot.com/reports/565637/linux/metadata/_bot.log
https://chromium-coverage.appspot.com/reports/565572/linux/metadata/_bot.log

and now i saw all -0001 to -0005 stuck on build step, going crazy slow didn't complete in 24 hrs. will try different things on -0001 to see if anything makes a difference.
revert https://chrome-internal-review.googlesource.com/c/chrome/tools/code-coverage/+/637829 on -0001 and restarting made it complete build step in 4 hrs.

So, trying -0002 with https://chrome-internal-review.googlesource.com/c/chrome/tools/code-coverage/+/637829 reverted and -0003 with https://chrome-internal-review.googlesource.com/c/chrome/tools/code-coverage/+/637829 and rebooting at same time to see if -0002 works and -0003 does not work. 

need to wait a day on -0001 to verify the coverage report finishes.

Comment 4 by mmoroz@chromium.org, Jun 11 2018

Weird... :( Thanks for reverting, Abhishek!
with use_sanitizer_coverage=false, something is stuck on linking. -0003 shows that app_shell_unittests completed. -0002 seems to stuck on app_shell_unittests

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
10072 coverag+  20   0 15.507g 0.013t 9.509g S 193.8 11.0  14:33.87 ld.lld

will check tmrw if -0001 and -0002 completed. also, will try locally if app_shell_unittests hang reproduces.

Comment 6 by mmoroz@chromium.org, Jun 11 2018

Thanks! I've filed a separate  issue 851337  for that problem, as it seems to be unrelated to the coverage drop mentioned in c#0. Those reports were generated before I've modified the build scripts with use_sanitizer_coverage=false.
Verified locally, so will revert for now. No rush on getting this optimization in, lets see if things stabilize again. I will restart -0003 to -0005 today.

Comment 8 by mmoroz@chromium.org, Jun 11 2018

Thanks! +1, let's see what's up with coverage numbers before fixing / re-enabling.

Comment 9 Deleted

Comment 10 Deleted

This is still happening and report from -0001 got quarantined.
See
https://chromium-coverage.appspot.com/reports/565910/linux/chromium/src/report.html
Status: Started (was: Untriaged)
/chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZN4base6subtle12Acquire_LoadEPVKl: Counter overflow
/chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE9__is_longEv: Counter overflow
/chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZNKSt3__117__compressed_pairINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE5__repES5_E5firstEv: Counter overflow
/chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZNSt3__116__to_raw_pointerIKcEEPT_S3_: Counter overflow
/chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZNKSt3__122__compressed_pair_elemINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE5__repELi0ELb0EE5__getEv: Counter overflow
/chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZN8tcmalloc7MaskPtrEPv: Counter overflow
/chromium/src/dumps/interactive_ui_tests/dump.profdata: _ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE13__get_pointerEv: Counter overflow
/chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZNKSt3__122__compressed_pair_elemINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE5__repELi0ELb0EE5__getEv: Counter overflow
/chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZN4base6subtle12Acquire_LoadEPVKl: Counter overflow
/chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE9__is_longEv: Counter overflow
/chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZN8tcmalloc7MaskPtrEPv: Counter overflow
/chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZNKSt3__117__compressed_pairINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE5__repES5_E5firstEv: Counter overflow
/chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZNSt3__116__to_raw_pointerIKcEEPT_S3_: Counter overflow
/chromium/src/dumps/swiftshader_unittests/dump.profdata: _ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE13__get_pointerEv: Counter overflow
warning: /chromium/src/dumps/swiftshader_unittests/dump.profdata: Counter overflow


from https://chromium-coverage.appspot.com/reports/565910/linux/metadata/_bot.log
Similar to those that you mentioned in c#2.
Looks like again we have the same functions defined in multiple places, e.g. base::subtle::Acquire_Load has its definition in base/atomicops.h:

https://cs.chromium.org/chromium/src/base/atomicops.h?q=base::subtle::Acquire_Load&sq=package:chromium&l=118&dr=Ss

and a bunch of implementations in in tcmalloc:

https://cs.chromium.org/search/?q=inline+Atomic32+Acquire_Load%5C(volatile+const+Atomic32+%5C*ptr%5C)&type=cs

tcmalloc::MaskPtr(void* p) seems to be a better example:

https://cs.chromium.org/search/?q=tcmalloc::MaskPtr&sq=package:chromium&type=cs


There is a bunch of stuff duplicated in the code coverage dumps produced by browser_tests binary. Feels like some things are compiled and linked in multiple times. See demangled.txt for a list of functions.
log.txt
186 KB View Download
names.txt
79.1 KB View Download
demangled.txt
145 KB View Download
I still don't understand why these overflow occur. It seems to be different from what we've seen in the past. Last time, we could clearly see duplicated counters for the same symbol as the symbol was also duplicated in the binary: https://bugs.chromium.org/p/chromium/issues/detail?id=801362#c12


This time, I don't see any duplicated counters, e.g.:


$ for i in dumps/browser_tests/*.profraw; do cmd="./llvm-profdata show -function=_ZN10MallocHook13InvokeNewHookEPKvm $i"; echo -e "\t\tCMD: $cmd"; bash -c "$cmd"; echo "------------------------------------------------"; done
		CMD: ./llvm-profdata show -function=_ZN10MallocHook13InvokeNewHookEPKvm dumps/browser_tests/dump.13058978560518452138_0.profraw
Counters:
  _ZN10MallocHook13InvokeNewHookEPKvm:
    Hash: 0x000029a4912a0491
    Counters: 3
    Function count: 16628957309809515209
Instrumentation level: Front-end
Functions shown: 1
Total functions: 2757344
Maximum function count: 18445607871915869446
Maximum internal block count: 18440217365707751424
------------------------------------------------
		CMD: ./llvm-profdata show -function=_ZN10MallocHook13InvokeNewHookEPKvm dumps/browser_tests/dump.13058978560518452138_1.profraw
Counters:
  _ZN10MallocHook13InvokeNewHookEPKvm:
    Hash: 0x000029a4912a0491
    Counters: 3
    Function count: 4431705967091179856
Instrumentation level: Front-end
Functions shown: 1
Total functions: 2757344
Maximum function count: 16843187341854785896
Maximum internal block count: 16848097578119396637
------------------------------------------------




OR



$ for i in dumps/browser_tests/*.profraw; do cmd="./llvm-profdata show -function=vdbeFreeOpArray $i"; echo -e "\t\tCMD: $cmd"; bash -c "$cmd"; echo "------------------------------------------------"; done
		CMD: ./llvm-profdata show -function=vdbeFreeOpArray dumps/browser_tests/dump.13058978560518452138_0.profraw
Counters:
  sqlite3.c:vdbeFreeOpArray:
    Hash: 0x029211e29d491451
    Counters: 4
    Function count: 9020269409725594203
Instrumentation level: Front-end
Functions shown: 1
Total functions: 2757344
Maximum function count: 18445607871915869446
Maximum internal block count: 18440217365707751424
------------------------------------------------
		CMD: ./llvm-profdata show -function=vdbeFreeOpArray dumps/browser_tests/dump.13058978560518452138_1.profraw
Counters:
  sqlite3.c:vdbeFreeOpArray:
    Hash: 0x029211e29d491451
    Counters: 4
    Function count: 3224260846701248
Instrumentation level: Front-end
Functions shown: 1
Total functions: 2757344
Maximum function count: 16843187341854785896
Maximum internal block count: 16848097578119396637
------------------------------------------------


I'd expect that duplication occurs when processes merge counters from their memory into corresponding files on disk, but that also doesn't seem to be the case. I've ran 22 testcases with "%p",  and noticed weird counters for the affected functions. The values are either 0, 1, 514, 515, 516 for -function=vdbeFreeOpArray.

All of that seems like:

A) processes use shared memory for counters, so they update values of each other somehow

B) counters are not getting initialized for some processes, and contains values from previous processes





log_per_p.txt
18.4 KB View Download
There wasn't a clang roll in 565281:565413, so the culprit must be on Chromium side: https://chromium.googlesource.com/chromium/src/+log/0ae167265d4..3d7f15c742e
https://chromium-coverage.appspot.com/reports/566169/linux/metadata/_bot.log - recent bad report has no overflow warnings while merging into profdata, but yes browser_tests merging shows the same issue
https://chromium-coverage.appspot.com/reports/566169/linux/metadata/browser_tests.log

warning: dumps/browser_tests/dump.15835948910903662433_1.profraw: Counter overflow
warning: dumps/browser_tests/dump.15835948910903662433_2.profraw: Counter overflow

Interesting! I'm still trying to understand where such large counters come from in browser_tests. Moving slow, as I have to patch compiler-rt, re-build it, re-link browser_tests, and then run at least a few testcases to get some debug info.
It's definitely something new... I've been dumping counters when running with %1m, and the values looked good until some point. They were growing on a reasonable delta (0 - 516) per testcase, but at some point they doubled:

...
In-File  Counter [pid: 14292] @ 7f86491c1778 holding Value: 217757
 SrcDataStart: 1655281
In-File  Counter [pid: 14292] @ 7f86491c1780 holding Value: 217757
In-File  Counter [pid: 14292] @ 7f863902a778 holding Value: 217757
In-File  Counter [pid: 14292] @ 7f86491c1788 holding Value: 3034453
In-File  Counter [pid: 14292] @ 7f86491c1790 holding Value: 419986
In-File  Counter [pid: 14292] @ 7f863902a788 holding Value: 3034453
In-File  Counter [pid: 14292] @ 7f863902a790 holding Value: 419986
Index 0 of 4
In-Memory Counter [pid: 8703] @ 2fb6c510 holding Value: 515
Index 1 of 4
In-Memory Counter [pid: 8703] @ 2fb6c518 holding Value: 515
Index 2 of 4
In-Memory Counter [pid: 8703] @ 2fb6c520 holding Value: 7185
Index 3 of 4
In-Memory Counter [pid: 8703] @ 2fb6c528 holding Value: 994
In-File  SrcCountersStart: 7f23e3b9fa40, SrcData->CounterPtr: 2fb6c510, Header->CountersDelta: 2e5107d8
In-File Calculation [pid: 8703], Counters: 7f23e51fb778
In-Sanity-File  SrcData: 7f23e092cd70, DstData: 3567e488, SrcData - SrcDataStart: 1655281
In-File  Counter [pid: 8703] @ 7f23e51fb778 holding Value: 435514
In-File  Counter [pid: 8703] @ 7f23e51fb780 holding Value: 435514
In-File  Counter [pid: 8703] @ 7f23e51fb788 holding Value: 6068906
In-File  Counter [pid: 8703] @ 7f23e51fb790 holding Value: 839972
...

From that point on, though, it's been growing with a reasonable speed again. Until it suddenly doubled once again, and again:

...
Index 0 of 4
In-Memory Counter [pid: 103080] @ 2fb6c510 holding Value: 96
Index 1 of 4
In-Memory Counter [pid: 103080] @ 2fb6c518 holding Value: 96
Index 2 of 4
In-Memory Counter [pid: 103080] @ 2fb6c520 holding Value: 620
Index 3 of 4
In-Memory Counter [pid: 103080] @ 2fb6c528 holding Value: 40
In-File  SrcCountersStart: 7f98d42d8a40, SrcData->CounterPtr: 2fb6c510, Header->CountersDelta: 2e5107d8
In-File Calculation [pid: 103080], Counters: 7f98d5934778
In-Sanity-File  SrcData: 7f98d1065d70, DstData: 3567e488, SrcData - SrcDataStart: 1655281
In-File  Counter [pid: 103080] @ 7f98d5934778 holding Value: 773352
In-File  Counter [pid: 103080] @ 7f98d5934780 holding Value: 773354
In-File  Counter [pid: 103080] @ 7f98d5934788 holding Value: 10874925
In-File  Counter [pid: 103080] @ 7f98d5934790 holding Value: 1512757
Index 0 of 4
In-Memory Counter [pid: 103080] @ 2fb6c510 holding Value: 773448
Index 1 of 4
In-Memory Counter [pid: 103080] @ 2fb6c518 holding Value: 773450
Index 2 of 4
In-Memory Counter [pid: 103080] @ 2fb6c520 holding Value: 10875545
Index 3 of 4
In-Memory Counter [pid: 103080] @ 2fb6c528 holding Value: 1512797
In-File  SrcCountersStart: 7f98d42d8a40, SrcData->CounterPtr: 2fb6c510, Header->CountersDelta: 2e5107d8
In-File Calculation [pid: 103080], Counters: 7f98d5934778
In-Sanity-File  SrcData: 7f98d1065d70, DstData: 3567e488, SrcData - SrcDataStart: 1655281
In-File  Counter [pid: 103080] @ 7f98d5934778 holding Value: 773448
In-File  Counter [pid: 103080] @ 7f98d5934780 holding Value: 773450
In-File  Counter [pid: 103080] @ 7f98d5934788 holding Value: 10875545
In-File  Counter [pid: 103080] @ 7f98d5934790 holding Value: 1512797
Index 0 of 4
In-Memory Counter [pid: 103995] @ 2fb6c510 holding Value: 0
Index 1 of 4
In-Memory Counter [pid: 103995] @ 2fb6c518 holding Value: 0
Index 2 of 4
In-Memory Counter [pid: 103995] @ 2fb6c520 holding Value: 0
Index 3 of 4
In-Memory Counter [pid: 103995] @ 2fb6c528 holding Value: 0
In-File  SrcCountersStart: 7f1605f9ea40, SrcData->CounterPtr: 2fb6c510, Header->CountersDelta: 2e5107d8
In-File Calculation [pid: 103995], Counters: 7f16075fa778
In-Sanity-File  SrcData: 7f1602d2bd70, DstData: 3567e488, SrcData - SrcDataStart: 1655281
In-File  Counter [pid: 103995] @ 7f16075fa778 holding Value: 1546896
In-File  Counter [pid: 103995] @ 7f16075fa780 holding Value: 1546900
In-File  Counter [pid: 103995] @ 7f16075fa788 holding Value: 21751090
In-File  Counter [pid: 103995] @ 7f16075fa790 holding Value: 3025594
Index 0 of 4
In-Memory Counter [pid: 103995] @ 2fb6c510 holding Value: 1546896
Index 1 of 4
In-Memory Counter [pid: 103995] @ 2fb6c518 holding Value: 1546900
Index 2 of 4
In-Memory Counter [pid: 103995] @ 2fb6c520 holding Value: 21751090
Index 3 of 4
In-Memory Counter [pid: 103995] @ 2fb6c528 holding Value: 3025594
In-File  SrcCountersStart: 7f1605f9ea40, SrcData->CounterPtr: 2fb6c510, Header->CountersDelta: 2e5107d8
In-File Calculation [pid: 103995], Counters: 7f16075fa778
In-Sanity-File  SrcData: 7f1602d2bd70, DstData: 3567e488, SrcData - SrcDataStart: 1655281
In-File  Counter [pid: 103995] @ 7f16075fa778 holding Value: 1546896
In-File  Counter [pid: 103995] @ 7f16075fa780 holding Value: 1546900
In-File  Counter [pid: 103995] @ 7f16075fa788 holding Value: 21751090
In-File  Counter [pid: 103995] @ 7f16075fa790 holding Value: 3025594
...


This occurrence is more interesting, as I see that merge code gets executed twice. That could explain why numbers get duplicated.
log_1m.txt
7.9 MB View Download
I have a fix idea, fingers crossed...
The fix seems to work, but it revealed another problem. From time to time the following issue is happening:

BrowserTestBase received signal: Terminated. Backtrace:
#0 0x000018672630 base::debug::StackTrace::StackTrace()
#1 0x00001a4e28fe content::(anonymous namespace)::DumpStackTraceSignalHandler()
#2 0x7fc8dcf76030 <unknown>
#3 0x7fc8e3ae62b0 do_fcntl
#4 0x7fc8e3ae63a2 __libc_fcntl
#5 0x00002e466895 lprofOpenFileEx
#6 0x00002e4649ec __llvm_profile_write_file
#7 0x7fc8dcf78910 <unknown>
#8 0x7fc8dcf7896a exit
#9 0x7fc8dcf632b8 __libc_start_main
#10 0x0000088c202a __llvm_coverage_mapping
Parent failed to complete crash dump.
Parent failed to complete crash dump.
Parent failed to complete crash dump.

It doesn't seem to happen too often on the bot, e.g. I see only one occurrence in https://chromium-coverage.appspot.com/reports/566169/linux/metadata/browser_tests.log, but it's still very bad as it may corrupt the dump file and/or reset the counters in there.


I'm testing with %8m now locally (as we do on the bot), but am also looking into ways to prevent SIGTERM from killing the dumping process.
Test processes are getting killed with SIGTERM

    case SIGTERM:
      buffer = "Terminated";
      break;

while __llvm_profile_write_file has guard only against SIGKILL.

I've got another idea though, will try as well.
Owner: mmoroz@chromium.org
Filed  issue 852469  for c#25 and c#26, as I'm not sure if it's a real problem or not.
Project Member

Comment 30 by bugdroid1@chromium.org, Jun 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/611672fbe0b7e44d243692e54e011d48837d111f

commit 611672fbe0b7e44d243692e54e011d48837d111f
Author: Max Moroz <mmoroz@chromium.org>
Date: Wed Jun 13 21:05:51 2018

Code Coverage: Fix concurrent code coverage dumps from content/child processes.

Duplication of code coverage counters is bad, as the values grow exponentially,
which ends up with the counters being overflown when running large test suites
(e.g. browser_tests). Counter Overflow not only causes incorrect coverage to be
reported, but also affects the global coverage merge process, which leads to
flaky and lower than usual code coverage numbers to be reported.

This CL fixes two issues:

1) __llvm_profile_write_file(), which has been used until now, can duplicate
   code coverage counters when it is called twice or more times. Instead, we
   should call __llvm_profile_dump() which will not dump the same counters more
   than once.

2) even though __llvm_profile_dump() prevents us from possible duplications, it
   is not thread safe, as it is supposed to be called from atexit() handler.
   Since we have to call it ourselves from various places, and we already do
   call it from two different sites at least, we still can get duplicated code
   coverage counters. Thus, we have to ensure thread safety ourselves.

Bug:  851230 
Change-Id: Ib3bebc8173a8be9b5622f2449da05afad77d0a11
Reviewed-on: https://chromium-review.googlesource.com/1099336
Commit-Queue: Max Moroz <mmoroz@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#566983}
[modify] https://crrev.com/611672fbe0b7e44d243692e54e011d48837d111f/content/child/child_thread_impl.cc

Status: Fixed (was: Started)
The most recent report looks good: https://chromium-coverage.appspot.com/reports/567015/linux/metadata/browser_tests.log

I think we can declare a victory here, but we'll keep an eye on the further reports getting published.

Sign in to add a comment