multiple CHECK-s within a function are indistinguishable in official build |
|||||||||||
Issue description
It looks like the way CHECKs are implemented today induces the compiler to optimize them so hard in official builds, in a way that makes it impossible to distinguish the original line if there are multiple checks in a function.
What is happening is that the various __builtin_trap() are compiled as a single instruction ("ud2" on x86, "udf ff" on arm) per function, and the CHECKs are compiled as jumps to that label, even with just -O1.
More practically, see below, which is a minified x86_64 repro of the internal thread http://go/xskhh:
-----
/* 01 */ #define LOGGING_CRASH() __builtin_trap()
/* 02 */ #define CHECK(condition) !(condition) ? LOGGING_CRASH() : (void)0
/* 03 */ void func(int x, int y) {
/* 04 */ CHECK(x);
/* 05 */ CHECK(y);
/* 06 */ *((volatile char*)0x42) = 'z';
/* 07 */ CHECK(*((volatile char*)0x43));
/* 08 */ }
-----
$ clang++ -c -g test.cc --std=c++11 -O1 -S -o - | less
-----
__Z4funcii: ## @_Z4funcii
Lfunc_begin0:
.file 1 "test.cc"
.loc 1 3 0 ## test.cc:3:0
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
##DEBUG_VALUE: func:x <- %EDI
##DEBUG_VALUE: func:y <- %ESI
.loc 1 4 12 prologue_end ## test.cc:4:12
Ltmp3:
testl %edi, %edi
je LBB0_4 <-- !!!
## BB#1:
##DEBUG_VALUE: func:y <- %ESI
.loc 1 5 12 ## test.cc:5:12
testl %esi, %esi
je LBB0_4 <-- !!!
## BB#2:
.loc 1 6 36 ## test.cc:6:36
movb $122, 66
.loc 1 7 12 ## test.cc:7:12
cmpb $0, 67
je LBB0_4 <-- !!!
## BB#3:
.loc 1 8 10 ## test.cc:8:10
popq %rbp
retq
LBB0_4:
.loc 1 7 12 ## test.cc:7:12
ud2 <-- this is __builtin_trap()
-----
Note how all the CHECKs are "je LBB0_4".
When "ud2" is hit, it will cause the crash. However, at that point the information about which line caused the crash is lost. ud2 will be attributed to the last line, and since the checks are straight jumps and not calls, there isn't even any frame associated with the crash line.
Something similar happens on Android/arm. See the investigation on the internal thread http://go/xskhh which triggered this bug.
It goes without saying that this causes endless headaches to our developers, like in that internal thread.
I think (but didn't really check) that before crrev.com/2125923002 this was less likely to happen as long as the distance between the CHECK line and BreakDebugger was enough to fit into the je/jne instruction and not require a trampoline.
Now the question for our illustrious C++/compiler peeps.
Is there a way we can tell CHECK to either:
1. not trampoline the __builtin_trap ?
2. make that a call with an actual frame, so we get the right source line in the 2nd frame?
I think 2 is more easily doable, as it would be something similar to what we had before. but then I am not sure whether that is going to introduce the same performance / binary bloat which motivated crrev.com/2125923002 in the first place.
Ideas?
,
Nov 12 2016
The specific implementation of CHECK is what it is based on the blink RELEASE_ASSERT. We went thru multiple iterations to weed out all performance regressions causes by switching blink from CHECK to RELEASE_ASSERT. So I'm concerned about ideas to change it, tho clearly making it more useful is nice. RELEASE_ASSERT is still not using CHECK because we did not figure out why it's giving different performance yet. But I don't want to make that harder. 1) Doing architecture-specific things seems okay to me here. 2) Can you use an inline function or something? See https://bugs.chromium.org/p/chromium/issues/detail?id=599867 And https://groups.google.com/a/chromium.org/d/topic/blink-dev/TL0NkNXIT1w/discussion I might be missing some more discussion. But this is ongoing work that is kinda sidelined at this point.
,
Nov 14 2016
This makes it significantly harder for anyone to understand a subset of CHECK failures in the crash db coming from the field (basically any case where there's more than one CHECK in the same function, as the compiler appears to optimise this indiscriminately as soon as there's two occurrences in the same function). Primiano and I were able to figure out the real situation in this one particular crash but it required quite some investigation and also a level of expertise with understanding assembly that is certainly not universal. I think this is a pretty big problem in practise, and not just a matter of "making it more useful". The reason we omit the log message in official build CHECKs is *because* we believe we can at least tell which CHECK fired from the return address :) An inline function probably works. We could probably put a CL together to invent a not_as_builtin_trap() which just inlines an appropriate undef/trap/breakpoint instruction for each architecture/mode (we can just copy the asm sequence that __builtin_trap() generates on that arch). Just a bit of fiddly effort for someone to put that list together. Setting a few more labels since this doesn't seem to be OS-specific and all of arm/arm64/x86/x86_64 seem to be affected at least.
,
Nov 15 2016
So I just tried two variants here: https://codereview.chromium.org/2502953003 1) use a inline noreturn function: static ALWAYS_INLINE __attribute__((noreturn)) void DoTrap() { #if defined(ARCH_CPU_X86_FAMILY) asm volatile ("ud2"); ... } #define LOGGING_CRASH() DoTrap() #define CHECK(condition) !(condition) ? LOGGING_CRASH() : EAT_STREAM_PARAMETERS 2) put directly the asm in the macro using the compound statement expr, which seems to work also on clang: #if defined(ARCH_CPU_X86_FAMILY) #define LOGGING_CRASH() ({ asm volatile ("ud2"); }) #define CHECK(condition) !(condition) ? LOGGING_CRASH() : EAT_STREAM_PARAMETERS ... Both 1) and 2) produce distinct crash addresses now, which is good. However... Given the following code: ----- CHECK(death_location != 1); CHECK(death_location != 2); printf("\n"); CHECK(death_location != 3); ----- 1) Produces very compact code. I quite like it and I'm pretty convinced (didn't try) that this is equivalent, performance-wise, to what we have today: 0x1001b33b0 <+0>: pushq %rbp 0x1001b33b1 <+1>: movq %rsp, %rbp 0x1001b33b4 <+4>: pushq %rbx 0x1001b33b5 <+5>: pushq %rax 0x1001b33b6 <+6>: movl %edi, %ebx 0x1001b33b8 <+8>: cmpl $0x2, %ebx 0x1001b33bb <+11>: je 0x1001b33da ; <+42> [inlined] logging::DoTrap() at logging_unittest.cc:22 0x1001b33bd <+13>: cmpl $0x1, %ebx 0x1001b33c0 <+16>: je 0x1001b33dc ; <+44> [inlined] logging::DoTrap() at logging_unittest.cc:21 0x1001b33c2 <+18>: movl $0xa, %edi 0x1001b33c7 <+23>: callq 0x10090a41a ; symbol stub for: putchar 0x1001b33cc <+28>: cmpl $0x3, %ebx 0x1001b33cf <+31>: je 0x1001b33d8 ; <+40> [inlined] logging::DoTrap() at logging_unittest.cc:24 0x1001b33d1 <+33>: addq $0x8, %rsp 0x1001b33d5 <+37>: popq %rbx 0x1001b33d6 <+38>: popq %rbp 0x1001b33d7 <+39>: retq 0x1001b33d8 <+40>: ud2 0x1001b33da <+42>: ud2 0x1001b33dc <+44>: ud2 It has a problem though. Even if the ud2 instructions are now distinct, addr2line associates them to the inline funciton Logging::DoTrap() and not the caller. I dunno if there is a way to tell the compiler to put the caller name in the produced symbols. 2) Produces a bit less compact code, as the "ud2" instructions are now interleaved with the flow 0x1001b3c58 <+8>: cmpl $0x2, %ebx 0x1001b3c5b <+11>: je 0x1001b3c72 ; <+34> [inlined] logging::DoTrap() at logging_unittest.cc:22 0x1001b3c5d <+13>: cmpl $0x1, %ebx 0x1001b3c60 <+16>: jne 0x1001b3c82 ; <+50> at logging_unittest.cc:23 0x1001b3c62 <+18>: ud2 0x1001b3c64 <+20>: movb $0x0, 0x0 0x1001b3c6c <+28>: nopl (%rax) 0x1001b3c70 <+32>: jmp 0x1001b3c70 ; <+32> [inlined] logging::DoTrap() + 14 at logging_unittest.cc:21 0x1001b3c72 <+34>: ud2 0x1001b3c74 <+36>: movb $0x0, 0x0 0x1001b3c7c <+44>: nopl (%rax) 0x1001b3c80 <+48>: jmp 0x1001b3c80 ; <+48> [inlined] logging::DoTrap() + 14 at logging_unittest.cc:22 0x1001b3c82 <+50>: movl $0xa, %edi 0x1001b3c87 <+55>: callq 0x10090efda ; symbol stub for: putchar 0x1001b3c8c <+60>: cmpl $0x3, %ebx 0x1001b3c8f <+63>: jne 0x1001b3ca2 ; <+82> at logging_unittest.cc:25 0x1001b3c91 <+65>: ud2 0x1001b3c93 <+67>: movb $0x0, 0x0 0x1001b3c9b <+75>: nopl (%rax,%rax) 0x1001b3ca0 <+80>: jmp 0x1001b3ca0 ; <+80> [inlined] logging::DoTrap() + 15 at logging_unittest.cc:24 And this seems more symbolizer friendly.
,
Nov 21 2016
Ok I think I have a "not too horrifying" solution in https://codereview.chromium.org/2502953003/ The TL;DR of that is #define LOGGING_CRASH(LINE) \ ({ \ # This is a generalized version of mov eax, LINE asm volatile("" : : "r"(LINE) : "memory"); \ __builtin_trap(); \ __builtin_unreachable(); \ }) Essentially, if I prepend a mov eax,line before __builtin_trap, the compiler stops the coalescing, debugging symbols are correctly attributed, at the price of few bytes per call site (the CL has a summary of the binary cost of this CL). On the good side, this doesn't require to maintain architecture-specific #ifdef or assembly. Just for the records, answering myself > I dunno if there is a way to tell the compiler to put the caller name in the produced symbols. This seems to be attribute(artificial) but clang doesn't support it.
,
Nov 23 2016
,
Nov 23 2016
Sorry for being slow to have much input here. I've been really concerned about this creating deadlocks for combining RELEASE_ASSERT and CHECK, and if CHECK is already too slow then what if we change it, etc. I wanted to say that I've come to a perspective that I think perf tests showing such large change with CHECK vs RELEASE_ASSERT points to RELEASE_ASSERT being used in extremely hot code which is probably not needed, rather than to a problem with CHECK. I filed https://bugs.chromium.org/p/chromium/issues/detail?id=668300 to this effect and mentioned it to eae@ in person who was really interested in this, as this hints at moving RELEASE_ASSERTs could provide perf wins for users right now. All that is to say that I'd like to unblock this in my own mind from dealing with RELEASE_ASSERT problems, which should be good news for this. :)
,
Nov 23 2016
just to be clear, this is not about performance. this is about check generating confusing crash reports and people getting stuck on false source line info (see link to internal thread)
,
Nov 23 2016
also, not sure how the current implementation (regardless of this proposal) is slow. in official builds CHECK is just a conditional jump to a trap opcode. I would love to understand where all the "slow" that people claim comes from.
,
Nov 24 2016
From reading the threads, it's not 100% clear, but it appears that the issue is only when compiling with MSVC and so disassembling gcc/clang binaries won't show it. My reading is that MSVC isn't quite doing a good enough job of optimising away the output-streaming part of CHECK() and is touching the stack slightly more than necessary. RELEASE_ASSERT in blink doesn't support the operator<< logging at all, and so doesn't have to use EAT_STREAM_PARAMETERS in the first place, so never causes this problem.
,
Dec 1 2016
It looks like http://crbug.com/660293 is also affected by this. Same symptoms: Windows Official builds, and the failing CHECK statements don't seem to make sense.
,
Jan 6 2017
,
Feb 1 2017
,
Feb 1 2017
#include <stdlib.h>
#define LOGGING_CRASH() __debugbreak()
#define CHECK(condition) !(condition) ? LOGGING_CRASH() : (void)0
void func(int x, int y) {
CHECK(x);
CHECK(y);
*((volatile char*)0x42) = 'z';
CHECK(*((volatile char*)0x43));
}
int main() {
func(rand(), rand());
}
--> cl /Ox /GL and link /LTCG -->
main:
0000000140043E40: 40 53 push rbx
0000000140043E42: 48 83 EC 20 sub rsp,20h
0000000140043E46: E8 69 6C FC FF call rand
0000000140043E4B: 8B D8 mov ebx,eax
0000000140043E4D: E8 62 6C FC FF call rand
0000000140043E52: 85 C0 test eax,eax
0000000140043E54: 75 01 jne 0000000140043E57
0000000140043E56: CC int 3
0000000140043E57: 85 DB test ebx,ebx
0000000140043E59: 75 01 jne 0000000140043E5C
0000000140043E5B: CC int 3
0000000140043E5C: C6 04 25 42 00 00 mov byte ptr [42h],7Ah
00 7A
0000000140043E64: 8A 04 25 43 00 00 mov al,byte ptr [43h]
00
0000000140043E6B: 84 C0 test al,al
0000000140043E6D: 75 01 jne 0000000140043E70
0000000140043E6F: CC int 3
0000000140043E70: 33 C0 xor eax,eax
0000000140043E72: 48 83 C4 20 add rsp,20h
0000000140043E76: 5B pop rbx
0000000140043E77: C3 ret
In particular, there's 3 separate "int 3".
,
Feb 2 2017
,
Feb 2 2017
I wrote https://codereview.chromium.org/2676483002/ for Windows, which will result in desired behaviour and shorter code (7 bytes -> 1 byte for the actual crash). I'm not 100% certain whether it'll change the optimization opportunities the compiler feels it's entitled to in surrounding code though.
,
Feb 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a17c8db528cadca9eef98ce03b0910700735722e commit a17c8db528cadca9eef98ce03b0910700735722e Author: scottmg <scottmg@chromium.org> Date: Wed Feb 15 21:35:49 2017 Make CHECK int 3 on Windows, rather than crash Not being able to distinguish intentional crashes (CHECK) from unintentional ones (other access violations) in terms of exit codes reduces the amount of signal we have in stability metrics. R=wfh@chromium.org BUG= 664209 , 687326 Review-Url: https://codereview.chromium.org/2676483002 Cr-Commit-Position: refs/heads/master@{#450809} [modify] https://crrev.com/a17c8db528cadca9eef98ce03b0910700735722e/base/logging.h [modify] https://crrev.com/a17c8db528cadca9eef98ce03b0910700735722e/base/logging_unittest.cc
,
Feb 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/225319af95b483fc525e08149b4d1d64524dd9c4 commit 225319af95b483fc525e08149b4d1d64524dd9c4 Author: Scott Graham <scottmg@chromium.org> Date: Thu Feb 16 22:14:16 2017 Make CHECK int 3 on Windows, rather than crash Not being able to distinguish intentional crashes (CHECK) from unintentional ones (other access violations) in terms of exit codes reduces the amount of signal we have in stability metrics. R=wfh@chromium.org BUG= 664209 , 687326 Review-Url: https://codereview.chromium.org/2676483002 Cr-Commit-Position: refs/heads/master@{#450809} (cherry picked from commit a17c8db528cadca9eef98ce03b0910700735722e) Review-Url: https://codereview.chromium.org/2695833008 . Cr-Commit-Position: refs/branch-heads/2987@{#564} Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943} [modify] https://crrev.com/225319af95b483fc525e08149b4d1d64524dd9c4/base/logging.h [modify] https://crrev.com/225319af95b483fc525e08149b4d1d64524dd9c4/base/logging_unittest.cc
,
Feb 17 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8c972d0e190168b4b5621e81563f319563fd0af8 commit 8c972d0e190168b4b5621e81563f319563fd0af8 Author: primiano <primiano@chromium.org> Date: Fri Feb 17 21:08:50 2017 base: make CHECK macros trap at distinct addresses in official builds Abstract -------- CHECK() is the macro used all over the places (~4K occurrences without counting for dupes due to inlining) for release-time assertions. It is enabled in a minimal form (crash without a message) in official builds. It needs to be fast as it is used in lot of fastpath. It needs to not emit too much code, as it is used in lot of places. It needs to guarantee that crash reports can pinpoint to the right location when hitting a CHECK. - Back in the days CHECK was fat, slow, but crash-friendly. - After crrev.com/1982123002 it became tiny, fast but not crash friendly. - This CL is making it a bit less tiny (+28/+128K), fast and crash friendly. The problem this CL deals with is the case of multiple CHECK()s within the same function. Also, adds a test that covers Mac, Linux and Android. A bit of history: ----------------- Before crrev.com/1982123002 (reverted and later re-landed in crrev.com/2125923002) CHECK() in official builds was essentially: if (!condition) BreakDebugger() It was later found that this approach was not efficient, both in terms of binary size and performance. More importantly it was a regression w.r.t. blink's assert that were later switched to CHECK(). "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation" The major reason for this is DebuggerBreak being quite complex and not treated as noreturn by the compiler. It seems (see crbug.com/664209 for more) that the most efficient way to handle these checks is ending up with: Source code: CHECK(cond1); CHECK(cond2); ... Ideal assembly: (ideal for perf and binary, but not for crash reports, more below) compare_opcode cond1; jump_if_zero prologue; compare_opcode cond2; jump_if_zero prologue; ... prologue: trap_instruction(s) Rather than something like: compare_opcode cond1; jump_if_NOT_zero next1; trap_instruction(s) next1: compare_opcode cond2; jump_if_NOT_zero next2; trap_instruction(s) next2: ... Where essentially the trap instructions are interleaved within the main execution flow. This is even worse if the trap instruction is actually a call to a function, with annex frame initialization, as in the case of BreakDebugger(). That bloats the binary and reduces i-cache hotness for the main flow. crrev.com/1982123002 recently fixed the situation making the assembly look like the ideal case above. Unfortunately this caused another problem due the extreme optimization: once the program crashes in "trap_instruction(s)", there is no easy way to tell which condition caused the trap. In practice this translates into the inability of tell which CHECK failed in a function that has more than one check. This CL: -------- Re-addresses crrev.com/2125923002, adding an extra instruction after the trap which creates an opcode with a unique counter. This prevents the compiler from folding the trap instructions, still applying no-return optimizations. Also by doing this the various prologue get properly attributed to the CHECK line in debugging symbols. Binary size inflation on official builds: ----------------------------------------- Android arm: 48684276 -> 48712948 = 28 K Android arm64: 85611800 -> 85665048 = 53 K Android x86_64: 91904944 -> 91933616 = 28 K Linux x86_64: 124219488 -> 124346464 = 124 K (Android build with -Os, hence why the difference between the two) BUG= 664209 , 599867 Review-Url: https://codereview.chromium.org/2502953003 Cr-Commit-Position: refs/heads/master@{#451381} [modify] https://crrev.com/8c972d0e190168b4b5621e81563f319563fd0af8/base/logging.h [modify] https://crrev.com/8c972d0e190168b4b5621e81563f319563fd0af8/base/logging_unittest.cc
,
Feb 17 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4c98a8c2c794b10f70fc22e02128139252a0b0c1 commit 4c98a8c2c794b10f70fc22e02128139252a0b0c1 Author: alph <alph@chromium.org> Date: Fri Feb 17 21:29:15 2017 Revert of base: make CHECK macros trap at distinct addresses in official builds (patchset #10 id:180001 of https://codereview.chromium.org/2502953003/ ) Reason for revert: Broke compile https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20ChromeOS/builds/29509 Original issue's description: > base: make CHECK macros trap at distinct addresses in official builds > > Abstract > -------- > CHECK() is the macro used all over the places (~4K occurrences without > counting for dupes due to inlining) for release-time assertions. > It is enabled in a minimal form (crash without a message) in official builds. > It needs to be fast as it is used in lot of fastpath. > It needs to not emit too much code, as it is used in lot of places. > It needs to guarantee that crash reports can pinpoint to the right > location when hitting a CHECK. > > - Back in the days CHECK was fat, slow, but crash-friendly. > - After crrev.com/1982123002 it became tiny, fast but not crash friendly. > - This CL is making it a bit less tiny (+28/+128K), fast and crash friendly. > > The problem this CL deals with is the case of multiple CHECK()s within > the same function. Also, adds a test that covers Mac, Linux and Android. > > A bit of history: > ----------------- > Before crrev.com/1982123002 (reverted and later re-landed in > crrev.com/2125923002) CHECK() in official builds was essentially: > if (!condition) BreakDebugger() > It was later found that this approach was not efficient, both in terms > of binary size and performance. More importantly it was a regression > w.r.t. blink's assert that were later switched to CHECK(). > "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation" > The major reason for this is DebuggerBreak being quite complex and > not treated as noreturn by the compiler. > It seems (see crbug.com/664209 for more) that the most efficient way to > handle these checks is ending up with: > > Source code: > CHECK(cond1); > CHECK(cond2); > ... > > Ideal assembly: > (ideal for perf and binary, but not for crash reports, more below) > compare_opcode cond1; > jump_if_zero prologue; > compare_opcode cond2; > jump_if_zero prologue; > ... > prologue: > trap_instruction(s) > > Rather than something like: > compare_opcode cond1; > jump_if_NOT_zero next1; > trap_instruction(s) > next1: > compare_opcode cond2; > jump_if_NOT_zero next2; > trap_instruction(s) > next2: > ... > Where essentially the trap instructions are interleaved within the > main execution flow. This is even worse if the trap instruction is > actually a call to a function, with annex frame initialization, > as in the case of BreakDebugger(). That bloats the binary and > reduces i-cache hotness for the main flow. > > crrev.com/1982123002 recently fixed the situation making the > assembly look like the ideal case above. Unfortunately this caused > another problem due the extreme optimization: once the program > crashes in "trap_instruction(s)", there is no easy way to tell which condition > caused the trap. In practice this translates into the inability of > tell which CHECK failed in a function that has more than one check. > > This CL: > -------- > Re-addresses crrev.com/2125923002, adding an extra instruction after > the trap which creates an opcode with a unique counter. This prevents > the compiler from folding the trap instructions, still applying no-return > optimizations. > Also by doing this the various prologue get properly attributed to the > CHECK line in debugging symbols. > > Binary size inflation on official builds: > ----------------------------------------- > Android arm: 48684276 -> 48712948 = 28 K > Android arm64: 85611800 -> 85665048 = 53 K > Android x86_64: 91904944 -> 91933616 = 28 K > Linux x86_64: 124219488 -> 124346464 = 124 K > (Android build with -Os, hence why the difference between the two) > > BUG= 664209 , 599867 > > Review-Url: https://codereview.chromium.org/2502953003 > Cr-Commit-Position: refs/heads/master@{#451381} > Committed: https://chromium.googlesource.com/chromium/src/+/8c972d0e190168b4b5621e81563f319563fd0af8 TBR=thakis@chromium.org,torne@chromium.org,brettw@chromium.org,danakj@chromium.org,mark@chromium.org,primiano@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 664209 , 599867 Review-Url: https://codereview.chromium.org/2706453004 Cr-Commit-Position: refs/heads/master@{#451384} [modify] https://crrev.com/4c98a8c2c794b10f70fc22e02128139252a0b0c1/base/logging.h [modify] https://crrev.com/4c98a8c2c794b10f70fc22e02128139252a0b0c1/base/logging_unittest.cc
,
Feb 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f53167270c44da971a2f085299404319e42756f4 commit f53167270c44da971a2f085299404319e42756f4 Author: primiano <primiano@chromium.org> Date: Tue Feb 21 13:09:26 2017 [Reland] base: make CHECK macros trap at distinct addresses in official builds Original CL: https://crrev.com/2502953003 Revert: https://crrev.com/2706453004 Reason for relanding: masked out OS_NACL which was failing to build because of this error: "GNU-style inline assembly is disabled" Original CL Description: > Abstract > -------- > CHECK() is the macro used all over the places (~4K occurrences without > counting for dupes due to inlining) for release-time assertions. > It is enabled in a minimal form (crash without a message) in official builds. > It needs to be fast as it is used in lot of fastpath. > It needs to not emit too much code, as it is used in lot of places. > It needs to guarantee that crash reports can pinpoint to the right > location when hitting a CHECK. > > - Back in the days CHECK was fat, slow, but crash-friendly. > - After crrev.com/1982123002 it became tiny, fast but not crash friendly. > - This CL is making it a bit less tiny (+28/+128K), fast and crash friendly. > > The problem this CL deals with is the case of multiple CHECK()s within > the same function. Also, adds a test that covers Mac, Linux and Android. > > A bit of history: > ----------------- > Before crrev.com/1982123002 (reverted and later re-landed in > crrev.com/2125923002) CHECK() in official builds was essentially: > if (!condition) BreakDebugger() > It was later found that this approach was not efficient, both in terms > of binary size and performance. More importantly it was a regression > w.r.t. blink's assert that were later switched to CHECK(). > "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation" > The major reason for this is DebuggerBreak being quite complex and > not treated as noreturn by the compiler. > It seems (see crbug.com/664209 for more) that the most efficient way to > handle these checks is ending up with: > > Source code: > CHECK(cond1); > CHECK(cond2); > ... > > Ideal assembly: > (ideal for perf and binary, but not for crash reports, more below) > compare_opcode cond1; > jump_if_zero prologue; > compare_opcode cond2; > jump_if_zero prologue; > ... > prologue: > trap_instruction(s) > > Rather than something like: > compare_opcode cond1; > jump_if_NOT_zero next1; > trap_instruction(s) > next1: > compare_opcode cond2; > jump_if_NOT_zero next2; > trap_instruction(s) > next2: > ... > Where essentially the trap instructions are interleaved within the > main execution flow. This is even worse if the trap instruction is > actually a call to a function, with annex frame initialization, > as in the case of BreakDebugger(). That bloats the binary and > reduces i-cache hotness for the main flow. > > crrev.com/1982123002 recently fixed the situation making the > assembly look like the ideal case above. Unfortunately this caused > another problem due the extreme optimization: once the program > crashes in "trap_instruction(s)", there is no easy way to tell which condition > caused the trap. In practice this translates into the inability of > tell which CHECK failed in a function that has more than one check. > > This CL: > -------- > Re-addresses crrev.com/2125923002, adding an extra instruction after > the trap which creates an opcode with a unique counter. This prevents > the compiler from folding the trap instructions, still applying no-return > optimizations. > Also by doing this the various prologue get properly attributed to the > CHECK line in debugging symbols. > > Binary size inflation on official builds: > ----------------------------------------- > Android arm: 48684276 -> 48712948 = 28 K > Android arm64: 85611800 -> 85665048 = 53 K > Android x86_64: 91904944 -> 91933616 = 28 K > Linux x86_64: 124219488 -> 124346464 = 124 K > (Android build with -Os, hence why the difference between the two) > BUG= 664209 , 599867 TBR=mark@chromium.org Review-Url: https://codereview.chromium.org/2705053002 Cr-Commit-Position: refs/heads/master@{#451749} [modify] https://crrev.com/f53167270c44da971a2f085299404319e42756f4/base/logging.h [modify] https://crrev.com/f53167270c44da971a2f085299404319e42756f4/base/logging_unittest.cc
,
Feb 21 2017
I haven't read all the comments and CLs here, but I think the right thing to do here would be to stop merging __builtin_trap in the backend or add something like __builtin_unique_trap that can't be tail merged. I think all we have to do is set the "convergent" attribute on the trap instruction and call it a day.
,
Feb 21 2017
Re #22: Honestly from a compiler perspective, I was quite surprised that __builtin_trap is merged in the first place. I would expect that (not merging) to be the default behavior. the doc says "__builtin_trap function causes the program to exit abnormally" and, I might be biased, but I would expect to be able to reconstruct the state of execution from the coredump in the case of an abnormal program termination. I was also expecting that "asm volatile (int3)" doesn't get merged due to the volatile. But apparently volatile is also not enough. From a chrome perspective, we still use GCC on Android, so even if you fix clang, we still won't be able to fully rely on that :/
,
Feb 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6a23306766d02a480c7992f91b51e9691d6511b8 commit 6a23306766d02a480c7992f91b51e9691d6511b8 Author: scottmg <scottmg@chromium.org> Date: Tue Feb 21 23:52:14 2017 Shorten IMMEDIATE_CRASH() code size on win clang R=thakis@chromium.org, primiano@chromium.org BUG= 664209 , 694670 Review-Url: https://codereview.chromium.org/2710703003 Cr-Commit-Position: refs/heads/master@{#451857} [modify] https://crrev.com/6a23306766d02a480c7992f91b51e9691d6511b8/base/logging.h
,
Feb 28 2017
Looks like it stuck. Marking as fixed.
,
Feb 28 2017
Thanks for perservering, Primiano!
,
Nov 29
|
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by torne@chromium.org
, Nov 10 2016I tested using the blunt instrument of replacing __builtin_trap() on ARM with: asm volatile ("udf 0xff") and that appears to make the compiler do what we want and actually insert individual traps. Significant downsides: 1) architecture specific, obviously 2) asm is a statement, and so putting it into the ternary requires using the gcc extension ({ }) to put a statement inside an expression :/