New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 664209 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner: ----
Closed: Feb 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug



Sign in to add a comment

multiple CHECK-s within a function are indistinguishable in official build

Project Member Reported by primiano@chromium.org, Nov 10 2016

Issue description

It looks like the way CHECKs are implemented today induces the compiler to optimize them so hard in official builds, in a way that makes it impossible to distinguish the original line if there are multiple checks in a function.
What is happening is that the various __builtin_trap() are compiled as a single instruction ("ud2" on x86, "udf ff" on arm) per function, and the CHECKs are compiled as jumps to that label, even with just -O1.

More practically, see below, which is a minified x86_64 repro of the internal thread http://go/xskhh:
-----
/* 01 */ #define LOGGING_CRASH() __builtin_trap()
/* 02 */ #define CHECK(condition) !(condition) ? LOGGING_CRASH() : (void)0
/* 03 */ void func(int x, int y) {
/* 04 */   CHECK(x);
/* 05 */   CHECK(y);
/* 06 */   *((volatile char*)0x42) = 'z';
/* 07 */   CHECK(*((volatile char*)0x43));
/* 08 */ }
-----

$ clang++ -c -g test.cc --std=c++11 -O1 -S -o - | less
-----
__Z4funcii:                             ## @_Z4funcii
Lfunc_begin0:
        .file   1 "test.cc"
        .loc    1 3 0                   ## test.cc:3:0
        .cfi_startproc
## BB#0:
        pushq   %rbp
Ltmp0:
        .cfi_def_cfa_offset 16
Ltmp1:
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
Ltmp2:
        .cfi_def_cfa_register %rbp
        ##DEBUG_VALUE: func:x <- %EDI
        ##DEBUG_VALUE: func:y <- %ESI
        .loc    1 4 12 prologue_end     ## test.cc:4:12
Ltmp3:
        testl   %edi, %edi
        je      LBB0_4          <-- !!!
## BB#1:
        ##DEBUG_VALUE: func:y <- %ESI
        .loc    1 5 12                  ## test.cc:5:12
        testl   %esi, %esi
        je      LBB0_4          <-- !!!
## BB#2:
        .loc    1 6 36                  ## test.cc:6:36
        movb    $122, 66
        .loc    1 7 12                  ## test.cc:7:12
        cmpb    $0, 67
        je      LBB0_4          <-- !!!
## BB#3:
        .loc    1 8 10                  ## test.cc:8:10
        popq    %rbp
        retq
LBB0_4:
        .loc    1 7 12                  ## test.cc:7:12
        ud2                <-- this is __builtin_trap()
-----

Note how all the CHECKs are "je  LBB0_4".
When "ud2" is hit, it will cause the crash. However, at that point the information about which line caused the crash is lost. ud2 will be attributed to the last line, and since the checks are straight jumps and not calls, there isn't even any frame associated with the crash line.
Something similar happens on Android/arm. See the investigation on the internal thread http://go/xskhh which triggered this bug.

It goes without saying that this causes endless headaches to our developers, like in that internal thread.

I think (but didn't really check) that before crrev.com/2125923002 this was less likely to happen as long as the distance between the CHECK line and BreakDebugger was enough to fit into the je/jne instruction and not require a trampoline.

Now the question for our illustrious C++/compiler peeps.
Is there a way we can tell CHECK to either:
1. not trampoline the __builtin_trap ?
2. make that a call with an actual frame, so we get the right source line in the 2nd frame?

I think 2 is more easily doable, as it would be something similar to what we had before. but then I am not sure whether that is going to introduce the same performance / binary bloat which motivated crrev.com/2125923002 in the first place.

Ideas?
 

Comment 1 by torne@chromium.org, Nov 10 2016

I tested using the blunt instrument of replacing __builtin_trap() on ARM with:
asm volatile ("udf 0xff")
and that appears to make the compiler do what we want and actually insert individual traps.  Significant downsides:
1) architecture specific, obviously
2) asm is a statement, and so putting it into the ternary requires using the gcc extension ({ }) to put a statement inside an expression :/

Comment 2 by danakj@chromium.org, Nov 12 2016

Cc: koten...@yandex-team.ru brettw@chromium.org
The specific implementation of CHECK is what it is based on the blink RELEASE_ASSERT. We went thru multiple iterations to weed out all performance regressions causes by switching blink from CHECK to RELEASE_ASSERT. So I'm concerned about ideas to change it, tho clearly making it more useful is nice.

RELEASE_ASSERT is still not using CHECK because we did not figure out why it's giving different performance yet. But I don't want to make that harder.

1) Doing architecture-specific things seems okay to me here.
2) Can you use an inline function or something?

See https://bugs.chromium.org/p/chromium/issues/detail?id=599867
And https://groups.google.com/a/chromium.org/d/topic/blink-dev/TL0NkNXIT1w/discussion

I might be missing some more discussion. But this is ongoing work that is kinda sidelined at this point.

Comment 3 by torne@chromium.org, Nov 14 2016

Labels: Arch-All OS-All
Status: Untriaged (was: Unconfirmed)
This makes it significantly harder for anyone to understand a subset of CHECK failures in the crash db coming from the field (basically any case where there's more than one CHECK in the same function, as the compiler appears to optimise this indiscriminately as soon as there's two occurrences in the same function).

Primiano and I were able to figure out the real situation in this one particular crash but it required quite some investigation and also a level of expertise with understanding assembly that is certainly not universal.

I think this is a pretty big problem in practise, and not just a matter of "making it more useful". The reason we omit the log message in official build CHECKs is *because* we believe we can at least tell which CHECK fired from the return address :)

An inline function probably works. We could probably put a CL together to invent a not_as_builtin_trap() which just inlines an appropriate undef/trap/breakpoint instruction for each architecture/mode (we can just copy the asm sequence that __builtin_trap() generates on that arch). Just a bit of fiddly effort for someone to put that list together.

Setting a few more labels since this doesn't seem to be OS-specific and all of arm/arm64/x86/x86_64 seem to be affected at least.
So I just tried two variants here:
https://codereview.chromium.org/2502953003


1)
use a inline noreturn function:
static ALWAYS_INLINE  __attribute__((noreturn)) void DoTrap() {
#if defined(ARCH_CPU_X86_FAMILY)
  asm volatile ("ud2");
...
}
#define LOGGING_CRASH() DoTrap()
#define CHECK(condition) !(condition) ? LOGGING_CRASH() : EAT_STREAM_PARAMETERS


2)
put directly the asm in the macro using the compound statement expr, which seems to work also on clang:

#if defined(ARCH_CPU_X86_FAMILY)
#define LOGGING_CRASH() ({ asm volatile ("ud2"); })
#define CHECK(condition) !(condition) ? LOGGING_CRASH() : EAT_STREAM_PARAMETERS
...

Both 1) and 2) produce distinct crash addresses now, which is good. However...

Given the following code:
-----
  CHECK(death_location != 1);
  CHECK(death_location != 2);
  printf("\n");
  CHECK(death_location != 3);
-----

1) Produces very compact code. I quite like it and I'm pretty convinced (didn't try) that this is equivalent, performance-wise, to what we have today:

0x1001b33b0 <+0>:  pushq  %rbp
0x1001b33b1 <+1>:  movq   %rsp, %rbp
0x1001b33b4 <+4>:  pushq  %rbx
0x1001b33b5 <+5>:  pushq  %rax
0x1001b33b6 <+6>:  movl   %edi, %ebx
0x1001b33b8 <+8>:  cmpl   $0x2, %ebx
0x1001b33bb <+11>: je     0x1001b33da               ; <+42> [inlined] logging::DoTrap() at logging_unittest.cc:22
0x1001b33bd <+13>: cmpl   $0x1, %ebx
0x1001b33c0 <+16>: je     0x1001b33dc               ; <+44> [inlined] logging::DoTrap() at logging_unittest.cc:21
0x1001b33c2 <+18>: movl   $0xa, %edi
0x1001b33c7 <+23>: callq  0x10090a41a               ; symbol stub for: putchar
0x1001b33cc <+28>: cmpl   $0x3, %ebx
0x1001b33cf <+31>: je     0x1001b33d8               ; <+40> [inlined] logging::DoTrap() at logging_unittest.cc:24
0x1001b33d1 <+33>: addq   $0x8, %rsp
0x1001b33d5 <+37>: popq   %rbx
0x1001b33d6 <+38>: popq   %rbp
0x1001b33d7 <+39>: retq
0x1001b33d8 <+40>: ud2
0x1001b33da <+42>: ud2
0x1001b33dc <+44>: ud2

It has a problem though. Even if the ud2 instructions are now distinct, addr2line associates them to the inline funciton Logging::DoTrap() and not the caller. I dunno if there is a way to tell the compiler to put the caller name in the produced symbols.

2) Produces a bit less compact code, as the "ud2" instructions are now interleaved with the flow
0x1001b3c58 <+8>:  cmpl   $0x2, %ebx
0x1001b3c5b <+11>: je     0x1001b3c72               ; <+34> [inlined] logging::DoTrap() at logging_unittest.cc:22
0x1001b3c5d <+13>: cmpl   $0x1, %ebx
0x1001b3c60 <+16>: jne    0x1001b3c82               ; <+50> at logging_unittest.cc:23
0x1001b3c62 <+18>: ud2
0x1001b3c64 <+20>: movb   $0x0, 0x0
0x1001b3c6c <+28>: nopl   (%rax)
0x1001b3c70 <+32>: jmp    0x1001b3c70               ; <+32> [inlined] logging::DoTrap() + 14 at logging_unittest.cc:21
0x1001b3c72 <+34>: ud2
0x1001b3c74 <+36>: movb   $0x0, 0x0
0x1001b3c7c <+44>: nopl   (%rax)
0x1001b3c80 <+48>: jmp    0x1001b3c80               ; <+48> [inlined] logging::DoTrap() + 14 at logging_unittest.cc:22
0x1001b3c82 <+50>: movl   $0xa, %edi
0x1001b3c87 <+55>: callq  0x10090efda               ; symbol stub for: putchar
0x1001b3c8c <+60>: cmpl   $0x3, %ebx
0x1001b3c8f <+63>: jne    0x1001b3ca2               ; <+82> at logging_unittest.cc:25
0x1001b3c91 <+65>: ud2
0x1001b3c93 <+67>: movb   $0x0, 0x0
0x1001b3c9b <+75>: nopl   (%rax,%rax)
0x1001b3ca0 <+80>: jmp    0x1001b3ca0               ; <+80> [inlined] logging::DoTrap() + 15 at logging_unittest.cc:24

And this seems more symbolizer friendly.
Ok I think I have a "not too horrifying" solution in https://codereview.chromium.org/2502953003/
The TL;DR of that is
#define LOGGING_CRASH(LINE)                    \
  ({                                           \
    # This is a generalized version of mov eax, LINE
    asm volatile("" : : "r"(LINE) : "memory"); \
    __builtin_trap();                          \
    __builtin_unreachable();                   \
  })

Essentially, if I prepend a mov eax,line before __builtin_trap, the compiler stops the coalescing, debugging symbols are correctly attributed, at the price of few bytes per call site (the CL has a summary of the binary cost of this CL).
On the good side, this doesn't require to maintain architecture-specific #ifdef or assembly.

Just for the records, answering myself
> I dunno if there is a way to tell the compiler to put the caller name in the produced symbols. 
This seems to be attribute(artificial) but clang doesn't support it.

Comment 6 by vakh@chromium.org, Nov 23 2016

Cc: vakh@chromium.org

Comment 7 by danakj@chromium.org, Nov 23 2016

Sorry for being slow to have much input here. I've been really concerned about this creating deadlocks for combining RELEASE_ASSERT and CHECK, and if CHECK is already too slow then what if we change it, etc. I wanted to say that I've come to a perspective that I think perf tests showing such large change with CHECK vs RELEASE_ASSERT points to RELEASE_ASSERT being used in extremely hot code which is probably not needed, rather than to a problem with CHECK. I filed https://bugs.chromium.org/p/chromium/issues/detail?id=668300 to this effect and mentioned it to eae@ in person who was really interested in this, as this hints at moving RELEASE_ASSERTs could provide perf wins for users right now.

All that is to say that I'd like to unblock this in my own mind from dealing with RELEASE_ASSERT problems, which should be good news for this. :)
just to be clear, this is not about performance. this is about check generating confusing crash reports and people getting stuck on false source line info (see link to internal thread) 
also, not sure how the current implementation (regardless of this proposal)  is slow. 
in official builds CHECK  is just a conditional jump to a trap opcode. 
I would love to understand where all the "slow"  that people claim comes from. 

Comment 10 by torne@chromium.org, Nov 24 2016

From reading the threads, it's not 100% clear, but it appears that the issue is only when compiling with MSVC and so disassembling gcc/clang binaries won't show it. My reading is that MSVC isn't quite doing a good enough job of optimising away the output-streaming part of CHECK() and is touching the stack slightly more than necessary. RELEASE_ASSERT in blink doesn't support the operator<< logging at all, and so doesn't have to use EAT_STREAM_PARAMETERS in the first place, so never causes this problem.

Comment 11 by vakh@chromium.org, Dec 1 2016

Blocking: 660293
It looks like http://crbug.com/660293 is also affected by this.
Same symptoms: Windows Official builds, and the failing CHECK statements don't seem to make sense.

Comment 12 by vakh@chromium.org, Jan 6 2017

Blocking: -660293
Re #c11 -- I was wrong. Please ignore.
Cc: scottmg@chromium.org
#include <stdlib.h>

#define LOGGING_CRASH() __debugbreak()
#define CHECK(condition) !(condition) ? LOGGING_CRASH() : (void)0
void func(int x, int y) {
  CHECK(x);
  CHECK(y);
  *((volatile char*)0x42) = 'z';
  CHECK(*((volatile char*)0x43));
}

int main() {
  func(rand(), rand());
}

-->  cl /Ox /GL and link /LTCG  -->

main:
  0000000140043E40: 40 53              push        rbx
  0000000140043E42: 48 83 EC 20        sub         rsp,20h
  0000000140043E46: E8 69 6C FC FF     call        rand
  0000000140043E4B: 8B D8              mov         ebx,eax
  0000000140043E4D: E8 62 6C FC FF     call        rand
  0000000140043E52: 85 C0              test        eax,eax
  0000000140043E54: 75 01              jne         0000000140043E57
  0000000140043E56: CC                 int         3
  0000000140043E57: 85 DB              test        ebx,ebx
  0000000140043E59: 75 01              jne         0000000140043E5C
  0000000140043E5B: CC                 int         3
  0000000140043E5C: C6 04 25 42 00 00  mov         byte ptr [42h],7Ah
                    00 7A
  0000000140043E64: 8A 04 25 43 00 00  mov         al,byte ptr [43h]
                    00
  0000000140043E6B: 84 C0              test        al,al
  0000000140043E6D: 75 01              jne         0000000140043E70
  0000000140043E6F: CC                 int         3
  0000000140043E70: 33 C0              xor         eax,eax
  0000000140043E72: 48 83 C4 20        add         rsp,20h
  0000000140043E76: 5B                 pop         rbx
  0000000140043E77: C3                 ret


In particular, there's 3 separate "int 3".
Cc: dcheng@chromium.org
I wrote https://codereview.chromium.org/2676483002/ for Windows, which will result in desired behaviour and shorter code (7 bytes -> 1 byte for the actual crash).

I'm not 100% certain whether it'll change the optimization opportunities the compiler feels it's entitled to in surrounding code though.
Project Member

Comment 17 by bugdroid1@chromium.org, Feb 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a17c8db528cadca9eef98ce03b0910700735722e

commit a17c8db528cadca9eef98ce03b0910700735722e
Author: scottmg <scottmg@chromium.org>
Date: Wed Feb 15 21:35:49 2017

Make CHECK int 3 on Windows, rather than crash

Not being able to distinguish intentional crashes (CHECK) from unintentional ones (other access violations) in terms of exit codes reduces the amount of signal we have in stability metrics.

R=wfh@chromium.org
BUG= 664209 , 687326 

Review-Url: https://codereview.chromium.org/2676483002
Cr-Commit-Position: refs/heads/master@{#450809}

[modify] https://crrev.com/a17c8db528cadca9eef98ce03b0910700735722e/base/logging.h
[modify] https://crrev.com/a17c8db528cadca9eef98ce03b0910700735722e/base/logging_unittest.cc

Project Member

Comment 18 by bugdroid1@chromium.org, Feb 16 2017

Labels: merge-merged-2987
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/225319af95b483fc525e08149b4d1d64524dd9c4

commit 225319af95b483fc525e08149b4d1d64524dd9c4
Author: Scott Graham <scottmg@chromium.org>
Date: Thu Feb 16 22:14:16 2017

Make CHECK int 3 on Windows, rather than crash

Not being able to distinguish intentional crashes (CHECK) from unintentional ones (other access violations) in terms of exit codes reduces the amount of signal we have in stability metrics.

R=wfh@chromium.org
BUG= 664209 , 687326 

Review-Url: https://codereview.chromium.org/2676483002
Cr-Commit-Position: refs/heads/master@{#450809}
(cherry picked from commit a17c8db528cadca9eef98ce03b0910700735722e)

Review-Url: https://codereview.chromium.org/2695833008 .
Cr-Commit-Position: refs/branch-heads/2987@{#564}
Cr-Branched-From: ad51088c0e8776e8dcd963dbe752c4035ba6dab6-refs/heads/master@{#444943}

[modify] https://crrev.com/225319af95b483fc525e08149b4d1d64524dd9c4/base/logging.h
[modify] https://crrev.com/225319af95b483fc525e08149b4d1d64524dd9c4/base/logging_unittest.cc

Project Member

Comment 19 by bugdroid1@chromium.org, Feb 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8c972d0e190168b4b5621e81563f319563fd0af8

commit 8c972d0e190168b4b5621e81563f319563fd0af8
Author: primiano <primiano@chromium.org>
Date: Fri Feb 17 21:08:50 2017

base: make CHECK macros trap at distinct addresses in official builds

Abstract
--------
CHECK() is the macro used all over the places (~4K occurrences without
counting for dupes due to inlining) for release-time assertions.
It is enabled in a minimal form (crash without a message) in official builds.
It needs to be fast as it is used in lot of fastpath.
It needs to not emit too much code, as it is used in lot of places.
It needs to guarantee that crash reports can pinpoint to the right
location when hitting a CHECK.

- Back in the days CHECK was fat, slow, but crash-friendly.
- After crrev.com/1982123002 it became tiny, fast but not crash friendly.
- This CL is making it a bit less tiny (+28/+128K), fast and crash friendly.

The problem this CL deals with is the case of multiple CHECK()s within
the same function. Also, adds a test that covers Mac, Linux and Android.

A bit of history:
-----------------
Before crrev.com/1982123002 (reverted and later re-landed in
crrev.com/2125923002) CHECK() in official builds was essentially:
if (!condition) BreakDebugger()
It was later found that this approach was not efficient, both in terms
of binary size and performance. More importantly it was a regression
w.r.t. blink's assert that were later switched to CHECK().
"[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
The major reason for this is DebuggerBreak being quite complex and
not treated as noreturn by the compiler.
It seems (see  crbug.com/664209  for more) that the most efficient way to
handle these checks is ending up with:

Source code:
  CHECK(cond1);
  CHECK(cond2);
  ...

Ideal assembly:
(ideal for perf and binary, but not for crash reports, more below)
    compare_opcode  cond1;
    jump_if_zero    prologue;
    compare_opcode  cond2;
    jump_if_zero    prologue;
    ...
 prologue:
    trap_instruction(s)

Rather than something like:
    compare_opcode    cond1;
    jump_if_NOT_zero  next1;
    trap_instruction(s)
next1:
    compare_opcode    cond2;
    jump_if_NOT_zero  next2;
    trap_instruction(s)
next2:
    ...
Where essentially the trap instructions are interleaved within the
main execution flow. This is even worse if the trap instruction is
actually a call to a function, with annex frame initialization,
as in the case of BreakDebugger(). That bloats the binary and
reduces i-cache hotness for the main flow.

crrev.com/1982123002 recently fixed the situation making the
assembly look like the ideal case above. Unfortunately this caused
another problem due the extreme optimization: once the program
crashes in "trap_instruction(s)", there is no easy way to tell which condition
caused the trap. In practice this translates into the inability of
tell which CHECK failed in a function that has more than one check.

This CL:
--------
Re-addresses crrev.com/2125923002, adding an extra instruction after
the trap which creates an opcode with a unique counter. This prevents
the compiler from folding the trap instructions, still applying no-return
optimizations.
Also by doing this the various prologue get properly attributed to the
CHECK line in debugging symbols.

Binary size inflation on official builds:
-----------------------------------------
Android arm:    48684276  -> 48712948  = 28 K
Android arm64:  85611800  -> 85665048  = 53 K
Android x86_64:  91904944 -> 91933616 = 28 K
Linux x86_64: 124219488 -> 124346464 = 124 K
(Android build with -Os, hence why the difference between the two)

BUG= 664209 , 599867 

Review-Url: https://codereview.chromium.org/2502953003
Cr-Commit-Position: refs/heads/master@{#451381}

[modify] https://crrev.com/8c972d0e190168b4b5621e81563f319563fd0af8/base/logging.h
[modify] https://crrev.com/8c972d0e190168b4b5621e81563f319563fd0af8/base/logging_unittest.cc

Project Member

Comment 20 by bugdroid1@chromium.org, Feb 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4c98a8c2c794b10f70fc22e02128139252a0b0c1

commit 4c98a8c2c794b10f70fc22e02128139252a0b0c1
Author: alph <alph@chromium.org>
Date: Fri Feb 17 21:29:15 2017

Revert of base: make CHECK macros trap at distinct addresses in official builds (patchset #10 id:180001 of https://codereview.chromium.org/2502953003/ )

Reason for revert:
Broke compile https://build.chromium.org/p/chromium.chrome/builders/Google%20Chrome%20ChromeOS/builds/29509

Original issue's description:
> base: make CHECK macros trap at distinct addresses in official builds
>
> Abstract
> --------
> CHECK() is the macro used all over the places (~4K occurrences without
> counting for dupes due to inlining) for release-time assertions.
> It is enabled in a minimal form (crash without a message) in official builds.
> It needs to be fast as it is used in lot of fastpath.
> It needs to not emit too much code, as it is used in lot of places.
> It needs to guarantee that crash reports can pinpoint to the right
> location when hitting a CHECK.
>
> - Back in the days CHECK was fat, slow, but crash-friendly.
> - After crrev.com/1982123002 it became tiny, fast but not crash friendly.
> - This CL is making it a bit less tiny (+28/+128K), fast and crash friendly.
>
> The problem this CL deals with is the case of multiple CHECK()s within
> the same function. Also, adds a test that covers Mac, Linux and Android.
>
> A bit of history:
> -----------------
> Before crrev.com/1982123002 (reverted and later re-landed in
> crrev.com/2125923002) CHECK() in official builds was essentially:
> if (!condition) BreakDebugger()
> It was later found that this approach was not efficient, both in terms
> of binary size and performance. More importantly it was a regression
> w.r.t. blink's assert that were later switched to CHECK().
> "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
> The major reason for this is DebuggerBreak being quite complex and
> not treated as noreturn by the compiler.
> It seems (see  crbug.com/664209  for more) that the most efficient way to
> handle these checks is ending up with:
>
> Source code:
>   CHECK(cond1);
>   CHECK(cond2);
>   ...
>
> Ideal assembly:
> (ideal for perf and binary, but not for crash reports, more below)
>     compare_opcode  cond1;
>     jump_if_zero    prologue;
>     compare_opcode  cond2;
>     jump_if_zero    prologue;
>     ...
>  prologue:
>     trap_instruction(s)
>
> Rather than something like:
>     compare_opcode    cond1;
>     jump_if_NOT_zero  next1;
>     trap_instruction(s)
> next1:
>     compare_opcode    cond2;
>     jump_if_NOT_zero  next2;
>     trap_instruction(s)
> next2:
>     ...
> Where essentially the trap instructions are interleaved within the
> main execution flow. This is even worse if the trap instruction is
> actually a call to a function, with annex frame initialization,
> as in the case of BreakDebugger(). That bloats the binary and
> reduces i-cache hotness for the main flow.
>
> crrev.com/1982123002 recently fixed the situation making the
> assembly look like the ideal case above. Unfortunately this caused
> another problem due the extreme optimization: once the program
> crashes in "trap_instruction(s)", there is no easy way to tell which condition
> caused the trap. In practice this translates into the inability of
> tell which CHECK failed in a function that has more than one check.
>
> This CL:
> --------
> Re-addresses crrev.com/2125923002, adding an extra instruction after
> the trap which creates an opcode with a unique counter. This prevents
> the compiler from folding the trap instructions, still applying no-return
> optimizations.
> Also by doing this the various prologue get properly attributed to the
> CHECK line in debugging symbols.
>
> Binary size inflation on official builds:
> -----------------------------------------
> Android arm:    48684276  -> 48712948  = 28 K
> Android arm64:  85611800  -> 85665048  = 53 K
> Android x86_64:  91904944 -> 91933616 = 28 K
> Linux x86_64: 124219488 -> 124346464 = 124 K
> (Android build with -Os, hence why the difference between the two)
>
> BUG= 664209 , 599867 
>
> Review-Url: https://codereview.chromium.org/2502953003
> Cr-Commit-Position: refs/heads/master@{#451381}
> Committed: https://chromium.googlesource.com/chromium/src/+/8c972d0e190168b4b5621e81563f319563fd0af8

TBR=thakis@chromium.org,torne@chromium.org,brettw@chromium.org,danakj@chromium.org,mark@chromium.org,primiano@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 664209 , 599867 

Review-Url: https://codereview.chromium.org/2706453004
Cr-Commit-Position: refs/heads/master@{#451384}

[modify] https://crrev.com/4c98a8c2c794b10f70fc22e02128139252a0b0c1/base/logging.h
[modify] https://crrev.com/4c98a8c2c794b10f70fc22e02128139252a0b0c1/base/logging_unittest.cc

Project Member

Comment 21 by bugdroid1@chromium.org, Feb 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f53167270c44da971a2f085299404319e42756f4

commit f53167270c44da971a2f085299404319e42756f4
Author: primiano <primiano@chromium.org>
Date: Tue Feb 21 13:09:26 2017

[Reland] base: make CHECK macros trap at distinct addresses in official builds

Original CL: https://crrev.com/2502953003
Revert: https://crrev.com/2706453004
Reason for relanding:
masked out OS_NACL which was failing to build because of this error:
"GNU-style inline assembly is disabled"

Original CL Description:
> Abstract
> --------
> CHECK() is the macro used all over the places (~4K occurrences without
> counting for dupes due to inlining) for release-time assertions.
> It is enabled in a minimal form (crash without a message) in official builds.
> It needs to be fast as it is used in lot of fastpath.
> It needs to not emit too much code, as it is used in lot of places.
> It needs to guarantee that crash reports can pinpoint to the right
> location when hitting a CHECK.
>
> - Back in the days CHECK was fat, slow, but crash-friendly.
> - After crrev.com/1982123002 it became tiny, fast but not crash friendly.
> - This CL is making it a bit less tiny (+28/+128K), fast and crash friendly.
>
> The problem this CL deals with is the case of multiple CHECK()s within
> the same function. Also, adds a test that covers Mac, Linux and Android.
>
> A bit of history:
> -----------------
> Before crrev.com/1982123002 (reverted and later re-landed in
> crrev.com/2125923002) CHECK() in official builds was essentially:
> if (!condition) BreakDebugger()
> It was later found that this approach was not efficient, both in terms
> of binary size and performance. More importantly it was a regression
> w.r.t. blink's assert that were later switched to CHECK().
> "[blink-dev] Update of wtf/Assertions.h, and ASSERT macros deprecation"
> The major reason for this is DebuggerBreak being quite complex and
> not treated as noreturn by the compiler.
> It seems (see  crbug.com/664209  for more) that the most efficient way to
> handle these checks is ending up with:
>
> Source code:
>   CHECK(cond1);
>   CHECK(cond2);
>   ...
>
> Ideal assembly:
> (ideal for perf and binary, but not for crash reports, more below)
>     compare_opcode  cond1;
>     jump_if_zero    prologue;
>     compare_opcode  cond2;
>     jump_if_zero    prologue;
>     ...
>  prologue:
>     trap_instruction(s)
>
> Rather than something like:
>     compare_opcode    cond1;
>     jump_if_NOT_zero  next1;
>     trap_instruction(s)
> next1:
>     compare_opcode    cond2;
>     jump_if_NOT_zero  next2;
>     trap_instruction(s)
> next2:
>     ...
> Where essentially the trap instructions are interleaved within the
> main execution flow. This is even worse if the trap instruction is
> actually a call to a function, with annex frame initialization,
> as in the case of BreakDebugger(). That bloats the binary and
> reduces i-cache hotness for the main flow.
>
> crrev.com/1982123002 recently fixed the situation making the
> assembly look like the ideal case above. Unfortunately this caused
> another problem due the extreme optimization: once the program
> crashes in "trap_instruction(s)", there is no easy way to tell which condition
> caused the trap. In practice this translates into the inability of
> tell which CHECK failed in a function that has more than one check.
>
> This CL:
> --------
> Re-addresses crrev.com/2125923002, adding an extra instruction after
> the trap which creates an opcode with a unique counter. This prevents
> the compiler from folding the trap instructions, still applying no-return
> optimizations.
> Also by doing this the various prologue get properly attributed to the
> CHECK line in debugging symbols.
>
> Binary size inflation on official builds:
> -----------------------------------------
> Android arm:    48684276  -> 48712948  = 28 K
> Android arm64:  85611800  -> 85665048  = 53 K
> Android x86_64:  91904944 -> 91933616 = 28 K
> Linux x86_64: 124219488 -> 124346464 = 124 K
> (Android build with -Os, hence why the difference between the two)
>

BUG= 664209 , 599867 
TBR=mark@chromium.org

Review-Url: https://codereview.chromium.org/2705053002
Cr-Commit-Position: refs/heads/master@{#451749}

[modify] https://crrev.com/f53167270c44da971a2f085299404319e42756f4/base/logging.h
[modify] https://crrev.com/f53167270c44da971a2f085299404319e42756f4/base/logging_unittest.cc

Comment 22 by r...@chromium.org, Feb 21 2017

Cc: r...@chromium.org
I haven't read all the comments and CLs here, but I think the right thing to do here would be to stop merging __builtin_trap in the backend or add something like __builtin_unique_trap that can't be tail merged. I think all we have to do is set the "convergent" attribute on the trap instruction and call it a day.
Re #22: Honestly from a compiler perspective, I was quite surprised that __builtin_trap is merged in the first place. I would expect that (not merging) to be the default behavior. 
the doc says "__builtin_trap function causes the program to exit abnormally" and, I might be biased, but I would expect to be able to reconstruct the state of execution from the coredump in the case of an abnormal program termination.
I was also expecting that "asm volatile (int3)" doesn't get merged due to the volatile. But apparently volatile is also not enough.

From a chrome perspective, we still use GCC on Android, so even if you fix clang, we still won't be able to fully rely on that :/
Project Member

Comment 24 by bugdroid1@chromium.org, Feb 21 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6a23306766d02a480c7992f91b51e9691d6511b8

commit 6a23306766d02a480c7992f91b51e9691d6511b8
Author: scottmg <scottmg@chromium.org>
Date: Tue Feb 21 23:52:14 2017

Shorten IMMEDIATE_CRASH() code size on win clang

R=thakis@chromium.org, primiano@chromium.org
BUG= 664209 , 694670 

Review-Url: https://codereview.chromium.org/2710703003
Cr-Commit-Position: refs/heads/master@{#451857}

[modify] https://crrev.com/6a23306766d02a480c7992f91b51e9691d6511b8/base/logging.h

Status: Fixed (was: Untriaged)
Looks like it stuck. Marking as fixed.

Comment 26 by torne@chromium.org, Feb 28 2017

Thanks for perservering, Primiano!
Cc: -vabr@chromium.org

Sign in to add a comment