New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 910644 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Dec 7
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 904337



Sign in to add a comment

clang roll 346388:347933 caused failures in compile on Google Chrome Linux x64

Project Member Reported by hirosh...@chromium.org, Nov 30

Issue description

First failure:
https://ci.chromium.org/buildbot/chromium.chrome/Google%20Chrome%20Linux%20x64/38102

[19117/38197] ACTION //v8:run_torque(//build/toolchain/linux:clang_x64)
FAILED: gen/v8/torque-generated/builtin-definitions-from-dsl.h gen/v8/torque-generated/builtins-base-from-dsl-gen.cc gen/v8/torque-generated/builtins-base-from-dsl-gen.h gen/v8/torque-generated/builtins-array-from-dsl-gen.cc gen/v8/torque-generated/builtins-array-from-dsl-gen.h gen/v8/torque-generated/builtins-collections-from-dsl-gen.cc gen/v8/torque-generated/builtins-collections-from-dsl-gen.h gen/v8/torque-generated/builtins-iterator-from-dsl-gen.cc gen/v8/torque-generated/builtins-iterator-from-dsl-gen.h gen/v8/torque-generated/builtins-object-from-dsl-gen.cc gen/v8/torque-generated/builtins-object-from-dsl-gen.h gen/v8/torque-generated/builtins-typed-array-from-dsl-gen.cc gen/v8/torque-generated/builtins-typed-array-from-dsl-gen.h gen/v8/torque-generated/builtins-data-view-from-dsl-gen.cc gen/v8/torque-generated/builtins-data-view-from-dsl-gen.h gen/v8/torque-generated/builtins-test-from-dsl-gen.cc gen/v8/torque-generated/builtins-test-from-dsl-gen.h 
python ../../v8/tools/run.py ./torque -o gen/v8/torque-generated ../../v8/src/builtins/base.tq ../../v8/src/builtins/array.tq ../../v8/src/builtins/array-copywithin.tq ../../v8/src/builtins/array-foreach.tq ../../v8/src/builtins/array-join.tq ../../v8/src/builtins/array-lastindexof.tq ../../v8/src/builtins/array-of.tq ../../v8/src/builtins/array-reverse.tq ../../v8/src/builtins/array-slice.tq ../../v8/src/builtins/array-splice.tq ../../v8/src/builtins/array-unshift.tq ../../v8/src/builtins/collections.tq ../../v8/src/builtins/data-view.tq ../../v8/src/builtins/object.tq ../../v8/src/builtins/object-fromentries.tq ../../v8/src/builtins/iterator.tq ../../v8/src/builtins/typed-array.tq ../../v8/test/torque/test-torque.tq ../../v8/third_party/v8/builtins/array-sort.tq

 
Cc: r...@chromium.org thakis@chromium.org
Issue 893437 is related to the same failure, and Issue 893437 suggests clang is involved there (as ASAN is involved), so I suspect the clang roll in the regression range is the culprit. Reverting the clang roll.
Project Member

Comment 2 by bugdroid1@chromium.org, Nov 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/835d0fdea0f19000edf8766a4e299704586be44d

commit 835d0fdea0f19000edf8766a4e299704586be44d
Author: Hiroshige Hayashizaki <hiroshige@chromium.org>
Date: Fri Nov 30 17:42:49 2018

Revert "Roll clang 346388:347933."

This reverts commit 0bdd3c25975f763d121802a092c1ef0925049976.

Reason for revert: Suspected to cause compile failure
on the Google Chrome Linux x64 bot  crbug.com/910644 

Original change's description:
> Roll clang 346388:347933.
> 
> Bug:  904337 
> Change-Id: I8fad51cde2b93b83dfa1a14c61a992f659d827be
> Reviewed-on: https://chromium-review.googlesource.com/c/1356710
> Reviewed-by: Nico Weber <thakis@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#612660}

TBR=thakis@chromium.org,hans@chromium.org,rnk@chromium.org

Change-Id: Ia95868dd39daaa60e204a6ec64df6eabfafacb94
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  904337 ,  910644 
Reviewed-on: https://chromium-review.googlesource.com/c/1357012
Reviewed-by: Hiroshige Hayashizaki <hiroshige@chromium.org>
Commit-Queue: Hiroshige Hayashizaki <hiroshige@chromium.org>
Cr-Commit-Position: refs/heads/master@{#612687}
[modify] https://crrev.com/835d0fdea0f19000edf8766a4e299704586be44d/tools/clang/scripts/update.py

Cc: h...@chromium.org
The clang roll is also suspected to cause compare_build_artifacts failures on
the Deterministic Linux bot:

Regressed:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Deterministic%20Linux/18210
Recovered:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Deterministic%20Linux/18212
Google Chrome Linux x64 bot also recovered by the revert:
https://ci.chromium.org/buildbot/chromium.chrome/Google%20Chrome%20Linux%20x64/38110
Cc: -h...@chromium.org
Labels: -Pri-1 -Sheriff-Chromium Pri-2
Owner: h...@chromium.org
Status: Assigned (was: Started)
Summary: clang roll 346388:347933 caused failures in compile on Google Chrome Linux x64 / compare_build_artifacts on the Deterministic Linux (was: Compile failure on Google Chrome Linux x64 bot )
As all related failures are recovered, removing Sheriff label and assigning to hans@ as the author of the clang roll CL.
Summary: clang roll 346388:347933 caused failures in compile on Google Chrome Linux x64 (was: clang roll 346388:347933 caused failures in compile on Google Chrome Linux x64 / compare_build_artifacts on the Deterministic Linux)
Deterministic Linux failures might be unrelated (still failing after revert of the clang roll).
I think there's been a second regression on the deterministic linux bots, see  issue 910699 . The first bug could be due to the clang roll (but it was only some nacl binary, and those already have suppressions, so worst case we could update the whitelist after rolling and then investigate async).
Focusing on the v8:run_torque failure.

If I turn off CFI, it stops reproducing..
Status: Started (was: Assigned)
Bisecting...

(Today's run script:)

#!/bin/bash
set -exo pipefail

ninja -C build.release clang lld compiler-rt llvm-ar || exit 125
ninja -C /work/chromium/src/out/release -t clean || echo only mostly clean
ninja -C /work/chromium/src/out/release v8:run_torque
Cc: pirama@google.com
Bisection points to this:

---
b51082ef4bdc83efdc37747773ff43bafad55d5d is the first bad commit
commit b51082ef4bdc83efdc37747773ff43bafad55d5d
Author: Pirama Arumuga Nainar <pirama@google.com>
Date:   Thu Nov 8 20:10:07 2018 +0000

    [LTO] Drop non-prevailing definitions only if linkage is not local or appending
    
    Summary:
    This fixes PR 37422
    
    In ELF, non-weak symbols can also be non-prevailing.  In this particular
    PR, the __llvm_profile_* symbols are non-prevailing but weren't getting
    dropped - causing multiply-defined errors with lld.
    
    Also add a test, strong_non_prevailing.ll, to ensure that multiple
    copies of a strong symbol are dropped.
    
    To fix the test regressions exposed by this fix,
    - do not mark prevailing copies for symbols with 'appending' linkage.
    There's no one prevailing copy for such symbols.
    - fix the prevailing version in dead-strip-fulllto.ll
    - explicitly pass exported symbols to llvm-lto in fumcimport.ll and
    funcimport_var.ll
    
    Reviewers: tejohnson, pcc
    
    Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith,
    dang, srhines, llvm-commits
    
    Differential Revision: https://reviews.llvm.org/D54125
---
The SVN revision is r346436
Looking at the map files from linking torque before and after that revision, I don't see anything getting dropped in the new revision.

In fact, it looks like in the new revision, new definitions are getting included in the binary: isalpha, isalnum, isdigit, isxdigit and isspace. But things do more around a bit, so maybe torque is sensitive to that somehow? How exactly is it failing anyway...
This is what it looks like in gdb:

(gdb) r
Starting program: /work/chromium/src/out/release/torque -o gen/v8/torque-generated ../../v8/src/builtins/base.tq ../../v8/src/builtins/array.tq ../../v8/src/builtins/array-copywithin.tq ../../v8/src/builtins/array-foreach.tq ../../v8/src/builtins/array-join.tq ../../v8/src/builtins/array-lastindexof.tq ../../v8/src/builtins/array-of.tq ../../v8/src/builtins/array-reverse.tq ../../v8/src/builtins/array-slice.tq ../../v8/src/builtins/array-splice.tq ../../v8/src/builtins/array-unshift.tq ../../v8/src/builtins/collections.tq ../../v8/src/builtins/data-view.tq ../../v8/src/builtins/object.tq ../../v8/src/builtins/object-fromentries.tq ../../v8/src/builtins/iterator.tq ../../v8/src/builtins/typed-array.tq ../../v8/test/torque/test-torque.tq ../../v8/third_party/v8/builtins/array-sort.tq
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
v8::internal::torque::Grammar::MatchChar (char_class=0x5555555d49c0 <isspace(int)>, pos=0x7fffffffcff8) at ../../v8/src/torque/earley-parser.cc:270
270       if (**pos && char_class(static_cast<unsigned char>(**pos))) {
(gdb) bt
#0  v8::internal::torque::Grammar::MatchChar (char_class=0x5555555d49c0 <isspace(int)>, pos=0x7fffffffcff8) at ../../v8/src/torque/earley-parser.cc:270
#1  0x00005555555cceda in v8::internal::torque::(anonymous namespace)::TorqueGrammar::MatchWhitespace (pos=0x7fffffffcff8) at ../../v8/src/torque/torque-parser.cc:1041
#2  0x0000555555584d89 in v8::internal::torque::Lexer::RunLexer (this=0x7fffffffd090, input=...) at ../../v8/src/torque/earley-parser.cc:112
#3  0x00005555555ac1d8 in v8::internal::torque::Grammar::Parse (this=0x555555569310 <_ZN2v88internal6torque12_GLOBAL__N_113TorqueGrammar15MatchWhitespaceEPPKc$d665c457959e3da47ab5726614bfefc1>, 
    input=...) at ../../v8/src/torque/earley-parser.h:349
#4  v8::internal::torque::ParseTorque (input=...) at ../../v8/src/torque/torque-parser.cc:1541
#5  0x0000555555569f14 in v8::internal::torque::WrappedMain (argc=<optimized out>, argv=<optimized out>) at ../../v8/src/torque/torque.cc:50
#6  main (argc=<optimized out>, argv=<optimized out>) at ../../v8/src/torque/torque.cc:87



The code looks like this:

// static
bool Grammar::MatchChar(int (*char_class)(int), InputPosition* pos) {
  if (**pos && char_class(static_cast<unsigned char>(**pos))) {
    ++*pos;
    return true;
  }
  return false;
}


And char_class is isspace(), which is showing in the diff of the linker map files...
It's the CFI check for isspace that fails:


mov    %rsp,%rbp
push   %rbx
push   %rax
mov    %rdi,%rax
mov    (%rsi),%rcx
movzbl (%rcx),%edi
test   %edi,%edi
je     0x55555558760b <v8::internal::torque::Grammar::MatchChar(int (*)(int), char const**)+59>
lea    -0x1e33a(%rip),%rcx        # 0x5555555692b0 <isspace.cfi_jt>
mov    %rax,%rdx
sub    %rcx,%rdx
ror    $0x3,%rdx
cmp    $0x5,%rdx
ja     0x555555587614 <v8::internal::torque::Grammar::MatchChar(int (*)(int), char const**)+68>   <---- We jump to ud2 here.
mov    %rsi,%rbx
callq  *%rax
test   %eax,%eax
je     0x55555558760b <v8::internal::torque::Grammar::MatchChar(int (*)(int), char const**)+59>
addq   $0x1,(%rbx)
mov    $0x1,%al
jmp    0x55555558760d <v8::internal::torque::Grammar::MatchChar(int (*)(int), char const**)+61>
xor    %eax,%eax
add    $0x8,%rsp
pop    %rbx
pop    %rbp
retq
ud2
Blocking: 904337
Cc: h...@chromium.org
Components: Build
Labels: -Pri-2 clang Pri-1
Owner: p...@chromium.org
Status: Assigned (was: Started)
pcc: Can you take a look at how pirama's LTO change causes this CFI failure?


Steps to reproduce:
Check out and sync Chromium #612660 / 0bdd3c25975f763d121802a092c1ef0925049976

gn args:
is_chrome_branded = true
is_debug = false
is_official_build = true
strip_absolute_paths_from_debug_symbols = true

$ ninja -C out/release v8:run_torque
The issue is that pirama's change causes (I'm not sure exactly how yet) the available_externally definition of isspace in torque-parser.o to become internal during the "promote" phase.

$ ~/l4/ra/bin/llvm-dis ./obj/v8/torque_base/torque-parser.o.0.preopt.bc -o - | grep isspace
  %call = tail call zeroext i1 @_ZN2v88internal6torque7Grammar9MatchCharEPFiiEPPKc(i32 (i32)* nonnull @isspace, i8** %pos) #20, !dbg !121963
define available_externally i32 @isspace(i32 %__c) #9 !dbg !24263 !prof !25529 !type !122296 !type !122297 {
!24263 = distinct !DISubprogram(name: "isspace", scope: !1731, file: !1731, line: 182, type: !24237, scopeLine: 182, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !1683, retainedNodes: !24264)

$ ~/l4/ra/bin/llvm-dis ./obj/v8/torque_base/torque-parser.o.1.promote.bc -o - | grep isspace
  %call = tail call zeroext i1 @_ZN2v88internal6torque7Grammar9MatchCharEPFiiEPPKc(i32 (i32)* nonnull @isspace, i8** %pos) #20, !dbg !121963
define internal i32 @isspace(i32 %__c) #9 !dbg !24263 !prof !25529 !type !122296 !type !122297 {
!24263 = distinct !DISubprogram(name: "isspace", scope: !1731, file: !1731, line: 182, type: !24237, scopeLine: 182, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !1683, retainedNodes: !24264)

This is an incorrect transformation in general for C and C++ since it breaks function pointer equality for isspace (since it causes the address of isspace to be different inside the module than outside); CFI just happens to rely more strongly on this property because of the jump tables that it creates.
Cc: srhines@google.com
> Fix: https://reviews.llvm.org/D55237

Great stuff! Thanks for looking into this.
Fix landed in r348321.
Status: Fixed (was: Assigned)
Thanks!

Sign in to add a comment