clang roll 346388:347933 caused failures in compile on Google Chrome Linux x64 |
|||||||||
Issue descriptionFirst failure: https://ci.chromium.org/buildbot/chromium.chrome/Google%20Chrome%20Linux%20x64/38102 [19117/38197] ACTION //v8:run_torque(//build/toolchain/linux:clang_x64) FAILED: gen/v8/torque-generated/builtin-definitions-from-dsl.h gen/v8/torque-generated/builtins-base-from-dsl-gen.cc gen/v8/torque-generated/builtins-base-from-dsl-gen.h gen/v8/torque-generated/builtins-array-from-dsl-gen.cc gen/v8/torque-generated/builtins-array-from-dsl-gen.h gen/v8/torque-generated/builtins-collections-from-dsl-gen.cc gen/v8/torque-generated/builtins-collections-from-dsl-gen.h gen/v8/torque-generated/builtins-iterator-from-dsl-gen.cc gen/v8/torque-generated/builtins-iterator-from-dsl-gen.h gen/v8/torque-generated/builtins-object-from-dsl-gen.cc gen/v8/torque-generated/builtins-object-from-dsl-gen.h gen/v8/torque-generated/builtins-typed-array-from-dsl-gen.cc gen/v8/torque-generated/builtins-typed-array-from-dsl-gen.h gen/v8/torque-generated/builtins-data-view-from-dsl-gen.cc gen/v8/torque-generated/builtins-data-view-from-dsl-gen.h gen/v8/torque-generated/builtins-test-from-dsl-gen.cc gen/v8/torque-generated/builtins-test-from-dsl-gen.h python ../../v8/tools/run.py ./torque -o gen/v8/torque-generated ../../v8/src/builtins/base.tq ../../v8/src/builtins/array.tq ../../v8/src/builtins/array-copywithin.tq ../../v8/src/builtins/array-foreach.tq ../../v8/src/builtins/array-join.tq ../../v8/src/builtins/array-lastindexof.tq ../../v8/src/builtins/array-of.tq ../../v8/src/builtins/array-reverse.tq ../../v8/src/builtins/array-slice.tq ../../v8/src/builtins/array-splice.tq ../../v8/src/builtins/array-unshift.tq ../../v8/src/builtins/collections.tq ../../v8/src/builtins/data-view.tq ../../v8/src/builtins/object.tq ../../v8/src/builtins/object-fromentries.tq ../../v8/src/builtins/iterator.tq ../../v8/src/builtins/typed-array.tq ../../v8/test/torque/test-torque.tq ../../v8/third_party/v8/builtins/array-sort.tq
,
Nov 30
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/835d0fdea0f19000edf8766a4e299704586be44d commit 835d0fdea0f19000edf8766a4e299704586be44d Author: Hiroshige Hayashizaki <hiroshige@chromium.org> Date: Fri Nov 30 17:42:49 2018 Revert "Roll clang 346388:347933." This reverts commit 0bdd3c25975f763d121802a092c1ef0925049976. Reason for revert: Suspected to cause compile failure on the Google Chrome Linux x64 bot crbug.com/910644 Original change's description: > Roll clang 346388:347933. > > Bug: 904337 > Change-Id: I8fad51cde2b93b83dfa1a14c61a992f659d827be > Reviewed-on: https://chromium-review.googlesource.com/c/1356710 > Reviewed-by: Nico Weber <thakis@chromium.org> > Cr-Commit-Position: refs/heads/master@{#612660} TBR=thakis@chromium.org,hans@chromium.org,rnk@chromium.org Change-Id: Ia95868dd39daaa60e204a6ec64df6eabfafacb94 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 904337 , 910644 Reviewed-on: https://chromium-review.googlesource.com/c/1357012 Reviewed-by: Hiroshige Hayashizaki <hiroshige@chromium.org> Commit-Queue: Hiroshige Hayashizaki <hiroshige@chromium.org> Cr-Commit-Position: refs/heads/master@{#612687} [modify] https://crrev.com/835d0fdea0f19000edf8766a4e299704586be44d/tools/clang/scripts/update.py
,
Nov 30
,
Nov 30
The clang roll is also suspected to cause compare_build_artifacts failures on the Deterministic Linux bot: Regressed: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Deterministic%20Linux/18210 Recovered: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Deterministic%20Linux/18212
,
Nov 30
Google Chrome Linux x64 bot also recovered by the revert: https://ci.chromium.org/buildbot/chromium.chrome/Google%20Chrome%20Linux%20x64/38110
,
Nov 30
As all related failures are recovered, removing Sheriff label and assigning to hans@ as the author of the clang roll CL.
,
Nov 30
Deterministic Linux failures might be unrelated (still failing after revert of the clang roll).
,
Nov 30
I think there's been a second regression on the deterministic linux bots, see issue 910699 . The first bug could be due to the clang roll (but it was only some nacl binary, and those already have suppressions, so worst case we could update the whitelist after rolling and then investigate async).
,
Dec 3
Focusing on the v8:run_torque failure. If I turn off CFI, it stops reproducing..
,
Dec 3
Bisecting... (Today's run script:) #!/bin/bash set -exo pipefail ninja -C build.release clang lld compiler-rt llvm-ar || exit 125 ninja -C /work/chromium/src/out/release -t clean || echo only mostly clean ninja -C /work/chromium/src/out/release v8:run_torque
,
Dec 3
Bisection points to this: --- b51082ef4bdc83efdc37747773ff43bafad55d5d is the first bad commit commit b51082ef4bdc83efdc37747773ff43bafad55d5d Author: Pirama Arumuga Nainar <pirama@google.com> Date: Thu Nov 8 20:10:07 2018 +0000 [LTO] Drop non-prevailing definitions only if linkage is not local or appending Summary: This fixes PR 37422 In ELF, non-weak symbols can also be non-prevailing. In this particular PR, the __llvm_profile_* symbols are non-prevailing but weren't getting dropped - causing multiply-defined errors with lld. Also add a test, strong_non_prevailing.ll, to ensure that multiple copies of a strong symbol are dropped. To fix the test regressions exposed by this fix, - do not mark prevailing copies for symbols with 'appending' linkage. There's no one prevailing copy for such symbols. - fix the prevailing version in dead-strip-fulllto.ll - explicitly pass exported symbols to llvm-lto in fumcimport.ll and funcimport_var.ll Reviewers: tejohnson, pcc Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D54125 ---
,
Dec 3
The SVN revision is r346436
,
Dec 3
Looking at the map files from linking torque before and after that revision, I don't see anything getting dropped in the new revision. In fact, it looks like in the new revision, new definitions are getting included in the binary: isalpha, isalnum, isdigit, isxdigit and isspace. But things do more around a bit, so maybe torque is sensitive to that somehow? How exactly is it failing anyway...
,
Dec 3
This is what it looks like in gdb:
(gdb) r
Starting program: /work/chromium/src/out/release/torque -o gen/v8/torque-generated ../../v8/src/builtins/base.tq ../../v8/src/builtins/array.tq ../../v8/src/builtins/array-copywithin.tq ../../v8/src/builtins/array-foreach.tq ../../v8/src/builtins/array-join.tq ../../v8/src/builtins/array-lastindexof.tq ../../v8/src/builtins/array-of.tq ../../v8/src/builtins/array-reverse.tq ../../v8/src/builtins/array-slice.tq ../../v8/src/builtins/array-splice.tq ../../v8/src/builtins/array-unshift.tq ../../v8/src/builtins/collections.tq ../../v8/src/builtins/data-view.tq ../../v8/src/builtins/object.tq ../../v8/src/builtins/object-fromentries.tq ../../v8/src/builtins/iterator.tq ../../v8/src/builtins/typed-array.tq ../../v8/test/torque/test-torque.tq ../../v8/third_party/v8/builtins/array-sort.tq
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGILL, Illegal instruction.
v8::internal::torque::Grammar::MatchChar (char_class=0x5555555d49c0 <isspace(int)>, pos=0x7fffffffcff8) at ../../v8/src/torque/earley-parser.cc:270
270 if (**pos && char_class(static_cast<unsigned char>(**pos))) {
(gdb) bt
#0 v8::internal::torque::Grammar::MatchChar (char_class=0x5555555d49c0 <isspace(int)>, pos=0x7fffffffcff8) at ../../v8/src/torque/earley-parser.cc:270
#1 0x00005555555cceda in v8::internal::torque::(anonymous namespace)::TorqueGrammar::MatchWhitespace (pos=0x7fffffffcff8) at ../../v8/src/torque/torque-parser.cc:1041
#2 0x0000555555584d89 in v8::internal::torque::Lexer::RunLexer (this=0x7fffffffd090, input=...) at ../../v8/src/torque/earley-parser.cc:112
#3 0x00005555555ac1d8 in v8::internal::torque::Grammar::Parse (this=0x555555569310 <_ZN2v88internal6torque12_GLOBAL__N_113TorqueGrammar15MatchWhitespaceEPPKc$d665c457959e3da47ab5726614bfefc1>,
input=...) at ../../v8/src/torque/earley-parser.h:349
#4 v8::internal::torque::ParseTorque (input=...) at ../../v8/src/torque/torque-parser.cc:1541
#5 0x0000555555569f14 in v8::internal::torque::WrappedMain (argc=<optimized out>, argv=<optimized out>) at ../../v8/src/torque/torque.cc:50
#6 main (argc=<optimized out>, argv=<optimized out>) at ../../v8/src/torque/torque.cc:87
The code looks like this:
// static
bool Grammar::MatchChar(int (*char_class)(int), InputPosition* pos) {
if (**pos && char_class(static_cast<unsigned char>(**pos))) {
++*pos;
return true;
}
return false;
}
And char_class is isspace(), which is showing in the diff of the linker map files...
,
Dec 3
It's the CFI check for isspace that fails: mov %rsp,%rbp push %rbx push %rax mov %rdi,%rax mov (%rsi),%rcx movzbl (%rcx),%edi test %edi,%edi je 0x55555558760b <v8::internal::torque::Grammar::MatchChar(int (*)(int), char const**)+59> lea -0x1e33a(%rip),%rcx # 0x5555555692b0 <isspace.cfi_jt> mov %rax,%rdx sub %rcx,%rdx ror $0x3,%rdx cmp $0x5,%rdx ja 0x555555587614 <v8::internal::torque::Grammar::MatchChar(int (*)(int), char const**)+68> <---- We jump to ud2 here. mov %rsi,%rbx callq *%rax test %eax,%eax je 0x55555558760b <v8::internal::torque::Grammar::MatchChar(int (*)(int), char const**)+59> addq $0x1,(%rbx) mov $0x1,%al jmp 0x55555558760d <v8::internal::torque::Grammar::MatchChar(int (*)(int), char const**)+61> xor %eax,%eax add $0x8,%rsp pop %rbx pop %rbp retq ud2
,
Dec 3
pcc: Can you take a look at how pirama's LTO change causes this CFI failure? Steps to reproduce: Check out and sync Chromium #612660 / 0bdd3c25975f763d121802a092c1ef0925049976 gn args: is_chrome_branded = true is_debug = false is_official_build = true strip_absolute_paths_from_debug_symbols = true $ ninja -C out/release v8:run_torque
,
Dec 3
The issue is that pirama's change causes (I'm not sure exactly how yet) the available_externally definition of isspace in torque-parser.o to become internal during the "promote" phase.
$ ~/l4/ra/bin/llvm-dis ./obj/v8/torque_base/torque-parser.o.0.preopt.bc -o - | grep isspace
%call = tail call zeroext i1 @_ZN2v88internal6torque7Grammar9MatchCharEPFiiEPPKc(i32 (i32)* nonnull @isspace, i8** %pos) #20, !dbg !121963
define available_externally i32 @isspace(i32 %__c) #9 !dbg !24263 !prof !25529 !type !122296 !type !122297 {
!24263 = distinct !DISubprogram(name: "isspace", scope: !1731, file: !1731, line: 182, type: !24237, scopeLine: 182, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !1683, retainedNodes: !24264)
$ ~/l4/ra/bin/llvm-dis ./obj/v8/torque_base/torque-parser.o.1.promote.bc -o - | grep isspace
%call = tail call zeroext i1 @_ZN2v88internal6torque7Grammar9MatchCharEPFiiEPPKc(i32 (i32)* nonnull @isspace, i8** %pos) #20, !dbg !121963
define internal i32 @isspace(i32 %__c) #9 !dbg !24263 !prof !25529 !type !122296 !type !122297 {
!24263 = distinct !DISubprogram(name: "isspace", scope: !1731, file: !1731, line: 182, type: !24237, scopeLine: 182, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !1683, retainedNodes: !24264)
This is an incorrect transformation in general for C and C++ since it breaks function pointer equality for isspace (since it causes the address of isspace to be different inside the module than outside); CFI just happens to rely more strongly on this property because of the jump tables that it creates.
,
Dec 3
,
Dec 3
,
Dec 4
> Fix: https://reviews.llvm.org/D55237 Great stuff! Thanks for looking into this.
,
Dec 5
Fix landed in r348321.
,
Dec 7
Thanks! |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by hirosh...@chromium.org
, Nov 30