New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 837090 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 837440

Blocking:
issue 792131



Sign in to add a comment

Win ASan LLD builds fail check in asan_globals.cc

Project Member Reported by mmoroz@chromium.org, Apr 26 2018

Issue description

https://ci.chromium.org/buildbot/chromium.lkgr/Win%20ASan%20Release/?limit=200

Starting since https://ci.chromium.org/buildbot/chromium.lkgr/Win%20ASan%20Release/14324

Error:

[2783/35642] ACTION //components/resources:compressed_about_credits(//build/toolchain/win:win_clang_x64)
FAILED: gen/components/resources/about_credits.bro 
C:/b/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../build/gn_run_binary.py brotli.exe --force --no-copy-stat gen/components/resources/about_credits.html -o gen/components/resources/about_credits.bro
==7968==AddressSanitizer CHECK failed: C:\b\rr\tmpm5zxwy\w\src\third_party\llvm\projects\compiler-rt\lib\asan\asan_globals.cc:382 "((globals[i].size == 0 && globals[i].size_with_redzone == 0 && globals[i].name == nullptr && globals[i].module_name == nullptr && globals[i].odr_indicator == 0)) != (0)" (0x0, 0x0)
    #0 0x13ff0925f in __asan::AsanCheckFailed C:\b\rr\tmpm5zxwy\w\src\third_party\llvm\projects\compiler-rt\lib\asan\asan_rtl.cc:69
    #1 0x13ff1ad7b in __sanitizer::CheckFailed C:\b\rr\tmpm5zxwy\w\src\third_party\llvm\projects\compiler-rt\lib\sanitizer_common\sanitizer_termination.cc:79
    #2 0x13ff328c1 in __asan_register_globals C:\b\rr\tmpm5zxwy\w\src\third_party\llvm\projects\compiler-rt\lib\asan\asan_globals.cc:380
    #3 0x13ff3faf6 in _initterm minkernel\crts\ucrt\src\appcrt\startup\initterm.cpp:21
    #4 0x13ff3bb45 in __scrt_common_main_seh f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:251
    #5 0x76de59ec in BaseThreadInitThunk+0xc (C:\Windows\system32\kernel32.dll+0x78d359ec)
    #6 0x76f1b370 in RtlUserThreadStart+0x20 (C:\Windows\SYSTEM32\ntdll.dll+0x78e7b370)
brotli.exe failed with exit code 1



There are only 3 changes in the blamelist, https://chromium-review.googlesource.com/c/chromium/src/+/983632 seems to be the culprit
 

Comment 1 by thakis@chromium.org, Apr 26 2018

Cc: dpranke@chromium.org
Why do we have a bot on lkgr that's neither on the main waterfall nor on the cq?

Comment 2 by mmoroz@chromium.org, Apr 26 2018

I don't know.
Project Member

Comment 4 by bugdroid1@chromium.org, Apr 26 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fda7b0b62df2fe87ca1650907911ef66a051faa8

commit fda7b0b62df2fe87ca1650907911ef66a051faa8
Author: Hans Wennborg <hans@chromium.org>
Date: Thu Apr 26 03:13:25 2018

Revert "Reland "win: Link with lld instead of MSVC's link.exe by default""

This reverts commit 6b5ac2efbfcbf98590bb1b7e610dacb3318c122a.

Reason for revert:
It broke the Win ASan bot on the LKGR waterfall (see second bug).

Original change's description:
> Reland "win: Link with lld instead of MSVC's link.exe by default"
> 
> This is a reland of be8138a4eb66bddb5bca81e08c13c6cb1dafc981
> The issues with the previous attempt were addressed in the latest
> Clang roll (#553415).
> 
> Original change's description:
> > win: Link with lld instead of MSVC's link.exe by default
> >
> > lld is LLVM's linker. It produces PE/COFF and PDB files just like
> > link.exe, but it's significantly faster and it can also handle LLVM's
> > internal representation, which will enable us to do link-time
> > optimization and control-flow integraty checks with Clang.
> >
> > While lld is much faster at linking, it doesn't support incremental
> > links, meaning builds that only touch a few files and re-link a large
> > executable may become slower.
> >
> > This is the first attempt at switching everything over, with the
> > purpose of gathering data and finding unknown unknowns. It's likely
> > temporary until something breaks.
> >
> > is_win_fastlink is implicitly ignored when using lld, as lld without
> > fastlink is faster than link.exe with it.
> >
> > Also switch the CrWinClangLLD bots on chromium.clang to use MSVC's
> > link.exe to make sure that configuration keeps working.
> >
> > Bug:  792131 
> > Change-Id: I0f115a78c33d69eadbd480f75c2a5d636e86483d
> > Reviewed-on: https://chromium-review.googlesource.com/983632
> > Commit-Queue: Nico Weber <thakis@chromium.org>
> > Reviewed-by: Nico Weber <thakis@chromium.org>
> > Cr-Commit-Position: refs/heads/master@{#551367}
> 
> Bug:  792131 
> Change-Id: I3b76cd015ef18bb3e2ac7a3efa4352c4f04e560b
> Reviewed-on: https://chromium-review.googlesource.com/1028374
> Reviewed-by: Nico Weber <thakis@chromium.org>
> Commit-Queue: Hans Wennborg <hans@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#553776}

TBR=thakis@chromium.org,hans@chromium.org

Change-Id: Ied5addd2f8d16d1e002bf37846e7106b4c754cb9
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  792131 ,  837090 
Reviewed-on: https://chromium-review.googlesource.com/1029631
Reviewed-by: Hans Wennborg <hans@chromium.org>
Commit-Queue: Hans Wennborg <hans@chromium.org>
Cr-Commit-Position: refs/heads/master@{#553878}
[modify] https://crrev.com/fda7b0b62df2fe87ca1650907911ef66a051faa8/build/config/compiler/compiler.gni
[modify] https://crrev.com/fda7b0b62df2fe87ca1650907911ef66a051faa8/tools/mb/mb_config.pyl

Comment 5 by thakis@chromium.org, Apr 26 2018

Hm, not clear this warranted a revert. Worst case, we could've disabled lld in asan builds, but since this bot exists only on lkgr I'm not sure even that's appropriate, we could've maybe just fixed forward.

Comment 6 by mmoroz@chromium.org, Apr 26 2018

Status: Fixed (was: Untriaged)
Thanks a lot Hans, the build is green now!

Comment 7 by h...@chromium.org, Apr 26 2018

Blocking: 792131
Labels: -Pri-1 Pri-2
Status: Assigned (was: Fixed)
Summary: Win ASan LLD builds fail check in asan_globals.cc (was: LKGR build is broken on Windows)
> Hm, not clear this warranted a revert.

Rats, I didn't think about just disabling for ASan :-/ It looked like a real issue though, so seemed worth investigating offline.

mmoroz: While the build was fixed by the revert, I'll re-open and use this to track the underlying problem.

Comment 8 by h...@chromium.org, Apr 26 2018

Cc: ruiu@google.com r...@chromium.org
Status: Started (was: Assigned)
This is the code where we fail the check:

void __asan_register_globals(__asan_global *globals, uptr n) {
  if (!flags()->report_globals) return;
  GET_STACK_TRACE_MALLOC;
  u32 stack_id = StackDepotPut(stack);
  BlockingMutexLock lock(&mu_for_globals);
  if (!global_registration_site_vector)
    global_registration_site_vector =
        new(allocator_for_globals) GlobalRegistrationSiteVector(128);
  GlobalRegistrationSite site = {stack_id, &globals[0], &globals[n - 1]};
  global_registration_site_vector->push_back(site);
  if (flags()->report_globals >= 2) {
    PRINT_CURRENT_STACK();
    Printf("=== ID %d; %p %p\n", stack_id, &globals[0], &globals[n - 1]);
  }
  for (uptr i = 0; i < n; i++) {
    if (SANITIZER_WINDOWS && globals[i].beg == 0) {
      // The MSVC incremental linker may pad globals out to 256 bytes. As long
      // as __asan_global is less than 256 bytes large and its size is a power
      // of two, we can skip over the padding.
      static_assert(
          sizeof(__asan_global) < 256 &&
              (sizeof(__asan_global) & (sizeof(__asan_global) - 1)) == 0,
          "sizeof(__asan_global) incompatible with incremental linker padding");
      // If these are padding bytes, the rest of the global should be zero.
      CHECK(globals[i].size == 0 && globals[i].size_with_redzone == 0 &&
            globals[i].name == nullptr && globals[i].module_name == nullptr &&
            globals[i].odr_indicator == 0);
      continue;
    }
    RegisterGlobal(&globals[i]);
  }

  // Poison the metadata. It should not be accessible to user code.
  PoisonShadow(reinterpret_cast<uptr>(globals), n * sizeof(__asan_global),
               kAsanGlobalRedzoneMagic);
}


If I understand correctly, this gets called for all pointers between __asan_globals_start and __asan_globals_end.

Looking at the /lldmap file for brotli.exe:

Address  Size     Align Out     In      Symbol
001b8cf0 00000000    16         d:\rs2.obj.amd64fre\minkernel\crts\ucrt\src\appcrt\dll\mt\..\..\tran\mt\objfre\amd64\log_f_inv_qword_table.obj:(.data)
001b8cf0 00000040     8         projects\compiler-rt\lib\asan\CMakeFiles\RTAsan.x86_64.dir\asan_globals_win.cc.obj:(.ASAN$GA)
001b8cf0 00000000     0                 __asan_globals_start
001b8d40 00000040    64         obj/third_party/brotli/brotli/backward_references_hq.obj:(.ASAN$GL)
001b8d80 00000040    64         obj/third_party/brotli/brotli/backward_references_hq.obj:(.ASAN$GL)
001b8dc0 00000040    64         obj/third_party/brotli/brotli/backward_references_hq.obj:(.ASAN$GL)

It seems like the first global doesn't actually start at __asan_globals_start, but a few bytes after due to alignment.

Maybe what's happening is that __asan_register_globals() starts with 0x001b8cf0 and sees globals[i].beg == 0 because it's reading padding before the first global..

Comment 9 by r...@chromium.org, Apr 26 2018

Looks like this is an x64 asan bot, which isn't as well supported as x86, which is what I tried first to reproduce this.

Comment 10 by h...@chromium.org, Apr 26 2018

Okay so the problem is that __asan_globals_start is only 8-byte aligned, but the others are 64-byte aligned. The over-alignment comes from here:

AddressSanitizerModule::InstrumentGlobalsCOFF():

    // The MSVC linker always inserts padding when linking incrementally. We
    // cope with that by aligning each struct to its size, which must be a power
    // of two.
    unsigned SizeOfGlobalStruct = DL.getTypeAllocSize(Initializer->getType());
    assert(isPowerOf2_32(SizeOfGlobalStruct) &&
           "global metadata will not be padded appropriately");
    Metadata->setAlignment(SizeOfGlobalStruct);


I wonder if we can use a pragma or something to increase the alignment of __asan_globals_start

Comment 11 by r...@chromium.org, Apr 26 2018

I agree, in order for this nonsense to work, __asan_globals_start needs to be aligned to sizeof(__asan_global) to match the .ASAN$GL bits.

Comment 12 by r...@chromium.org, Apr 26 2018

OK, we have a fix upstream in r330990. Also, now the ASan runtime has full debug info instead of just line tables, so debugging these issues should be easier in the future.

In the short term, I think Hans wanted to reland the LLD switch with an exception for is_asan=true builds.

In the long term, we'll roll clang, pick up this fix, and remove the exception.

Comment 13 by h...@chromium.org, Apr 26 2018

Verified locally that r330990 fixes the check when invoking brotli.
Why doesn't link.exe need the compiler-rt change? Do they align things less? If so, why are we different?

Comment 15 by h...@chromium.org, Apr 26 2018

Looking at the map file from using link.exe:

 Start         Length     Name                   Class
 0002:000a4334 00000014H .idata$2                DATA
 0002:000a4348 00000018H .idata$3                DATA
 0002:000a4360 00000358H .idata$4                DATA
 0002:000a46b8 00000832H .idata$6                DATA
 0003:00000000 00000040H .ASAN$GA                DATA
 0003:00000040 00001c00H .ASAN$GL                DATA
 0003:00001c40 00000040H .ASAN$GZ                DATA
 0003:00001c80 00002d40H .data                   DATA
 0003:000049b0 00000000H .data$r                 DATA
 0003:000049c0 008d18e8H .bss                    DATA
 0004:00000000 00003abcH .pdata                  DATA
 0005:00000000 00000004H .SCOV$A                 DATA

  Address         Publics by Value              Rva+Base               Lib:Object

 0003:00000000       __asan_globals_start       00000001401bd000     clang_rt.asan-x86_64:asan_globals_win.cc.obj
 0003:00001c40       __asan_globals_end         00000001401bec40     clang_rt.asan-x86_64:asan_globals_win.cc.obj


It looks like .ASAN$GA goes first in the .data segment (due due alphabetical ordering?) and so __asan_globals_start gets page aligned.

The fix to compiler-rt still makes sense, but maybe lld also should adjust which order it merges these "sub sections" to match link.exe?

rkn, ruiu: ^

Comment 16 by h...@chromium.org, Apr 26 2018

Blockedon: 837440

Comment 17 by r...@chromium.org, May 10 2018

> It looks like .ASAN$GA goes first in the .data segment (due due alphabetical ordering?) and so __asan_globals_start gets page aligned.
> 
> The fix to compiler-rt still makes sense, but maybe lld also should adjust which order it merges these "sub sections" to match link.exe?

I agree, LLD should simply do a stable sort on the input section name before splitting on the '$'. I doubt anyone relies on this, but it's reasonable behavior and if users care they can name their sections .zzzdbg or something to move them to the end of the output section.

Comment 18 by h...@chromium.org, May 24 2018

Status: Fixed (was: Started)
I looked at the ordering stuff today.

We're already processing the sections in alphabetical order in createSections(), (except that "standard" sections like .text, .bss, etc. are created first so they will be emitted first). So if there's a .zzzdbg, it will come last (modulo .rsrc and .reloc).

Instead, it seems the difference is how .ASAN gets merged into .data (which happens because of a /MERGE flag that comes from compiler-rt/lib/asan/asan_globals_win.cc:25:#pragma comment). It seems that link.exe merges it to the beginning of .data, but lld puts it at the end. Putting it at the end makes sense, and is less work since it's appending to an array, so I don't think we want to change that.

One idea might be to do the "merging" earlier, when chunks are put into sections in the first place. That way the .ASAN chunks would go into .data before the regular .data chunks. I'm not sure that's a good idea though, maybe pcc and ruiu can comment.

Marking fixed since I don't think there's more to do here.

Sign in to add a comment