New issue
Advanced search Search tips

Issue 922654 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 922384

Blocking:
issue 921497



Sign in to add a comment

webkit_unit_tests JsToCppTest.BackPointer and others failing on Clang ToTWin bot

Project Member Reported by rnk@google.com, Jan 16 (6 days ago)

Issue description

First failing build / previous green build:
https://ci.chromium.org/buildbot/chromium.clang/ToTWin/2830
https://ci.chromium.org/buildbot/chromium.clang/ToTWin/2829

Many other tests also fail at that build transition point. Here is the list for searchability:

webui_polymer2_browser_tests:
BluetoothInternalsTest.Startup_BluetoothInternals

browser_tests / network_service_browser_tests:
SecureOriginWhitelistBrowsertest/SecureOriginWhitelistBrowsertest.SecurityIndicators/0
SecureOriginWhitelistBrowsertest/SecureOriginWhitelistBrowsertest.SecurityIndicators/3
SecurityStateTabHelperTest.DefaultSecurityLevelOnBlobUrl/1
SecurityStateTabHelperTest.DefaultSecurityLevelOnBlobUrl/0
BluetoothInternalsTest.Startup_BluetoothInternals
SecurityStateTabHelperTest.DefaultSecurityLevelOnFilesystemUrl/1
SecurityStateTabHelperTest.DefaultSecurityLevelOnFilesystemUrl/0

webkit_unit_tests:
JsToCppTest.BackPointer
JsToCppTest.BitFlip
JsToCppTest.Ping
JsToCppTest.Echo

There are some bluetooth related changes in that build that could be the culprit, maybe just for the bluetooth internals test.

I tried to reproduce the failures locally with pinned clang *and* ToT clang built locally, but I could not reproduce the test webkit_unit_tests failures with *either* compiler.
 

Comment 1 by rnk@google.com, Jan 16 (6 days ago)

The fact that ToTWin64 is not affected suggests that this is a 32-bit only bug, perhaps a regression in the compiler. If so, the LLVM revision window would be:
good: r349919
bad:  r349964

Comment 2 by rnk@google.com, Jan 16 (6 days ago)

> I tried to reproduce the failures locally with pinned clang *and* ToT clang built locally, but I could not reproduce the test webkit_unit_tests failures with *either* compiler.

Disregard this comment, I forgot to set target_cpu = "x86" during my attempt to reproduce. I'll try that next.

The pinned 32-bit official bot is currently green. It runs browser_tests, but not webkit_unit_tests.

Comment 3 by rnk@google.com, Jan 16 (6 days ago)

I built webkit_unit_tests with the current pinned version of clang, and these four JsToCppTest tests time out, as on the bot.

The LLVM revision range includes this suspicious commit from me: r349945

    [BasicAA] Fix AA bug on dynamic allocas and stackrestore

    Summary:
    BasicAA has special logic for unescaped allocas, which normally applies
    equally well to dynamic and static allocas. However, llvm.stackrestore
    has the power to end the lifetime of dynamic allocas, without referring
    to them directly.

    stackrestore is already marked with the most conservative memory
    modification attributes, but because the alloca is not escaped, the
    normal logic produces incorrect results. I think BasicAA needs a special
    case here to teach it about the relationship between dynamic allocas and
    stackrestore.

    Fixes PR40118

I'll keep investigating.

Comment 4 by rnk@google.com, Jan 16 (6 days ago)

I reverted the last clang roll locally, ran the hooks to get the previous compiler (pre r349945), rebuilt with goma, and the timeout problem went away. I still don't know which file triggers the bug. I put `#pragma clang optimize off` in a few related files, but I didn't find the one that mattered.

I'm going to try making a build with good and bad object files and mixing and matching objs and .a files to narrow down where the miscompile is.

Comment 5 by rnk@google.com, Jan 16 (6 days ago)

Clang is miscompiling this object:
obj/third_party/blink/renderer/core/mojo/mojo/mojo_handle.obj

If I link webkit_unit_tests with all object files built from clang r349417 except this one built from clang r350768, the test hangs. I'll drill in on that object next.

Comment 6 by rnk@google.com, Jan 16 (6 days ago)

I bisected the functions in the file with #pragma clang optimize off/on, and if I add `__attribute__((optnone))` to blink::MojoHandle::readMessage, the bug goes away, and removing it brings it back.

Comment 7 by rnk@google.com, Jan 17 (6 days ago)

The problem goes away if I disable debug info. That means there is a second bug, which is that the code LLVM generates depends on the presence of debug info.

Comment 8 by rnk@google.com, Jan 17 (6 days ago)

> The problem goes away if I disable debug info. That means there is a second bug, which is that the code LLVM generates depends on the presence of debug info.

Er, now I can't reproduce that. It doesn't seem to matter whether debug info is enabled or disabled, either way the test fails.

Most of the code differences with and without r349945 center around calls to blink::MojoHandle::Create, which uses inalloca, and gets inlined, in this loop here:
  HeapVector<Member<MojoHandle>> handles(num_handles);
  for (uint32_t i = 0; i < num_handles; ++i) {
    handles[i] = MojoHandle::Create(
        mojo::MakeScopedHandle(mojo::Handle(raw_handles[i])));
  }
  result_dict->setHandles(handles);
  result_dict->setResult(result);
https://cs.chromium.org/chromium/src/third_party/blink/renderer/core/mojo/mojo_handle.cc?type=cs&q=MojoHandle::readMessage&sq=package:chromium&g=0&l=123

In this code, we do two nested inalloca calls, which should work fine, but something is wrong with the code that I don't see yet...

Comment 9 by thakis@google.com, Jan 17 (5 days ago)

Blocking: 921497

Comment 10 by r...@chromium.org, Jan 17 (5 days ago)

In the morning with sufficient coffee, the bug stands out in the assembly and LLVM IR. I think what's going on here is that previously, we used to optimize away the problematic stacksave/alloca/stackrestore pattern, but now that alias analysis treats it more conservatively, it doesn't get optimized away, and we run into a bug somewhere else in LLVM.

I think it's easiest to understand the problem looking at this asm:

  000002D9: 89 E0              mov         eax,esp                       # stacksave
  000002DB: 89 66 04           mov         dword ptr [esi+4],esp
  000002DE: 50                 push        eax                           # really alloc 4 bytes of stack, push eax is just for size
  000002DF: 8B 4E 28           mov         ecx,dword ptr [esi+28h]
  000002E2: 89 E7              mov         edi,esp                       # get address of 4 byte alloca in edi
  000002E4: 8B 0C 99           mov         ecx,dword ptr [ecx+ebx*4]     # load raw_handles[i]
  000002E7: 89 0F              mov         dword ptr [edi],ecx           # store to alloca
  000002E9: 89 C4              mov         esp,eax                       # stackrestore
  000002EB: 6A 00              push        0
  000002ED: 6A 0C              push        0Ch
  000002EF: E8 00 00 00 00     call        ??$Allocate@VScriptWrappable@blink@@@ThreadHeap@blink@@SAPAEI_N@Z
  000002F4: 83 C4 08           add         esp,8
  000002F7: 8B 50 FC           mov         edx,dword ptr [eax-4]
  000002FA: 8B 0F              mov         ecx,dword ptr [edi]   # RELOAD FROM MEMORY CLOBBERED BY STACKRESTORE!
  000002FC: C7 07 00 00 00 00  mov         dword ptr [edi],0

Some pass moves the load from [edi] down after the stackrestore and subsequent call that uses the stack memory. The "PUSH 0" instruction overwrites the value from raw_handles[i], and things go wrong from there.

The same problem can be seen in the IR, but less clearly, so I'm looking for a bug in an optimization pass.

Comment 11 by r...@chromium.org, Jan 17 (5 days ago)

The bug is that instcombine "sinks" the alloca across a stack save, and it's lifetime gets ended too early. Here's the IR before the bug:

  %inalloca.save45 = call i8* @llvm.stacksave()
  %argmem51 = alloca inalloca <{ %"class.mojo::ScopedHandleBase" }>, align 4

  %inalloca.save46 = call i8* @llvm.stacksave()
...
  ; store to alloca inside unrelated save/restore pair
  %12 = getelementptr inbounds <{ %"class.mojo::ScopedHandleBase" }>, <{ %"class.mojo::ScopedHandleBase" }>* %argmem51, i32 0, i32 0, i32 0, i32 0
  store i32 %tmp52, i32* %12, align 4
...
  call void @llvm.stackrestore(i8* %inalloca.save46)
...
  ; use again outside save restore
  %tmp3.i.i.i.i.i = load i32, i32* %12, align 4, !noalias !36
  store i32 0, i32* %12, align 4, !noalias !36
...
  call void @llvm.stackrestore(i8* %inalloca.save45)

Instcombine produces these logs:

IC: Visiting:   %inalloca.save45 = call i8* @llvm.stacksave()
IC: Sink:   %argmem51 = alloca inalloca <{ %"class.mojo::ScopedHandleBase" }>, align 4
IC: Visiting:   %argmem51 = alloca inalloca <{ %"class.mojo::ScopedHandleBase" }>, align 4

And afterwards we have this IR:

  %inalloca.save45 = call i8* @llvm.stacksave()
  %inalloca.save46 = call i8* @llvm.stacksave()
...
  %argmem51 = alloca inalloca <{ %"class.mojo::ScopedHandleBase" }>, align 4
  %tmp.i.i.i46 = load i32*, i32** %buffer_.i.i.i45, align 4
  %arrayidx.i.i47 = getelementptr inbounds i32, i32* %tmp.i.i.i46, i32 %i.086
  %tmp52 = load i32, i32* %arrayidx.i.i47, align 4
  %12 = getelementptr inbounds <{ %"class.mojo::ScopedHandleBase" }>, <{ %"class.mojo::ScopedHandleBase" }>* %argmem51, i32 0, i32 0, i32 0, i32 0
  store i32 %tmp52, i32* %12, align 4
  call void @llvm.stackrestore(i8* %inalloca.save46)

So, now the alloca is destroyed by the inner stackrestore, and it gets used afterwards.

Comment 12 by r...@chromium.org, Jan 17 (5 days ago)

A reduction showing the diff instcombine produces:

$ llvm-dis sink-inalloca.bc

$ opt -instcombine sink-inalloca.ll -S -o t2.ll

$ diff -U999 sink-inalloca.ll t2.ll
--- sink-inalloca.ll    2019-01-17 11:00:26.186621700 -0800
+++ t2.ll       2019-01-17 11:00:33.390299200 -0800
@@ -1,36 +1,36 @@
-; ModuleID = 'sink-inalloca.bc'
+; ModuleID = 'sink-inalloca.ll'
 source_filename = "sink-inalloca.ll"

 declare i32* @identity(i32*)

 declare i1 @cond()

 ; Function Attrs: nounwind
 declare i8* @llvm.stacksave() #0

 ; Function Attrs: nounwind
 declare void @llvm.stackrestore(i8*) #0

 define void @foo(i32 %x) {
 entry:
   %c1 = call i1 @cond()
   br i1 %c1, label %ret, label %nonentry

 nonentry:                                         ; preds = %entry
-  %argmem = alloca inalloca i32
   %sp = call i8* @llvm.stacksave()
   %c2 = call i1 @cond()
   br i1 %c2, label %ret, label %sinktarget

 sinktarget:                                       ; preds = %nonentry
-  %p = call i32* @identity(i32* %argmem)
-  store i32 13, i32* %p
+  %argmem = alloca inalloca i32, align 4
+  %p = call i32* @identity(i32* nonnull %argmem)
+  store i32 13, i32* %p, align 4
   call void @llvm.stackrestore(i8* %sp)
   %0 = call i32* @identity(i32* %p)
   br label %ret

 ret:                                              ; preds = %sinktarget, %nonentry, %entry
   ret void
 }

 attributes #0 = { nounwind }

Comment 13 by r...@chromium.org, Jan 17 (5 days ago)

The patch to make instcombine less aggressive is small:
https://reviews.llvm.org/D56872

Comment 14 by r...@chromium.org, Jan 17 (5 days ago)

Blocking: -922384
Fix is upstream in r351475, tomorrow we'll see if the bot goes green.

Comment 15 by h...@chromium.org, Jan 18 (4 days ago)

Blockedon: 922384
Excellent work, rnk!

It's all green: https://ci.chromium.org/buildbot/chromium.clang/ToTWin/3043

Marking this blocked on the roll.

Sign in to add a comment