New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 20
Cc:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocked on:
issue 817298

Blocking:
issue 636111



Sign in to add a comment

Set Next Statement not working correctly with clang

Project Member Reported by jam@chromium.org, Dec 11 Back to list

Issue description

I've recently noticed that "Set Next Statement" context menu in Visual Studio isn't working well with clang. Often after I move it there'll be a crash in an implicit constructor call.

As a simple repro, run debug browser_tests.exe --single_process  --gtest_filter=CaptivePortalBrowserTest.Login

and then in this code:
"SetUpCaptivePortalService(browser()->profile(),
                            GURL(kMockCaptivePortalTestUrl));"
add a breakpoint. When it gets hit, you can move the statement somewhere else (like repeat the EXPECT_EQ call) and then move it again to the SetUpCaptivePortalService call. It'll crash in the std::string constructor.

This is interfering with my debugging, so marking as P1.
 
Blocking: 636111
Cc: inglorion@chromium.org zturner@chromium.org amccarth@chromium.org
TL;DR - if clang adjusts what instruction its line records point to be two instructions then this will work.

The relevant source code is here:

  // Double-check that the captive portal service isn't enabled by default for
  // browser tests.
  EXPECT_EQ(CaptivePortalService::DISABLED_FOR_TESTING,
            CaptivePortalService::get_state_for_testing());

  CaptivePortalService::set_state_for_testing(
      CaptivePortalService::SKIP_OS_CHECK_FOR_TESTING);
  EnableCaptivePortalDetection(browser()->profile(), true);

  // Set the captive portal service to use URLRequestMockCaptivePortalJob's
  // mock URL, by default.
  SetUpCaptivePortalService(browser()->profile(),
                            GURL(kMockCaptivePortalTestUrl));


The machine code around the SetUpCaptivePortalService call is here:
0285668C E8 3F 02 00 00       call        CaptivePortalBrowserTest::EnableCaptivePortalDetection (028568D0h)  
02856691 8D 4E 5C             lea         ecx,[esi+5Ch]  
02856694 8D 05 5B FC 5C 00    lea         eax,[string "http://mock.captive.portal.test/"... (05CFC5Bh)]  
0285669A 83 EC 04             sub         esp,4  
0285669D 89 04 24             mov         dword ptr [esp],eax  
028566A0 E8 7B 72 61 FE       call        base::BasicStringPiece<...>::BasicStringPiece (0E6D920h)  
028566A5 8D 8E 94 00 00 00    lea         ecx,[esi+94h]  
028566AB 8B 56 5C             mov         edx,dword ptr [esi+5Ch]  
028566AE 8B 7E 60             mov         edi,dword ptr [esi+60h]  
028566B1 83 EC 08             sub         esp,8  
028566B4 89 14 24             mov         dword ptr [esp],edx  
028566B7 89 7C 24 04          mov         dword ptr [esp+4],edi  
028566BB 89 46 10             mov         dword ptr [esi+10h],eax  
028566BE FF 15 A8 28 DF 0A    call        dword ptr [__imp_GURL::GURL (0ADF28A8h)]  
028566C4 8B 4E 40             mov         ecx,dword ptr [esi+40h]  
028566C7 89 46 0C             mov         dword ptr [esi+0Ch],eax  
028566CA E8 91 73 61 FE       call        InProcessBrowserTest::browser (0E6DA60h)  
028566CF 89 C1                mov         ecx,eax  
028566D1 E8 AA 73 61 FE       call        Browser::profile (0E6DA80h)  
028566D6 8D 8E 94 00 00 00    lea         ecx,[esi+94h]  
028566DC 83 EC 08             sub         esp,8  
028566DF 8B 56 40             mov         edx,dword ptr [esi+40h]  
028566E2 89 4E 08             mov         dword ptr [esi+8],ecx  
028566E5 89 D1                mov         ecx,edx  
028566E7 89 04 24             mov         dword ptr [esp],eax  
028566EA 8B 46 08             mov         eax,dword ptr [esi+8]  
028566ED 89 44 24 04          mov         dword ptr [esp+4],eax  
028566F1 E8 6A 02 00 00       call        CaptivePortalBrowserTest::SetUpCaptivePortalService (02856960h)  

Crucially, if you set a breakpoint on the call to SetUpCaptivePortalService or if you Set Next Statement to that line of code then you get dropped onto 0285669A. That means that the stack gets adjusted properly (which is good) but it means that ECX and EAX are not set up. Those are set up starting on 02856691 and if you Set Next Statement from EXPECT_EQ to SetUpCaptivePortalService then they have the wrong values. They are supposed to be:

EAX = 005CFC5B ECX = 0019D89C

but they are actually:

EAX = 00000000 ECX = 0019D8B4

EAX and ECX are supposed to hold the address of the string used for the constructor's input and the address of the object. These seem like important things.

The bad value of EAX means that the string will be treated as a NULL string instead of "http://mock.captive.portal.test/" and the incorrect value for ECX turns into a stack tromp - stack corruption - which can only end in tears.

It looks like the problem would be avoided if the debug information for that line was offset by two instructions.

Debugging techniques like this make me nervous but it is awesome if they work in VS and we should try to support them as well as VS does.

My gn args are below. The top four are probably important (fastlink was causing me problems so I used lld) and the bottom six are less so, but will help with making the builds fast.

use_lld = true
is_debug = true
is_component_build = true
target_cpu = "x86"

enable_nacl = false
remove_webcore_debug_symbols=true
use_jumbo_build = true
goma_dir = "C:\src\goma\goma-win64"
use_goma = true
symbol_level = 2

You should never do this in real life, but it’s actually possible to hex
edit your PDB with the correct value to confirm this fixes it. I don’t
actually expect you to do this, but I’m just putting it here in case you’re
like me and like doing things you’re not supposed to do.

The trick is to use llvm-pdbutil with the bytes subcommand which has some
options for dumping line tables (among other things) but showing you the
offset in the pdb where those bytes are. Since you know the cpp file and
the address of the current instruction it breaks on, you just have to
filter by that object file (use `llvm-pdbutil dump -modules | grep foo.obj`
to find the index) and then search for the address in the binary dump.

That said, this is probably already enough info to investigate. Thanks!
Ping: any update on an eta for a fix? This is interfering with my debugging.

Thanks.
rnk is out until next Tue. zturner, do you know if anyone in lexan land looked at this or is in a position to look this week? Or should we wait for rnk's return?
Nobody has looked at this yet. I can try to look, but I’m also trying to
figure out incremental pdb linking, which is also pretty important (and
probably something nobody else can do). inglorion@ could also maybe look at
this
I filed https://bugs.llvm.org/show_bug.cgi?id=35975 as an llvm-side tracking bug.
I started work on the LLVM bug and the details are at https://bugs.llvm.org/show_bug.cgi?id=35975.

The high-level summary is that we may be able to fix this in most cases, but I suspect there will always be corner cases where set next statement fails because LLVM -O0 codegen doesn't have any protections to prevent it from using registers across statement boundaries.

Fixing the source location for this particular code pattern is doable, but it will take a bit of effort and it's not clear if it is top priority. I aim to come back to this next week.
Hello, any update? I'm hitting this often, thank you.
No, last week got eaten by a sinus infection and two ski trips on each end. This week there's a CFI summit that I need to be at. I'll try to shop this around the team.
This has been open for 2 months and IMO is a major regression from switch to clang. I hope this can be fixed asap.
I'm not sure "set next statement" is something clang can reliably support. There is nothing ensuring that registers are not allocated across statement boundaries at -O0.

Even fixing this particular issue basically requires rewriting a large portion of the -O0 fastisel code generation strategy. The "local value area" concept is pretty deeply embedded in fastisel. I prototyped something yesterday that passes most tests, but I discovered some design flaws in it that might sink it. Hard to say yet.
I'm not familiar with the clang internals, but given that it's functionality that we had and is very useful when debugging, I would like us to do whatever it takes to undo the the regression.
I came up with a patch last Thursday that improves the situation: https://reviews.llvm.org/D43093

However, I expect it to take longer than normal to get into Chromium. The patch is small, but it makes invasive changes to fastisel.

I had to do a lot of iteration to get it to pass the test suite, and I suspect more issues will arise when we start testing it on larger codebases.

I also haven't measured if it has any compile time impact yet. The purpose of fastisel is to be fast, so if there are significant regressions, I think I'll have to try another approach.
Blockedon: 817298
This got committed in LLVM r327581. Hopefully we can get it in the next clang roll, but there is a possibility that it will be reverted.
Status: Fixed
The clang roll landed, so this should work now. Please reopen if it doesn't.

Sign in to add a comment