New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Starred by 3 users

Issue metadata

Status: WontFix
Owner: ----
Closed: Mar 2
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 764744
issue 769761

Blocking:
issue 636111



Sign in to add a comment

Clang has poor optimized debug info for v8::internal::SlotSet::Iterate

Project Member Reported by u...@chromium.org, Aug 9 2017 Back to list

Issue description

WinDbg shows nonsense values for variables.

For example in minidump [1] the value of bit_offset must be within 0..31, but WinDbg shows 398920 (see screenshot)

[1]: https://crash.corp.google.com/browse?q=ReportID%3D%270be7e04448000000%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&stbtiq=&reportid=&index=0#2


thakis@, hans@, could this be related to windows clang rollout?
 
windbg.PNG
246 KB View Download
Cc: r...@chromium.org
I don't suppose you checked which version this started at?

Comment 2 by u...@chromium.org, Aug 9 2017

No idea when it stared. I stumbled upon this while investigating a recent crash spike.

Here is a dremel query that finds similar crashes:
product.name='Chrome'  AND custom_data.ChromeCrashProto.channel='canary' AND custom_data.ChromeCrashProto.ptype='renderer' AND custom_data.ChromeCrashProto.magic_signature_1.name LIKE 'v8::internal::SlotSet::Iterate%'

If you give me version numbers, I can check if the corresponding minidumps have this issue. Btw, is it possible to tell from a minidump whether Chrome was built with clang?
Blocking: 636111

Comment 4 by r...@chromium.org, Aug 9 2017

I don't think there will be any obvious markers in the minidump as to whether the binary was built with clang or not. Chrome knows which compiler was used, though. You can check it at chrome://version.

Is debugging with windbg completely broken, or is this just one particular issue with the quality of clang's optimized debug info? The other variables in that screenshot look correct.
Issue 753934 has been merged into this issue.
Re comment 2: If you have access to the binary, you can check how add, adc, and, xor, or, sbb, sub, cmp, mov with two register operands are encoded: If the two lowest bits of the first byte of that instruction are set, then that file was produced by cl. If just the lowest bit is set, it was clang-cl.

(from https://plus.google.com/u/0/+NicoWeberCorp/posts/hZaCZv9ErMM)
Project Member

Comment 7 by bugdroid1@chromium.org, Aug 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e23053b4cb927c57f95a92adae1f6fd6f1959166

commit e23053b4cb927c57f95a92adae1f6fd6f1959166
Author: Nico Weber <thakis@chromium.org>
Date: Wed Aug 16 18:15:10 2017

clang: Locally apply https://reviews.llvm.org/D36596 to see what it does to minidumps.

This changes how we emit debug info. We now emit debug info for locals in optimized
build more often (fewer "optimized away" messages), but in return the displayed data
for a variable might sometimes be stale.  The motivation is to collect data on which
tradeoff feels better.

Bug:  753736 
Change-Id: I009f444a7b030b18751bcef31188e7db4b9a8584
Reviewed-on: https://chromium-review.googlesource.com/616267
Reviewed-by: Reid Kleckner <rnk@chromium.org>
Commit-Queue: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#494864}
[add] https://crrev.com/e23053b4cb927c57f95a92adae1f6fd6f1959166/tools/clang/scripts/InstructionCombining.cpp
[modify] https://crrev.com/e23053b4cb927c57f95a92adae1f6fd6f1959166/tools/clang/scripts/update.py

Comment 8 by thakis@chromium.org, Aug 17 2017

Today's canary (62.0.3188.0) was built by the tweaked clang. Here's a minidump built with it: minidump with change: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20special_protos.asan_report.is_actionable%3D0%20AND%20product.Version%3D%2762.0.3188.0%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3AFontCache%3A%3ACrashWithFontInfo%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&unnest=&stbtiq=&reportid=f3d78c6ee93491cb&index=0

For reference, here's one from 62.0.3187.0: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20special_protos.asan_report.is_actionable%3D0%20AND%20product.Version%3D%2762.0.3187.0%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3AFontCache%3A%3ACrashWithFontInfo%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&unnest=&stbtiq=&reportid=a15d573290c71409&index=0

I've loaded both into windbg, ran `kM`, then clicked the frame above (well, below) WTF::debug::Alias -- FontCache::CrashWithFontInfo, and then ran `dv` to look at locals. The output used to be:

0:000> dv
     font_description = 0x0000019f`be940e88 16.000000px {...}, {...}, {...}
           font_cache = <value unavailable>
             font_mgr = <value unavailable>
         num_families = <value unavailable>
font_description_copy = <value unavailable>


Now it is:

0:000> dv
     font_description = 0x03d087e4 16.000000px {...}, {...}, {...}
           font_cache = 0x01fa5df8
             font_mgr = 0x00e0fae8
         num_families = 0n0
font_description_copy = 16.000000px {...}, {...}, {...}



Should we dupe this into issue 753934 (or the other way round)? We unduped because we wanted the other bug to be about base::debug::Alias(), but it sounds like base::debug::Alias() isn't as special as we originally thought.

Comment 9 by r...@chromium.org, Aug 23 2017

Blocking: 709690

Comment 10 by wfh@google.com, Aug 24 2017

Cc: wfh@chromium.org
I think this bug is specifically referring to base::debug::Alias - in which case this is fixed?  issue 756153  might be the more generic issue where some locals (not aliased) are not visible? If I am correct, please go ahead and close this one.

Comment 11 by r...@chromium.org, Aug 24 2017

Owner: r...@chromium.org
It isn't fixed permanently, the patch we rebuilt clang with (https://reviews.llvm.org/D36596) doesn't look like it's going in upstream.

I consider this bug to be about base::debug::Alias and things like it. Alias isn't special, it just takes the address of a local. There are tons of functions across Chrome that take arguments by reference, and IMO our debug info should be able to describe all of them. They're in memory, so this is the easy case, and we can't really ship clang until we handle that.

This is the upstream issue tracking the long-term plan for these types of variables: https://llvm.org/pr34136
In addition to fixing the compiler we need to have tests that verify the fix using cdb or similar - we should keep the bug open until that is done.

Comment 13 by h...@chromium.org, Sep 13 2017

Blockedon: 764744

Comment 14 by h...@chromium.org, Sep 13 2017

Cc: zturner@chromium.org
Zach is working on cdb tests. Zach, do you have a bug to track that?
No, I don't.  Should I create one here or on the LLVM side?

Comment 16 by h...@chromium.org, Sep 13 2017

Here is better I think (or both if you want one on the LLVM side too). Then we can track it as a blocker of  crbug.com/82385 .

Comment 17 by r...@chromium.org, Sep 13 2017

I was able to land the temporary fix (https://reviews.llvm.org/D36596) under a flag. We can add '-mllvm -instcombine-lower-dbg-declare' to our clang cflags in gn to get better debug info. This should fix base::debug::Alias and greatly improve variable availability on Windows *and* non-Windows, but it is known to introduce inaccuracies in some cases. We'll need another clang roll to pick it up. Thankfully, Hans has been keeping the ToT waterfall green this week...
Project Member

Comment 18 by bugdroid1@chromium.org, Sep 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/32750c1251012624a87fb97098bd0f6a60ffa8ed

commit 32750c1251012624a87fb97098bd0f6a60ffa8ed
Author: Hans Wennborg <hans@chromium.org>
Date: Fri Sep 15 23:10:14 2017

Clang: enable -instcombine-lower-dbg-declare=1

This enables better optimized debug info for address-taken values
at the cost of potentially inaccurate debug info in some situations.

We believe turning this on is the right trade-off for developers
debugging Chromium on all platforms, but especially on Windows where
values that are available in MSVC's debug info would otherwise show
as unavailable when using Clang.

The goal (crbug.com/765793) is to make Clang's optimized
debug info good enough in all cases that this compromise is not
necessary.

Note: This requires Clang r313108 or later.

Bug:  761633 ,  753736 , 765793
Change-Id: Ia0ed38c499fac282a3c8ab27c5fd5571fdfab84d
Reviewed-on: https://chromium-review.googlesource.com/669314
Commit-Queue: Hans Wennborg <hans@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#502428}
[modify] https://crrev.com/32750c1251012624a87fb97098bd0f6a60ffa8ed/build/config/compiler/BUILD.gn

Project Member

Comment 19 by bugdroid1@chromium.org, Sep 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5487d51f9175a69613e7b715be437ece3c1d88e5

commit 5487d51f9175a69613e7b715be437ece3c1d88e5
Author: Xiaoqian Dai <xdai@chromium.org>
Date: Sat Sep 16 03:54:06 2017

Revert "Clang: enable -instcombine-lower-dbg-declare=1"

This reverts commit 32750c1251012624a87fb97098bd0f6a60ffa8ed.

Reason for revert: This CL breaks most of the informational builds https://uberchromegw.corp.google.com/i/chromeos.chrome/waterfall?reload=300. See an example here: https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/veyron_minnie-tot-chrome-pfq-informational/builds/6039

Original change's description:
> Clang: enable -instcombine-lower-dbg-declare=1
> 
> This enables better optimized debug info for address-taken values
> at the cost of potentially inaccurate debug info in some situations.
> 
> We believe turning this on is the right trade-off for developers
> debugging Chromium on all platforms, but especially on Windows where
> values that are available in MSVC's debug info would otherwise show
> as unavailable when using Clang.
> 
> The goal (crbug.com/765793) is to make Clang's optimized
> debug info good enough in all cases that this compromise is not
> necessary.
> 
> Note: This requires Clang r313108 or later.
> 
> Bug:  761633 ,  753736 , 765793
> Change-Id: Ia0ed38c499fac282a3c8ab27c5fd5571fdfab84d
> Reviewed-on: https://chromium-review.googlesource.com/669314
> Commit-Queue: Hans Wennborg <hans@chromium.org>
> Reviewed-by: Nico Weber <thakis@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#502428}

TBR=thakis@chromium.org,hans@chromium.org,rnk@chromium.org

Change-Id: I4fea2ab8ec9571d4afb66cd3912c5975e0b9395c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  761633 ,  753736 , 765793
Reviewed-on: https://chromium-review.googlesource.com/669825
Reviewed-by: Xiaoqian Dai <xdai@chromium.org>
Commit-Queue: Xiaoqian Dai <xdai@chromium.org>
Cr-Commit-Position: refs/heads/master@{#502484}
[modify] https://crrev.com/5487d51f9175a69613e7b715be437ece3c1d88e5/build/config/compiler/BUILD.gn

Project Member

Comment 20 by bugdroid1@chromium.org, Sep 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/074fa51f85b15ad2e2d6c804a03825d48dbeb063

commit 074fa51f85b15ad2e2d6c804a03825d48dbeb063
Author: Hans Wennborg <hans@chromium.org>
Date: Mon Sep 18 19:43:09 2017

Reland "Clang: enable -instcombine-lower-dbg-declare=1"

Trying harder to exclude CrOS build configs that don't use
Chromium's Clang.

Note to CrOS folks: If this breaks build configs that are not
tested upstream, please ping me and/or llozano before reverting.

This is a reland of 32750c1251012624a87fb97098bd0f6a60ffa8ed
Original change's description:
> Clang: enable -instcombine-lower-dbg-declare=1
>
> This enables better optimized debug info for address-taken values
> at the cost of potentially inaccurate debug info in some situations.
>
> We believe turning this on is the right trade-off for developers
> debugging Chromium on all platforms, but especially on Windows where
> values that are available in MSVC's debug info would otherwise show
> as unavailable when using Clang.
>
> The goal (crbug.com/765793) is to make Clang's optimized
> debug info good enough in all cases that this compromise is not
> necessary.
>
> Note: This requires Clang r313108 or later.
>
> Bug:  761633 ,  753736 , 765793
> Change-Id: Ia0ed38c499fac282a3c8ab27c5fd5571fdfab84d
> Reviewed-on: https://chromium-review.googlesource.com/669314
> Commit-Queue: Hans Wennborg <hans@chromium.org>
> Reviewed-by: Nico Weber <thakis@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#502428}

TBR=thakis

Bug:  761633 ,  753736 , 765793
Change-Id: I2c3670873f1bd1d75cf1c41191b170983f41e0d0
Reviewed-on: https://chromium-review.googlesource.com/671523
Reviewed-by: Hans Wennborg <hans@chromium.org>
Commit-Queue: Hans Wennborg <hans@chromium.org>
Cr-Commit-Position: refs/heads/master@{#502646}
[modify] https://crrev.com/074fa51f85b15ad2e2d6c804a03825d48dbeb063/build/config/compiler/BUILD.gn

Comment 21 by h...@chromium.org, Sep 21 2017

Status: Started
> Here is better I think (or both if you want one on the LLVM side too). Then we can track it as a blocker of  crbug.com/82385 .
Zach, any progress on this?



The instcombine flag enabling (#20) is in Chrome 63.0.3219.0, and there are a few crash reports:

https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20%20AND%20custom_data.ChromeCrashProto.channel%3D%27canary%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%20LIKE%20%27v8%3A%3Ainternal%3A%3ASlotSet%3A%3AIterate%25%27%20AND%20STRPOS(product.Version%2C%20%2763.0.3219%27)%20%3E%200&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D

63.0.3219.2 was built with Clang.

Reid, maybe we can look at this with your windbg skills tomorrow?

Comment 23 by r...@chromium.org, Sep 22 2017

I have a patch to improve bit_offset:
https://reviews.llvm.org/D38197

Basically, the debug information for bit_offset was getting emitted at the end of the block, long after the register holding the value was clobbered (ECX got reused).

The DBG_VALUE was becoming unglued from the instruction that defined the value. Still, that shouldn't cause us to emit inaccurate debug information, so there is a secondary issue here that needs further work.
I landed the fix last Monday at LLVM r314114. We should roll clang and confirm that the original issue is fixed.
Blockedon: 769761
 crbug.com/769761  is the next roll bug.
The roll landed on Friday and shipped in the Canary today as 64.0.3241.4. The crash reports are trickling in, but so far there's not enough to work with.
ulan: We don't have any crashes exactly at this stack with the current canary, but here are all crashes we have so far: https://crash.corp.google.com/browse?q=product.Version%3D%2764.0.3244.2%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#samplereports

There are a few v8 crashes in there. Can you click through them (using the arrows in the upper right) and check if things look better now?
Thanks for checking, hans@. 

I compared report db8eea23804e19b3 (generated with cl) with report ee6fb4cdfcca7534 (generated with clang) that have similar stack traces.

The cl minidump shows 9 values and the clang minidump shows only 4 values. This is a big regression for crash investigation. Is this caused by clang optimizing more or is it a bug? 

db8eea23804e19b3:
0:023> dv
           this = 0x000001d8`0a821e88
       callback = class v8::internal::RememberedSetUpdatingItem<v8::internal::MajorNonAtomicMarkingState>::UpdateUntypedPointers::__l8::<lambda_c430258cc105b3a5704b10e9418e6bab>
      new_count = 0n0
   bucket_index = 0n11
    cell_offset = 0n11328
in_bucket_count = 0n0
           cell = 0x20000000
       new_cell = <value unavailable>
           mask = 0x6000
           slot = <value unavailable>
       bit_mask = 0x20000000
     bit_offset = <value unavailable>

ee6fb4cdfcca7534:
0:004> dv
           this = 0x000000bb`6fb77e38
           mode = PREFREE_EMPTY_BUCKETS (0n1)
   bucket_index = <value unavailable>
      new_count = <value unavailable>
              i = <value unavailable>
in_bucket_count = <value unavailable>
           cell = <value unavailable>
    cell_offset = 0n43136
     bit_offset = <value unavailable>
       bit_mask = 0x2000000
           slot = <value unavailable>
           mask = <value unavailable>
       new_cell = <value unavailable>

We’ll need to look at this specific instance to see if there’s anything we
can do, but distancing ourselves from this specific side by side
comparison, there’s a theoretical limit to how good is actually possible at
all.

Even with the best optimized debug info, which variables are available
depends on how the code was optimized. It’s going to be impossible to get
everything. You can imagine, for example, a case of 2 optimizers where one
is so good it proves mathematically that a function returns 0 for every
possible input.  Nothing you can do will get more variables into the
debugger here, and it’s not because this particular compiler outputs worse
optimized debug info, but rather because it outputs /better/ code.

We do know there are some ways for us to improve, but I just want to manage
expectations that we will probably never have the same set of variables
present
Blocking: -709690
Labels: -Pri-1 Pri-3
Owner: ----
Status: Available
Summary: Clang has poor optimized debug info for v8::internal::SlotSet::Iterate (was: Minidump debugging in WinDbg is broken in recent canaries.)
Re-titling, leaving open to cover coming back and looking to see if we can widen the DBG_VALUE ranges here.

When I stared at the code when fixing the bit_offset accuracy bug, my recollection is that fixing this will be very difficult. The variables in question are small, easily folded integer computations, and the variables have very short register-only lifetimes. For example, bit_offset *is* available for a few instructions, but is not normally available across a function call. I think the situation may improve, but it's probably going to come from analyzing problems in small test cases like the backtrace.io CDQS test suite.
Should we just WontFix this? Debug info is generally reasonable now (though we should get rid of that -instcombine-lower-dbg-declare flag one day), and there are always going to be specific cases where things get optimized away.
Status: WontFix
Sure. I thought maybe one day I'd come back and analyze this some more, but bad debug info test cases are a dime a dozen. It's easy to find more and there's lots more work to do.

Sign in to add a comment