Clang has poor optimized debug info for v8::internal::SlotSet::Iterate |
|||||||||||
Issue descriptionWinDbg shows nonsense values for variables. For example in minidump [1] the value of bit_offset must be within 0..31, but WinDbg shows 398920 (see screenshot) [1]: https://crash.corp.google.com/browse?q=ReportID%3D%270be7e04448000000%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&stbtiq=&reportid=&index=0#2 thakis@, hans@, could this be related to windows clang rollout?
,
Aug 9 2017
No idea when it stared. I stumbled upon this while investigating a recent crash spike. Here is a dremel query that finds similar crashes: product.name='Chrome' AND custom_data.ChromeCrashProto.channel='canary' AND custom_data.ChromeCrashProto.ptype='renderer' AND custom_data.ChromeCrashProto.magic_signature_1.name LIKE 'v8::internal::SlotSet::Iterate%' If you give me version numbers, I can check if the corresponding minidumps have this issue. Btw, is it possible to tell from a minidump whether Chrome was built with clang?
,
Aug 9 2017
,
Aug 9 2017
I don't think there will be any obvious markers in the minidump as to whether the binary was built with clang or not. Chrome knows which compiler was used, though. You can check it at chrome://version. Is debugging with windbg completely broken, or is this just one particular issue with the quality of clang's optimized debug info? The other variables in that screenshot look correct.
,
Aug 9 2017
Issue 753934 has been merged into this issue.
,
Aug 9 2017
Re comment 2: If you have access to the binary, you can check how add, adc, and, xor, or, sbb, sub, cmp, mov with two register operands are encoded: If the two lowest bits of the first byte of that instruction are set, then that file was produced by cl. If just the lowest bit is set, it was clang-cl. (from https://plus.google.com/u/0/+NicoWeberCorp/posts/hZaCZv9ErMM)
,
Aug 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e23053b4cb927c57f95a92adae1f6fd6f1959166 commit e23053b4cb927c57f95a92adae1f6fd6f1959166 Author: Nico Weber <thakis@chromium.org> Date: Wed Aug 16 18:15:10 2017 clang: Locally apply https://reviews.llvm.org/D36596 to see what it does to minidumps. This changes how we emit debug info. We now emit debug info for locals in optimized build more often (fewer "optimized away" messages), but in return the displayed data for a variable might sometimes be stale. The motivation is to collect data on which tradeoff feels better. Bug: 753736 Change-Id: I009f444a7b030b18751bcef31188e7db4b9a8584 Reviewed-on: https://chromium-review.googlesource.com/616267 Reviewed-by: Reid Kleckner <rnk@chromium.org> Commit-Queue: Nico Weber <thakis@chromium.org> Cr-Commit-Position: refs/heads/master@{#494864} [add] https://crrev.com/e23053b4cb927c57f95a92adae1f6fd6f1959166/tools/clang/scripts/InstructionCombining.cpp [modify] https://crrev.com/e23053b4cb927c57f95a92adae1f6fd6f1959166/tools/clang/scripts/update.py
,
Aug 17 2017
Today's canary (62.0.3188.0) was built by the tweaked clang. Here's a minidump built with it: minidump with change: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20special_protos.asan_report.is_actionable%3D0%20AND%20product.Version%3D%2762.0.3188.0%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3AFontCache%3A%3ACrashWithFontInfo%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&unnest=&stbtiq=&reportid=f3d78c6ee93491cb&index=0 For reference, here's one from 62.0.3187.0: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20special_protos.asan_report.is_actionable%3D0%20AND%20product.Version%3D%2762.0.3187.0%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%3D%27blink%3A%3AFontCache%3A%3ACrashWithFontInfo%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&unnest=&stbtiq=&reportid=a15d573290c71409&index=0 I've loaded both into windbg, ran `kM`, then clicked the frame above (well, below) WTF::debug::Alias -- FontCache::CrashWithFontInfo, and then ran `dv` to look at locals. The output used to be: 0:000> dv font_description = 0x0000019f`be940e88 16.000000px {...}, {...}, {...} font_cache = <value unavailable> font_mgr = <value unavailable> num_families = <value unavailable> font_description_copy = <value unavailable> Now it is: 0:000> dv font_description = 0x03d087e4 16.000000px {...}, {...}, {...} font_cache = 0x01fa5df8 font_mgr = 0x00e0fae8 num_families = 0n0 font_description_copy = 16.000000px {...}, {...}, {...} Should we dupe this into issue 753934 (or the other way round)? We unduped because we wanted the other bug to be about base::debug::Alias(), but it sounds like base::debug::Alias() isn't as special as we originally thought.
,
Aug 23 2017
,
Aug 24 2017
I think this bug is specifically referring to base::debug::Alias - in which case this is fixed? issue 756153 might be the more generic issue where some locals (not aliased) are not visible? If I am correct, please go ahead and close this one.
,
Aug 24 2017
It isn't fixed permanently, the patch we rebuilt clang with (https://reviews.llvm.org/D36596) doesn't look like it's going in upstream. I consider this bug to be about base::debug::Alias and things like it. Alias isn't special, it just takes the address of a local. There are tons of functions across Chrome that take arguments by reference, and IMO our debug info should be able to describe all of them. They're in memory, so this is the easy case, and we can't really ship clang until we handle that. This is the upstream issue tracking the long-term plan for these types of variables: https://llvm.org/pr34136
,
Aug 25 2017
In addition to fixing the compiler we need to have tests that verify the fix using cdb or similar - we should keep the bug open until that is done.
,
Sep 13 2017
,
Sep 13 2017
Zach is working on cdb tests. Zach, do you have a bug to track that?
,
Sep 13 2017
No, I don't. Should I create one here or on the LLVM side?
,
Sep 13 2017
Here is better I think (or both if you want one on the LLVM side too). Then we can track it as a blocker of crbug.com/82385 .
,
Sep 13 2017
I was able to land the temporary fix (https://reviews.llvm.org/D36596) under a flag. We can add '-mllvm -instcombine-lower-dbg-declare' to our clang cflags in gn to get better debug info. This should fix base::debug::Alias and greatly improve variable availability on Windows *and* non-Windows, but it is known to introduce inaccuracies in some cases. We'll need another clang roll to pick it up. Thankfully, Hans has been keeping the ToT waterfall green this week...
,
Sep 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/32750c1251012624a87fb97098bd0f6a60ffa8ed commit 32750c1251012624a87fb97098bd0f6a60ffa8ed Author: Hans Wennborg <hans@chromium.org> Date: Fri Sep 15 23:10:14 2017 Clang: enable -instcombine-lower-dbg-declare=1 This enables better optimized debug info for address-taken values at the cost of potentially inaccurate debug info in some situations. We believe turning this on is the right trade-off for developers debugging Chromium on all platforms, but especially on Windows where values that are available in MSVC's debug info would otherwise show as unavailable when using Clang. The goal (crbug.com/765793) is to make Clang's optimized debug info good enough in all cases that this compromise is not necessary. Note: This requires Clang r313108 or later. Bug: 761633 , 753736 , 765793 Change-Id: Ia0ed38c499fac282a3c8ab27c5fd5571fdfab84d Reviewed-on: https://chromium-review.googlesource.com/669314 Commit-Queue: Hans Wennborg <hans@chromium.org> Reviewed-by: Nico Weber <thakis@chromium.org> Cr-Commit-Position: refs/heads/master@{#502428} [modify] https://crrev.com/32750c1251012624a87fb97098bd0f6a60ffa8ed/build/config/compiler/BUILD.gn
,
Sep 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/5487d51f9175a69613e7b715be437ece3c1d88e5 commit 5487d51f9175a69613e7b715be437ece3c1d88e5 Author: Xiaoqian Dai <xdai@chromium.org> Date: Sat Sep 16 03:54:06 2017 Revert "Clang: enable -instcombine-lower-dbg-declare=1" This reverts commit 32750c1251012624a87fb97098bd0f6a60ffa8ed. Reason for revert: This CL breaks most of the informational builds https://uberchromegw.corp.google.com/i/chromeos.chrome/waterfall?reload=300. See an example here: https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/veyron_minnie-tot-chrome-pfq-informational/builds/6039 Original change's description: > Clang: enable -instcombine-lower-dbg-declare=1 > > This enables better optimized debug info for address-taken values > at the cost of potentially inaccurate debug info in some situations. > > We believe turning this on is the right trade-off for developers > debugging Chromium on all platforms, but especially on Windows where > values that are available in MSVC's debug info would otherwise show > as unavailable when using Clang. > > The goal (crbug.com/765793) is to make Clang's optimized > debug info good enough in all cases that this compromise is not > necessary. > > Note: This requires Clang r313108 or later. > > Bug: 761633 , 753736 , 765793 > Change-Id: Ia0ed38c499fac282a3c8ab27c5fd5571fdfab84d > Reviewed-on: https://chromium-review.googlesource.com/669314 > Commit-Queue: Hans Wennborg <hans@chromium.org> > Reviewed-by: Nico Weber <thakis@chromium.org> > Cr-Commit-Position: refs/heads/master@{#502428} TBR=thakis@chromium.org,hans@chromium.org,rnk@chromium.org Change-Id: I4fea2ab8ec9571d4afb66cd3912c5975e0b9395c No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 761633 , 753736 , 765793 Reviewed-on: https://chromium-review.googlesource.com/669825 Reviewed-by: Xiaoqian Dai <xdai@chromium.org> Commit-Queue: Xiaoqian Dai <xdai@chromium.org> Cr-Commit-Position: refs/heads/master@{#502484} [modify] https://crrev.com/5487d51f9175a69613e7b715be437ece3c1d88e5/build/config/compiler/BUILD.gn
,
Sep 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/074fa51f85b15ad2e2d6c804a03825d48dbeb063 commit 074fa51f85b15ad2e2d6c804a03825d48dbeb063 Author: Hans Wennborg <hans@chromium.org> Date: Mon Sep 18 19:43:09 2017 Reland "Clang: enable -instcombine-lower-dbg-declare=1" Trying harder to exclude CrOS build configs that don't use Chromium's Clang. Note to CrOS folks: If this breaks build configs that are not tested upstream, please ping me and/or llozano before reverting. This is a reland of 32750c1251012624a87fb97098bd0f6a60ffa8ed Original change's description: > Clang: enable -instcombine-lower-dbg-declare=1 > > This enables better optimized debug info for address-taken values > at the cost of potentially inaccurate debug info in some situations. > > We believe turning this on is the right trade-off for developers > debugging Chromium on all platforms, but especially on Windows where > values that are available in MSVC's debug info would otherwise show > as unavailable when using Clang. > > The goal (crbug.com/765793) is to make Clang's optimized > debug info good enough in all cases that this compromise is not > necessary. > > Note: This requires Clang r313108 or later. > > Bug: 761633 , 753736 , 765793 > Change-Id: Ia0ed38c499fac282a3c8ab27c5fd5571fdfab84d > Reviewed-on: https://chromium-review.googlesource.com/669314 > Commit-Queue: Hans Wennborg <hans@chromium.org> > Reviewed-by: Nico Weber <thakis@chromium.org> > Cr-Commit-Position: refs/heads/master@{#502428} TBR=thakis Bug: 761633 , 753736 , 765793 Change-Id: I2c3670873f1bd1d75cf1c41191b170983f41e0d0 Reviewed-on: https://chromium-review.googlesource.com/671523 Reviewed-by: Hans Wennborg <hans@chromium.org> Commit-Queue: Hans Wennborg <hans@chromium.org> Cr-Commit-Position: refs/heads/master@{#502646} [modify] https://crrev.com/074fa51f85b15ad2e2d6c804a03825d48dbeb063/build/config/compiler/BUILD.gn
,
Sep 21 2017
> Here is better I think (or both if you want one on the LLVM side too). Then we can track it as a blocker of crbug.com/82385 . Zach, any progress on this? The instcombine flag enabling (#20) is in Chrome 63.0.3219.0, and there are a few crash reports: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20%20AND%20custom_data.ChromeCrashProto.channel%3D%27canary%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%20LIKE%20%27v8%3A%3Ainternal%3A%3ASlotSet%3A%3AIterate%25%27%20AND%20STRPOS(product.Version%2C%20%2763.0.3219%27)%20%3E%200&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D 63.0.3219.2 was built with Clang. Reid, maybe we can look at this with your windbg skills tomorrow?
,
Sep 21 2017
,
Sep 22 2017
I have a patch to improve bit_offset: https://reviews.llvm.org/D38197 Basically, the debug information for bit_offset was getting emitted at the end of the block, long after the register holding the value was clobbered (ECX got reused). The DBG_VALUE was becoming unglued from the instruction that defined the value. Still, that shouldn't cause us to emit inaccurate debug information, so there is a secondary issue here that needs further work.
,
Oct 2 2017
I landed the fix last Monday at LLVM r314114. We should roll clang and confirm that the original issue is fixed.
,
Oct 2 2017
,
Oct 16 2017
The roll landed on Friday and shipped in the Canary today as 64.0.3241.4. The crash reports are trickling in, but so far there's not enough to work with.
,
Oct 19 2017
ulan: We don't have any crashes exactly at this stack with the current canary, but here are all crashes we have so far: https://crash.corp.google.com/browse?q=product.Version%3D%2764.0.3244.2%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#samplereports There are a few v8 crashes in there. Can you click through them (using the arrows in the upper right) and check if things look better now?
,
Oct 20 2017
> Can you click through them (using the arrows in the upper right) and check if things look better now? I can verify only for the reported stack since it has the variable bit_offset with the well defined range. We have to wait until such crashes start appearing with the new version. https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20product.version%3D%2764.0.3244.2%27%20AND%20custom_data.ChromeCrashProto.channel%3D%27canary%27%20AND%20stable_signature%20LIKE%20%27v8%3A%3Ainternal%3A%3ASlotSet%3A%3AIterate%25%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&stbtiq=&reportid=&index=0
,
Oct 20 2017
Since this crash seems low-volume, waffles increased today's win/clang canary to 50% users, so the crash should have a higher chance of showing up at https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20product.version%3D%2764.0.3245.2%27%20AND%20custom_data.ChromeCrashProto.channel%3D%27canary%27%20AND%20stable_signature%20LIKE%20%27v8%3A%3Ainternal%3A%3ASlotSet%3A%3AIterate%25%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&stbtiq=&reportid=&index=0
,
Oct 23 2017
Got one: https://crash.corp.google.com/browse?q=product.name%3D%27Chrome%27%20AND%20product.version%3D%2764.0.3245.2%27%20AND%20custom_data.ChromeCrashProto.channel%3D%27canary%27%20AND%20stable_signature%20LIKE%20%27v8%3A%3Ainternal%3A%3ASlotSet%3A%3AIterate%25%27%20AND%20custom_data.ChromeCrashProto.ptype%3D%27renderer%27&sql_dialect=googlesql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D&stbtiq=&reportid=ee6fb4cdfcca7534&index=0 Attaching a screenshot. Unfortunately, the bit_offset value is not available, presumably because the compiler has folded it into the computations below. The locals look good as far as I can tell. rnk, do you think there's anything more to be done here?
,
Oct 23 2017
Thanks for checking, hans@.
I compared report db8eea23804e19b3 (generated with cl) with report ee6fb4cdfcca7534 (generated with clang) that have similar stack traces.
The cl minidump shows 9 values and the clang minidump shows only 4 values. This is a big regression for crash investigation. Is this caused by clang optimizing more or is it a bug?
db8eea23804e19b3:
0:023> dv
this = 0x000001d8`0a821e88
callback = class v8::internal::RememberedSetUpdatingItem<v8::internal::MajorNonAtomicMarkingState>::UpdateUntypedPointers::__l8::<lambda_c430258cc105b3a5704b10e9418e6bab>
new_count = 0n0
bucket_index = 0n11
cell_offset = 0n11328
in_bucket_count = 0n0
cell = 0x20000000
new_cell = <value unavailable>
mask = 0x6000
slot = <value unavailable>
bit_mask = 0x20000000
bit_offset = <value unavailable>
ee6fb4cdfcca7534:
0:004> dv
this = 0x000000bb`6fb77e38
mode = PREFREE_EMPTY_BUCKETS (0n1)
bucket_index = <value unavailable>
new_count = <value unavailable>
i = <value unavailable>
in_bucket_count = <value unavailable>
cell = <value unavailable>
cell_offset = 0n43136
bit_offset = <value unavailable>
bit_mask = 0x2000000
slot = <value unavailable>
mask = <value unavailable>
new_cell = <value unavailable>
,
Oct 23 2017
We’ll need to look at this specific instance to see if there’s anything we can do, but distancing ourselves from this specific side by side comparison, there’s a theoretical limit to how good is actually possible at all. Even with the best optimized debug info, which variables are available depends on how the code was optimized. It’s going to be impossible to get everything. You can imagine, for example, a case of 2 optimizers where one is so good it proves mathematically that a function returns 0 for every possible input. Nothing you can do will get more variables into the debugger here, and it’s not because this particular compiler outputs worse optimized debug info, but rather because it outputs /better/ code. We do know there are some ways for us to improve, but I just want to manage expectations that we will probably never have the same set of variables present
,
Nov 23 2017
,
Nov 27 2017
Re-titling, leaving open to cover coming back and looking to see if we can widen the DBG_VALUE ranges here. When I stared at the code when fixing the bit_offset accuracy bug, my recollection is that fixing this will be very difficult. The variables in question are small, easily folded integer computations, and the variables have very short register-only lifetimes. For example, bit_offset *is* available for a few instructions, but is not normally available across a function call. I think the situation may improve, but it's probably going to come from analyzing problems in small test cases like the backtrace.io CDQS test suite.
,
Mar 1 2018
Should we just WontFix this? Debug info is generally reasonable now (though we should get rid of that -instcombine-lower-dbg-declare flag one day), and there are always going to be specific cases where things get optimized away.
,
Mar 2 2018
Sure. I thought maybe one day I'd come back and analyze this some more, but bad debug info test cases are a dime a dozen. It's easy to find more and there's lots more work to do. |
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by thakis@chromium.org
, Aug 9 2017