protoc.exe crashing with error code -1073740791 (0xC0000409)
Issue description

TL;DR - looks like an incremental linking bug.

There have been several recent reports of protoc.exe crashing with error code -1073740791. Translated to hexadecimal this is 0xC0000409, which is STATUS_STACK_BUFFER_OVERRUN. Because all ten-digit negative numbers around negative one billion look alike, this error is easy to mistake for the -1073741819 (0xC0000005) crashes that were the symptom of crbug.com/644525, but it is unrelated. One difference is the failure code. The other is that the 0xC0000409 failures are due to a binary that is corrupt on disk: it can be run repeatedly and will keep showing the same failure, whereas with crbug.com/644525 the bad bytes were in the disk cache and the bytes on disk were valid.

Loading one of the bad binaries under VS shows that STATUS_STACK_BUFFER_OVERRUN is misleading. Debugging shows that RtlFailFast2 is called from ntdll.dll!RtlpHandleInvalidUserCallTarget(), so the bug is not a buffer overrun at all. It's a failed CFG (Control Flow Guard) check, last seen in crbug.com/766236. In that case the failure was deterministic and was fixed by crrev.com/c/678063. In this case the failure is random.

Comparing the results of "dumpbin /loadconfig" between good and bad binaries didn't show any obvious differences, but differences in the tables of functions could easily be the problem. The randomness of the failures suggests a linker bug, perhaps an incremental linking bug. The failures may have suddenly started happening because I recently re-enabled incremental linking for protoc.exe and others.

crrev.com/c/917101 will help to avoid this confusion in the future by changing these error codes to print as hexadecimal when they are large negative numbers.

I'll see about landing a change to disable incremental linking again for protoc.exe, and perhaps other binaries. It will be very similar to the last time I disabled incremental linking but with a different comment.
Another possible fix would be to disable CFG on builds where incremental linking is enabled. Has anybody seen this on non-component builds? That is, has anybody seen this when incremental linking is not in use?
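To make the decimal-to-hexadecimal translation above concrete, here is a minimal Python sketch. It is not the code from crrev.com/c/917101; the function names, the -1000 threshold, and the small lookup table are assumptions made for illustration. It reinterprets a negative 32-bit exit code as an unsigned value and prints it in hex, which makes the two crash codes easy to tell apart.

```python
# Hypothetical helper, not the actual Chromium change: render large negative
# Windows exit codes as hexadecimal NTSTATUS values.

# Only the two codes discussed in this bug; the table is illustrative.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def format_exit_code(code):
    """Return hex for large negative codes, plain decimal otherwise."""
    # The -1000 cutoff is an arbitrary assumption for this sketch.
    if code < -1000:
        # Masking reinterprets the signed 32-bit value as unsigned.
        return "0x%08X" % (code & 0xFFFFFFFF)
    return str(code)


def describe(code):
    """Append the NTSTATUS name when the code is in the small table above."""
    name = KNOWN_STATUS.get(code & 0xFFFFFFFF)
    text = format_exit_code(code)
    return "%s (%s)" % (text, name) if name else text


print(describe(-1073740791))  # the crash in this bug
print(describe(-1073741819))  # the crbug.com/644525 symptom
```

Printed this way, the codes differ visibly in their last three hex digits, whereas the decimal forms differ only in a few middle digits.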
Feb 15 2018
This should only happen when /guard and incremental linking are both in use, which means only in release component builds. nektar@ was using such a build and chengx@ may have been. If there are any counter examples please let me know.
Feb 15 2018
The other reporter thinks it quite likely that they were also doing a release component build. So, I am going to fix this bug by disabling /guard:cf (CFG) on component builds (crrev.com/c/920761).

I also filed a bug report: https://developercommunity.visualstudio.com/content/problem/198903/guardcf-and-incremental-leads-to-random-failures.html

The recent resurgence of these bugs is because incremental linking was restored in crrev.com/c/894448. Disabling it had apparently been working around a different bug from its intended target.
Feb 15 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/dd23fadffd3a979017c07fda50828edf9c5583ae

commit dd23fadffd3a979017c07fda50828edf9c5583ae
Author: Bruce Dawson <brucedawson@chromium.org>
Date: Thu Feb 15 15:48:51 2018

Disable CFG (/guard:cf) for component builds

There appears to be a bug in Microsoft's linker when using /guard:cf with incremental linking. The table of functions occasionally gets corrupted or not updated, which leads to a CFG violation when the OS thinks that an invalid indirect branch is being taken. The stack shows:

ntdll.dll!RtlFailFast2()
ntdll.dll!RtlpHandleInvalidUserCallTarget()

The error code returned is 0xC0000409, which is STATUS_STACK_BUFFER_OVERRUN, which is quite non-obvious and confusing.

A bug in the linker with incremental linking and CFG seems quite plausible, and that combination is quite worthless, so the fix is to not use CFG in component builds.

Note that future occurrences of this bug, if any, will show an error code of 0xC0000409 where they used to show -1073740791. This is due to a separate change that alters how we print these error codes.

Bug: 812421
Change-Id: I8042d4363ea93084ca56e0634124799183c4153c
Reviewed-on: https://chromium-review.googlesource.com/920761
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Bruce Dawson <brucedawson@chromium.org>
Cr-Commit-Position: refs/heads/master@{#537027}

[modify] https://crrev.com/dd23fadffd3a979017c07fda50828edf9c5583ae/build/config/win/BUILD.gn
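The interaction this commit addresses can be modeled as a small truth table. The Python sketch below is a simplified model based only on the statements in this thread: it assumes incremental linking is used in component builds and /guard:cf in release builds. It is not the logic in build/config/win/BUILD.gn, and the function and parameter names are hypothetical.

```python
from itertools import product


def linker_features(is_debug, is_component_build, cfg_fix_applied):
    """Return (incremental, cfg) for one build configuration.

    Assumptions for this sketch: incremental linking follows
    is_component_build, and CFG is on for release (non-debug) builds.
    """
    incremental = is_component_build
    cfg = not is_debug
    # The fix: never use CFG in component builds.
    if cfg_fix_applied and is_component_build:
        cfg = False
    return incremental, cfg


def risky_configs(cfg_fix_applied):
    """List (is_debug, is_component_build) pairs that combine both features."""
    risky = []
    for is_debug, is_component_build in product([False, True], repeat=2):
        incremental, cfg = linker_features(
            is_debug, is_component_build, cfg_fix_applied)
        if incremental and cfg:
            risky.append((is_debug, is_component_build))
    return risky


print(risky_configs(cfg_fix_applied=False))
print(risky_configs(cfg_fix_applied=True))
```

Under these assumptions, only the release component configuration combines the two features before the fix, and no configuration does afterwards.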
Feb 15 2018
For the record, since you asked, Bruce, here is my gn config where the issue showed up:

is_debug = false
is_component_build = true
use_goma = true
goma_dir = "c:\\src\\goma\\goma-win64"
proprietary_codecs = true
ffmpeg_branding = "Chrome"

This was reproducing consistently even after gn clean out/Release. But after I removed out/Release and rebuilt, the issue went away.
Feb 16 2018
Thanks, all of the evidence is consistent with this being a bug that shows up in release component builds. I *think* that my change should resolve this, but please let me know if it does not. If I don't hear any objections I'll wait a few days and then close this as fixed.
Feb 20 2018
Do you think this might be due to a clang-cl bug somehow? The cfg and incremental flags are both link.exe flags, so it doesn't feel super likely, but the thought occurred to me.
Feb 20 2018
It's possible that clang-cl is creating an object file that is unexpected by the linker. My brief attempts at reproing this incremental linking issue with cl.exe failed but I didn't retry with clang-cl. I think the repro "should" be as simple as making an executable with some indirect calls (virtual function calls?), linking with /incremental and /guard:cf, and then modifying one of the target functions and relinking. But that didn't trigger anything and I didn't feel motivated to investigate why not.

Incremental linking and /guard:cf together is not an important combination so I'm okay with hiding the bug. I'm going to close this as fixed and we can reopen if more occurrences show up. Note that these failures will now print as 0xC0000409 instead of -1073740791.
Feb 20 2018
Woot! Thanks for fixing this.
Comment 1 by chengx@chromium.org
, Feb 14 2018