Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




Issue 812421: protoc.exe crashing with error code -1073740791 (0xC0000409)

Reported by brucedaw...@chromium.org, Feb 14 2018 Project Member

Issue description

TL;DR - looks like an incremental linking bug.

There have been several recent reports of protoc.exe crashing with error code -1073740791. Translated to hexadecimal this is 0xC0000409 which is STATUS_STACK_BUFFER_OVERRUN.
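The decimal-to-hex translation can be sketched in a few lines of Python. This is only an illustration: the helper names `to_ntstatus` and `describe` are made up here, and the lookup table holds just the two status codes discussed in this report.

```python
# Hypothetical helpers for illustration; only two well-known NTSTATUS
# values are listed, not the full set.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code):
    """Return the hex form of an exit code plus its name, if known."""
    status = to_ntstatus(exit_code)
    name = KNOWN_STATUS.get(status, "unknown")
    return "0x%08X (%s)" % (status, name)


if __name__ == "__main__":
    print(describe(-1073740791))
    print(describe(-1073741819))
```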

Because all ten-digit negative numbers around negative one billion look alike, this error is easy to confuse with the -1073741819 (0xC0000005) crashes that were the symptom of crbug.com/644525, but the two are unrelated.

One difference is the different failure code. The other difference is that the 0xC0000409 failures are due to a binary that is corrupt on disk. It can be run repeatedly and will keep showing the same failure, whereas with crbug.com/644525 the bad bytes were in the disk cache and the bytes on disk were valid.

Loading one of the bad binaries under VS shows that STATUS_STACK_BUFFER_OVERRUN is misleading. Debugging shows that RtlFailFast2 is called from ntdll.dll!RtlpHandleInvalidUserCallTarget() and the bug is not an overflow at all. It's a CFG (Control Flow Guard) check failure, last seen in crbug.com/766236. In that case the failure was deterministic and was fixed by crrev.com/c/678063. In this case the failure is random.

Comparing the results of "dumpbin /loadconfig" between good and bad binaries didn't show any obvious differences, but differences in the tables of functions could easily be the problem.
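A rough way to automate that comparison, as a sketch only: `dump_loadconfig` assumes dumpbin is on the PATH (for example from a Visual Studio developer prompt), and both function names are hypothetical. The diff step works on plain text, so it can be tried on saved output too.

```python
import difflib
import subprocess


def dump_loadconfig(binary_path):
    """Run `dumpbin /loadconfig` on a binary and return its text output.

    Assumes dumpbin.exe is on the PATH.
    """
    return subprocess.check_output(
        ["dumpbin", "/loadconfig", binary_path], universal_newlines=True)


def diff_loadconfig_text(good_text, bad_text):
    """Return only the lines that differ between two dumps.

    Lines prefixed with "- " come from the good dump and lines prefixed
    with "+ " come from the bad dump.
    """
    diff = difflib.ndiff(good_text.splitlines(), bad_text.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


if __name__ == "__main__":
    # Made-up sample text, not real dumpbin output.
    good = "field a: 1\nfield b: 2"
    bad = "field a: 1\nfield b: 3"
    for line in diff_loadconfig_text(good, bad):
        print(line)
```

The "Dump of file" header line includes the path, so it will always differ when the two binaries live in different places. Also, /loadconfig only summarizes the load configuration, so a corrupt function table may not show up here at all.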

The randomness of the failures suggests a linker bug, perhaps an incremental linking bug. Maybe the reason the failures suddenly started happening is because I recently re-enabled incremental linking for protoc.exe and others.

crrev.com/c/917101 will help to avoid this confusion in the future by changing these error codes to print as hexadecimal when they are large negative numbers.
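For illustration, the printing rule could look something like the sketch below. The function name and the cutoff of -100 are assumptions made for this example and are not taken from the actual change.

```python
def format_exit_code(ret, threshold=-100):
    """Format an exit code, switching to hex for large negative values.

    `threshold` is an assumed cutoff for this sketch: small codes such as
    1 or -1 stay in decimal, while Windows status codes are shown in hex.
    """
    if ret <= threshold:
        # Adding 2**32 reinterprets the signed 32-bit value as unsigned.
        return "0x%08X" % (ret + (1 << 32))
    return "%d" % ret


if __name__ == "__main__":
    for code in (1, -1, -1073740791):
        print("protoc.exe failed with exit code %s" % format_exit_code(code))
```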

I'll see about landing a change to disable incremental linking again for protoc.exe, and perhaps other binaries. It will be very similar to the last time I disabled incremental linking but with a different comment.

Another possible fix would be to disable CFG on builds where incremental linking is enabled.


Has anybody seen this on non-component builds? That is, has anybody seen this when incremental linking is not in use?
 

Comment 1 by chengx@chromium.org, Feb 14 2018

I was using component build.

Comment 2 by brucedaw...@chromium.org, Feb 15 2018

This should only happen when /guard and incremental linking are both in use, which means only in release component builds. nektar@ was using such a build and chengx@ may have been. If there are any counter examples please let me know.

Comment 3 by brucedaw...@chromium.org, Feb 15 2018

The other reporter thinks it quite likely that they were also doing a release component build. So, I am going to fix this bug by disabling /guard:cf (CFG) on component builds (crrev.com/c/920761).

I also filed a bug report:

https://developercommunity.visualstudio.com/content/problem/198903/guardcf-and-incremental-leads-to-random-failures.html

The recent resurgence of these bugs is because incremental linking was restored in crrev.com/c/894448. Having it disabled was apparently working around a different bug from its intended target.

Comment 4 by brat...@opera.com, Feb 15 2018

Cc: tmonius...@opera.com

Comment 5 by bugdroid1@chromium.org, Feb 15 2018

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/dd23fadffd3a979017c07fda50828edf9c5583ae

commit dd23fadffd3a979017c07fda50828edf9c5583ae
Author: Bruce Dawson <brucedawson@chromium.org>
Date: Thu Feb 15 15:48:51 2018

Disable CFG (/guard:cf) for component builds

There appears to be a bug in Microsoft's linker when using /guard:cf
with incremental linking. The table of functions occasionally gets
corrupted or not updated, which leads to a CFG violation when the OS
thinks that an invalid indirect branch is being taken. The stack shows:

    ntdll.dll!RtlFailFast2()
    ntdll.dll!RtlpHandleInvalidUserCallTarget()

The error code returned is 0xC0000409 which is
STATUS_STACK_BUFFER_OVERRUN which is quite non-obvious and confusing.

A bug in the linker with incremental linking and CFG seems quite
plausible, and that combination is quite worthless, so the fix is to not
use CFG in component builds.

Note that future occurrences of this bug, if any, will show an error
code of 0xC0000409 where they used to show -1073740791. This is due to
a separate change that alters how we print these error codes.

Bug: 812421
Change-Id: I8042d4363ea93084ca56e0634124799183c4153c
Reviewed-on: https://chromium-review.googlesource.com/920761
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Bruce Dawson <brucedawson@chromium.org>
Cr-Commit-Position: refs/heads/master@{#537027}
[modify] https://crrev.com/dd23fadffd3a979017c07fda50828edf9c5583ae/build/config/win/BUILD.gn

Comment 6 by rtoy@chromium.org, Feb 15 2018

For the record, since you asked, Bruce, here is my gn config where the issue showed up:

is_debug = false
is_component_build = true
use_goma = true
goma_dir = "c:\\src\\goma\\goma-win64"
proprietary_codecs = true
ffmpeg_branding = "Chrome"

This was reproducing consistently even after gn clean out/Release. But after I removed out/Release and rebuilt, the issue went away.

Comment 7 by brucedaw...@chromium.org, Feb 16 2018

Thanks, all of the evidence is consistent with this being a bug that shows up in release component builds. I *think* that my change should resolve this, but please let me know if it does not. If I don't hear any objections I'll wait a few days and then close this as fixed.

Comment 8 by thakis@chromium.org, Feb 20 2018

Do you think this might be due to a clang-cl bug somehow? The cfg and incremental flags are both link.exe flags, so it doesn't feel super likely, but the thought occurred to me.

Comment 9 by brucedaw...@chromium.org, Feb 20 2018

Status: Fixed (was: Started)
It's possible that clang-cl is creating an object file that is unexpected by the linker. My brief attempts at reproing this incremental linking issue with cl.exe failed but I didn't retry with clang-cl.

I think the repro "should" be as simple as making an executable with some indirect calls (virtual function calls?), linking with /incremental and /guard:cf, and then modifying one of the target functions and relinking. But that didn't trigger anything and I didn't feel motivated to investigate why not.

Incremental linking and /guard:cf together is not an important combination so I'm okay with hiding the bug. I'm going to close this as fixed and we can reopen if more occurrences show up.

Note that these failures will now print as 0xC0000409 instead of -1073740791.

Comment 10 by chengx@chromium.org, Feb 20 2018

Woot! Thanks for fixing this.
