New issue
Advanced search Search tips

Issue 703251 link

Starred by 5 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug-Regression



Sign in to add a comment

Windows: proto_zero_plugin failed with status code 3221226505 (STATUS_STACK_BUFFER_OVERRUN)

Project Member Reported by pdr@chromium.org, Mar 20 2017

Issue description

I'm getting the following error trying to build at Tip-of-tree (258b8078a7855588f6ee9ac0b18c6b1ded9bc58c / git position 458111):
[10/9664] ACTION //components/tracing/proto:protos_gen(//build/toolchain/win:x64)
FAILED: gen/components/tracing/proto/event.pbzero.h gen/components/tracing/proto/event.pbzero.cc gen/components/tracing/proto/events_chunk.pbzero.h gen/
components/tracing/proto/events_chunk.pbzero.cc
C:/python_27_amd64/files/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py event.proto events_chunk.proto --protoc ./protoc.exe --proto-in-dir ../
../components/tracing/proto --plugin proto_zero_plugin.exe --plugin-out-dir gen/components/tracing/proto --plugin-options wrapper_namespace=pbzero
--plugin_out: protoc-gen-plugin: Plugin failed with status code 3221226505.
Protoc has returned non-zero status: 1 .
[309/9664] CXX obj/media/audio/audio/audio_input_controller.obj
ninja: build stopped: subcommand failed.

Windows build error superstar Torne noticed 3221226505 = 0xc0000409 = STATUS_STACK_BUFFER_OVERRUN.
 

Comment 1 by pdr@chromium.org, Mar 20 2017

I found this is a regression and am bisecting. I currently know it's somewhere between 258b8078a7855588f6ee9ac0b18c6b1ded9bc58c~200 and checkout 258b8078a7855588f6ee9ac0b18c6b1ded9bc58c~50

Comment 2 by pdr@chromium.org, Mar 20 2017

Labels: -Pri-3 Pri-1
Owner: thomasanderson@chromium.org
Status: Assigned (was: Untriaged)
I am pretty sure this is:
77bec4dfd9e172a99543b9bfe9daf2580721bfdd
Statically link libprotobuf_lite on Linux component builds
Labels: -Type-Bug OS-Windows Type-Bug-Regression
Status: Started (was: Assigned)

Comment 4 by pdr@chromium.org, Mar 20 2017

Here's the exact repro steps:

git checkout 77bec4dfd9e172a99543b9bfe9daf2580721bfdd
gclient sync
ninja -j100 -C out/Release webkit_unit_tests
ninja: Entering directory `out/Release'
[214/745] ACTION //components/tracing/proto:protos_gen(//build/toolchain/win:x64)
FAILED: gen/components/tracing/proto/event.pbzero.h gen/components/tracing/proto/event.pbzero.cc gen/components/tracing/proto/events_chunk.pbzero.h gen/
components/tracing/proto/events_chunk.pbzero.cc
C:/python_27_amd64/files/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py event.proto events_chunk.proto --protoc ./protoc.exe --proto-in-dir ../
../components/tracing/proto --plugin proto_zero_plugin.exe --plugin-out-dir gen/components/tracing/proto --plugin-options wrapper_namespace=pbzero
--plugin_out: protoc-gen-plugin: Plugin failed with status code 3221226505.
Protoc has returned non-zero status: 1 .
[313/745] CXX obj/content/browser/service_worker/service_worker_proto/service_worker_database.pb.obj
ninja: build stopped: subcommand failed.

My gn args for out/Release are:
enable_nacl = false
is_component_build = true
is_debug = false
use_goma = true

I'm unable to repro on Win10 on ToT (8fe439b94) with the same ars.gn except use_goma=false.

Can you try with a fresh out dir on ToT?

Comment 6 by zmin@chromium.org, Mar 20 2017

I have the same issue when building Chrome on Win10:

ninja -C out\chrome chrome

I also have use_goma=true.
I'm unable to repro with use_goma=true either

Comment 8 by pdr@chromium.org, Mar 20 2017

I just did a clean build (blew away out/Release, etc) and was not able to reproduce. It sounds like this change should have invalidated something but failed to?

To reproduce:
git checkout 77bec4dfd9e172a99543b9bfe9daf2580721bfdd~1
gclient sync
ninja -j300 -C out/Release webkit_unit_tests
[success]
git checkout 77bec4dfd9e172a99543b9bfe9daf2580721bfdd
gclient sync
ninja -j300 -C out/Release webkit_unit_tests
[failure]

Comment 9 by pdr@chromium.org, Mar 20 2017

Cc: ericrk@chromium.org
ericrk just tried a build at tot and hit the same issue. He's now trying with "-t clean" to see if that fixes it, instead of deleting out/Release.
Cc: dpranke@chromium.org pkasting@chromium.org
Labels: -Pri-1 Pri-2
Lowering to P2 based on c#8

+pkasting for protobuf
+dpranke for build

BUILD.gn lists all of the files that were changed/added, so maybe a different component is missing a dep on protoc?
Cc: brucedaw...@chromium.org scottmg@chromium.org thakis@chromium.org
+some people fyi ...
I believe this is the exception triggered when /GS detects that the stack canary has been tromped. That is, it either means a stray write or an array overrun:

https://blogs.technet.microsoft.com/srd/2009/01/28/stack-overflow-stack-exhaustion-not-the-same-as-stack-buffer-overflow/

So, a serious bug. If we can get crash dumps for whatever process is failing then we should be able to analyze them, as long as the symbols have not been nuked.

Crash dumps from randomly crashing processes are not necessarily saved by default, but this can and should be enabled:

https://msdn.microsoft.com/en-us/library/windows/desktop/bb787181(v=vs.85).aspx

Attaching a debugger would also work, if we can reproduce the crash by running the problematic tool manually.

#12 The executable that's failing is protoc.exe, and is run with the following args (from within out/Release)

..\..\tools\protoc_wrapper\protoc_wrapper.py event.proto events_chunk.proto --protoc .\protoc.exe --proto-in-dir ..\..\components\tracing\proto --plugin proto_zero_plugin.exe --plugin-out-dir gen\components\tracing\proto --plugin-options wrapper_namespace=pbzero
Update: the problem is that proto_zero_plugin.exe is not getting regenerated because it is not a dependency of some things that use it:
https://cs.chromium.org/chromium/src/components/tracing/proto/BUILD.gn?rcl=08f54ecc76964badb626b2426b6f01f46cac7717&l=20
https://cs.chromium.org/chromium/src/components/tracing/BUILD.gn?rcl=08f54ecc76964badb626b2426b6f01f46cac7717&l=82

An immediate workaround is "del out\Release\proto_zero_plugin.exe" and rebuild.
Hm.. proto_zero_plugin IS added to deps here https://cs.chromium.org/chromium/src/third_party/protobuf/proto_library.gni?rcl=a4258cad75d8cc63170555cc74bd584b5bfca965&l=308
But it does not get built before the action() runs.  dpranke@ any idea why?

FYI if I build the proto_zero_plugin target, proto_zero_plugin.exe does get relinked.
I tried to dig in to why the old executable would crash. It crashes at startup - before even getting to main - which seems excessively cruel.

The crash occurs when initterm, which is marching through an array of function pointers, most of which are zero, finds that one of them is invalid. It decides that one of them (the third one?) is invalid so instead of calling it the initterm() function calls ntdll!RtlFailFast2 which executes int 29h which is then interpreted as buffer overrun in some contexts and other failure types in other contexts.

Here's the failure call stack:

00 ucrtbase!initterm
01 proto_zero_plugin_exe!__scrt_common_main_seh
02 KERNEL32!BaseThreadInitThunk
03 ntdll!RtlUserThreadStart

The latest ucrtbase.dll is missing symbols which complicates the analysis, but ultimately isn't critical. So, I think there are two mysteries:

1) Why is proto_zero_plugin.exe not getting regenerated in time?
2) Why is the old proto_zero_plugin.exe crashing immediately - it should be a valid binary that can run adequately, shouldn't it?

Just an update:  dpranke@ and I debugged this just now and we have reason to believe this is a bug in Microsoft's linker.  If we disable incremental linking (patch below) for proto_zero_plugin.exe, the bug goes away :P



diff --git a/components/tracing/tools/proto_zero_plugin/BUILD.gn b/components/tracing/tools/proto_zero_plugin/BUILD.gn
index cdf0eb1..418f4e1 100644
--- a/components/tracing/tools/proto_zero_plugin/BUILD.gn
+++ b/components/tracing/tools/proto_zero_plugin/BUILD.gn
@@ -12,5 +12,9 @@ if (current_toolchain == host_toolchain) {
     deps = [
       "//third_party/protobuf:protoc_lib",
     ]
+    if (is_win) {
+      configs -= [ "//build/config/win:default_incremental_linking" ]
+      configs += [ "//build/config/win:no_incremental_linking" ]
+    }
   }
 }
I saw something like that a while back (coalescing of functions by the linker into a table) that was caused by a bug in incremental linking:

https://bugs.chromium.org/p/chromium/issues/detail?id=251251#c4

In that case it was TLS destructors in ".CRT$XLn", but it sort of smells the same.
Hey, look at that. :(

FWIW, https://bugs.chromium.org/p/chromium/issues/detail?id=251251#c18 ... despite https://connect.microsoft.com/VisualStudio/feedback/details/967992/x64-incremental-linking-does-not-coalesce-sections-properly which implied it should have been fixed in 2015. So maybe it'll be fixed in 2017?
And the exact same bug appears with VS 2017.

I'll file a bug for this incremental linking bug in VS 2017.
I've already asked for ucrtbase.dll symbols to be supplied.
We should probably turn off incremental linking for this target? Or does it matter, if this is just a one-off?

VS 2017 build steps.

    Use these environment variables:
set DEPOT_TOOLS_WIN_TOOLCHAIN=0
set GYP_MSVS_VERSION=2017

    Use this args.gn:
enable_nacl = false
is_component_build = true
is_debug = false
use_goma = true
treat_warnings_as_errors = false

    and the build steps from an earlier comment:
git checkout 77bec4dfd9e172a99543b9bfe9daf2580721bfdd~1
gclient sync
ninja -j300 -C out/Release webkit_unit_tests
[success] - you should probably archive proto_zero_plugin.exe at this point, and then set up for an incremental link of proto_zero_plugin.exe
git checkout 77bec4dfd9e172a99543b9bfe9daf2580721bfdd
gclient sync
ninja -j300 -C out/Release webkit_unit_tests
[failure]

Once you have reached this state then running proto_zero_plugin.exe is enough to see the crash. No arguments are required.

#20, Yeah, it couldn't hurt.  I CC'ed you on the CL
As I said on the CL, on deeper thought I think there's probably no value in disabling incremental linking for this target. It is almost certainly the transition from dynamic to static linking that triggered the bug, and that is a one-off that is not expected to happen again for this target.

Also, I filed this bug:

https://developercommunity.visualstudio.com/content/problem/34236/incremental-linking-creates-corrupt-executable.html

I'd say we should close this bug as ExternalDependency and move on. Good work on the investigation.

Project Member

Comment 23 by bugdroid1@chromium.org, Mar 22 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b5b19afd8b383045e4d9060eaf1be60da229745b

commit b5b19afd8b383045e4d9060eaf1be60da229745b
Author: thomasanderson <thomasanderson@google.com>
Date: Wed Mar 22 18:10:44 2017

Disable incremental linking for proto_zero_plugin

This CL works around a linker bug that is seen when incrementally
linking proto_zero_plugin.exe after crrev.com/2756543002.

The linker bug also occurs in VS 2017 RTM so a new bug report to
Microsoft was made:
https://developercommunity.visualstudio.com/content/problem/34236/incremental-linking-creates-corrupt-executable.html

BUG= 703251 
R=primiano@chromium.org

Review-Url: https://codereview.chromium.org/2763233002
Cr-Commit-Position: refs/heads/master@{#458802}

[modify] https://crrev.com/b5b19afd8b383045e4d9060eaf1be60da229745b/components/tracing/tools/proto_zero_plugin/BUILD.gn

#22 Hopefully we will be reverting the protobuf static linking change in the future, is there a possibility that a similar bug can surface with incremental linking enabled?
Also, I ticked the commit box before I saw your comment, so feel free to revert :)
The change to disable incremental linking is harmless so I'm inclined to leave it.

I checked to see whether the same incremental linking bug could happen when reverting the protobuf static linking by doing a clean link at 77bec4dfd9e172a99543b9bfe9daf2580721bfdd and then an incremental link at 77bec4dfd9e172a99543b9bfe9daf2580721bfdd~1. The generated binary seemed fine.

Also, the repro steps are *much* faster if the target is proto_zero_plugin.exe. The bug is then detected by running this binary to see whether it hangs (success!) or crashes (failure!). I put this in the Microsoft bug.

Status: Fixed (was: Started)
The latest update on the Microsoft bug is "It is a problem in incremental linking related to Control Flow Guard (/guard:cf) processing. We will ship a fix in VS 2017 Update 1. Thanks for the bug report."
 Issue 707806  has been merged into this issue.
This error seems to happen in the waterfall too, resulting in tree closure.  Issue 644525  might also be related - it had a dependency fix.  I'm guessing this instance is ~random incremental linking pdb corruption :/. Maybe a dependency issue also. Anyway, just leaving a note here to track the flakes. (tree has struggled to stay open over the last 24h..)

https://build.chromium.org/p/chromium/buildstatus?builder=Win&number=55546

https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium%2FWin%2F55546%2F%2B%2Frecipes%2Fsteps%2Fcompile%2F0%2Fstdout

[13040/58876] ACTION //chrome/browser/safe_browsing/incident_reporting:state_store_data_proto_gen(//build/toolchain/win:x86)
FAILED: gen/chrome/browser/safe_browsing/incident_reporting/state_store_data.pb.h gen/chrome/browser/safe_browsing/incident_reporting/state_store_data.pb.cc pyproto/chrome/browser/safe_browsing/incident_reporting/state_store_data_pb2.py 
C:/b/depot_tools/python276_bin/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py state_store_data.proto --protoc ./protoc.exe --proto-in-dir ../../chrome/browser/safe_browsing/incident_reporting --cc-out-dir gen/chrome/browser/safe_browsing/incident_reporting --py-out-dir pyproto/chrome/browser/safe_browsing/incident_reporting
Protoc has returned non-zero status: -1073741819 .
[13041/58876] STAMP obj/chrome/app/resources/locale_settings_grit.stamp
[13042/58876] ACTION //components/autofill/core/browser/proto:proto_gen(//build/toolchain/win:x86)
FAILED: gen/components/autofill/core/browser/proto/autofill_sync.pb.h gen/components/autofill/core/browser/proto/autofill_sync.pb.cc pyproto/components/autofill/core/browser/proto/autofill_sync_pb2.py gen/components/autofill/core/browser/proto/server.pb.h gen/components/autofill/core/browser/proto/server.pb.cc pyproto/components/autofill/core/browser/proto/server_pb2.py 
C:/b/depot_tools/python276_bin/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py autofill_sync.proto server.proto --protoc ./protoc.exe --proto-in-dir ../../components/autofill/core/browser/proto --cc-out-dir gen/components/autofill/core/browser/proto --py-out-dir pyproto/components/autofill/core/browser/proto
Protoc has returned non-zero status: -1073741819 .
[13043/58876] ACTION //chrome/app:google_chrome_strings_grit(//build/toolchain/win:x86)
[13044/58876] STAMP obj/content/resources_grit.stamp
[13045/58876] ACTION //chrome/test:test_proto_gen(//build/toolchain/win:x86)
FAILED: gen/chrome/common/safe_browsing/ipc_protobuf_message_test.pb.h gen/chrome/common/safe_browsing/ipc_protobuf_message_test.pb.cc pyproto/chrome/common/safe_browsing/ipc_protobuf_message_test_pb2.py 
C:/b/depot_tools/python276_bin/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py ipc_protobuf_message_test.proto --protoc ./protoc.exe --proto-in-dir ../../chrome/common/safe_browsing --cc-out-dir gen/chrome/common/safe_browsing --py-out-dir pyproto/chrome/common/safe_browsing
Protoc has returned non-zero status: -1073741819 .

Summary: Windows: proto_zero_plugin failed with status code 3221226505 (STATUS_STACK_BUFFER_OVERRUN) (was: Windows: Plugin failed with status code 3221226505 (STATUS_STACK_BUFFER_OVERRUN))
tapted@ I'm going to de-dupe  bug 707806 .  This issue was for linking of proto_zero_plugin, and has been marked fixed.  Although  bug 707806  has the same symptom, the fix in c#23 will not fix that issue.

I think the fix would be to add the following to https://cs.chromium.org/chromium/src/third_party/protobuf/BUILD.gn?rcl=8cb8d28de8f2c439a50307ddc142c533c5b79e2c&l=583

+    if (is_win) {
+      configs -= [ "//build/config/win:default_incremental_linking" ]
+      configs += [ "//build/config/win:no_incremental_linking" ]
+    }

Sign in to add a comment