Issue metadata
Sign in to add a comment
|
Windows: proto_zero_plugin failed with status code 3221226505 (STATUS_STACK_BUFFER_OVERRUN) |
||||||||||||||||||||||
Issue descriptionI'm getting the following error trying to build at Tip-of-tree (258b8078a7855588f6ee9ac0b18c6b1ded9bc58c / git position 458111): [10/9664] ACTION //components/tracing/proto:protos_gen(//build/toolchain/win:x64) FAILED: gen/components/tracing/proto/event.pbzero.h gen/components/tracing/proto/event.pbzero.cc gen/components/tracing/proto/events_chunk.pbzero.h gen/ components/tracing/proto/events_chunk.pbzero.cc C:/python_27_amd64/files/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py event.proto events_chunk.proto --protoc ./protoc.exe --proto-in-dir ../ ../components/tracing/proto --plugin proto_zero_plugin.exe --plugin-out-dir gen/components/tracing/proto --plugin-options wrapper_namespace=pbzero --plugin_out: protoc-gen-plugin: Plugin failed with status code 3221226505. Protoc has returned non-zero status: 1 . [309/9664] CXX obj/media/audio/audio/audio_input_controller.obj ninja: build stopped: subcommand failed. Windows build error superstar Torne noticed 3221226505 = 0xc0000409 = STATUS_STACK_BUFFER_OVERRUN.
,
Mar 20 2017
I am pretty sure this is: 77bec4dfd9e172a99543b9bfe9daf2580721bfdd Statically link libprotobuf_lite on Linux component builds
,
Mar 20 2017
,
Mar 20 2017
Here's the exact repro steps: git checkout 77bec4dfd9e172a99543b9bfe9daf2580721bfdd gclient sync ninja -j100 -C out/Release webkit_unit_tests ninja: Entering directory `out/Release' [214/745] ACTION //components/tracing/proto:protos_gen(//build/toolchain/win:x64) FAILED: gen/components/tracing/proto/event.pbzero.h gen/components/tracing/proto/event.pbzero.cc gen/components/tracing/proto/events_chunk.pbzero.h gen/ components/tracing/proto/events_chunk.pbzero.cc C:/python_27_amd64/files/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py event.proto events_chunk.proto --protoc ./protoc.exe --proto-in-dir ../ ../components/tracing/proto --plugin proto_zero_plugin.exe --plugin-out-dir gen/components/tracing/proto --plugin-options wrapper_namespace=pbzero --plugin_out: protoc-gen-plugin: Plugin failed with status code 3221226505. Protoc has returned non-zero status: 1 . [313/745] CXX obj/content/browser/service_worker/service_worker_proto/service_worker_database.pb.obj ninja: build stopped: subcommand failed. My gn args for out/Release are: enable_nacl = false is_component_build = true is_debug = false use_goma = true
,
Mar 20 2017
I'm unable to repro on Win10 on ToT (8fe439b94) with the same ars.gn except use_goma=false. Can you try with a fresh out dir on ToT?
,
Mar 20 2017
I have the same issue when building Chrome on Win10: ninja -C out\chrome chrome I also have use_goma=true.
,
Mar 20 2017
I'm unable to repro with use_goma=true either
,
Mar 20 2017
I just did a clean build (blew away out/Release, etc) and was not able to reproduce. It sounds like this change should have invalidated something but failed to? To reproduce: git checkout 77bec4dfd9e172a99543b9bfe9daf2580721bfdd~1 gclient sync ninja -j300 -C out/Release webkit_unit_tests [success] git checkout 77bec4dfd9e172a99543b9bfe9daf2580721bfdd gclient sync ninja -j300 -C out/Release webkit_unit_tests [failure]
,
Mar 20 2017
ericrk just tried a build at tot and hit the same issue. He's now trying with "-t clean" to see if that fixes it, instead of deleting out/Release.
,
Mar 20 2017
Lowering to P2 based on c#8 +pkasting for protobuf +dpranke for build BUILD.gn lists all of the files that were changed/added, so maybe a different component is missing a dep on protoc?
,
Mar 20 2017
+some people fyi ...
,
Mar 20 2017
I believe this is the exception triggered when /GS detects that the stack canary has been tromped. That is, it either means a stray write or an array overrun: https://blogs.technet.microsoft.com/srd/2009/01/28/stack-overflow-stack-exhaustion-not-the-same-as-stack-buffer-overflow/ So, a serious bug. If we can get crash dumps for whatever process is failing then we should be able to analyze them, as long as the symbols have not been nuked. Crash dumps from randomly crashing processes are not necessarily saved by default, but this can and should be enabled: https://msdn.microsoft.com/en-us/library/windows/desktop/bb787181(v=vs.85).aspx Attaching a debugger would also work, if we can reproduce the crash by running the problematic tool manually.
,
Mar 21 2017
#12 The executable that's failing is protoc.exe, and is run with the following args (from within out/Release) ..\..\tools\protoc_wrapper\protoc_wrapper.py event.proto events_chunk.proto --protoc .\protoc.exe --proto-in-dir ..\..\components\tracing\proto --plugin proto_zero_plugin.exe --plugin-out-dir gen\components\tracing\proto --plugin-options wrapper_namespace=pbzero
,
Mar 21 2017
Update: the problem is that proto_zero_plugin.exe is not getting regenerated because it is not a dependency of some things that use it: https://cs.chromium.org/chromium/src/components/tracing/proto/BUILD.gn?rcl=08f54ecc76964badb626b2426b6f01f46cac7717&l=20 https://cs.chromium.org/chromium/src/components/tracing/BUILD.gn?rcl=08f54ecc76964badb626b2426b6f01f46cac7717&l=82 An immediate workaround is "del out\Release\proto_zero_plugin.exe" and rebuild.
,
Mar 21 2017
Hm.. proto_zero_plugin IS added to deps here https://cs.chromium.org/chromium/src/third_party/protobuf/proto_library.gni?rcl=a4258cad75d8cc63170555cc74bd584b5bfca965&l=308 But it does not get built before the action() runs. dpranke@ any idea why? FYI if I build the proto_zero_plugin target, proto_zero_plugin.exe does get relinked.
,
Mar 22 2017
I tried to dig in to why the old executable would crash. It crashes at startup - before even getting to main - which seems excessively cruel. The crash occurs when initterm, which is marching through an array of function pointers, most of which are zero, finds that one of them is invalid. It decides that one of them (the third one?) is invalid so instead of calling it the initterm() function calls ntdll!RtlFailFast2 which executes int 29h which is then interpreted as buffer overrun in some contexts and other failure types in other contexts. Here's the failure call stack: 00 ucrtbase!initterm 01 proto_zero_plugin_exe!__scrt_common_main_seh 02 KERNEL32!BaseThreadInitThunk 03 ntdll!RtlUserThreadStart The latest ucrtbase.dll is missing symbols which complicates the analysis, but ultimately isn't critical. So, I think there are two mysteries: 1) Why is proto_zero_plugin.exe not getting regenerated in time? 2) Why is the old proto_zero_plugin.exe crashing immediately - it should be a valid binary that can run adequately, shouldn't it?
,
Mar 22 2017
Just an update: dpranke@ and I debugged this just now and we have reason to believe this is a bug in Microsoft's linker. If we disable incremental linking (patch below) for proto_zero_plugin.exe, the bug goes away :P
diff --git a/components/tracing/tools/proto_zero_plugin/BUILD.gn b/components/tracing/tools/proto_zero_plugin/BUILD.gn
index cdf0eb1..418f4e1 100644
--- a/components/tracing/tools/proto_zero_plugin/BUILD.gn
+++ b/components/tracing/tools/proto_zero_plugin/BUILD.gn
@@ -12,5 +12,9 @@ if (current_toolchain == host_toolchain) {
deps = [
"//third_party/protobuf:protoc_lib",
]
+ if (is_win) {
+ configs -= [ "//build/config/win:default_incremental_linking" ]
+ configs += [ "//build/config/win:no_incremental_linking" ]
+ }
}
}
,
Mar 22 2017
I saw something like that a while back (coalescing of functions by the linker into a table) that was caused by a bug in incremental linking: https://bugs.chromium.org/p/chromium/issues/detail?id=251251#c4 In that case it was TLS destructors in ".CRT$XLn", but it sort of smells the same.
,
Mar 22 2017
Hey, look at that. :( FWIW, https://bugs.chromium.org/p/chromium/issues/detail?id=251251#c18 ... despite https://connect.microsoft.com/VisualStudio/feedback/details/967992/x64-incremental-linking-does-not-coalesce-sections-properly which implied it should have been fixed in 2015. So maybe it'll be fixed in 2017?
,
Mar 22 2017
And the exact same bug appears with VS 2017.
I'll file a bug for this incremental linking bug in VS 2017.
I've already asked for ucrtbase.dll symbols to be supplied.
We should probably turn off incremental linking for this target? Or does it matter, if this is just a one-off?
VS 2017 build steps.
Use these environment variables:
set DEPOT_TOOLS_WIN_TOOLCHAIN=0
set GYP_MSVS_VERSION=2017
Use this args.gn:
enable_nacl = false
is_component_build = true
is_debug = false
use_goma = true
treat_warnings_as_errors = false
and the build steps from an earlier comment:
git checkout 77bec4dfd9e172a99543b9bfe9daf2580721bfdd~1
gclient sync
ninja -j300 -C out/Release webkit_unit_tests
[success] - you should probably archive proto_zero_plugin.exe at this point, and then set up for an incremental link of proto_zero_plugin.exe
git checkout 77bec4dfd9e172a99543b9bfe9daf2580721bfdd
gclient sync
ninja -j300 -C out/Release webkit_unit_tests
[failure]
Once you have reached this state then running proto_zero_plugin.exe is enough to see the crash. No arguments are required.
,
Mar 22 2017
#20, Yeah, it couldn't hurt. I CC'ed you on the CL
,
Mar 22 2017
As I said on the CL, on deeper thought I think there's probably no value in disabling incremental linking for this target. It is almost certainly the transition from dynamic to static linking that triggered the bug, and that is a one-off that is not expected to happen again for this target. Also, I filed this bug: https://developercommunity.visualstudio.com/content/problem/34236/incremental-linking-creates-corrupt-executable.html I'd say we should close this bug as ExternalDependency and move on. Good work on the investigation.
,
Mar 22 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b5b19afd8b383045e4d9060eaf1be60da229745b commit b5b19afd8b383045e4d9060eaf1be60da229745b Author: thomasanderson <thomasanderson@google.com> Date: Wed Mar 22 18:10:44 2017 Disable incremental linking for proto_zero_plugin This CL works around a linker bug that is seen when incrementally linking proto_zero_plugin.exe after crrev.com/2756543002. The linker bug also occurs in VS 2017 RTM so a new bug report to Microsoft was made: https://developercommunity.visualstudio.com/content/problem/34236/incremental-linking-creates-corrupt-executable.html BUG= 703251 R=primiano@chromium.org Review-Url: https://codereview.chromium.org/2763233002 Cr-Commit-Position: refs/heads/master@{#458802} [modify] https://crrev.com/b5b19afd8b383045e4d9060eaf1be60da229745b/components/tracing/tools/proto_zero_plugin/BUILD.gn
,
Mar 22 2017
#22 Hopefully we will be reverting the protobuf static linking change in the future, is there a possibility that a similar bug can surface with incremental linking enabled?
,
Mar 22 2017
Also, I ticked the commit box before I saw your comment, so feel free to revert :)
,
Mar 22 2017
The change to disable incremental linking is harmless so I'm inclined to leave it. I checked to see whether the same incremental linking bug could happen when reverting the protobuf static linking by doing a clean link at 77bec4dfd9e172a99543b9bfe9daf2580721bfdd and then an incremental link at 77bec4dfd9e172a99543b9bfe9daf2580721bfdd~1. The generated binary seemed fine. Also, the repro steps are *much* faster if the target is proto_zero_plugin.exe. The bug is then detected by running this binary to see whether it hangs (success!) or crashes (failure!). I put this in the Microsoft bug.
,
Mar 22 2017
,
Mar 30 2017
The latest update on the Microsoft bug is "It is a problem in incremental linking related to Control Flow Guard (/guard:cf) processing. We will ship a fix in VS 2017 Update 1. Thanks for the bug report."
,
May 28 2017
Issue 707806 has been merged into this issue.
,
May 28 2017
This error seems to happen in the waterfall too, resulting in tree closure. Issue 644525 might also be related - it had a dependency fix. I'm guessing this instance is ~random incremental linking pdb corruption :/. Maybe a dependency issue also. Anyway, just leaving a note here to track the flakes. (tree has struggled to stay open over the last 24h..) https://build.chromium.org/p/chromium/buildstatus?builder=Win&number=55546 https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium%2FWin%2F55546%2F%2B%2Frecipes%2Fsteps%2Fcompile%2F0%2Fstdout [13040/58876] ACTION //chrome/browser/safe_browsing/incident_reporting:state_store_data_proto_gen(//build/toolchain/win:x86) FAILED: gen/chrome/browser/safe_browsing/incident_reporting/state_store_data.pb.h gen/chrome/browser/safe_browsing/incident_reporting/state_store_data.pb.cc pyproto/chrome/browser/safe_browsing/incident_reporting/state_store_data_pb2.py C:/b/depot_tools/python276_bin/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py state_store_data.proto --protoc ./protoc.exe --proto-in-dir ../../chrome/browser/safe_browsing/incident_reporting --cc-out-dir gen/chrome/browser/safe_browsing/incident_reporting --py-out-dir pyproto/chrome/browser/safe_browsing/incident_reporting Protoc has returned non-zero status: -1073741819 . [13041/58876] STAMP obj/chrome/app/resources/locale_settings_grit.stamp [13042/58876] ACTION //components/autofill/core/browser/proto:proto_gen(//build/toolchain/win:x86) FAILED: gen/components/autofill/core/browser/proto/autofill_sync.pb.h gen/components/autofill/core/browser/proto/autofill_sync.pb.cc pyproto/components/autofill/core/browser/proto/autofill_sync_pb2.py gen/components/autofill/core/browser/proto/server.pb.h gen/components/autofill/core/browser/proto/server.pb.cc pyproto/components/autofill/core/browser/proto/server_pb2.py C:/b/depot_tools/python276_bin/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py autofill_sync.proto server.proto --protoc ./protoc.exe --proto-in-dir ../../components/autofill/core/browser/proto --cc-out-dir gen/components/autofill/core/browser/proto --py-out-dir pyproto/components/autofill/core/browser/proto Protoc has returned non-zero status: -1073741819 . [13043/58876] ACTION //chrome/app:google_chrome_strings_grit(//build/toolchain/win:x86) [13044/58876] STAMP obj/content/resources_grit.stamp [13045/58876] ACTION //chrome/test:test_proto_gen(//build/toolchain/win:x86) FAILED: gen/chrome/common/safe_browsing/ipc_protobuf_message_test.pb.h gen/chrome/common/safe_browsing/ipc_protobuf_message_test.pb.cc pyproto/chrome/common/safe_browsing/ipc_protobuf_message_test_pb2.py C:/b/depot_tools/python276_bin/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py ipc_protobuf_message_test.proto --protoc ./protoc.exe --proto-in-dir ../../chrome/common/safe_browsing --cc-out-dir gen/chrome/common/safe_browsing --py-out-dir pyproto/chrome/common/safe_browsing Protoc has returned non-zero status: -1073741819 .
,
May 29 2017
tapted@ I'm going to de-dupe bug 707806 . This issue was for linking of proto_zero_plugin, and has been marked fixed. Although bug 707806 has the same symptom, the fix in c#23 will not fix that issue. I think the fix would be to add the following to https://cs.chromium.org/chromium/src/third_party/protobuf/BUILD.gn?rcl=8cb8d28de8f2c439a50307ddc142c533c5b79e2c&l=583 + if (is_win) { + configs -= [ "//build/config/win:default_incremental_linking" ] + configs += [ "//build/config/win:no_incremental_linking" ] + } |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by pdr@chromium.org
, Mar 20 2017