
Issue 782128

Starred by 4 users

Issue metadata

Status: Duplicate
Merged: issue 644525
Owner: ----
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 779660




Windows builders fail with foo.exe has returned non-zero status: -1073741819

Project Member Reported by meade@chromium.org, Nov 7 2017

Issue description

I noticed this both yesterday and today during my sheriff shift, at 17:06 and 17:37. (There was previously a bug that caused failures at specific times during APAC shifts, which is why I mention the time, although it may be a coincidence in this case.)


The build fails with various similar messages about processes returning large negative statuses. For example:

FAILED: gen/ipc/test_proto.pb.h gen/ipc/test_proto.pb.cc pyproto/ipc/test_proto_pb2.py 
C:/b/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py test_proto.proto --protoc ./protoc.exe --proto-in-dir ../../ipc --cc-out-dir gen/ipc --py-out-dir pyproto/ipc
Protoc has returned non-zero status: -1073740791 .

FAILED: gen/content/test/fuzzer/html_tree.pb.h gen/content/test/fuzzer/html_tree.pb.cc pyproto/content/test/fuzzer/html_tree_pb2.py 
C:/b/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../tools/protoc_wrapper/protoc_wrapper.py html_tree.proto --protoc ./protoc.exe --proto-in-dir ../../content/test/fuzzer --cc-out-dir gen/content/test/fuzzer --py-out-dir pyproto/content/test/fuzzer
Protoc has returned non-zero status: -1073741819 .

https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium%2FWin_x64%2F16196%2F%2B%2Frecipes%2Fsteps%2Fcompile%2F0%2Fstdout



FAILED: gen/blink/core/inspector/protocol.json.bro 
C:/b/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../build/gn_run_binary.py brotli.exe --force --no-copy-stat gen/blink/core/inspector/protocol.json -o gen/blink/core/inspector/protocol.json.bro
brotli.exe failed with exit code -1073741819

https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.chrome%2FGoogle_Chrome_Win%2F23558%2F%2B%2Frecipes%2Fsteps%2Fcompile%2F0%2Fstdout


 

Comment 1 by no...@chromium.org, Nov 7 2017

Cc: primiano@chromium.org
Labels: Type-Bug
primiano@ had the same issue last year: https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/KAvDcRzD9xg
What was the cause / resolution?
Cc: dpranke@chromium.org brucedaw...@chromium.org
We never found the cause; we moved some build files and the problem eventually went away.
But IIRC +brucedawson or +dpranke (can't remember who) recently had an internal post where they had some theories about this. (Sorry, my memory is very weak about this; I hope I am not completely dreaming.)
Not all negative status results are the same. A large negative status basically just means an exception, and you need to convert it to hex to decode it:

-1073741819: 0xC0000005 - STATUS_ACCESS_VIOLATION
-1073740791: 0xC0000409 - STATUS_STACK_BUFFER_OVERRUN
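
As a minimal sketch of the conversion described above: the exit status is a signed 32-bit value, so masking it with 0xFFFFFFFF gives the unsigned NTSTATUS code. The function name and the lookup table here are illustrative only; the table holds just the two codes listed in this comment.

```python
# Illustrative helper (not part of the Chromium build scripts): turn the
# signed exit status printed in the build log into its hex NTSTATUS form.

# Only the two codes mentioned in this thread; extend as needed.
KNOWN_STATUSES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_status(status: int) -> str:
    """Format a signed exit status as 'decimal: hex - name'."""
    # Reinterpret the signed 32-bit value as unsigned.
    code = status & 0xFFFFFFFF
    name = KNOWN_STATUSES.get(code, "unknown")
    return f"{status}: 0x{code:08X} - {name}"


if __name__ == "__main__":
    for status in (-1073741819, -1073740791):
        print(describe_exit_status(status))
```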

I've seen access violations happen randomly on my machine because bogus binaries were generated, filled with zeroes, which seems really weird and inexplicable. I haven't seen that for a long time.

I'm not sure what triggers STATUS_STACK_BUFFER_OVERRUN. We'd really need to get copies of the bad binaries to investigate what's going wrong, or call stacks.

Thanks for looking into this!

Could we find an owner for this bug so it doesn't fall through the cracks?
Currently it stays on sheriffs' radar since it doesn't have an owner yet.

The fact that it happens now, after we have switched compilers (from VC++ to clang-cl), is a good data point because it proves that it isn't a compiler problem. The most likely explanation is that it is a linker problem. In the long term we will be moving away from the VC++ linker, but that won't be happening soon.

Unless we can repro the problem I'm not sure what we can do other than close as norepro and move on.

This has just closed the tree again:

https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.chrome%2FGoogle_Chrome_Win%2F23700%2F%2B%2Frecipes%2Fsteps%2Fcompile%2F0%2Fstdout

------------------- 8< -------------------
[4817/44007] ACTION //content/browser/devtools:compressed_protocol_json(//build/toolchain/win:win_clang_x86)
FAILED: gen/blink/core/inspector/protocol.json.bro 
C:/b/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../build/gn_run_binary.py brotli.exe --force --no-copy-stat gen/blink/core/inspector/protocol.json -o gen/blink/core/inspector/protocol.json.bro
brotli.exe failed with exit code -1073741819
[4818/44007] ACTION //components/resources:compressed_about_credits(//build/toolchain/win:win_clang_x86)
FAILED: gen/components/resources/about_credits.bro 
C:/b/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../build/gn_run_binary.py brotli.exe --force --no-copy-stat gen/components/resources/about_credits.html -o gen/components/resources/about_credits.bro
brotli.exe failed with exit code -1073741819
------------------- 8< -------------------

Labels: Pri-1
Until someone fixes the bug, sheriffs are just going to keep filing similar bugs over and over again, since we don't do sheriff hand-offs. I filed bug 779660 before.
Blockedon: 779660
Labels: -Sheriff-Chromium
There's nothing actionable for Chromium sheriffs here. I don't know what to do about the fact that removing the Sheriff-Chromium label means that sheriffs will see this as a new problem if/when it shows up again (as Lei pointed out in c#8), but keeping the Sheriff-Chromium label just has it making noise on sheriff-o-matic.
I hit this on one of my workstations and was able to investigate. In this case it was genstring.exe that was crashing. When I ran it, it crashed in mainCRTStartup, and the disassembly looked like this:

000000014000109B 00 00                add         byte ptr [rax],al  
000000014000109D 00 00                add         byte ptr [rax],al  
000000014000109F 00 00                add         byte ptr [rax],al  
00000001400010A1 00 00                add         byte ptr [rax],al  
00000001400010A3 00 00                add         byte ptr [rax],al  
mainCRTStartup:
00000001400010A5 00 00                add         byte ptr [rax],al  
00000001400010A7 00 00                add         byte ptr [rax],al  
00000001400010A9 00 00                add         byte ptr [rax],al  
00000001400010AB 00 00                add         byte ptr [rax],al  
00000001400010AD 00 00                add         byte ptr [rax],al  
_get_startup_commit_mode:
00000001400010AF 00 00                add         byte ptr [rax],al  
00000001400010B1 00 00                add         byte ptr [rax],al  
00000001400010B3 00 00                add         byte ptr [rax],al  
00000001400010B5 00 00                add         byte ptr [rax],al  
00000001400010B7 00 00                add         byte ptr [rax],al  

I then forced a relink (no recompilation) and on the next run it worked and the code for mainCRTStartup looked like this:

__GSHandlerCheckCommon:
00000001400010A0 E9 1B 3F 00 00       jmp         __GSHandlerCheckCommon (0140004FC0h)  
mainCRTStartup:
00000001400010A5 E9 B6 22 00 00       jmp         mainCRTStartup (0140003360h)  
__scrt_get_dyn_tls_dtor_callback:
00000001400010AA E9 21 34 00 00       jmp         __scrt_get_dyn_tls_dtor_callback (01400044D0h)  
_get_startup_commit_mode:
00000001400010AF E9 BC 32 00 00       jmp         _get_startup_commit_mode (0140004370h)  

What's going on is that this is an array of five-byte thunks, used in incremental linking to let the linker move functions around easily. In the bad builds the thunks are all zeroes, which tends to be crashy.
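
To make the thunk layout concrete, here is a small sketch that decodes one five-byte thunk from the listings above. It assumes each good thunk is an x86-64 `jmp rel32` (opcode E9 followed by a little-endian signed 32-bit displacement, relative to the address of the next instruction), which matches the bytes shown. The function names are hypothetical, and this only inspects bytes copied from a disassembly; it is not a tool used in the investigation.

```python
import struct
from typing import Optional

THUNK_SIZE = 5  # one opcode byte (E9) plus a 4-byte displacement


def decode_jmp_thunk(address: int, code: bytes) -> Optional[int]:
    """Return the jump target of an E9 rel32 thunk, or None if it is not one."""
    if len(code) != THUNK_SIZE or code[0] != 0xE9:
        return None
    # The displacement is a little-endian signed 32-bit integer.
    (rel32,) = struct.unpack("<i", code[1:])
    # The target is relative to the address of the following instruction.
    return address + THUNK_SIZE + rel32


def is_zeroed(code: bytes) -> bool:
    """True if every byte is zero, as in the bad build's thunk table."""
    return not any(code)


if __name__ == "__main__":
    good = bytes.fromhex("E9B6220000")  # mainCRTStartup thunk after the relink
    bad = bytes(THUNK_SIZE)             # same slot in the crashing binary
    print(hex(decode_jmp_thunk(0x1400010A5, good)))
    print(is_zeroed(bad), decode_jmp_thunk(0x1400010A5, bad))
```

With the bytes from the relinked binary, the decoded target for the mainCRTStartup thunk agrees with the 0140003360h shown in the disassembly, while the zeroed slot is not a jump at all.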

So...

1) It's not a compiler bug. The object files are fine because relinking fixes the issue. But we already knew this because the bug happened with both VC++ and clang.
2) It is an incremental linking linker bug.

Cc: jbudorick@chromium.org
Issue 779660 has been merged into this issue.
Mergedinto: 644525
Status: Duplicate (was: Untriaged)
