
Issue 704286


Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocked on:
issue 683729




Win64 Debug compile is flaky: link.exe crashes with access violation

Project Member Reported by dgozman@chromium.org, Mar 22 2017

Issue description

LINK conversational_speech_generator_unittest.exe fails sometimes
Example builds:
https://build.chromium.org/p/client.webrtc/builders/Win64%20Debug/builds/11872
https://build.chromium.org/p/client.webrtc/builders/Win64%20Debug/builds/11870

Note: build succeeded in between https://build.chromium.org/p/client.webrtc/builders/Win64%20Debug/builds/11871



[1442/1471] LINK conversational_speech_generator_unittest.exe conversational_speech_generator_unittest.exe.pdb
FAILED: conversational_speech_generator_unittest.exe conversational_speech_generator_unittest.exe.pdb 
E:/b/depot_tools/python276_bin/python.exe ../../build/toolchain/win/tool_wrapper.py link-wrapper environment.x64 False link.exe /nologo /OUT:./conversational_speech_generator_unittest.exe /PDB:./conversational_speech_generator_unittest.exe.pdb @./conversational_speech_generator_unittest.exe.rsp


  Version 14.00.24213.1
  ExceptionCode            = C0000005
  ExceptionFlags           = 00000000
  ExceptionAddress         = 000000013F5D4227 (000000013F530000) "E:\b\depot_tools\win_toolchain\vs_files\d3cb0e37bdd120ad0ac4650b674b09e81be45616\win_sdk\bin\..\..\VC\bin\amd64\link.exe"
  NumberParameters         = 00000002
  ExceptionInformation[ 0] = 0000000000000000
  ExceptionInformation[ 1] = 000000013F82E818


CONTEXT:
  Rax    = 000000013F82E7C8  R8     = 0000000201155CB0
  Rbx    = 0000000201155B00  R9     = 000000000027EFA0
  Rcx    = 0000000201155CB0  R10    = 0000000000000000
  Rdx    = 00000000000DD980  R11    = 0000000000000003
  Rsp    = 000000000027EFA0  R12    = 0000000000038DE4
  Rbp    = 000000000027F099  E13    = 0000000000000000
  Rsi    = 000000000027F160  R14    = 000000000027F15C
  Rdi    = 0000000200400000  R15    = 000000013F5FD998
  Rip    = 000000013F5D4227  EFlags = 0000000000010246
  SegCs  = 0000000000000033  SegDs  = 000000000000002B
  SegSs  = 000000000000002B  SegEs  = 000000000000002B
  SegFs  = 0000000000000053  SegGs  = 000000000000002B
  Dr0    = 0000000000000000  Dr3    = 0000000000000000
  Dr1    = 0000000000000000  Dr6    = 0000000000000000
  Dr2    = 0000000000000000  Dr7    = 0000000000000000

 
Labels: -Sheriff-Chromium Infra-Troopers
Cc: scottmg@chromium.org brucedaw...@chromium.org
Summary: Win64 Debug compile is flaky: link.exe crashes with access violation (was: Win64 Debug compile is flaky)
+ Win toolchain people, maybe they know a workaround for a crashing linker.
Are there any crash dumps being saved? We tried to configure the build system so that they would be recorded but I don't know if it works. They should be in %localappdata%\crashdumps

If that doesn't work then the next hope is that we can reproduce the crash locally and get a linker repro or catch the crash in a debugger.

Failing *that*, the next obvious plan would be to disable incremental linking for that target, and hope that that avoids the issue.

No crashdumps on vm169-m3 or vm88-m3 (I've looked at C:\Users\chrome-bot\AppData\Local\...).
Unfortunately I am not surprised by the lack of crash dumps. The VC++ team loves to swallow exceptions and summarize them in text, even though crash dumps have been the preferred solution for years.

I loaded up link.exe into the debugger and did manual offset calculations (it's like the 90s all over again!) and found that this is the crashing instruction:

00007FF61B284227 FF 50 50             call        qword ptr [rax+50h]  

That fits the other numbers (ExceptionInformation[ 1] = 000000013F82E818 and that is equal to rax+50h). But, it doesn't get us any closer, except to know that the crash is in FRefByModFilesOnly.
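The arithmetic behind that check can be reproduced in a few lines of Python. This is only a sketch using the register values from the first crash report in this issue; the variable names are illustrative. For an access violation (C0000005), ExceptionInformation[0] = 0 indicates a read, and ExceptionInformation[1] is the address that could not be read.

```python
# Values copied from the first crash report in this issue.
rax = 0x000000013F82E7C8
fault_address = 0x000000013F82E818      # ExceptionInformation[1]
exception_address = 0x000000013F5D4227  # ExceptionAddress / Rip
image_base = 0x000000013F530000         # load address of link.exe

# "call qword ptr [rax+50h]" reads a function pointer stored at rax + 0x50.
target_slot = rax + 0x50
print(hex(target_slot), target_slot == fault_address)

# Offset of the crashing instruction within link.exe (address minus load base).
# The same offset can be added to whatever base a local debugger loads link.exe at.
offset_in_image = exception_address - image_base
print(hex(offset_in_image))
```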

Turn off incremental linking? I'll pass this along to Microsoft, if only to nudge them to save actual crash dumps.

Is further trooper action expected here? If we think this is a win toolchain issue, I'm guessing no?
Status: ExternalDependency (was: Untriaged)
No trooper action needed. I'm going to close this as ExternalDependency. The frequency is low enough that I think taking no action is fine.
Labels: -Infra-Troopers
Sounds good. Thanks!
Components: -Infra
Owner: brucedaw...@chromium.org
I got confirmation from Microsoft that this bug is fixed in VS 2017 RTM
Project Member

Comment 11 by bugdroid1@chromium.org, Apr 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/depot_tools/+/93e1a76ec4ba7e68fde71577a950f04907787eb4

commit 93e1a76ec4ba7e68fde71577a950f04907787eb4
Author: Bruce Dawson <brucedawson@chromium.org>
Date: Mon Apr 10 20:26:38 2017

Actually enable crash dump collection on builders

Change crrev.com/1825163003 attempted to enable crash dump collection
on build machines, and it worked fine on local testing. However it only
worked because local testing was done using 64-bit Python. The builders
use python from depot_tools which is 32-bit Python so the changes all
went to "HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft" instead of
to "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft", which means they were
ignored. For a year. This made investigation of linker crashes more
complicated than needed.

This change uses the necessary winreg.KEY_WOW64_64KEY dance so that the
64-bit registry is always used, whether running 32-bit Python or 64-bit
Python. Both versions were tested locally. The behavior on 32-bit
Windows is unknown but we don't support building on 32-bit Windows
anyway, and any failures would be rendered harmless by the try/except
block.

R=scottmg@chromium.org
BUG= 704286 

Change-Id: I6abc0e1e9c69b9a4e4b9c705bea9e4faadd0945c
Reviewed-on: https://chromium-review.googlesource.com/473567
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
Commit-Queue: Bruce Dawson <brucedawson@chromium.org>

[modify] https://crrev.com/93e1a76ec4ba7e68fde71577a950f04907787eb4/win_toolchain/get_toolchain_if_necessary.py
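
For readers unfamiliar with the registry redirection described in the commit message, here is a minimal sketch of the access pattern. It is not the actual code from get_toolchain_if_necessary.py: the function name is hypothetical, and creating a LocalDumps subkey under the Windows Error Reporting key is assumed here as the way dump collection gets enabled. The only point is that passing KEY_WOW64_64KEY makes a 32-bit Python process write to the 64-bit registry view instead of Wow6432Node.

```python
try:
    import winreg  # Windows-only module (named _winreg in Python 2).
except ImportError:
    winreg = None

# Assumed key; the 64-bit view lives directly under SOFTWARE\Microsoft.
WER_KEY = r"SOFTWARE\Microsoft\Windows\Windows Error Reporting"


def enable_crash_dump_collection():
    """Hypothetical helper. Returns True if the key was created or opened."""
    if winreg is None:
        return False  # Not running on Windows.
    try:
        # KEY_WOW64_64KEY forces the 64-bit registry view, so 32-bit and
        # 64-bit Python end up writing to the same place.
        access = winreg.KEY_WOW64_64KEY | winreg.KEY_ALL_ACCESS
        with winreg.CreateKeyEx(
                winreg.HKEY_LOCAL_MACHINE, WER_KEY, 0, access) as key:
            # The subkey is created relative to the 64-bit handle.
            winreg.CloseKey(winreg.CreateKey(key, "LocalDumps"))
        return True
    except OSError:
        # Missing privileges or an unsupported platform: fail harmlessly.
        return False
```

Writing under HKEY_LOCAL_MACHINE normally requires administrator rights, which is why failures are swallowed rather than raised.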

Cc: veranika@chromium.org
I started seeing many such failures on other internal bots.

  Version 14.00.24213.1

  ExceptionCode            = C0000005
  ExceptionFlags           = 00000000
  ExceptionAddress         = 00007FF78FFB4227 (00007FF78FF10000) "E:\b\depot_tools\win_toolchain\vs_files\d3cb0e37bdd120ad0ac4650b674b09e81be45616\win_sdk\bin\..\..\VC\bin\amd64_x86\link.exe"
  NumberParameters         = 00000002
  ExceptionInformation[ 0] = 0000000000000000
  ExceptionInformation[ 1] = 00007FF775B8E818

CONTEXT:
  Rax    = 00007FF775B8E7C8  R8     = 0000000203A6B560
  Rbx    = 0000000203A6B3C0  R9     = 00000064900FE940
  Rcx    = 0000000203A6B560  R10    = 0000000000000000
  Rdx    = 0000000000393A70  R11    = 000000000000DB40
  Rsp    = 00000064900FE940  R12    = 0000000000000000
  Rbp    = 00000064900FEA39  E13    = 0000000000000000
  Rsi    = 00000064900FEB00  R14    = 00000064900FEAFC
  Rdi    = 0000000200400000  R15    = 00007FF78FFDD998
  Rip    = 00007FF78FFB4227  EFlags = 0000000000010246
  SegCs  = 0000000000000033  SegDs  = 000000000000002B
  SegSs  = 000000000000002B  SegEs  = 000000000000002B
  SegFs  = 0000000000000053  SegGs  = 000000000000002B
  Dr0    = 0000000000000000  Dr3    = 0000000000000000
  Dr1    = 0000000000000000  Dr6    = 0000000000000000
  Dr2    = 0000000000000000  Dr7    = 0000000000000000
Blockedon: 683729
I have been told that this bug (it's the same bug - see the last four digits of the ExceptionAddress/Rip) is fixed in VS 2017 RTM. So I'm marking this as blocked on 683729 which is the tracking bug for moving to VS 2017.

The last four digits of ExceptionInformation[ 1] are also the same although I'm not sure if that is meaningful. The last four digits of Rip tend to be consistent because relocated PE files are always loaded on 64-KB boundaries, leaving the last four hex digits untouched.
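
The reasoning about the last four digits can be checked with a short Python sketch that uses the ExceptionAddress and load-address pairs from the two 14.00.24213.1 crash reports in this issue (the variable names are illustrative):

```python
# (ExceptionAddress, load address of link.exe) from the two reports above.
crashes = [
    (0x000000013F5D4227, 0x000000013F530000),
    (0x00007FF78FFB4227, 0x00007FF78FF10000),
]

offsets = []
for exception_address, image_base in crashes:
    # A 64-KB aligned base has zeros in its low 16 bits (four hex digits),
    # so relocation cannot change the low four hex digits of an address.
    assert image_base & 0xFFFF == 0
    low_digits = exception_address & 0xFFFF
    offset_in_image = exception_address - image_base
    offsets.append(offset_in_image)
    print(hex(low_digits), hex(offset_in_image))

# Both crashes land on the same instruction inside link.exe.
print(offsets[0] == offsets[1])
```

Comparing the full offset within the image is a slightly stronger check than comparing the last four digits alone, since it also covers the digits above the low 16 bits.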
We have now switched to VS 2017 (the tracking bug is still open for final cleanup, but the switch happened 13 days ago), so this bug can be considered fixed.

Please reopen with fresh details if these crashes are seen again.
Status: Fixed (was: ExternalDependency)
Actually marking as fixed...
I'm seeing some link.exe crashes occasionally (just started happening ~a week ago, usually after a fresh sync/rebase).

Here is the output from the build:
C:\src\chromium\src>ninja -C out\debug chrome -j500 -l32
ninja: Entering directory `out\debug'
[1/1] Regenerating ninja files
[14645/30406] LINK(DLL) sandbox.dll sandbox.dll.lib sandbox.dll.pdb
FAILED: sandbox.dll sandbox.dll.lib sandbox.dll.pdb
c:/src/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../build/toolchain/win/tool_wrapper.py link-wrapper environment.x64 False link.exe /nologo /IMPLIB:./sandbox.dll.lib /DLL /OUT:./sandbox.dll /PDB:./sandbox.dll.pdb @./sandbox.dll.rsp

  Version 14.11.25547.0

  ExceptionCode            = C0000005
  ExceptionFlags           = 00000000
  ExceptionAddress         = 00007FF61CD000F3 (00007FF61CCD0000) "c:\src\chromium\src\third_party\depot_tools\win_toolchain\vs_files\88c3b62e1eb0893b8cd57e3f4859c3af27907f64\win_sdk\bin\..\..\vc\tools\msvc\14.11.25503\bin\hostx64\x64\link.exe"
  NumberParameters         = 00000002
  ExceptionInformation[ 0] = 0000000000000000
  ExceptionInformation[ 1] = 00000003AF60D2D8

CONTEXT:
  Rax    = 000001EB25BCA1E0  R8     = FFFFFE1889A430F8
  Rbx    = 0000000000000000  R9     = 000001EB25C3FDC0
  Rcx    = 00000003AF60D2D8  R10    = 000001EB25C3FDC0
  Rdx    = 0000000000000063  R11    = 00000003AF60D2D8
  Rsp    = 000000152198DC48  R12    = 0000000000003A64
  Rbp    = 00000003AF60D2D8  E13    = 00000003AF60D2D8
  Rsi    = 0000000000008000  R14    = 0000000000000000
  Rdi    = 0000000000000000  R15    = 0000000000000000
  Rip    = 00007FF61CD000F3  EFlags = 0000000000010283
  SegCs  = 0000000000000033  SegDs  = 000000000000002B
  SegSs  = 000000000000002B  SegEs  = 000000000000002B
  SegFs  = 0000000000000053  SegGs  = 000000000000002B
  Dr0    = 0000000000000000  Dr3    = 0000000000000000
  Dr1    = 0000000000000000  Dr6    = 0000000000000000
  Dr2    = 0000000000000000  Dr7    = 0000000000000000
[15144/30406] CXX obj/components/sync/sync/uss_migrator.obj
ninja: build stopped: subcommand failed.



I have a link.exe crash dump as well.  My gn args:
is_debug = true
is_component_build = true
use_goma = true
is_clang = true
Status: Assigned (was: Fixed)
Reactivating, but if this is a separate issue I can file a new bug and we can close this one again.
Please share any link crash dumps that you have that use the vs_files\88c3b... toolchain (the latest version, VC++ 15.4). I can submit bug reports using those crash dumps. They may be too large to share on this bug in which case please contact me directly.
Status: Fixed (was: Assigned)
I filed a VS bug. They initially said that they needed to be able to reproduce the bug. They then said that they have fixed the bug! I guess there was enough information in the crash dump after all.

https://developercommunity.visualstudio.com/content/problem/137750/154-linker-crash.html

I would guess that the VS 15.5 toolchain will have the fix. We are currently on the VS 15.3 toolchain after a failed attempt at using the VS 15.4 toolchain.
