angle_deqp_gles2_d3d11_tests failing on chromium.gpu.fyi/Win7 FYI dEQP Release (AMD) |
||||||||||||
Issue descriptionFiled by sheriff-o-matic@appspot.gserviceaccount.com on behalf of wolenetz@chromium.org angle_deqp_gles2_d3d11_tests failing on chromium.gpu.fyi/Win7 FYI dEQP Release (AMD) Builders failed on: - Win7 FYI dEQP Release (AMD): https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20FYI%20dEQP%20Release%20%28AMD%29
,
Jan 4
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b8c851991162df9699b10ae29de9391adfc7eca4 commit b8c851991162df9699b10ae29de9391adfc7eca4 Author: Matthew Wolenetz <wolenetz@chromium.org> Date: Fri Jan 04 19:35:46 2019 Revert "Reland "ARM64 capable toolchain hash"" This reverts commit 2a16f2ba48fc76eeb4a2812cc8f2d774a5c1f8c3. Reason for revert: Strongly suspected as culprit for bug 919163 : angle_deqp_gles2_d3d11_tests failing on chromium.gpu.fyi/Win7 FYI dEQP Release (AMD) Original change's description: > Reland "ARM64 capable toolchain hash" > > This is a reland of dd8c7b26d4da856b36edba4b2d8dff5a786d3eb4 > > Original change's description: > > ARM64 capable toolchain hash > > > > This updates Chrome to build with a toolchain based on VS 2017 15.9.3 > > and the Windows 10.0.17763.132 SDK, with the ARM64 components included to > > support building for ARM64 for Win32. This adds about 400 MB to the package > > size, a bit less than a 50% increase. > > > > Packaging was done on a Windows Server 2016 VM, cleanly created for this > > purpose. > > > > The package was created by downloading VS 2017 Update 9.3, from > > https://www.visualstudio.com/vs/, and then running the installer like this: > > > > $ PATH_TO_INSTALLER.EXE ^ > > --add Microsoft.VisualStudio.Workload.NativeDesktop ^ > > --add Microsoft.VisualStudio.Component.VC.ATLMFC ^ > > --add Microsoft.VisualStudio.Component.VC.Tools.ARM64 ^ > > --add Microsoft.VisualStudio.Component.VC.MFC.ARM64 ^ > > --includeRecommended --passive > > > > Then Add or Remove Programs was used to modify the SDK to add the > > Debuggers package. > > > > Then the packaging script was run like this: > > > > python depot_tools\win_toolchain\package_from_installed.py 2017 -w 10.0.17763.0 > > > > This was tested on two VMs to ensure that the results are consistent. > > > > This should not significantly affect the x86 and x64 builds, although > > there may have been some STL and CRT changes that could cause changes. > > > > Bug: 893460 > > Change-Id: I33dd5b915ffaf17f306d804f561d26bf82060b03 > > Reviewed-on: https://chromium-review.googlesource.com/c/1342814 > > Reviewed-by: Dirk Pranke <dpranke@chromium.org> > > Reviewed-by: Nico Weber <thakis@chromium.org> > > Commit-Queue: Bruce Dawson <brucedawson@chromium.org> > > Cr-Commit-Position: refs/heads/master@{#616123} > > Bug: 893460, 915046 > Change-Id: I1f4f8fb7f37785ba935ef698964fa87f9a91e313 > Reviewed-on: https://chromium-review.googlesource.com/c/1377510 > Reviewed-by: Dirk Pranke <dpranke@chromium.org> > Reviewed-by: Nico Weber <thakis@chromium.org> > Reviewed-by: Michael Achenbach <machenbach@chromium.org> > Commit-Queue: Bruce Dawson <brucedawson@chromium.org> > Cr-Commit-Position: refs/heads/master@{#620001} TBR=thakis@chromium.org,dpranke@chromium.org,machenbach@chromium.org,brucedawson@chromium.org,scottmg@chromium.org Change-Id: Ic032ab4307045ada9b2f6e0955bc4ad7dc8e84f8 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 893460, 915046 , 919163 Reviewed-on: https://chromium-review.googlesource.com/c/1396560 Reviewed-by: Matthew Wolenetz <wolenetz@chromium.org> Commit-Queue: Matthew Wolenetz <wolenetz@chromium.org> Cr-Commit-Position: refs/heads/master@{#620030} [modify] https://crrev.com/b8c851991162df9699b10ae29de9391adfc7eca4/build/vs_toolchain.py
,
Jan 4
,
Jan 4
Awaiting the bot passing after #2, before closing this issue.
,
Jan 4
I'll take this. It might be related to crbug.com/915114 , and it's a blocker for crbug.com/915046
,
Jan 4
Thank you, brucedawson@.
,
Jan 4
Bruce: when I ran your CL through the GPU testing in ANGLE I think it only tested the compile step properly. The swarming tests were all using the ToT build/buildtools values in Chromium. What you can do in the future is add win_angle_rel_ng and win_angle_deqp_rel_ng via "choose trybots" on your WIP CL before landing. This should run all the swarming tests including the ones on Windows 7. This probably is what we should have done in the first place. Thanks for catching & reverting.
,
Jan 4
FYI #2 let the bot go green again: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20FYI%20dEQP%20Release%20%28AMD%29/32712
,
Jan 4
I'll add that to my list of verification steps for new toolchains
,
Jan 4
Typical failure output looks like this:
[ RUN ] dEQP.GLES2/functional_shaders_matrix_sub_uniform_highp_mat4_float_fragment
dEQP-GLES2.functional.shaders.matrix.sub.uniform_highp_mat4_float_fragment
../../third_party/angle/src/tests/deqp_support/angle_deqp_gtest.cpp(35): error: Failed
ensureInitialized(138): D3D compiler module not found.
Stack trace:
Backtrace:
StackTraceGetter::CurrentStackTrace [0x003CE138+40]
testing::internal::UnitTestImpl::CurrentOsStackTraceExceptTop [0x003D4B41+69]
testing::internal::AssertHelper::operator= [0x003D47AC+48]
std::basic_ostream<char,std::char_traits<char> >::operator<< [0x003C9B6B+501]
(No symbol) [0x6F083891]
(No symbol) [0x6F037A11]
(No symbol) [0x6F0FAB73]
(No symbol) [0x6F12AB47]
(No symbol) [0x6F12B6DB]
(No symbol) [0x6F04B30E]
(No symbol) [0x6EFDDA5C]
glu::Program::link [0x705C7B34+100]
glu::ShaderProgram::init [0x705C8165+923]
glu::ShaderProgram::ShaderProgram [0x705C7D79+121]
deqp::gls::ShaderRenderCase::init [0x70651791+251]
deqp::gles2::Functional::ShaderMatrixCase::init [0x70456885+5973]
deqp::gles2::TestCaseWrapper::init [0x7055B427+11]
tcu::RandomOrderExecutor::executeInner [0x706680E5+115]
tcu::RandomOrderExecutor::execute [0x7066801F+133]
deqp_libtester_run [0x7055C769+193]
../../third_party/angle/src/tests/deqp_support/angle_deqp_gtest.cpp(367): error: Value of: testPassed
Actual: false
Expected: true
Stack trace:
Backtrace:
StackTraceGetter::CurrentStackTrace [0x003CE138+40]
testing::internal::UnitTestImpl::CurrentOsStackTraceExceptTop [0x003D4B41+69]
testing::internal::AssertHelper::operator= [0x003D47AC+48]
[ FAILED ] dEQP.GLES2/functional_shaders_matrix_sub_uniform_highp_mat4_float_fragment, where GetParam() = 6208 (47 ms)
Presumably the only actually interesting part is this:
ensureInitialized(138): D3D compiler module not found.
Initial investigations shows no change in the d3dcompiler_47.dll file, so ???
,
Jan 4
It means LoadLibrary is failing for some reason. See HLSLCompiler.cpp in ANGLE. We can add a GetLastError call to log more info. But that might not be sufficient to diagnose the issue.
,
Jan 4
That's what I figured. There is a call right before to LoadLibrary(D3DCOMPILER_DLL) but the value of D3DCOMPILER_DLL is unchanged with the new toolchain so the cause is not immediately obvious.
,
Jan 7
Hi Bruce, what do you mean for "initial investigations shows no change in the d3dcompiler_47.dll file"? Both versions from 17134 and 17763 are identical?
,
Jan 7
I meant that the 17134 and 17763 versions were identical, but subsequent checking shows that they are not. I had four toolchain packages installed and I think that led to my incorrect assessment.
Looking at the differences makes the cause of the problem pretty obvious - the new d3dcompiler_47.dll imports some DLLs that are not guaranteed to be present on Windows 7. They are present on *most* Windows 7 machines (which is why my tests succeeded on my home machine) - maybe all machines in the real world - but not on our test machines. Here are the old and new DLL imports:
msvcrt.dll
KERNEL32.dll
ADVAPI32.dll
RPCRT4.dll
api-ms-win-crt-string-l1-1-0.dll
api-ms-win-crt-math-l1-1-0.dll
api-ms-win-crt-runtime-l1-1-0.dll
api-ms-win-crt-private-l1-1-0.dll
KERNEL32.dll
ADVAPI32.dll
api-ms-win-crt-time-l1-1-0.dll
RPCRT4.dll
So, msvcrt.dll is replaced with four api-ms-win-crt*.dll files.
Possible fixes would include shipping an old version of d3dcompiler_47.dll or shipping the api-ms-win-crt*.dll files with Chrome and to our test machines.
,
Jan 8
I've verified that we're getting error-code 126 when trying to load d3dcompiler_47.dll on a Server 2008 R2-SP1 (I had to write a test program because the angle tests fail with a different error on my VM: Exception running test: Got EGL_NOT_INITIALIZED: initialize(m_eglDisplay, &major, &minor) at egluGLContextFactory.cpp:331 I assume that I could configure my VM to get past that error and get to the error about not being able to load the DLL, but it seems unnecessary since I have a smaller repro of presumably the same bug.
,
Jan 8
Uggh. Cancel that. The DLL actually is loading. The error 126 must just be leftover from something else. Weird. My server already has the api-ms files installed, so it is useless for reproducing this problem.
,
Jan 8
Bruce, we can try connecting to the bots that were failing and looking for the files. Can you tell me what to look for?
,
Jan 8
On my VM I went to c:\windows and went "dir /s api-ms-win-crt-math-l1-1-0.dll" - this found six copies.
,
Jan 8
The following revision refers to this bug: https://chromium.googlesource.com/angle/angle/+/eb278cb04e970ad823ae09e9d4348f88879de6c0 commit eb278cb04e970ad823ae09e9d4348f88879de6c0 Author: Jamie Madill <jmadill@chromium.org> Date: Tue Jan 08 01:55:50 2019 D3D: Log GetLastError when compiler load fails. Also use the non-unicode version of LoadLibrary explicitly. Bug: chromium:919163 Change-Id: I4841c3eef586ff57563915da12765baaa6e0b146 Reviewed-on: https://chromium-review.googlesource.com/c/1398642 Reviewed-by: Bruce Dawson <brucedawson@chromium.org> Reviewed-by: Jamie Madill <jmadill@chromium.org> Commit-Queue: Jamie Madill <jmadill@chromium.org> [modify] https://crrev.com/eb278cb04e970ad823ae09e9d4348f88879de6c0/src/libANGLE/renderer/d3d/HLSLCompiler.cpp
,
Jan 8
Well, now I'm more confused. I logged on to the bot and found 6 files as you would expect. Furthermore I downloaded and tested the isolate and everything ran find. Even tried moving D3DCompiler_47.dll to another folder and the tests were passing. What gives?
,
Jan 8
FYI bot I used was build113-m4: https://chromium-swarm.appspot.com/bot?id=build113-m4&sort_stats=total%3Adesc
,
Jan 8
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7c8b7f00ff2c9c7e5673da536258f99452714d4c commit 7c8b7f00ff2c9c7e5673da536258f99452714d4c Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com> Date: Tue Jan 08 03:31:30 2019 Roll src/third_party/angle 5df1d281a992..eb278cb04e97 (1 commits) https://chromium.googlesource.com/angle/angle.git/+log/5df1d281a992..eb278cb04e97 git log 5df1d281a992..eb278cb04e97 --date=short --no-merges --format='%ad %ae %s' 2019-01-08 jmadill@chromium.org D3D: Log GetLastError when compiler load fails. Created with: gclient setdep -r src/third_party/angle@eb278cb04e97 The AutoRoll server is located here: https://autoroll.skia.org/r/angle-chromium-autoroll Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel BUG= chromium:919163 TBR=ynovikov@chromium.org Change-Id: I67ded25f870177d2ff70f69f1fa5af9326a1ce0f Reviewed-on: https://chromium-review.googlesource.com/c/1400181 Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com> Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#620604} [modify] https://crrev.com/7c8b7f00ff2c9c7e5673da536258f99452714d4c/DEPS
,
Jan 8
> What gives? Good question. I don't know. I finally got crrev.com/c/1396851 (reland of reland of the toolchain change) to pass the angle tests on the try bots (they failed the two previous times). I did this by adding all of the UCRT files to the runtime_libs group even for non-component builds - see the change to build/win/BUILD.gn. So, that is one path forward, but a path that uses about 2 MB (for ucrtbase.dll and the api-ms*.dll files), for the isolates and for Chrome installs. That's unfortunate because on 99.9% of machines those files are unneeded because they are system installed. I'm talking to some people at Microsoft about this and hope to be able to understand the failure and decide on the best options.
,
Jan 8
Okay, I understand this well enough now. The 17763 version of d3dcompiler_47.dll uses the universal CRT (UCRT). It imports functions from five api-ms-win-crt*.dll files. Those five files don't have any imports but if you look at their exports (the functions that d3dcompiler_47.dll imports) then you will see that they are all forwarded to ucrtbase.dll. ucrtbase.dll then imports a bunch (all?) of the api-ms-win-core*.dll files. So, roughly speaking, using the latest d3dcompiler_47.dll requires distributing 943,648 bytes of api-ms-win-*.dll files and 1,193,808 bytes of ucrtbase.dll (this is for x86 - the calculation is slightly different for x64). Most users will already have recent versions of the UCRT installed through Windows update, so we could roll the dice and ship the new d3dcompiler_47.dll without the UCRT files (while shipping the UCRT files to our test bots to avoid those known failures), but we don't know how many people this would affect. Alternately, we could ship the old version of d3dcompiler_47.dll. This complicates the toolchain packaging, but that is just a bi-annual project so not a big deal. I have confirmed that adding all of the UCRT to the isolates fixes the angle failures. I am about to run a test with a toolchain package with the old d3dcompielr_47.dll and I assume that will also fix the failures. Any thoughts?
,
Jan 8
Issue 915114 has been merged into this issue.
,
Jan 8
Is it possible to get numbers on how many users this affects?
,
Jan 8
We could ship the new d3dcompiler_47.dll under a different name and LoadLibrary() it to see how often it fails, but that probably causes too many problems, especially since we'd have to ship this test all the way to stable to get meaningful numbers - the canary/dev/beta populations are far too different to be meaningful. I created a modified package that uses the old d3dcompiler_47.dll and that works well. It's in crrev.com/c/1401403. That technique is safe, simple, and avoids shipping any additional DLLs.
,
Jan 8
That works to get the roll unblocked, but it also means we miss out on bugfixes and perf improvements that are in the new version, right?
,
Jan 8
That is correct. I'm talking with Microsoft about the consequences. The initial response was 'sgtm' but I'll try to get more details.
,
Jan 8
I would prefer if we could use the updated D3D compiler version for our Windows users. It will have a number of bug fixes that we've reported to Microsoft as well as other improvements. metzman, do you know if the Graphics Fuzz team had any experience with security bugs that could occur in the D3D compiler?
,
Jan 8
>metzman, do you know if the Graphics Fuzz team had any experience with security bugs that could occur in the D3D compiler? I don't. Alastair or Hugues does the GraphicsFuzz team have experience with this?
,
Jan 9
A bunch of stuff depends on a new runtime, so pushing a new toolchain with the old d3d while we figure out how many people will break if we bundle the new one async still seems like an ok plan, no?
,
Jan 9
That is a very tempting option. That lets us switch almost everything, an unblock some ARM64/Win32 tasks, and keep us on a path towards being able to use new STL and Windows features, while letting us separately figure out how to most easily ship the new D3D compiler. One option would be to ship *both*, try to load the latest, then fall back to the old version, and record statistics on how often we use the fallback. If that route seems tempting then I could prepare a toolchain package that contains both (assuming that d3dcompiler_47.dll doesn't mind being loaded under a slightly different name), which would make for a smoother transition.
,
Jan 9
I'd do just the old dll for starters. That seems safest and is probably the fastest path to identifying more unknown unknowns for the toolchain package (...or just landing it for good). The two dll method sounds like a good plan for the follow-up to me. (I'd give the new dll the slightly different name, so that in case the name causes an issue we see it in the common case, not in the presumably super rare case.)
,
Jan 9
We don't have direct experience with security bugs in the d3d compiler, but GraphicsFuzz has led to the discovery of several security issues related to WebGL on Android and iOS. I think it would be useful to use it to test the d3d compiler in this context.
,
Jan 9
OK, unblocking the ARM work seems like the most important thing so let's try shipping the old DLL. But if you want to spend a couple minutes making a package with the new + old version (maybe default name and _fallback.dll) I can add an ANGLE commit that tries to load the old version when the default fails to load.
,
Jan 9
I'll make a new package with both DLLs, to avoid having to create another new package in the future. I'll use the name d3dcompiler_47.dll for the old version so that initially nothing changes. For the new version maybe d3dcompiler_47_ucrt.dll. Some additional changes will be needed to copy the new DLL to the build directory, to the isolates, and to the installer before the new one will be fully activated. > I'd give the new dll the slightly different name... Yep. If I go with this naming then Angle would have to (eventually) try d3dcompiler_47_ucrt.dll first and then fallback to d3dcompiler_47.dll. Is that okay?
,
Jan 9
Eh, yes, that is a bit of a mess. Maybe just ship the old DLL for now. And then maybe if people are kind we can ship the new DLL with the bundled dependencies. Or figure out how to blacklist those users. The worse case right now if you would ship the new DLL is that these users would fallback to software rendering with SwiftShader.
,
Jan 9
Note: shipping a second copy of d3dcompiler_47.dll actually adds 3.6 MB to the package size, which is more than the size of the UCRT. If either way we have to ship more files then it may be simpler and more consistent to just ship the extra UCRT files, although I may still temporarily ship a toolchain package with the old compiler just to unblock ARM, and then do a more complete fix in the next few weeks.
,
Jan 9
Mo, there's been discussion about dropping support for a limited subset of Windows 7 users who don't have the new Windows CRT installed. Would you be okay with that? As it stands they would fall back to SwiftShader. Ideally we could blacklist them. But I'm not sure if we can blacklist based on the presence or absence of DLLs. Do you know if we can do that?
,
Jan 9
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1fe5e8635c22f578ae96fffbec0047fc4cd5098c commit 1fe5e8635c22f578ae96fffbec0047fc4cd5098c Author: Bruce Dawson <brucedawson@chromium.org> Date: Wed Jan 09 23:51:03 2019 ARM64 capable toolchain with old d3dcompiler This is a reland of crrev.com/c/1377510, with a new toolchain hash due to packaging an old version of d3dcompiler_47.dll. This updates Chrome to build with a toolchain based on VS 2017 15.9.3 and the Windows 10.0.17763.132 SDK, with the ARM64 components included to support building for ARM64 for Win32. This adds about 400 MB to the package size, a bit less than a 50% increase. This package contains the 17134 version of d3dcompiler_47.dll, to avoid gaining a dependency on the universal CRT which would cause problems for some Windows 7 machines ( bug 919163 ). Packaging was done on a Windows Server 2016 VM, cleanly created for this purpose. The package was created by downloading VS 2017 Update 9.3, from https://www.visualstudio.com/vs/, and then running the installer like this: $ PATH_TO_INSTALLER.EXE ^ --add Microsoft.VisualStudio.Workload.NativeDesktop ^ --add Microsoft.VisualStudio.Component.VC.ATLMFC ^ --add Microsoft.VisualStudio.Component.VC.Tools.ARM64 ^ --add Microsoft.VisualStudio.Component.VC.MFC.ARM64 ^ --includeRecommended --passive Then Add or Remove Programs was used to modify the SDK to add the Debuggers package. Then the packaging script was run like this: python depot_tools\win_toolchain\package_from_installed.py 2017 -w 10.0.17763.0 This was tested on two VMs to ensure that the results are consistent. Since the new d3dcompiler_47.dll uses the UCRT and we want to avoid shipping that the final packaging step was to unzip the package, copy over the two copies of that DLL from win_sdk\Redist, and then repackage the toolchain with: > python package_from_installed.py --repackage=<full-path-to-toolchain-dir> This should not significantly affect the x86 and x64 builds, although there may have been some STL and CRT changes that could cause changes. Bug: 893460, 915046 , 919163 Change-Id: I5891479863dd8371505ca9a4439f6939aa564ba8 Reviewed-on: https://chromium-review.googlesource.com/c/1401403 Commit-Queue: Bruce Dawson <brucedawson@chromium.org> Reviewed-by: Nico Weber <thakis@chromium.org> Reviewed-by: Jamie Madill <jmadill@chromium.org> Cr-Commit-Position: refs/heads/master@{#621376} [modify] https://crrev.com/1fe5e8635c22f578ae96fffbec0047fc4cd5098c/build/vs_toolchain.py
,
Jan 9
re comment 40: That's imho a product decision, not an eng decision, so we should probably talk to rpop (and my guess is the first question there will be "well, how many people exactly does this affect?") :-)
,
Jan 10
I'm calling this bug fixed and I opened crbug.com/920704 to track a plan to upgrade d3dcompiler_47.dll.
,
Jan 10
Thanks Bruce. |
||||||||||||
►
Sign in to add a comment |
||||||||||||
Comment 1 by wolenetz@chromium.org
, Jan 4