media_unittests and gfx_unittests crashing on failure with VS 2017 |
|||||
Issue descriptionOS: Windows What steps will reproduce the problem? (1) Build 64-bit release media_unittests with VS 2017 (2) Run it with --gtest_filter=VpxVideoDecoderTest.MemoryPoolAllowsMultipleDisplay (3) Notice crash The crash is because vpx_internal_error is called (in VS 2015 and VS 2017) and it calls longjmp and longjmp eventually calls RtlFastFail2, saying: "Security check failure or stack buffer overrun - code c0000409 (!!! second chance !!!)" That is, the crash happens inside of longjmp instead of vpx_internal_error. The call to vpx_internal_error and the call to longjmp succeed in VS 2015. Is it normal and expected that vpx_internal_error should be called during this test? The call is to report corruption. Note that the crash either doesn't happen or happens less frequently when running with --single-process-tests. Adding __debugbreak() to vpx_internal_error causes the test to crash on both VS 2015 and VS 2017, thus showing that vpx_internal_error is called with both compilers. This failure and some other failures started happening on this build (when VS 2017 became the default): https://build.chromium.org/p/chromium.win/builders/Win10%20Tests%20x64/builds/15967 and stopped happening on this build (when the revert landed): https://build.chromium.org/p/chromium.win/builders/Win10%20Tests%20x64/builds/16015 The failures weren't initially noticed because the bot was red for other reasons (since fixed).
,
Sep 18 2017
Note that the crashes are 64-bit only. The only gn args needed to reproduce them are is_debug = false (64-bit is default).
,
Sep 18 2017
+libvpx folk for that question; I don't know.
,
Sep 18 2017
Oh wait, I do know one of the answers :) Yes that test is playing a corrupted file, so it should trigger an error.
,
Sep 18 2017
+jzern for error handling
,
Sep 18 2017
Thanks for confirming that the corrupted state is expected. Given that the crash in vpx_internal_error only happens in 64-bit multi-process tests with VS 2017 it seems likely that this is a genuine VS 2017 bug, so don't spend any time on it. I will investigate and report back.
,
Sep 19 2017
Sigh... After much fighting with windbg it appears that windbg is incompatible with /guard:cf. Frustratingly enough I used windbg because the crash only happened in multi-process debugging. But that's not correct. The crash is only *reported* when multi-process debugging. I could have debugged with VS instead of windbg. I'm not sure what the next steps should be.
,
Sep 19 2017
Starting with VS 2017 the /guard:cf compiler option is required when using the /guard:cf linker option. For some unknown reason it was optional with VS 2015 which is why we messed this up. So, the fix is ultimately trivial. This change will resolve the media_unittests test failures: https://chromium-review.googlesource.com/673364 With luck this change will resolve other failures as well.
,
Sep 19 2017
These other VS 2017 failures have the same root cause: gfx_unittests, JPEGCodec.DecodeCorrupted and PNGCodec.DecodeCorrupted
,
Sep 21 2017
My initial pass at fixing this was to pass /guard:cf to the compiler always. However this raised issues including increased binary size. Looking at binary size results from this config:
is_component_build = false
is_debug = false
target_cpu = "x64"
enable_nacl = false
I see 3.0% and 3.9% increases in binary size for chrome_child.dll and chrome.dll.
The next step was to apply /guard:cf to just some build targets. I started with enabling /guard:cf for the no_chromium_code config and various others, including three BUILD.gn files that are not part of the Chromium repo. These three are static_library("libyuv_internal"), static_library("image_diff"), static_library("fxcodec").
With these changes the increase dropped to 0.73% and 0.65% for chrome_child.dll and chrome.dll.
That was still annoyingly high so I removed the setjmp config for no_chromium_code and made the fairly modest set of changes needed to make that work. That dropped the binary-size increase down to 0.49% and 0.41%.
I then cheated and hack-disabled /guard:cf for jumbo_component("platform"). This made no difference to chrome.dll but dropped chrome_child.dll to a 0.38% increase. Most of this final improvement can be realized by breaking the jumbo_component("platform") target in two - just a couple of source files need /guard:cf.
The increase (even after optimization) is annoying but it does buy us improved security. It is probably possible to lower the overhead further by breaking apart build targets to more tightly apply the settings to just the required files, but I'm not convinced it is worth it. At this point there are fewer than 940 translation units with /guard:cf applied to them (when building the chrome target).
I would like to do some more tests (including on 32-bit builds) and official builds. Then, after cleaning up the CL I would propose:
a) Land an initial change that includes the config that enables access to setjmp and various target changes, including to no_chromium_code - this increases binary size by 0.73% and 0.65%
b) Land the necessary libyuv and pdfium changes (must be done after a) - this fixes the test failures
c) Remove the setjmp config from no_chromium_code (must be done after b) - this shrinks the size increase to 0.49% and 0.41%
d) Split jumbo_component("platform") to pull out JPEGImageDecoder.cpp and PNGImageReader.cpp into a static library - this shrinks the size increase of chrome_child.dll to 0.38%
e) Push a VS 2017 package that includes my local modifications to make it impossible to use setjmp without using the setjmp config - this ensures we don't make illegal setjmp calls
Step e will eventually become unnecessary because Microsoft has indicated that they want to make setjmp calls without /guard:cf on the compile line incompatible with /guard:cf on the linker line. That is, they will make this problem a linker error instead of a crash.
Thoughts?
,
Sep 21 2017
As said in the review, it's not just because of binary size, by also because it makes the clang switch more complicated. We agreed with security team to not pursue this for this reason. Maybe I missed how your suggested approach helps with that?
,
Sep 21 2017
> Maybe I missed how your suggested approach helps with that? This plan does not deal with that issue. I thinking of this comment: "We don't want /guard:icf compile flags globally as discussed with security team in depth earlier this year. Using it for the couple of longjmp/setjmp targets (with a clear comment saying that it shouldn't be used more widely) seems like a way forward for 2017 without requiring that though." but reading more closely I see that that was not clearly a feasible solution. I will upload the modified CL for testing while we consider the possibilities.
,
Sep 22 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/20cc6ec55cf1322cb1afe3e87728d16df8f63834 commit 20cc6ec55cf1322cb1afe3e87728d16df8f63834 Author: Bruce Dawson <brucedawson@chromium.org> Date: Fri Sep 22 19:43:52 2017 Disable CFG checks for longjmp VS 2015 does not implement CFG checks for longjmp but VS 2017 does. This means that the VS 2017 linker, by default, requires that all source files that use setjmp must be compiled with /guard:cf. This has implications for clang-cl and for binary size so this change disables CFG checks for longjmp. That is, this is a linker flag change that affects runtime behavior so that VS 2017 generated binaries will behave the same way as VS 2015 generated binaries. Bug: 766236 Change-Id: I1c959858bceead4ccb28f273bf24926244d38083 Reviewed-on: https://chromium-review.googlesource.com/678063 Reviewed-by: Dirk Pranke <dpranke@chromium.org> Reviewed-by: Hans Wennborg <hans@chromium.org> Commit-Queue: Bruce Dawson <brucedawson@chromium.org> Cr-Commit-Position: refs/heads/master@{#503825} [modify] https://crrev.com/20cc6ec55cf1322cb1afe3e87728d16df8f63834/build/config/win/BUILD.gn
,
Sep 22 2017
Tested on a local VS 2017 build that used to fail - it works now. |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by brucedaw...@chromium.org
, Sep 18 2017