ToT CrWinAsan* bots busted after making them 64-bit
Issue description
https://ci.chromium.org/buildbot/chromium.clang/CrWinAsan/612 (the other two bots too):
exception steps
failed aura_unittests
failed base_unittests
exception browser_tests
exception components_unittests
failed content_browsertests
failed crashpad_tests
failed ipc_tests
failed net_unittests
failed sbox_integration_tests
failed service_manager_unittests
failed setup_unittests
failed unit_tests
failed views_unittests
In the short term, I guess we need to undo the switch.
,
May 18 2018
That's a lot of failures to work through. It also suggests that switching CF to 64-bit might have been premature. Let's revert for now.
,
May 18 2018
We switched CF a few months back and have been seeing quite a lot of failures. It is hard to go back now, since newer testcases are attached to 64-bit builds. We can wait until you get a chance to fix these. Can you please take a look, given that it is reproducible on the CrWinAsan bot?
,
May 22 2018
See bug 805414 for one problem that appears to be an sbox/runtime interaction.
,
May 25 2018
,
May 25 2018
,
May 25 2018
As discussed in the meeting, let's do this as a first step. After that, I think most of the other bugs, like the ones mentioned in https://bugs.chromium.org/p/chromium/issues/detail?id=783296, will go away.
,
May 25 2018
,
May 25 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/add2efb3a194e081d759a51117e8fe863f1380cf

commit add2efb3a194e081d759a51117e8fe863f1380cf
Author: Reid Kleckner <rnk@google.com>
Date: Fri May 25 22:37:43 2018

Drop redundant ASan library from exe link line

This works around a bug (https://llvm.org/pr37592) in LLD, where if foo.lib appears on the link line without -wholearchive:, it doesn't take effect. Instead, only pass the -wholearchive:foo.lib flag, which adds the library as an input and turns on the wholearchive behavior that we need.

Fixes base_unittests StackSamplingProfilerTest.OtherLibrary and related tests, which were failing because chrome.exe was missing an exported function.

Remove the /include flag workaround for https://crbug.com/777087.

BUG=777087, 844398
R=mmoroz@chromium.org

Change-Id: I493e1dcf6963048f7e83df1c937b4a4a62dd96bb
Reviewed-on: https://chromium-review.googlesource.com/1073890
Reviewed-by: Max Moroz <mmoroz@chromium.org>
Commit-Queue: Reid Kleckner <rnk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#562042}
[modify] https://crrev.com/add2efb3a194e081d759a51117e8fe863f1380cf/build/config/sanitizers/BUILD.gn
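Why wholearchive matters here can be shown with a toy model of how a linker pulls members out of a static library. This is a minimal sketch, assuming the conventional lazy rule (a .lib member is loaded only if it defines a symbol that is still undefined); it does not reproduce the LLD bug from https://llvm.org/pr37592 itself. Member, SelectMembers, and the member and symbol names used below (asan_rtl.obj, asan_helpers.obj, ExportedHelper) are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model of one static-library (.lib) member. Hypothetical: real linkers
// also deal with COMDATs, weak externals, search order, and more.
struct Member {
  std::string name;                  // object file name inside the .lib
  std::set<std::string> defines;     // symbols this member defines
  std::set<std::string> references;  // symbols this member needs
};

// Returns the members that end up in the link, in load order.
//  - whole_archive == false: lazy rule; load a member only if it defines a
//    symbol that is currently undefined, repeating until nothing changes.
//  - whole_archive == true: load every member, which is the behavior that
//    -wholearchive:foo.lib requests.
std::vector<std::string> SelectMembers(const std::vector<Member>& archive,
                                       std::set<std::string> undefined,
                                       bool whole_archive) {
  std::vector<std::string> loaded;
  std::set<std::string> defined;
  std::vector<bool> taken(archive.size(), false);
  bool changed = true;
  while (changed) {  // iterate to a fixpoint
    changed = false;
    for (std::size_t i = 0; i < archive.size(); ++i) {
      if (taken[i]) continue;
      bool needed = whole_archive;
      for (const auto& sym : archive[i].defines) {
        if (undefined.count(sym)) needed = true;
      }
      if (!needed) continue;
      taken[i] = true;
      changed = true;
      loaded.push_back(archive[i].name);
      for (const auto& sym : archive[i].defines) {
        defined.insert(sym);
        undefined.erase(sym);
      }
      // New references may pull in further members on the next pass.
      for (const auto& sym : archive[i].references) {
        if (!defined.count(sym)) undefined.insert(sym);
      }
    }
  }
  return loaded;
}
```

For example, if the executable only references __asan_init, the lazy rule loads the runtime member and whatever it transitively needs, but never a member whose only job is to define an exported function that nothing references. That dropped member corresponds to the "chrome.exe was missing an exported function" symptom in the commit message; with whole_archive set, every member is kept.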
,
May 25 2018
Locally I was able to run these suites successfully:
base_unittests
aura_unittests
ipc_tests
views_unittests
Failures:
sbox_integration_tests:
5 tests failed:
ProcessMitigationsTest.CheckWin10DynamicCodeOptOutPolicySuccess (../../sandbox/win/src/process_mitigations_dyncode_unittest.cc:486)
ProcessMitigationsTest.CheckWin10ImageLoadPreferSys32_Baseline (../../sandbox/win/src/process_mitigations_imageload_unittest.cc:420)
ProcessMitigationsTest.CheckWin10ImageLoadPreferSys32_Failure (../../sandbox/win/src/process_mitigations_imageload_unittest.cc:461)
ProcessMitigationsTest.CheckWin10ImageLoadPreferSys32_Success (../../sandbox/win/src/process_mitigations_imageload_unittest.cc:441)
ProcessMitigationsTest.CheckWin81DynamicCodePolicySuccess (../../sandbox/win/src/process_mitigations_dyncode_unittest.cc:405)
There was a major issue with unit_tests. ASan makes the executable about 5 times bigger, which seems to drastically slow down process startup and makes the tests hard to run. It looks like we can disable CFG (Control Flow Guard) as a workaround, but we should also try to make ASan generate smaller code.
,
May 30 2018
I filed issue 846966 to document the ASan+CFG issue, and I have https://chromium-review.googlesource.com/c/chromium/src/+/1074337 out to disable CFG when ASan is enabled.
,
May 30 2018
New passes: net_unittests, setup_unittests, components_unittests, crashpad_tests.
I'll try the browser tests next, but they are large and likely to fail.
,
May 31 2018
I relanded the "make the bots 64-bit" CL, but I forgot to include the bug number. This build includes the change, so it should give us 64-bit results soon: https://ci.chromium.org/buildbot/chromium.clang/CrWinAsan/670
,
Jun 1 2018
Still looks pretty broken:
steps
failed base_unittests
failed blink_heap_unittests
exception browser_tests
failed components_browsertests
exception components_unittests
exception content_browsertests
failed courgette_unittests
failed extensions_browsertests
failed extensions_unittests
failed gfx_unittests
failed interactive_ui_tests
failed ipc_tests
failed media_blink_unittests
failed media_unittests
failed mojo_unittests
failed net_unittests
failed sbox_integration_tests
failed service_manager_unittests
failed services_unittests
failed setup_unittests
failed sync_integration_tests
failed unit_tests
failed views_unittests
failed wm_unittests
Issue 845010 is prior art for the base_unittests failure.
,
Jun 1 2018
Hm, looks like the bot is randomly busted in 32-bit mode too? And the base_unittests failure bug is probably marked as fixed incorrectly.
,
Jun 1 2018
Oh, here is a problem:
gfx_unittests
Run on OS: 'Windows-7-SP1'
Let's not waste time debugging Win7 problems; let's move to Win10 swarming bots.
,
Jun 1 2018
I filed an infra bug for that: https://crbug.com/848911
I wasn't able to figure out how to switch the swarming os dimension to win10, but if anyone knows how to do that, let me know.
,
Jun 1 2018
+cc John, who would probably know the answer to c#17.
,
Jun 1 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/f2d1e453de33756fb4454dd881ba8fa786bed919

commit f2d1e453de33756fb4454dd881ba8fa786bed919
Author: Reid Kleckner <rnk@google.com>
Date: Fri Jun 01 23:33:32 2018

Run CrWinASan test steps on Windows 10 swarming bots

64-bit WinASan doesn't run very well on Windows 7 SP 1.

TBR=huangml@chromium.org, thakis@chromium.org
BUG=848911, 844398

Change-Id: If9206d818f301d0959aa3fb0316657ea57921bbb
Reviewed-on: https://chromium-review.googlesource.com/1083613
Reviewed-by: Reid Kleckner <rnk@chromium.org>
Commit-Queue: Reid Kleckner <rnk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#563863}
[modify] https://crrev.com/f2d1e453de33756fb4454dd881ba8fa786bed919/testing/buildbot/chromium.clang.json
[modify] https://crrev.com/f2d1e453de33756fb4454dd881ba8fa786bed919/testing/buildbot/waterfalls.pyl
,
Jun 4 2018
That helped a lot, but there are still quite a few failing test suites:
https://ci.chromium.org/buildbot/chromium.clang/CrWinAsan/682
Newly passing:
base_unittests
blink_heap_unittests
components_browsertests
components_unittests
courgette_unittests
extensions_browsertests
extensions_unittests
gfx_unittests
interactive_ui_tests
ipc_tests
media_blink_unittests
mojo_unittests
net_unittests
service_manager_unittests
services_unittests
setup_unittests
sync_integration_tests
unit_tests
views_unittests
wm_unittests
Newly failing (as in, used to pass and now don't):
blink_platform_unittests (aka issue 849251)
content_unittests
remoting_unittests
Still failing:
browser_tests
content_browsertests
media_unittests
sbox_integration_tests (but this one has far fewer failures now)
Some of the failures look media-codec related.
,
Jun 4 2018
I think we should keep the 64-bit change, because at the very least we're testing what's being used. As you say, the media/codec failures are probably all related. We can probably get this cleaned up in a week.
,
Jun 4 2018
Hm, there's another problem starting here: https://ci.chromium.org/buildbot/chromium.clang/CrWinAsan/687
[15984/52912] ACTION //third_party/libvpx:libvpx_yasm_action(//build/toolchain/win:win_clang_x64)
FAILED: obj/third_party/libvpx/libvpx_yasm/intrapred_sse2.o
C:/b/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../third_party/yasm/run_yasm.py ./yasm -fwin64 -m amd64 -I../../third_party/libvpx/source/config -I../../third_party/libvpx/source/config/win/x64 -I../../third_party/libvpx/source/libvpx -I. -I../.. -Igen -DCHROMIUM -o obj/third_party/libvpx/libvpx_yasm/intrapred_sse2.o ../../third_party/libvpx/source/libvpx/vpx_dsp/x86/intrapred_sse2.asm
==7012==ERROR: AddressSanitizer failed to allocate 0x00dfd8040000 (961401847808) bytes at 0x002167dc0000 (error code: 1455)
==7012==ReserveShadowMemoryRange failed while trying to map 0xdfd8040000 bytes. Perhaps you're using ulimit -v
[15985/52912] ACTION //third_party/libvpx:libvpx_yasm_action(//build/toolchain/win:win_clang_x64)
FAILED: obj/third_party/libvpx/libvpx_yasm/highbd_subpel_variance_impl_sse2.o
C:/b/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../third_party/yasm/run_yasm.py ./yasm -fwin64 -m amd64 -I../../third_party/libvpx/source/config -I../../third_party/libvpx/source/config/win/x64 -I../../third_party/libvpx/source/libvpx -I. -I../.. -Igen -DCHROMIUM -o obj/third_party/libvpx/libvpx_yasm/highbd_subpel_variance_impl_sse2.o ../../third_party/libvpx/source/libvpx/vpx_dsp/x86/highbd_subpel_variance_impl_sse2.asm
==3768==ERROR: AddressSanitizer failed to allocate 0x00dfd8040000 (961401847808) bytes at 0x002167dc0000 (error code: 1455)
==3768==ReserveShadowMemoryRange failed while trying to map 0xdfd8040000 bytes. Perhaps you're using ulimit -v
It seems that yasm is instrumented with ASan, and the build slaves are still Windows Server 2008 R2, which is probably what's preventing us from reserving 1/8 of the address space. I guess we need new build slave images for this to be reliable.
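As a rough sanity check of the "1/8 of the address space" remark: ASan keeps one shadow byte per 8 application bytes, following its documented mapping Shadow = (Addr >> 3) + Offset. The sketch below applies that ratio to the failed reservation from the log. The offset is left as a parameter rather than assuming any particular value for the Windows x64 runtime, and the helper names are hypothetical.

```cpp
#include <cstdint>

// ASan's generic mapping: Shadow = (Addr >> Scale) + Offset with Scale = 3,
// so one shadow byte describes 8 application bytes.
constexpr unsigned kShadowScale = 3;

// Shadow address for an application address. The offset is a parameter;
// this sketch does not assume the value used on Windows x64.
constexpr std::uint64_t MemToShadow(std::uint64_t addr, std::uint64_t offset) {
  return (addr >> kShadowScale) + offset;
}

// Shadow bytes needed to cover `app_bytes` of application address space.
constexpr std::uint64_t ShadowBytesFor(std::uint64_t app_bytes) {
  return app_bytes >> kShadowScale;
}

// Size of the failed shadow reservation from the log.
constexpr std::uint64_t kFailedReservation = 0xdfd8040000ULL;
static_assert(kFailedReservation == 961401847808ULL,
              "hex and decimal sizes in the log agree");

// Application address range that this much shadow would cover.
constexpr std::uint64_t kCoveredAppBytes = kFailedReservation << kShadowScale;
static_assert(kCoveredAppBytes == 0x6fec0200000ULL, "8x the shadow size");
```

So the runtime was trying to reserve about 895 GiB of shadow for roughly 7.0 TiB of application address space, which is close to the 8 TB user-mode address space that x64 Windows offered before Windows 8.1 (Server 2008 R2 included). That fits the explanation above, though it does not prove it.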
,
Jun 11 2018
We got new slaves, so the c#22 issue is resolved. Now these are the failing tests:
browser_tests
viz_browser_tests
elevation_service_unittests
headless_browsertests
sbox_integration_tests
browser_tests and viz_browser_tests show the same failures, and most of the ExtensionInstalledBubbleBrowserTest suite is failing, probably for related reasons. The rest look like one-off crash-handling tests that don't work with WinASan.
,
Jun 12 2018
Hm, it looks like sbox_integration_tests fails without ASan as well. Maybe it fails only on Windows 10? I notice it is passing here:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20Tests%20x64/24574
But that bot runs on 'Windows-10-15063', not 'Windows-10'. Maybe that explains it?
,
Jun 13 2018
There is some unrelated ASan redness caused by my LLVM change, r334313, from last Friday:
https://ci.chromium.org/buildbot/chromium.clang/CrWinAsan%28dll%29/672
Basically, ASan reports an out-of-bounds (OOB) access in ATL code at:
#0 0x7ffcc56d7ee3 in ATL::CAtlComModule::Term C:\b\c\win_toolchain\vs_files\3bc0ec615cf20ee342f3bc29bc991b5ad66d8d2c\VC\Tools\MSVC\14.14.26428\atlmfc\include\atlbase.h:2619
#1 0x7ffce3fe8672 in o__execute_onexit_table+0xd2 (C:\Windows\System32\ucrtbase.dll+0x180008672)
The bad access is on the global ATL::__pobjMapEntryLast, which is defined like so:
#pragma section("ATL$__a", read)
#pragma section("ATL$__z", read)
#pragma section("ATL$__m", read)
extern "C" {
__declspec(selectany) __declspec(allocate("ATL$__a")) _ATL_OBJMAP_ENTRY_EX* __pobjMapEntryFirst = NULL;
__declspec(selectany) __declspec(allocate("ATL$__z")) _ATL_OBJMAP_ENTRY_EX* __pobjMapEntryLast = NULL;
}
#pragma comment(linker, "/merge:ATL=.rdata")
Obviously, the ATL section is intended to be sorted and treated as an array at runtime. The ASan instrumentation pass has a special case for .CRT sections, but we should widen it to any section with '$' in the name. I have a quick fix for that coming soon.
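To illustrate why padding these globals breaks ATL: for grouped COFF sections (names containing '$'), the linker uses the text before '$' as the output section name and orders the contributions alphabetically by the text after it, so ATL$__a, ATL$__m, and ATL$__z end up adjacent and in that order. ATL then walks the entries between __pobjMapEntryFirst and __pobjMapEntryLast; if ASan places a redzone after each of these globals, that walk reads redzone bytes and gets reported as OOB. The sketch below models only the ordering rule and the proposed "skip sections containing '$'" check. ShouldInstrumentGlobalInSection and OrderGroupedSections are hypothetical names, not LLVM's actual code, and the later /merge:ATL=.rdata step is not modeled.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical predicate for the rule proposed above: do not add redzones
// to globals in a grouped section (any name containing '$', which also
// covers .CRT$XCU and friends), since the program may treat the merged
// section as one contiguous array.
bool ShouldInstrumentGlobalInSection(const std::string& section) {
  return section.find('$') == std::string::npos;
}

// Models how the linker orders grouped input sections. Returns
// {output section, input section} pairs in final layout order. Sorting the
// full names keeps each group contiguous, and within a group (identical
// prefix) it is the same as sorting by the text after '$'.
std::vector<std::pair<std::string, std::string>> OrderGroupedSections(
    std::vector<std::string> inputs) {
  std::sort(inputs.begin(), inputs.end());
  std::vector<std::pair<std::string, std::string>> placed;
  for (const auto& in : inputs) {
    // substr(0, npos) yields the whole name for ungrouped sections.
    placed.emplace_back(in.substr(0, in.find('$')), in);
  }
  return placed;
}
```

With the three ATL pragmas above declared in the order __a, __z, __m, the model still places them as ATL$__a, ATL$__m, ATL$__z inside the single ATL output section, which is what lets the first and last pointers bracket the object map.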
,
Jun 13 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4a80148f2b4850e24af2ae1326619d4b90be4841

commit 4a80148f2b4850e24af2ae1326619d4b90be4841
Author: Reid Kleckner <rnk@google.com>
Date: Wed Jun 13 20:47:48 2018

Test WinASan on the same OS version as the main win10 bots

The chromium.win Windows 10 testers use 'Windows-10-15063' as the swarming os dimension, not 'Windows-10', as I did for the ASan bots. It seems that sbox_integration_tests fails on newer versions of Windows 10, and some other tests might as well. For now, make sure these bots match the main waterfall. For every test that this fixes, I'll file a follow-up bug about that test not passing on recent versions of Windows.

TBR=thakis@chromium.org
BUG=844398

Change-Id: I959f2d19459c722e54da9dd09b5fc169e3b171fd
Reviewed-on: https://chromium-review.googlesource.com/1099567
Reviewed-by: Reid Kleckner <rnk@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Reid Kleckner <rnk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#566977}
[modify] https://crrev.com/4a80148f2b4850e24af2ae1326619d4b90be4841/testing/buildbot/chromium.clang.json
[modify] https://crrev.com/4a80148f2b4850e24af2ae1326619d4b90be4841/testing/buildbot/waterfalls.pyl
,
Jun 13 2018
r334653 should solve the ATL issues in c#25, and help fix CrWinASan(dll).
,
Jun 14 2018
OK, now these bots are completely on fire. Basically everything is timing out. One step forward, two steps back.
,
Jun 14 2018
All that purple was caused by more wholearchive issues, described in https://crbug.com/852679. Tomorrow we should see fewer failures.
,
Jun 18 2018
,
Jun 19 2018
The browser_tests and viz_browser_tests failures look like an ASan true positive. We've found a bug in Chrome, woohoo! I filed https://crbug.com/854355 for it, and will disable the tests soon.
,
Jun 20 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/21fbf9f352d4b328f0090527567b27887772ce46

commit 21fbf9f352d4b328f0090527567b27887772ce46
Author: Reid Kleckner <rnk@google.com>
Date: Wed Jun 20 00:47:36 2018

Disable ExtensionInstalledBubbleBrowserTest.InvokeUI_* tests

Windows ASan reports a use-after-free on an ExtensionInstalledBubble object. While these tests don't fail on other bots, it's likely that they are flaky elsewhere, so we should disable them until the issue is resolved.

R=rdevlin.cronin@chromium.org
BUG=854355, 844398

Change-Id: I19f5d646022847ab20a9fef54c5917a13153805c
Reviewed-on: https://chromium-review.googlesource.com/1107001
Reviewed-by: Devlin <rdevlin.cronin@chromium.org>
Commit-Queue: Reid Kleckner <rnk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#568668}
[modify] https://crrev.com/21fbf9f352d4b328f0090527567b27887772ce46/chrome/browser/ui/extensions/extension_installed_bubble_browsertest.cc
,
Jun 20 2018
Question: do you see either of these two crashes on the CrWinASan bots?
https://clusterfuzz.com/v2/testcase-detail/5004843543166976
https://clusterfuzz.com/v2/testcase-detail/5772750468415488
Any idea what could be causing the Chrome DLL to fail to load in a flaky fashion? This has been the top crasher on ClusterFuzz (https://clusterfuzz.com/v2/crash-stats?block=day&days=7&end=424863&group=platform&job=windows_asan_chrome&number=count&sort=total_count) since the Win64 deployment, and it is causing a lot of pain for regression range calculation. Any help debugging this is highly appreciated.

1)
[0424/175725.320:ERROR:main_dll_loader_win.cc(134)] Failed to load Chrome DLL from c:\clusterfuzz\slave-bot\builds\chromium-browser-asan_win32-release_x64_e8abf88e7a5ec8bcd0cd391cfae402f143e8ddb2\symbolized\release\asan-win32-release_x64-552025\chrome_child.dll: A dynamic link library (DLL) initialization routine failed. (0x45A)
[0424/175725.327:ERROR:main_dll_loader_win.cc(134)] Failed to load Chrome DLL from c:\clusterfuzz\slave-bot\builds\chromium-browser-asan_win32-release_x64_e8abf88e7a5ec8bcd0cd391cfae402f143e8ddb2\symbolized\release\asan-win32-release_x64-552025\chrome_child.dll: A dynamic link library (DLL) initialization routine failed. (0x45A)
=================================================================
==4388==ERROR: AddressSanitizer: access-violation on unknown address 0x7ffff938a0c4 (pc 0x7ffff938a0c4 bp 0x00000000000a sp 0x0031601ffc88 T0)

2)
[0304/095320.589:ERROR:main_dll_loader_win.cc(134)] Failed to load Chrome DLL from c:\clusterfuzz\slave-bot\builds\chromium-browser-asan_win32-release_x64_e8abf88e7a5ec8bcd0cd391cfae402f143e8ddb2\revisions\asan-win32-release_x64-539018\chrome_child.dll: A dynamic link library (DLL) initialization routine failed. (0x45A)
=================================================================
==2516==ERROR: AddressSanitizer: access-violation on unknown address 0x7ffff960b544 (pc 0x7ffff960b544 bp 0x00000000000a sp 0x006b771ffbf8 T0)
SCARINESS: 10 (signal)
#0 0x7ffff960b543 (<unknown module>)
#1 0x7ff8463a7ec7 in RtlProcessFlsData (C:\Windows\SYSTEM32\ntdll.dll+0x180007ec7)
#2 0x7ff8463a7fb5 in LdrShutdownProcess (C:\Windows\SYSTEM32\ntdll.dll+0x180007fb5)
#3 0x7ff8463a7d93 in RtlExitUserProcess (C:\Windows\SYSTEM32\ntdll.dll+0x180007d93)
,
Jun 20 2018
For c#33, looks like we are seeing bug 635715 all over again. Same stack. Any ideas?
,
Jun 20 2018
Bug 635715 was ultimately caused by binaries that were too large. That shouldn't really affect 64-bit ASan, even if ASan-ified Chrome binaries are still too large. One interesting thing is this:
ERROR: AddressSanitizer: access-violation on unknown address 0x7ffff938a0c4 (pc 0x7ffff938a0c4
Note that the pc is equal to the faulting address. That sounds like some kind of page protection bug, where we try to execute from a page that isn't executable.
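The pc-equals-faulting-address observation is a common triage heuristic: if the CPU faults on the very address it is executing, the fault happened during instruction fetch rather than on a data load or store. On Windows, the EXCEPTION_RECORD for an access violation also carries an access-type code in ExceptionInformation[0] (0 for a read, 1 for a write, 8 for a user-mode DEP violation, per Microsoft's documentation). A small, portable sketch of that triage follows; AccessType and ClassifyAccessViolation are hypothetical names, and the code does not use the real Windows headers.

```cpp
#include <cstdint>
#include <string>

// Access-type codes documented for ExceptionInformation[0] of an
// EXCEPTION_ACCESS_VIOLATION record, mirrored here without <windows.h>.
enum class AccessType : std::uint64_t { kRead = 0, kWrite = 1, kExecute = 8 };

// Hypothetical triage helper mirroring the reasoning above: a DEP code, or a
// faulting address equal to the program counter, means the fault happened
// while fetching the instruction itself (an unmapped or non-executable page).
std::string ClassifyAccessViolation(std::uint64_t pc, std::uint64_t fault_addr,
                                    AccessType type) {
  if (type == AccessType::kExecute || pc == fault_addr) {
    return "instruction fetch";
  }
  return type == AccessType::kWrite ? "data write" : "data read";
}
```

Applied to the first ClusterFuzz report (pc and address both 0x7ffff938a0c4), the helper classifies the crash as an instruction fetch, matching the reading in the comment above.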
,
Jul 30
We had gotten the set of ASan failures down to a few browser_tests. I don't think the remaining failures are related to the 64-bit transition anymore. Today we're seeing a large number of browser_tests timeouts, but that's unrelated and I'll file a new bug for it.
Comment 1 by infe...@chromium.org, May 18 2018