
Issue 844398

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 30
Cc:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocked on:
issue 805414
issue 848911

Blocking:
issue 783296
issue 838553




ToT CrWinAsan* bots busted after making them 64-bit

Project Member Reported by thakis@chromium.org, May 18 2018

Issue description

https://ci.chromium.org/buildbot/chromium.clang/CrWinAsan/612 (the other 2 bots too)

exception steps 
failed aura_unittests 
failed base_unittests 
exception browser_tests 
exception components_unittests 
failed content_browsertests 
failed crashpad_tests 
failed ipc_tests 
failed net_unittests 
failed sbox_integration_tests 
failed service_manager_unittests 
failed setup_unittests 
failed unit_tests 
failed views_unittests


I guess short term we need to undo the switch.
 
Cc: siggi@chromium.org chrisha@chromium.org

Comment 2 by r...@chromium.org, May 18 2018

That's a lot of failures to work through. It also suggests that switching CF to 64-bit might have been premature. Let's revert for now.
We switched CF a few months back and have been seeing quite a few failures. It is hard to go back, since newer testcases are attached to 64-bit builds. We can wait a while until you get these fixed. Could you please take a look, given that it is reproducible on the CrWinAsan bot?

Comment 4 by siggi@chromium.org, May 22 2018

See bug 805414 for one problem that appears to be sbox/runtime interaction.
Blocking: 838553
Blocking: 783296
Cc: mmoroz@chromium.org
Labels: -Pri-3 Pri-1
Status: Assigned (was: Untriaged)
As discussed in the meeting, let's do this as a first step. I think most of the other bugs, like the ones mentioned in https://bugs.chromium.org/p/chromium/issues/detail?id=783296, will then go away.
Cc: h...@chromium.org dxf@google.com

Comment 9 by bugdroid1@chromium.org, May 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/add2efb3a194e081d759a51117e8fe863f1380cf

commit add2efb3a194e081d759a51117e8fe863f1380cf
Author: Reid Kleckner <rnk@google.com>
Date: Fri May 25 22:37:43 2018

Drop redundant ASan library from exe link line

This works around a bug (https://llvm.org/pr37592) in LLD, where if
foo.lib appears on the link line without -wholearchive:, it doesn't take
effect. Instead, only pass the -wholearchive:foo.lib flag, which adds
the library as an input and turns on the wholearchive behavior that we
need.

Fixes base_unittests StackSamplingProfilerTest.OtherLibrary and related
tests, which were failing because chrome.exe was missing an exported
function.

Remove the /include flag workaround for  https://crbug.com/777087 .

BUG= 777087 , 844398 
R=mmoroz@chromium.org

Change-Id: I493e1dcf6963048f7e83df1c937b4a4a62dd96bb
Reviewed-on: https://chromium-review.googlesource.com/1073890
Reviewed-by: Max Moroz <mmoroz@chromium.org>
Commit-Queue: Reid Kleckner <rnk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#562042}
[modify] https://crrev.com/add2efb3a194e081d759a51117e8fe863f1380cf/build/config/sanitizers/BUILD.gn
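
To illustrate the link-line cleanup described in the commit message above, here is a minimal Python sketch. It is not the actual BUILD.gn change; the function name and the library name `example_asan.lib` are hypothetical. It only shows the idea of dropping a plain `foo.lib` input when the same library is already passed via `-wholearchive:foo.lib`.

```python
def drop_redundant_libs(link_args):
    """Drop plain `foo.lib` inputs that are also passed as -wholearchive:foo.lib.

    Hypothetical helper for illustration only; the real fix edited
    build/config/sanitizers/BUILD.gn rather than filtering a list.
    """
    prefix = "-wholearchive:"
    # Collect the libraries that already appear with the wholearchive flag.
    whole = {a[len(prefix):].lower() for a in link_args
             if a.lower().startswith(prefix)}
    # Keep every argument except a bare library that duplicates one of them.
    return [a for a in link_args if a.lower() not in whole]


# "example_asan.lib" is a made-up name standing in for the ASan runtime library.
args = ["main.obj", "example_asan.lib",
        "-wholearchive:example_asan.lib", "kernel32.lib"]
cleaned = drop_redundant_libs(args)
print(cleaned)
```

With only the `-wholearchive:` form left on the line, the LLD bug referenced in the commit message (a bare `foo.lib` suppressing the wholearchive behavior) should no longer be triggered.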

Comment 10 by r...@chromium.org, May 25 2018

Locally I was able to run these suites successfully:
base_unittests
aura_unittests
ipc_tests
views_unittests


Failures:
sbox_integration_tests:
5 tests failed:
    ProcessMitigationsTest.CheckWin10DynamicCodeOptOutPolicySuccess (../../sandbox/win/src/process_mitigations_dyncode_unittest.cc:486)
    ProcessMitigationsTest.CheckWin10ImageLoadPreferSys32_Baseline (../../sandbox/win/src/process_mitigations_imageload_unittest.cc:420)
    ProcessMitigationsTest.CheckWin10ImageLoadPreferSys32_Failure (../../sandbox/win/src/process_mitigations_imageload_unittest.cc:461)
    ProcessMitigationsTest.CheckWin10ImageLoadPreferSys32_Success (../../sandbox/win/src/process_mitigations_imageload_unittest.cc:441)
    ProcessMitigationsTest.CheckWin81DynamicCodePolicySuccess (../../sandbox/win/src/process_mitigations_dyncode_unittest.cc:405)

There was a major issue with unit_tests. ASan makes the executable about 5 times bigger, which seems to drastically slow down process startup, which makes it hard to run the tests. It looks like we can disable CFG as a workaround, but we should also try to make ASan generate smaller code.

Comment 11 by r...@chromium.org, May 30 2018

I filed issue 846966 to document the ASan+CFG issue, and I have https://chromium-review.googlesource.com/c/chromium/src/+/1074337 out to disable CFG when ASan is enabled.

Comment 12 by r...@chromium.org, May 30 2018

New passes:
net_unittests
setup_unittests
components_unittests
crashpad_tests

I'll try the browser tests next, but they are large and likely to fail.

Comment 13 by r...@chromium.org, May 31 2018

I relanded the "make the bots 64-bit" CL, but I forgot to include the bug number.

This build includes the change, so it should give us 64-bit results soon:
https://ci.chromium.org/buildbot/chromium.clang/CrWinAsan/670
Looks still pretty broken:

steps
failed base_unittests
failed blink_heap_unittests
exception browser_tests
failed components_browsertests
exception components_unittests
exception content_browsertests
failed courgette_unittests
failed extensions_browsertests
failed extensions_unittests
failed gfx_unittests
failed interactive_ui_tests
failed ipc_tests
failed media_blink_unittests
failed media_unittests
failed mojo_unittests
failed net_unittests
failed sbox_integration_tests
failed service_manager_unittests
failed services_unittests
failed setup_unittests
failed sync_integration_tests
failed unit_tests
failed views_unittests
failed wm_unittests

 issue 845010  is prior art for the base_unittests failure
Hm, looks like the bot is randomly busted in 32-bit mode too? And the base_unittest failure bug is probably marked fixed incorrectly.

Comment 16 by r...@chromium.org, Jun 1 2018

Oh, here is a problem:

gfx_unittests gfx_unittests
Run on OS: 'Windows-7-SP1'

Let's not waste time debugging Win7 problems, let's move to Win10 swarming bots.

Comment 17 by r...@chromium.org, Jun 1 2018

I filed an infra bug for that: https://crbug.com/848911

I wasn't able to figure out how to switch the swarming os dimension to win10, but if anyone knows how to do that, let me know.
Cc: jbudorick@chromium.org
+cc John, who would probably know the answer to c#17.

Comment 19 by bugdroid1@chromium.org, Jun 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f2d1e453de33756fb4454dd881ba8fa786bed919

commit f2d1e453de33756fb4454dd881ba8fa786bed919
Author: Reid Kleckner <rnk@google.com>
Date: Fri Jun 01 23:33:32 2018

Run CrWinASan test steps on Windows 10 swarming bots

64-bit WinASan doesn't run very well on Windows 7 SP 1.

TBR=huangml@chromium.org, thakis@chromium.org
BUG=848911, 844398 

Change-Id: If9206d818f301d0959aa3fb0316657ea57921bbb
Reviewed-on: https://chromium-review.googlesource.com/1083613
Reviewed-by: Reid Kleckner <rnk@chromium.org>
Commit-Queue: Reid Kleckner <rnk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#563863}
[modify] https://crrev.com/f2d1e453de33756fb4454dd881ba8fa786bed919/testing/buildbot/chromium.clang.json
[modify] https://crrev.com/f2d1e453de33756fb4454dd881ba8fa786bed919/testing/buildbot/waterfalls.pyl

That helped a lot, but there are still quite a few failing test suites.

https://ci.chromium.org/buildbot/chromium.clang/CrWinAsan/682

Newly passing: base_unittests blink_heap_unittests components_browsertests components_unittests courgette_unittests extensions_browsertests extensions_unittests gfx_unittests interactive_ui_tests ipc_tests media_blink_unittests mojo_unittests net_unittests service_manager_unittests services_unittests setup_unittests sync_integration_tests unit_tests views_unittests wm_unittests

Newly failing (as in, used to pass and now doesn't): blink_platform_unittests (aka  issue 849251 ), content_unittests, remoting_unittests

Still failing: browser_tests, content_browsertests, media_unittests, sbox_integration_tests (but this one has way fewer failures now).

Some of the failures look media-codec related.
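
The "newly passing / newly failing / still failing" split above can be reproduced with plain set arithmetic. The sketch below assumes the comparison baseline is the failing set reported for build 670 earlier in this thread (the comment does not say which build it compared against); the suite names are copied from the thread.

```python
# Suites reported as failed/exception for build 670 (baseline is an assumption).
failing_670 = {
    "base_unittests", "blink_heap_unittests", "browser_tests",
    "components_browsertests", "components_unittests", "content_browsertests",
    "courgette_unittests", "extensions_browsertests", "extensions_unittests",
    "gfx_unittests", "interactive_ui_tests", "ipc_tests",
    "media_blink_unittests", "media_unittests", "mojo_unittests",
    "net_unittests", "sbox_integration_tests", "service_manager_unittests",
    "services_unittests", "setup_unittests", "sync_integration_tests",
    "unit_tests", "views_unittests", "wm_unittests",
}

# Suites reported as failing for build 682 in the comment above.
failing_682 = {
    "blink_platform_unittests", "content_unittests", "remoting_unittests",
    "browser_tests", "content_browsertests", "media_unittests",
    "sbox_integration_tests",
}

newly_passing = sorted(failing_670 - failing_682)  # failed before, pass now
newly_failing = sorted(failing_682 - failing_670)  # passed before, fail now
still_failing = sorted(failing_670 & failing_682)  # failing in both builds

print(len(newly_passing), "newly passing")
print("newly failing:", newly_failing)
print("still failing:", still_failing)
```

Under that assumption, the three computed sets agree with the lists given in the comment.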

Comment 21 by r...@chromium.org, Jun 4 2018

I think we should keep the 64-bit change, because at the very least we're testing what's being used.

As you say, the media/codec failures are probably all related. We can probably get this cleaned up in a week.

Comment 22 by r...@chromium.org, Jun 4 2018

Hm, there's another problem starting here:
https://ci.chromium.org/buildbot/chromium.clang/CrWinAsan/687

[15984/52912] ACTION //third_party/libvpx:libvpx_yasm_action(//build/toolchain/win:win_clang_x64)
FAILED: obj/third_party/libvpx/libvpx_yasm/intrapred_sse2.o
C:/b/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../third_party/yasm/run_yasm.py ./yasm -fwin64 -m amd64 -I../../third_party/libvpx/source/config -I../../third_party/libvpx/source/config/win/x64 -I../../third_party/libvpx/source/libvpx -I. -I../.. -Igen -DCHROMIUM -o obj/third_party/libvpx/libvpx_yasm/intrapred_sse2.o ../../third_party/libvpx/source/libvpx/vpx_dsp/x86/intrapred_sse2.asm
==7012==ERROR: AddressSanitizer failed to allocate 0x00dfd8040000 (961401847808) bytes at 0x002167dc0000 (error code: 1455)
==7012==ReserveShadowMemoryRange failed while trying to map 0xdfd8040000 bytes. Perhaps you're using ulimit -v
[15985/52912] ACTION //third_party/libvpx:libvpx_yasm_action(//build/toolchain/win:win_clang_x64)
FAILED: obj/third_party/libvpx/libvpx_yasm/highbd_subpel_variance_impl_sse2.o
C:/b/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../third_party/yasm/run_yasm.py ./yasm -fwin64 -m amd64 -I../../third_party/libvpx/source/config -I../../third_party/libvpx/source/config/win/x64 -I../../third_party/libvpx/source/libvpx -I. -I../.. -Igen -DCHROMIUM -o obj/third_party/libvpx/libvpx_yasm/highbd_subpel_variance_impl_sse2.o ../../third_party/libvpx/source/libvpx/vpx_dsp/x86/highbd_subpel_variance_impl_sse2.asm
==3768==ERROR: AddressSanitizer failed to allocate 0x00dfd8040000 (961401847808) bytes at 0x002167dc0000 (error code: 1455)
==3768==ReserveShadowMemoryRange failed while trying to map 0xdfd8040000 bytes. Perhaps you're using ulimit -v

It seems that yasm is instrumented with ASan, and the build slaves are still Windows Server 2008 R2, which is probably what's preventing us from reserving 1/8 of the address space.

I guess we need new build slave images for this to be reliable.
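
As a rough check on the numbers in the log above, the following Python sketch works through the shadow-memory arithmetic. It assumes ASan's default mapping of one shadow byte per 8 application bytes (the "1/8 of the address space" mentioned above); the constant name is ours, not taken from the ASan sources.

```python
SHADOW_SCALE = 3  # assumed default: 2**3 = 8 application bytes per shadow byte

# Size ASan tried to reserve, copied from the log line above.
shadow_bytes = 0x00dfd8040000

# Application address range that this much shadow would cover.
covered_bytes = shadow_bytes << SHADOW_SCALE

print(shadow_bytes)                       # decimal size of the reservation
print(shadow_bytes / 2**30, "GiB of shadow")
print(covered_bytes / 2**40, "TiB of application address space covered")
```

The reservation is a bit under 900 GiB of virtual address space, covering just under 7 TiB of application addresses. Windows error code 1455 is ERROR_COMMITMENT_LIMIT, which suggests the reservation was rejected for commit-limit reasons rather than for lack of address space.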

Comment 23 by r...@chromium.org, Jun 11 2018

Blockedon: 848911
We got new slaves, so the c#22 issue is resolved. Now these are the failing tests:

browser_tests
viz_browser_tests
elevation_service_unittests 
headless_browsertests 
sbox_integration_tests 

browser_tests & viz_browser_tests are the same, and most of the ExtensionInstalledBubbleBrowserTest suite is failing, probably for related reasons.

The rest look like one-off crash handling tests that don't work with WinASan.

Comment 24 by r...@chromium.org, Jun 12 2018

Hm, sbox_integration_tests look like they fail without ASan. Maybe they fail only on Windows 10? I notice they are passing here:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win10%20Tests%20x64/24574

But they run on 'Windows-10-15063', not 'Windows-10'. Maybe that explains it?

Comment 25 by r...@chromium.org, Jun 13 2018

There is some unrelated ASan redness caused by my LLVM change, r334313, from last Friday:
https://ci.chromium.org/buildbot/chromium.clang/CrWinAsan%28dll%29/672

Basically, ASan reports an OOB in ATL code at:
#0 0x7ffcc56d7ee3 in ATL::CAtlComModule::Term C:\b\c\win_toolchain\vs_files\3bc0ec615cf20ee342f3bc29bc991b5ad66d8d2c\VC\Tools\MSVC\14.14.26428\atlmfc\include\atlbase.h:2619
#1 0x7ffce3fe8672 in o__execute_onexit_table+0xd2 (C:\Windows\System32\ucrtbase.dll+0x180008672)

On the global ATL::__pobjMapEntryLast, which is defined like so:
#pragma section("ATL$__a", read)
#pragma section("ATL$__z", read)
#pragma section("ATL$__m", read)
extern "C"
{
__declspec(selectany) __declspec(allocate("ATL$__a")) _ATL_OBJMAP_ENTRY_EX* __pobjMapEntryFirst = NULL;
__declspec(selectany) __declspec(allocate("ATL$__z")) _ATL_OBJMAP_ENTRY_EX* __pobjMapEntryLast = NULL;
}

#pragma comment(linker, "/merge:ATL=.rdata")

Obviously, the ATL section is intended to be sorted and treated as an array at runtime. The ASan instrumentation pass has a special case for .CRT sections, but we should widen it to any section with '$' in the name. I have a quick fix for that coming soon.
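
A toy model of why '$' sections matter, in Python. This is not LLVM's or the linker's actual code: it only mimics how contributions to `ATL$__a`, `ATL$__m` and `ATL$__z` end up ordered by name, so that the first and last markers bracket the entries in between. The `entry_for_class_*` symbols and both function names are hypothetical.

```python
def is_sorted_section(name):
    """Proposed widened check: any section with '$' in its name, not just .CRT."""
    return "$" in name


def merge_sections(contributions):
    """Order (section_name, symbol) pairs by section name, as a stand-in
    for the linker sorting '$'-suffixed section contributions."""
    return [sym for _, sym in sorted(contributions, key=lambda c: c[0])]


# Contributions in arbitrary object-file order; entry names are made up.
contribs = [
    ("ATL$__z", "__pobjMapEntryLast"),
    ("ATL$__m", "entry_for_class_b"),
    ("ATL$__a", "__pobjMapEntryFirst"),
    ("ATL$__m", "entry_for_class_a"),
]

layout = merge_sections(contribs)
print(layout)
print(is_sorted_section(".CRT$XCU"), is_sorted_section(".rdata"))
```

Runtime code walks from the first marker to the last as if the whole range were one array. If the instrumentation pads each of these globals with redzones, that walk runs into poisoned memory, which is consistent with the OOB report above.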

Comment 26 by bugdroid1@chromium.org, Jun 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4a80148f2b4850e24af2ae1326619d4b90be4841

commit 4a80148f2b4850e24af2ae1326619d4b90be4841
Author: Reid Kleckner <rnk@google.com>
Date: Wed Jun 13 20:47:48 2018

Test WinASan on the same OS version as the main win10 bots

The chromium.win Windows 10 testers use 'Windows-10-15063' as the
swarming os dimension, not 'Windows-10', as I did for the ASan bots.  It
seems that sbox_integration_tests fails on newer versions of Windows 10,
and some other tests might as well. For now, make sure these bots match
the main waterfall. For every test that this fixes, I'll file a
follow-up bug about that test not passing on recent versions of Windows.

TBR=thakis@chromium.org
BUG= 844398 

Change-Id: I959f2d19459c722e54da9dd09b5fc169e3b171fd
Reviewed-on: https://chromium-review.googlesource.com/1099567
Reviewed-by: Reid Kleckner <rnk@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Reid Kleckner <rnk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#566977}
[modify] https://crrev.com/4a80148f2b4850e24af2ae1326619d4b90be4841/testing/buildbot/chromium.clang.json
[modify] https://crrev.com/4a80148f2b4850e24af2ae1326619d4b90be4841/testing/buildbot/waterfalls.pyl
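
The difference between the two os dimensions in the commit message above can be sketched as follows. This is a simplified model, assuming each swarming bot advertises several `os` values from general to specific and a task runs on any bot that advertises the requested value. The bot names and the `Windows-10-16299` value are hypothetical.

```python
def matches(bot_dimensions, requested):
    """True if the bot advertises every requested dimension value."""
    return all(value in bot_dimensions.get(key, ())
               for key, value in requested.items())


def eligible(bots, requested):
    """Names of all bots that could run a task with these dimensions."""
    return sorted(name for name, dims in bots.items()
                  if matches(dims, requested))


# Hypothetical pool: one bot on build 15063, one on a newer Windows 10 build.
bots = {
    "bot-old": {"os": ["Windows", "Windows-10", "Windows-10-15063"]},
    "bot-new": {"os": ["Windows", "Windows-10", "Windows-10-16299"]},
}

print(eligible(bots, {"os": "Windows-10"}))        # any Windows 10 bot
print(eligible(bots, {"os": "Windows-10-15063"}))  # only the pinned build
```

Under this model, requesting plain 'Windows-10' can land a test on a newer Windows 10 build, while 'Windows-10-15063' pins it to the same version the main waterfall uses.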

Comment 27 by r...@chromium.org, Jun 13 2018

r334653 should solve the ATL issues in c#25, and help fix CrWinASan(dll).

Comment 28 by r...@chromium.org, Jun 14 2018

OK, now these bots are completely on fire. Basically everything is timing out.  One step forward, two steps back.

Comment 29 by r...@chromium.org, Jun 14 2018

All that purple was caused by more wholearchive issues, described in  https://crbug.com/852679 . Tomorrow we should see fewer failures.

Comment 30 by r...@chromium.org, Jun 18 2018

Blockedon: 805414

Comment 31 by r...@chromium.org, Jun 19 2018

The browser_tests and viz_browsertests failures look like an ASan true positive. We've found a bug in Chrome, woohoo! I filed  https://crbug.com/854355  for it, and will disable the tests soon.

Comment 32 by bugdroid1@chromium.org, Jun 20 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/21fbf9f352d4b328f0090527567b27887772ce46

commit 21fbf9f352d4b328f0090527567b27887772ce46
Author: Reid Kleckner <rnk@google.com>
Date: Wed Jun 20 00:47:36 2018

Disable ExtensionInstalledBubbleBrowserTest.InvokeUI_* tests

Windows ASan reports a use-after-free on an ExtensionInstalledBubble
object. While these tests don't fail on other bots, it's likely that
they are flaky elsewhere, so we should disable them until the issue is
resolved.

R=rdevlin.cronin@chromium.org
BUG= 854355 , 844398 

Change-Id: I19f5d646022847ab20a9fef54c5917a13153805c
Reviewed-on: https://chromium-review.googlesource.com/1107001
Reviewed-by: Devlin <rdevlin.cronin@chromium.org>
Commit-Queue: Reid Kleckner <rnk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#568668}
[modify] https://crrev.com/21fbf9f352d4b328f0090527567b27887772ce46/chrome/browser/ui/extensions/extension_installed_bubble_browsertest.cc

Question: Do you see either of these two crashes on the CrWinASan bots?
https://clusterfuzz.com/v2/testcase-detail/5004843543166976
https://clusterfuzz.com/v2/testcase-detail/5772750468415488

Any idea what could be causing the flaky failure to load the Chrome DLL? This has been the top crasher on ClusterFuzz (https://clusterfuzz.com/v2/crash-stats?block=day&days=7&end=424863&group=platform&job=windows_asan_chrome&number=count&sort=total_count) since the Win64 deployment and is causing a lot of pain for regression range calculation. Any help debugging this is highly appreciated.

1)
[0424/175725.320:ERROR:main_dll_loader_win.cc(134)] Failed to load Chrome DLL from c:\clusterfuzz\slave-bot\builds\chromium-browser-asan_win32-release_x64_e8abf88e7a5ec8bcd0cd391cfae402f143e8ddb2\symbolized\release\asan-win32-release_x64-552025\chrome_child.dll: A dynamic link library (DLL) initialization routine failed. (0x45A)
[0424/175725.327:ERROR:main_dll_loader_win.cc(134)] Failed to load Chrome DLL from c:\clusterfuzz\slave-bot\builds\chromium-browser-asan_win32-release_x64_e8abf88e7a5ec8bcd0cd391cfae402f143e8ddb2\symbolized\release\asan-win32-release_x64-552025\chrome_child.dll: A dynamic link library (DLL) initialization routine failed. (0x45A)
=================================================================
==4388==ERROR: AddressSanitizer: access-violation on unknown address 0x7ffff938a0c4 (pc 0x7ffff938a0c4 bp 0x00000000000a sp 0x0031601ffc88 T0)

2) [0304/095320.589:ERROR:main_dll_loader_win.cc(134)] Failed to load Chrome DLL from c:\clusterfuzz\slave-bot\builds\chromium-browser-asan_win32-release_x64_e8abf88e7a5ec8bcd0cd391cfae402f143e8ddb2\revisions\asan-win32-release_x64-539018\chrome_child.dll: A dynamic link library (DLL) initialization routine failed. (0x45A)
=================================================================
==2516==ERROR: AddressSanitizer: access-violation on unknown address 0x7ffff960b544 (pc 0x7ffff960b544 bp 0x00000000000a sp 0x006b771ffbf8 T0)
SCARINESS: 10 (signal)
#0 0x7ffff960b543  (<unknown module>)
#1 0x7ff8463a7ec7 in RtlProcessFlsData (C:\Windows\SYSTEM32\ntdll.dll+0x180007ec7)
#2 0x7ff8463a7fb5 in LdrShutdownProcess (C:\Windows\SYSTEM32\ntdll.dll+0x180007fb5)
#3 0x7ff8463a7d93 in RtlExitUserProcess (C:\Windows\SYSTEM32\ntdll.dll+0x180007d93)
Cc: thakis@chromium.org
For c#33, looks like we are seeing  bug 635715  all over again. Same stack. Any ideas?

Comment 35 by r...@chromium.org, Jun 20 2018

 bug 635715  was ultimately caused by binaries that were too large. That shouldn't really affect 64-bit ASan, even if ASan-ified chrome binaries are still too huge.

One thing that's interesting is:
ERROR: AddressSanitizer: access-violation on unknown address 0x7ffff938a0c4 (pc 0x7ffff938a0c4

Note that the pc is equal to the faulting address. Sounds like some kind of page protection bug, where we try to execute from a page that isn't executable.
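
The "pc equals the faulting address" observation can be checked mechanically across many reports. A minimal Python sketch, assuming the ASan error line keeps the format seen in the two logs above; the regular expression and function name are ours, not part of any ASan or ClusterFuzz tooling.

```python
import re

# Matches the address and pc fields of an ASan access-violation line.
ASAN_RE = re.compile(r"on unknown address (0x[0-9a-f]+) \(pc (0x[0-9a-f]+)")


def pc_equals_fault_address(line):
    """Return True/False for a matching line, or None if it doesn't match."""
    m = ASAN_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16) == int(m.group(2), 16)


# The two error lines quoted earlier in this thread.
line1 = ("==4388==ERROR: AddressSanitizer: access-violation on unknown address "
         "0x7ffff938a0c4 (pc 0x7ffff938a0c4 bp 0x00000000000a sp 0x0031601ffc88 T0)")
line2 = ("==2516==ERROR: AddressSanitizer: access-violation on unknown address "
         "0x7ffff960b544 (pc 0x7ffff960b544 bp 0x00000000000a sp 0x006b771ffbf8 T0)")

print(pc_equals_fault_address(line1), pc_equals_fault_address(line2))
```

Both quoted reports show the same pattern, so the fault happens while fetching the instruction itself rather than while reading or writing data.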
Status: Fixed (was: Assigned)
We had gotten the set of ASan failures down to a few browser_tests. I don't think the remaining failures are related to the 64-bit transition anymore. Today we're seeing a large number of browser_tests timeouts, but that's unrelated and I'll file a new bug for it.
