x86-generic-tot-asan-informational failures: ASAN unable to mmap / pthread_create() failure
Issue description
The x86-generic-tot-asan-informational builder (https://build.chromium.org/p/chromiumos.chromium/builders/x86-generic-tot-asan-informational) is failing almost constantly, mostly in login_Cryptohome. This may or may not be related to issue 618392.
Sep 20 2016
Output from the most recent failure: https://pantheon.corp.google.com/storage/browser/chromeos-image-archive/x86-generic-tot-asan-informational/R55-8819.0.0-b11989/vm_test_results_1/test_harness/all/SimpleTestVerify/1_autotest_tests/results-33-login_Cryptohome/debug/
Output snippet:
09/20 08:27:06.773 INFO | oobe:0039| Invoking Oobe.loginForTesting
09/20 08:27:09.798 ERROR| browser:0062| Failure while starting browser backend.
Traceback (most recent call last):
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__
    self._browser_backend.Start()
  File "/usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py", line 163, in Start
    self._gaia_id, not self.browser_options.disable_gaia_services)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/oobe.py", line 61, in NavigateFakeLogin
    enterprise_enroll)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/oobe.py", line 40, in _ExecuteOobeApi
    self.WaitForJavaScriptExpression("typeof Oobe == 'function'", 20)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 123, in WaitForJavaScriptExpression
    util.WaitFor(IsJavaScriptExpressionTrue, timeout)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/core/util.py", line 86, in WaitFor
    res = condition()
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 116, in IsJavaScriptExpressionTrue
    return bool(self.EvaluateJavaScript(expr))
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 187, in EvaluateJavaScript
    expr, context_id=None, timeout=timeout)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 215, in EvaluateJavaScriptInContext
    expr, context_id=context_id, timeout=timeout)
  File "/usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 37, in inner
    inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
  File "/usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
    return func(*args, **kwargs)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 34, in inner
    return func(inspector_backend, *args, **kwargs)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 208, in EvaluateJavaScript
    return self._runtime.Evaluate(expr, context_id, timeout)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_runtime.py", line 45, in Evaluate
    res = self._inspector_websocket.SyncRequest(request, timeout)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 110, in SyncRequest
    res = self._Receive(timeout)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 149, in _Receive
    data = self._socket.recv()
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 596, in recv
    opcode, data = self.recv_data()
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 606, in recv_data
    frame = self.recv_frame()
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 637, in recv_frame
    self._frame_header = self._recv_strict(2)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 746, in _recv_strict
    bytes = self._recv(shortage)
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 739, in _recv
    raise WebSocketConnectionClosedException()
DevtoolsTargetCrashException: Devtools target crashed
********************************************************************************
(/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:394 _ConvertExceptionFromInspectorWebsocket)
Original exception:
********************************************************************************
(/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:415 _AddDebuggingInformation)
Received a socket error in the browser connection and the tab no longer exists. The tab probably crashed.
********************************************************************************
(/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:416 _AddDebuggingInformation)
Debugger url: ws://127.0.0.1:60286/devtools/page/cb1638a9-1128-4679-8367-0d910dab0a2a
Found Minidump: False
Stack Trace:
********************************************************************************
Cannot get stack trace on CrOS
********************************************************************************
Standard output:
********************************************************************************
Cannot get standard output on CrOS
********************************************************************************
Sep 20 2016
Scanning through the Chrome logs, I do not see any smoking guns.
However, there are two ASAN log entries:
==26899==ERROR: AddressSanitizer failed to allocate 0xb09000 (11571200) bytes of FakeStack (error code: 12)
ERROR: Failed to mmap
==26899==Process memory map follows:
0x00155000-0x00156000
AND:
==28561==ERROR: AddressSanitizer failed to allocate 0xb09000 (11571200) bytes of FakeStack (error code: 12)
==28561==Process memory map follows:
...
==28561==End of process memory map.
==28561==AddressSanitizer CHECK failed: /var/tmp/portage/sys-devel/llvm-3.9_pre265926-r11/work/llvm-3.9_pre265926/projects/compiler-rt/lib/sanitizer_common/sanitizer_common.cc:183 "((0 && "unable to mmap")) != (0)" (0x0, 0x0)
#0 0x56812206 (/opt/google/chrome/chrome+0x1201206)
#1 0x56819c59 (/opt/google/chrome/chrome+0x1208c59)
#2 0x56819e6d (/opt/google/chrome/chrome+0x1208e6d)
#3 0x56820f81 (/opt/google/chrome/chrome+0x120ff81)
#4 0x56766541 (/opt/google/chrome/chrome+0x1155541)
#5 0x568167c6 (/opt/google/chrome/chrome+0x12057c6)
#6 0x56766c58 (/opt/google/chrome/chrome+0x1155c58)
#7 0x5f4fa959 (/opt/google/chrome/chrome+0x9ee9959)
#8 0x5f4fcb03 (/opt/google/chrome/chrome+0x9eebb03)
#9 0x56816bff (/opt/google/chrome/chrome+0x1205bff)
#10 0x5676cb9e (/opt/google/chrome/chrome+0x115bb9e)
#11 0x5534d584 (/lib/libpthread.so.0+0x6584)
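The "error code: 12" in these headers is a Linux errno value: 12 is ENOMEM, which mmap returns when it cannot satisfy a request. A small Python sketch for pulling the PID, size, and errno name out of such headers when scanning many asan_log files; the regex is written against the lines quoted in this thread, not against any documented ASAN log format:

```python
import errno
import re

# Pattern inferred from the ASAN headers quoted in this thread; it is not an
# official ASAN log grammar.
ALLOC_FAILURE_RE = re.compile(
    r"==(?P<pid>\d+)==ERROR: AddressSanitizer failed to allocate "
    r"0x(?P<hex>[0-9a-fA-F]+) \((?P<dec>\d+)\) bytes of (?P<what>\w+) "
    r"\(error code: (?P<code>\d+)\)"
)


def parse_asan_alloc_failure(line):
    """Return a dict describing an ASAN allocation failure, or None."""
    m = ALLOC_FAILURE_RE.search(line)
    if m is None:
        return None
    size = int(m.group("hex"), 16)
    # The header prints the same size in hex and decimal; they should agree.
    if size != int(m.group("dec")):
        raise ValueError("hex and decimal sizes disagree: " + line)
    code = int(m.group("code"))
    return {
        "pid": int(m.group("pid")),
        "bytes": size,
        "what": m.group("what"),
        # errno.errorcode maps numbers to symbolic names, e.g. 12 -> "ENOMEM".
        "errno": errno.errorcode.get(code, str(code)),
    }
```

For the first header above this yields pid 26899, 11571200 bytes of FakeStack, and ENOMEM, i.e. the mmap call itself was refused.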
Sep 20 2016
achuith@, ihf@, I don't suppose either of you has any idea how to get symbols from the asan_log files?
Sep 20 2016
or oshima@?
Sep 20 2016
I looked at it. I can't help with symbols, but this crash is typically OOM. ASAN uses extra memory, so maybe we are bumping against the limits? Or maybe there is a leak that it doesn't catch.
I think memory looks OK before/after results-31-desktopui_KillRestart/desktopui_KillRestart.session/sysinfo/iteration.1:
MemTotal: 2068296 kB MemFree: 1004168 kB MemAvailable: 1662676 kB
Before results-32-security_RootCA/security_RootCA/sysinfo/iteration.1 we have:
MemTotal: 2068296 kB MemFree: 157196 kB MemAvailable: 817508 kB
Very curious what happened between the two.
Now results-33-login_Cryptohome/login_Cryptohome/sysinfo/iteration.1, before:
MemTotal: 2068296 kB MemFree: 51124 kB MemAvailable: 686152 kB
After (the crash seems to clean up usage):
MemTotal: 2068296 kB MemFree: 1042492 kB MemAvailable: 1660620 kB
In results-34-security_RestartJob/security_RestartJob/sysinfo/iteration.1, usage remains reasonable:
MemTotal: 2068296 kB MemFree: 425784 kB MemAvailable: 1046288 kB
MemTotal: 2068296 kB MemFree: 422540 kB MemAvailable: 1043044 kB
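The drop between snapshots can be quantified directly from the sysinfo meminfo files. A minimal Python sketch, assuming the snapshots are in the "Key: value kB" form quoted here (only kB-valued fields are picked up; the helper names are made up for illustration):

```python
import re


def parse_meminfo(text):
    """Parse 'Key: value kB' pairs (one per line or run together) into ints (kB)."""
    return {key: int(value) for key, value in re.findall(r"(\w+):\s+(\d+)\s*kB", text)}


def meminfo_drop(before, after, keys=("MemFree", "MemAvailable")):
    """How many kB each key fell from the `before` snapshot to the `after` one."""
    b, a = parse_meminfo(before), parse_meminfo(after)
    return {key: b[key] - a[key] for key in keys}
```

For the results-31 and results-32 snapshots above, this reports MemFree falling by 846972 kB and MemAvailable by 845168 kB, i.e. about 825 MiB went missing between the two tests.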
Sep 21 2016
Investigating a different failure with similar symptoms: https://build.chromium.org/p/chromiumos.chromium/builders/x86-generic-tot-asan-informational/builds/11996
VMTest1 only fails in desktopui_ScreenLocker: https://pantheon.corp.google.com/storage/browser/chromeos-image-archive/x86-generic-tot-asan-informational/R55-8823.0.0-b11996/vm_test_results_1/test_harness/all/SimpleTestVerify/1_autotest_tests/results-20-desktopui_ScreenLocker
There are 2 asan_log files that start with:
==15117==ERROR: AddressSanitizer failed to allocate 0xb09000 (11571200) bytes of FakeStack (error code: 12)
However, the starting meminfo looks fine:
MemTotal: 2067756 kB MemFree: 218796 kB MemAvailable: 896920 kB
It also contains similar debug info (from desktopui_ScreenLocker.INFO):
09/21 05:47:29.093 INFO | oobe:0039| Invoking Oobe.loginForTesting
09/21 05:47:29.911 ERROR| browser:0062| Failure while starting browser backend.
Traceback (most recent call last):
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__
    self._browser_backend.Start()
  ...
  File "/usr/local/telemetry/src/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 739, in _recv
    raise WebSocketConnectionClosedException()
DevtoolsTargetCrashException: Devtools target crashed
********************************************************************************
(/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:394 _ConvertExceptionFromInspectorWebsocket)
Original exception:
********************************************************************************
(/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:415 _AddDebuggingInformation)
Received a socket error in the browser connection and the tab no longer exists. The tab probably crashed.
My question now is: is the catapult output the actual failure (which would suggest that the renderer process is crashing), or a symptom of ASAN failing on something else?
Sep 27 2016
I tried to reproduce this locally but failed:
(cr) ~/trunk/src/scripts $ KEEP_CHROME_DEBUG_SYMBOLS=1 USE="accessibility asan autotest build_tests buildcheck chrome_debug chrome_remoting clang cups evdev_gestures fonts gn gold highdpi nacl opengles ozone runhooks v4l2_codec vaapi xkbcommon -X -afdo_use -app_shell -chrome_debug_tests chrome_internal -chrome_media -component_build -envoy -hardfp -internal_gles_conform -internal_khronos_glcts -mojo -opengl -v4lplugin -verbose -vtable_verify" emerge-${BOARD} chromeos-chrome
(cr) ~/trunk/src/scripts $ ./build_image --board=${BOARD} test && ./image_to_vm.sh --board=${BOARD} --test_image
~/trunk/src/scripts $ ./bin/cros_run_vm_test --no_graphics --board=x86-generic suite:smoke --results_dir_root /tmp/vm_tests_asan
All tests passed.
-> current gardener to try and investigate
Sep 27 2016
Note: In comment #8 $BOARD = x86-generic
Oct 24 2016
Oct 24 2016
-> Current gardener
Oct 24 2016
Oct 31 2016
Still failing this week. -> current gardener (me!) Steven, for your attempt to repro in #8: where did you get those USE flags?
Oct 31 2016
So I found this occasionally in /var/log/ui/ui.<foo> for failing test runs:
[4311:4312:1031/131541:ERROR:platform_thread_posix.cc(119)] pthread_create: Resource temporarily unavailable
[4311:4312:1031/131541:FATAL:child_thread_impl.cc(160)] Check failed: CreateWaitAndExitThread(base::TimeDelta::FromSeconds(60)).
#0 0x00005672d785 __interceptor_backtrace
#1 0x00005eb34cca base::debug::StackTrace::StackTrace()
#2 0x00005eb88915 logging::LogMessage::~LogMessage()
#3 0x000068551633 content::(anonymous namespace)::SuicideOnChannelErrorFilter::OnChannelError()
#4 0x000061b29270 IPC::ChannelProxy::Context::OnChannelError()
#5 0x000061b5ec78 IPC::SyncChannel::SyncContext::OnChannelError()
#6 0x000061b2ac06 IPC::ChannelProxy::Context::OnSendMessage()
#7 0x000061b3112b _ZN4base8internal7InvokerINS0_9BindStateIMN3IPC12ChannelProxy7ContextEFvSt10unique_ptrINS3_7MessageESt14default_deleteIS7_EEEJ13scoped_refptrIS5_ENS0_13PassedWrapperISA_EEEEEFvvEE3RunEPNS0_13BindStateBaseE
#8 0x00005ed91a5e base::debug::TaskAnnotator::RunTask()
#9 0x00005ebae1a7 base::MessageLoop::RunTask()
#10 0x00005ebaf0ab base::MessageLoop::DeferOrRunPendingTask()
#11 0x00005ebb0381 base::MessageLoop::DoWork()
#12 0x00005ebbae78 base::MessagePumpLibevent::Run()
#13 0x00005ebad658 base::MessageLoop::RunHandler()
#14 0x00005ec49267 base::RunLoop::Run()
#15 0x00005ecce121 base::Thread::Run()
#16 0x00005ecce4d6 base::Thread::ThreadMain()
#17 0x00005ecbb0ab base::(anonymous namespace)::ThreadFunc()
#18 0x0000567c4080 __asan::AsanThread::ThreadStart()
#19 0x00005671a01f asan_thread_start()
#20 0x0000552b3585 <unknown>
#21 0x000054d9700e clone
There's some discussion of this in issue 552097. It would happen if pthread_create failed. So far this doesn't seem to correlate exactly with failing tests, but it seems suspicious.
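"Resource temporarily unavailable" is the strerror() text for EAGAIN, which POSIX specifies pthread_create() returns when the system lacks the resources to create another thread or a thread-count limit would be exceeded. To check whether these lines line up with failing runs, a small Python filter can pick them out of the UI logs. The line layout `[pid:tid:MMDD/HHMMSS:LEVEL:file(line)] message` is inferred from the two lines quoted here, not from a format specification, and the function names are illustrative:

```python
import re

# Layout inferred from the /var/log/ui lines quoted above:
#   [pid:tid:MMDD/HHMMSS:LEVEL:source_file(line)] message
CHROME_LOG_RE = re.compile(
    r"^\[(?P<pid>\d+):(?P<tid>\d+):(?P<time>[^:]+):(?P<level>[A-Z]+):"
    r"(?P<source>[^\]]+)\]\s*(?P<msg>.*)$"
)


def parse_chrome_log_line(line):
    """Split one log line into its fields (all strings), or return None."""
    m = CHROME_LOG_RE.match(line)
    return m.groupdict() if m else None


def thread_creation_failures(lines):
    """Parsed records for pthread_create errors and any FATAL lines."""
    hits = []
    for line in lines:
        rec = parse_chrome_log_line(line)
        if rec is None:
            continue  # stack frames, blank lines, other noise
        if rec["msg"].startswith("pthread_create:") or rec["level"] == "FATAL":
            hits.append(rec)
    return hits
```

Feeding it every /var/log/ui/ui.* file from a results directory (for example via `glob.glob` and `open()`) gives a per-run list that can be compared against the pass/fail status of the same run.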
Oct 31 2016
Talked to rockot@ about this. That stack is suspicious for memory exhaustion. (It could be some other system resource, but I'm not sure what.) I think what's happening in that SuicideOnChannelErrorFilter class is:
* Renderer detects that the browser process died (got an error on the IPC channel)
* Renderer tries to spawn a thread to kill itself 60 seconds later
* Renderer can't spawn the thread because pthread_create() fails
Also, I can reproduce the problem in a local VM, using flags similar to #8, but only if I run the entire test suite. If I just run the first failing test (security_NetworkListeners), it passes. Does anyone know how to get cros_run_vm_test to run just 2 (or N) tests instead of a whole suite? Or how to get memory information out of the VM?
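The three steps above can be sketched as a Python analogue. This is not Chrome's code: `create_wait_and_exit_thread` and `on_channel_error` are hypothetical names modeled on `CreateWaitAndExitThread` and `SuicideOnChannelErrorFilter::OnChannelError` from the stack trace, and the thread class is injectable so the failure path can be exercised. In CPython, `Thread.start()` raises `RuntimeError` when the underlying thread creation fails, which stands in for pthread_create() returning EAGAIN:

```python
import threading
import time


def create_wait_and_exit_thread(delay_s, exit_fn, thread_cls=threading.Thread):
    """Hypothetical analogue of CreateWaitAndExitThread().

    Starts a daemon thread that calls exit_fn() after delay_s seconds.
    Returns False instead of raising if the thread could not be created.
    """
    def wait_then_exit():
        time.sleep(delay_s)
        exit_fn()

    try:
        thread_cls(target=wait_then_exit, daemon=True).start()
    except RuntimeError:
        # CPython raises RuntimeError("can't start new thread") when the
        # OS-level thread creation (pthread_create on Linux) fails.
        return False
    return True


def on_channel_error(exit_fn, thread_cls=threading.Thread):
    """Hypothetical analogue of SuicideOnChannelErrorFilter::OnChannelError()."""
    # The browser end of the IPC channel is gone, so schedule our own exit in
    # 60 seconds. Failing to even start that thread is fatal, like the CHECK.
    if not create_wait_and_exit_thread(60, exit_fn, thread_cls):
        raise SystemExit("Check failed: CreateWaitAndExitThread(60 s)")
```

The point the sketch makes is that the crash is a second-order effect: the renderer only reaches the fatal CHECK because the process can no longer create threads at all.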
Oct 31 2016
James, is it possible that the VM has too little memory? We run it with 2 GB, but that may not be sufficient for the ASAN build. It's set here: https://cs.corp.google.com/chromeos_public/src/scripts/lib/cros_vm_lib.sh?l=294
We increase it for moblab here: https://cs.corp.google.com/chromeos_public/src/scripts/lib/cros_vm_lib.sh?l=248
We may want to increase it for ASAN by using this function: https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/cros/asan.py
Do you want to see if this works locally?
Nov 1 2016
Giving the VM more memory didn't seem to help. I changed it to 4 GB by changing https://cs.corp.google.com/chromeos_public/src/scripts/lib/cros_vm_lib.sh?l=294 to "-m 4G", just like moblab does. The VM picked up the change. Here's meminfo.before from the first test:
MemTotal: 4148272 kB MemFree: 2542696 kB MemAvailable: 3227604 kB
However, security_NetworkListeners still fails (and is the first test to fail):
Unhandled DevtoolsTargetCrashException: Devtools target crashed
For the record, here are all the failing tests:
results-12-security_NetworkListeners/security_NetworkListeners FAIL: Unhandled unicode: Unhandled DevtoolsTargetCrashException: Devtools target crashed
results-20-desktopui_ScreenLocker/desktopui_ScreenLocker FAIL: Unhandled unicode: Unhandled DevtoolsTargetCrashException: Devtools target crashed
results-23-security_SandboxedServices/security_SandboxedServices FAIL: One or more processes failed sandboxing
results-25-login_OwnershipTaken/login_OwnershipTaken FAIL: Unhandled unicode: Unhandled DevtoolsTargetCrashException: Devtools target crashed
results-35-login_Cryptohome/login_Cryptohome FAIL: Unhandled unicode: Unhandled DevtoolsTargetCrashException: Devtools target crashed
results-38-login_OwnershipNotRetaken/login_OwnershipNotRetaken FAIL: Unhandled unicode: Unhandled DevtoolsTargetCrashException: Devtools target crashed
results-42-login_LogoutProcessCleanup/login_LogoutProcessCleanup FAIL: Unhandled unicode: Unhandled DevtoolsTargetCrashException: Devtools target crashed
Maybe this isn't memory exhaustion after all. I see some discussion of things other than memory that can cause pthread_create to fail: http://unix.stackexchange.com/questions/253903/creating-threads-fails-with-resource-temporarily-unavailable-with-4-3-kernel
Other ideas?
Nov 1 2016
Are we sure there's not a real memory leak here? Nothing in the asan logs?
Nov 1 2016
Hrm, I assume this is the right place to look for ASAN logs: results-1-security_NetworkListeners/security_NetworkListeners/sysinfo/var/log_diff/asan
If so, yes there is an error there:
==3025==ERROR: AddressSanitizer failed to allocate 0xb09000 (11571200) bytes of FakeStack (error code: 12)
==3025==Process memory map follows:
<<<big memory map>>>
==3025==AddressSanitizer CHECK failed: /var/tmp/portage/sys-devel/llvm-3.9_pre265926-r13/work/llvm-3.9_pre265926/projects/compiler-rt/lib/sanitizer_common/sanitizer_common.cc:183 "((0 && "unable to mmap")) != (0)" (0x0, 0x0)
#0 0x567ed4c6 (/opt/google/chrome/chrome+0x12484c6)
#1 0x567f4f19 (/opt/google/chrome/chrome+0x124ff19)
#2 0x567f512d (/opt/google/chrome/chrome+0x125012d)
#3 0x567fc241 (/opt/google/chrome/chrome+0x1257241)
#4 0x56741801 (/opt/google/chrome/chrome+0x119c801)
#5 0x567f1a86 (/opt/google/chrome/chrome+0x124ca86)
#6 0x56741f18 (/opt/google/chrome/chrome+0x119cf18)
#7 0x5ecddc79 (/opt/google/chrome/chrome+0x9738c79)
#8 0x5ecdfe23 (/opt/google/chrome/chrome+0x973ae23)
#9 0x567f1ebf (/opt/google/chrome/chrome+0x124cebf)
#10 0x56747e5e (/opt/google/chrome/chrome+0x11a2e5e)
#11 0x552e1584 (/lib/libpthread.so.0+0x6584)
So something (ASAN itself?) is attempting to allocate 11 MB of "FakeStack" and failing, which causes a CHECK inside ASAN.
I see that libpthread is the last thing on the stack, so maybe this is the pthread_create() call resulting in ASAN dying internally.
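The odd-looking 0xb09000 request can be reproduced arithmetically. The sketch below assumes the FakeStack sizing rule in compiler-rt as recalled from asan_fake_stack.h: one region per frame-size class (64 B to 64 KiB), plus flag bytes, plus a 4096-byte header, with the thread's stack size rounded up to a power of two and clamped by the min/max_uar_stack_size_log flags. The constant names are paraphrased and the flag defaults (16 and 20) are assumptions to check against the LLVM 3.9 sources:

```python
# Constants as recalled from compiler-rt (asan_fake_stack.h and the
# min/max_uar_stack_size_log flag defaults); verify against the LLVM 3.9
# sources before relying on them.
MIN_FRAME_SIZE_LOG = 6        # smallest fake frame: 64 bytes
MAX_FRAME_SIZE_LOG = 16       # largest fake frame: 64 KiB
NUM_SIZE_CLASSES = MAX_FRAME_SIZE_LOG - MIN_FRAME_SIZE_LOG + 1  # 11
FLAGS_OFFSET = 4096           # space for the FakeStack header before the flags
MIN_UAR_STACK_SIZE_LOG = 16   # assumed default of min_uar_stack_size_log
MAX_UAR_STACK_SIZE_LOG = 20   # assumed default of max_uar_stack_size_log


def stack_size_log(thread_stack_bytes):
    """ceil(log2(stack size)), clamped to the [min, max] UAR flag range."""
    if thread_stack_bytes <= 0:
        raise ValueError("stack size must be positive")
    log = (thread_stack_bytes - 1).bit_length()  # ceil(log2(n)) for n >= 1
    return max(MIN_UAR_STACK_SIZE_LOG, min(log, MAX_UAR_STACK_SIZE_LOG))


def fake_stack_required_size(log):
    """Bytes mmap'ed for one thread's FakeStack at a given stack_size_log."""
    frames = (1 << log) * NUM_SIZE_CLASSES  # one 2**log region per size class
    # Flag bytes for all size classes: a geometric sum bounded by 2 * 2**(log - 6).
    flags = 1 << (log + 1 - MIN_FRAME_SIZE_LOG)
    return FLAGS_OFFSET + flags + frames
```

With an 8 MB thread stack the log is clamped to 20, giving 4096 + 32768 + 11 × 1 MiB = 11571200 bytes, exactly the 0xb09000 in the log. Under these assumptions, each thread that lazily creates a fake stack needs one contiguous ~11 MB mapping on top of its real stack.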
Full asan log attached.
I see the ASAN CHECK listed in issue 508949, but the sheriffs there just disabled the failing tests.
https://github.com/google/sanitizers/issues/165 mentions this CHECK error message.
Does anyone know how to symbolize the above stack?
Nov 1 2016
The good news is I got a stack. The bad news is that I'm not sure what to make of it.
jamescook@rubella2:/w/chrome/src (configview)$ tools/valgrind/asan/asan_symbolize.py < ~/asan_stack.txt
#0 0x567ed4c6 in __asan::AsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) crtstuff.c:?
#1 0x567f4f19 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ??:?
#2 0x567f512d in __sanitizer::ReportMmapFailureAndDie(unsigned long, char const*, char const*, int, bool) crtstuff.c:?
#3 0x567fc241 in __sanitizer::MmapOrDie(unsigned long, char const*, bool) crtstuff.c:?
#4 0x56741801 in __asan::FakeStack::Create(unsigned long) crtstuff.c:?
#5 0x567f1a86 in __asan::AsanThread::AsyncSignalSafeLazyInitFakeStack() crtstuff.c:?
#6 0x56741f18 in __asan_stack_malloc_0 ??:?
#7 0x5ecddc79 in ?? ../../../chromeos-cache/distfiles/target/chrome-src/src/base/threading/platform_thread_linux.cc:80:0
#8 0x5ecdfe23 in SetCurrentThreadPriority ../../../chromeos-cache/distfiles/target/chrome-src/src/base/threading/platform_thread_posix.cc:249:7
#9 0x5ecdfe23 in ThreadFunc ../../../chromeos-cache/distfiles/target/chrome-src/src/base/threading/platform_thread_posix.cc:63:0
#10 0x567f1ebf in __asan::AsanThread::ThreadStart(unsigned long, __sanitizer::atomic_uintptr_t*) crtstuff.c:?
#11 0x56747e5e in asan_thread_start(void*) crtstuff.c:?
#12 0x552e1584 in __pthread_get_minstack ??:?
The stack looks correct to me (apart from #12). This is the code starting at line 80 in platform_thread_linux.cc:
bool SetCurrentThreadPriorityForPlatform(ThreadPriority priority) {
#if !defined(OS_NACL)
FilePath cpuset_directory = ThreadPriorityToCpusetDirectory(priority);
...
}
__asan_stack_malloc() sounds like it's just allocating some space on the stack. Hmm.
Recipe: I manually edited the paths in the above stack trace to look like this:
#0 0x567ed4c6 (/x/chromeos/chroot/var/cache/chromeos-chrome/chrome-src/src/out_x86-generic/Release/chrome+0x12484c6)
#1 0x567f4f19 (/x/chromeos/chroot/var/cache/chromeos-chrome/chrome-src/src/out_x86-generic/Release/chrome+0x124ff19)
#2 0x567f512d (/x/chromeos/chroot/var/cache/chromeos-chrome/chrome-src/src/out_x86-generic/Release/chrome+0x125012d)
#3 0x567fc241 (/x/chromeos/chroot/var/cache/chromeos-chrome/chrome-src/src/out_x86-generic/Release/chrome+0x1257241)
#4 0x56741801 (/x/chromeos/chroot/var/cache/chromeos-chrome/chrome-src/src/out_x86-generic/Release/chrome+0x119c801)
#5 0x567f1a86 (/x/chromeos/chroot/var/cache/chromeos-chrome/chrome-src/src/out_x86-generic/Release/chrome+0x124ca86)
#6 0x56741f18 (/x/chromeos/chroot/var/cache/chromeos-chrome/chrome-src/src/out_x86-generic/Release/chrome+0x119cf18)
#7 0x5ecddc79 (/x/chromeos/chroot/var/cache/chromeos-chrome/chrome-src/src/out_x86-generic/Release/chrome+0x9738c79)
#8 0x5ecdfe23 (/x/chromeos/chroot/var/cache/chromeos-chrome/chrome-src/src/out_x86-generic/Release/chrome+0x973ae23)
#9 0x567f1ebf (/x/chromeos/chroot/var/cache/chromeos-chrome/chrome-src/src/out_x86-generic/Release/chrome+0x124cebf)
#10 0x56747e5e (/x/chromeos/chroot/var/cache/chromeos-chrome/chrome-src/src/out_x86-generic/Release/chrome+0x11a2e5e)
#11 0x552e1584 (/x/chromeos/chroot/build/x86-generic/lib/libpthread.so.0+0x6584)
and ran asan_symbolize.py from my own chromium checkout.
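The manual path edit above can be scripted. A minimal sketch, assuming the same local checkout layout as in this recipe (the `PATH_MAP` destinations are the paths used above and will differ on other machines); it rewrites `(path+offset)` frames and leaves unknown paths untouched:

```python
import re

# Hypothetical mapping from on-device paths to local unstripped copies; the
# destinations are the ones from the recipe above and are machine-specific.
PATH_MAP = {
    "/opt/google/chrome/chrome":
        "/x/chromeos/chroot/var/cache/chromeos-chrome/chrome-src/src/"
        "out_x86-generic/Release/chrome",
    "/lib/libpthread.so.0":
        "/x/chromeos/chroot/build/x86-generic/lib/libpthread.so.0",
}

# Matches the "(module_path+0xoffset)" part of an unsymbolized ASAN frame.
FRAME_RE = re.compile(r"\((?P<path>[^()+]+)\+(?P<offset>0x[0-9a-fA-F]+)\)")


def rewrite_frame(line, path_map=PATH_MAP):
    """Swap the module path in one frame line, keeping the offset."""
    def repl(m):
        path = path_map.get(m.group("path"), m.group("path"))
        return "(" + path + "+" + m.group("offset") + ")"
    return FRAME_RE.sub(repl, line)


def rewrite_lines(lines, path_map=PATH_MAP):
    """Generator version for whole files or stdin."""
    for line in lines:
        yield rewrite_frame(line, path_map)
```

A two-line wrapper that does `sys.stdout.writelines(rewrite_lines(sys.stdin))`, piped into tools/valgrind/asan/asan_symbolize.py, reproduces the manual steps.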
Nov 1 2016
It's trying to allocate a "fake stack", which is actually heap memory; it's used to detect stack-use-after-return. The process is out of 32-bit address space: the memory dump attached to #19 shows just 43 MB of total free address space, with the largest hole smaller than 8 MB.
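The key point is that mmap needs one contiguous hole of 11571200 bytes, so the largest hole matters more than the total free space. A simplified sketch of checking that against a /proc/<pid>/maps-style dump like the one in the ASAN log; the 4 GiB ceiling is an assumption (a 32-bit process on a 32-bit kernel usually gets less user address space), and page alignment and the unusable low pages are ignored:

```python
ADDRESS_SPACE_END = 1 << 32  # assumption: 4 GiB ceiling for a 32-bit process


def parse_maps_ranges(lines):
    """Extract (start, end) pairs from '/proc/<pid>/maps'-style lines.

    Lines whose first field is not 'start-end' in hex (log headers, blanks)
    are skipped. int(..., 16) accepts the optional '0x' prefix ASAN prints.
    """
    ranges = []
    for line in lines:
        fields = line.split()
        if not fields or "-" not in fields[0]:
            continue
        lo, hi = fields[0].split("-", 1)
        try:
            ranges.append((int(lo, 16), int(hi, 16)))
        except ValueError:
            continue
    return ranges


def free_holes(ranges, space_end=ADDRESS_SPACE_END):
    """Unmapped gaps [start, end) between the mapped ranges."""
    holes, cursor = [], 0
    for start, end in sorted(ranges):
        if start > cursor:
            holes.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < space_end:
        holes.append((cursor, space_end))
    return holes


def can_map(size, ranges, space_end=ADDRESS_SPACE_END):
    """True if a single contiguous hole can hold `size` bytes."""
    return any(end - start >= size for start, end in free_holes(ranges, space_end))
```

Run over the attached map, `free_holes` would let the 43 MB total and the sub-8 MB largest hole be checked directly; with no hole of at least 11571200 bytes, `can_map` is False, which is the FakeStack failure.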
Nov 1 2016
Now that we know the cause of the problem, I think we should turn off this bot. All shipping Intel Chromebooks are 64-bit and we have a 64-bit ASAN bot. See issue 661347 for discussion.
Nov 2 2016
Retitled since this has been broken at least as far back as August 27, which is over 2 months: https://build.chromium.org/p/chromiumos.chromium/builders/x86-generic-tot-asan-informational/builds/11790
It's hard to know the exact date since the failures are somewhat flaky and there's no overview of the builds that far back.
Nov 2 2016
Also, I don't think this is Chrome itself consuming too much memory. I just checked the 64-bit Intel ASAN bot, and it's running a 2 GB VM. We don't exhaust memory there when running the same tests. Typical meminfo.after on the 64-bit bot:
MemTotal: 2046644 kB MemFree: 1333116 kB MemAvailable: 1538384 kB
Nov 2 2016
Let me back off a little. There seems to be a slow memory leak on 32-bit. There are 50-some tests to run; we pass the first 40-something and then run out of memory. Before turning down the builder, we could just reboot the VM between tests, but obviously this would hide the memory or address-space leak.
Nov 2 2016
I get the same failures running locally with a single test:
./bin/cros_run_vm_test --no_graphics --board=x86-generic --results_dir_root=JAMES_TOT "security_NetworkListeners"
It fails >50% of the time, even in a 4 GB VM.
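A rough way to put a number on the flake rate is to repeat the same single-test command and count failures. A sketch, assuming it runs from ~/trunk/src/scripts outside the chroot (as in #8) and that cros_run_vm_test exits non-zero when the test fails (not verified here); each run gets its own hypothetical JAMES_TOT_<n> results directory so logs are not overwritten:

```python
import subprocess


def command(i):
    """The single-test command from above, with a per-run results directory.

    JAMES_TOT_<i> is a hypothetical directory name.
    """
    return ["./bin/cros_run_vm_test", "--no_graphics", "--board=x86-generic",
            f"--results_dir_root=JAMES_TOT_{i}", "security_NetworkListeners"]


def run_once(i):
    # Assumption: cros_run_vm_test exits non-zero when the test fails.
    return subprocess.run(command(i)).returncode == 0


def failure_rate(runs, run=run_once):
    """Fraction of `runs` attempts for which run(i) reported failure."""
    failures = sum(1 for i in range(runs) if not run(i))
    return failures / runs
```

Keeping the per-run directories also makes it easy to diff the asan logs of passing and failing runs afterwards.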
Nov 2 2016
eugenis noted in an email thread that we are using 860 MB of address space for stacks, 20 MB per thread. We're not sure where that is coming from, though. Most of the thread creation code uses 0 for the requested stack size, which means the OS/pthread default: https://cs.chromium.org/chromium/src/base/threading/platform_thread_linux.cc?sq=package:chromium&dr=C&rcl=1478101093&l=157
The OS default on 64-bit Chrome OS is 8 MB (output of ulimit -s). I'm not sure what the default is on 32-bit. I've tried configuring ASAN to use less memory, but I still get failures running locally. See https://codereview.chromium.org/2471333003/
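The numbers above are easy to sanity-check: 860 MB at 20 MB per thread is 43 threads. The sketch below also reads the soft RLIMIT_STACK (the value `ulimit -s` prints, in KiB) through Python's `resource` module. On glibc, a finite soft limit becomes the default stack size for threads created without an explicit size, which is the "0 means default" case described above; running it on the 32-bit image would answer the open question, assuming a Python interpreter is available there. The helper names are illustrative:

```python
import resource

MiB = 1 << 20


def default_pthread_stack_bytes():
    """Soft RLIMIT_STACK in bytes (`ulimit -s` shows the same value in KiB).

    On glibc a finite soft limit is used as the default stack size for
    threads created without an explicit size. None means "unlimited", in
    which case glibc falls back to an architecture-specific default.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft


def threads_for_budget(total_bytes, per_thread_bytes):
    """How many per-thread reservations fit in a given amount of address space."""
    return total_bytes // per_thread_bytes
```

On a system where `ulimit -s` prints 8192, `default_pthread_stack_bytes()` returns 8388608 (8 MiB).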
Nov 3 2016
Wow, these stack sizes are huge. I had to reduce stack sizes for Flash to < 1MB at some point.
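Requesting an explicit stack size instead of passing 0 is what keeps a thread from inheriting the 8 MB default. In Chrome that would mean a nonzero size in the thread-creation call; the Python sketch below only illustrates the same idea with `threading.stack_size()`, which CPython forwards to pthread_attr_setstacksize() on POSIX systems. The 512 KiB figure in the usage note is an arbitrary example, not a recommended value:

```python
import threading


def run_with_stack_size(fn, stack_bytes):
    """Run fn() on a new thread created with an explicit stack size.

    threading.stack_size() only affects threads created after the call, so
    the previous default is restored as soon as the thread has started.
    """
    result = []
    old = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=lambda: result.append(fn()))
        worker.start()
    finally:
        threading.stack_size(old)
    worker.join()
    return result[0]
```

For example, `run_with_stack_size(task, 512 * 1024)` reserves 512 KiB for that one thread instead of the process-wide default.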
Nov 8 2016
We turned down this bot, per discussion on issue 661347. Thanks for the help, everyone!
Comment 1 by steve...@chromium.org, Sep 20 2016