New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 665516 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Xoogler
Closed: Nov 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug



Sign in to add a comment

memory-infra makes browser very crashy

Project Member Reported by dcheng@chromium.org, Nov 15 2016

Issue description

I'm seeing this when I use chrome://tracing on canary to profile memory usage. See https://crash.corp.google.com/browse?q=reportid=%276178d52f00000000%27

0x00007ffe1125e879	(ntdll.dll + 0x0002e879 )	RtlLockHeap
0x00007ffe0dd50e29	(KERNELBASE.dll + 0x00070e29 )	HeapLock
0x00007ffdd9c2cf24	(chrome.dll -malloc_dump_provider.cc:179 )	base::trace_event::`anonymous namespace'::WinHeapMemoryDumpImpl
0x00007ffdd9c2d12f	(chrome.dll -malloc_dump_provider.cc:241 )	base::trace_event::MallocDumpProvider::OnMemoryDump(base::trace_event::MemoryDumpArgs const &,base::trace_event::ProcessMemoryDump *)
0x00007ffdd9bde7e5	(chrome.dll -memory_dump_manager.cc:595 )	base::trace_event::MemoryDumpManager::InvokeOnMemoryDump(base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState *)
0x00007ffdd9bde2fb	(chrome.dll -memory_dump_manager.cc:506 )	base::trace_event::MemoryDumpManager::SetupNextMemoryDump(std::unique_ptr<base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState,std::default_delete<base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState> >)
0x00007ffdd9bde84a	(chrome.dll -memory_dump_manager.cc:601 )	base::trace_event::MemoryDumpManager::InvokeOnMemoryDump(base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState *)
0x00007ffdd9bde2fb	(chrome.dll -memory_dump_manager.cc:506 )	base::trace_event::MemoryDumpManager::SetupNextMemoryDump(std::unique_ptr<base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState,std::default_delete<base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState> >)
0x00007ffdd9bde84a	(chrome.dll -memory_dump_manager.cc:601 )	base::trace_event::MemoryDumpManager::InvokeOnMemoryDump(base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState *)
0x00007ffdd9c35c8b	(chrome.dll -task_annotator.cc:52 )	base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *)
0x00007ffdd9be830d	(chrome.dll -message_loop.cc:413 )	base::MessageLoop::RunTask(base::PendingTask *)
0x00007ffdd9be8eef	(chrome.dll -message_loop.cc:515 )	base::MessageLoop::DoWork()
0x00007ffdd9c37118	(chrome.dll -message_pump_default.cc:35 )	base::MessagePumpDefault::Run(base::MessagePump::Delegate *)
0x00007ffdd9c0d2b2	(chrome.dll -run_loop.cc:35 )	base::RunLoop::Run()
0x00007ffdd9be6c68	(chrome.dll -thread.cc:333 )	base::Thread::ThreadMain()
0x00007ffdd9bb7bf8	(chrome.dll -platform_thread_win.cc:84 )	base::`anonymous namespace'::ThreadFunc
0x00007ffe0ed88101	(KERNEL32.DLL + 0x00018101 )	BaseThreadInitThunk
0x00007ffe1128c5b3	(ntdll.dll + 0x0005c5b3 )	RtlUserThreadStart

Not sure if there's heap corruption of some sort... I think there was something with ALPC blocking that caused this awhile back, but I thought we reverted that.
 

Comment 1 by dskiba@chromium.org, Nov 15 2016

Cc: primiano@chromium.org liamjm@chromium.org
Hmm, last change in that area (https://codereview.chromium.org/2242953002) changed CHECK(HeapLock()) to "if (!HeapLock()) return" precisely to battle invalid heaps it seems.

Looks like that's not enough, and we shouldn't call HeapLock() on invalid heaps at all.
Yeah this seems to be causing lot of problems on Windows. See  Issue 661194  which has more people and discussion.
Unfortunately I don't have too much knowledge (neither time to learn) about winheap.
Would be great if somebody could tell us what is the right thing to do here to dump the winheap stats.

Comment 3 by dcheng@chromium.org, Nov 15 2016

Why is my heap getting corrupted though?
somebody claims that things improved on the bot after https://codereview.chromium.org/2488803002.
Can you check a canary >=  56.0.2914.0 ?

> Why is my heap getting corrupted though?
To be honest at this point I am not sure that this is about heap corruption. Seems to happen too frequently on bots (See  Issue 661194 )

Comment 5 by dcheng@chromium.org, Nov 15 2016

I was running with a newer verison: 56.0.2920.0-64

FWIW, I can repro pretty easily: I open chrome://tracing, capture a short trace, save. Open a new tab, close the old one, navigate the new tab to chrome://tracing, and repeat.
Labels: M-56
Owner: kraynov@chromium.org
+kraynov let's look at this once we got back to London.
Daniel seems to repro this quite constantly. 
Cc: picksi@chromium.org ericrk@chromium.org
kraynov: PING this is quite important

Comment 9 by ericrk@chromium.org, Nov 22 2016

This reproduces 100% on my local Windows machine.

I did a bisect and the issue started when we enabled Control Flow Guard compiler/linker option for Windows. The CL that triggers the issue is crrev.com/2412983006 - reverting this fixes the issue on my local machine.
Ok, it appears that CFG causes the creation of one additional heap. Unfortunately, this heap is created with HEAP_NO_SERIALIZATION, which means that we can't call HeapLock on this heap (result is undefined, per https://msdn.microsoft.com/en-us/library/windows/desktop/aa366702(v=vs.85).aspx) - calling HeapLock causes a crash.

I've confirmed that if we skip the HeapLock on these new heaps, everything works as expected (we are able to enumerate them).

Unfortunately, there doesn't appear to be a way to query a heap for whether or not it requires locking - you have to know ahead of time (due to the flags passed when creating the heap).

It seems like there are a few options we could take here:
1) Enumerate the heap (HeapWalk) without locking. This is allowed, but could result in bad data if anything modifies the heap during enumeration (somewhat likely?). This isn't ideal, but seems better than a crash.
2) Somehow skip heaps not created by Chrome. Right now, these problematic heaps are always the last heap in each process, but I'm not sure we can rely on this...
Ok, after reading more, it sounds like enumerating the heap may require locking, and there may be no safe way to enumerate third party heaps that specified HEAP_NO_SERIALIZATION. So the best option seems to be to exclude non-Chrome heaps from our enumeration.

I did a quick test, and it seems like the primary heap accounts for >97% of the heap memory in all processes other than the GPU process. So for most processes, only enumerating the primary heap is probably good enough.

For the GPU process, the primary heap seems to hover around 50% of allocations, probably because DX/Drivers are managing their own heaps and using a significant amount of memory. That said, due to our inability to control these third party heaps, we probably can't safely enumerate them anyway, so 50% may be as good as we can get here?

It feels like at least in the short term we should disable enumeration of secondary heaps so we can get things working again? WDYT
thanks a lot for the digging. looks like we don't have those many options then. :/
"less but not crashy" sgtm as a strategy. 
Will take a look shortly
Labels: -Pri-2 Pri-1
Status: Started (was: Unconfirmed)
P2 -> P1
So, I'm going to try to query heap info and look for HEAP_NO_SERIALIZATION flag.
Hope secondary heaps in GPU process don't have such flag and will be eligible for instrumentation. In this case reporting 97% (as ericrk@ has calculated) sounds okay as a payment for additional security provided by Control Flow Guard.
1. Reproduced the issue
2. Seems got fixed it https://codereview.chromium.org/2519313005
3. Need to figure out how much memory is allocated in skipped heaps...
So, we can't do that safely, more context here https://codereview.chromium.org/2526343002/
Will go ahead with accounting only main heap :(
Project Member

Comment 17 by bugdroid1@chromium.org, Nov 25 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ad50729b102fa81b770559dbbd33109b7cb5d6f5

commit ad50729b102fa81b770559dbbd33109b7cb5d6f5
Author: kraynov <kraynov@chromium.org>
Date: Fri Nov 25 18:01:23 2016

Account only main WinHeap in MemoryDumpProvider on Windows.

Querying private heaps not owned by your code is racy and crashy
task for WinAPI. See bug for more context.

Heads for for perf sheriffs:
---------------------------
This change will very likely decrease the malloc memory reported
on Windows bots and is WAI, however, not a real improvement, sorry.

BUG= 665516 

Review-Url: https://codereview.chromium.org/2528053002
Cr-Commit-Position: refs/heads/master@{#434537}

[modify] https://crrev.com/ad50729b102fa81b770559dbbd33109b7cb5d6f5/base/trace_event/malloc_dump_provider.cc

Ok hopefully this should be solved. As a reminder we have to re-enable the tests that were disabled on windows due to this. There should be a bug tracking those, but i'm running late right now.
Status: Fixed (was: Started)
Issue 665465 has been merged into this issue.
Components: Internals>Instrumentation>Memory

Sign in to add a comment