memory-infra makes browser very crashy |
|||||||
Issue descriptionI'm seeing this when I use chrome://tracing on canary to profile memory usage. See https://crash.corp.google.com/browse?q=reportid=%276178d52f00000000%27 0x00007ffe1125e879 (ntdll.dll + 0x0002e879 ) RtlLockHeap 0x00007ffe0dd50e29 (KERNELBASE.dll + 0x00070e29 ) HeapLock 0x00007ffdd9c2cf24 (chrome.dll -malloc_dump_provider.cc:179 ) base::trace_event::`anonymous namespace'::WinHeapMemoryDumpImpl 0x00007ffdd9c2d12f (chrome.dll -malloc_dump_provider.cc:241 ) base::trace_event::MallocDumpProvider::OnMemoryDump(base::trace_event::MemoryDumpArgs const &,base::trace_event::ProcessMemoryDump *) 0x00007ffdd9bde7e5 (chrome.dll -memory_dump_manager.cc:595 ) base::trace_event::MemoryDumpManager::InvokeOnMemoryDump(base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState *) 0x00007ffdd9bde2fb (chrome.dll -memory_dump_manager.cc:506 ) base::trace_event::MemoryDumpManager::SetupNextMemoryDump(std::unique_ptr<base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState,std::default_delete<base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState> >) 0x00007ffdd9bde84a (chrome.dll -memory_dump_manager.cc:601 ) base::trace_event::MemoryDumpManager::InvokeOnMemoryDump(base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState *) 0x00007ffdd9bde2fb (chrome.dll -memory_dump_manager.cc:506 ) base::trace_event::MemoryDumpManager::SetupNextMemoryDump(std::unique_ptr<base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState,std::default_delete<base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState> >) 0x00007ffdd9bde84a (chrome.dll -memory_dump_manager.cc:601 ) base::trace_event::MemoryDumpManager::InvokeOnMemoryDump(base::trace_event::MemoryDumpManager::ProcessMemoryDumpAsyncState *) 0x00007ffdd9c35c8b (chrome.dll -task_annotator.cc:52 ) base::debug::TaskAnnotator::RunTask(char const *,base::PendingTask *) 0x00007ffdd9be830d (chrome.dll -message_loop.cc:413 ) base::MessageLoop::RunTask(base::PendingTask *) 0x00007ffdd9be8eef (chrome.dll -message_loop.cc:515 ) base::MessageLoop::DoWork() 0x00007ffdd9c37118 (chrome.dll -message_pump_default.cc:35 ) base::MessagePumpDefault::Run(base::MessagePump::Delegate *) 0x00007ffdd9c0d2b2 (chrome.dll -run_loop.cc:35 ) base::RunLoop::Run() 0x00007ffdd9be6c68 (chrome.dll -thread.cc:333 ) base::Thread::ThreadMain() 0x00007ffdd9bb7bf8 (chrome.dll -platform_thread_win.cc:84 ) base::`anonymous namespace'::ThreadFunc 0x00007ffe0ed88101 (KERNEL32.DLL + 0x00018101 ) BaseThreadInitThunk 0x00007ffe1128c5b3 (ntdll.dll + 0x0005c5b3 ) RtlUserThreadStart Not sure if there's heap corruption of some sort... I think there was something with ALPC blocking that caused this awhile back, but I thought we reverted that.
,
Nov 15 2016
Yeah this seems to be causing lot of problems on Windows. See Issue 661194 which has more people and discussion. Unfortunately I don't have too much knowledge (neither time to learn) about winheap. Would be great if somebody could tell us what is the right thing to do here to dump the winheap stats.
,
Nov 15 2016
Why is my heap getting corrupted though?
,
Nov 15 2016
somebody claims that things improved on the bot after https://codereview.chromium.org/2488803002. Can you check a canary >= 56.0.2914.0 ? > Why is my heap getting corrupted though? To be honest at this point I am not sure that this is about heap corruption. Seems to happen too frequently on bots (See Issue 661194 )
,
Nov 15 2016
I was running with a newer verison: 56.0.2920.0-64 FWIW, I can repro pretty easily: I open chrome://tracing, capture a short trace, save. Open a new tab, close the old one, navigate the new tab to chrome://tracing, and repeat.
,
Nov 16 2016
,
Nov 19 2016
+kraynov let's look at this once we got back to London. Daniel seems to repro this quite constantly.
,
Nov 22 2016
kraynov: PING this is quite important
,
Nov 22 2016
This reproduces 100% on my local Windows machine. I did a bisect and the issue started when we enabled Control Flow Guard compiler/linker option for Windows. The CL that triggers the issue is crrev.com/2412983006 - reverting this fixes the issue on my local machine.
,
Nov 22 2016
Ok, it appears that CFG causes the creation of one additional heap. Unfortunately, this heap is created with HEAP_NO_SERIALIZATION, which means that we can't call HeapLock on this heap (result is undefined, per https://msdn.microsoft.com/en-us/library/windows/desktop/aa366702(v=vs.85).aspx) - calling HeapLock causes a crash. I've confirmed that if we skip the HeapLock on these new heaps, everything works as expected (we are able to enumerate them). Unfortunately, there doesn't appear to be a way to query a heap for whether or not it requires locking - you have to know ahead of time (due to the flags passed when creating the heap). It seems like there are a few options we could take here: 1) Enumerate the heap (HeapWalk) without locking. This is allowed, but could result in bad data if anything modifies the heap during enumeration (somewhat likely?). This isn't ideal, but seems better than a crash. 2) Somehow skip heaps not created by Chrome. Right now, these problematic heaps are always the last heap in each process, but I'm not sure we can rely on this...
,
Nov 22 2016
Ok, after reading more, it sounds like enumerating the heap may require locking, and there may be no safe way to enumerate third party heaps that specified HEAP_NO_SERIALIZATION. So the best option seems to be to exclude non-Chrome heaps from our enumeration. I did a quick test, and it seems like the primary heap accounts for >97% of the heap memory in all processes other than the GPU process. So for most processes, only enumerating the primary heap is probably good enough. For the GPU process, the primary heap seems to hover around 50% of allocations, probably because DX/Drivers are managing their own heaps and using a significant amount of memory. That said, due to our inability to control these third party heaps, we probably can't safely enumerate them anyway, so 50% may be as good as we can get here? It feels like at least in the short term we should disable enumeration of secondary heaps so we can get things working again? WDYT
,
Nov 23 2016
thanks a lot for the digging. looks like we don't have those many options then. :/ "less but not crashy" sgtm as a strategy.
,
Nov 23 2016
Will take a look shortly
,
Nov 23 2016
P2 -> P1 So, I'm going to try to query heap info and look for HEAP_NO_SERIALIZATION flag. Hope secondary heaps in GPU process don't have such flag and will be eligible for instrumentation. In this case reporting 97% (as ericrk@ has calculated) sounds okay as a payment for additional security provided by Control Flow Guard.
,
Nov 23 2016
1. Reproduced the issue 2. Seems got fixed it https://codereview.chromium.org/2519313005 3. Need to figure out how much memory is allocated in skipped heaps...
,
Nov 24 2016
So, we can't do that safely, more context here https://codereview.chromium.org/2526343002/ Will go ahead with accounting only main heap :(
,
Nov 25 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ad50729b102fa81b770559dbbd33109b7cb5d6f5 commit ad50729b102fa81b770559dbbd33109b7cb5d6f5 Author: kraynov <kraynov@chromium.org> Date: Fri Nov 25 18:01:23 2016 Account only main WinHeap in MemoryDumpProvider on Windows. Querying private heaps not owned by your code is racy and crashy task for WinAPI. See bug for more context. Heads for for perf sheriffs: --------------------------- This change will very likely decrease the malloc memory reported on Windows bots and is WAI, however, not a real improvement, sorry. BUG= 665516 Review-Url: https://codereview.chromium.org/2528053002 Cr-Commit-Position: refs/heads/master@{#434537} [modify] https://crrev.com/ad50729b102fa81b770559dbbd33109b7cb5d6f5/base/trace_event/malloc_dump_provider.cc
,
Nov 25 2016
Ok hopefully this should be solved. As a reminder we have to re-enable the tests that were disabled on windows due to this. There should be a bug tracking those, but i'm running late right now.
,
Nov 25 2016
,
Dec 9 2016
Issue 665465 has been merged into this issue.
,
Jan 4 2017
|
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by dskiba@chromium.org
, Nov 15 2016