Issue metadata
Sign in to add a comment
|
ConvertTraceEventsToTraceFormat recycles a lot of memory |
||||||||||||||||||||||||
Issue descriptionSometimes during last stages of tracing (i.e. after you clicked 'Stop' on the tracing dialog) renderer process dies because it exhausts AllocationRegister' allocations capacity (which is 1.5 M). I have long suspected that it has to do with json, because adding more stuff to HeadDumpWriter's Entry crashes renderer almost always. So I used ThreadHeapUsageTracker to get a sense what happens in ConvertTraceEventsToTraceFormat(), and voila: [Format is bytes(count), i.e. the first line allocated 7624640 in 120005 allocations.] 102662 | ConvertTraceEventsToTraceFormat: allocated: 7624640(120005) freed: 7559072(120003), diff: 65568(2) 102648 | ConvertTraceEventsToTraceFormat: allocated: 45995216(693196) freed: 43514928(693176), diff: 2480288(20) 102638 | ConvertTraceEventsToTraceFormat: allocated: 61042160(1007373) freed: 58582544(1007361), diff: 2459616(12) 102967 | ConvertTraceEventsToTraceFormat: allocated: 76251120(1265357) freed: 73110368(1265343), diff: 3140752(14) 102695 | ConvertTraceEventsToTraceFormat: allocated: 131873424(2116273) freed: 126565920(2116256), diff: 5307504(17) 102587 | ConvertTraceEventsToTraceFormat: allocated: 67354096(1093805) freed: 63700672(1093772), diff: 3653424(33) 102695 = renderer 102967 = renderer 102662 = utility 102648 = gpu-process 102587 = browser (Patch attached.)
,
Dec 6 2016
Doesn't seem to be related to DevTools. Fadi could you please take a look.
,
Dec 6 2016
Assigning to primiano for now, if you don't have enough bandwidth, feel free to bounce it back.
,
Jan 14 2017
OK, some more data on this. As we saw, ConvertTraceEventsToTraceFormat did over 2 millions allocations in renderer 102695 (2,116,256 to be precise). I tried native heap profiling on Android, and found that it was crashing because apparently 1.5 M allocations configured in AllocationRegister was not enough. I checked last allocation (which caused the crash) and found that it was from ConvertTraceEventsToTraceFormat() converting trace events into JSON. What's interesting is that ConvertTraceEventsToTraceFormat() uses HEAP_PROFILER_SCOPED_IGNORE macro to attribute allocations with "tracing_overhead" type. But that the macro doesn't actually ignore allocations, they are still inserted to the AllocationRegister, and still cause it to overflow. I decided to ignore allocations in HEAP_PROFILER_SCOPED_IGNORE scopes for real, i.e. simply not register them. To get some numbers to justify that I logged max allocation count (i.e. max count of touched entries in allocation hash table, next_unused_cell_) in ctor/dtor of HeapProfilerScopedIgnore class. The results of running startup tracing on Android in simplest possible setup (run Chrome to the NTP, take dump after 13 seconds): GPU ~HeapProfilerScopedIgnore() in ConvertTraceEventsToTraceFormat: 148735 allocations (grew by 144423) Renderer ~HeapProfilerScopedIgnore() in ConvertTraceEventsToTraceFormat: 429624 allocations (grew by 407158) Browser ~HeapProfilerScopedIgnore() in ConvertTraceEventsToTraceFormat: 2435242 allocations (grew by 2299684) (Oh right, I also increased AllocationRegister capacity to 2.5M to avoid crashes.) So yeah, the process of converting stuff to JSON at some point results in 2,299,684 new allocations being alive at the same time. No wonder capacity of 1.5M allocations was not enough.
,
Jan 16 2017
First of all thanks for the analysis as usual! I added some comments in the CL. Considering that the heap profiler memory is not resident until we actually use it, I'd just bump it to 5M entries. By ignoring HEAP_PROFILER_SCOPED_IGNORE (instead of marking as overhead) we'd cause even more discrepancies in the total reported by malloc vs heap profiler and that might be extremely puzzling. Can we afford 60 MB per process until we get to TracingV2 and get rid of most ConvertableToTraceFormats? If the answer is no, a possible alternative action is to use a separate: 1. counter for "ignored bytes" 2. a more compact hashtable that just maps addr to size, without any furhter context (to adjust the counter down when freeing). BUt this is extra work and complexity, and not sure we should go there.
,
Jan 17 2017
As we know, right now mallinfo() on Android reports lots of other stuff, and it's simply not possible to correlate it with what heap profiler measures. Changing heap profiler to report lower number isn't going to change that - there is simply no such thing as "more discrepancies", because those numbers just can't be correlated. Next, the point of "tracing_overhead" value is moot. Especially with it being in the malloc tree. I've complained about it before - during investigations you're not interested in tracing_overhead, but yet you have to do this little exercise of subtracting tracing_overhead from heap profiler's total to get "real" total. So from my point of view, we're bumping capacities and wasting memory just to account for (terribly) memory inefficient JSON conversion to get a useless number. Other way might be to actually fix JSON conversion to not produce 2.3M simultaneously allocated objects in the process. Regarding V2 - how it's going to convert from its protobuf to JSON? If there is straight protobuf -> JSON conversion function, then I think it can be pretty easily adapted to handle TracedValues too.
,
Jan 17 2017
allright then > Regarding V2 - how it's going to convert from its protobuf to JSON? If there is straight protobuf -> JSON conversion function, then I think it can be pretty easily adapted to handle TracedValues too. Ok you make a good point, this will apply as long as we have to support JSON. We could write a smarter JSON writer, which doesn't need the base::values. but right now there are too many other more important things to do, so let's keep focus. will lg the cl, thanks for the good points.
,
Jan 17 2017
OK, thanks! I'll take a look at JSON conversion at my spare time, hopefully by the time V2 lands we'll have something better.
,
Jan 17 2017
As https://codereview.chromium.org/2637783002 is about to kill tracing_overhead, it's time to see what it is. I modified AllocationContextTracker to prepend stack frames with "tracing_overhead" frame, instead of completely overwriting frames with it. I.e. you can now drill into "tracing_overhead" item. Then I took a simple trace (NTP, 13s): https://drive.google.com/file/d/0B_Hmi138MnbJWVZyejVYdDZCNmM/view?usp=sharing Findings: Browser: tracing_overhead is 290 KiB, most of which (215 KiB) comes from ProcessMetricsMemoryDumpProvider::DumpProcessMemoryMaps() adding entries to ProcessMemoryDump::process_mmaps_::vm_regions_. Renderer: tracing_overhead is 4821 KiB. There are three main contributors: 2268 KiB: BlinkGCMemoryDumpProvider::OnMemoryDump(), with majority (2047 KiB) coming from TracedValue::SetString() (Pickle::WriteString -> Pickle::Resize -> realloc). 2298 KiB: PartitionAllocMemoryDumpProvider::OnMemoryDump(), again 2047 KiB coming from TracedValue::SetString(). 204 KiB: ProcessMetricsMemoryDumpProvider::DumpProcessMemoryMaps() Everything else is due to deduplicators (which we measure and report later), and minor things that are related to tracing machinery. Given that I think it's pretty reasonable to drop tracing_overhead, because: 1. It lists allocations that are ProcessMemoryDump-related, so once dump is destroyed, memory is freed. 2. It depends on when MallocDumpProvider is dumped. For example for the renderer we see allocations from blink_gc / partition_alloc only because MallocDumpProvider is triggered after. Thus tracing_overhead misses allocations by MallocDumpProvider itself (of similar magnitude), and all dump providers that come after.
,
Jan 18 2017
Thanks for looking into this. Yeah I always knew about VMMaps being silly, we'll get there at some point :) THere is one thing missing, that you don't see because your trace is short. If you do a longer trace, or if you just record in circular buffer mode, you should see something like 30 MB for the trace buffer. Anyways this is not that much related with the heap traffic that JSON serialization causes really.
,
Jan 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9ab14b2d78c1d15909fa2d6cc12760a757edb405 commit 9ab14b2d78c1d15909fa2d6cc12760a757edb405 Author: dskiba <dskiba@chromium.org> Date: Wed Jan 18 21:53:42 2017 [tracing] Don't register tracing-related allocations. Tracing-related scopes are instrumented with HEAP_PROFILER_SCOPED_IGNORE macro, which caused all scoped allocations to have "tracing_overhead" type, but didn't prevent those allocations to be registered. However, it was discovered (see the bug) that the last stages of tracing create substantial pressure on allocation register, adding millions of short-lived allocations. Instead of bumping allocation register capacity once again, this CL makes HEAP_PROFILER_SCOPED_IGNORE to completely ignore allocations in its scope, i.e. not register them. BUG= 664350 Review-Url: https://codereview.chromium.org/2637783002 Cr-Commit-Position: refs/heads/master@{#444500} [modify] https://crrev.com/9ab14b2d78c1d15909fa2d6cc12760a757edb405/base/trace_event/heap_profiler_allocation_context_tracker.cc [modify] https://crrev.com/9ab14b2d78c1d15909fa2d6cc12760a757edb405/base/trace_event/heap_profiler_allocation_context_tracker.h [modify] https://crrev.com/9ab14b2d78c1d15909fa2d6cc12760a757edb405/base/trace_event/heap_profiler_allocation_context_tracker_unittest.cc [modify] https://crrev.com/9ab14b2d78c1d15909fa2d6cc12760a757edb405/base/trace_event/malloc_dump_provider.cc [modify] https://crrev.com/9ab14b2d78c1d15909fa2d6cc12760a757edb405/third_party/WebKit/Source/platform/PartitionAllocMemoryDumpProvider.cpp [modify] https://crrev.com/9ab14b2d78c1d15909fa2d6cc12760a757edb405/third_party/WebKit/Source/platform/heap/BlinkGCMemoryDumpProvider.cpp
,
Jul 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/131e21f45f5d7b04874b636c1ffa642c13944f1e commit 131e21f45f5d7b04874b636c1ffa642c13944f1e Author: dskiba <dskiba@chromium.org> Date: Sun Jul 16 19:41:39 2017 [tracing] Optimize TracedValue::AppendAsTraceFormat(). This CL optimizes TracedValue::AppendAsTraceFormat() by removing intermediate base::Value conversion. HeapProfilerPerfTest.AppendStackFramesAsTraceFormat shows ~1.9x improvement on macOS (1730ms -> 920ms). TBR=jbauman@chromium.org BUG= 739378 , 664350 Review-Url: https://codereview.chromium.org/2975033002 Cr-Commit-Position: refs/heads/master@{#487019} [modify] https://crrev.com/131e21f45f5d7b04874b636c1ffa642c13944f1e/base/trace_event/trace_event_argument.cc [modify] https://crrev.com/131e21f45f5d7b04874b636c1ffa642c13944f1e/base/trace_event/trace_event_argument_unittest.cc [modify] https://crrev.com/131e21f45f5d7b04874b636c1ffa642c13944f1e/cc/base/filter_operations_unittest.cc [modify] https://crrev.com/131e21f45f5d7b04874b636c1ffa642c13944f1e/components/tracing/common/graphics_memory_dump_provider_android_unittest.cc
,
May 18 2018
Chrome is moving to perfetto, where this won't be an issue. |
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by kozyatinskiy@chromium.org
, Nov 14 2016Status: Assigned (was: Untriaged)