New heap dump format
Issue description
The new heap dump format was mainly discussed in several private email threads. This bug captures the important bits from those conversations.
Apr 6 2017
I prototyped the first version of the new format: https://codereview.chromium.org/2052803002/
I then did startup tracing for 30 seconds, with a 3-second interval between dumps, all while opening 18 tabs ([cnn, youtube, reddit, dailymotion, yahoo, wsj] x 3). The results:
Without patch (master): 242232331 bytes
With patch: 167779701 bytes
I.e. with the patch the trace file includes all allocations and yet is 30% smaller.
Apr 6 2017
Google-only document summarizing the above: https://docs.google.com/document/d/1OtqTLbXjS9Nm74UIqRJTcaBRmFkerdSSPPMI9EJEKv4/edit#heading=h.yipccu7m5o8y
Apr 6 2017
At some point I realized that the above 30% improvement was measured with native heap profiling; pseudo profiling mode can show different results.
So, I repeated the experiment in pseudo profiling mode ("included_categories": ["*", "disabled-by-default-memory-infra"]):
Master: 352212896
Patch: 349293686
Essentially the size stayed the same, but we still get two other advantages: all allocations are included, and the format is easier both to generate and to work with.
Apr 6 2017
Note that in both experiments above I traced for 30 seconds with 3 second interval between dumps, i.e. there were 10 dumps.
When there is just one dump per trace, file size regresses ~2x.
That's because new heap format does two things:
1. Removes size threshold (writes all allocations). Size impact:
* Negative - more types are written
* Negative - more stack frames are written
2. Optimizes heap entries format. Size impact:
* Positive - new format is more compact
The experiments above had several detailed dumps from several renderers, so the positive impact of #2 dominated.
I wrote a simple program to break down trace file size, and got the following:
Trace from #2 (many heap dumps):
Current format:
Sizes for "Renderer" processes (18):
type names: 803 KiB
stack frames: 4.0 MiB
heaps: 158.3 MiB
TOTAL: 163.1 MiB
Sizes for "Browser" processes (1):
type names: 3 KiB
stack frames: 723 KiB
heaps: 9.7 MiB
TOTAL: 10.5 MiB
GRAND TOTAL SIZE: 173.9 MiB
New format:
Sizes for "Renderer" processes (18):
type names: 2.2 MiB
stack frames: 21.8 MiB
heaps: 52.6 MiB
TOTAL: 76.6 MiB
Sizes for "Browser" processes (1):
type names: 7 KiB
stack frames: 6.6 MiB
heaps: 12.0 MiB
TOTAL: 18.7 MiB
GRAND TOTAL SIZE: 95.7 MiB
When there is a single heap dump the situation is very different:
Current format:
Process Renderer (124962):
type names: 433 top level items (1299 total), 55 KiB
stack frames: 19829 top level items (158587 total), 1.1 MiB
unique frames: 4672
root frames: 15
leaf frames: 2011
heaps (1):
blink_gc: 30847 top level items (381694 total), 1.7 MiB
malloc: 19127 top level items (221024 total), 1016 KiB
partition_alloc: 31406 top level items (379766 total), 1.7 MiB
Process Browser (124916):
type names: 108 top level items (324 total), 3 KiB
stack frames: 13841 top level items (110548 total), 748 KiB
unique frames: 4081
root frames: 60
leaf frames: 2093
heaps (1):
malloc: 28945 top level items (334759 total), 1.5 MiB
Sizes for "Renderer" processes (1):
type names: 55 KiB
stack frames: 1.1 MiB
heaps: 4.4 MiB
TOTAL: 5.5 MiB
Sizes for "Browser" processes (1):
type names: 3 KiB
stack frames: 748 KiB
heaps: 1.5 MiB
TOTAL: 2.2 MiB
GRAND TOTAL SIZE: 7.7 MiB
New format:
Process Renderer (122493):
type names: 917 top level items (2751 total), 126 KiB
stack frames: 139040 top level items (1112098 total), 7.6 MiB
unique frames: 20194
root frames: 74
leaf frames: 21673
heaps (1):
blink_gc: 4302 top level items (88540 total), 403 KiB
malloc: 12611 top level items (178162 total), 814 KiB
partition_alloc: 5607 top level items (103730 total), 462 KiB
Process Browser (122455):
type names: 227 top level items (681 total), 7 KiB
stack frames: 200826 top level items (1606383 total), 11.0 MiB
unique frames: 28645
root frames: 75
leaf frames: 34985
heaps (1):
malloc: 34985 top level items (494302 total), 2.1 MiB
Sizes for "Renderer" processes (1):
type names: 126 KiB
stack frames: 7.6 MiB
heaps: 1.6 MiB
TOTAL: 9.4 MiB
Sizes for "Browser" processes (1):
type names: 7 KiB
stack frames: 11.0 MiB
heaps: 2.1 MiB
TOTAL: 13.2 MiB
GRAND TOTAL SIZE: 22.6 MiB
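The breakdown program itself isn't attached to this bug; a minimal sketch of such a tool (hypothetical function, assuming a standard JSON trace where heap data sits under the events shown later in this thread) could look like:

```python
import json

def breakdown(trace_json):
    """Rough per-category size breakdown of a memory-infra trace.

    A sketch only: sizes are measured as the length of the re-serialized
    JSON for each piece, which approximates (but does not exactly match)
    the bytes those pieces occupy in the original file.
    """
    sizes = {"type names": 0, "stack frames": 0, "heaps": 0}
    for event in trace_json.get("traceEvents", []):
        args = event.get("args", {})
        if event.get("name") == "typeNames":
            sizes["type names"] += len(json.dumps(args["typeNames"]))
        elif event.get("name") == "stackFrames":
            sizes["stack frames"] += len(json.dumps(args["stackFrames"]))
        elif "dumps" in args:
            heaps = args["dumps"].get("heaps", {})
            if heaps:
                sizes["heaps"] += len(json.dumps(heaps))
    return sizes
```

A real version would additionally group events by pid to produce the per-process ("Renderer"/"Browser") totals shown above.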
So I took a look at how we store stack frames. Currently, for the stacks
[Thread]
  a
    b
  b
we store the following:
"stackFrames": {
"0": {
"name": "[Thread]"
},
"1": {
"name": "a",
"parent": "0"
},
"2": {
"name": "b",
"parent": "1"
},
"3": {
"name": "b",
"parent": "0"
}
}
There are two issues with this format:
1. It's verbose. Stack deduplicator builds a tree, and then dumps every node as "id: {name, parent_id}", even though all child nodes have the same parent_id.
2. It writes nodes several times. In the example above "b" is written twice. This is especially bad for the new format when it's used with native heap profiling where (1) we have way more frames and way more duplication and (2) symbolization inflates short strings like "pc:7f4a749fc093" into (very) long ones.
Apr 6 2017
I prototyped a new way of storing stack frames (https://codereview.chromium.org/2052803002/):
1. Moved all names into a string table.
2. Changed the format to use nested dictionaries, to avoid specifying parent ids.
In the new format the stack from #5 looks like:
"stackFrames": {
  "hierarchy": {
    "0": {
      "1": { "bt": "4" },
      "2": {
        "1": {
          "3": { "bt": "3" },
          "bt": "2"
        }
      }
    }
  },
  "names": [ "[Thread]", "b", "a", "c" ]
}
With this change the trace file size with a single detailed dump drops almost to what it is now (i.e. without any new format): 8.7 MiB vs 8 MiB. The fact that frame names are stored only once really shows when the trace file is symbolized: it's now just 12.7 MiB, which is again pretty close to what we have now (11.4 MiB).
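A sketch of the conversion from the flat {id: {name, parent}} encoding to the nested {hierarchy, names} form (simplified and hypothetical: here every node carries its original frame id as "bt", whereas the real patch only emits "bt" for frames referenced by heap entries, and assigns those ids separately):

```python
def to_nested(stack_frames):
    """Convert flat stackFrames (id -> {name, parent}) into the nested
    {"hierarchy": ..., "names": [...]} encoding: keys in "hierarchy" are
    indices into the "names" string table, and nesting replaces the
    explicit "parent" pointers."""
    names = []
    name_index = {}

    def intern(name):
        # Store each distinct name once; return its string-table index.
        if name not in name_index:
            name_index[name] = len(names)
            names.append(name)
        return name_index[name]

    # Invert the parent pointers into children lists.
    children = {}
    roots = []
    for frame_id, frame in stack_frames.items():
        parent = frame.get("parent")
        if parent is None:
            roots.append(frame_id)
        else:
            children.setdefault(parent, []).append(frame_id)

    def build(frame_id):
        node = {"bt": frame_id}  # assumption: reuse the old frame id
        for child_id in children.get(frame_id, []):
            key = str(intern(stack_frames[child_id]["name"]))
            node[key] = build(child_id)
        return node

    hierarchy = {}
    for root_id in roots:
        key = str(intern(stack_frames[root_id]["name"]))
        hierarchy[key] = build(root_id)
    return {"hierarchy": hierarchy, "names": names}
```

Note how the duplicated "b" from the flat example collapses to a single entry in "names", which is exactly what pays off after symbolization.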
Apr 6 2017
At this point V8 team got in touch and proposed that we bring our new format closer to theirs.
Here is how V8 profile event looks like:
{
  "args": {
    "data": {
      "cpuProfile": {
        "nodes": [ ... ],
        "samples": [ ... ]
      },
      "timeDeltas": [ ... ]
    }
  },
  "cat": "disabled-by-default-v8.cpu_profiler",
  "name": "ProfileChunk",
  "ph": "P",
  ...
}
Current format dumps the following event
{
  "args": {
    "dumps": {
      "heaps": {
        "<provider>": {
          "entries": [ ... ]
        }
      },
      ...
    }
  },
  "cat": "disabled-by-default-memory-infra",
  "name": "periodic_interval",
  "ph": "v",
  ...
}
per heap dump per provider, and
{
  "args": {
    "typeNames": { ... }
  },
  "cat": "__metadata",
  "name": "typeNames",
  "ph": "M",
  ...
},
{
  "args": {
    "stackFrames": { ... }
  },
  "cat": "__metadata",
  "name": "stackFrames",
  "ph": "M",
  ...
}
once per trace.
The proposal is to dump the following events instead:
{
  "args": {
    "data": {
      "<provider 1>": {
        "counts": [ ... ],
        "nodes": [ ... ],
        "sizes": [ ... ],
        "types": [ ... ]
      },
      "<provider 2>": {
        "counts": [ ... ],
        "nodes": [ ... ],
        "sizes": [ ... ],
        "types": [ ... ]
      },
      ...
      "maps": {
        "nodes": [ ... ],
        "types": [ ... ]
      }
    }
  },
  "cat": "disabled-by-default-memory-infra",
  "name": "heap_profile",
  "ph": "P",
  ...
}
Similarities:
* maps/nodes is what previously was in the "stackFrames" metadata event. The only difference is that it's now written incrementally, i.e. each event is a diff from the previous one. The incremental idea comes from V8.
* maps/types == "typeNames" metadata event. Also incremental.
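The incremental encoding of both maps can be sketched as follows (hypothetical helper class; the real deduplicators live in base/trace_event, this just illustrates the "each event is a diff" idea):

```python
class IncrementalMapWriter:
    """Emit only the map entries added since the previous heap-profile event.

    The deduplicator keeps append-only lists of stack-frame nodes and type
    names; each event carries just the tail that appeared since the last
    dump, so a reader reconstructs the full maps by concatenating diffs.
    """

    def __init__(self):
        self.nodes = []        # all deduplicated stack-frame nodes so far
        self.types = []        # all deduplicated type names so far
        self._nodes_sent = 0   # how many nodes previous events already carried
        self._types_sent = 0

    def add_node(self, node):
        self.nodes.append(node)

    def add_type(self, type_name):
        self.types.append(type_name)

    def emit_maps(self):
        # The diff: everything appended since the previous emit_maps() call.
        diff = {
            "nodes": self.nodes[self._nodes_sent:],
            "types": self.types[self._types_sent:],
        }
        self._nodes_sent = len(self.nodes)
        self._types_sent = len(self.types)
        return diff
```

This works because ids are assigned once and never reused, so earlier events never need to be rewritten.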
Differences:
* The crazy aggregation of the current format is gone; the rules now are:
1. Deduplicate backtrace, type name, get integer ids
2. Use those ids to make aggregation key {backtrace_id, type_id}
3. Aggregate remaining data by the key (for now remaining data is AllocationMetrics, i.e. {count, size})
4. Dump key + data as a set of arrays, i.e.
struct Entry {
  struct {
    int backtrace_id;
    int type_id;
  } key;
  AllocationMetrics data;
};
is dumped as four arrays: nodes, types, counts and sizes. Transposing array of structures into dictionary of arrays saves space as we don't need to repeat field names over and over (V8 also does this).
* Phase for the new event is TRACE_EVENT_PHASE_SAMPLE ('P'), because this is what V8 uses.
* <provider> is one of malloc, blink_gc, partition_alloc, and all providers for a process are encoded in one event. That's because stack frame / type name deduplicators are per process, not per provider.
* The backtrace array is named "nodes" because this is what V8 does (although there doesn't seem to be any solid reason for that).
* "maps" dictionary allows for automatic handling of new fields: if there is an array named X in "<provider>" dictionary, then depending on whether maps/X exists we either translate values from provider/X or just treat them as numbers.
Apr 6 2017
Hector summarized changes from #7 in a doc: https://docs.google.com/document/d/1zcAIbrvzfK87d5Evwd_o7CJtWqP2SKbkBRKk4NJZ-ts/edit?usp=sharing
Apr 6 2017
With that I created a CL for the Chrome changes: https://codereview.chromium.org/2650863003
During the review the following changes were made:
1. Per-allocator profiles were moved into an 'allocators' node.
2. A 'strings' mapping was introduced, and both 'nodes' and 'types' were changed to store string ids.
3. 'P' events were dropped, and the new data moved to "heaps_v2".
With that, a per-heap-dump event looks like this:
{
  "args": {
    "dumps": {
      "heaps_v2": {
        "version": 1,
        "allocators": {
          ["malloc", "partition_alloc", "blinkgc"]: {
            "nodes": [<stack_frame_id1>, <stack_frame_id2>, ...],
            "types": [<type_id1>, <type_id2>, ...],
            "counts": [<count1>, <count2>, ...],
            "sizes": [<size1>, <size2>, ...]
          }
        },
        "maps": {
          "nodes": [
            {
              "id": <stack_frame_id>,
              "parent": <parent_id>,
              "name_sid": <name_string_id>
            },
            ...
          ],
          "types": [
            { "id": <type_id>, "name_sid": <name_string_id> }
          ],
          "strings": [
            { "id": <string_id>, "string": <string> }
          ]
        }
      },
      ...
    }
  },
  "cat": "disabled-by-default-memory-infra",
  "name": "periodic_interval",
  "ph": "v",
  ...
}
For other aspects (e.g. aggregation rules) see base/trace_event/heap_profiler_event_writer.h: https://codereview.chromium.org/2650863003/patch/180001/190006
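Building the "maps" section with the 'strings' indirection can be sketched like this (a hypothetical helper, not the actual Chromium implementation; id assignment and field names follow the event layout quoted above):

```python
class HeapsV2Maps:
    """Accumulate deduplicated strings, types and stack-frame nodes, then
    serialize them as the "maps" dictionary of a heaps_v2 dump."""

    def __init__(self):
        self.strings = {}  # string -> string id
        self.types = {}    # type name -> type id
        self.nodes = []    # stack-frame nodes, already carrying name_sid

    def string_id(self, s):
        # Intern a string: first occurrence gets the next sequential id.
        return self.strings.setdefault(s, len(self.strings))

    def type_id(self, name):
        if name not in self.types:
            self.types[name] = len(self.types)
        return self.types[name]

    def add_node(self, node_id, parent_id, name):
        self.nodes.append({
            "id": node_id,
            "parent": parent_id,
            "name_sid": self.string_id(name),
        })

    def as_dict(self):
        return {
            "nodes": self.nodes,
            "types": [{"id": tid, "name_sid": self.string_id(name)}
                      for name, tid in self.types.items()],
            "strings": [{"id": sid, "string": s}
                        for s, sid in self.strings.items()],
        }
```

The point of the indirection is that a long symbolized frame name (or a long type name) is stored once in "strings" no matter how many nodes or types refer to it.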
Apr 6 2017
May 4 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/45829114399facebf06f2d4abfedb08f6ac24f0f

commit 45829114399facebf06f2d4abfedb08f6ac24f0f
Author: catapult-deps-roller@chromium.org <catapult-deps-roller@chromium.org>
Date: Thu May 04 06:33:21 2017

Roll src/third_party/catapult/ 0d00147b4..18b10cbe6 (1 commit)
https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/0d00147b4f72..18b10cbe616e

$ git log 0d00147b4..18b10cbe6 --date=short --no-merges --format='%ad %ae %s'
2017-05-03 dskiba symbolize_trace: support new heap dump format.

Created with: roll-dep src/third_party/catapult
BUG= 708930

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md
If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls

CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=sullivan@chromium.org
Change-Id: I6f4ede67585683d8a723daae7684c81e123e4dde
Reviewed-on: https://chromium-review.googlesource.com/495372
Reviewed-by: <catapult-deps-roller@chromium.org>
Commit-Queue: <catapult-deps-roller@chromium.org>
Cr-Commit-Position: refs/heads/master@{#469287}

[modify] https://crrev.com/45829114399facebf06f2d4abfedb08f6ac24f0f/DEPS
Jun 6 2017
Jul 20 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/dd1180754539805af312a7f2ac52f758d543226f

commit dd1180754539805af312a7f2ac52f758d543226f
Author: Dmitry Skiba <dskiba@chromium.org>
Date: Thu Jul 20 10:12:37 2017

Revert "[tracing] Switch to new heap dump format."

This reverts commit d4a5e98235da0cc8c4fb1ae2f67826b0b480ce4c.

Reason for revert: regressed performance ( crbug.com/739378 , crbug.com/736714 ). The main cause is that the new heap format produces a lot more stack frames in an attempt to dump all information, in contrast with the old format, where small stack branches are coalesced together. I'll add that functionality to the new format and reland it.

> [tracing] Switch to new heap dump format.
>
> This CL switches tracing to a new heap dump format, that offers the
> following advantages:
>
> 1. Dumps include all the information collected by Chrome's heap profiler.
> 2. The format is simpler and more compact.
> 3. The format can easily be extended to include additional per-entry data.
> 4. The format allows for post-processing (see recategorization from
>    crrev.com/2906413002 as an example).
>
> BUG= 708930
> Review-Url: https://codereview.chromium.org/2650863003
> Cr-Commit-Position: refs/heads/master@{#480580}

TBR=mark@chromium.org
Bug: 708930 , 739378 , 736714
Change-Id: Ifc2b647aa1f57f4decae6e593ad032d72208b304
Reviewed-on: https://chromium-review.googlesource.com/575682
Commit-Queue: Dmitry Skiba <dskiba@chromium.org>
Reviewed-by: Primiano Tucci <primiano@chromium.org>
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#488186}

[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/BUILD.gn
[delete] https://crrev.com/e773fb28d92d781058b58897b4f8638166fc5a45/base/trace_event/heap_profiler_event_writer.cc
[delete] https://crrev.com/e773fb28d92d781058b58897b4f8638166fc5a45/base/trace_event/heap_profiler_event_writer.h
[delete] https://crrev.com/e773fb28d92d781058b58897b4f8638166fc5a45/base/trace_event/heap_profiler_event_writer_unittest.cc
[add] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/heap_profiler_heap_dump_writer.cc
[add] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/heap_profiler_heap_dump_writer.h
[add] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/heap_profiler_heap_dump_writer_unittest.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/heap_profiler_serialization_state.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/heap_profiler_serialization_state.h
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/heap_profiler_stack_frame_deduplicator.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/heap_profiler_stack_frame_deduplicator.h
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/heap_profiler_stack_frame_deduplicator_unittest.cc
[delete] https://crrev.com/e773fb28d92d781058b58897b4f8638166fc5a45/base/trace_event/heap_profiler_string_deduplicator.cc
[delete] https://crrev.com/e773fb28d92d781058b58897b4f8638166fc5a45/base/trace_event/heap_profiler_string_deduplicator.h
[delete] https://crrev.com/e773fb28d92d781058b58897b4f8638166fc5a45/base/trace_event/heap_profiler_string_deduplicator_unittest.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/heap_profiler_type_name_deduplicator.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/heap_profiler_type_name_deduplicator.h
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/heap_profiler_type_name_deduplicator_unittest.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/malloc_dump_provider.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/memory_dump_manager.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/process_memory_dump.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/process_memory_dump.h
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/process_memory_dump_unittest.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/sharded_allocation_register.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/sharded_allocation_register.h
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/trace_event_memory_overhead.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/base/trace_event/trace_event_memory_overhead.h
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/components/tracing/test/heap_profiler_perftest.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/third_party/WebKit/Source/platform/PartitionAllocMemoryDumpProvider.cpp
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/third_party/WebKit/Source/platform/heap/BlinkGCMemoryDumpProvider.cpp
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/third_party/WebKit/Source/platform/instrumentation/tracing/web_process_memory_dump.cc
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/third_party/WebKit/Source/platform/instrumentation/tracing/web_process_memory_dump.h
[modify] https://crrev.com/dd1180754539805af312a7f2ac52f758d543226f/tools/gn/bootstrap/bootstrap.py
May 18 2018
Heap profiling is done by memlog now, which uses the new format.
Comment 1 by dskiba@chromium.org
Apr 6 2017
First I noticed that the current format dumps too many entries. For example, for a stack like this:
[Thread]
  a
    b
      c
where function c allocates three types:
X: 1x 10000 bytes
Y: 1x 20000 bytes
Z: 1x 30000 bytes
the current format dumps the following:
"heaps": {
  "malloc": {
    "entries": [
      { "bt": "", "count": "3", "size": "ea60" },
      { "bt": "", "count": "1", "size": "2710", "type": "1" },
      { "bt": "", "count": "1", "size": "4e20", "type": "2" },
      { "bt": "", "count": "1", "size": "7530", "type": "3" },
      { "bt": "0", "count": "3", "size": "ea60" },
      { "bt": "0", "count": "1", "size": "2710", "type": "1" },
      { "bt": "0", "count": "1", "size": "4e20", "type": "2" },
      { "bt": "0", "count": "1", "size": "7530", "type": "3" },
      { "bt": "1", "count": "3", "size": "ea60" },
      { "bt": "1", "count": "1", "size": "2710", "type": "1" },
      { "bt": "1", "count": "1", "size": "4e20", "type": "2" },
      { "bt": "1", "count": "1", "size": "7530", "type": "3" },
      { "bt": "2", "count": "3", "size": "ea60" },
      { "bt": "2", "count": "1", "size": "2710", "type": "1" },
      { "bt": "2", "count": "1", "size": "4e20", "type": "2" },
      { "bt": "2", "count": "1", "size": "7530", "type": "3" },
      { "bt": "3", "count": "3", "size": "ea60" },
      { "bt": "3", "count": "1", "size": "2710", "type": "1" },
      { "bt": "3", "count": "1", "size": "4e20", "type": "2" },
      { "bt": "3", "count": "1", "size": "7530", "type": "3" }
    ]
  }
}
I.e. the original 3 allocations were exploded into 20 entries.
Primiano noted that the current format is described in two docs:
"A12 Heap Profiling in memory-infra": https://docs.google.com/document/d/1xMbBA0w5UunhTzZkdUfFIS63nRNjU6PP3wd2vf1_Ca8/edit#heading=h.7r0ivm55j5x7
"4. Heap Dump Format": https://docs.google.com/document/d/1NqBg1MzVnuMsnvV1AKLdKaPSPGpd81NaMPVk5stYanQ/edit#heading=h.ve4u7vezmflq
He suggested looking at a real example instead of a toy one.
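The explosion can be counted with a small helper (a hypothetical sketch derived from the dump above: the old format emits one untyped cumulative row plus one row per type, repeated for the empty backtrace and for every frame along the stack):

```python
def old_format_entry_count(stack_depth, num_types):
    """Number of entries the old heap format emits for allocations at the
    leaf of a single stack of the given depth with the given number of
    distinct allocated types."""
    backtrace_keys = 1 + stack_depth          # "" plus one key per frame
    return backtrace_keys * (1 + num_types)   # untyped total + per-type rows
```

For the [Thread]/a/b/c example (depth 4, 3 types) this gives (1 + 4) * (1 + 3) = 20 entries for just 3 allocations, matching the dump above.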