New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 694810 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Last visit > 30 days ago
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 3
Type: Bug
Hotlist-MemoryInfra

Blocking:
issue 698266
issue 686208



Sign in to add a comment

Clean up PseudoStackFrame operator==

Project Member Reported by ajwong@chromium.org, Feb 21 2017

Issue description

Forking from 686208.

The current operator== erroneously relies on string pooling to consistently collapse all equal string literals that tracing macros of the following form:

TRACE_EVENT_BEGIN0("hi")
TRACE_EVENT_END0("hi")

end up effectively being passed the same pointer even though the literals are written on two different lines.

This pooling behavior is default on gcc and clang, but not in MSVC in debug mode. In https://codereview.chromium.org/2689793003, wez@ put a temporary hack around to make MSVC behave like gcc/clang, but it's still sorta shaky to depend on implementation defined behavior (note that by spec, this IS implementation defined behavior and NOT undefined behavior so we are actually safe on these toolchains...it just feels wrong regardless.)

The cleaner fix, suggested by primiano is

  DCHECK(a == b || strcmp(a, b) ==0)

which fast-tracks an equal which is the common case.

Note also that using the TraceEvent.id() field doesn't work since that's not guaranteed to be the same across macro invocations.

There was also a discussion of removing the operator== by awong (me), but apparently that was based on an outdated understanding of the styleguide... operator== is apparently preferred now. (wow).
 

Comment 1 by ajwong@chromium.org, Feb 21 2017

Blocking: 686208

Comment 2 by ssid@chromium.org, Feb 23 2017

Cc: dskiba@chromium.org
Replying to Primiano:

> ## Option 1
> We deem two pseudo-StackFrame equal if their strings are deep-equal. This solution is: mentally clean, more complex code-wise, very likely perf-intrusive.
> We can still make the operator== speculatively fast in most cases (look at pointer equality first, and than do deep strcmp). However I don't see any easy way around the hash operator. I think we have no options other than hashing the full contents of the string, and this might affect performances badly.
>
> ## Option 2
> We bind the identity of stackrames (and hence BackTraces) to the string pointer and not the actual string content and reconcile everything in the HeapDumpWriter, which is out of the fastpath.
> This solution is: mentally a bit more complex, clean and simple code-wise, non perf-intrusive, a bit more memory intrusive.
>
> Won't Option 2 break things? Not if we factor this in into the HeapDumpWriter (the thing that writes the grouped snapshot into the trace)
> So this is what I am saying here. Imagine that we have two allocations made by
> "main" -> "MessageLoop::Run()" -> "TheFunctionName"(0x1234)
> "main" -> "MessageLoop::Run()" -> "TheFunctionName"(0x5678)
> (1234 and 5678 being the hypotetical const char* ptr).
> Option 2 means that they will be treated as distinct entries by the heap profiler in the chromium C++ code, until the very last export stage.
> Nothing is functionally wrong so far. Thankfully no lookup ever happens in the other direction (read: from within the heap profiler C++ code, we never query how many allocations are tied to "TheFunctionName", again % the exporter).
> The only problem, so far, is that now we will waste two entries in the deduplicator because the two "TheFunctionName" have two identities. But again, assuming that string are *mostly* pooled this is not a real problem.
>
> From a coding perspective this means leaving 2 and 3 as they are (% adding comments to explain what's going on).
> The only place where the identity of "TheFunctionName" comes into play is at the exporter level, when we are in heap_profiler_heap_dump_writer.cc .
> This is the thing that needs to change. This will have to group things by their actual content and hence doing full string comparison / hashing.
> But now, here's the good news: this one is out of the fast path. This does NOT happen while we intercept a malloc/free call. This happens when we decide to inject a snapshot in the trace (today: every X seconds), and when it happens, it does run on a background thread. So making this even 2x slower doesn't make me worry too much.

I think we do not even need to fix heap_profiler_heap_dump_writer because the UI already does the same grouping. So, what really happens now is that the heap dump write deduplicates based on the pointer equality and writes strings into the trace. The UI when reading the heap dumps groups the frames with same name together and adds up the numbers.

> So, all this has been a bit of a big braindump.
> Trying to to make an overall balance, I would personally opt for fixing the DCHECK (in whatever form you like) and go for Option 2 here.
> Rationale:
> - It will keep everything as fast as possible.
> - It requires no code changes (% comments) in the existing fastpath classes.
> - It requires code changes in a piece of code (heap_profiler_heap_dump_writer.cc) that we were already planning to change, in order to improve the serialization format (dskiba, do we have a bug that tracks this work? if not we should, to keep everybody aligned).
>
> Thoughts?
>

As you said option 1 is not really needed since it does affect the fast path. Do we really need to touch the slow path and fix the heap dump writer here, when the TraceViewer already does the same disambiguation for these strings. The only bad part right now is that we dump the strings multiple times into the trace file. But since we have already seen that these are not very frequent, we can afford to have few extra entries in the trace.

I would not suggest we go with option 1 here and add a strcmp in the DCHECK because it affects the speed. Since we are trying to enable the heap profiler in the field soon, we can't afford to lose the performance of the allocation register.
If we feel that we need to deduplicate these strings for the trace file size, then lets go with a special function that does deep equals in the deduplicator only on the background thread.
> Do we really need to touch the slow path and fix the heap dump writer here, when the TraceViewer already does the same disambiguation for these strings.
Let's reason on this once the new heap stuff is landed. Depends on how/where we do the aggregation math. If it is not too much effort and will keep things easier to reason about (fixing the heap dump writer).

Comment 4 by dskiba@chromium.org, Feb 23 2017

Cc: hjd@chromium.org
We don't need to do anything (apart from fixing DCHECK) - UI already (somehow) handles situation where several frames at the same level are named the same (but have different ids).

This is currently happening in native traces - after symbolization distinct pointers can end up having similar name.

I.e. for

main() {
  foo();
}

foo() {
  bar();
  baz();
}

stack unwinding produces two different addresses for bar() / baz() invocations, and we end up with:

0x1000   [return address of foo(), points somewhere in main(), id=1]
  0x2000 [return address of bar(), points somewhere in foo(),  id=2]
    ...
  0x2010 [return address of baz(), points somewhere in foo(),  id=777]
    ...

After symbolization this stack trace looks like:

main()    [id=1]
  foo()   [id=2]
    ...
  foo()   [id=777]
    ...

And UI manages to show this correctly. Adding hjb@ to confirm.

(Note that for the new format I have plans to dedup stack entries to make traces smaller.)

Comment 5 by hjd@chromium.org, Feb 23 2017

Confirm. For better or worse this line: https://github.com/catapult-project/catapult/blob/574285df8dae4445d2370362a2cff4cce0eaa8c0/tracing/tracing/ui/analysis/memory_dump_heap_details_pane.html#L428 effectively de-dupes the stack frames.
getUserFriendlyStackTrace() returns an array of strings (one for each level) by walking the stack frame tree from leaf to root and we insert the data associated with that stack frame into our model based on comparing the strings in that array.

Blocking: 698266

Comment 7 by ssid@chromium.org, Mar 6 2017

Adding a note here: We also need this feature in UI for grouping allocations based on categories. Tasks posted from 2 files in the same folder are grouped under same category by the UI since the c++ logic considers them as different type names.
ssid@, can you explain #7 more?

I think this is WAI - TypeNameDeduplicator converts different paths into similar categories, and UI aggregates them and shows as one entry.

Comment 9 by ssid@chromium.org, Mar 6 2017

Yes, it's WAI. I am just pointing out one other use case of UI doing the aggregation in case the discussion leads to "chrome should do deduplication completely and UI should just display as is"

Sign in to add a comment