Tracing (and DevTools) makes custom element callbacks much slower |
||||||||||
Issue descriptionGoogle Chrome 51.0.2667.0 (Official Build) canary (64-bit) Revision d285b019189141a8c235f9eed8764d6129043380-refs/heads/master@{#379160} OS Mac OS X, 2012 Retina MBP Load Chrome Downloads with and without the v8 and devtools/devtools.timeline categories. Notice how long the various slices take. The final resource loading slice (ChannelProxy::Context::OnDispatchMessage) takes an extra 25ms with the v8 category turned on. This appears to be because the V8ScriptRunner::callFunction is covered in inspector and tracing overhead, for example it does: TRACE_EVENT1("devtools.timeline,v8", "FunctionCall", "data", devToolsTraceEventData(isolate, context, function)); but a web components (ex. Polymer) app will have hundreds of callbacks which is then hundreds of calls to devToolsTraceEventData which is doing lots of malloc and string copy. So the page ends up doing thousands of string copies and number to string conversions on load. This is handicapping custom element performance when looked at in the devtools. A microbenchmark is attached. The realCustomElement case takes 14ms for me without tracing, and 60ms with it.
,
Mar 12 2016
,
Mar 12 2016
there's no need for this. mmohammad@, could you explain why you need this? The issue was filed by Blink engineers.
,
Mar 12 2016
,
Mar 12 2016
Sure thank you .
,
Apr 15 2016
,
Apr 15 2016
Bisecting InspectorFunctionCallEvent::data on MBP. Unfortunately, i only have dcheck_always_on=1 in hand and I can already see how Canary is different.
no devtools: ~12ms
w/ devtools: ~14ms
timeline / tracing: ~50ms
TRACE_EVENT w/ blank TracedValue: ~20ms
timeline w/o setCallStack: ~30ms
timeline w/o setCallStack, resourceName: ~20ms
V8ScriptRunner::callFunction with no TRACE_EVENT results in the same timing as w/o tracing. However, inserting TRACE_EVENT0("devtools.timeline,v8", "FunctionCall"); immediately regresses it from 14ms to 20ms. Are those assert artifacts? @caseq: could you take a look?
,
Apr 19 2016
I've done some measurements on release Chromium: https://docs.google.com/spreadsheets/d/1VUae8gatBBd8frpLlOTTZvPwIkhrnu1OkJDJv6CR708/edit#gid=0 Summary is the following. There are three types of data is collected for the event: 1. Function name and details 2. Call stack sample 3. Extra info (line number etc) for the top frame Each piece contributes ~ 10% to the execution time.
,
Apr 26 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0d984b7d76cd1945834793344888127c16879ee7 commit 0d984b7d76cd1945834793344888127c16879ee7 Author: alph <alph@chromium.org> Date: Tue Apr 26 02:00:15 2016 DevTools: Do not record nested FunctionCall events. BUG= 593991 Review URL: https://codereview.chromium.org/1911243002 Cr-Commit-Position: refs/heads/master@{#389664} [modify] https://crrev.com/0d984b7d76cd1945834793344888127c16879ee7/third_party/WebKit/LayoutTests/inspector/tracing/timeline-receive-response-event-expected.txt [modify] https://crrev.com/0d984b7d76cd1945834793344888127c16879ee7/third_party/WebKit/LayoutTests/inspector/tracing/timeline-receive-response-event.html [modify] https://crrev.com/0d984b7d76cd1945834793344888127c16879ee7/third_party/WebKit/Source/bindings/core/v8/V8ScriptRunner.cpp [modify] https://crrev.com/0d984b7d76cd1945834793344888127c16879ee7/third_party/WebKit/Source/core/inspector/InspectorTraceEvents.cpp
,
May 2 2016
,
May 3 2016
I see no improvement in Chrome Canary on Mac. Note that this problem appears to be Mac specific - Linux demonstrates very little overhead when tracing through dev tools.
,
May 3 2016
When you say Linux do you mean a Z620? They're much too fast and erratic to get decent numbers.
,
May 3 2016
No, it's a Z840. Erratic or no, I'm seeing ~80ms become ~90ms on Linux, while ~80ms becomes ~180ms on Mac. I'd say there's a fairly clear difference. For what it's worth, the numbers don't actually seem to be any more spread on the Z840 than on the Mac.
,
May 3 2016
That's good to know, perhaps the TLS in tracing is very slow on Mac? If tracing is much slower on Mac for some reason we should figure out why. We definitely need to do something for this bug, many developers are on Mac.
,
May 16 2016
@alph any updates? Can you explain how the patch from #9 relates to #8 and #7?
,
Aug 22 2016
Just checked it on Canary. Seems to be good for me so far. no DT: noCustomElement 41.81999999999999 fakeCustomElement 40.57500000000002 realCustomElement 65.70000000000005 fakeCustomElementWithProto 52.27999999999997 fakeCustomElementMoreCrossing 55.704999999999984 DT (Console panel is open): noCustomElement 50.86 fakeCustomElement 44.96000000000001 realCustomElement 66.45999999999998 fakeCustomElementWithProto 51.79499999999996 fakeCustomElementMoreCrossing 47.96500000000003 DT (timeline recording) noCustomElement 47.725 fakeCustomElement 46.32000000000001 realCustomElement 64.97 fakeCustomElementWithProto 50.16 fakeCustomElementMoreCrossing 50.75 The patch makes it to record no trace event when a nested function call is executed. Thus the recording entirely relies on sampling. So it should not skew things as there's no instrumentation depending on the number of calls being invoked. I'm going to mark is as fixed. Please feed free to reopen if it doesn't work for you. |
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by mmohammad@chromium.org
, Mar 12 2016