devirt-want: hoist a pointer to a virtual function called in a loop |
|||
Issue descriptionChrome has quite a few hot places, where the same virtual is called inside a loop (sometimes, double loop). Initially, that was discovered during the analysis of the performance regressions due to enabling CFI ( https://crbug.com/634139#c12 , but the (somewhat lower) overhead being paid by the regular build as well. From V8: https://cs.chromium.org/chromium/src/v8/src/global-handles.cc?sq=package:chromium&rcl=1470309818&l=1158 void GlobalHandles::IterateAllRootsWithClassIds(ObjectVisitor* v) { for (NodeIterator it(this); !it.done(); it.Advance()) { if (it.node()->IsRetainer() && it.node()->has_wrapper_class_id()) { v->VisitEmbedderReference(it.node()->location(), it.node()->wrapper_class_id()); } } } From CC: https://cs.chromium.org/chromium/src/cc/playback/display_item_list.cc?q=DisplayItemList::Raster&sq=package:chromium&dr=CSs&l=132 void DisplayItemList::Raster(SkCanvas* canvas, SkPicture::AbortCallback* callback) const { ... for (size_t index : indices) { ... if (callback && callback->abort()) break; } } In both cases a pointer to the particular implementation can be computed before a loop, as nor the type of the visitor, nor the type of the callback are being changed there. So, instead of a virtual call, where we read vptr, then read from vtable by the index, then jump, we can same the pointer and then make an indirect jump without additional loads or instructions. We need to double check with C++ standard experts that such optimization is allowed.
,
Aug 27 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/825ce48757eb1ec0428fe573262614635d25cb7f commit 825ce48757eb1ec0428fe573262614635d25cb7f Author: krasin <krasin@google.com> Date: Sat Aug 27 11:01:11 2016 Disable CFI on a few src/cc methods for perf reasons. While we have not observed CFI to slowdown real-world use cases, there are a few blink_perf microbenchmarks, which are somewhat affected by the change. This change removes CFI protection from the methods which contribute the most to the slowdown. Eventually, when a proposed optimization in Clang is implemented (https://crbug.com/638056), these attributes would be possible to remove from all/most of the methods. BUG=638056, 634139 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel Review-Url: https://codereview.chromium.org/2257323002 Cr-Commit-Position: refs/heads/master@{#414916} [modify] https://crrev.com/825ce48757eb1ec0428fe573262614635d25cb7f/base/compiler_specific.h [modify] https://crrev.com/825ce48757eb1ec0428fe573262614635d25cb7f/cc/base/contiguous_container.h [modify] https://crrev.com/825ce48757eb1ec0428fe573262614635d25cb7f/cc/blink/web_layer_impl.cc [modify] https://crrev.com/825ce48757eb1ec0428fe573262614635d25cb7f/cc/playback/display_item_list.cc [modify] https://crrev.com/825ce48757eb1ec0428fe573262614635d25cb7f/cc/playback/drawing_display_item.cc [modify] https://crrev.com/825ce48757eb1ec0428fe573262614635d25cb7f/cc/trees/layer_tree_host.cc
,
Sep 8 2016
,
Aug 1
|
|||
►
Sign in to add a comment |
|||
Comment 1 by krasin@chromium.org
, Aug 16 2016As a speculative extension to the proposed optimization, we might want to remember pointers to the virtual function implementation for items in loops to speed up the code like: for (const auto& item : inputs_.items) { external_memory_usage += item.ExternalMemoryUsage(); } while the actual type of items could be different, the adjacent items will likely have the same type, so we could often reuse a pointer from the previous iteration (as well as avoid a CFI check).