base::StringPiece methods force callers to pass by reference (inefficient) |
|||
Issue description
The general Chrome rule is to pass class objects by reference, but the StringPiece documentation [1] explicitly overrules this:
"""
// Prefer passing StringPieces by value:
// void MyFunction(StringPiece arg);
// If circumstances require, you may also pass by const reference:
// void MyFunction(const StringPiece& arg); // not preferred
// Both of these have the same lifetime semantics. Passing by value
// generates slightly smaller code.
"""
However, ALL of the internal functions in the base::internal namespace in string_piece.h violate this rule, accepting StringPiece arguments by reference; e.g.:
size_t find(const StringPiece& self, const StringPiece& s, size_t pos);
Despite the StringPiece::find() method itself being inlined, the find() helper function is not, and therefore, callers of StringPiece::find are forced to supply a real memory address to a StringPiece. This often prevents optimizations where they can be stored and passed in registers.
Case study (using clang++ on Linux 64-bit):
void StringPieceFindTestVal(const std::vector<base::StringPiece>& vector) {
for (base::StringPiece piece : vector)
printf("%zu\n", piece.find('x'));
}
Assembly: https://gist.github.com/mgiuca/fb081ed2a9442ce72db52fb3e87cfe86
versus
void StringPieceFindTestRef(const std::vector<base::StringPiece>& vector) {
for (const base::StringPiece& piece : vector)
printf("%zu\n", piece.find('x'));
}
Assembly: https://gist.github.com/mgiuca/567df7eab7d16ab64edaa903756165db
The code for the by-value version is significantly less efficient. The ref version takes the address of each element of the vector and passes it to StringPiece::find. The val version allocates stack space for a StringPiece, copies the 16 bytes from the vector onto the stack (using some funky floating-point instructions, yay optimizations), then passes the address of that stack slot into StringPiece::find.
If base::internal::find was instead declared like this:
size_t find(StringPiece self, StringPiece s, size_t pos);
Then both versions would generate exactly the same code, which would likely pass the arguments by registers, never allocating stack space. (I haven't tried this change yet.)
It's a bit scary to make this change because it could potentially make some things worse (e.g., functions that take a const StringPiece& and call find() on it would rather that find() take a reference). But I think overall this would have performance benefits, especially for code that is correctly following the advice at the top of this file.
[1] https://cs.chromium.org/chromium/src/base/strings/string_piece.h
,
Mar 21 2017
Thanks for investigating this. My personal intuition is that const auto& would be better when iterating. The rationale is that if we're iterating through a list, the StringPiece is already somewhere in memory, so we have to incur the cost of the dereference to get the actual StringPiece anyway. So it's a matter of dereferencing upfront to get the by value copy, or dereferencing later when we use the reference to the StringPiece. I am curious to see if changing the internal helpers to pass args by value will change anything though...
,
Mar 22 2017
#1: > I think this is unfair ... You would never want to copy each element of the vector while iterating. But that is the very question I am trying to answer: should you iterate by value or by reference? (Your "You'd write this as" version is the baseline I'm comparing it to; you can see the assembly differences between them above.) I think iteration is somewhat of a red herring. The question applies equally to functions that take StringPiece by value. I did some analysis (looking at assembly + logicking it out) and ended up with this doc: https://docs.google.com/document/d/1tLO-Y85l-K-fFLi96xddqu2dXW8Hn8Sj-kgllPG5YBM/edit (I think writing C-equivalent code of the ASM is much easier to read than pasting in raw ASM generated by the compiler.) The conclusion I came to is that if you write a function that takes a StringPiece, it will be inefficient to call a StringPiece method. If you write a function that takes a const StringPiece&, it will be inefficient for somebody to call your function with a std::string. Therefore, we should make StringPiece methods take StringPiece by value, and uphold the guideline of taking StringPiece by value. Iteration is less critical, since if the functions/methods take StringPiece by value, then it doesn't matter much whether you iterate by value or by reference. But there are minor optimizations (potentially fewer loads) if you iterate by value, so we probably should do that (if we do the above). It's also easier to have a blanket rule "treat StringPieces as primitives, like int" (which means iterate by value, not by reference), rather than having a special case for function arguments. Either way, the documentation should clarify how you should iterate over a vector<StringPiece>. #2: > My personal intuition is that const auto& would be better when iterating. The rationale is that > if we're iterating through a list, the StringPiece is already somewhere in memory, so we have to > incur the cost of the dereference to get the actual StringPiece anyway. So it's a matter of > dereferencing upfront to get the by value copy, or dereferencing later when we use the reference > to the StringPiece. It's situational; if it's used only once, and passed by value, then it comes out exactly the same. If it's passed by reference, it's very bad to iterate by value because it has to be copied to the stack (!). But if it's being passed by value, and used multiple times within the loop body, it's better to iterate by value because then you only have to do a memory load once. (I'm surprised that LLVM doesn't optimize repeated loads into a single load, but it doesn't.) > I am curious to see if changing the internal helpers to pass args by value will change anything > though... I did that too. Here is the assembly code: https://gist.github.com/mgiuca/5642d9146b11b80b6a3d5f3324a6666b In my above doc, the assemblies I linked in this bug report are equivalent to IterValToRef and IterRefToRef, respectively; this one is equivalent to IterValToVal (or IterRefToVal, same code). The summary is it's a bit less efficient having find() take by-value (than doing by-ref iteration and having find() take by-ref). But there are other good reasons to have find() take by-value.
,
Apr 25 2017
Since there is an owner, this bug is not untriaged. Updating the status.
,
Jul 28 2017
|
|||
►
Sign in to add a comment |
|||
Comment 1 by danakj@chromium.org
, Mar 21 2017I think this is unfair: void StringPieceFindTestVal(const std::vector<base::StringPiece>& vector) { for (base::StringPiece piece : vector) printf("%zu\n", piece.find('x')); } You would never want to copy each element of the vector while iterating. You'd write this as: void StringPieceFindTestVal(const std::vector<base::StringPiece>& vector) { for (const auto& piece : vector) printf("%zu\n", piece.find('x')); } This seems orthogonal to the question of passing a StringPiece. By value allows rvalue optimizations that const& does not.