Research better algorithm for range insertion in flat_set.
Reported by dyaros...@yandex-team.ru, Jan 18 2017
Issue description

This task is a follow-up to https://bugs.chromium.org/p/chromium/issues/detail?id=671759 and https://bugs.chromium.org/p/chromium/issues/detail?id=682215.

Currently flat_set::insert(first, last) is implemented as one-element-at-a-time insertion with a hint, following the boost, EASTL, and folly implementations. This seems to be a suboptimal implementation; for example, if the flat_set is empty we could copy everything, then call stable_sort and unique to get the same result. Watch out for std::inplace_merge: it is currently unstable on Mac - https://bugs.chromium.org/p/chromium/issues/detail?id=668821
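For reference, a minimal sketch of the one-element-at-a-time strategy over a plain sorted std::vector (illustrative only; the function name and signature are made up for the example and are not the actual base::flat_set code):

  #include <algorithm>
  #include <vector>

  // Insert [first, last) into a sorted, duplicate-free vector one element at a
  // time: binary search for the position, skip values that are already
  // present. Each insert may shift O(n) elements, so inserting m elements
  // costs O(m * (log n + n)) in the worst case.
  template <typename T, typename InputIt>
  void InsertOneByOne(std::vector<T>& body, InputIt first, InputIt last) {
    for (; first != last; ++first) {
      auto pos = std::lower_bound(body.begin(), body.end(), *first);
      if (pos == body.end() || *first < *pos)
        body.insert(pos, *first);
    }
  }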
Comment 1 by ajha@chromium.org, Jan 19 2017
Labels: TE-NeedsTriageHelp

Comment, Jan 19 2017
ajha: why did you add those labels? This doesn't seem restricted to the omnibox component, nor do I see how or why it needs triage help. This is a follow-up for code improvements from other work.
Comment, Jan 19 2017
The UI>Browser>Omnibox label came from Issue 671759, referenced in C#0. As part of triaging, we are adding components to Unconfirmed bugs so that they reach the respective teams. Apologies if the labels are not correct; please adjust to a more appropriate component if you see fit.
Comment, Mar 31 2017
What do you think about simply inserting everything at the end of the underlying storage, followed by a stable_sort and unique? You already hinted at stable_sort and unique for the empty case. This throws away the information that the existing range is already sorted; however, it should still be stable overall, and the time complexity is O(n log n) in the worst case. Once std::inplace_merge is fixed, we could stable_sort only the just-inserted elements, followed by inplace_merge and unique. This should be faster, given that we no longer have to stable_sort everything.
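A minimal sketch of that append-then-sort approach (later called "Stable Sort All" in the benchmarks), again against a plain sorted std::vector rather than the real flat container internals; the name InsertStableSortAll is made up for the example:

  #include <algorithm>
  #include <vector>

  // Append the new range, stable_sort the whole body, then drop duplicates.
  // std::stable_sort keeps the pre-existing copy ahead of an equal new one,
  // and std::unique keeps the first element of each equal run, so existing
  // elements win, matching the semantics of ignoring duplicate inserts.
  template <typename T, typename InputIt>
  void InsertStableSortAll(std::vector<T>& body, InputIt first, InputIt last) {
    body.insert(body.end(), first, last);
    std::stable_sort(body.begin(), body.end());
    body.erase(std::unique(body.begin(), body.end()), body.end());
  }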
Comment, Mar 31 2017
@jdoerrie: Well, std::stable_sort has bad complexity for insert. As for inplace_merge - that was my original implementation: https://codereview.chromium.org/2333253002/diff/20001/base/containers/flat_sorted_container_base.h To my huge surprise, it was unbelievably slower than inserting one by one (at least in history, where I tested). My guess would be that it works very badly for ranges of significantly different sizes (especially the unique step). I work on benchmarks for this (and other flat_set-related stuff) from time to time, but unfortunately so far I'm unable to get consistent results. Without the benchmarks I don't know how to approach this task.
Comment, Mar 31 2017
@jdoerrie: If you are interested, I'm more than happy for you to do the benchmarks.
Comment, Mar 31 2017
@dyaroshev: That is very surprising indeed; I would not expect that. I'm happy to do a couple of benchmarks on this, but I can't promise any time frame. I am very interested in getting to the bottom of this, though.
Comment, May 3 2017
I had some time to spare and ended up doing some benchmarks.
Code: http://crrev.com/2854353002
Results: https://docs.google.com/spreadsheets/d/1a-6VnhWAo0j3aZIP3S0JnLiWBRiDvc8duZZC-PEko1Y/edit#gid=0
I tested the following insertion strategies:
- "One by One": a simple single-element insert in a loop.
- "Stable Sort All": what I hinted at in #c5, i.e. insert the new range at the end of the vector and stable_sort everything.
- "Stable Sort Smart": stable_sort only the new range and then use inplace_merge.
I was not able to reproduce your finding that one by one is better. In fact, "One by One" is competitive only in a few degenerate cases. These are the cases where "One by One" has O(n log n) complexity, which include inserting an already sorted range into an empty set or trying to insert a range where every item is already present. In general "One by One" is very sensitive to the duplication rate, which should not be too surprising. Relevant filter views:
- Sorted range into an empty set: https://docs.google.com/spreadsheets/d/1a-6VnhWAo0j3aZIP3S0JnLiWBRiDvc8duZZC-PEko1Y/edit#gid=0&fvid=1821993993
- Insertion with every element already present: https://docs.google.com/spreadsheets/d/1a-6VnhWAo0j3aZIP3S0JnLiWBRiDvc8duZZC-PEko1Y/edit#gid=0&fvid=1341229300
- Insertion of a large range into a large set with varying duplicate rate: https://docs.google.com/spreadsheets/d/1a-6VnhWAo0j3aZIP3S0JnLiWBRiDvc8duZZC-PEko1Y/edit#gid=0&fvid=1980801371
In general, "Stable Sort Smart" seems to be the best option and can be up to 3 times faster than "Stable Sort All". These are the cases where we insert a small range into an already large set: https://docs.google.com/spreadsheets/d/1a-6VnhWAo0j3aZIP3S0JnLiWBRiDvc8duZZC-PEko1Y/edit#gid=0&fvid=499056603 However, the two perform pretty much equally if the inserted range is about the same size as the existing set or larger.
Please feel free to run your own experiments and point out possible mistakes in my benchmark; I am happy to discuss this further. Assuming my findings are correct, I suggest we implement the "Stable Sort Smart" solution, which also seems to be folly's implementation: https://github.com/facebook/folly/blob/master/folly/sorted_vector_types.h#L147
There is still the issue of libc++'s inplace_merge being unstable for some inputs, though. I submitted a fix for that (https://reviews.llvm.org/D32788), but I don't know how long it will take to land and take effect for us. I suggest we wait for the fix to be present on all platforms; until then we could use "Stable Sort All" as an intermediate solution.
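For concreteness, a sketch of the "Stable Sort Smart" strategy under the same simplifying assumptions as the earlier sketches (plain sorted std::vector, made-up function name). It relies on std::inplace_merge being stable, which is exactly the libc++ issue mentioned above:

  #include <algorithm>
  #include <vector>

  // Append the new range, sort only the appended part, merge the two sorted
  // runs in place, then drop duplicates. stable_sort plus the (stable)
  // inplace_merge keep pre-existing elements ahead of equal new ones, so
  // std::unique again keeps the original copy.
  template <typename T, typename InputIt>
  void InsertStableSortSmart(std::vector<T>& body, InputIt first,
                             InputIt last) {
    const auto old_size = body.size();
    body.insert(body.end(), first, last);
    const auto middle = body.begin() + old_size;
    std::stable_sort(middle, body.end());
    std::inplace_merge(body.begin(), middle, body.end());
    body.erase(std::unique(body.begin(), body.end()), body.end());
  }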
Comment, May 4 2017
@jdoerrie:
1 - Good job, fixing libc++ is awesome. I've reproduced the results that scared me away from sort + inplace_merge + unique: https://codereview.chromium.org/2862813002
2 - FYI: std::inplace_merge is effectively std::merge if there is enough memory available. This could be a reasonable way to work around the fact that std::inplace_merge is unstable.
Comment, May 4 2017
@dyaroshev:
1 - Thank you for providing your code. I was able to reproduce your results; however, I noticed the following:
- You are inserting an already sorted range, which benefits one-by-one a lot if the items in the range are bigger than the items already present in the set. In these cases one-by-one achieves an overall complexity of O(n) when combined with an insertion hint at the end of the container.
- In the cases where MERGE performs horribly you are inserting many ranges of size one. Logging the occurrences of the range sizes I obtained the following:

  Size    Count
  1       2461
  2       15
  3       8
  4       12
  5       1
  68      1
  10000   2

MERGE performs badly here because it unconditionally performs a full pass over all items during the unique step. Since one-by-one does not do this, it performs a lot better. When MERGE is fixed to do a single-element insert for ranges of size one, the performance issue goes away. This seems like a good idea in general and should also be done in the final implementation.
2 - Unfortunately this is not true. http://en.cppreference.com/w/cpp/algorithm/merge: "The behavior is undefined if the destination range overlaps either of the input ranges (the input ranges may overlap each other)." However, we could work around the stability issues by implementing our own base::InplaceMerge until the fix is live in libc++.
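base::InplaceMerge is only a proposed helper name here; a minimal buffer-based sketch of what such a workaround could look like (not a true in-place merge, since it uses O(n) extra memory):

  #include <algorithm>
  #include <iterator>
  #include <vector>

  // Merge the sorted runs [first, middle) and [middle, last) through a
  // temporary buffer using std::merge, which is specified to be stable, then
  // move the result back into the original storage.
  template <typename RandomIt>
  void StableMergeWithBuffer(RandomIt first, RandomIt middle, RandomIt last) {
    using T = typename std::iterator_traits<RandomIt>::value_type;
    std::vector<T> buffer;
    buffer.reserve(last - first);
    std::merge(std::make_move_iterator(first), std::make_move_iterator(middle),
               std::make_move_iterator(middle), std::make_move_iterator(last),
               std::back_inserter(buffer));
    std::move(buffer.begin(), buffer.end(), first);
  }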
Comment, May 4 2017
@jdoerrie:
1) Very close to what I was thinking. But:
- What about ranges of 2 elements? 3?
- Maybe we ought to merge, but using binary search?
- Maybe we can avoid inserting repeating elements?
- What about creating a buffer via std::get_temporary_buffer and sorting there?
- Can we take advantage of the fact that we can assume we'll exceed the current capacity, and create a new vector to merge into?
2) We could do something like this (pseudocode; std::inplace_merge, at least according to Stepanov's course, does something similar internally):

  std::vector<value_type> buffer;
  unique_set_union(begin(), end(), first, last, std::back_inserter(buffer));
  body_ = std::move(buffer);
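unique_set_union is not a standard algorithm (as the following comments discuss); a possible implementation, assuming both input ranges are sorted and individually duplicate-free, with elements of the first range winning ties:

  #include <algorithm>

  // Merge two sorted, duplicate-free ranges into `out`, emitting each value
  // once; on ties the element from the first range is kept, which preserves
  // the existing elements of the flat container.
  template <typename InputIt1, typename InputIt2, typename OutputIt>
  OutputIt unique_set_union(InputIt1 first1, InputIt1 last1,
                            InputIt2 first2, InputIt2 last2, OutputIt out) {
    while (first1 != last1 && first2 != last2) {
      if (*first2 < *first1) {
        *out++ = *first2++;
      } else {
        if (!(*first1 < *first2))
          ++first2;  // Equal values: drop the copy from the second range.
        *out++ = *first1++;
      }
    }
    out = std::copy(first1, last1, out);
    return std::copy(first2, last2, out);
  }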
Comment, May 4 2017
1) Those are good points. Here's another idea that just came to my mind (sketched below):
- Do a copy_if of the input range to the end of the body, filtering out the elements that are already present via binary search.
- (Stable) sort and unique the just-inserted elements.
- Follow up with a call to inplace_merge, which can be omitted if the smallest of the new elements is larger than the biggest of the old ones.
This removes the need to call unique on the whole range. To find out what is best we should definitely do more benchmarks, though.
2) Does unique_set_union exist in the standard library? If not, could you provide a reference to the course?
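A sketch of this idea under the same simplifying assumptions as before (sorted, duplicate-free std::vector body; forward iterators so the needed capacity can be reserved up front; made-up function name). The CL linked later in this thread is the authoritative implementation:

  #include <algorithm>
  #include <iterator>
  #include <vector>

  // 1) Append only the elements not already present (binary search against the
  //    old part), 2) sort and dedupe just the appended part, 3) merge the two
  //    runs, skipping the merge when they are already in order.
  template <typename T, typename ForwardIt>
  void InsertFilterThenMerge(std::vector<T>& body, ForwardIt first,
                             ForwardIt last) {
    const auto old_size = body.size();
    body.reserve(old_size + std::distance(first, last));  // No reallocation below.
    std::copy_if(first, last, std::back_inserter(body), [&](const T& value) {
      return !std::binary_search(body.begin(), body.begin() + old_size, value);
    });
    auto middle = body.begin() + old_size;
    std::stable_sort(middle, body.end());
    body.erase(std::unique(middle, body.end()), body.end());
    middle = body.begin() + old_size;
    // All elements are unique across both runs now, so no unique() over the
    // whole range is needed after the merge.
    if (middle != body.begin() && middle != body.end() &&
        *middle < *(middle - 1)) {
      std::inplace_merge(body.begin(), middle, body.end());
    }
  }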
Comment, May 4 2017
1) Well, good benchmarks are problematic. I've been experimenting with the google benchmark library - it seems to work well, but I haven't gotten around to introducing it into the code base.
2) unique_set_union does not exist in the standard library, but I don't believe it's hard to implement.
I meant that Stepanov says we can relax the requirements for merge: it's OK for the output to overlap the inputs as long as there is enough distance between the output iterator and the inputs.
Meaning:
void merge_intersecting(I1 f1, I1 l1, I2 f2, I2 l2, I1 out) {
  // The gap between `out` and the first input must be at least as large as
  // the second range; this is enough for a forward merge to never overwrite
  // an unread element of [f1, l1).
  DCHECK(f1 - out >= l2 - f2);
  // ... merge forward into `out` ...
}
Here is a link to the course: https://www.youtube.com/playlist?list=PLHxtyCq_WDLXryyw91lahwdtpZsmo4BGD but it's more than twenty two-hour lectures, and I couldn't find the notes online.
I've looked at the original STL: https://www.sgi.com/tech/stl/stl_algo.h
They do something different: they use a buffer to merge while the smaller range still has elements, and then just copy the rest to the result. We could use the tail of our vector like that.
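A sketch of that buffered-merge idea, using a separate buffer instead of the vector's tail for clarity, and always buffering the left run (the SGI code picks the smaller run and can also merge backwards, which is omitted here):

  #include <algorithm>
  #include <iterator>
  #include <utility>
  #include <vector>

  // Merge the sorted runs [first, middle) and [middle, last) by buffering only
  // the left run and merging forward into the original storage. The writes can
  // never overtake the unread part of the right run: the initial gap equals the
  // size of the buffered run, which is exactly the "enough distance" condition
  // above.
  template <typename RandomIt>
  void MergeWithLeftBuffer(RandomIt first, RandomIt middle, RandomIt last) {
    using T = typename std::iterator_traits<RandomIt>::value_type;
    std::vector<T> left(std::make_move_iterator(first),
                        std::make_move_iterator(middle));
    auto l = left.begin();
    auto r = middle;
    auto out = first;
    while (l != left.end() && r != last) {
      // Taking from the left on ties keeps the merge stable.
      *out++ = (*r < *l) ? std::move(*r++) : std::move(*l++);
    }
    std::move(l, left.end(), out);
    // Any remaining right-run elements are already in their final positions.
  }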
Comment, May 5 2017
Thanks for the links to the lecture and implementation, that looks very interesting! If I read this correctly, libc++'s implementation actually does something very similar. They also use a buffer where they move the smaller range and then use the original input to store the result of the merge. In the mean time I went ahead and implemented my idea from #c13. You can find it here (look for the STABLE_SORT_SMARTER label): https://codereview.chromium.org/2854353002/diff/20001/base/containers/flat_tree.h#newcode653 This performs very well on all benchmarks I tested it on, including yours with many insertions of small ranges. Furthermore, it avoids the issue of unstable std::inplace_merge, since by the time it is called all elements are already unique, and thus the instability doesn't matter. Feel free to experiment more, and let me know if you can spot other issues with it. Otherwise I would like to send this out for code review including appropriate tests.
Comment, May 5 2017
@jdoerrie: Do I understand correctly that std::inplace_merge works well with ranges of different sizes, and that std::unique was the only problem? If so, I think you can go ahead and create a final CL. Please add danakj and brettw as reviewers. I'll try to poke holes in your algorithm over the weekend, and if I come up with something I'll let you know. Great job.
Comment, May 5 2017
@dyaroshev: std::unique was the biggest problem, that's true. std::inplace_merge also has linear complexity, but I do an explicit check to skip it if it is a no-op. In case it's not, the linear complexity can't be completely avoided; think about inserting a small range at the very beginning of the underlying body, for instance. However, you do raise a good point. I suppose if you had an initial set containing the number 1,000,000 and were to do single-element range inserts of the numbers 1 through 999,999, my current algorithm would exhibit quadratic performance, while the dead simple one-by-one method would likely be faster, given that there is almost no shifting of existing elements. It might be worth doing a binary search for the first insertion point prior to calling std::inplace_merge. Can you think of anything else?
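A sketch of that refinement, assuming the layout from the earlier sketches (old sorted run followed by the sorted, deduplicated new run starting at `middle`; made-up helper name):

  #include <algorithm>
  #include <vector>

  // Narrow the merge to the part of the old run that can actually be affected:
  // old elements smaller than the smallest new element are already in their
  // final positions and can be skipped via binary search.
  template <typename T>
  void BoundedInplaceMerge(std::vector<T>& body,
                           typename std::vector<T>::iterator middle) {
    if (middle == body.begin() || middle == body.end())
      return;  // One of the runs is empty; nothing to merge.
    auto merge_begin = std::lower_bound(body.begin(), middle, *middle);
    if (merge_begin != middle)  // Otherwise the runs are already in order.
      std::inplace_merge(merge_begin, middle, body.end());
  }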
Comment, May 5 2017
@jdoerrie: At the very least, we should test the case where one set is much bigger than the other. Actually, look at __merge_without_buffer in the original STL: it does a binary search for the first element. I have a hunch that the main problem with std::inplace_merge is that it chooses between two algorithms based solely on the available buffer size. What do you think? How hard would it be for you to measure __merge_without_buffer vs. __merge and __merge_backwards?
Comment, May 8 2017
@dyaroshev: I sent out my algorithm for review in http://crrev.com/2853333004 and added you and the others as reviewers. I added a stress test for single-element insertions to http://crrev.com/2854353002, where I implemented what I hinted at in #c17. I added my results to the spreadsheet; you can find them here: https://docs.google.com/spreadsheets/d/1a-6VnhWAo0j3aZIP3S0JnLiWBRiDvc8duZZC-PEko1Y/edit#gid=1713685737 As expected, single-element insert performs best here, though my current algorithm comes very close. The other two tested algorithms are way too slow to be useful here. Unfortunately, I wasn't able to test the various merge methods. I don't think choosing between algorithms is a problem, since I expect it is common that enough extra space is available, at which point the more efficient algorithm is chosen.
Comment, May 10 2017
@jdoerrie: I've replied in your review. Please open access to the documents when you share them.
Comment, May 10 2017
Sorry, my bad. I thought I had granted global access. I've given you access now.
Comment, May 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/154dc49655238f0fc9928763bcf47930823a8e02

commit 154dc49655238f0fc9928763bcf47930823a8e02
Author: jdoerrie <jdoerrie@chromium.org>
Date: Fri May 19 10:35:28 2017

Add range insertion for base::flat_tree

This change implements an efficient algorithm for base::flat_tree::insert(Iter, Iter).

BUG=671759, 682249
Review-Url: https://codereview.chromium.org/2853333004
Cr-Commit-Position: refs/heads/master@{#473148}

[modify] https://crrev.com/154dc49655238f0fc9928763bcf47930823a8e02/base/containers/flat_map.h
[modify] https://crrev.com/154dc49655238f0fc9928763bcf47930823a8e02/base/containers/flat_set.h
[modify] https://crrev.com/154dc49655238f0fc9928763bcf47930823a8e02/base/containers/flat_tree.h
[modify] https://crrev.com/154dc49655238f0fc9928763bcf47930823a8e02/base/containers/flat_tree_unittest.cc