Select a new and better hash function for words |
||||||||
Issue descriptionFont code uses a cache of Shapes to speed up text layout operations. The hash used is SuperFastHash (created by Paul Hsie in 2004). Issue is that since then, there have being research for developing better and faster hashes. The code is at: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/wtf/StringHasher.h?l=42 The objective here is to find a better hash that will perform well and help in avoiding collisions.
,
Jun 21 2017
Candidates so far are: a) SIPHash: https://github.com/veorq/SipHash b) CityHash: https://github.com/google/cityhash c) SpookyHash: http://burtleburtle.net/bob/hash/spooky.html d) HighwayHash: https://github.com/google/highwayhash e) CRC32 (as there is a dedicated instruction on ARMv8 and Intel).
,
Jun 21 2017
,
Jun 21 2017
,
Jun 21 2017
,
Jun 21 2017
I've collected data on ARMv7/ARMv8 and Intel for: a) SIPHash b) CityHash c) SpookyHash e) CRC32 Pending to collect data on HighwayHash.
,
Jun 21 2017
We already have a dep on cityhash. Unless it's a lot worse than the rest, can we use that?
,
Jun 21 2017
For ARMv7 (majority of mobile devices on user's hands today), CityHash is about 20% slower than FastHash (on ARMv8 it was about 5%) for the current scenario (i.e. words from a web page). But for Intel it is actually faster about 26% faster. I haven't collected data concerning collisions on common web pages, as for that we need to fix: https://bugs.chromium.org/p/chromium/issues/detail?id=735663
,
Jun 22 2017
Thanks for the word cache hash function data in the other bug, https://bugs.chromium.org/p/chromium/issues/detail?id=735663#c5 Thanks Adenilson, this is quite interesting data. I don't have a good reference for whether this is a good result or whether the spread should be better with less collisions. So if we had a better spread and could reduce rehashes let's say by half, that would mean we could save about 1000 re-hash computations? I am not sure how much time is spent in those. So before we optimize this, it would also be interesting to see whether changing the hash function can improve for example page cycler results on mobile or desktop.
,
Jun 27 2017
@drott: sorry for the delay for replying, I was collecting further data on the hashes. On the data I shared in the related issue (https://bugs.chromium.org/p/chromium/issues/detail?id=735663#c10), it was clear that pretty much any other hash function would generate less collisions than, say what we use today (SuperFastHash). On top of that, there is the issue of the performance, where I shared privately the data I collected and also identified 2 alternative hash functions that performed better than SuperFastHash (between 20 to 24% faster). The only pending question was the rehashes. If I understood correctly from reading the code in HashTable, rehashes can happen when: a) A new item is inserted and the Table has to grow. b) If there is an unbalance between the empty slots x elements in the table, so it will shrink in size (therefore, the need for rehashing). Rehashes in the HashTable are undesirable because they imply reallocating the table and moving the elements around from the older table to a new one. That being said, AFAICT, it will recycle the pre-calculated hashes for each item. One solution I found to avoid those rehashes was to assign an initial capacity for the HashTable using HashTable::reserveCapacityForSize(). With an initial size of 1000 elements, it lowered the rehashes to only 2 (instead of 16, so 8x less). I will experiment a bit more with it (and try to assess the memory use), but since the ShapeCache can grow to 10K words, maybe 10 or 5% of that size would acceptable? (i.e. 1000 or 500 elements). A better match would be a size that would be close to the median number of words in a common web page. I wonder if someone has this data?
,
Jul 18 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/cb4e97648309bafb70ee7aa725d39b6599d6011e commit cb4e97648309bafb70ee7aa725d39b6599d6011e Author: Adenilson Cavalcanti <adenilson.cavalcanti@arm.com> Date: Tue Jul 18 18:59:45 2017 Specialized character type constructors to improve ShapeCache hashing This allows to avoid copying text runs character-per-character for 16bits text. Also there is no need to hash per character, just hash the whole string in one go. This also helps to save the construction of a StringHasher object (as we use the static method to hash the strings). Finally, reserve in HashTable space for 500 words (this reduces the number of rehashes in up to 4x). Bug: 735674 Change-Id: I4c5e23bb1e21dad995d4948fae0943a2b01309e0 Reviewed-on: https://chromium-review.googlesource.com/564076 Reviewed-by: Adenilson Cavalcanti <cavalcantii@chromium.org> Reviewed-by: Emil A Eklund <eae@chromium.org> Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org> Cr-Commit-Position: refs/heads/master@{#487543} [modify] https://crrev.com/cb4e97648309bafb70ee7aa725d39b6599d6011e/third_party/WebKit/Source/platform/BUILD.gn [add] https://crrev.com/cb4e97648309bafb70ee7aa725d39b6599d6011e/third_party/WebKit/Source/platform/fonts/shaping/ShapeCache.cpp [modify] https://crrev.com/cb4e97648309bafb70ee7aa725d39b6599d6011e/third_party/WebKit/Source/platform/fonts/shaping/ShapeCache.h
,
Jul 24 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2b0a3c9b588ba16318dea3a7704bdde8662a7b3c commit 2b0a3c9b588ba16318dea3a7704bdde8662a7b3c Author: Annie Sullivan <sullivan@chromium.org> Date: Mon Jul 24 16:15:28 2017 BUG= 747028 Revert "Specialized character type constructors to improve ShapeCache hashing" This reverts commit cb4e97648309bafb70ee7aa725d39b6599d6011e. Reason for revert: Reverting per comments on bug 747028 , it's a memory regression and we'll revert during investigation. Original change's description: > Specialized character type constructors to improve ShapeCache hashing > > This allows to avoid copying text runs character-per-character > for 16bits text. > > Also there is no need to hash per character, just hash the > whole string in one go. This also helps to save the construction > of a StringHasher object (as we use the static method to hash > the strings). > > Finally, reserve in HashTable space for 500 words (this reduces > the number of rehashes in up to 4x). > > Bug: 735674 > Change-Id: I4c5e23bb1e21dad995d4948fae0943a2b01309e0 > Reviewed-on: https://chromium-review.googlesource.com/564076 > Reviewed-by: Adenilson Cavalcanti <cavalcantii@chromium.org> > Reviewed-by: Emil A Eklund <eae@chromium.org> > Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org> > Cr-Commit-Position: refs/heads/master@{#487543} TBR=thakis@chromium.org,eae@chromium.org,drott@chromium.org,cavalcantii@chromium.org,adenilson.cavalcanti@arm.com # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 735674 Change-Id: I5b8efd92dcbdd03a3f6b07973c80e9f0ffbc7f93 Reviewed-on: https://chromium-review.googlesource.com/583127 Reviewed-by: Annie Sullivan <sullivan@chromium.org> Reviewed-by: Nico Weber <thakis@chromium.org> Commit-Queue: Annie Sullivan <sullivan@chromium.org> Cr-Commit-Position: refs/heads/master@{#488989} [modify] https://crrev.com/2b0a3c9b588ba16318dea3a7704bdde8662a7b3c/third_party/WebKit/Source/platform/BUILD.gn [delete] https://crrev.com/434d4dfe55708db7f54d7820626fcf5de0d0b63c/third_party/WebKit/Source/platform/fonts/shaping/ShapeCache.cpp [modify] https://crrev.com/2b0a3c9b588ba16318dea3a7704bdde8662a7b3c/third_party/WebKit/Source/platform/fonts/shaping/ShapeCache.h
,
Aug 9 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/498c6180fb8e7993176e089ac9225f0357d1a21a commit 498c6180fb8e7993176e089ac9225f0357d1a21a Author: Adenilson Cavalcanti <adenilson.cavalcanti@arm.com> Date: Wed Aug 09 21:45:01 2017 Specialized character type constructors to improve ShapeCache hashing This allows to avoid copying text runs character-per-character for 16bits text. Also there is no need to hash per character, just hash the whole string in one go. This also helps to save the construction of a StringHasher object (as we use the static method to hash the strings). Bug: 735674 Change-Id: I5221ea9722970f4d977924ad3183fa73097c4184 Reviewed-on: https://chromium-review.googlesource.com/604215 Reviewed-by: Emil A Eklund <eae@chromium.org> Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org> Cr-Commit-Position: refs/heads/master@{#493152} [modify] https://crrev.com/498c6180fb8e7993176e089ac9225f0357d1a21a/third_party/WebKit/Source/platform/BUILD.gn [add] https://crrev.com/498c6180fb8e7993176e089ac9225f0357d1a21a/third_party/WebKit/Source/platform/fonts/shaping/ShapeCache.cpp [modify] https://crrev.com/498c6180fb8e7993176e089ac9225f0357d1a21a/third_party/WebKit/Source/platform/fonts/shaping/ShapeCache.h
,
Nov 2
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c4406c067d12ac8b3d22a9bcca8b00efe06ef7bb commit c4406c067d12ac8b3d22a9bcca8b00efe06ef7bb Author: Adenilson Cavalcanti <adenilson.cavalcanti@arm.com> Date: Fri Nov 02 01:11:10 2018 Updating ShapeCache hash function Blink is using a suboptimal hash function both in hashing quality (i.e. collisions) as also in speed as since 2004 new and faster hash functions were developed. Based on testing on ARMv7 and x86, CityHash seems like a good alternative based on both performance, portability and quality. For further data: https://aras-p.info/blog/2016/08/09/More-Hash-Function-Tests/ We should consider next adding xxHash as it performed better on AArch64. Bug: 735674 Change-Id: I8c7ccf2d834a42a9fcb01e2cc80ecda1e1c01772 Reviewed-on: https://chromium-review.googlesource.com/c/1311798 Reviewed-by: Emil A Eklund <eae@chromium.org> Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org> Cr-Commit-Position: refs/heads/master@{#604794} [modify] https://crrev.com/c4406c067d12ac8b3d22a9bcca8b00efe06ef7bb/third_party/blink/renderer/platform/BUILD.gn [modify] https://crrev.com/c4406c067d12ac8b3d22a9bcca8b00efe06ef7bb/third_party/blink/renderer/platform/fonts/shaping/shape_cache.cc
,
Nov 7
,
Nov 8
It was a boost of around 19% in time spent on layout for wikipedia C++ page in a Xeon processor and 23% in power efficient/low cost A53 processors running Chrome in 64bits on Android.
,
Nov 9
,
Nov 9
,
Nov 9
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by cavalcantii@chromium.org
, Jun 21 2017Owner: cavalcantii@chromium.org
Status: Started (was: Untriaged)