New issue
Advanced search Search tips

Issue 735674 link

Starred by 3 users

Issue metadata

Status: Verified
Owner:
Closed: Nov 9
Cc:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 3
Type: Feature

Blocked on:
issue 735663

Blocking:
issue 902789
issue 903973



Sign in to add a comment

Select a new and better hash function for words

Project Member Reported by cavalcantii@chromium.org, Jun 21 2017

Issue description

Font code uses a cache of Shapes to speed up text layout operations.

The hash used is SuperFastHash (created by Paul Hsie in 2004). Issue is that since then, there have being research for developing better and faster hashes.

The code is at: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/wtf/StringHasher.h?l=42

The objective here is to find a better hash that will perform well and help in avoiding collisions.
 
Cc: drott@chromium.org thakis@chromium.org
Owner: cavalcantii@chromium.org
Status: Started (was: Untriaged)
Candidates so far are:

a) SIPHash: https://github.com/veorq/SipHash

b) CityHash: https://github.com/google/cityhash

c) SpookyHash: http://burtleburtle.net/bob/hash/spooky.html

d) HighwayHash: https://github.com/google/highwayhash

e) CRC32 (as there is a dedicated instruction on ARMv8 and Intel).
Labels: OS-All
Labels: -Type-Bug Type-Feature
Blockedon: 735663
I've collected data on ARMv7/ARMv8 and Intel for:
a) SIPHash
b) CityHash
c) SpookyHash
e) CRC32

Pending to collect data on HighwayHash.

Comment 7 by thakis@chromium.org, Jun 21 2017

We already have a dep on cityhash. Unless it's a lot worse than the rest, can we use that?
For ARMv7 (majority of mobile devices on user's hands today), CityHash is about 20% slower than FastHash (on ARMv8 it was about 5%) for the current scenario (i.e. words from a web page).

But for Intel it is actually faster about 26% faster.

I haven't collected data concerning collisions on common web pages, as for that we need to fix: https://bugs.chromium.org/p/chromium/issues/detail?id=735663

Comment 9 by drott@chromium.org, Jun 22 2017

Cc: e...@chromium.org
Thanks for the word cache hash function data in the other bug, https://bugs.chromium.org/p/chromium/issues/detail?id=735663#c5

Thanks Adenilson, this is quite interesting data. I don't have a good reference for whether this is a good result or whether the spread should be better with less collisions. 

So if we had a better spread and could reduce rehashes let's say by half, that would mean we could save about 1000 re-hash computations? I am not sure how much time is spent in those. 

So before we optimize this, it would also be interesting to see whether changing the hash function can improve for example page cycler results on mobile or desktop.
@drott: sorry for the delay for replying, I was collecting further data on the hashes.

On the data I shared in the related issue (https://bugs.chromium.org/p/chromium/issues/detail?id=735663#c10), it was clear that pretty much any other hash function would generate less collisions than, say what we use today (SuperFastHash).

On top of that, there is the issue of the performance, where I shared privately the data I collected and also identified 2 alternative hash functions that performed better than SuperFastHash (between 20 to 24% faster).

The only pending question was the rehashes. If I understood correctly from reading the code in HashTable, rehashes can happen when:

a) A new item is inserted and the Table has to grow.

b) If there is an unbalance between the empty slots x elements in the table, so it will shrink in size (therefore, the need for rehashing).

Rehashes in the HashTable are undesirable because they imply reallocating the table and moving the elements around from the older table to a new one. That being said, AFAICT, it will recycle the pre-calculated hashes for each item.

One solution I found to avoid those rehashes was to assign an initial capacity for the HashTable using HashTable::reserveCapacityForSize(). With an initial size of 1000 elements, it lowered the rehashes to only 2 (instead of 16, so 8x less).

I will experiment a bit more with it (and try to assess the memory use), but since the ShapeCache can grow to 10K words, maybe 10 or 5% of that size would acceptable? (i.e. 1000 or 500 elements).

A better match would be a size that would be close to the median number of words in a common web page. I wonder if someone has this data?
Project Member

Comment 11 by bugdroid1@chromium.org, Jul 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/cb4e97648309bafb70ee7aa725d39b6599d6011e

commit cb4e97648309bafb70ee7aa725d39b6599d6011e
Author: Adenilson Cavalcanti <adenilson.cavalcanti@arm.com>
Date: Tue Jul 18 18:59:45 2017

Specialized character type constructors to improve ShapeCache hashing

This allows to avoid copying text runs character-per-character
for 16bits text.

Also there is no need to hash per character, just hash the
whole string in one go. This also helps to save the construction
of a StringHasher object (as we use the static method to hash
the strings).

Finally, reserve in HashTable space for 500 words (this reduces
the number of rehashes in up to 4x).

Bug:  735674 
Change-Id: I4c5e23bb1e21dad995d4948fae0943a2b01309e0
Reviewed-on: https://chromium-review.googlesource.com/564076
Reviewed-by: Adenilson Cavalcanti <cavalcantii@chromium.org>
Reviewed-by: Emil A Eklund <eae@chromium.org>
Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#487543}
[modify] https://crrev.com/cb4e97648309bafb70ee7aa725d39b6599d6011e/third_party/WebKit/Source/platform/BUILD.gn
[add] https://crrev.com/cb4e97648309bafb70ee7aa725d39b6599d6011e/third_party/WebKit/Source/platform/fonts/shaping/ShapeCache.cpp
[modify] https://crrev.com/cb4e97648309bafb70ee7aa725d39b6599d6011e/third_party/WebKit/Source/platform/fonts/shaping/ShapeCache.h

Project Member

Comment 12 by bugdroid1@chromium.org, Jul 24 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2b0a3c9b588ba16318dea3a7704bdde8662a7b3c

commit 2b0a3c9b588ba16318dea3a7704bdde8662a7b3c
Author: Annie Sullivan <sullivan@chromium.org>
Date: Mon Jul 24 16:15:28 2017

BUG= 747028 


Revert "Specialized character type constructors to improve ShapeCache hashing"

This reverts commit cb4e97648309bafb70ee7aa725d39b6599d6011e.

Reason for revert: Reverting per comments on  bug 747028 , it's a memory regression and we'll revert during investigation.

Original change's description:
> Specialized character type constructors to improve ShapeCache hashing
> 
> This allows to avoid copying text runs character-per-character
> for 16bits text.
> 
> Also there is no need to hash per character, just hash the
> whole string in one go. This also helps to save the construction
> of a StringHasher object (as we use the static method to hash
> the strings).
> 
> Finally, reserve in HashTable space for 500 words (this reduces
> the number of rehashes in up to 4x).
> 
> Bug:  735674 
> Change-Id: I4c5e23bb1e21dad995d4948fae0943a2b01309e0
> Reviewed-on: https://chromium-review.googlesource.com/564076
> Reviewed-by: Adenilson Cavalcanti <cavalcantii@chromium.org>
> Reviewed-by: Emil A Eklund <eae@chromium.org>
> Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#487543}

TBR=thakis@chromium.org,eae@chromium.org,drott@chromium.org,cavalcantii@chromium.org,adenilson.cavalcanti@arm.com

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug:  735674 
Change-Id: I5b8efd92dcbdd03a3f6b07973c80e9f0ffbc7f93
Reviewed-on: https://chromium-review.googlesource.com/583127
Reviewed-by: Annie Sullivan <sullivan@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Annie Sullivan <sullivan@chromium.org>
Cr-Commit-Position: refs/heads/master@{#488989}
[modify] https://crrev.com/2b0a3c9b588ba16318dea3a7704bdde8662a7b3c/third_party/WebKit/Source/platform/BUILD.gn
[delete] https://crrev.com/434d4dfe55708db7f54d7820626fcf5de0d0b63c/third_party/WebKit/Source/platform/fonts/shaping/ShapeCache.cpp
[modify] https://crrev.com/2b0a3c9b588ba16318dea3a7704bdde8662a7b3c/third_party/WebKit/Source/platform/fonts/shaping/ShapeCache.h

Project Member

Comment 13 by bugdroid1@chromium.org, Aug 9 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/498c6180fb8e7993176e089ac9225f0357d1a21a

commit 498c6180fb8e7993176e089ac9225f0357d1a21a
Author: Adenilson Cavalcanti <adenilson.cavalcanti@arm.com>
Date: Wed Aug 09 21:45:01 2017

Specialized character type constructors to improve ShapeCache hashing

This allows to avoid copying text runs character-per-character
for 16bits text.

Also there is no need to hash per character, just hash the
whole string in one go. This also helps to save the construction
of a StringHasher object (as we use the static method to hash
the strings).

Bug:  735674 
Change-Id: I5221ea9722970f4d977924ad3183fa73097c4184
Reviewed-on: https://chromium-review.googlesource.com/604215
Reviewed-by: Emil A Eklund <eae@chromium.org>
Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#493152}
[modify] https://crrev.com/498c6180fb8e7993176e089ac9225f0357d1a21a/third_party/WebKit/Source/platform/BUILD.gn
[add] https://crrev.com/498c6180fb8e7993176e089ac9225f0357d1a21a/third_party/WebKit/Source/platform/fonts/shaping/ShapeCache.cpp
[modify] https://crrev.com/498c6180fb8e7993176e089ac9225f0357d1a21a/third_party/WebKit/Source/platform/fonts/shaping/ShapeCache.h

Project Member

Comment 14 by bugdroid1@chromium.org, Nov 2

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c4406c067d12ac8b3d22a9bcca8b00efe06ef7bb

commit c4406c067d12ac8b3d22a9bcca8b00efe06ef7bb
Author: Adenilson Cavalcanti <adenilson.cavalcanti@arm.com>
Date: Fri Nov 02 01:11:10 2018

Updating ShapeCache hash function

Blink is using a suboptimal hash function both in hashing quality (i.e. collisions)
as also in speed as since 2004 new and faster hash functions were developed.

Based on testing on ARMv7 and x86, CityHash seems like a good alternative based on
both performance, portability and quality. For further data:
https://aras-p.info/blog/2016/08/09/More-Hash-Function-Tests/

We should consider next adding xxHash as it performed better on AArch64.

Bug:  735674 
Change-Id: I8c7ccf2d834a42a9fcb01e2cc80ecda1e1c01772
Reviewed-on: https://chromium-review.googlesource.com/c/1311798
Reviewed-by: Emil A Eklund <eae@chromium.org>
Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#604794}
[modify] https://crrev.com/c4406c067d12ac8b3d22a9bcca8b00efe06ef7bb/third_party/blink/renderer/platform/BUILD.gn
[modify] https://crrev.com/c4406c067d12ac8b3d22a9bcca8b00efe06ef7bb/third_party/blink/renderer/platform/fonts/shaping/shape_cache.cc

Blocking: 902789
It was a boost of around 19% in time spent on layout for wikipedia C++ page in a Xeon processor and 23% in power efficient/low cost A53 processors running Chrome in 64bits on Android.
rock64_vanilla_city.png
373 KB View Download
wiki_x86_vanilla_city.png
342 KB View Download
Status: Verified (was: Started)
Blocking: 903973

Sign in to add a comment