
Issue 907244


Issue metadata

Status: Verified
Owner:
Closed: Nov 28
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux, Android, Windows, Chrome, Mac, Fuchsia
Pri: 3
Type: Feature

Blocking:
issue 903973




Optimize harfbuzz big integer conversions

Project Member Reported by cavalcantii@chromium.org, Nov 20

Issue description

Profiling showed that integer type conversions account for a significant share of the cycles spent in text shaping.

The idea is to optimize these conversions using native processor instructions, which should improve layout performance.
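For illustration only, here is a minimal sketch of the kind of conversion involved, assuming the hot path is reading 16-bit big-endian values out of font data. It is not the upstream patch: the function names `read_be16_bytewise` and `read_be16_swap` are hypothetical, and the fast path assumes GCC or Clang on a little-endian host, where `__builtin_bswap16` is typically lowered to a single byte-swap instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable baseline: assemble the value one byte at a time.
inline uint16_t read_be16_bytewise(const uint8_t* p) {
  assert(p != nullptr);
  return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

// Hypothetical fast path: one (possibly unaligned) load plus a byte swap.
// Falls back to the byte-wise version on other compilers or byte orders.
inline uint16_t read_be16_swap(const uint8_t* p) {
  assert(p != nullptr);
#if defined(__GNUC__) && defined(__BYTE_ORDER__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  uint16_t v;
  std::memcpy(&v, p, sizeof v);  // memcpy keeps the unaligned load well-defined
  return __builtin_bswap16(v);   // compiler picks the native swap instruction
#else
  return read_be16_bytewise(p);
#endif
}
```

Both functions return the same value for any input; the second only gives the compiler an easier pattern to map onto a native instruction.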
 
Status: Started (was: Untriaged)
Initial results show about 5% faster shaping on x86, with gains on ARM as well (6% on the big core, 3% on the little core).

Output of HarfBuzz (hb-shape) running on an ARM big core (A72) on Linux:

a) Vanilla
root@lcds-mb12:~/harfbuzz/release# uname -a
Linux lcds-mb12 4.16.0-1-arm64 #1 SMP Debian 4.16.5-1 (2018-04-29) aarch64 GNU/Linux
root@lcds-mb12:~/harfbuzz/release# ./run.sh

 Performance counter stats for './hb-shape Roboto-Regular.ttf --text-file=thelittleprince.txt --font-funcs=ot --output-format= --output-file=/dev/null --num-iterations=10':

        890.296880      task-clock (msec)         #    0.998 CPUs utilized
                 1      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               197      page-faults               #    0.221 K/sec
     1,780,578,232      cycles                    #    2.000 GHz
     3,506,476,997      instructions              #    1.97  insn per cycle
   <not supported>      branches
         7,915,967      branch-misses

       0.892088959 seconds time elapsed

       0.886894000 seconds user
       0.003995000 seconds sys

b) Patched

root@lcds-mb12:~/harfbuzz/insane# ./run.sh

 Performance counter stats for './hb-shape Roboto-Regular.ttf --text-file=thelittleprince.txt --font-funcs=ot --output-format= --output-file=/dev/null --num-iterations=10':

        838.048920      task-clock (msec)         #    0.999 CPUs utilized
                17      context-switches          #    0.020 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               198      page-faults               #    0.236 K/sec
     1,676,042,364      cycles                    #    2.000 GHz
     3,257,338,204      instructions              #    1.94  insn per cycle
   <not supported>      branches
         6,514,902      branch-misses

       0.838823263 seconds time elapsed

       0.838726000 seconds user
       0.000000000 seconds sys
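
To make the comparison between the two runs explicit, the relative reduction per counter can be computed as (before - after) / before. The sketch below does this with the values copied from the perf output above; the names `PerfRun`, `kVanilla`, `kPatched` and `relative_reduction` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Counter values copied from the two perf runs above.
struct PerfRun {
  double task_clock_ms;
  uint64_t cycles;
  uint64_t instructions;
  uint64_t branch_misses;
};

constexpr PerfRun kVanilla{890.296880, 1780578232ULL, 3506476997ULL, 7915967ULL};
constexpr PerfRun kPatched{838.048920, 1676042364ULL, 3257338204ULL, 6514902ULL};

// Fractional reduction of a counter; 0.06 means 6% lower after the patch.
inline double relative_reduction(double before, double after) {
  assert(before > 0.0);
  return (before - after) / before;
}
```

Applied to these two runs, this gives roughly 5.9% fewer cycles, 7.1% fewer instructions and 17.7% fewer branch misses, in line with the 6% figure quoted for the big core.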
Blink layout perf tests on x86 also show good benefits (11.4% boost for latin-ebook and 6.9% for hindi-line-layout).

Attachments: ebook.png (60.3 KB), hindi.png (30.0 KB), all_tests.png (443 KB)
Blocking: 903973
Cc: drott@chromium.org ikilpatrick@chromium.org e...@chromium.org behdad@chromium.org
Initial WIP (Work In Progress) patch upstream:
https://github.com/harfbuzz/harfbuzz/pull/1398
Status: Verified (was: Started)
REV16 optimization landed upstream, marking the issue as completed.
