Optimize harfbuzz big integer conversions
Issue description

Profiling showed that type conversions account for a significant share of the cycles spent on text shaping. The idea is to optimize these conversions using native processor instructions to improve layout performance.
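To illustrate the kind of conversion being discussed, here is a minimal sketch, not the actual upstream patch. It assumes the hot path is reading 16-bit big-endian fields from font data, and the function names read_be16_bytewise and read_be16_swapped are hypothetical. The second variant does one 16-bit load followed by a byte swap, which GCC and Clang may lower to a single native instruction (for example REV16 on ARM).

```cpp
#include <cstdint>
#include <cstring>

// Portable baseline: assemble a big-endian 16-bit value one byte at a time.
inline std::uint16_t read_be16_bytewise(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Sketch of the optimized form: a single 16-bit load plus a byte swap.
// memcpy keeps the load safe for unaligned addresses; compilers normally
// turn it into a plain load instruction.
inline std::uint16_t read_be16_swapped(const std::uint8_t* p) {
    std::uint16_t raw;
    std::memcpy(&raw, p, sizeof raw);
#if defined(__GNUC__) && defined(__BYTE_ORDER__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    // Little-endian host: swap the two bytes with the compiler builtin.
    return __builtin_bswap16(raw);
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    // Big-endian host: the loaded value is already in the right order.
    return raw;
#else
    // Unknown compiler or byte order: fall back to the portable version.
    return read_be16_bytewise(p);
#endif
}
```

Both functions return the same value for any input; whether the second one is actually faster depends on the compiler and target, so it would need to be measured the way the comments in this issue do.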
Nov 20: Blink layout perf tests on x86 also show good benefits (11.4% boost for latin-ebook and 6.9% for hindi-line-layout).
Nov 20: Initial WIP (work in progress) patch upstream: https://github.com/harfbuzz/harfbuzz/pull/1398
Nov 28: REV16 optimization landed upstream, marking the issue as completed.
Comment 1 by cavalcantii@chromium.org, Nov 20

Initial results showed 5% faster shaping on x86, with gains on ARM as well (6% on the big core, 3% on the little core).

Output of HB running on an ARM big core (A72) on Linux:

a) Vanilla

    root@lcds-mb12:~/harfbuzz/release# uname -a
    Linux lcds-mb12 4.16.0-1-arm64 #1 SMP Debian 4.16.5-1 (2018-04-29) aarch64 GNU/Linux
    root@lcds-mb12:~/harfbuzz/release# ./run.sh

     Performance counter stats for './hb-shape Roboto-Regular.ttf --text-file=thelittleprince.txt --font-funcs=ot --output-format= --output-file=/dev/null --num-iterations=10':

            890.296880      task-clock (msec)    #    0.998 CPUs utilized
                     1      context-switches     #    0.001 K/sec
                     0      cpu-migrations       #    0.000 K/sec
                   197      page-faults          #    0.221 K/sec
         1,780,578,232      cycles               #    2.000 GHz
         3,506,476,997      instructions         #    1.97  insn per cycle
       <not supported>      branches
             7,915,967      branch-misses

           0.892088959 seconds time elapsed

           0.886894000 seconds user
           0.003995000 seconds sys

b) Patched

    root@lcds-mb12:~/harfbuzz/insane# ./run.sh

     Performance counter stats for './hb-shape Roboto-Regular.ttf --text-file=thelittleprince.txt --font-funcs=ot --output-format= --output-file=/dev/null --num-iterations=10':

            838.048920      task-clock (msec)    #    0.999 CPUs utilized
                    17      context-switches     #    0.020 K/sec
                     0      cpu-migrations       #    0.000 K/sec
                   198      page-faults          #    0.236 K/sec
         1,676,042,364      cycles               #    2.000 GHz
         3,257,338,204      instructions         #    1.94  insn per cycle
       <not supported>      branches
             6,514,902      branch-misses

           0.838823263 seconds time elapsed

           0.838726000 seconds user
           0.000000000 seconds sys
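
As a quick cross-check of the "6% big core" figure, the relative reduction can be computed directly from the counters quoted above. This is a minimal sketch; the helper names percent_reduction and print_a72_reductions are hypothetical, and the inputs are copied from the vanilla and patched runs in this comment.

```cpp
#include <cstdio>

// Relative reduction, in percent, when a counter drops from `before` to `after`.
inline double percent_reduction(double before, double after) {
    return 100.0 * (before - after) / before;
}

// Prints the reduction for each counter reported in the A72 runs above.
inline void print_a72_reductions() {
    std::printf("cycles:        %.2f%%\n",
                percent_reduction(1780578232.0, 1676042364.0));
    std::printf("instructions:  %.2f%%\n",
                percent_reduction(3506476997.0, 3257338204.0));
    std::printf("branch-misses: %.2f%%\n",
                percent_reduction(7915967.0, 6514902.0));
    std::printf("elapsed time:  %.2f%%\n",
                percent_reduction(0.892088959, 0.838823263));
}
```

The cycle count and elapsed time both drop by a little under 6%, which is consistent with the big-core number stated at the top of the comment. These are single runs, so small differences should be read with some caution.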