New issue
Advanced search Search tips

Issue 907138 link

Starred by 1 user

Issue metadata

Status: Unconfirmed
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Speed: zram performance

Reported by dave.rod...@arm.com, Nov 20

Issue description

Zram performance with lzo has room for improvement. Improvements to lzo performance when switching tabs under heavy memory pressure can help reduce tab-switching time (especially worst-case behaviour, when lots of swapping is happening).

Experiments with adding support for run-length encoding of zeros in lzo show significant performance improvements (for favourable data).
 
Cc: sonnyrao@chromium.org
Project Member

Comment 2 by bugdroid1@chromium.org, Nov 21

Labels: merge-merged-chromeos-4.4
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a4eb10ea07d28ba95bc90b631a3d6fa288a71143

commit a4eb10ea07d28ba95bc90b631a3d6fa288a71143
Author: Matt Sealey <matt.sealey@arm.com>
Date: Wed Nov 21 09:24:47 2018

CHROMIUM: lib: lzo: clean-up by introducing COPY16

Most compilers should be able to merge adjacent loads/stores of sizes which
are less than but effect a multiple of a machine word size (in effect a
memcpy() of a constant amount). However the semantics of the macro are that
it just does the copy, the pointer increment is in the code, hence we see

    *a = *b
    a += 8
    b += 8
    *a = *b
    a += 8
    b += 8

This introduces a dependency between the two groups of statements which
seems to defeat said compiler optimizers and generate some very strange
sequences of addition and subtraction of address offsets (i.e. it is
overcomplicated).

Since COPY8 is only ever used to copy amounts of 16 bytes (in pairs), just
define COPY16 as COPY8,COPY8. We leave the definition to preserve the need
to do unaligned accesses to machine-sized words per the original code
intent, we just don't use it in the code proper.

COPY16 then gives us code like:

    *a = *b
    *(a+8) = *(b+8)
    a += 16
    b += 16

This seems to allow compilers to generate much better code by using
base register writeback or simply positively incrementing offsets which
seems to positively affect performance. It is, at least, fewer instructions
to do the same job.

BUG=chromium:907138
TEST=python -c 'l = [0] * 1024*1024*400; l2 = [0] * 1024*1024*400'
TEST=zramctl; dmesg
TEST=zramctl should show evidence of swapping using the lzo algorithm,
TEST=and dmesg should not report any errors

Change-Id: Ia6f2e90977a7f232139fba1b0555673b24e9a837
Signed-off-by: Matt Sealey <matt.sealey@arm.com>
Reviewed-on: https://chromium-review.googlesource.com/1307374
Commit-Ready: Sonny Rao <sonnyrao@chromium.org>
Tested-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/a4eb10ea07d28ba95bc90b631a3d6fa288a71143/lib/lzo/lzo1x_compress.c
[modify] https://crrev.com/a4eb10ea07d28ba95bc90b631a3d6fa288a71143/lib/lzo/lzo1x_decompress_safe.c
[modify] https://crrev.com/a4eb10ea07d28ba95bc90b631a3d6fa288a71143/lib/lzo/lzodefs.h

Project Member

Comment 3 by bugdroid1@chromium.org, Nov 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/88a427fa2dab74868d7c74596f95964eb7d1a3e5

commit 88a427fa2dab74868d7c74596f95964eb7d1a3e5
Author: Matt Sealey <matt.sealey@arm.com>
Date: Wed Nov 21 09:24:49 2018

CHROMIUM: lib: lzo: enable 64-bit CTZ on Arm

ARMv6 Thumb state introduced an RBIT instruction which, combined with CLZ
as present in ARMv5, introduces an extremely fast path for counting
trailing zeroes.

Enable the use of the GCC builtin for this on ARMv6+ with
CONFIG_THUMB2_KERNEL to ensure we get the 'new' instruction usage.

We do not bother enabling LZO_USE_CTZ64 support for ARMv5 as the builtin
code path does the same thing as the LZO_USE_CTZ32 code, only with more
register pressure.

BUG=chromium:907138
TEST=python -c 'l = [0] * 1024*1024*400; l2 = [0] * 1024*1024*400'
TEST=zramctl; dmesg
TEST=zramctl should show evidence of swapping using the lzo algorithm,
TEST=and dmesg should not report any errors

Change-Id: I3c0061fcfaa5f9e35e58f4df69fe304f46e1ddd7
Signed-off-by: Matt Sealey <matt.sealey@arm.com>
Reviewed-on: https://chromium-review.googlesource.com/1307375
Commit-Ready: Sonny Rao <sonnyrao@chromium.org>
Tested-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/88a427fa2dab74868d7c74596f95964eb7d1a3e5/lib/lzo/lzodefs.h

Project Member

Comment 4 by bugdroid1@chromium.org, Nov 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ef2974f233c7d7b11c23f0b5008c6faec65d4bf1

commit ef2974f233c7d7b11c23f0b5008c6faec65d4bf1
Author: Matt Sealey <matt.sealey@arm.com>
Date: Wed Nov 21 09:24:50 2018

CHROMIUM: lib: lzo1x: 64-bit CTZ on Arm aarch64

LZO leaves some performance on the table by not realising that AArch64 can
optimize count-trailing-zeros bit operations.

Add __aarch64__ to the checked definitions alongside __x86_64__ to enable
the use of rbit/clz instructions on full 64-bit quantities.

BUG=chromium:907138
TEST=python -c 'l = [0] * 1024*1024*400; l2 = [0] * 1024*1024*400'
TEST=zramctl; dmesg
TEST=zramctl should show evidence of swapping using the lzo algorithm,
TEST=and dmesg should not report any errors

Change-Id: I96a33ffd5079b1806bfd59fee2d9eb9c1aa71891
Signed-off-by: Matt Sealey <matt.sealey@arm.com>
Signed-off-by: Dave Rodgman <dave.rodgman@arm.com>
Reviewed-on: https://chromium-review.googlesource.com/1307376
Commit-Ready: Sonny Rao <sonnyrao@chromium.org>
Tested-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/ef2974f233c7d7b11c23f0b5008c6faec65d4bf1/lib/lzo/lzodefs.h

Project Member

Comment 5 by bugdroid1@chromium.org, Nov 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/67efa93711fcdd06f6cc4e8bb950e027d5bba59f

commit 67efa93711fcdd06f6cc4e8bb950e027d5bba59f
Author: Dave Rodgman <dave.rodgman@arm.com>
Date: Wed Nov 21 09:24:52 2018

CHROMIUM: lib/lzo: implement run-length encoding

When using zram, we frequently encounter long runs of zero bytes.
This adds a special case which identifies runs of zeros and encodes
them using run-length encoding.

This is faster for both compression and decompresion. For
high-entropy data which doesn't hit this case, impact is minimal.

Compression ratio is within a few percent in all cases.

This modifies the bitstream in a way which is backwards compatible
(i.e., we can decompress old bitstreams, but old versions of lzo
cannot decompress new bitstreams).

BUG=chromium:907138
TEST=python -c 'l = [0] * 1024*1024*400; l2 = [0] * 1024*1024*400'
TEST=zramctl; dmesg
TEST=zramctl should show evidence of swapping using the lzo algorithm,
TEST=and dmesg should not report any errors

Change-Id: I6f33a481483dd7dc9462e076f92b1827dd252c69
Signed-off-by: Dave Rodgman <dave.rodgman@arm.com>
Reviewed-on: https://chromium-review.googlesource.com/1307378
Commit-Ready: Sonny Rao <sonnyrao@chromium.org>
Tested-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/67efa93711fcdd06f6cc4e8bb950e027d5bba59f/lib/lzo/lzo1x_compress.c
[modify] https://crrev.com/67efa93711fcdd06f6cc4e8bb950e027d5bba59f/Documentation/lzo.txt
[modify] https://crrev.com/67efa93711fcdd06f6cc4e8bb950e027d5bba59f/include/linux/lzo.h
[modify] https://crrev.com/67efa93711fcdd06f6cc4e8bb950e027d5bba59f/lib/lzo/lzo1x_decompress_safe.c
[modify] https://crrev.com/67efa93711fcdd06f6cc4e8bb950e027d5bba59f/lib/lzo/lzodefs.h

I've started upstreaming this to the mainline Linux kernel here: https://lkml.org/lkml/2018/11/21/637
Cc: diand...@chromium.org
Components: OS>Performance>Memory
Labels: OS-Chrome
Thanks for posting this!  I think we should probably also port this to our older and newer kernels as well.  Hopefully shouldn't be much work.  I can take a look at this too if needed.
Project Member

Comment 8 by bugdroid1@chromium.org, Nov 22

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b551bdbbee361e0b13dd1bb590f18b11396817ae

commit b551bdbbee361e0b13dd1bb590f18b11396817ae
Author: Guenter Roeck <groeck@chromium.org>
Date: Thu Nov 22 06:49:13 2018

FIXUP: CHROMIUM: lib: lzo: enable 64-bit CTZ on Arm

0day says:

In file included from lib/lzo/lzo1x_compress.c:18:0:
lib/lzo/lzodefs.h:39:35: error: "CONFIG_THUMB2_KERNEL" is not defined, evaluates to 0 [-Werror=undef]

Use a valid construct instead.

BUG=chromium:907138
TEST=Build with CONFIG_THUMB2_KERNEL not defined

Change-Id: Ib6f71be1eae2a1a00c9d8163bd286bf2123a72a6
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1347720
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/b551bdbbee361e0b13dd1bb590f18b11396817ae/lib/lzo/lzodefs.h

Sonny - yes, I'd really like this to get into the next generation of devices too (4.19?), and also Android. I'll follow-up with upstream - if we can get it landed there then hopefully that should ease the path for Android, etc.
Labels: Pri-2
Issue has a component, but no priority. Updating to have default priority (Pri-2)

Sign in to add a comment