Speed: zram performance
Reported by
dave.rod...@arm.com,
Nov 20
|
||||
Issue descriptionZram performance with lzo has room for improvement. Improvements to lzo performance when switching tabs under heavy memory pressure can help reduce tab-switching time (especially worst-case behaviour, when lots of swapping is happening). Experiments with adding support for run-length encoding of zeros in lzo show significant performance improvements (for favourable data).
,
Nov 21
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a4eb10ea07d28ba95bc90b631a3d6fa288a71143 commit a4eb10ea07d28ba95bc90b631a3d6fa288a71143 Author: Matt Sealey <matt.sealey@arm.com> Date: Wed Nov 21 09:24:47 2018 CHROMIUM: lib: lzo: clean-up by introducing COPY16 Most compilers should be able to merge adjacent loads/stores of sizes which are less than but effect a multiple of a machine word size (in effect a memcpy() of a constant amount). However the semantics of the macro are that it just does the copy, the pointer increment is in the code, hence we see *a = *b a += 8 b += 8 *a = *b a += 8 b += 8 This introduces a dependency between the two groups of statements which seems to defeat said compiler optimizers and generate some very strange sequences of addition and subtraction of address offsets (i.e. it is overcomplicated). Since COPY8 is only ever used to copy amounts of 16 bytes (in pairs), just define COPY16 as COPY8,COPY8. We leave the definition to preserve the need to do unaligned accesses to machine-sized words per the original code intent, we just don't use it in the code proper. COPY16 then gives us code like: *a = *b *(a+8) = *(b+8) a += 16 b += 16 This seems to allow compilers to generate much better code by using base register writeback or simply positively incrementing offsets which seems to positively affect performance. It is, at least, fewer instructions to do the same job. BUG=chromium:907138 TEST=python -c 'l = [0] * 1024*1024*400; l2 = [0] * 1024*1024*400' TEST=zramctl; dmesg TEST=zramctl should show evidence of swapping using the lzo algorithm, TEST=and dmesg should not report any errors Change-Id: Ia6f2e90977a7f232139fba1b0555673b24e9a837 Signed-off-by: Matt Sealey <matt.sealey@arm.com> Reviewed-on: https://chromium-review.googlesource.com/1307374 Commit-Ready: Sonny Rao <sonnyrao@chromium.org> Tested-by: Sonny Rao <sonnyrao@chromium.org> Reviewed-by: Sonny Rao <sonnyrao@chromium.org> [modify] https://crrev.com/a4eb10ea07d28ba95bc90b631a3d6fa288a71143/lib/lzo/lzo1x_compress.c [modify] https://crrev.com/a4eb10ea07d28ba95bc90b631a3d6fa288a71143/lib/lzo/lzo1x_decompress_safe.c [modify] https://crrev.com/a4eb10ea07d28ba95bc90b631a3d6fa288a71143/lib/lzo/lzodefs.h
,
Nov 21
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/88a427fa2dab74868d7c74596f95964eb7d1a3e5 commit 88a427fa2dab74868d7c74596f95964eb7d1a3e5 Author: Matt Sealey <matt.sealey@arm.com> Date: Wed Nov 21 09:24:49 2018 CHROMIUM: lib: lzo: enable 64-bit CTZ on Arm ARMv6 Thumb state introduced an RBIT instruction which, combined with CLZ as present in ARMv5, introduces an extremely fast path for counting trailing zeroes. Enable the use of the GCC builtin for this on ARMv6+ with CONFIG_THUMB2_KERNEL to ensure we get the 'new' instruction usage. We do not bother enabling LZO_USE_CTZ64 support for ARMv5 as the builtin code path does the same thing as the LZO_USE_CTZ32 code, only with more register pressure. BUG=chromium:907138 TEST=python -c 'l = [0] * 1024*1024*400; l2 = [0] * 1024*1024*400' TEST=zramctl; dmesg TEST=zramctl should show evidence of swapping using the lzo algorithm, TEST=and dmesg should not report any errors Change-Id: I3c0061fcfaa5f9e35e58f4df69fe304f46e1ddd7 Signed-off-by: Matt Sealey <matt.sealey@arm.com> Reviewed-on: https://chromium-review.googlesource.com/1307375 Commit-Ready: Sonny Rao <sonnyrao@chromium.org> Tested-by: Sonny Rao <sonnyrao@chromium.org> Reviewed-by: Sonny Rao <sonnyrao@chromium.org> [modify] https://crrev.com/88a427fa2dab74868d7c74596f95964eb7d1a3e5/lib/lzo/lzodefs.h
,
Nov 21
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ef2974f233c7d7b11c23f0b5008c6faec65d4bf1 commit ef2974f233c7d7b11c23f0b5008c6faec65d4bf1 Author: Matt Sealey <matt.sealey@arm.com> Date: Wed Nov 21 09:24:50 2018 CHROMIUM: lib: lzo1x: 64-bit CTZ on Arm aarch64 LZO leaves some performance on the table by not realising that AArch64 can optimize count-trailing-zeros bit operations. Add __aarch64__ to the checked definitions alongside __x86_64__ to enable the use of rbit/clz instructions on full 64-bit quantities. BUG=chromium:907138 TEST=python -c 'l = [0] * 1024*1024*400; l2 = [0] * 1024*1024*400' TEST=zramctl; dmesg TEST=zramctl should show evidence of swapping using the lzo algorithm, TEST=and dmesg should not report any errors Change-Id: I96a33ffd5079b1806bfd59fee2d9eb9c1aa71891 Signed-off-by: Matt Sealey <matt.sealey@arm.com> Signed-off-by: Dave Rodgman <dave.rodgman@arm.com> Reviewed-on: https://chromium-review.googlesource.com/1307376 Commit-Ready: Sonny Rao <sonnyrao@chromium.org> Tested-by: Sonny Rao <sonnyrao@chromium.org> Reviewed-by: Sonny Rao <sonnyrao@chromium.org> [modify] https://crrev.com/ef2974f233c7d7b11c23f0b5008c6faec65d4bf1/lib/lzo/lzodefs.h
,
Nov 21
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/67efa93711fcdd06f6cc4e8bb950e027d5bba59f commit 67efa93711fcdd06f6cc4e8bb950e027d5bba59f Author: Dave Rodgman <dave.rodgman@arm.com> Date: Wed Nov 21 09:24:52 2018 CHROMIUM: lib/lzo: implement run-length encoding When using zram, we frequently encounter long runs of zero bytes. This adds a special case which identifies runs of zeros and encodes them using run-length encoding. This is faster for both compression and decompresion. For high-entropy data which doesn't hit this case, impact is minimal. Compression ratio is within a few percent in all cases. This modifies the bitstream in a way which is backwards compatible (i.e., we can decompress old bitstreams, but old versions of lzo cannot decompress new bitstreams). BUG=chromium:907138 TEST=python -c 'l = [0] * 1024*1024*400; l2 = [0] * 1024*1024*400' TEST=zramctl; dmesg TEST=zramctl should show evidence of swapping using the lzo algorithm, TEST=and dmesg should not report any errors Change-Id: I6f33a481483dd7dc9462e076f92b1827dd252c69 Signed-off-by: Dave Rodgman <dave.rodgman@arm.com> Reviewed-on: https://chromium-review.googlesource.com/1307378 Commit-Ready: Sonny Rao <sonnyrao@chromium.org> Tested-by: Sonny Rao <sonnyrao@chromium.org> Reviewed-by: Sonny Rao <sonnyrao@chromium.org> [modify] https://crrev.com/67efa93711fcdd06f6cc4e8bb950e027d5bba59f/lib/lzo/lzo1x_compress.c [modify] https://crrev.com/67efa93711fcdd06f6cc4e8bb950e027d5bba59f/Documentation/lzo.txt [modify] https://crrev.com/67efa93711fcdd06f6cc4e8bb950e027d5bba59f/include/linux/lzo.h [modify] https://crrev.com/67efa93711fcdd06f6cc4e8bb950e027d5bba59f/lib/lzo/lzo1x_decompress_safe.c [modify] https://crrev.com/67efa93711fcdd06f6cc4e8bb950e027d5bba59f/lib/lzo/lzodefs.h
,
Nov 21
I've started upstreaming this to the mainline Linux kernel here: https://lkml.org/lkml/2018/11/21/637
,
Nov 21
Thanks for posting this! I think we should probably also port this to our older and newer kernels as well. Hopefully shouldn't be much work. I can take a look at this too if needed.
,
Nov 22
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b551bdbbee361e0b13dd1bb590f18b11396817ae commit b551bdbbee361e0b13dd1bb590f18b11396817ae Author: Guenter Roeck <groeck@chromium.org> Date: Thu Nov 22 06:49:13 2018 FIXUP: CHROMIUM: lib: lzo: enable 64-bit CTZ on Arm 0day says: In file included from lib/lzo/lzo1x_compress.c:18:0: lib/lzo/lzodefs.h:39:35: error: "CONFIG_THUMB2_KERNEL" is not defined, evaluates to 0 [-Werror=undef] Use a valid construct instead. BUG=chromium:907138 TEST=Build with CONFIG_THUMB2_KERNEL not defined Change-Id: Ib6f71be1eae2a1a00c9d8163bd286bf2123a72a6 Signed-off-by: Guenter Roeck <groeck@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/1347720 Reviewed-by: Sonny Rao <sonnyrao@chromium.org> [modify] https://crrev.com/b551bdbbee361e0b13dd1bb590f18b11396817ae/lib/lzo/lzodefs.h
,
Nov 22
Sonny - yes, I'd really like this to get into the next generation of devices too (4.19?), and also Android. I'll follow-up with upstream - if we can get it landed there then hopefully that should ease the path for Android, etc.
,
Jan 11
Issue has a component, but no priority. Updating to have default priority (Pri-2) |
||||
►
Sign in to add a comment |
||||
Comment 1 by sonny...@google.com
, Nov 20