Issue 922178 link

Starred by 3 users

Issue metadata

Status: Untriaged
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 923721




Measure crostini CPU performance

Project Member Reported by cylee@chromium.org, Jan 15

Issue description

Run some CPU benchmarks to understand the overhead of crosvm.

While looking for a suitable open-source CPU benchmark, I got some interesting numbers: benchmarks run in the container tend to outperform the same benchmarks run on the host. For example, with "sysbench":

=== host ===
$ sysbench cpu run
...
CPU speed:
    events per second:  1165.94
...

=== container ===
# sysbench cpu run
...
CPU speed:
    events per second:  1251.34
...

I've run it several times and the results are quite stable: around 1250 events per second in the container versus around 1170 on the host.
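
For reference, the relative gap between the two sysbench figures quoted above can be computed directly. This is only a minimal Python sketch; the function name `relative_speedup` is illustrative and not part of sysbench.

```python
def relative_speedup(container_eps: float, host_eps: float) -> float:
    """Return how much faster the container run is, as a fraction of the host rate."""
    return container_eps / host_eps - 1.0


# Events-per-second figures copied from the two sysbench runs above.
host_eps = 1165.94
container_eps = 1251.34

speedup = relative_speedup(container_eps, host_eps)
print(f"container is {speedup:.1%} faster than host")
```

With these two samples the container comes out roughly 7% ahead.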


So I turned to another tool, stress-ng. Strictly speaking it is not a benchmark, as its official site states:
"stress-ng can also measure test throughput rates; this can be useful to observe performance changes across different operating system releases or types of hardware. However, it has never been intended to be used as a precise benchmark test suite, so do NOT use it in this manner."

But many people use it as a Linux CPU benchmark, and the author left this message on a Stack Overflow question:

"As the author of stress-ng, I'd better elaborate on this. stress-ng is good enough to get some comparative benchmarks results out of it, but it is not been thoroughly calibrated to say how much deviation there is on the each specific stressor. I therefore suggest running a stress-ng stressor several times and seeing how much variation there is on a specific stress test, and if it does not vary much, then it can be considered reliable enough for a benchmark for that specific use case. It all depends on now noisy/busy a system is, how well I/O performs, if it swaps, etc."
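
One way to act on that advice is to collect the "bogo ops/s" figure from several runs and look at the coefficient of variation. The sketch below is an assumption about how that check might be done: the helper names and the 2% threshold are arbitrary choices for illustration, not anything stress-ng defines.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean of the runs."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variation")
    return statistics.stdev(samples) / statistics.mean(samples)


def is_stable(samples, threshold=0.02):
    """Treat a stressor as stable if its runs vary by at most `threshold`.

    The 2% default is a hypothetical cut-off chosen for this sketch.
    """
    return coefficient_of_variation(samples) <= threshold
```

Passing the throughput values from repeated runs of the same stressor to `is_stable` gives a quick yes/no on whether the numbers are steady enough to compare host and container.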

I tried it anyway and got the same result. It performed better in the container:

=== host ===
# stress-ng --cpu 0 -t 3s --metrics-brief
stress-ng: info:  [6541] dispatching hogs: 4 cpu
stress-ng: info:  [6541] cache allocate: default cache size: 4096K
stress-ng: info:  [6541] successful run completed in 3.06s
stress-ng: info:  [6541] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
stress-ng: info:  [6541]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [6541] cpu                2276      3.05     12.11      0.00       747.29       187.94

=== container ===
$ stress-ng --cpu 0 -t 3s --metrics-brief
stress-ng: info:  [12597] dispatching hogs: 4 cpu
stress-ng: info:  [12597] cache allocate: default cache size: 4096K
stress-ng: info:  [12597] successful run completed in 3.01s
stress-ng: info:  [12597] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
stress-ng: info:  [12597]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [12597] cpu                2491      3.00     11.96      0.00       829.41       208.28

The "bogo ops/s" fields also show that the container executes more operations in the same amount of time.
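
The comparison can be made explicit by parsing the two "cpu" result lines above. This is a rough Python sketch that assumes the column layout shown in this `--metrics-brief` output; the function name and dictionary keys are made up for illustration.

```python
def parse_metrics_line(line: str) -> dict:
    """Split one stress-ng --metrics-brief result line into named fields."""
    # Drop the "stress-ng: info:  [PID]" prefix, then split the columns.
    fields = line.split("]", 1)[1].split()
    return {
        "stressor": fields[0],
        "bogo_ops": int(fields[1]),
        "real_s": float(fields[2]),
        "usr_s": float(fields[3]),
        "sys_s": float(fields[4]),
        "ops_per_s_real": float(fields[5]),
        "ops_per_s_cpu": float(fields[6]),
    }


# Result lines copied from the host and container runs above.
host = parse_metrics_line(
    "stress-ng: info:  [6541] cpu                2276      3.05     12.11      0.00       747.29       187.94"
)
container = parse_metrics_line(
    "stress-ng: info:  [12597] cpu                2491      3.00     11.96      0.00       829.41       208.28"
)

ratio_real = container["ops_per_s_real"] / host["ops_per_s_real"]
ratio_cpu = container["ops_per_s_cpu"] / host["ops_per_s_cpu"]
print(f"real-time ratio: {ratio_real:.3f}, usr+sys ratio: {ratio_cpu:.3f}")
```

Both ratios come out at about 1.11, so the container lead holds whether throughput is normalised by wall-clock time or by CPU time.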

BTW, no swapping occurred while running the experiments. Any idea how this could happen?
 
One thing that I could think of would be different toolchains/compiler flags. When running these on the host, are you using binaries from portage? Debian hasn't turned on a bunch of the same flags (e.g. -fstack-protector-strong) that we have.
Possibly. 

The sysbench on host is installed by portage
  https://cs.corp.google.com/chromeos_public/src/third_party/portage-stable/app-benchmarks/sysbench/sysbench-1.0.10.ebuild

Container version is installed by the binary package from
  https://github.com/akopytov/sysbench#installing-from-binary-packages
I've also tried building it from source at
  https://github.com/akopytov/sysbench.git
Both container builds report the same value.

Is there a way to bypass those flags when building from portage?

Comment 3 by cylee@chromium.org, Jan 17 (5 days ago)

The compiler options used by portage are a bit different from the defaults in the container. For example, the host has "-march=corei7 -g -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables" but the container has "-march=core2". I manually synced them when compiling sysbench in the container. In the end the two build configurations look like this:

portage/host
===============================================================================
sysbench version   : 1.0.10
CC                 : x86_64-cros-linux-gnu-clang
CFLAGS             : -O2 -ggdb3  -O2 -pipe  -march=corei7 -g -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -W -Wall -Wextra -Wpointer-arith -Wbad-function-cast   -Wstrict-prototypes -Wnested-externs -Wno-inline -Wno-format-zero-length   -funroll-loops  -Wundef -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wcast-align        -pthread
CPPFLAGS           : -D_GNU_SOURCE   -I$(top_srcdir)/src -I/build/eve/usr/include/luajit-2.0  -D__x86_64__
LDFLAGS            : -Wl,-O2 -Wl,--as-needed
LIBS               : -lm
EXTRA_LDFLAGS      :

prefix             : /usr
bindir             : ${prefix}/bin
libexecdir         : ${prefix}/libexec
mandir             : /usr/share/man
datadir            : /usr/share

MySQL support      : no
Drizzle support    : no
AttachSQL support  : no
Oracle support     : no
PostgreSQL support : no

LuaJIT             : system
LUAJIT_CFLAGS      : -I/build/eve/usr/include/luajit-2.0
LUAJIT_LIBS        : -lluajit-5.1  -ldl
LUAJIT_LDFLAGS     : -rdynamic

Concurrency Kit    : system
===============================================================================

Debian/container
===============================================================================
sysbench version   : 1.0.10-c1c0df8
CC                 : gcc
CFLAGS             : -O2 -ggdb3 -O2 -pipe  -march=corei7 -g -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -W -Wall -Wextra -Wpointer-arith -Wbad-function-cast   -Wstrict-prototypes -Wnested-externs -Wno-inline -Wno-format-zero-length   -funroll-loops  -Wundef -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wcast-align        -pthread
CPPFLAGS           : -D_GNU_SOURCE   -I$(top_srcdir)/src -I$(abs_top_builddir)/third_party/luajit/inc -I$(abs_top_builddir)/third_party/concurrency_kit/include
LDFLAGS            : -Wl,-O2 -Wl,--as-needed
LIBS               : -laio -lm
EXTRA_LDFLAGS      :

prefix             : /usr/local
bindir             : ${prefix}/bin
libexecdir         : ${prefix}/libexec
mandir             : ${prefix}/share/man
datadir            : ${prefix}/share

MySQL support      : no
Drizzle support    : no
AttachSQL support  : no
Oracle support     : no
PostgreSQL support : no

LuaJIT             : bundled
LUAJIT_CFLAGS      : -I$(abs_top_builddir)/third_party/luajit/inc
LUAJIT_LIBS        : $(abs_top_builddir)/third_party/luajit/lib/libluajit-5.1.a -ldl
LUAJIT_LDFLAGS     : -rdynamic

Concurrency Kit    : bundled
CK_CFLAGS          : -I$(abs_top_builddir)/third_party/concurrency_kit/include
CK_LIBS            : $(abs_top_builddir)/third_party/concurrency_kit/lib/libck.a
configure flags    :
===============================================================================
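
To see at a glance which settings still differ between the two summaries, the "key : value" lines can be compared mechanically. The Python sketch below uses only a few lines excerpted from the summaries above; the helper names `parse_summary` and `diff_settings` are hypothetical.

```python
def parse_summary(text: str) -> dict:
    """Turn 'key : value' lines into a dict, collapsing repeated spaces."""
    settings = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = " ".join(value.split())
    return settings


def diff_settings(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Excerpts from the portage/host and Debian/container summaries above.
HOST = """\
CC                 : x86_64-cros-linux-gnu-clang
LDFLAGS            : -Wl,-O2 -Wl,--as-needed
LIBS               : -lm
LuaJIT             : system
Concurrency Kit    : system
"""
CONTAINER = """\
CC                 : gcc
LDFLAGS            : -Wl,-O2 -Wl,--as-needed
LIBS               : -laio -lm
LuaJIT             : bundled
Concurrency Kit    : bundled
"""

differences = diff_settings(parse_summary(HOST), parse_summary(CONTAINER))
for key, (host_value, container_value) in differences.items():
    print(f"{key}: host={host_value!r} container={container_value!r}")
```

On these excerpts the compiler, LIBS, and the system versus bundled LuaJIT and Concurrency Kit show up as differences, while LDFLAGS matches.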

However, the version built in the container still runs a little faster than the one built for the host. I then managed to run the container-built binary on the host and vice versa. The result is that the binary makes the difference, not the environment it runs in.

Since the compiler options are the same, the only difference is that portage uses clang, while the container uses gcc. It could be the compiler itself that causes the performance difference.

To measure CPU performance accurately, I'll need to use the same binary and run it on both sides. Since they share the same virtual hardware, I can copy the needed dynamic libraries and the dynamic linker itself from the host to the container and run the binary built for the host. That should work around the situation I bumped into here.
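
A rough Python sketch of that plan, under stated assumptions: it takes the text printed by `ldd <binary>` on the host, extracts the dynamic linker and library paths to copy, and builds a command line that starts the binary through the copied glibc loader using its `--library-path` option. The helper names and the example directory in the usage note are hypothetical, and the actual file transfer into the container is not covered.

```python
def parse_ldd_output(text: str):
    """Return (loader_path, [library_paths]) from the output of `ldd`."""
    loader = None
    libs = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if "=>" in parts:
            # "libm.so.6 => /lib64/libm.so.6 (0x...)": keep the resolved path.
            idx = parts.index("=>")
            if idx + 1 < len(parts) and parts[idx + 1].startswith("/"):
                libs.append(parts[idx + 1])
        elif parts[0].startswith("/"):
            # A bare absolute path is the dynamic linker itself.
            loader = parts[0]
        # Anything else (e.g. linux-vdso) has no file to copy.
    return loader, libs


def loader_command(loader: str, lib_dir: str, binary: str, args: list) -> list:
    """Build argv that runs `binary` via an explicit loader and library dir."""
    return [loader, "--library-path", lib_dir, binary, *args]
```

For example, after copying everything into a directory such as `/tmp/host-libs` (a made-up location), `loader_command("/tmp/host-libs/ld-linux-x86-64.so.2", "/tmp/host-libs", "/tmp/host-libs/sysbench", ["cpu", "run"])` yields an argument list that could be handed to `subprocess.run`.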

Comment 4 Deleted

Comment 5 by cylee@chromium.org, Jan 20 (2 days ago)

My proposed solution is to copy the host binary to the container to avoid any performance impact caused by compiler or compiler-option differences. However, copying a file from host to container is non-trivial, and that issue has been slowing me down recently. See crbug/923721 for details.
