
Issue 897240


Issue metadata

Status: Fixed
Owner:
Closed: Nov 28
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 899484




Add Crostini Disk IO Test

Project Member Reported by cylee@chromium.org, Oct 19

Issue description

Add a performance test to measure disk IO latency/throughput from within the container and how it differs from that on bare metal.

 
Inspired by hardware_StorageFio, I'm playing with fio to see how it works inside and outside the container. First I tried running fio outside the container (on the stateful partition of a plain ChromeOS build) with an existing job setting used in hardware_StorageFio:
https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/site_tests/hardware_StorageFio/1m_write

That is, I ran fio with size=1G readwrite=randwrite bs=1m ioengine=libaio iodepth=4 direct=1.
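
For reference, a roughly equivalent command line would look like the following (a minimal sketch only; the job name and the test-file path are illustrative and not taken from the actual 1m_write job file):

fio --name=1m_write --filename=/mnt/stateful_partition/fio_test_file \
    --size=1G --readwrite=randwrite --bs=1m \
    --ioengine=libaio --iodepth=4 --direct=1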

However, the numbers are unreasonably high:

Run status group 0 (all jobs):
   READ: bw=1513MiB/s (1586MB/s), 1513MiB/s-1513MiB/s (1586MB/s-1586MB/s), io=1024MiB (1074MB), run=677-677msec
  WRITE: bw=953MiB/s (1000MB/s), 953MiB/s-953MiB/s (1000MB/s-1000MB/s), io=1024MiB (1074MB), run=1074-1074msec

Full log:
https://paste.googleplex.com/5356397516554240

The write speed is ~1 GB/s, which exceeds the theoretical maximum for this device.
If I don't use direct=1, the sequential write speed can be up to 2 GB/s. Also, it's a write operation, not a read, so there shouldn't be a cache problem. I also tried flags like end_fsync, but the result is the same.
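
One thing worth trying to rule out caching between runs is dropping the page cache before each measurement (a sketch, not something verified here; it must run as root, and inside the container it only drops the guest kernel's cache, not the host's):

sync
echo 3 > /proc/sys/vm/drop_caches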

On the other hand, I checked chromeperf for this benchmark on some existing boards, and the results look normal:
https://chromeperf.appspot.com/report?sid=0f56f3c79cffd97ba67c351288016bf3eb4b534d4499fcf31c841ea35f7a2184

I also tried running it inside the container. The numbers are a bit smaller, but still around 500 MB/s to 1 GB/s.

If I use dd instead of fio, I get ~200-300 MB/s, which is much more reasonable.
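
For comparison, a dd invocation of that kind would be something like the following (a sketch only; the exact dd flags and output path used are not recorded in this report):

dd if=/dev/zero of=/mnt/stateful_partition/dd_test_file bs=1M count=1024 oflag=direct conv=fsync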

Any idea what's wrong with my experiment?
Forgot to say: I'm using a Pixelbook and refreshed the image last week.
The devices that you linked to on chromeperf appear to be eMMC- and SSD-based. The highest eve SKU (512 GB) does have NVMe, which is indeed fast, although those numbers do look high. Are there existing results for hardware_StorageFio on NVMe eves? I couldn't find them on chromeperf.
Thanks~ I'm on an eve 512 GB machine, so that explains it if it's NVMe.
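
A quick way to confirm the storage type on a DUT is something like this (a sketch; it assumes a root shell on the device and that lsblk is present on the image; an NVMe root device shows up as nvme0n1, eMMC as mmcblk0):

lsblk -d -o NAME,SIZE,MODEL,TRAN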

For some unknown reason, the last run of hardware_StorageFio on eve was on 8/22, and the log is not available.
https://sponge.corp.google.com/invocations?quickSearch=hardware_StorageFio+eve+label%3Astorage%3Anvme&tz=

I'll run it on my Pixelbook later to double-check.
Blockedon: 899484
While developing the benchmark, I often see fio cause crosvm to crash, and I haven't found the reason yet. Filed crbug/899484 to track it.
Labels: -Pri-1 M-72 Pri-2
Status: Assigned (was: Untriaged)
Status: Started (was: Assigned)
Ran into another problem: when performing the test, I always write a 1G file to the container home directory. The disk should have plenty of room to accommodate the file, but it reports "no space on disk" at times. When that happens, I get:

testuser@penguin:/$ du -h -d 1 | sort -h
......
0       ./boot
0       ./media
0       ./mnt
0       ./proc
0       ./root
0       ./srv
0       ./sys
0       ./tmp
4.0K    ./lib64
21K     ./dev
124K    ./run
4.5M    ./etc
5.3M    ./sbin
6.2M    ./bin
23M     ./lib
72M     ./opt
151M    ./var
781M    ./usr
1.1G    ./home
2.1G    .

but
testuser@penguin:/$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        4.0G  3.4G   71M  99% /

There's a 1.3G gap between the root directory total reported by du (2.1G) and the "used" space reported by df (3.4G). If I delete the file manually, the root directory drops to 1.1G and df's "used" number drops to 1.4G. It looks like the disk space actually consumed is doubled. Could it be a btrfs snapshot or something?
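
Some btrfs-specific commands can show where the missing space went, since du only counts file data while df reflects btrfs's chunk-level allocation (a sketch, assuming btrfs-progs is available inside the container):

sudo btrfs filesystem df /       # data vs. metadata vs. system chunk usage
sudo btrfs filesystem usage /    # overall allocated vs. used space
sudo btrfs subvolume list /      # check whether snapshots/subvolumes are holding data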


The problem happens easily even if I set the test file size to 500m instead of 1G (there should be ~2G left in the root partition of the container). However, when it happens, df doesn't always show a full partition like in #7. Oftentimes I see

testuser@penguin:~$ du -h -d 1 /
...
1.6G    /

and

testuser@penguin:~$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        4.0G  1.9G  1.9G  50% /

when fio reports the error and aborts the test.
(Copying the comment from the CL here for easier discussion)

- As described at https://bugs.chromium.org/p/chromium/issues/detail?id=897240#c7, it reports a "no space left on disk" error from time to time when writing a file in the container, even though there should still be plenty of room.
- The perf numbers are quite unstable. The throughput can easily double or halve, so the guest/host ratio fluctuates a lot across consecutive runs.
- Sometimes (especially for read operations) the guest/host ratio can be >1, even close to 2. Perhaps this is because the file cache is hit more easily in the container than on the host machine?

From dverkamp@:
I think traditionally BTRFS has had some cases where it would report ENOSPC before the disk is totally full, due to the way it allocates space in large chunks.  It might be worth trying to 'sync' in between tests to see if that will finish some background work, although I don't have high hopes.  As a workaround, it might be easiest to just create a larger disk image, if the host disk has enough space.  (Some more BTRFS ENOSPC notes here: https://btrfs.wiki.kernel.org/index.php/ENOSPC - but I'm not sure any of that is useful to an end user.)

We recently switched the default disk type for new VMs to use raw rather than qcow2 on host kernels that support FALLOC_FL_PUNCH_HOLE; it would be worth trying to recreate your VM and see if raw provides better/more consistent performance.

There is definitely some interaction with the host disk cache; currently, all disk operations use normal (non-O_DIRECT) I/O on the host, which goes through the host page cache, even if you are using direct=1 in the FIO options in the guest.

I've put together a crosvm change that enables O_DIRECT for raw disk files: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1324772 - it would be interesting to see the results of this change along with a raw disk (e.g. if this makes the speed more consistent).
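
To make the "sync between tests" and "recreate the VM as raw" suggestions above concrete, here is a rough sketch of what they could look like in practice (assumptions: the default "termina" VM is used, btrfs-progs is available in the container, and recreating the VM throws away its existing data):

# Inside the container: flush and let btrfs finish background work between runs.
sync
sudo btrfs filesystem sync /

# From crosh on the host: recreate the VM so it gets a raw disk image on kernels
# supporting FALLOC_FL_PUNCH_HOLE. Warning: this deletes the existing VM and container.
vmc destroy termina
vmc start termina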

------

I've synced the source code to head and tried the patch Daniel mentioned above, but unfortunately I can't see an obvious improvement in consistency.
Adding the fio option "end_fsync=1", however, mitigated the "out of disk space" problem. It still happens sometimes.

Here's sample output from the perf test (the CL should be ready soon, thanks to reviewers Dan and nya):
https://cylee.users.x20web.corp.google.com/crostini_disk_io_perf/results-chart.json 
where guest_xxx is the perf score in the container, host_xxx is the score on the host, and ratio_xxx is guest_xxx/host_xxx.

Project Member Comment 10 by bugdroid1@chromium.org, Nov 16

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/ad315080fbd62ffeca993128f4d44449a6186742

commit ad315080fbd62ffeca993128f4d44449a6186742
Author: Cheng-Yu Lee <cylee@chromium.org>
Date: Fri Nov 16 23:11:00 2018

tast-tests: Add Crostini disk IO performance test.

BUG= chromium:897240 
TEST=tast --verbose run DUT vm.CrostiniPerf

Change-Id: Ib3b1e47dbb367398874ad8ddd4d5113dd00506a7
Reviewed-on: https://chromium-review.googlesource.com/1303774
Commit-Ready: Cheng-Yu Lee <cylee@chromium.org>
Tested-by: Cheng-Yu Lee <cylee@chromium.org>
Reviewed-by: Shuhei Takahashi <nya@chromium.org>

[add] https://crrev.com/ad315080fbd62ffeca993128f4d44449a6186742/src/chromiumos/tast/local/bundles/cros/vm/data/disk_io_perf_fio_stress_rw.ini
[add] https://crrev.com/ad315080fbd62ffeca993128f4d44449a6186742/src/chromiumos/tast/local/bundles/cros/vm/data/disk_io_perf_fio_seq_read.ini
[add] https://crrev.com/ad315080fbd62ffeca993128f4d44449a6186742/src/chromiumos/tast/local/bundles/cros/vm/data/disk_io_perf_fio_rand_read.ini
[add] https://crrev.com/ad315080fbd62ffeca993128f4d44449a6186742/src/chromiumos/tast/local/bundles/cros/vm/crostini_disk_io_perf.go
[add] https://crrev.com/ad315080fbd62ffeca993128f4d44449a6186742/src/chromiumos/tast/local/bundles/cros/vm/data/disk_io_perf_fio_rand_write.ini
[add] https://crrev.com/ad315080fbd62ffeca993128f4d44449a6186742/src/chromiumos/tast/local/bundles/cros/vm/data/disk_io_perf_fio_seq_write.ini

Status: Fixed (was: Started)
