Arc++ storage performance
Issue description

Performance inside the android container is significantly worse than outside it, especially in sdcard. Fio measurements show degraded iops, and fuse appears to be the main cause: in perf output for fio runs in sdcard, the functions with some of the highest overhead percentages are mwait_idle_with_hints.constprop and fuse_request_alloc, so we seem to spend a lot of time waiting.

I have attached iops graphs created from the output of running fio with the 4k_read script (with ioengine=sync) on a cyan, a caroline, and a lulu. In the graphs, android_sdcard refers to fio run in the /var/run/arc/sdcard/default/emulated/ directory, android_data refers to fio run in /data/cache/, and android_tmp refers to fio run in /data/local/tmp/. I have also attached perf output for both cyan and caroline; the sd and data labels have the same meaning as sdcard and data in the graphs. The sd_perf_perc and data_perf_perc files for both devices are the results of running perf report --no-children on the raw output of perf record taken while fio was running. For more information I have also attached the function graph trace results of fio on cyan.
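For reference, the measurement workflow described above looks roughly like this (the target directories and the perf report flags come from this report; the 4k_read job file name is from the report but its contents, and the output file names, are assumptions):

```shell
#!/bin/sh
# Sketch of the fio + perf measurement described above.
# android_sdcard case; swap the directory for android_data / android_tmp.
TARGET=/var/run/arc/sdcard/default/emulated

# Record a system-wide profile while fio runs the 4k_read job
# (ioengine=sync) in the target directory.
perf record -a -o sd_perf.data -- \
    fio --directory="$TARGET" 4k_read.fio

# Report per-function self overhead only (no accumulation into
# callers), producing output in the style of the *_perf_perc files.
perf report --no-children -i sd_perf.data --stdio > sd_perf_perc
```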
Jun 9 2017
Jun 9 2017
I tried to process the ftrace files with the function profiling described here: https://lwn.net/Articles/370423/, which would have given me the time spent in each function, but I haven't been able to get that working. For the _perc files, I have been exploring the perf tool's options and have obtained new output that should be ordered by CPU time. I am attaching new perc files for caroline, generated using the cpu-clock event option for perf record. They don't show actual CPU times, but the percentages and ordering should now be based on CPU time. I am looking into ways to filter and sort the output to weed out unrelated entries, so hopefully I will be able to better pinpoint the problem areas. The new perc files with CPU-time ordering are definitely different, but it still looks like we spend a lot of time idle.
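For reference, a minimal sketch of the cpu-clock variant described above (the target directory comes from this report; the output file names are assumptions):

```shell
#!/bin/sh
# Sample on the cpu-clock software event instead of the default
# cycles event, so sample counts track CPU time.
perf record -e cpu-clock -a -o data.perf.data -- \
    fio --directory=/data/cache 4k_read.fio

# Percentages and ordering in the report are now proportional to
# CPU time spent in each function (no absolute times are shown).
perf report --no-children -i data.perf.data --stdio > data_perf_perc
```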
Jun 12 2017
Did you try rerunning the test with function profiling, as documented in the same article? (Feel free to add an autotest that can run the same fio script both in and out of the android container.) For function profiling, you may need to alter the kernel config to add CONFIG_FUNCTION_GRAPH_TRACER and CONFIG_FUNCTION_TRACER (you can do this from cros-kernel2.eclass and use the --reconf option of cros_workon_make). Comparing the traces, I see an extra memory allocation (fuse_request_alloc), which is not good on the data path.
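For reference, once the tracer options are enabled, the ftrace function profiler from the LWN article can be driven roughly like this (a sketch; enabling function_profile_enabled also requires CONFIG_FUNCTION_PROFILER, and the tracefs path assumes the usual debugfs mount):

```shell
#!/bin/sh
# Drive the ftrace function profiler around a fio run.
cd /sys/kernel/debug/tracing

echo 0 > function_profile_enabled   # reset any previous run
echo 1 > function_profile_enabled   # start profiling

fio --directory=/var/run/arc/sdcard/default/emulated 4k_read.fio

echo 0 > function_profile_enabled   # stop profiling

# Per-function hit counts and times, one trace_stat file per CPU;
# filter for the fuse functions of interest.
grep fuse_ trace_stat/function*
```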
Jun 17 2017
I've been running the test with function profiling (altering the kernel config worked), so I will attach some of the profiles. Comparing them, in the sdcard case we are definitely spending a lot of time in the various fuse functions, including fuse_request_alloc, which has a large number of hits; that is not good. The attached profiles have the sleep-time option disabled, and I am including profiles with the graph-time option both enabled and disabled (the latter labeled wout_gtime). I am also attaching perf output using the cpu-clock option from both inside (data: /data/cache/, sd: /var/run/arc/sdcard/default/emulated) and outside (out: a cryptohome user directory) the android container.
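For reference, the sleep-time and graph-time options mentioned above are ftrace trace options toggled through tracefs, roughly as follows (a sketch; the path assumes the standard debugfs mount):

```shell
#!/bin/sh
cd /sys/kernel/debug/tracing

# sleep-time disabled: time a task spends sleeping is NOT charged
# to the function it slept in (used for all attached profiles).
echo 0 > options/sleep-time

# graph-time enabled (the default): a function's time includes its
# callees; disable it to count only the function's own time, as in
# the profiles labeled wout_gtime.
echo 0 > options/graph-time
```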
Jun 28 2017
I am attaching more iops graphs showing results from running the fio tests on a fizz. The graphs have the same format as the previous ones. They show that on the fizz, performance in sdcard is still about 1/5 of what it is outside the android container; with NVMe, it is about 1/6.
Aug 1
Comment 1 by gwendal@google.com
Jun 9 2017
Attachments: three files, 1.3 KB each (View/Download)