
Issue 896920

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug
Gfx




Run dEQP, piglit and glbench on crosvm

Project Member Reported by davidri...@chromium.org, Oct 18

Issue description

Run dEQP and piglit test suites on crosvm.
 
20181018:

Results with crosvm/eve running ToT plus the following changes:
kernel: chain ending at https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1275008/3
crosvm: chain ending at https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1227054/4 plus https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1288960
platform2: chain ending at https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1279274/1

[Killed/19444] fail: 397, pass: 2790, skip: 6514

The test run was killed by the OOM killer, possibly because dEQP was being built in the guest at the same time, so I will retry it independently.
 
A previous run from 20181013, which did not have the attach backing fix, hung with the following status:
[18345/19444] fail: 3918, pass: 1056, skip: 13371

In particular, the hang seems to be related to the following tests:
davidri+  9849     1  5 Oct13 ?        03:33:52 /usr/lib/x86_64-linux-gnu/piglit/bin/ext_transform_feedback-generatemipmap prims_written -auto -fbo
davidri+ 11724     1  5 Oct13 ?        03:32:23 /usr/lib/x86_64-linux-gnu/piglit/bin/ext_transform_feedback-generatemipmap prims_generated -auto -fbo

Cc: pwang@chromium.org
Summary: Run dEQP, piglit and glbench on crosvm (was: Run dEQP and piglit on crosvm)
Preliminary glbench results:
https://docs.google.com/spreadsheets/d/1soaP06c8DJ01uLV8eaATjh4q_J8gcjPe1dl3KmKri7I/edit#gid=1487276513
Piglit instructions:
sudo apt-get install piglit
mkdir piglit-results
cd piglit-results
piglit run sanity results/sanity
piglit run gpu results/gpu

dEQP instructions:
git clone https://android.googlesource.com/platform/external/deqp
cd deqp
python external/fetch_sources.py
curl -L https://android-review.googlesource.com/tools/hooks/commit-msg -o .git/hooks/commit-msg && chmod a+x .git/hooks/commit-msg
sudo apt-get install cmake ant
sudo apt install gcc-multilib g++-multilib
mkdir build-glx
cd build-glx
cmake .. -DCMAKE_BUILD_TYPE=Debug -DDEQP_TARGET=x11_glx
make -j
cd modules/gles2
./deqp-gles2 --deqp-log-images=enable --deqp-watchdog=disable --deqp-surface-width=256 --deqp-surface-height=256 --deqp-log-filename=/tmp/gles2.qpa

glbench instructions:
sudo apt-get install libgflags-dev
git clone https://github.com/waffle-gl/waffle
cd waffle
mkdir build
cd build
cmake ..
make -j
sudo make install
cd ../..
git clone https://chromium.googlesource.com/chromiumos/third_party/autotest
cd autotest/client/deps/glbench/src
USE="X" graphics_backend="OPENGL" make
cd ..
./glbench -notemp
Cc: gurcheta...@chromium.org
Re c#2: 

A few potential differences, things to try:
- z840 vs eve
- qemu vs crosvm
- guest mesa tot vs 13.0.6

I was also running crosvm with a non-release build.  The results are marginally better with a release build.

Will retry with ToT guest mesa next week.
Here are some results with Nocturne as well (ToT guest Mesa):

https://gitlab.freedesktop.org/virgl/virglrenderer/issues/46

Cc: puneetster@chromium.org
Switching from the default stretch Mesa (13.0.6) to ToT doesn't seem to make much difference in glbench performance.

Some profiling results (out.svg) of the glbench texture_update tests with perf/FlameGraph:

guest:
./glbench -tests texture_update -notemp

host:
perf record -F 120 -p 3300 -g -- sleep 60
perf script > out.perf

desktop:
git clone https://github.com/brendangregg/FlameGraph
./stackcollapse-perf.pl /tmp/out.perf > /tmp/out.folded
./flamegraph.pl /tmp/out.folded >/tmp/out.svg

Investigated texture upload a bit, specifically texture_upload_rgba_textimage2d_32: 10.23 mtexel_sec with eve/crosvm/tatl/stretch (debug build, Mesa 13.0.6) vs 259.56 mtexel_sec on eve.

I modified glbench to run only that specific test and ran it for 10k iterations in the guest, with instrumented logging at the completion of every decode+process_gpu_command within process_descriptor.  I then ran a script (delta-time.py) to prepend each log line with the delta from the previous log line (glbench-crosvm-commands.txt).
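For reference, here is a rough sketch of what a delta-prepending filter like delta-time.py might look like (the attached 347-byte script isn't reproduced here, so treat this as an approximation rather than the actual script):

#!/usr/bin/env python3
# Sketch: prepend each syslog line with the milliseconds elapsed since the
# previous line.  Assumes the first field is an RFC 3339 timestamp like
# 2018-10-22T17:16:44.978809-07:00 with a constant UTC offset per capture.
import sys
from datetime import datetime

prev = None
for line in sys.stdin:
    stamp = datetime.strptime(line.split()[0][:26], "%Y-%m-%dT%H:%M:%S.%f")
    delta_ms = (stamp - prev).total_seconds() * 1000 if prev else 0.0
    prev = stamp
    sys.stdout.write("%.3fms %s" % (delta_ms, line))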

During that sequence, virtio-gpu reported 8595 descriptors being processed, 8435 of them TransferToHost3d.  I extracted a sequence of 3588 consecutive TransferToHost3d commands with no other commands processed in between (consecutive-transfers.txt).
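Extracting that run amounts to pulling the longest stretch of consecutive TransferToHost3d lines out of the annotated log; a minimal sketch (not necessarily how consecutive-transfers.txt was actually produced):

#!/usr/bin/env python3
# Sketch: print the longest run of consecutive TransferToHost3d lines from
# the annotated crosvm log read on stdin.
import sys

best, current = [], []
for line in sys.stdin:
    if "TransferToHost3d" in line:
        current.append(line)
    else:
        if len(current) > len(best):
            best = current
        current = []
if len(current) > len(best):
    best = current
sys.stdout.writelines(best)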

Here are some typical sequences from that:
1.261ms 2018-10-22T17:16:44.978809-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (70.385µs)
0.816ms 2018-10-22T17:16:44.979625-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (73.654µs)
0.808ms 2018-10-22T17:16:44.980433-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (73.674µs)
0.771ms 2018-10-22T17:16:44.981204-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (73.031µs)
1.866ms 2018-10-22T17:16:44.983070-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (92.89µs)
1.171ms 2018-10-22T17:16:44.984241-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (91.938µs)
1.070ms 2018-10-22T17:16:44.985311-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (63.891µs)
snip
0.235ms 2018-10-22T17:16:45.132587-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (13.787µs)
0.230ms 2018-10-22T17:16:45.132817-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (13.686µs)
0.230ms 2018-10-22T17:16:45.133047-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (13.713µs)
0.223ms 2018-10-22T17:16:45.133270-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (12.83µs)
0.230ms 2018-10-22T17:16:45.133500-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (13.425µs)
0.237ms 2018-10-22T17:16:45.133737-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (13.567µs)
0.227ms 2018-10-22T17:16:45.133964-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (13.677µs)
snip
0.242ms 2018-10-22T17:16:45.700502-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (17.311µs)
0.077ms 2018-10-22T17:16:45.700579-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (12.017µs)
1.175ms 2018-10-22T17:16:45.701754-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (14.477µs)
0.238ms 2018-10-22T17:16:45.701992-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (18.31µs)
0.077ms 2018-10-22T17:16:45.702069-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (11.934µs)
1.193ms 2018-10-22T17:16:45.703262-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (13.968µs)
0.304ms 2018-10-22T17:16:45.703566-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (18.147µs)
0.096ms 2018-10-22T17:16:45.703662-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (15.203µs)
1.134ms 2018-10-22T17:16:45.704796-07:00 DEBUG localhos[16972]: crosvm[97]: [devices/src/virtio/gpu/mod.rs:384] Some(TransferToHost3d) -> OkNoData (15.196µs)

The entire elapsed wall-clock time from the first to the last transfer was 1011.936 ms.  The total cumulative time spent decoding and processing gpu commands was 63.469 ms.  As a fraction of the overall wall-clock time, only 6.27% was spent actually decoding, processing, or calling into virgl_renderer_transfer_write_iov (i.e. very little).
➜ awk '{print $NF}' /tmp/consecutive-transfers.txt | perl -ne 's/^\D+(\d+\.\d+).*/\1/; $total += $1; printf "%s: %s\n", $1, $total' | tail -1
26.146: 63469.584
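The same cumulative sum (and the 6.27% fraction) can be reproduced with a few lines of Python; a sketch that assumes the per-command duration is the trailing parenthesized µs value on each line:

#!/usr/bin/env python3
# Sketch: sum the per-command processing times (the trailing "(NN.NNNµs)"
# field) in consecutive-transfers.txt and express them as a fraction of the
# 1011.936 ms of wall-clock time measured above.
import re

total_us = 0.0
for line in open("consecutive-transfers.txt"):
    m = re.search(r"\(([\d.]+)µs\)\s*$", line)
    if m:
        total_us += float(m.group(1))

wall_ms = 1011.936  # first-to-last transfer timestamp delta from above
print("busy: %.3f ms of %.3f ms wall clock (%.2f%%)"
      % (total_us / 1000.0, wall_ms, total_us / 10.0 / wall_ms))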

Distribution of processing times, in microseconds, in 10µs buckets.  Fairly small variance; most of it is right at the beginning, after the previous non-TransferToHost3d command:
➜ maphimbu -s 1 </tmp/tt2 -d 10
                10              3115
                20               268
                30               108
                40                13
                50                40
                60                17
                70                17
                80                 5
                90                 2
               100                 1
               170                 1
               240                 1

Distribution of invocation-to-invocation times, in microseconds, in 100µs buckets.  A lot more variation, including many in the 1000µs range (when the actual work being done is <100µs).
➜ maphimbu -s 1 </tmp/tt3 -d 100
                 0                96
               100              1453
               200              1443
               300               187
               400                74
               500                68
               600                17
               700                18
               800                21
               900                15
              1000                11
              1100                98
              1200                79
              1300                 6
              1400                 1
              1800                 1
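
(maphimbu is just a local histogram helper; the same bucketing can be reproduced with a short script.  A sketch, assuming one numeric value per line in the input file:)

#!/usr/bin/env python3
# Sketch: bucket a column of microsecond values (one per line, e.g. /tmp/tt2
# or /tmp/tt3) into fixed-width bins, roughly matching the maphimbu output.
import sys
from collections import Counter

path, width = sys.argv[1], int(sys.argv[2])  # e.g. hist.py /tmp/tt3 100
values = [float(line) for line in open(path) if line.strip()]
buckets = Counter(int(v // width) * width for v in values)
for start in sorted(buckets):
    print("%10d %10d" % (start, buckets[start]))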

My interpretation of this data is that the majority of the time is being spent outside of virgl and the specific TransferToHost3d logic: either in the crosvm virtio queue or main virtio-gpu loop, in the guest kernel, or in guest userspace.
Attachments:
delta-time.py (347 bytes)
glbench-crosvm-commands.txt (1.2 MB)
consecutive-transfers.txt (556 KB)
Status: Assigned (was: Untriaged)
Another datapoint: I ran WebGL Aquarium in Chrome launched from the container and was seeing about 13-15 fps with 500 fish, vs ~55-60 fps when run from the Chrome OS instance of Chrome.  There's obviously a hit here, but it's nothing like the slowdown experienced in SuperTuxKart.

Also, texture upload speed has been split off into another bug: crbug.com/898725.
I filed a bug on WebGL Aquarium slowness (probably affects all GPU-intensive apps though):


https://gitlab.freedesktop.org/virgl/virglrenderer/issues/57
Labels: -Grfx Gfx
