
Issue 816692 link

Starred by 5 users

Issue metadata

Status: Verified
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




crosvm: silently uses 100% CPU while guest is idle

Project Member Reported by za...@chromium.org, Feb 26 2018

Issue description

Even when the guest is idle, crosvm silently pegs an entire CPU on the host side. Experience says this is likely because a poll loop is polling a socket that was hung up. Other than the elevated CPU usage, there is no visible sign of this failure, because the poll loop degenerates into busy polling without reporting anything to the log.

The first step in smoking out these issues is to make this degenerate case make noise in the logs.
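For illustration, the failure mode can be reproduced in miniature with only the standard library. This is a sketch, not crosvm code: it uses a blocking `read` in place of `poll`, on the assumption that the same property drives both. Once the peer hangs up, the surviving end of the socket stays permanently ready, so a loop that never reacts to the hangup stops blocking and spins. The function name `count_eof_reads` is made up for this example.

```rust
use std::io::Read;
use std::os::unix::net::UnixStream;

/// Hangs up one end of a socket pair, then reads from the other end
/// `rounds` times. Returns how many reads came back with 0 bytes (EOF).
fn count_eof_reads(rounds: usize) -> usize {
    let (mut local, peer) = UnixStream::pair().expect("socketpair failed");
    drop(peer); // the peer hangs up

    let mut buf = [0u8; 16];
    (0..rounds)
        // A healthy idle socket would block here; a hung-up one returns
        // Ok(0) at once, every time.
        .filter(|_| matches!(local.read(&mut buf), Ok(0)))
        .count()
}

fn main() {
    println!("{} of 5 reads returned EOF immediately", count_eof_reads(5));
}
```

A loop built around a wait like this only goes back to sleeping once the hung-up socket is closed or removed from the set being waited on.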
 
Project Member

Comment 1 by bugdroid1@chromium.org, Mar 8 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/a5358e8ffd53d04750dc2c3376abc3fb5df92659

commit a5358e8ffd53d04750dc2c3376abc3fb5df92659
Author: Zach Reizner <zachr@google.com>
Date: Thu Mar 08 00:54:45 2018

sys_util: add PollContext interface for using epoll

A common cause of silent 100% CPU usage on otherwise idle VMs is that
some poll loop is waiting on sockets that were hung up. An unrelated
issue is that using the Poller interface requires dynamic allocation on
every poll call for dynamically sized poll lists.

The PollContext struct detects and warns about the first problem at runtime
and solves the latter problem.

TEST=cargo test -p sys_util
BUG= chromium:816692 

Change-Id: I42a9c961db07191d25bcba77c5136f5729400ec9
Reviewed-on: https://chromium-review.googlesource.com/933870
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>

[modify] https://crrev.com/a5358e8ffd53d04750dc2c3376abc3fb5df92659/sys_util/src/poll.rs
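One way such a runtime warning could work is sketched below. This is a hypothetical detector, not the PollContext implementation: the type `HangupSpinDetector`, its fields, and its thresholds are all invented for illustration. The idea is to count consecutive waits that returned almost instantly while reporting a hangup, and to ask for a warning once the streak is long enough.

```rust
use std::time::Duration;

/// Hypothetical busy-loop detector; names and thresholds are assumptions.
struct HangupSpinDetector {
    threshold: u32,     // suspicious wakeups in a row before warning
    min_wait: Duration, // waits shorter than this count as "instant"
    streak: u32,
    warned: bool,
}

impl HangupSpinDetector {
    fn new(threshold: u32, min_wait: Duration) -> Self {
        HangupSpinDetector { threshold, min_wait, streak: 0, warned: false }
    }

    /// Records one wait. Returns true once per streak, at the moment the
    /// streak of instant hangup wakeups reaches the threshold.
    fn record(&mut self, waited: Duration, saw_hangup: bool) -> bool {
        if saw_hangup && waited < self.min_wait {
            self.streak += 1;
        } else {
            // A normal wait ends the streak and re-arms the warning.
            self.streak = 0;
            self.warned = false;
        }
        if self.streak >= self.threshold && !self.warned {
            self.warned = true;
            return true;
        }
        false
    }
}

fn main() {
    let mut detector = HangupSpinDetector::new(3, Duration::from_millis(1));
    for i in 0..5 {
        if detector.record(Duration::from_micros(10), true) {
            println!("warning: busy poll loop with hung-up FD (wait #{})", i + 1);
        }
    }
}
```

Warning once per streak keeps the log readable; a spinning loop would otherwise flood it.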

Project Member

Comment 2 by bugdroid1@chromium.org, Mar 8 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/25c6bc137ecfd1f608993a3d2f24c5736c5c186a

commit 25c6bc137ecfd1f608993a3d2f24c5736c5c186a
Author: Zach Reizner <zachr@google.com>
Date: Thu Mar 08 00:54:46 2018

sys_util: custom derive for PollToken

Using an enum implementing PollToken is the recommended way to use
PollContext, but writing the trait impls for each enum is mechanical yet
error prone. This is a perfect candidate for a custom derive, which
automates away the process using a simple derive attribute on an enum.

BUG= chromium:816692 
TEST=cargo test -p sys_util

Change-Id: If21d0f94f9af4b4f6cef1f24c78fc36b50471053
Reviewed-on: https://chromium-review.googlesource.com/940865
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>

[add] https://crrev.com/25c6bc137ecfd1f608993a3d2f24c5736c5c186a/sys_util/poll_token_derive/tests.rs
[modify] https://crrev.com/25c6bc137ecfd1f608993a3d2f24c5736c5c186a/sys_util/src/lib.rs
[add] https://crrev.com/25c6bc137ecfd1f608993a3d2f24c5736c5c186a/sys_util/poll_token_derive/Cargo.toml
[modify] https://crrev.com/25c6bc137ecfd1f608993a3d2f24c5736c5c186a/Cargo.lock
[add] https://crrev.com/25c6bc137ecfd1f608993a3d2f24c5736c5c186a/sys_util/poll_token_derive/poll_token_derive.rs
[modify] https://crrev.com/25c6bc137ecfd1f608993a3d2f24c5736c5c186a/sys_util/Cargo.toml
[modify] https://crrev.com/25c6bc137ecfd1f608993a3d2f24c5736c5c186a/sys_util/src/poll.rs
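To show why the hand-written impls are mechanical yet error prone, here is a sketch of what one might look like. The trait below is a stand-in defined locally; its method names and the `u64` encoding are assumptions for this example, and the `Token` enum is hypothetical. The tag values have to agree in both directions, which is the kind of bookkeeping a derive can generate.

```rust
/// Local stand-in for the trait; method names are an assumption.
trait PollToken {
    fn as_raw_token(&self) -> u64;
    fn from_raw_token(data: u64) -> Self;
}

/// Hypothetical token enum for a device worker.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Token {
    Exit,
    QueueReady(u32),
    Socket(u32),
}

// Encoding used here: variant tag in the low 8 bits, payload above it.
impl PollToken for Token {
    fn as_raw_token(&self) -> u64 {
        match *self {
            Token::Exit => 0,
            Token::QueueReady(index) => 1 | (u64::from(index) << 8),
            Token::Socket(id) => 2 | (u64::from(id) << 8),
        }
    }

    fn from_raw_token(data: u64) -> Self {
        let payload = (data >> 8) as u32;
        // These tags must match as_raw_token exactly; a mismatch compiles
        // fine and silently routes events to the wrong handler.
        match data & 0xff {
            0 => Token::Exit,
            1 => Token::QueueReady(payload),
            2 => Token::Socket(payload),
            _ => unreachable!("unknown token tag"),
        }
    }
}

fn main() {
    let raw = Token::QueueReady(7).as_raw_token();
    println!("{:#x} -> {:?}", raw, Token::from_raw_token(raw));
}
```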

Project Member

Comment 3 by bugdroid1@chromium.org, Mar 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/d604dbbab4d3acbf9b3184e991c121505b517f5d

commit d604dbbab4d3acbf9b3184e991c121505b517f5d
Author: Zach Reizner <zachr@google.com>
Date: Fri Mar 09 03:28:52 2018

crosvm/plugin: refactor poll loop to use PollContext

This change simplifies plugin processing by removing the awkward
run_until_started loop. This also switches to use PollContext instead
of the Poller/Pollable interface, which required reallocating a Vec
every loop to satisfy the borrow checker.

TEST=cargo test --features plugin
BUG= chromium:816692 

Change-Id: Iedf26a32840a9a038205c4be8d1adb2f1b565a5c
Reviewed-on: https://chromium-review.googlesource.com/938653
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/d604dbbab4d3acbf9b3184e991c121505b517f5d/src/plugin/process.rs
[modify] https://crrev.com/d604dbbab4d3acbf9b3184e991c121505b517f5d/src/plugin/mod.rs
[modify] https://crrev.com/d604dbbab4d3acbf9b3184e991c121505b517f5d/sys_util/src/signalfd.rs

Comment 4 by vapier@chromium.org, Mar 24 2018

to be clear, this isn't resolved yet, right?  i'm still seeing crosvm eat 100% of one cpu no matter what i do.

(1) run top.  see no crosvm running.
(2) in new crosh, run `vmc start v`.
(3) go back to top.  see crosvm steady at ~25% cpu usage.
(4) exit automatic vsh session.  crosvm still at ~25% cpu usage.
(5) run `vmc stop v`.  see crosvm exit.

Comment 5 by za...@chromium.org, Mar 26 2018

I need to finish refactoring all the poll loops before we can be sure this isn't caused by socket hangups inducing busy waiting.

Comment 6 by za...@chromium.org, Mar 28 2018

An interesting wrinkle is that the old Poller interface specifically filtered out all non-POLLIN events, meaning busy loops are all but assured to happen on POLLHUP: https://chromium.googlesource.com/chromiumos/platform/crosvm/+/62a4063aa6c28d1f73e93fd0e7da2135d4d46d02/sys_util/src/poll.rs#143

Project Member

Comment 7 by bugdroid1@chromium.org, Mar 30 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/1028f53ed2bcdc3088f73f59f268f3e99d5c06c9

commit 1028f53ed2bcdc3088f73f59f268f3e99d5c06c9
Author: Zach Reizner <zachr@google.com>
Date: Fri Mar 30 04:59:45 2018

sys_util: have Poller return token on POLLHUP

If POLLHUP is filtered out of the returned tokens, the caller of
Poller::poll will likely just put the same (token, fd) in the next call
to poll which will return instantly. This degrades into a busy poll loop
without the chance for the caller to change the poll list.

Instead, this change makes the filter return tokens on POLLHUP so
that the caller will hopefully notice the FD associated with the token
has been hung up and will close it.

BUG= chromium:816692 
TEST=None

Change-Id: Ie36d8a647a5fd7faabfd57a562205f75c77991e7
Reviewed-on: https://chromium-review.googlesource.com/985616
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/1028f53ed2bcdc3088f73f59f268f3e99d5c06c9/sys_util/src/poll.rs
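The effect of the filter change can be sketched in isolation. This is not the Poller code: `ready_tokens` is a hypothetical helper, and the constants are the Linux values of `POLLIN` and `POLLHUP`. With a `POLLIN`-only mask the hung-up token never reaches the caller; adding `POLLHUP` to the mask lets the caller see it and close the FD.

```rust
// Linux values of the poll(2) event bits used in this sketch.
const POLLIN: i16 = 0x001;
const POLLHUP: i16 = 0x010;

/// Returns the tokens whose returned events intersect `mask`.
/// `results` pairs each token with the revents reported for its FD.
fn ready_tokens(results: &[(u32, i16)], mask: i16) -> Vec<u32> {
    results
        .iter()
        .filter(|&&(_, revents)| revents & mask != 0)
        .map(|&(token, _)| token)
        .collect()
}

fn main() {
    // Token 1 is readable, token 2 was hung up, token 3 is idle.
    let results = [(1, POLLIN), (2, POLLHUP), (3, 0)];

    // Old behavior: the hangup is dropped, so the caller polls it again.
    println!("POLLIN only:      {:?}", ready_tokens(&results, POLLIN));
    // New behavior: the caller is told about token 2 and can close it.
    println!("POLLIN | POLLHUP: {:?}", ready_tokens(&results, POLLIN | POLLHUP));
}
```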

Project Member

Comment 8 by bugdroid1@chromium.org, Apr 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/f96be03cad1a133c24de3a4bc29f2df9161641c3

commit f96be03cad1a133c24de3a4bc29f2df9161641c3
Author: Zach Reizner <zachr@google.com>
Date: Thu Apr 05 05:53:22 2018

devices: block: use PollContext in block device

Switching to PollContext so that there is one less user of Poller, which
will be removed.

TEST=run any vm with a block device
BUG= chromium:816692 

Change-Id: I2e1301ea9d66012262f1fcb69eaeee9f7464f3b3
Reviewed-on: https://chromium-review.googlesource.com/983036
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>

[modify] https://crrev.com/f96be03cad1a133c24de3a4bc29f2df9161641c3/seccomp/aarch64/block_device.policy
[modify] https://crrev.com/f96be03cad1a133c24de3a4bc29f2df9161641c3/seccomp/x86_64/block_device.policy
[modify] https://crrev.com/f96be03cad1a133c24de3a4bc29f2df9161641c3/devices/src/virtio/block.rs

Project Member

Comment 9 by bugdroid1@chromium.org, Apr 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/5bed0d2ffa79bc3f2542e11da5ad5cf134f73e96

commit 5bed0d2ffa79bc3f2542e11da5ad5cf134f73e96
Author: Zach Reizner <zachr@google.com>
Date: Thu Apr 05 05:53:27 2018

crosvm/linux: switch to using PollContext in control loop

This avoids the pitfalls of Poller, which required dynamic allocation on
every loop for the dynamically added Pollables. Using PollContext also
makes busy poll loops less silent.

TEST=run a linux vm
BUG= chromium:816692 

Change-Id: If44e47bcbbd7c889399f957ad5bcca66eca57b8e
Reviewed-on: https://chromium-review.googlesource.com/983038
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/5bed0d2ffa79bc3f2542e11da5ad5cf134f73e96/src/linux.rs

Project Member

Comment 10 by bugdroid1@chromium.org, Apr 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/fc62c45dabfb72b5fd2e43831fd4caab40e61592

commit fc62c45dabfb72b5fd2e43831fd4caab40e61592
Author: Zach Reizner <zachr@google.com>
Date: Thu Apr 05 22:20:42 2018

devices: use PollContext for all virtio devices

BUG= chromium:816692 
TEST=run any VM

Change-Id: I4219050fdb7947ca513f599f1ac57cde6052d397
Reviewed-on: https://chromium-review.googlesource.com/996917
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>

[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/seccomp/aarch64/rng_device.policy
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/seccomp/x86_64/balloon_device.policy
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/seccomp/aarch64/net_device.policy
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/seccomp/x86_64/rng_device.policy
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/seccomp/aarch64/vhost_net_device.policy
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/devices/src/virtio/vhost/worker.rs
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/devices/src/virtio/vhost/mod.rs
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/seccomp/x86_64/vhost_net_device.policy
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/devices/src/virtio/net.rs
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/devices/src/virtio/rng.rs
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/devices/src/virtio/balloon.rs
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/seccomp/aarch64/vhost_vsock_device.policy
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/seccomp/aarch64/balloon_device.policy
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/seccomp/x86_64/vhost_vsock_device.policy
[modify] https://crrev.com/fc62c45dabfb72b5fd2e43831fd4caab40e61592/seccomp/x86_64/net_device.policy

Project Member

Comment 11 by bugdroid1@chromium.org, Apr 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/c1b74eb8b1d123a940cabefc7be864cf33d74d00

commit c1b74eb8b1d123a940cabefc7be864cf33d74d00
Author: Zach Reizner <zachr@google.com>
Date: Sat Apr 07 02:50:32 2018

sys_util: add method for copying PollEvents

Making a copy of PollEvents is useful to drop the PollEvents structure
which borrows from a PollContext. Even though immutably borrowing from a
PollContext does not prevent any operations on a PollContext, it does
prevent mutable method calls on any structure that owns PollContext.

TEST=None
BUG= chromium:816692 

Change-Id: I9527fd5c122a703933deb973ad549b792226e4c6
Reviewed-on: https://chromium-review.googlesource.com/1000101
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/c1b74eb8b1d123a940cabefc7be864cf33d74d00/sys_util/src/poll.rs
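The borrow problem described in this commit can be shown with stand-in types. Everything here is hypothetical (`Context`, `Events`, `Worker`, and `copied_tokens` are invented for the sketch and are not the sys_util API): an events value that borrows from the context keeps the owning struct immutably borrowed, so copying the events out first is what allows a `&mut self` method to be called while handling them.

```rust
/// Stand-in for a poll context that owns the list of ready tokens.
struct Context {
    ready: Vec<u32>,
}

/// Stand-in for an events value that borrows from the context.
struct Events<'a> {
    tokens: &'a [u32],
}

impl Context {
    fn wait<'a>(&'a self) -> Events<'a> {
        Events { tokens: &self.ready }
    }
}

impl<'a> Events<'a> {
    /// Copies the events into an owned Vec, ending the borrow.
    fn copied_tokens(&self) -> Vec<u32> {
        self.tokens.to_vec()
    }
}

struct Worker {
    ctx: Context,
    handled: u32,
}

impl Worker {
    fn handle(&mut self, token: u32) {
        self.handled += token;
    }

    fn run_once(&mut self) {
        // The temporary Events is dropped at the end of this statement,
        // so `self` is no longer borrowed when the loop runs.
        let tokens = self.ctx.wait().copied_tokens();
        for token in tokens {
            self.handle(token);
        }
        // Looping over `self.ctx.wait().tokens` directly would hold an
        // immutable borrow of `self.ctx`, and `self.handle` would be
        // rejected by the borrow checker.
    }
}

fn main() {
    let mut worker = Worker { ctx: Context { ready: vec![1, 2, 3] }, handled: 0 };
    worker.run_once();
    println!("handled sum = {}", worker.handled);
}
```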

Project Member

Comment 12 by bugdroid1@chromium.org, Apr 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/d86e698ec800db139edee03a45140078850abfad

commit d86e698ec800db139edee03a45140078850abfad
Author: Zach Reizner <zachr@google.com>
Date: Sat Apr 07 02:50:33 2018

devices: use nested PollContext in wayland device

The wl device was the last user of the old Poller.

BUG= chromium:816692 
TEST=run wayland under crosvm

Change-Id: I6c1c1db2774a6e783b7bd1109288328d75ad2223
Reviewed-on: https://chromium-review.googlesource.com/1000102
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/d86e698ec800db139edee03a45140078850abfad/devices/src/virtio/wl.rs
[modify] https://crrev.com/d86e698ec800db139edee03a45140078850abfad/seccomp/x86_64/wl_device.policy

Project Member

Comment 13 by bugdroid1@chromium.org, Apr 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/4fcd1af11ead83e11104e952a15582b6fc064d5b

commit 4fcd1af11ead83e11104e952a15582b6fc064d5b
Author: Zach Reizner <zachr@google.com>
Date: Sat Apr 07 02:50:33 2018

sys_util: remove deprecated Poller/Pollable interface

Now that there are no users of that interface, we should remove it.

TEST=./build_test
BUG= chromium:816692 

Change-Id: Ifdbde22984f557b945e49559ba47076e99db923b
Reviewed-on: https://chromium-review.googlesource.com/1000103
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/4fcd1af11ead83e11104e952a15582b6fc064d5b/sys_util/src/terminal.rs
[modify] https://crrev.com/4fcd1af11ead83e11104e952a15582b6fc064d5b/sys_util/src/signalfd.rs
[modify] https://crrev.com/4fcd1af11ead83e11104e952a15582b6fc064d5b/net_util/src/lib.rs
[modify] https://crrev.com/4fcd1af11ead83e11104e952a15582b6fc064d5b/sys_util/src/eventfd.rs
[modify] https://crrev.com/4fcd1af11ead83e11104e952a15582b6fc064d5b/sys_util/src/poll.rs

Project Member

Comment 14 by bugdroid1@chromium.org, Apr 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/7a7268faf0a43c79b6a4520f5c2f35c3e0233932

commit 7a7268faf0a43c79b6a4520f5c2f35c3e0233932
Author: Sonny Rao <sonnyrao@chromium.org>
Date: Thu Apr 12 01:08:32 2018

crosvm: aarch64: add epoll syscalls to seccomp policy for wayland

Match the configuration for x86_64

BUG= chromium:816692 
TEST=run wayland under crosvm on kevin

Change-Id: If21bccddba362656fc02b213b9f30166f2c4be13
Reviewed-on: https://chromium-review.googlesource.com/1006488
Commit-Ready: Sonny Rao <sonnyrao@chromium.org>
Tested-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/7a7268faf0a43c79b6a4520f5c2f35c3e0233932/seccomp/aarch64/wl_device.policy

Project Member

Comment 15 by bugdroid1@chromium.org, Apr 27 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/dafdbc01cbdd318180b6f28dbc6d05b60cf443b7

commit dafdbc01cbdd318180b6f28dbc6d05b60cf443b7
Author: Sonny Rao <sonnyrao@chromium.org>
Date: Fri Apr 27 04:10:10 2018

crosvm: aarch64: fix seccomp entry for ftruncate on aarch64

Aarch64 seems to use ftruncate64 rather than ftruncate.

BUG= chromium:816692 
TEST=run VM on kevin using concierge

Change-Id: I944f52d75fb9f5a3aaf5fe9e85708c48f249bb1a
Reviewed-on: https://chromium-review.googlesource.com/1031175
Commit-Ready: Sonny Rao <sonnyrao@chromium.org>
Tested-by: Sonny Rao <sonnyrao@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/dafdbc01cbdd318180b6f28dbc6d05b60cf443b7/seccomp/aarch64/block_device.policy

<triage>zachr, could you give an update on this?</triage>
Labels: Hotlist-Crostini-Platform

Comment 18 by za...@chromium.org, May 15 2018

Status: Verified (was: Started)
This should be fixed, as I haven't seen reports in a while. Re-open if somebody else observes this behavior.
