
Issue 799523

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




crosvm broken in 10281.0.0

Project Member Reported by reve...@chromium.org, Jan 5 2018

Issue description

When running 'top', it looks like crosvm's memory usage increases quickly until chrome crashes and I get logged out.

10278.0.0 worked fine, so the regression is in the range: https://crosland.corp.google.com/log/10278.0.0..10281.0.0

crosvm related changes are:

8fb5211c zachr@google.com crosvm: refactor linux vm running sequence - https://chromium.googlesource.com/chromiumos/platform/crosvm/+/8fb5211c3b36accfcd25ff6984316f6e9947cd1e
d3a7a1f6 zachr@google.com crosvm: have DeviceManager make direct VM changes - https://chromium.googlesource.com/chromiumos/platform/crosvm/+/d3a7a1f63e68f0b3d5bf059bedeb819cc62a883e
 

Comment 1 by za...@chromium.org, Jan 5 2018

This does not reproduce for me, but I still have issues: no wayland applications work even though wl0 and wayland-0 exist. Every time an application tries to connect, it errors out with some variation of connection refused. Xwayland outright segfaults. I've tried this with a brand new container and an old one that I know works on 10272. There are no errors in the syslog that might indicate the cause of this.

Comment 2

Does weston-info work? If it does, then you probably don't have the virtwl-patched libraries for some reason, as that results in silent errors like the ones you described.

Comment 3 by za...@chromium.org, Jan 5 2018

weston-info exits successfully but with no output. I know the libraries are patched because I used a container I know works on a different machine.

Comment 4

That's weird. I had no issues like this with 10278.0.0, and 10282.0.0 doesn't even get me to the point where I could try to repro, as the VM doesn't start for me.

Comment 5 by za...@chromium.org, Jan 5 2018

I was trying to repro on 10281.0.0. Are you using 10282.0.0?

Comment 6

Sorry, meant to say 10281.0.0.

Comment 7 by za...@chromium.org, Jan 6 2018

I rebuilt crosvm locally against master and pushed it to my DUT. For some reason, this makes it so I can repro this issue.

I noticed that the crosvm process was getting dumped, which would use lots of memory if all the non-resident guest pages got faulted in. As an experiment, I marked the memory DONTDUMP, and chrome still gets killed after a VM starts, but much more quickly (about a second later). My theory is that the memory usage you saw was all the mapped memory becoming resident, followed by a crash 30 seconds later.

The crash itself is still a mystery. It would appear that even in the absence of memory pressure, something is killing UI. Session manager maybe?

Comment 8

We should probably mark all the guest pages as DONTDUMP regardless, right? Those are pretty much never going to be useful for crash reports and are going to cause overhead if/when crosvm does crash.
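
The DONTDUMP idea discussed above can be sketched roughly as follows. This is a minimal illustration, not the actual crosvm change. It assumes Linux on x86_64 or aarch64 (the constant values are hard-coded for those targets instead of coming from the libc crate), and `map_guest_memory_dontdump` is a hypothetical helper name.

```rust
use std::io;
use std::os::raw::{c_int, c_void};

// Linux constants, hard-coded to keep the sketch free of external crates.
// ASSUMPTION: these are the x86_64 / aarch64 Linux values.
const PROT_READ: c_int = 0x1;
const PROT_WRITE: c_int = 0x2;
const MAP_PRIVATE: c_int = 0x02;
const MAP_ANONYMOUS: c_int = 0x20;
const MADV_DONTDUMP: c_int = 16;

extern "C" {
    // ASSUMPTION: off_t is 64 bits wide, as on 64-bit Linux.
    fn mmap(
        addr: *mut c_void,
        len: usize,
        prot: c_int,
        flags: c_int,
        fd: c_int,
        offset: i64,
    ) -> *mut c_void;
    fn madvise(addr: *mut c_void, len: usize, advice: c_int) -> c_int;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

/// Hypothetical helper: maps `len` bytes of anonymous memory (standing in for
/// guest memory) and asks the kernel to leave it out of core dumps.
fn map_guest_memory_dontdump(len: usize) -> io::Result<*mut c_void> {
    let addr = unsafe {
        mmap(
            std::ptr::null_mut(),
            len,
            PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS,
            -1,
            0,
        )
    };
    // mmap signals failure with MAP_FAILED, which is (void *) -1.
    if addr as isize == -1 {
        return Err(io::Error::last_os_error());
    }
    if unsafe { madvise(addr, len, MADV_DONTDUMP) } != 0 {
        let err = io::Error::last_os_error();
        unsafe { munmap(addr, len) };
        return Err(err);
    }
    Ok(addr)
}

fn main() -> io::Result<()> {
    let len = 16 << 20; // 16 MiB
    let addr = map_guest_memory_dontdump(len)?;
    // Touch one page so that at least part of the mapping becomes resident.
    unsafe { *(addr as *mut u8) = 1 };
    println!("mapped {} bytes at {:p}, excluded from core dumps", len, addr);
    unsafe { munmap(addr, len) };
    Ok(())
}
```

With the advice applied, a crash of the process should no longer force the kernel to fault in and write out the whole mapping, which is the overhead raised above.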

Comment 9 by za...@chromium.org, Jan 6 2018

The root of this seems to be two things:
1) The wayland TempDir struct gets dropped too early in the refactoring I recently checked in, so minijail fails to start the virtio wayland process, which causes crosvm to abort.
2) memfd_create now includes the MFD_ALLOW_SEALING flag, which breaks the virtio wayland seccomp filter.
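
The first cause is a general Rust pitfall with RAII guards, and it can be sketched as below. This is not the crosvm code: the `TempDir` type here is a hypothetical stand-in built on the standard library only, and both function names are made up for illustration.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;

/// Hypothetical stand-in for a TempDir type: the directory is removed
/// as soon as the value is dropped.
struct TempDir {
    path: PathBuf,
}

impl TempDir {
    fn new(name: &str) -> io::Result<TempDir> {
        let path = env::temp_dir().join(format!("{}-{}", name, process::id()));
        fs::create_dir_all(&path)?;
        Ok(TempDir { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored in this sketch.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Buggy shape: only the path escapes the function. The guard is a local,
/// so the directory is already gone by the time the caller uses the path.
fn socket_dir_dropped_early(name: &str) -> io::Result<PathBuf> {
    let dir = TempDir::new(name)?;
    Ok(dir.path().to_path_buf())
}

/// Fixed shape: hand the guard itself to the caller, who keeps it alive
/// for as long as the directory is needed.
fn socket_dir_kept_alive(name: &str) -> io::Result<TempDir> {
    TempDir::new(name)
}

fn main() -> io::Result<()> {
    let stale = socket_dir_dropped_early("wl-early")?;
    println!("dropped early: exists = {}", stale.exists());

    let guard = socket_dir_kept_alive("wl-kept")?;
    println!("kept alive:    exists = {}", guard.path().exists());
    Ok(())
}
```

Any process that is started against the stale path fails because its directory no longer exists, which is consistent with the minijail failure described in 1).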
Project Member

Comment 10 by bugdroid1@chromium.org, Jan 6 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/df48453432ec28ede00d22557975571ec74f8918

commit df48453432ec28ede00d22557975571ec74f8918
Author: Stephen Barber <smbarber@chromium.org>
Date: Sat Jan 06 07:59:06 2018

crosvm: remove stderr from preserved FDs

This makes process cleanup difficult because minijail calls
setsid(), and that removes the devices from the main process's
process group.

BUG= chromium:799523 
TEST=stop crosvm and ensure there are no zombies hanging around

Change-Id: I14c54cf250bdc7339970c886cdab9ff2f4b8a135
Reviewed-on: https://chromium-review.googlesource.com/852987
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/df48453432ec28ede00d22557975571ec74f8918/src/device_manager.rs

Project Member

Comment 11 by bugdroid1@chromium.org, Jan 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/3b1d8a577313891da7d904ce66b6cb03453cccac

commit 3b1d8a577313891da7d904ce66b6cb03453cccac
Author: Stephen Barber <smbarber@chromium.org>
Date: Tue Jan 09 03:56:44 2018

crosvm: use tsync for seccomp jails

TSYNC isn't particularly useful for the device jails since they start
with just a single thread. But a useful side effect of having minijail
use TSYNC is that instead of the default SECCOMP_RET_KILL_THREAD behavior,
minijail switches to SECCOMP_RET_TRAP and uses the default signal disposition
which dumps core.

Until SECCOMP_RET_KILL_PROCESS is available on all kernel versions with crosvm,
using TSYNC this way allows killing the entire device process instead of just
one thread. This ensures if seccomp kills a worker thread in a device, the
entire device process will die, and the crosvm main process will exit.

BUG= chromium:799523 
TEST=add banned syscall to net device worker thread and ensure crosvm exits

Change-Id: Ie9ebfc90c79dcf49283cb2628dc8d4c848e8385b
Reviewed-on: https://chromium-review.googlesource.com/853302
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/3b1d8a577313891da7d904ce66b6cb03453cccac/src/linux.rs

Status: Verified (was: Assigned)
Components: OS>Systems>Containers
