crosvm broken in 10281.0.0
Issue description

When running 'top', it looks like crosvm is quickly increasing in memory until chrome crashes and I get logged out. 10278.0.0 worked fine, so the regression is in this range: https://crosland.corp.google.com/log/10278.0.0..10281.0.0

crosvm-related changes are:
8fb5211c zachr@google.com crosvm: refactor linux vm running sequence - https://chromium.googlesource.com/chromiumos/platform/crosvm/+/8fb5211c3b36accfcd25ff6984316f6e9947cd1e
d3a7a1f6 zachr@google.com crosvm: have DeviceManager make direct VM changes - https://chromium.googlesource.com/chromiumos/platform/crosvm/+/d3a7a1f63e68f0b3d5bf059bedeb819cc62a883e
,
Jan 5 2018
Does weston-info work? If it does, then you probably don't have the virtwl-patched libraries for some reason, as that results in silent errors like the ones you described.
,
Jan 5 2018
weston-info exits successfully but with no output. I know the libraries are patched because I used a container I know works on a different machine.
,
Jan 5 2018
That's weird. I had no issues like this with 10278.0.0, and 10282.0.0 doesn't even get me to the point where I could try to repro, as the VM doesn't start for me.
,
Jan 5 2018
I was trying to repro on 10281.0.0. Are you using 10282.0.0?
,
Jan 5 2018
Sorry, meant to say 10281.0.0.
,
Jan 6 2018
I rebuilt crosvm locally against master and pushed it to my DUT. For some reason, this makes it so I can repro this issue. I noticed that the crosvm process was getting dumped, which would use lots of memory if all the non-resident guest pages got faulted in. As an experiment, I marked the memory DONTDUMP, and chrome still gets killed after a VM starts, but much more quickly (about a second later). My theory is that the memory usage you saw was all the mapped memory becoming resident, followed by a crash 30 seconds later. The crash itself is still a mystery. It would appear that even in the absence of memory pressure, something is killing the UI. Session manager maybe?
,
Jan 6 2018
We should probably mark all the guest pages as DONTDUMP regardless, right? Those are pretty much never going to be useful for crash reports and are going to cause overhead if/when crosvm does crash.
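For reference, marking a mapping DONTDUMP comes down to a single madvise(2) call with MADV_DONTDUMP. The following is a minimal sketch, not crosvm's actual code: `DontDumpMapping` is a hypothetical name, the libc functions are declared by hand to avoid external crates, and the constants are the Linux values for x86_64/aarch64 (an assumption; other platforms differ).

```rust
use std::io;
use std::os::raw::{c_int, c_void};

// Linux <sys/mman.h> values, assumed for x86_64/aarch64.
const PROT_READ: c_int = 0x1;
const PROT_WRITE: c_int = 0x2;
const MAP_PRIVATE: c_int = 0x02;
const MAP_ANONYMOUS: c_int = 0x20;
const MADV_DONTDUMP: c_int = 16;

// Hand-written declarations of the libc functions used below.
// The i64 offset assumes a 64-bit off_t.
unsafe extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int,
            fd: c_int, offset: i64) -> *mut c_void;
    fn madvise(addr: *mut c_void, len: usize, advice: c_int) -> c_int;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

/// Hypothetical stand-in for a guest memory region that is
/// excluded from core dumps. Unmapped on drop.
struct DontDumpMapping {
    addr: *mut c_void,
    len: usize,
}

impl DontDumpMapping {
    fn new(len: usize) -> io::Result<DontDumpMapping> {
        unsafe {
            let addr = mmap(std::ptr::null_mut(), len,
                            PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            // mmap reports failure as MAP_FAILED, i.e. (void *)-1.
            if addr as isize == -1 {
                return Err(io::Error::last_os_error());
            }
            // Ask the kernel to leave these pages out of core dumps.
            if madvise(addr, len, MADV_DONTDUMP) != 0 {
                // Capture errno before munmap can overwrite it.
                let err = io::Error::last_os_error();
                munmap(addr, len);
                return Err(err);
            }
            Ok(DontDumpMapping { addr, len })
        }
    }

    fn len(&self) -> usize {
        self.len
    }
}

impl Drop for DontDumpMapping {
    fn drop(&mut self) {
        unsafe {
            munmap(self.addr, self.len);
        }
    }
}

fn main() -> io::Result<()> {
    let mapping = DontDumpMapping::new(1 << 20)?;
    println!("mapped {} bytes marked DONTDUMP", mapping.len());
    Ok(())
}
```

Because the advice only affects core dumps, it does not change how the guest uses the memory; it just keeps a crash from faulting in every non-resident page.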
,
Jan 6 2018
The root of this seems to be two things: 1) the wayland TempDir struct gets dropped too early in the refactoring I recently checked in, so minijail fails to start the virtio wayland process, which causes crosvm to abort. 2) memfd_create now includes the MFD_ALLOW_SEALING flag, which breaks the virtio wayland seccomp filter.
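The first cause is a general Rust pattern: a guard that removes a directory on drop must outlive every user of that directory. Here is a minimal sketch of the failure shape and one way to avoid it, using only the standard library. `TempDirGuard`, `setup_dropped_too_early`, and `setup_kept_alive` are hypothetical names for illustration, not crosvm's TempDir or its actual fix.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical temp-dir guard: creates a directory and removes it on drop.
struct TempDirGuard {
    path: PathBuf,
}

impl TempDirGuard {
    fn new(name: &str) -> io::Result<TempDirGuard> {
        // Include the PID so concurrent runs do not collide.
        let path = std::env::temp_dir()
            .join(format!("{}-{}", name, std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(TempDirGuard { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempDirGuard {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored in this sketch.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Buggy shape: the guard is dropped when this function returns,
/// so the returned path points at a directory that no longer exists.
fn setup_dropped_too_early(name: &str) -> io::Result<PathBuf> {
    let guard = TempDirGuard::new(name)?;
    Ok(guard.path().to_path_buf())
}

/// Safer shape: hand the guard to the caller, who keeps it alive
/// for as long as anything needs the directory.
fn setup_kept_alive(name: &str) -> io::Result<TempDirGuard> {
    TempDirGuard::new(name)
}

fn main() -> io::Result<()> {
    let stale = setup_dropped_too_early("wl-early")?;
    println!("guard dropped early, dir exists: {}", stale.exists());

    let guard = setup_kept_alive("wl-kept")?;
    println!("guard held, dir exists: {}", guard.path().exists());
    Ok(())
}
```

In the buggy shape, anything that later tries to use the path (such as a jailed process being started with it as its root) finds it missing, which matches the kind of startup failure described above.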
,
Jan 6 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform/crosvm/+/df48453432ec28ede00d22557975571ec74f8918

commit df48453432ec28ede00d22557975571ec74f8918
Author: Stephen Barber <smbarber@chromium.org>
Date: Sat Jan 06 07:59:06 2018

crosvm: remove stderr from preserved FDs

This makes process cleanup difficult because minijail calls setsid(), and that removes the devices from the main process's process group.

BUG= chromium:799523
TEST=stop crosvm and ensure there are no zombies hanging around

Change-Id: I14c54cf250bdc7339970c886cdab9ff2f4b8a135
Reviewed-on: https://chromium-review.googlesource.com/852987
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/df48453432ec28ede00d22557975571ec74f8918/src/device_manager.rs
,
Jan 9 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform/crosvm/+/3b1d8a577313891da7d904ce66b6cb03453cccac

commit 3b1d8a577313891da7d904ce66b6cb03453cccac
Author: Stephen Barber <smbarber@chromium.org>
Date: Tue Jan 09 03:56:44 2018

crosvm: use tsync for seccomp jails

TSYNC isn't particularly useful for the device jails since they start with just a single thread. But a useful side effect of having minijail use TSYNC is that instead of the default SECCOMP_RET_KILL_THREAD behavior, minijail switches to SECCOMP_RET_TRAP and uses the default signal disposition which dumps core.

Until SECCOMP_RET_KILL_PROCESS is available on all kernel versions with crosvm, using TSYNC this way allows killing the entire device process instead of just one thread. This ensures if seccomp kills a worker thread in a device, the entire device process will die, and the crosvm main process will exit.

BUG= chromium:799523
TEST=add banned syscall to net device worker thread and ensure crosvm exits

Change-Id: Ie9ebfc90c79dcf49283cb2628dc8d4c848e8385b
Reviewed-on: https://chromium-review.googlesource.com/853302
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>

[modify] https://crrev.com/3b1d8a577313891da7d904ce66b6cb03453cccac/src/linux.rs
,
Jan 9 2018
,
May 9 2018
Comment 1 by za...@chromium.org
, Jan 5 2018