New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 900962 link

Starred by 6 users

Issue metadata

Status: Verified
Owner:
Closed: Nov 27
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug



Sign in to add a comment

crosvm fails to start after "devices: refactor proxy to use msg_socket"

Project Member Reported by dverkamp@chromium.org, Nov 1

Issue description

It looks like something is going wrong when running crosvm on a Chromebook after the proxy.rs refactoring:

2018-11-01T09:39:51.337454-07:00 NOTICE dbus[629]: [system] Activating service name='org.chromium.ImageLoader' (using servicehelper)
2018-11-01T09:39:51.374906-07:00 NOTICE dbus[629]: [system] Successfully activated service 'org.chromium.ImageLoader'
2018-11-01T09:39:51.388979-07:00 INFO vm_concierge[24256]: Received CreateDiskImage request
2018-11-01T09:39:51.390231-07:00 INFO vm_concierge[24256]: Received StartVm request
2018-11-01T09:39:51.391011-07:00 WARNING kernel: [597800.086798] audit_printk_skb: 6 callbacks suppressed
2018-11-01T09:39:51.391017-07:00 NOTICE kernel: [597800.086810] audit: type=1400 audit(1541090391.389:1955): avc:  granted  { read } for  pid=24256 comm="vm_concierge" name="cros-termina" dev="tmpfs" ino=20031 scontext=u:r:minijail:s0 tcontext=u:object_r:tmpfs:s0 tclass=dir
2018-11-01T09:39:51.391020-07:00 NOTICE kernel: [597800.086839] audit: type=1400 audit(1541090391.389:1956): avc:  granted  { read open } for  pid=24256 comm="vm_concierge" path="/run/imageloader/cros-termina" dev="tmpfs" ino=20031 scontext=u:r:minijail:s0 tcontext=u:object_r:tmpfs:s0 tclass=dir
2018-11-01T09:39:51.391022-07:00 NOTICE kernel: [597800.086868] audit: type=1400 audit(1541090391.389:1957): avc:  granted  { getattr } for  pid=24256 comm="vm_concierge" path="/run/imageloader/cros-termina" dev="tmpfs" ino=20031 scontext=u:r:minijail:s0 tcontext=u:object_r:tmpfs:s0 tclass=dir
2018-11-01T09:39:51.391025-07:00 NOTICE kernel: [597800.086925] audit: type=1400 audit(1541090391.389:1958): avc:  granted  { getattr } for  pid=24256 comm="vm_concierge" path="/run/imageloader/cros-termina" dev="tmpfs" ino=20031 scontext=u:r:minijail:s0 tcontext=u:object_r:tmpfs:s0 tclass=dir
2018-11-01T09:39:51.391026-07:00 NOTICE kernel: [597800.086964] audit: type=1400 audit(1541090391.389:1959): avc:  granted  { getattr } for  pid=24256 comm="vm_concierge" path="/run/imageloader" dev="tmpfs" ino=12358 scontext=u:r:minijail:s0 tcontext=u:object_r:tmpfs:s0 tclass=dir
2018-11-01T09:39:51.392263-07:00 INFO seneschal[3954]: Received request to start new 9p server
2018-11-01T09:39:51.393187-07:00 WARNING seneschal[3954]: libminijail[1]: allowing syscall: socket
2018-11-01T09:39:51.393213-07:00 WARNING seneschal[3954]: libminijail[1]: allowing syscall: connect
2018-11-01T09:39:51.393227-07:00 WARNING seneschal[3954]: libminijail[1]: allowing syscall: sendto
2018-11-01T09:39:51.393240-07:00 WARNING seneschal[3954]: libminijail[1]: allowing syscall: writev
2018-11-01T09:39:51.394247-07:00 INFO seneschal[3954]: Started server on /run/seneschal/scoped_dir_XlgV31
2018-11-01T09:39:51.396849-07:00 WARNING 9s[24281]: libminijail[24]: logging seccomp filter failures
2018-11-01T09:39:51.405122-07:00 INFO localhos[24284]: crosvm[4]: [src/linux.rs:770] crosvm entering multiprocess mode
2018-11-01T09:39:51.419942-07:00 WARNING localhos[24284]: crosvm[4]: [devices/src/virtio/block.rs:644] Disk size 88994550988 is not a multiple of sector size 512; the remainder will not be visible to the guest.
2018-11-01T09:39:51.420868-07:00 INFO crosvm[24284]: libminijail[4]: mount 'none' -> '/' type 'tmpfs' flags 0xe
2018-11-01T09:39:51.420884-07:00 INFO crosvm[24284]: libminijail[4]: mount '/run/chrome' -> '/wayland' type '' flags 0x1000
2018-11-01T09:39:51.440728-07:00 INFO crash_reporter[24299]: libminijail[24299]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001
2018-11-01T09:39:51.442478-07:00 INFO crash_reporter[24301]: libminijail[24301]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001
2018-11-01T09:39:51.444785-07:00 WARNING localhos[24284]: crosvm[4]: [src/linux.rs:854] failed to add stdin to poll context: Error(1)
2018-11-01T09:39:51.444993-07:00 WARNING kernel: [597800.140953] Pid 1(crosvm) over core_pipe_limit
2018-11-01T09:39:51.444998-07:00 WARNING kernel: [597800.140959] Skipping core dump
2018-11-01T09:39:51.447996-07:00 WARNING kernel: [597800.143731] Pid 1(crosvm) over core_pipe_limit
2018-11-01T09:39:51.448016-07:00 WARNING kernel: [597800.143738] Skipping core dump
2018-11-01T09:39:51.451522-07:00 INFO crash_reporter[24303]: libminijail[24303]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001
2018-11-01T09:39:51.452874-07:00 INFO crash_reporter[24305]: libminijail[24305]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001
2018-11-01T09:39:51.454244-07:00 ERR localhos[24284]: crosvm[4]: [src/linux.rs:954] child 9 died: signo 17, status 31, code 2
2018-11-01T09:39:51.464660-07:00 WARNING crash_reporter[24299]: [user] Received crash notification for crosvm[24287] sig 31, user 299 group 299 (developer build - not testing - always dumping)
2018-11-01T09:39:51.464660-07:00 WARNING crash_reporter[24301]: [user] Received crash notification for crosvm[24288] sig 31, user 299 group 299 (developer build - not testing - always dumping)
2018-11-01T09:39:51.464988-07:00 WARNING crash_reporter[24303]: [user] Received crash notification for crosvm[24296] sig 31, user 299 group 299 (developer build - not testing - always dumping)
2018-11-01T09:39:51.465529-07:00 INFO crash_reporter[24301]: State of crashed process [24288]: S (sleeping)
2018-11-01T09:39:51.465539-07:00 INFO crash_reporter[24299]: State of crashed process [24287]: S (sleeping)
2018-11-01T09:39:51.465756-07:00 INFO crash_reporter[24299]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5'
2018-11-01T09:39:51.465791-07:00 INFO crash_reporter[24301]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5'
2018-11-01T09:39:51.466482-07:00 INFO crash_reporter[24303]: State of crashed process [24296]: S (sleeping)
2018-11-01T09:39:51.466753-07:00 INFO crash_reporter[24303]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5'
2018-11-01T09:39:51.466907-07:00 WARNING crash_reporter[24303]: Crash directory /var/spool/crash already full with 32 pending reports
2018-11-01T09:39:51.466925-07:00 ERR crash_reporter[24303]: Could not create crash directory
2018-11-01T09:39:51.466933-07:00 ERR crash_reporter[24303]: Unable to find/create process-specific crash path
2018-11-01T09:39:51.467769-07:00 INFO metrics_daemon[2880]: [INFO:metrics_daemon.cc(458)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2018-11-01T09:39:51.470672-07:00 WARNING crash_reporter[24305]: [user] Received crash notification for crosvm[24291] sig 31, user 299 group 299 (developer build - not testing - always dumping)
2018-11-01T09:39:51.470697-07:00 INFO metrics_daemon[2880]: [INFO:metrics_daemon.cc(458)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2018-11-01T09:39:51.471513-07:00 INFO metrics_daemon[2880]: [INFO:metrics_daemon.cc(458)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2018-11-01T09:39:51.471976-07:00 INFO crash_reporter[24301]: Stored minidump to /var/spool/crash/crosvm.20181101.093951.24288.dmp
2018-11-01T09:39:51.472267-07:00 INFO crash_reporter[24301]: Leaving core file at /proc/self/fd/5/crosvm.20181101.093951.24288.core due to developer image
2018-11-01T09:39:51.472391-07:00 INFO crash_reporter[24299]: Stored minidump to /var/spool/crash/crosvm.20181101.093951.24287.dmp
2018-11-01T09:39:51.472645-07:00 INFO crash_reporter[24299]: Leaving core file at /proc/self/fd/5/crosvm.20181101.093951.24287.core due to developer image
2018-11-01T09:39:51.473507-07:00 INFO crash_reporter[24305]: State of crashed process [24291]: S (sleeping)
2018-11-01T09:39:51.473758-07:00 INFO crash_reporter[24305]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5'
2018-11-01T09:39:51.473931-07:00 WARNING crash_reporter[24305]: Crash directory /var/spool/crash already full with 32 pending reports
2018-11-01T09:39:51.474001-07:00 ERR crash_reporter[24305]: Could not create crash directory
2018-11-01T09:39:51.474013-07:00 ERR crash_reporter[24305]: Unable to find/create process-specific crash path
2018-11-01T09:39:51.474985-07:00 INFO metrics_daemon[2880]: [INFO:metrics_daemon.cc(458)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2018-11-01T09:39:53.485139-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:177] failed read from child device process: Recv(Error(11))
2018-11-01T09:39:53.485251-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:173] failed write to child device process: Send(Error(111))
2018-11-01T09:39:55.485217-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:177] failed read from child device process: Recv(Error(11))
2018-11-01T09:39:55.485292-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:173] failed write to child device process: Send(Error(111))
2018-11-01T09:39:57.485047-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:177] failed read from child device process: Recv(Error(11))
2018-11-01T09:39:57.485118-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:173] failed write to child device process: Send(Error(111))
2018-11-01T09:39:59.485099-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:177] failed read from child device process: Recv(Error(11))
2018-11-01T09:39:59.485167-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:173] failed write to child device process: Send(Error(111))
2018-11-01T09:40:01.485094-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:177] failed read from child device process: Recv(Error(11))
2018-11-01T09:40:01.486001-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:173] failed write to child device process: Send(Error(111))
2018-11-01T09:40:03.486094-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:177] failed read from child device process: Recv(Error(11))
2018-11-01T09:40:03.486154-07:00 INFO localhos[24284]: crosvm[4]: [src/main.rs:520] crosvm has exited normally
2018-11-01T09:40:21.403543-07:00 ERR vm_concierge[24256]: VM failed to start in 30 seconds
2018-11-01T09:40:21.404440-07:00 WARNING vm_concierge[24256]: Shutdown RPC failed for VM 17 with error code 14: Connect Failed
2018-11-01T09:40:21.408767-07:00 ERR localhos[24351]: crosvm[22]: [src/main.rs:560] failed to connect to socket at '/run/vm/vm.5kbMC5/crosvm.sock': No such file or directory (os error 2)
2018-11-01T09:40:21.410395-07:00 ERR vm_concierge[24256]: Unable to send signal to 5: No such process
2018-11-01T09:40:21.411262-07:00 INFO seneschal[3954]: Received request to stop server
2018-11-01T09:40:21.411952-07:00 INFO seneschal[3954]: Process 24 killed by signal 15

 
Labels: -Pri-3 Pri-0
Could it be that sendmsg/recvmsg needs to be added to the policy files? It looks like msg_on_socket uses ScmSocket::send_with_fds which uses those syscalls. The only current devices that have sendmsg/recvmsg are wayland and gpu.
Yes. A revert is here:
https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1313014

I will actually try to avoid unnecessary sendmsg/recvmsg.
Thanks.
Status: Started (was: Untriaged)
Cc: cywang@chromium.org jkwang@chromium.org jkardatzke@chromium.org derat@chromium.org
 Issue 901322  has been merged into this issue.
Can we have this fix chumped? I don't think we want to wait for the CQ given that crosvm is completely busted on ToT.
I will chump the revert, which is lower risk. 
done. TOT should be good now. Sorry for the trouble.
Project Member

Comment 8 by bugdroid1@chromium.org, Nov 2

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/994e418039e514acde8d31a593d18f0595fc1442

commit 994e418039e514acde8d31a593d18f0595fc1442
Author: Jingkui Wang <jkwang@google.com>
Date: Fri Nov 02 21:50:47 2018

msg_socket: avoid send_msg/recv_msg call if it's not necessary

This patch avoids sendmsg/recvmsg if there is no fd.

BUG= chromium:900962 
TEST=build local image and test

Change-Id: I3a5fd52232dc7d98dacd78aa0b383a436056ffb7
Reviewed-on: https://chromium-review.googlesource.com/1313656
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Jingkui Wang <jkwang@google.com>
Reviewed-by: Zach Reizner <zachr@chromium.org>

[modify] https://crrev.com/994e418039e514acde8d31a593d18f0595fc1442/msg_socket/src/lib.rs

Status: Verified (was: Started)
The fix and re-land of the original commit have landed as of https://chromium.googlesource.com/chromiumos/platform/crosvm/+/b23c091c8bfd37cab8be15c0912d684187969c63 so this is resolved now.
Status: Started (was: Verified)
Maybe spoke too soon - on ARM devices, crosvm fails to start after the re-landed "refactor proxy" commit.

This is also showing up in tast.vm tests, such as this one on bob: https://stainless.corp.google.com/browse/chromeos-autotest-results/259088604-chromeos-test/

It looks like it's throwing a SIGSYS due to a sigaction causing a rt_sigaction syscall somewhere.

2018-11-19T17:04:32.779883-08:00 WARNING crash_reporter[16135]: [user] Received crash notification for crosvm[16117] sig 31, user 299 group 299 (developer build - not testing - always dumping)

I also see this in the syslog right before/during the crash, unsure if it's related:

2018-11-19T17:04:32.739280-08:00 ALERT kernel: [348511.303169] Unhandled fault: alignment fault (0x92000021) at 0x000000000e0a38b1


gdb backtrace (I think syscall number is in r7 at this point - 174 is SYS_rt_sigaction on arm):

Core was generated by `/usr/bin/crosvm run --cpus 6 --mem 2859 --root /run/imageloader/cros-termina/11'.
Program terminated with signal SIGSYS, Bad system call.
#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47
47	../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or directory.
[Current thread is 1 (LWP 1)]
(gdb) bt
#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47
#1  0xf04c6232 in __GI___libc_sigaction (sig=7, act=0xf05d4be0, oact=0x0) at ../sysdeps/unix/sysv/linux/arm/sigaction.c:67
#2  0x0daf85ec in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) info registers
r0             0x7	7
r1             0xf05d4aac	4032645804
r2             0x0	0
r3             0x8	8
r4             0xf05d4aac	4032645804
r5             0xf05d5ac0	4032649920
r6             0x0	0
r7             0xae	174
r8             0x7	7
r9             0x7	7
r10            0x8	8
r11            0x51000	331776
r12            0xae	174
sp             0xf05d4aa0	0xf05d4aa0
lr             0xf04c6233	-263429581
pc             0xf04b72d6	0xf04b72d6 <__libc_do_syscall+6>
cpsr           0x10030	65584
(gdb) 

Better backtrace with symbols:

(gdb) bt
#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47
#1  0xf5a67232 in __GI___libc_sigaction (sig=7, act=0xf5b75be0, oact=0x0) at ../sysdeps/unix/sysv/linux/arm/sigaction.c:67
#2  0x04e605ec in std::sys::unix::stack_overflow::imp::signal_handler () at libstd/sys/unix/stack_overflow.rs:114
#3  <signal handler called>
#4  0x04e32268 in devices::proxy::child_proc ()
#5  0x04e81170 in <aarch64::AArch64 as arch::LinuxArch>::build_vm ()
#6  0x04e87e0c in crosvm::crosvm_main ()
#7  0x04e83c34 in crosvm::main ()

Running it under gdb, I see a SIGBUS in child_proc - I think what is probably happening is DataInit's from_slice is reinterpreting a slice of u8 into a reference to a larger type with higher alignment requirements:

  Some(unsafe { &*(data.as_ptr() as *const Self) })

Call stack would be something like this (guessing since the release build doesn't have symbols for this, probably getting inlined):

data_model/src/lib.rs: DataInit::from_slice()
le_impl! $le_type::from_slice()
msg_on_socket.rs:read_from_buffer()
msg_socket/src/lib.rs:MsgReceiver::recv()
proxy.rs:child_proc()

Cc: dtolnay@chromium.org
Project Member

Comment 14 by bugdroid1@chromium.org, Nov 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/7a9c6def98c92d12c825e0d427c54e178e6ee860

commit 7a9c6def98c92d12c825e0d427c54e178e6ee860
Author: Jingkui Wang <jkwang@google.com>
Date: Wed Nov 21 13:53:03 2018

msg_socket: fix msg_on_socket alignment issue

Instead of using unaligned memory. Allocate aligned memory and copy into it, we
were already doing an clone. There should be no overhead for this new
approach.

BUG= chromium:900962 
TEST=build and run

Change-Id: I011d4c93a872d7d285e8898ff332f3ee1ef104a9
Reviewed-on: https://chromium-review.googlesource.com/1344225
Commit-Ready: Jingkui Wang <jkwang@google.com>
Tested-by: Jingkui Wang <jkwang@google.com>
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>

[modify] https://crrev.com/7a9c6def98c92d12c825e0d427c54e178e6ee860/msg_socket/src/msg_on_socket.rs

Cc: rohi...@chromium.org tbuck...@chromium.org
 Issue 907605  has been merged into this issue.
Labels: ReleaseBlock-Dev
Adding dev blocker for GRU devices as per  Issue 907605 .
Project Member

Comment 17 by sheriffbot@chromium.org, Nov 22

Cc: avkodipelli@chromium.org
This issue is marked as a release blocker with no milestone associated. Please add an appropriate milestone.

All release blocking issues should have milestones associated to it, so that the issue can tracked and the fixes can be pushed promptly.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: M-72
Status: Fixed (was: Started)
Tested with 11297 on kevin. It was working.
Status: Verified (was: Fixed)
Double checked on 11302.0.0. 
Project Member

Comment 21 by bugdroid1@chromium.org, Dec 4

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosvm/+/da37f7a586d71b6b092aa4ba58e040a871335dc7

commit da37f7a586d71b6b092aa4ba58e040a871335dc7
Author: Zach Reizner <zachr@google.com>
Date: Tue Dec 04 04:32:05 2018

data_model: prevent unaligned DataInit::from_slice

Because the alignment of the data passed into from_slice is not checked,
it is very easy to pass in unaligned data that will get dereferenced at
a later point in the code. On ARM, this will lead to a SIGBUS.

This change adds an alignment check to prevent getting a signal.
Instead, the caller will get `None`.

BUG= chromium:900962 
TEST=cargo test -p data_model

Change-Id: I7a0f835f7d0ffd8c3d44bbcd80a790027f652bc9
Reviewed-on: https://chromium-review.googlesource.com/1343989
Commit-Ready: Zach Reizner <zachr@chromium.org>
Tested-by: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>

[modify] https://crrev.com/da37f7a586d71b6b092aa4ba58e040a871335dc7/data_model/src/lib.rs

Sign in to add a comment