crosvm fails to start after "devices: refactor proxy to use msg_socket" |
||||||||||
Issue description
It looks like something is going wrong when running crosvm on a Chromebook after the proxy.rs refactoring:
2018-11-01T09:39:51.337454-07:00 NOTICE dbus[629]: [system] Activating service name='org.chromium.ImageLoader' (using servicehelper)
2018-11-01T09:39:51.374906-07:00 NOTICE dbus[629]: [system] Successfully activated service 'org.chromium.ImageLoader'
2018-11-01T09:39:51.388979-07:00 INFO vm_concierge[24256]: Received CreateDiskImage request
2018-11-01T09:39:51.390231-07:00 INFO vm_concierge[24256]: Received StartVm request
2018-11-01T09:39:51.391011-07:00 WARNING kernel: [597800.086798] audit_printk_skb: 6 callbacks suppressed
2018-11-01T09:39:51.391017-07:00 NOTICE kernel: [597800.086810] audit: type=1400 audit(1541090391.389:1955): avc: granted { read } for pid=24256 comm="vm_concierge" name="cros-termina" dev="tmpfs" ino=20031 scontext=u:r:minijail:s0 tcontext=u:object_r:tmpfs:s0 tclass=dir
2018-11-01T09:39:51.391020-07:00 NOTICE kernel: [597800.086839] audit: type=1400 audit(1541090391.389:1956): avc: granted { read open } for pid=24256 comm="vm_concierge" path="/run/imageloader/cros-termina" dev="tmpfs" ino=20031 scontext=u:r:minijail:s0 tcontext=u:object_r:tmpfs:s0 tclass=dir
2018-11-01T09:39:51.391022-07:00 NOTICE kernel: [597800.086868] audit: type=1400 audit(1541090391.389:1957): avc: granted { getattr } for pid=24256 comm="vm_concierge" path="/run/imageloader/cros-termina" dev="tmpfs" ino=20031 scontext=u:r:minijail:s0 tcontext=u:object_r:tmpfs:s0 tclass=dir
2018-11-01T09:39:51.391025-07:00 NOTICE kernel: [597800.086925] audit: type=1400 audit(1541090391.389:1958): avc: granted { getattr } for pid=24256 comm="vm_concierge" path="/run/imageloader/cros-termina" dev="tmpfs" ino=20031 scontext=u:r:minijail:s0 tcontext=u:object_r:tmpfs:s0 tclass=dir
2018-11-01T09:39:51.391026-07:00 NOTICE kernel: [597800.086964] audit: type=1400 audit(1541090391.389:1959): avc: granted { getattr } for pid=24256 comm="vm_concierge" path="/run/imageloader" dev="tmpfs" ino=12358 scontext=u:r:minijail:s0 tcontext=u:object_r:tmpfs:s0 tclass=dir
2018-11-01T09:39:51.392263-07:00 INFO seneschal[3954]: Received request to start new 9p server
2018-11-01T09:39:51.393187-07:00 WARNING seneschal[3954]: libminijail[1]: allowing syscall: socket
2018-11-01T09:39:51.393213-07:00 WARNING seneschal[3954]: libminijail[1]: allowing syscall: connect
2018-11-01T09:39:51.393227-07:00 WARNING seneschal[3954]: libminijail[1]: allowing syscall: sendto
2018-11-01T09:39:51.393240-07:00 WARNING seneschal[3954]: libminijail[1]: allowing syscall: writev
2018-11-01T09:39:51.394247-07:00 INFO seneschal[3954]: Started server on /run/seneschal/scoped_dir_XlgV31
2018-11-01T09:39:51.396849-07:00 WARNING 9s[24281]: libminijail[24]: logging seccomp filter failures
2018-11-01T09:39:51.405122-07:00 INFO localhos[24284]: crosvm[4]: [src/linux.rs:770] crosvm entering multiprocess mode
2018-11-01T09:39:51.419942-07:00 WARNING localhos[24284]: crosvm[4]: [devices/src/virtio/block.rs:644] Disk size 88994550988 is not a multiple of sector size 512; the remainder will not be visible to the guest.
2018-11-01T09:39:51.420868-07:00 INFO crosvm[24284]: libminijail[4]: mount 'none' -> '/' type 'tmpfs' flags 0xe
2018-11-01T09:39:51.420884-07:00 INFO crosvm[24284]: libminijail[4]: mount '/run/chrome' -> '/wayland' type '' flags 0x1000
2018-11-01T09:39:51.440728-07:00 INFO crash_reporter[24299]: libminijail[24299]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001
2018-11-01T09:39:51.442478-07:00 INFO crash_reporter[24301]: libminijail[24301]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001
2018-11-01T09:39:51.444785-07:00 WARNING localhos[24284]: crosvm[4]: [src/linux.rs:854] failed to add stdin to poll context: Error(1)
2018-11-01T09:39:51.444993-07:00 WARNING kernel: [597800.140953] Pid 1(crosvm) over core_pipe_limit
2018-11-01T09:39:51.444998-07:00 WARNING kernel: [597800.140959] Skipping core dump
2018-11-01T09:39:51.447996-07:00 WARNING kernel: [597800.143731] Pid 1(crosvm) over core_pipe_limit
2018-11-01T09:39:51.448016-07:00 WARNING kernel: [597800.143738] Skipping core dump
2018-11-01T09:39:51.451522-07:00 INFO crash_reporter[24303]: libminijail[24303]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001
2018-11-01T09:39:51.452874-07:00 INFO crash_reporter[24305]: libminijail[24305]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001
2018-11-01T09:39:51.454244-07:00 ERR localhos[24284]: crosvm[4]: [src/linux.rs:954] child 9 died: signo 17, status 31, code 2
2018-11-01T09:39:51.464660-07:00 WARNING crash_reporter[24299]: [user] Received crash notification for crosvm[24287] sig 31, user 299 group 299 (developer build - not testing - always dumping)
2018-11-01T09:39:51.464660-07:00 WARNING crash_reporter[24301]: [user] Received crash notification for crosvm[24288] sig 31, user 299 group 299 (developer build - not testing - always dumping)
2018-11-01T09:39:51.464988-07:00 WARNING crash_reporter[24303]: [user] Received crash notification for crosvm[24296] sig 31, user 299 group 299 (developer build - not testing - always dumping)
2018-11-01T09:39:51.465529-07:00 INFO crash_reporter[24301]: State of crashed process [24288]: S (sleeping)
2018-11-01T09:39:51.465539-07:00 INFO crash_reporter[24299]: State of crashed process [24287]: S (sleeping)
2018-11-01T09:39:51.465756-07:00 INFO crash_reporter[24299]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5'
2018-11-01T09:39:51.465791-07:00 INFO crash_reporter[24301]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5'
2018-11-01T09:39:51.466482-07:00 INFO crash_reporter[24303]: State of crashed process [24296]: S (sleeping)
2018-11-01T09:39:51.466753-07:00 INFO crash_reporter[24303]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5'
2018-11-01T09:39:51.466907-07:00 WARNING crash_reporter[24303]: Crash directory /var/spool/crash already full with 32 pending reports
2018-11-01T09:39:51.466925-07:00 ERR crash_reporter[24303]: Could not create crash directory
2018-11-01T09:39:51.466933-07:00 ERR crash_reporter[24303]: Unable to find/create process-specific crash path
2018-11-01T09:39:51.467769-07:00 INFO metrics_daemon[2880]: [INFO:metrics_daemon.cc(458)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2018-11-01T09:39:51.470672-07:00 WARNING crash_reporter[24305]: [user] Received crash notification for crosvm[24291] sig 31, user 299 group 299 (developer build - not testing - always dumping)
2018-11-01T09:39:51.470697-07:00 INFO metrics_daemon[2880]: [INFO:metrics_daemon.cc(458)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2018-11-01T09:39:51.471513-07:00 INFO metrics_daemon[2880]: [INFO:metrics_daemon.cc(458)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2018-11-01T09:39:51.471976-07:00 INFO crash_reporter[24301]: Stored minidump to /var/spool/crash/crosvm.20181101.093951.24288.dmp
2018-11-01T09:39:51.472267-07:00 INFO crash_reporter[24301]: Leaving core file at /proc/self/fd/5/crosvm.20181101.093951.24288.core due to developer image
2018-11-01T09:39:51.472391-07:00 INFO crash_reporter[24299]: Stored minidump to /var/spool/crash/crosvm.20181101.093951.24287.dmp
2018-11-01T09:39:51.472645-07:00 INFO crash_reporter[24299]: Leaving core file at /proc/self/fd/5/crosvm.20181101.093951.24287.core due to developer image
2018-11-01T09:39:51.473507-07:00 INFO crash_reporter[24305]: State of crashed process [24291]: S (sleeping)
2018-11-01T09:39:51.473758-07:00 INFO crash_reporter[24305]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5'
2018-11-01T09:39:51.473931-07:00 WARNING crash_reporter[24305]: Crash directory /var/spool/crash already full with 32 pending reports
2018-11-01T09:39:51.474001-07:00 ERR crash_reporter[24305]: Could not create crash directory
2018-11-01T09:39:51.474013-07:00 ERR crash_reporter[24305]: Unable to find/create process-specific crash path
2018-11-01T09:39:51.474985-07:00 INFO metrics_daemon[2880]: [INFO:metrics_daemon.cc(458)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2018-11-01T09:39:53.485139-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:177] failed read from child device process: Recv(Error(11))
2018-11-01T09:39:53.485251-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:173] failed write to child device process: Send(Error(111))
2018-11-01T09:39:55.485217-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:177] failed read from child device process: Recv(Error(11))
2018-11-01T09:39:55.485292-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:173] failed write to child device process: Send(Error(111))
2018-11-01T09:39:57.485047-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:177] failed read from child device process: Recv(Error(11))
2018-11-01T09:39:57.485118-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:173] failed write to child device process: Send(Error(111))
2018-11-01T09:39:59.485099-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:177] failed read from child device process: Recv(Error(11))
2018-11-01T09:39:59.485167-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:173] failed write to child device process: Send(Error(111))
2018-11-01T09:40:01.485094-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:177] failed read from child device process: Recv(Error(11))
2018-11-01T09:40:01.486001-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:173] failed write to child device process: Send(Error(111))
2018-11-01T09:40:03.486094-07:00 ERR localhos[24284]: crosvm[4]: [devices/src/proxy.rs:177] failed read from child device process: Recv(Error(11))
2018-11-01T09:40:03.486154-07:00 INFO localhos[24284]: crosvm[4]: [src/main.rs:520] crosvm has exited normally
2018-11-01T09:40:21.403543-07:00 ERR vm_concierge[24256]: VM failed to start in 30 seconds
2018-11-01T09:40:21.404440-07:00 WARNING vm_concierge[24256]: Shutdown RPC failed for VM 17 with error code 14: Connect Failed
2018-11-01T09:40:21.408767-07:00 ERR localhos[24351]: crosvm[22]: [src/main.rs:560] failed to connect to socket at '/run/vm/vm.5kbMC5/crosvm.sock': No such file or directory (os error 2)
2018-11-01T09:40:21.410395-07:00 ERR vm_concierge[24256]: Unable to send signal to 5: No such process
2018-11-01T09:40:21.411262-07:00 INFO seneschal[3954]: Received request to stop server
2018-11-01T09:40:21.411952-07:00 INFO seneschal[3954]: Process 24 killed by signal 15
,
Nov 1
Yes. A revert is here: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/1313014 I will actually try to avoid unnecessary sendmsg/recvmsg. Thanks.
,
Nov 2
,
Nov 2
Issue 901322 has been merged into this issue.
,
Nov 2
Can we have this fix chumped? I don't think we want to wait for the CQ given that crosvm is completely busted on ToT.
,
Nov 2
I will chump the revert, which is lower risk.
,
Nov 2
done. TOT should be good now. Sorry for the trouble.
,
Nov 2
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/crosvm/+/994e418039e514acde8d31a593d18f0595fc1442 commit 994e418039e514acde8d31a593d18f0595fc1442 Author: Jingkui Wang <jkwang@google.com> Date: Fri Nov 02 21:50:47 2018 msg_socket: avoid send_msg/recv_msg call if it's not necessary This patch avoids sendmsg/recvmsg if there is no fd. BUG= chromium:900962 TEST=build local image and test Change-Id: I3a5fd52232dc7d98dacd78aa0b383a436056ffb7 Reviewed-on: https://chromium-review.googlesource.com/1313656 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Tested-by: Jingkui Wang <jkwang@google.com> Reviewed-by: Zach Reizner <zachr@chromium.org> [modify] https://crrev.com/994e418039e514acde8d31a593d18f0595fc1442/msg_socket/src/lib.rs
,
Nov 19
The fix and re-land of the original commit have landed as of https://chromium.googlesource.com/chromiumos/platform/crosvm/+/b23c091c8bfd37cab8be15c0912d684187969c63 so this is resolved now.
,
Nov 20
Maybe spoke too soon - on ARM devices, crosvm fails to start after the re-landed "refactor proxy" commit. This is also showing up in tast.vm tests, such as this one on bob: https://stainless.corp.google.com/browse/chromeos-autotest-results/259088604-chromeos-test/ It looks like it's throwing a SIGSYS due to a sigaction causing a rt_sigaction syscall somewhere. 2018-11-19T17:04:32.779883-08:00 WARNING crash_reporter[16135]: [user] Received crash notification for crosvm[16117] sig 31, user 299 group 299 (developer build - not testing - always dumping) I also see this in the syslog right before/during the crash, unsure if it's related: 2018-11-19T17:04:32.739280-08:00 ALERT kernel: [348511.303169] Unhandled fault: alignment fault (0x92000021) at 0x000000000e0a38b1 gdb backtrace (I think syscall number is in r7 at this point - 174 is SYS_rt_sigaction on arm): Core was generated by `/usr/bin/crosvm run --cpus 6 --mem 2859 --root /run/imageloader/cros-termina/11'. Program terminated with signal SIGSYS, Bad system call. #0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47 47 ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or directory. [Current thread is 1 (LWP 1)] (gdb) bt #0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47 #1 0xf04c6232 in __GI___libc_sigaction (sig=7, act=0xf05d4be0, oact=0x0) at ../sysdeps/unix/sysv/linux/arm/sigaction.c:67 #2 0x0daf85ec in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) info registers r0 0x7 7 r1 0xf05d4aac 4032645804 r2 0x0 0 r3 0x8 8 r4 0xf05d4aac 4032645804 r5 0xf05d5ac0 4032649920 r6 0x0 0 r7 0xae 174 r8 0x7 7 r9 0x7 7 r10 0x8 8 r11 0x51000 331776 r12 0xae 174 sp 0xf05d4aa0 0xf05d4aa0 lr 0xf04c6233 -263429581 pc 0xf04b72d6 0xf04b72d6 <__libc_do_syscall+6> cpsr 0x10030 65584 (gdb)
,
Nov 20
Better backtrace with symbols: (gdb) bt #0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47 #1 0xf5a67232 in __GI___libc_sigaction (sig=7, act=0xf5b75be0, oact=0x0) at ../sysdeps/unix/sysv/linux/arm/sigaction.c:67 #2 0x04e605ec in std::sys::unix::stack_overflow::imp::signal_handler () at libstd/sys/unix/stack_overflow.rs:114 #3 <signal handler called> #4 0x04e32268 in devices::proxy::child_proc () #5 0x04e81170 in <aarch64::AArch64 as arch::LinuxArch>::build_vm () #6 0x04e87e0c in crosvm::crosvm_main () #7 0x04e83c34 in crosvm::main ()
,
Nov 20
Running it under gdb, I see a SIGBUS in child_proc - I think what is probably happening is DataInit's from_slice is reinterpreting a slice of u8 into a reference to a larger type with higher alignment requirements:
Some(unsafe { &*(data.as_ptr() as *const Self) })
Call stack would be something like this (guessing since the release build doesn't have symbols for this, probably getting inlined):
data_model/src/lib.rs: DataInit::from_slice()
le_impl! $le_type::from_slice()
msg_on_socket.rs:read_from_buffer()
msg_socket/src/lib.rs:MsgReceiver::recv()
proxy.rs:child_proc()
,
Nov 20
,
Nov 21
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/crosvm/+/7a9c6def98c92d12c825e0d427c54e178e6ee860 commit 7a9c6def98c92d12c825e0d427c54e178e6ee860 Author: Jingkui Wang <jkwang@google.com> Date: Wed Nov 21 13:53:03 2018 msg_socket: fix msg_on_socket alignment issue Instead of using unaligned memory. Allocate aligned memory and copy into it, we were already doing an clone. There should be no overhead for this new approach. BUG= chromium:900962 TEST=build and run Change-Id: I011d4c93a872d7d285e8898ff332f3ee1ef104a9 Reviewed-on: https://chromium-review.googlesource.com/1344225 Commit-Ready: Jingkui Wang <jkwang@google.com> Tested-by: Jingkui Wang <jkwang@google.com> Reviewed-by: Daniel Verkamp <dverkamp@chromium.org> [modify] https://crrev.com/7a9c6def98c92d12c825e0d427c54e178e6ee860/msg_socket/src/msg_on_socket.rs
,
Nov 21
,
Nov 21
,
Nov 22
This issue is marked as a release blocker with no milestone associated. Please add an appropriate milestone. All release blocking issues should have milestones associated to it, so that the issue can tracked and the fixes can be pushed promptly. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Nov 26
,
Nov 27
Tested with 11297 on kevin. It was working.
,
Nov 28
Double checked on 11302.0.0.
,
Dec 4
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/crosvm/+/da37f7a586d71b6b092aa4ba58e040a871335dc7 commit da37f7a586d71b6b092aa4ba58e040a871335dc7 Author: Zach Reizner <zachr@google.com> Date: Tue Dec 04 04:32:05 2018 data_model: prevent unaligned DataInit::from_slice Because the alignment of the data passed into from_slice is not checked, it is very easy to pass in unaligned data that will get dereferenced at a later point in the code. On ARM, this will lead to a SIGBUS. This change adds an alignment check to prevent getting a signal. Instead, the caller will get `None`. BUG= chromium:900962 TEST=cargo test -p data_model Change-Id: I7a0f835f7d0ffd8c3d44bbcd80a790027f652bc9 Reviewed-on: https://chromium-review.googlesource.com/1343989 Commit-Ready: Zach Reizner <zachr@chromium.org> Tested-by: Daniel Verkamp <dverkamp@chromium.org> Reviewed-by: Daniel Verkamp <dverkamp@chromium.org> [modify] https://crrev.com/da37f7a586d71b6b092aa4ba58e040a871335dc7/data_model/src/lib.rs |
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by za...@chromium.org
, Nov 1