Tremplin in crash/restart loop
Issue description
This is the second time I've seen this now.
Test results here for 11237.0.0: https://stainless.corp.google.com/browse/chromeos-autotest-results/255678534-chromeos-test/
Prior failure was here in a different build: https://stainless.corp.google.com/browse/chromeos-autotest-results/252930304-chromeos-test/
This is what's in messages:
2018-11-07T00:57:24.640211-08:00 INFO VM(3)[32124]: lxd[135]: lvl=info msg="Done updating images" t=2018-11-07T08:57:23+0000#012
2018-11-07T00:57:24.640214-08:00 INFO VM(3)[32124]: lxd[135]: lvl=info msg="Creating BTRFS storage pool \"default\"" t=2018-11-07T08:57:23+0000#012
2018-11-07T00:57:27.017705-08:00 INFO VM(3)[32143]: [ 11.114026] maitred: <unknown process> (209) exited with status 0#015
2018-11-07T00:57:27.338158-08:00 INFO VM(3)[32143]: [ 11.434902] maitred: tremplin (161) exited with status 1#015
2018-11-07T00:57:27.340229-08:00 INFO VM(3)[32143]: [ 11.437694] maitred: Restarting tremplin#015
2018-11-07T00:57:27.347670-08:00 INFO VM(3)[32143]: [ 11.444259] maitred: tremplin restarted#015
2018-11-07T00:57:28.098532-08:00 INFO VM(3)[32143]: [ 12.194635] maitred: <unknown process> (210) killed by signal 9#015
2018-11-07T00:57:29.632991-08:00 INFO VM(3)[32124]: lxd[135]: lvl=info msg="Created BTRFS storage pool \"default\"" t=2018-11-07T08:57:24+0000#012
2018-11-07T00:57:29.633005-08:00 INFO VM(3)[32124]: dnsmasq[210]: started, version 2.78 cachesize 150
2018-11-07T00:57:29.633008-08:00 INFO VM(3)[32124]: dnsmasq[210]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua no-TFTP no-conntrack ipset no-auth no-DNSSEC loop-detect inotify
2018-11-07T00:57:29.633011-08:00 WARNING VM(3)[32124]: dnsmasq[210]: LOUD WARNING: listening on 100.115.92.193 may accept requests via interfaces other than lxdbr0
2018-11-07T00:57:29.633014-08:00 WARNING VM(3)[32124]: dnsmasq[210]: LOUD WARNING: use --bind-dynamic rather than --bind-interfaces to avoid DNS amplification attacks via these interface(s)
2018-11-07T00:57:29.633017-08:00 INFO VM(3)[32124]: dnsmasq-dhcp[210]: DHCP, IP range 100.115.92.194 -- 100.115.92.206, lease time 1h
2018-11-07T00:57:29.633020-08:00 INFO VM(3)[32124]: dnsmasq-dhcp[210]: DHCP, sockets bound exclusively to interface lxdbr0
2018-11-07T00:57:29.633023-08:00 INFO VM(3)[32124]: dnsmasq[210]: using local addresses only for domain lxd
2018-11-07T00:57:29.633026-08:00 INFO VM(3)[32124]: dnsmasq[210]: reading /run/resolv.conf
2018-11-07T00:57:29.633028-08:00 INFO VM(3)[32124]: dnsmasq[210]: using local addresses only for domain lxd
2018-11-07T00:57:29.633033-08:00 INFO VM(3)[32124]: dnsmasq[210]: using nameserver 100.109.178.168#53
2018-11-07T00:57:29.633037-08:00 INFO VM(3)[32124]: dnsmasq[210]: using nameserver 100.109.34.201#53
2018-11-07T00:57:29.633040-08:00 INFO VM(3)[32124]: dnsmasq[210]: using nameserver 8.8.4.4#53
2018-11-07T00:57:29.633043-08:00 INFO VM(3)[32124]: dnsmasq[210]: using nameserver 8.8.8.8#53
2018-11-07T00:57:29.633046-08:00 INFO VM(3)[32124]: dnsmasq[210]: read /etc/hosts - 2 addresses
2018-11-07T00:57:29.633050-08:00 INFO VM(3)[32124]: dnsmasq[210]: read /etc/hosts - 2 addresses
2018-11-07T00:57:29.633053-08:00 INFO VM(3)[32124]: tremplin[161]: 2018/11/07 08:57:26 Failed to inform host that tremplin is ready: rpc error: code = Unavailable desc = grpc: the connection is unavailable#012
2018-11-07T00:57:29.790611-08:00 INFO VM(3)[32143]: [ 13.886164] maitred: <unknown process> (259) exited with status 0#015
2018-11-07T00:57:29.995185-08:00 INFO VM(3)[32143]: [ 14.091878] maitred: tremplin (212) exited with status 1#015
2018-11-07T00:57:29.997268-08:00 INFO VM(3)[32143]: [ 14.094733] maitred: Restarting tremplin#015
2018-11-07T00:57:30.001910-08:00 INFO VM(3)[32143]: [ 14.099316] maitred: tremplin restarted#015
2018-11-07T00:57:30.763785-08:00 INFO VM(3)[32143]: [ 14.860024] maitred: <unknown process> (260) killed by signal 9#015
2018-11-07T00:57:32.527075-08:00 INFO VM(3)[32143]: [ 16.623282] maitred: <unknown process> (309) exited with status 0#015
2018-11-07T00:57:32.729949-08:00 INFO VM(3)[32143]: [ 16.826525] maitred: tremplin (262) exited with status 1#015
2018-11-07T00:57:32.732447-08:00 INFO VM(3)[32143]: [ 16.829681] maitred: Restarting tremplin#015
2018-11-07T00:57:32.751734-08:00 INFO VM(3)[32143]: [ 16.844853] maitred: tremplin restarted#015
2018-11-07T00:57:33.488116-08:00 INFO VM(3)[32143]: [ 17.584444] maitred: <unknown process> (310) killed by signal 9#015
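One way to confirm the loop from a saved copy of messages is to pull out the maitred exit records for tremplin. This is only a minimal Python sketch, not part of any Chrome OS tooling: the function names and the threshold of three failures are assumptions made for illustration, and the regex matches only the line format seen in the excerpt above.

```python
import re

# Matches maitred exit records as they appear in the log excerpt, e.g.
# "maitred: tremplin (161) exited with status 1#015".
TREMPLIN_EXIT = re.compile(r"maitred: tremplin \((\d+)\) exited with status (\d+)")


def tremplin_exits(lines):
    """Return (pid, status) for every tremplin exit that maitred logged."""
    exits = []
    for line in lines:
        match = TREMPLIN_EXIT.search(line)
        if match:
            exits.append((int(match.group(1)), int(match.group(2))))
    return exits


def looks_like_crash_loop(lines, threshold=3):
    """Hypothetical heuristic: at least `threshold` non-zero tremplin exits."""
    failures = [status for _pid, status in tremplin_exits(lines) if status != 0]
    return len(failures) >= threshold


if __name__ == "__main__":
    # Three lines copied from the log in the issue description.
    sample = [
        "2018-11-07T00:57:27.338158-08:00 INFO VM(3)[32143]: [ 11.434902] maitred: tremplin (161) exited with status 1#015",
        "2018-11-07T00:57:29.995185-08:00 INFO VM(3)[32143]: [ 14.091878] maitred: tremplin (212) exited with status 1#015",
        "2018-11-07T00:57:32.729949-08:00 INFO VM(3)[32143]: [ 16.826525] maitred: tremplin (262) exited with status 1#015",
    ]
    print(tremplin_exits(sample))
    print(looks_like_crash_loop(sample))
```

In the excerpt, each new tremplin PID (161, 212, 262) exits with status 1 within about three seconds of the previous restart, which is the pattern this heuristic is meant to flag.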
Nov 27
Nov 28
The failing cases all have something in common:
2018-11-27T00:45:34.288381-08:00 INFO vm_cicerone[24593]: Started tremplin grpc server
2018-11-27T00:45:34.292032-08:00 INFO kernel: [ 689.480247] NET: Registered protocol family 40
Normally vsock's protocol family is registered before cicerone tries listening on its socket for tremplin. concierge's upstart job is responsible for modprobing vhost-vsock, but cicerone is "start on starting vm_concierge", so cicerone could be attempting to listen on vsock before the modprobe even runs. cicerone should probably be "start on started vm_concierge" instead.
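The ordering described above can be checked mechanically against a saved messages file. The following is a rough Python sketch under stated assumptions: the helper names are hypothetical, it relies on every line starting with an ISO 8601 timestamp as in the excerpts here, and it only looks at the first occurrence of each message.

```python
from datetime import datetime

# The two messages quoted in the comment above.
SERVER_STARTED = "Started tremplin grpc server"
VSOCK_REGISTERED = "NET: Registered protocol family 40"


def first_timestamp(lines, needle):
    """Return the timestamp of the first line containing `needle`, or None."""
    for line in lines:
        if needle in line:
            # The timestamp is the first space-separated field of the line.
            return datetime.fromisoformat(line.split(" ", 1)[0])
    return None


def listened_before_vsock(lines):
    """True if cicerone's server start was logged before vsock registration.

    Returns False when either message is missing, since the ordering
    cannot be judged from the log alone in that case.
    """
    started = first_timestamp(lines, SERVER_STARTED)
    registered = first_timestamp(lines, VSOCK_REGISTERED)
    if started is None or registered is None:
        return False
    return started < registered


if __name__ == "__main__":
    failing = [
        "2018-11-27T00:45:34.288381-08:00 INFO vm_cicerone[24593]: Started tremplin grpc server",
        "2018-11-27T00:45:34.292032-08:00 INFO kernel: [ 689.480247] NET: Registered protocol family 40",
    ]
    print(listened_before_vsock(failing))
```

For the two lines quoted above, the server start is logged roughly 3.7 ms before the protocol family is registered, so the check reports the bad ordering.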
Nov 29
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/c80bc7cc0e2a8f943635b13ea89bb70a70dbdcba

commit c80bc7cc0e2a8f943635b13ea89bb70a70dbdcba
Author: Stephen Barber <smbarber@chromium.org>
Date: Thu Nov 29 20:11:14 2018

vm_tools: init: set cicerone to start after concierge

BUG= chromium:902901
TEST=vm.CrostiniFiles

Change-Id: Id05b732205b323475663120bf166926f6dd0959c
Reviewed-on: https://chromium-review.googlesource.com/1352831
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Jeffrey Kardatzke <jkardatzke@google.com>

[modify] https://crrev.com/c80bc7cc0e2a8f943635b13ea89bb70a70dbdcba/vm_tools/init/vm_cicerone.conf
Nov 29
Comment 1 by jkardatzke@chromium.org, Nov 27