
Issue 902901

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 29
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




Tremplin in crash/restart loop

Project Member Reported by jkardatzke@chromium.org, Nov 7

Issue description

This is the second time I've seen this now.

Test results here for 11237.0.0:

https://stainless.corp.google.com/browse/chromeos-autotest-results/255678534-chromeos-test/

The prior failure, in a different build, is here: https://stainless.corp.google.com/browse/chromeos-autotest-results/252930304-chromeos-test/

This is what's in messages:

2018-11-07T00:57:24.640211-08:00 INFO VM(3)[32124]:  lxd[135]: lvl=info msg="Done updating images" t=2018-11-07T08:57:23+0000#012
2018-11-07T00:57:24.640214-08:00 INFO VM(3)[32124]:  lxd[135]: lvl=info msg="Creating BTRFS storage pool \"default\"" t=2018-11-07T08:57:23+0000#012
2018-11-07T00:57:27.017705-08:00 INFO VM(3)[32143]: [   11.114026] maitred: <unknown process> (209) exited with status 0#015
2018-11-07T00:57:27.338158-08:00 INFO VM(3)[32143]: [   11.434902] maitred: tremplin (161) exited with status 1#015
2018-11-07T00:57:27.340229-08:00 INFO VM(3)[32143]: [   11.437694] maitred: Restarting tremplin#015
2018-11-07T00:57:27.347670-08:00 INFO VM(3)[32143]: [   11.444259] maitred: tremplin restarted#015
2018-11-07T00:57:28.098532-08:00 INFO VM(3)[32143]: [   12.194635] maitred: <unknown process> (210) killed by signal 9#015
2018-11-07T00:57:29.632991-08:00 INFO VM(3)[32124]:  lxd[135]: lvl=info msg="Created BTRFS storage pool \"default\"" t=2018-11-07T08:57:24+0000#012
2018-11-07T00:57:29.633005-08:00 INFO VM(3)[32124]:  dnsmasq[210]: started, version 2.78 cachesize 150
2018-11-07T00:57:29.633008-08:00 INFO VM(3)[32124]:  dnsmasq[210]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua no-TFTP no-conntrack ipset no-auth no-DNSSEC loop-detect inotify
2018-11-07T00:57:29.633011-08:00 WARNING VM(3)[32124]:  dnsmasq[210]: LOUD WARNING: listening on 100.115.92.193 may accept requests via interfaces other than lxdbr0
2018-11-07T00:57:29.633014-08:00 WARNING VM(3)[32124]:  dnsmasq[210]: LOUD WARNING: use --bind-dynamic rather than --bind-interfaces to avoid DNS amplification attacks via these interface(s)
2018-11-07T00:57:29.633017-08:00 INFO VM(3)[32124]:  dnsmasq-dhcp[210]: DHCP, IP range 100.115.92.194 -- 100.115.92.206, lease time 1h
2018-11-07T00:57:29.633020-08:00 INFO VM(3)[32124]:  dnsmasq-dhcp[210]: DHCP, sockets bound exclusively to interface lxdbr0
2018-11-07T00:57:29.633023-08:00 INFO VM(3)[32124]:  dnsmasq[210]: using local addresses only for domain lxd
2018-11-07T00:57:29.633026-08:00 INFO VM(3)[32124]:  dnsmasq[210]: reading /run/resolv.conf
2018-11-07T00:57:29.633028-08:00 INFO VM(3)[32124]:  dnsmasq[210]: using local addresses only for domain lxd
2018-11-07T00:57:29.633033-08:00 INFO VM(3)[32124]:  dnsmasq[210]: using nameserver 100.109.178.168#53
2018-11-07T00:57:29.633037-08:00 INFO VM(3)[32124]:  dnsmasq[210]: using nameserver 100.109.34.201#53
2018-11-07T00:57:29.633040-08:00 INFO VM(3)[32124]:  dnsmasq[210]: using nameserver 8.8.4.4#53
2018-11-07T00:57:29.633043-08:00 INFO VM(3)[32124]:  dnsmasq[210]: using nameserver 8.8.8.8#53
2018-11-07T00:57:29.633046-08:00 INFO VM(3)[32124]:  dnsmasq[210]: read /etc/hosts - 2 addresses
2018-11-07T00:57:29.633050-08:00 INFO VM(3)[32124]:  dnsmasq[210]: read /etc/hosts - 2 addresses
2018-11-07T00:57:29.633053-08:00 INFO VM(3)[32124]:  tremplin[161]: 2018/11/07 08:57:26 Failed to inform host that tremplin is ready: rpc error: code = Unavailable desc = grpc: the connection is unavailable#012
2018-11-07T00:57:29.790611-08:00 INFO VM(3)[32143]: [   13.886164] maitred: <unknown process> (259) exited with status 0#015
2018-11-07T00:57:29.995185-08:00 INFO VM(3)[32143]: [   14.091878] maitred: tremplin (212) exited with status 1#015
2018-11-07T00:57:29.997268-08:00 INFO VM(3)[32143]: [   14.094733] maitred: Restarting tremplin#015
2018-11-07T00:57:30.001910-08:00 INFO VM(3)[32143]: [   14.099316] maitred: tremplin restarted#015
2018-11-07T00:57:30.763785-08:00 INFO VM(3)[32143]: [   14.860024] maitred: <unknown process> (260) killed by signal 9#015
2018-11-07T00:57:32.527075-08:00 INFO VM(3)[32143]: [   16.623282] maitred: <unknown process> (309) exited with status 0#015
2018-11-07T00:57:32.729949-08:00 INFO VM(3)[32143]: [   16.826525] maitred: tremplin (262) exited with status 1#015
2018-11-07T00:57:32.732447-08:00 INFO VM(3)[32143]: [   16.829681] maitred: Restarting tremplin#015
2018-11-07T00:57:32.751734-08:00 INFO VM(3)[32143]: [   16.844853] maitred: tremplin restarted#015
2018-11-07T00:57:33.488116-08:00 INFO VM(3)[32143]: [   17.584444] maitred: <unknown process> (310) killed by signal 9#015
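
For triaging other runs, the loop above can be spotted mechanically. This is only a rough sketch: it assumes the maitred lines keep the exact "tremplin (PID) exited with status N" wording seen in this excerpt, and the function names and the threshold of 3 are made up for illustration, not anything in vm_tools.

```python
import re

# Matches maitred's report of a tremplin exit, as worded in the excerpt above.
EXIT_RE = re.compile(r"maitred: tremplin \((\d+)\) exited with status (\d+)")


def tremplin_failures(lines):
    """Return a (pid, status) tuple for every non-zero tremplin exit."""
    failures = []
    for line in lines:
        match = EXIT_RE.search(line)
        if match and int(match.group(2)) != 0:
            failures.append((int(match.group(1)), int(match.group(2))))
    return failures


def in_restart_loop(lines, threshold=3):
    """Hypothetical heuristic: `threshold` or more failed exits is a loop."""
    return len(tremplin_failures(lines)) >= threshold
```

Feeding it the lines of the messages file (for example `in_restart_loop(open(path))`, where `path` is wherever the log was saved) is enough to flag a run like this one.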
 
I saw this again when I reviewed the results yesterday, and again today. Here's the latest one:

https://stainless.corp.google.com/browse/chromeos-autotest-results/261429899-chromeos-test/
Status: Started (was: Assigned)
The failing cases all have something in common:

2018-11-27T00:45:34.288381-08:00 INFO vm_cicerone[24593]: Started tremplin grpc server
2018-11-27T00:45:34.292032-08:00 INFO kernel: [  689.480247] NET: Registered protocol family 40

Normally the vsock protocol family (family 40, AF_VSOCK) is registered before cicerone tries to listen on its vsock socket for tremplin. In the failing runs the order is reversed: cicerone reports its grpc server as started a few milliseconds before the kernel registers the protocol family. concierge's upstart job is responsible for modprobing vhost-vsock, but cicerone is "start on starting vm_concierge", so cicerone can attempt to listen on vsock before the modprobe even runs.

cicerone should probably be "start on started vm_concierge" instead.
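
To check whether another failing run shows the same ordering, something like the following could be run over its messages file. It is a minimal sketch, assuming both marker lines appear with the same wording and the same leading ISO 8601 timestamp as in the excerpt above; `listened_before_vsock` is a name invented here, not an existing tool.

```python
from datetime import datetime

# Marker strings copied from the two log lines quoted in this comment.
GRPC_STARTED = "Started tremplin grpc server"
VSOCK_REGISTERED = "NET: Registered protocol family 40"


def _timestamp(line):
    """Parse the ISO 8601 timestamp that starts each syslog line."""
    return datetime.fromisoformat(line.split(" ", 1)[0])


def listened_before_vsock(lines):
    """Return True if cicerone's server started before vsock was registered.

    Only the first occurrence of each marker is considered. Returns None
    when either marker is missing from the input.
    """
    grpc_time = None
    vsock_time = None
    for line in lines:
        if grpc_time is None and GRPC_STARTED in line:
            grpc_time = _timestamp(line)
        elif vsock_time is None and VSOCK_REGISTERED in line:
            vsock_time = _timestamp(line)
    if grpc_time is None or vsock_time is None:
        return None
    return grpc_time < vsock_time
```

The comparison uses the parsed timestamps rather than line position, since lines from different sources are not always written to the log in order (the lxd and maitred lines in the original report are interleaved out of sequence).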
Project Member

Comment 4 by bugdroid1@chromium.org, Nov 29

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/c80bc7cc0e2a8f943635b13ea89bb70a70dbdcba

commit c80bc7cc0e2a8f943635b13ea89bb70a70dbdcba
Author: Stephen Barber <smbarber@chromium.org>
Date: Thu Nov 29 20:11:14 2018

vm_tools: init: set cicerone to start after concierge

BUG= chromium:902901 
TEST=vm.CrostiniFiles

Change-Id: Id05b732205b323475663120bf166926f6dd0959c
Reviewed-on: https://chromium-review.googlesource.com/1352831
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Jeffrey Kardatzke <jkardatzke@google.com>

[modify] https://crrev.com/c80bc7cc0e2a8f943635b13ea89bb70a70dbdcba/vm_tools/init/vm_cicerone.conf

Status: Fixed (was: Started)
