Crostini start time regression
Issue description

crostini_start_time.container_start_time jumped significantly on 73.11329.0.0: http://shortn/_tNW8k0ECiW
The metric measures the start time of an *existing* container (i.e., excluding the start time of a newly created container).

Also, crostini_start_time.initial_vm_start_time has become very unstable recently: http://shortn/_B5RQpsbDfY
This metric measures the first start time of a new VM.
Dec 10
I haven't looked much into the VM start time yet, but for the container the only obvious culprit is FUSE. It seems like this should just be a mknod though, not a 2-second start regression.
https://crosland.corp.google.com/log/11325.0.0..11331.0.0
Dec 11
Actually, SIGPWR would be treated differently than SIGTERM, so LXD could be trying to restore container state. I should do some local experiments with reverts to see which of these is the culprit.
https://lxd.readthedocs.io/en/latest/daemon-behavior/
Dec 21
The culprit CL is indeed SIGPWR. dhclient in the container seems to wait for 3 seconds if SIGPWR was used to shut LXD down:

Dec 21 22:33:01 penguin dhclient[53]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 7
Dec 21 22:33:02 penguin systemd-networkd[19]: eth0: Gained IPv6LL
Dec 21 22:33:04 penguin dhclient[53]: DHCPREQUEST of 100.115.92.205 on eth0 to 255.255.255.255 port 67

Without the SIGPWR change the container just proceeds to DHCPREQUEST.
Dec 21
This is because an orderly shutdown runs dhclient -r, which releases the existing lease. Our previous shutdown method killed everything, so the lease was never released and the next start immediately retried the previous lease. From the container's perspective this is working as intended (WAI). But a 3-second delay is pretty egregious, so we should do something to improve this.
Jan 3
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform/tremplin/+/f39d3e5674d68211c00ac6af8f9e6d069c32f54d

commit f39d3e5674d68211c00ac6af8f9e6d069c32f54d
Author: Stephen Barber <smbarber@chromium.org>
Date: Thu Jan 03 23:04:27 2019

tremplin: set dnsmasq to not ping addresses

Many DHCP servers will ping an address to make sure that it's not in use
before offering it to a client. This isn't necessary in termina, so remove
the ping.

Also allow dnsmasq to operate as the authoritative DHCP server for the
lxdbr0 network.

BUG=chromium:913556
TEST=vm.CrostiniStartTime is ~3 seconds faster for container start

Change-Id: I85ae997428229e222c27060d10f6cb0047b917de
Reviewed-on: https://chromium-review.googlesource.com/1393764
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>

[modify] https://crrev.com/f39d3e5674d68211c00ac6af8f9e6d069c32f54d/src/chromiumos/tremplin/main.go
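For readers unfamiliar with the two dnsmasq settings the commit message describes, here is a rough Go sketch of how such options might be assembled. It is not the code from the commit. The function name dnsmasqOptions is hypothetical, and passing the result through LXD's raw.dnsmasq network key is an assumption about the wiring. The option names themselves (no-ping, dhcp-authoritative) are standard dnsmasq configuration options.

```go
package main

import "strings"

// dnsmasqOptions returns dnsmasq configuration lines, one option per line.
// Hypothetical helper: the real tremplin change may set these differently.
func dnsmasqOptions() string {
	opts := []string{
		// Skip the ICMP echo that dnsmasq normally sends to check that an
		// address is unused before offering it to a client.
		"no-ping",
		// Treat dnsmasq as the only DHCP server on the network, so it can
		// answer clients right away instead of letting them time out.
		"dhcp-authoritative",
	}
	return strings.Join(opts, "\n")
}
```

Assuming the bridge is managed by LXD, a string like this could be supplied as the raw.dnsmasq value for lxdbr0. Whether tremplin does exactly that is not stated in this thread.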
Jan 7
Container start times are back down. VM start times are also stable again after reverting to LXD 3.0.2.
Comment 1 by dgreid@google.com, Dec 10
Owner: smbar...@chromium.org
Status: Assigned (was: Untriaged)