
Issue 913556

Starred by 4 users

Issue metadata

Status: Verified
Owner:
Closed: Jan 7
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




Crostini start time regression

Project Member Reported by cylee@chromium.org, Dec 10

Issue description

crostini_start_time.container_start_time shows a significant jump on 73.11329.0.0:
http://shortn/_tNW8k0ECiW
The metric measures the start time of an *existing* container (i.e., it excludes the start time of a newly created container).


Also, crostini_start_time.initial_vm_start_time has become very unstable recently:
http://shortn/_B5RQpsbDfY
The metric measures the first start time of a new VM.

 
Labels: -Pri-3 M-73 OS-Chrome Pri-1
Owner: smbar...@chromium.org
Status: Assigned (was: Untriaged)
I haven't looked much into the VM start time yet, but for the container the only obvious culprit is FUSE. It seems like this should just be a mknod, though, not a 2-second start regression.

https://crosland.corp.google.com/log/11325.0.0..11331.0.0
Actually, SIGPWR would be treated differently from SIGTERM, so LXD could be trying to restore container state. I should do some local experiments with reverts to see which of these is the culprit.

https://lxd.readthedocs.io/en/latest/daemon-behavior/
The culprit is indeed the SIGPWR CL. dhclient in the container seems to wait for 3 seconds if SIGPWR was used to shut LXD down:

Dec 21 22:33:01 penguin dhclient[53]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 7
Dec 21 22:33:02 penguin systemd-networkd[19]: eth0: Gained IPv6LL
Dec 21 22:33:04 penguin dhclient[53]: DHCPREQUEST of 100.115.92.205 on eth0 to 255.255.255.255 port 67

Without the SIGPWR change the container just proceeds to DHCPREQUEST.
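This is because an orderly shutdown runs dhclient -r, which releases the existing lease, so on the next start dhclient has to begin again with DHCPDISCOVER. Our previous shutdown method killed everything, so the lease was never released and we immediately retried the previous lease.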
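
The gap between DHCPDISCOVER and DHCPREQUEST in the log excerpt above can be measured directly from the syslog timestamps. The following is a minimal sketch, not part of the original investigation: the helper names `first_timestamp` and `discover_to_request_seconds` are hypothetical, and because syslog lines carry no year, an arbitrary one is assumed for parsing.

```python
from datetime import datetime

# Log excerpt copied from the comment above.
LOG = """\
Dec 21 22:33:01 penguin dhclient[53]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 7
Dec 21 22:33:02 penguin systemd-networkd[19]: eth0: Gained IPv6LL
Dec 21 22:33:04 penguin dhclient[53]: DHCPREQUEST of 100.115.92.205 on eth0 to 255.255.255.255 port 67
"""

# Assumption: syslog timestamps omit the year, so an arbitrary one is
# prepended. Only the difference between two timestamps is used.
ASSUMED_YEAR = "2000"


def first_timestamp(log, keyword):
    """Return the timestamp of the first line containing keyword, or None."""
    for line in log.splitlines():
        if keyword in line:
            # The first 15 characters hold the timestamp, e.g. "Dec 21 22:33:01".
            stamp = ASSUMED_YEAR + " " + line[:15]
            return datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
    return None


def discover_to_request_seconds(log):
    """Seconds between the first DHCPDISCOVER and the first DHCPREQUEST."""
    discover = first_timestamp(log, "DHCPDISCOVER")
    request = first_timestamp(log, "DHCPREQUEST")
    if discover is None or request is None:
        return None
    return (request - discover).total_seconds()


if __name__ == "__main__":
    print(discover_to_request_seconds(LOG))
```

For a log where the container goes straight to DHCPREQUEST (no DHCPDISCOVER line), the helper returns None, which distinguishes the pre-SIGPWR behavior from the delayed one.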

From the container's perspective this is working as intended. But a 3-second delay is pretty egregious, so we should do something to improve this.
Project Member

Comment 6 by bugdroid1@chromium.org, Jan 3

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/tremplin/+/f39d3e5674d68211c00ac6af8f9e6d069c32f54d

commit f39d3e5674d68211c00ac6af8f9e6d069c32f54d
Author: Stephen Barber <smbarber@chromium.org>
Date: Thu Jan 03 23:04:27 2019

tremplin: set dnsmasq to not ping addresses

Many DHCP servers will ping an address to make sure that it's not in use
before offering it to a client. This isn't necessary in termina, so remove
the ping. Also allow dnsmasq to operate as the authoritative DHCP server
for the lxdbr0 network.

BUG= chromium:913556 
TEST=vm.CrostiniStartTime is ~3 seconds faster for container start

Change-Id: I85ae997428229e222c27060d10f6cb0047b917de
Reviewed-on: https://chromium-review.googlesource.com/1393764
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>

[modify] https://crrev.com/f39d3e5674d68211c00ac6af8f9e6d069c32f54d/src/chromiumos/tremplin/main.go
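
As a rough illustration of the change described in the commit message above, the two dnsmasq behaviors correspond to the dnsmasq configuration options `no-ping` and `dhcp-authoritative`. The sketch below assumes they are passed to LXD's bridge through the `raw.dnsmasq` network config key; how tremplin's main.go actually applies them is not shown in this thread, and the helper name `lxdbr0_dnsmasq_config` is hypothetical.

```python
# dnsmasq options matching the commit description:
#   no-ping            - do not ping an address before offering it
#   dhcp-authoritative - act as the authoritative DHCP server on the network
DNSMASQ_OPTIONS = ["no-ping", "dhcp-authoritative"]


def lxdbr0_dnsmasq_config(options):
    """Build a network config mapping with one dnsmasq option per line.

    Assumption: the options are delivered via LXD's raw.dnsmasq key.
    The real tremplin code may set them differently.
    """
    return {"raw.dnsmasq": "\n".join(options)}


if __name__ == "__main__":
    print(lxdbr0_dnsmasq_config(DNSMASQ_OPTIONS))
```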

Status: Verified (was: Assigned)
Container start times are back down. VM start times are also stable again after reverting back to LXD 3.0.2.
