new ccompute images not setting hostname after boot
Issue description
It looks like the GCE Ubuntu image now uses cloud-init to set the hostname. However, after creating a proto instance, saving it as an image, and launching that image as a new instance (let's say "slave119-c4"), the new hostname doesn't get picked up and the hostname remains proto-something. It appears that cloud-init only ever runs once (on the proto instance) and doesn't run during the first boot of the new instance, so it never picks up the new instance name. chrome-trusty-17052300-43663d9e103 is currently problematic; it was built by https://uberchromegw.corp.google.com/i/internal.infra.cron/builders/ccompute-chrome-trusty64/builds/824 and is currently running on slave118-c4 in us-central1-c.
May 24 2017
May 24 2017
Can you also modify the startup scripts to notify if cloud-init is not installed etc?
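One hedged sketch of such a check, assuming the startup script is Python (the `chromebuild-startup` log format suggests so, but that is a guess) and that "notify" means logging a warning to the startup log: ask dpkg for the package status and warn unless it is `install ok installed`. The function names are hypothetical.

```python
import logging
import subprocess

# Status string dpkg reports for a fully installed package.
INSTALLED_STATUS = "install ok installed"


def parse_dpkg_status(status: str) -> bool:
    """Return True if a dpkg-query ${Status} string means 'installed'."""
    return status.strip() == INSTALLED_STATUS


def package_installed(package: str) -> bool:
    """Ask dpkg whether `package` is installed; False if dpkg-query fails."""
    try:
        result = subprocess.run(
            ["dpkg-query", "-W", "--showformat=${Status}", package],
            capture_output=True, text=True, check=False)
    except FileNotFoundError:  # dpkg-query is not available on this host
        return False
    return result.returncode == 0 and parse_dpkg_status(result.stdout)


def warn_if_missing(package: str = "cloud-init") -> bool:
    """Log a warning (hypothetical 'notification') if `package` is missing."""
    ok = package_installed(package)
    if not ok:
        logging.warning("%s is not installed; new instances built from this "
                        "image may keep the proto hostname", package)
    return ok
```

A removed-but-not-purged package reports `deinstall ok config-files`, which this treats as missing.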
May 24 2017
I believe there's a flag/state file of some sort somewhere in /var/ that can be removed before storing the image; removing it causes cloud-init to redo full initialization on the next boot. I can take a closer look a bit later.
> Can you also modify the startup scripts to notify if cloud-init is not installed etc?
Notify where?
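A minimal sketch of that cleanup, assuming cloud-init keeps its state under `/var/lib/cloud` (the `instance` symlink, the `instances/` directory, the `sem/` semaphores, and `data/`); the layout should be checked on the actual trusty image first. Newer cloud-init releases ship a `cloud-init clean` subcommand for this, which the trusty package may lack. `reset_cloud_init_state` is a hypothetical helper meant to run as the last step before the proto is captured.

```python
import shutil
from pathlib import Path

# Assumption: cloud-init keeps its per-instance state under this directory.
CLOUD_STATE_DIR = Path("/var/lib/cloud")

# Entries to delete so the next boot looks like a first boot (assumed layout).
PER_INSTANCE_ENTRIES = ("instance", "instances", "sem", "data")


def reset_cloud_init_state(state_dir: Path = CLOUD_STATE_DIR) -> list:
    """Delete cloud-init's cached per-instance state; return what was removed."""
    removed = []
    for name in PER_INSTANCE_ENTRIES:
        path = state_dir / name
        # Check for a symlink first: 'instance' is normally a symlink into
        # instances/, and rmtree() refuses to operate on symlinks.
        if path.is_symlink() or path.is_file():
            path.unlink()
        elif path.is_dir():
            shutil.rmtree(path)
        else:
            continue  # nothing to remove
        removed.append(name)
    return removed
```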
,
May 24 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/infra/infra_internal/+/a0887f20fd8f061289b4913a60ff7563e7f1a678 commit a0887f20fd8f061289b4913a60ff7563e7f1a678 Author: Ryan Tseng <hinoka@google.com> Date: Wed May 24 18:36:50 2017
May 24 2017
friedman@proto-chrome-trusty:/var/log/messages/chromebuild$ cat root-setup.log
[D2017-05-24T18:24:11.907864 767 140453443786560 chromebuild-startup:201] GET http://169.254.169.254/computeMetadata/v1/instance/attributes/image_name
[D2017-05-24T18:24:11.911595 767 140453443786560 chromebuild-startup:201] GET http://169.254.169.254/computeMetadata/v1/instance/attributes/cipd_deployments
[I2017-05-24T18:24:11.913096 767 140453443786560 chromebuild-startup:220] Using slave name "proto-chrome-trusty" and image name "chrome-trusty-17052300-43663d9e103"
[I2017-05-24T18:24:11.913272 767 140453443786560 chromebuild-startup:432] Hostname is proto-chrome-trusty, that looks like a prototype hostname...
[I2017-05-24T18:24:11.913347 767 140453443786560 chromebuild-startup:433] Sleeping for 10 seconds for GCE startup scripts to change it.
... repeat ...
I dunno... I guess that's how we found out what was supposed to fix the hostname, but even if we'd had a notifier for a missing package, it would have been notifying/logging about the old package, which doesn't exist anymore. Maybe the request in #3 is pointless.
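The wait loop in that log can be sketched with an upper bound, so that a missing cloud-init surfaces as an error instead of an endless sleep. The `proto-` prefix test, the attempt limit, and the function names are assumptions, not the real `chromebuild-startup` code.

```python
import socket
import time
from typing import Callable

PROTO_PREFIX = "proto-"  # assumption: prototype hosts are named proto-*


def looks_like_proto(hostname: str) -> bool:
    return hostname.startswith(PROTO_PREFIX)


def wait_for_real_hostname(get_hostname: Callable[[], str] = socket.gethostname,
                           sleep: Callable[[float], None] = time.sleep,
                           interval: float = 10.0,
                           max_attempts: int = 30) -> str:
    """Poll until the hostname stops looking like a prototype name.

    Raises TimeoutError after `max_attempts` checks rather than looping forever.
    """
    for attempt in range(max_attempts):
        name = get_hostname()
        if not looks_like_proto(name):
            return name
        if attempt + 1 < max_attempts:  # no pointless sleep after the last check
            sleep(interval)
    raise TimeoutError(
        f"hostname still looks like a prototype after {max_attempts} checks; "
        "is cloud-init installed?")
```

Injecting `get_hostname` and `sleep` keeps the loop testable without touching the real host.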
May 24 2017
I believe what is supposed to happen is that when cloud-init launches, it figures out the instance ID. If the instance ID is new or differs from the one it saw the last time it launched, it should go through the entire init sequence again. What's happening here seems to be that cloud-init got uninstalled after the first init sequence.
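A sketch of the per-instance check described above, assuming the current ID would come from the metadata server (`instance/id`) and the previous one is cached in a file. The cache path and helper names are hypothetical, and this is an illustration of the idea, not cloud-init's actual implementation.

```python
from pathlib import Path

# Hypothetical cache location (cloud-init's own data dir is assumed similar).
CACHE_FILE = Path("/var/lib/cloud/data/instance-id")


def needs_full_init(current_id: str, cache_file: Path = CACHE_FILE) -> bool:
    """Return True when this boot's instance ID differs from the cached one."""
    try:
        previous = cache_file.read_text().strip()
    except FileNotFoundError:
        return True  # never initialized on this disk
    return previous != current_id


def record_instance_id(current_id: str, cache_file: Path = CACHE_FILE) -> None:
    """Remember the ID so the next boot of the same instance skips full init."""
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(current_id + "\n")
```

With this logic, an image captured from the proto carries the proto's ID, so the first boot of slave119-c4 would see a mismatch and re-run; that only helps if the tool doing the check is still installed.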
,
May 24 2017
No mention of the uninstall in the image build process... odd. Kicked off a new test build https://uberchromegw.corp.google.com/i/internal.infra.cron/builders/ccompute-chrome-trusty64/builds/826
May 24 2017
It's unattended-upgrades... I booted the new image that explicitly installs cloud-init, but the same issue existed. Also, cloud-init was listed as uninstalled... I manually installed it and noticed unattended-upgrades came with it... fuck.
root@proto-chrome-trusty:/var/log# aptitude why unattended-upgrades
i   cloud-init                   Depends software-properties-common
i A software-properties-common   Depends python3-software-properties (= 0.92.37.8)
i A python3-software-properties  Depends unattended-upgrades
I'll disable unattended-upgrades in another way.
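The `aptitude why` output is a dependency chain, which a breadth-first search over a Depends graph reproduces. The sketch uses only the three edges shown above (version constraints dropped). It also illustrates why removing unattended-upgrades would take cloud-init with it: apt removes packages whose Depends can no longer be satisfied, all the way up the chain.

```python
from collections import deque
from typing import Dict, List, Optional


def why(target: str, roots: List[str],
        depends: Dict[str, List[str]]) -> Optional[List[str]]:
    """Breadth-first search for the shortest Depends chain from a root to target."""
    queue = deque([[root] for root in roots])
    seen = set(roots)
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for dep in depends.get(path[-1], []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


# Edges taken from the aptitude output above.
DEPENDS = {
    "cloud-init": ["software-properties-common"],
    "software-properties-common": ["python3-software-properties"],
    "python3-software-properties": ["unattended-upgrades"],
}

print(" -> ".join(why("unattended-upgrades", ["cloud-init"], DEPENDS)))
# prints: cloud-init -> software-properties-common -> python3-software-properties -> unattended-upgrades
```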
May 24 2017
If all we are getting from cloud-init is the hostname being set, then I'm more than happy to cron that and force cloud-init to be uninstalled.
May 24 2017
Wait, unattended-upgrades is a great thing; we shouldn't be removing it. Instead, we should pin the cloud-init package somehow so it doesn't get auto-removed (also, I don't quite understand why it gets removed).
May 24 2017
Currently the most visible symptom of cloud-init not running on the second init is that the hostname isn't set, but I think it does more than set the hostname, and we do want to make sure cloud-init always runs the first time an instance boots from the image.
May 24 2017
> unattended-upgrades is a great thing
Why? It recently caused version skew when it randomly decided to update apache on some VMs. In what circumstances do we want our packages updated without our knowledge?
May 24 2017
We want our machines to have all the latest security updates, and unattended-upgrades does that. We can remove it only if we have an alternative (like automatically using fresh images), but we don't have one now. I believe it should be possible to restrict what unattended-upgrades can update if some packages are more sensitive than others. (Also, why do we need apache on bots?...)
May 24 2017
I think some tests use lighttpd to serve endpoints for Chrome to hit, and some use apache.
May 24 2017
apache is part of install-build-deps and caused a major outage in the blocking bug. We can't be fully aware of all our dependencies, and cloud-init (or, more specifically, unattended-upgrades) requires that you explicitly list the packages it should leave alone. What's the best course of action here? It seems we only get the hostname set from cloud-init... I just sent https://chrome-internal-review.googlesource.com/382268, which should do that using the metadata server. If we want packages up to date, then Puppet should be doing that, not some random daemon. Keep in mind that unattended-upgrades does not run anywhere other than GCE.
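A hedged sketch of fetching the instance's hostname from the metadata server, assuming the `instance/hostname` endpoint (which returns a fully qualified name) and the `Metadata-Flavor: Google` header the server requires. It is not the code in that CL; actually applying the name (for example writing `/etc/hostname` and running `hostname`) is deliberately left out.

```python
import urllib.request

# Same metadata server the startup script already queries (see the log above).
METADATA_URL = "http://169.254.169.254/computeMetadata/v1/instance/hostname"


def metadata_request(url: str = METADATA_URL) -> urllib.request.Request:
    # The GCE metadata server rejects requests without this header.
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def short_hostname(fqdn: str) -> str:
    """'slave119-c4.c.project.internal' -> 'slave119-c4' (example name only)."""
    return fqdn.strip().split(".", 1)[0]


def fetch_short_hostname(timeout: float = 5.0) -> str:
    """Only works on a GCE VM; raises URLError elsewhere."""
    with urllib.request.urlopen(metadata_request(), timeout=timeout) as resp:
        return short_hostname(resp.read().decode("utf-8"))
```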
May 24 2017
Are you sure about "only hostname"? Here's a chunk of the cloud-init config:
# The modules that run in the 'init' stage
cloud_init_modules:
 - migrator
 - seed_random
 - bootcmd
 - write-files
 - growpart
 - resizefs
 - set_hostname
 - update_hostname
 - update_etc_hosts
 - ca-certs
 - rsyslog
 - users-groups
 - ssh
It seems like a lot of stuff.
May 24 2017
layout_tests use apache. I don't know what they are doing that requires starting apache, but they need it. I think we should pin all packages. If we want security updates, we should push out a new image with the updated packages. Can we disable unattended-upgrades in the Trusty base image by default, then use Puppet to re-enable unattended-upgrades for Buildbot GCE VMs? MP will keep it off and we can start doing regular pushes. The base image should install the latest security updates and then remove unattended-upgrades, which should only be re-enabled when a non-proto GCE VM starts up for Buildbot. Thoughts?
May 24 2017
I'm against removing unattended-upgrades because if we do, we will get stuck with ancient versions of all packages, since we do not have any other automatic process for package upgrades. Building and pushing new images is great, but it is not automated, and there's no reliable canary process. So pushing a new image is a scary manual process => no one will do it (it is already happening: we are running prehistoric images everywhere). unattended-upgrades is mostly trouble-free. As far as I remember, this is the first time it has caused trouble (some other aptitude-related outages, like different git versions, were self-inflicted, since we run "apt-get update" via Puppet). I blame layout_tests' dependency on a particular apache quirk here, not unattended-upgrades. I also think that replacing the unattended-upgrades cron with Puppet won't change much: we still won't be able to precisely control or monitor what's installed and when. unattended-upgrades at least has a configuration that we can tweak. My position is that we should keep unattended-upgrades (configured to install only security updates), and maybe pin some dependencies that really cause trouble (perhaps look through install-build-deps and pick some suspicious-looking ones).
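A sketch of the configuration that position implies: take updates only from the `-security` pocket and blacklist packages that have caused trouble. The origin string follows the stock Ubuntu `50unattended-upgrades` file; `render_unattended_upgrades_conf` and the choice of `apache2` are illustrative. Because apt appends list entries across the files in `/etc/apt/apt.conf.d/` rather than replacing them, the rendered text is assumed to replace the stock `50unattended-upgrades`, not sit alongside it.

```python
from typing import Iterable


def render_unattended_upgrades_conf(blacklist: Iterable[str],
                                    security_only: bool = True) -> str:
    """Render an apt.conf.d fragment for unattended-upgrades (hypothetical helper)."""
    lines = []
    if security_only:
        # Only accept packages from the distribution's security pocket.
        lines += [
            "Unattended-Upgrade::Allowed-Origins {",
            '    "${distro_id}:${distro_codename}-security";',
            "};",
        ]
    # Packages unattended-upgrades must never touch (deduplicated, sorted).
    lines.append("Unattended-Upgrade::Package-Blacklist {")
    lines += [f'    "{pkg}";' for pkg in sorted(set(blacklist))]
    lines.append("};")
    return "\n".join(lines) + "\n"


print(render_unattended_upgrades_conf(["apache2"]), end="")
```

Generating the file (for example from Puppet) keeps the blacklist in one reviewed place instead of hand-edited copies on each VM.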
,
May 24 2017
To be clear: when we have a reliable automated process for image rotations, then I will support removing unattended-upgrades and totally "freezing" the image. Until then, I think we need some automated package upgrade mechanism to install security updates.
May 24 2017
> I'm against removing unattended-upgrades because if we do, we will get stuck with ancient versions of all packages because we do not have any other automatic process of package upgrades.
That's why I suggested that we reinstall and re-enable it on Buildbot/non-MP GCE VMs.
> I also think that replacing unattended-upgrades cron with Puppet won't change much
That's not what I meant. Puppet would just enforce that unattended-upgrades is installed and enabled on non-MP VMs, rather than having it be part of the base image.
May 24 2017
> That's why I suggested that we reinstall and re-enable it on Buildbot/non-MP GCE VMs.
But we don't automatically update the images of MP VMs either, so I don't see how they are different from Buildbot VMs: both run ancient images. True, it is simpler to update the MP image, but we don't do it in practice.
> That's not what I meant
This was a reply to the last point in #16 :) Oh, also "we do a shitty thing somewhere" shouldn't imply that it is fine to do shitty things everywhere :(
May 24 2017
OK, so we need cloud-init, cloud-init requires unattended-upgrades, and unattended-upgrades wanted to update apache2; if they deemed that update important, it most likely was. I agree that in this case the tests that broke should be fixed, not the other way around. If we are going to require unattended-upgrades on GCE, then we should require it globally; it seems it would solve a ton of update/security issues for us. There was some verbiage about reboots from it, which is a bit scary though.
,
May 24 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/infra/puppet/+/a339012bd11beaf0e368ae0b323416e9ac406796 commit a339012bd11beaf0e368ae0b323416e9ac406796 Author: Elliott Friedman <friedman@google.com> Date: Wed May 24 21:39:23 2017
,
May 24 2017
The following revision refers to this bug: https://chrome-internal.googlesource.com/infra/infra_internal/+/403ca5f764e000dcf8b1ba9c1d56d5df003dad3c commit 403ca5f764e000dcf8b1ba9c1d56d5df003dad3c Author: Elliott Friedman <friedman@google.com> Date: Wed May 24 21:47:40 2017
May 24 2017
Just for the record, unattended-upgrades can be installed but disabled via files in /etc/apt/apt.conf.d/.
,
May 24 2017
Yes, but there's no simple interface for it. On all hosts with the package, there is 50unattended-upgrades but on some there is also 20auto-upgrades. Also 10periodic should probably be updated as well. It doesn't feel super clean to me but I guess we can go down that path if need be.
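Because apt reads `/etc/apt/apt.conf.d/` in alphanumeric order and a later scalar setting overrides an earlier one, a single file that sorts after `20auto-upgrades` and sets `APT::Periodic::Unattended-Upgrade "0"` should disable the periodic run without editing the package's own files. The sketch models that ordering with a deliberately simplified parser (one-line `Key "value";` statements only). The override file name is hypothetical, and on a real host `apt-config dump` shows the effective value.

```python
import re
from typing import Dict, Optional

KEY = "APT::Periodic::Unattended-Upgrade"
# Hypothetical override file; any name sorting after 20auto-upgrades works.
OVERRIDE_NAME = "99disable-unattended-upgrades"
OVERRIDE_BODY = f'{KEY} "0";\n'

# Matches simple one-line statements such as: APT::Periodic::X "1";
_LINE = re.compile(r'^\s*([\w:-]+)\s+"([^"]*)"\s*;')


def effective_value(files: Dict[str, str], key: str = KEY) -> Optional[str]:
    """Return the last assignment of `key`, reading files in sorted name order.

    Not a real apt.conf parser: scopes, comments and #include are ignored.
    """
    value = None
    for name in sorted(files):
        for line in files[name].splitlines():
            m = _LINE.match(line)
            if m and m.group(1) == key:
                value = m.group(2)
    return value
```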
May 24 2017
Jun 23 2017
Jun 23 2017
Comment 1 by hinoka@chromium.org, May 24 2017