DevserverProvisionTask fails simple staging call sanity check at the end
Reported by jrbarnette@chromium.org, Aug 3
Issue description

At the end of `bin/run_server_task DevserverProvisionTask`, the command performs two staging calls to the devserver service on the target system (presumably as a sanity check). Sometimes that check fails. Retrying the command tends to work.

The failure looks like this:

[chromeos15-infra-devserver12.cros.corp.google.com] sudo: service devserver status
[chromeos15-infra-devserver12.cros.corp.google.com] out: sudo password:
[chromeos15-infra-devserver12.cros.corp.google.com] out: devserver start/running, process 1092
[chromeos15-infra-devserver12.cros.corp.google.com] out:
[chromeos15-infra-devserver12.cros.corp.google.com] run: gsutil ls gs://chromeos-image-archive/daisy-release/ | tail -2 | head -1
[chromeos15-infra-devserver12.cros.corp.google.com] out: gs://chromeos-image-archive/daisy-release/R70-10934.0.0/
[chromeos15-infra-devserver12.cros.corp.google.com] out:
[chromeos15-infra-devserver12.cros.corp.google.com] run: curl "http://localhost:8082/stage?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/daisy-release/R70-10934.0.0/"
[chromeos15-infra-devserver12.cros.corp.google.com] out: curl: (7) Failed to connect to localhost port 8082: Connection refused

At first blush, this looks like it might be a race with devserver startup (the command is run immediately after a reboot). Moreover, at one point I went to a server that had failed this way and manually ran the same 'curl' command; it passed without incident.

Reinforcing the "race condition" hypothesis: when I saw this fail, it was during a sequence of deployments on three different servers:
* The first server failed, and succeeded on retry.
* The second server failed twice, and succeeded on the third try.
* The third server never saw this failure.

Attached is the log from the server with the single failure.
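
If the failure really is a startup race, one way to sidestep it would be to wait until port 8082 accepts connections before issuing the staging call. The following is only a minimal sketch of that idea: `wait_for_port` is a hypothetical helper, not something in the provisioning code, and the timeout values are arbitrary assumptions.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connect to (host, port) succeeds or `timeout` elapses.

    Hypothetical helper. Returns True once the port accepts a connection,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the service is not up yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller could run `wait_for_port("localhost", 8082)` just before the `curl` step and report a clear "service never came up" error when it returns False, rather than a bare connection failure.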
,
Aug 8
There's still a race, though, if Puppet actually needs to restart nginx. Fixing the bug in #1 should mean it never takes more than two runs, and adding a well-placed sleep should make even a second run rare enough.
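
As an alternative to a fixed sleep, the check could be retried a bounded number of times with a pause between attempts. This is a sketch only: the helper name `retry`, the attempt count, and the delay are assumptions for illustration and are not taken from the actual change.

```python
import time


def retry(action, attempts=3, delay=5.0, sleep=time.sleep):
    """Call action() up to `attempts` times, pausing `delay` seconds after each failure.

    Hypothetical helper. Returns the first successful result; if every
    attempt raises, the last exception propagates to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                # Out of attempts: surface the real error.
                raise
            sleep(delay)
```

The `sleep` parameter exists only so the pause can be stubbed out in tests; in normal use the staging call would be passed in as `action`.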
,
Aug 8
,
Aug 8
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/085e11b7cafb195ba285b36603d49a06fc23c3b7 commit 085e11b7cafb195ba285b36603d49a06fc23c3b7 Author: Allen Li <ayatane@chromium.org> Date: Wed Aug 08 18:00:29 2018
,
Aug 15
The following revision refers to this bug: https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/c4fe5897a767efca54c38bfff206cc124f41a1ee commit c4fe5897a767efca54c38bfff206cc124f41a1ee Author: Allen Li <ayatane@chromium.org> Date: Wed Aug 15 18:43:44 2018
,
Aug 15
Comment 1 by ayatane@chromium.org, Aug 8

Ugh, we are restarting nginx every run:

    exec { 'disable_nginx_init_script':
      command => '/usr/sbin/update-rc.d nginx disable',
      user    => 'root',
    }

So it's hit or miss whether any particular run will pass.