Restart the payload copy workqueue during push to devserver
Reported by jrbarnette@chromium.org, Sep 20 2017
Issue description
With the addition of the provision payload copying workqueue,
there are now two upstart jobs running on a lab devserver:
devserver
provision-workqueue
If we perform an update to the devserver code for any purpose,
we need to restart both jobs. Currently, the standard push to
devserver only knows how to restart the 'devserver' job, not
the 'provision-workqueue'. We should fix that.
,
Sep 28 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/27ea686fef5572bc5b06ebd03f67068d46848873

commit 27ea686fef5572bc5b06ebd03f67068d46848873
Author: Richard Barnette <jrbarnette@google.com>
Date: Thu Sep 28 00:28:13 2017
,
Sep 28 2017
We'll need to do a push to devserver before we can completely declare victory. Passing this to the deputy second to be scheduled at his leisure.
,
Sep 28 2017
... Can't be scheduled completely at leisure. The change above needs to allow for a puppet push. An automated push should be complete before 23:00 today, so the devserver push should probably wait until tomorrow.
,
Sep 28 2017
I'm waiting for https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/687955 to land before doing any pushes, as I'm afraid of breaking stats otherwise.
,
Oct 2 2017
-> reassigning devserver push request to this week's secondary
,
Oct 3 2017
Declaring victory since most devservers were updated, but there were substantial issues.
Two devservers have locally committed git changes, breaking the ability to update.
* Talk to team about not doing this.
ssh connectivity was very, very unreliable.
* crbug.com/771258
I don't have access to these servers, and couldn't update them. They are not owned by Eng Labs.
* 172.22.39.161
* 172.22.39.162
* 172.22.39.163
* 172.22.39.164
,
Oct 4 2017
Argh! It seems the change didn't work: Apparently, 'restart devserver' doesn't emit a 'stopped' event.
,
Oct 12 2017
> Argh! It seems the change didn't work: Apparently, 'restart devserver'
> doesn't emit a 'stopped' event.
I've performed some controlled experiments, and as best I can tell, the code should have worked. I don't fully understand what's going on yet... In any event, I've confirmed that if you restart the provision-workqueue job, and _then_ restart the devserver, all works as expected.
,
Oct 12 2017
The behavior is coming from upstart. I've confirmed that if you change the start conditions for job A to depend on job B, then restart job B, the dependency isn't recognized until job A is also restarted. So... The fix here is to go through and manually restart provision-workqueue anywhere it's out of date.
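For illustration, the manual restart pass could be scripted roughly as below. This is only a sketch: the job name comes from the issue description, but the helper names, the use of `ssh` with `sudo restart`, and the host list are assumptions, not what was actually run.

```python
import subprocess

# Upstart job name taken from the issue description.
WORKQUEUE_JOB = "provision-workqueue"


def restart_command(host, job):
    """Build the ssh command that restarts one upstart job on one host.

    Assumption: the login user can run `sudo restart <job>` on the host.
    """
    return ["ssh", host, "sudo", "restart", job]


def restart_workqueue(hosts, run=subprocess.check_call):
    """Restart the workqueue job on each host; return the hosts that failed.

    `run` is injectable so the loop can be exercised without real hosts.
    """
    failed = []
    for host in hosts:
        try:
            run(restart_command(host, WORKQUEUE_JOB))
        except (subprocess.CalledProcessError, OSError):
            # Flaky ssh or a failed restart: record the host and keep going.
            failed.append(host)
    return failed
```

Calling `restart_workqueue(hosts)` with the real devserver list would return the hosts that still need attention, which matters given how unreliable ssh was during the push.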
,
Oct 13 2017
I've gone through every devserver known to puppet, and restarted the job. I then checked that all of the jobs were running, with a start time as expected. So... On the basis of reliable reports, I believe that the next time we push to devserver, the provision-workqueue job will be restarted along with it. However, I haven't _actually_ seen that happen. On that basis, marking this 'Fixed', but not 'Verified'...
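The "is the job running" part of that check could be automated along these lines. It is a sketch, assuming the usual upstart `status <job>` output of the form `job goal/state, process PID`; the function names are hypothetical, and it does not check start times or handle instance jobs.

```python
import re

# Matches lines such as "provision-workqueue start/running, process 1234"
# or "devserver stop/waiting" (no process when the job is not running).
_STATUS_RE = re.compile(
    r"^(?P<job>\S+) (?P<goal>\w+)/(?P<state>[\w-]+)"
    r"(?:, process (?P<pid>\d+))?"
)


def parse_status(line):
    """Parse one line of `status <job>` output; return None if unrecognized."""
    match = _STATUS_RE.match(line.strip())
    if match is None:
        return None
    pid = match.group("pid")
    return {
        "job": match.group("job"),
        "goal": match.group("goal"),
        "state": match.group("state"),
        "pid": int(pid) if pid else None,
    }


def is_running(line):
    """True only when the job's goal is 'start' and its state is 'running'."""
    status = parse_status(line)
    return (
        status is not None
        and status["goal"] == "start"
        and status["state"] == "running"
    )
```

Feeding each host's `status provision-workqueue` output to `is_running` would flag any devserver where the job did not come back up.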
Comment 1 by jrbarnette@chromium.org, Sep 27 2017
Owner: jrbarnette@chromium.org
Status: Started (was: Available)
There are two ways to deal with this:
A) Change the commands we run during push to include restarting the workqueue upstart job.
B) Change the workqueue upstart job to restart with the devserver job.
I'm going with plan B, because I think that will be more robust from a maintenance perspective.