
Issue 767220


Issue metadata

Status: Fixed
Closed: Oct 2017
Pri: 1
Type: Bug




Restart the payload copy workqueue during push to devserver

Reported by jrbarnette@chromium.org, Sep 20 2017

Issue description

With the addition of the provision payload copying workqueue,
there are now two upstart jobs running on a lab devserver:
    devserver
    provision-workqueue

If we perform an update to the devserver code for any purpose,
we need to restart both jobs.  Currently, the standard push to
devserver only knows how to restart the 'devserver' job, not
the 'provision-workqueue'.  We should fix that.

 
Labels: -Pri-2 Pri-1
Owner: jrbarnette@chromium.org
Status: Started (was: Available)
There are two ways to deal with this:
 A) Change the commands we run during push to include restarting
    the workqueue upstart job.
 B) Change the workqueue upstart job to restart with the devserver
    job.

I'm going with plan B, because I think that will be more robust
from a maintenance perspective.
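For comparison, plan A would put the full job list into the push itself, so every future job would also have to be added there. A minimal Python sketch of that approach, assuming the push runs upstart's `restart JOB` command under sudo on the devserver; `DEVSERVER_JOBS`, `restart_commands` and `restart_all` are hypothetical names, not part of the real push tooling:

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical: the upstart jobs a devserver push would have to restart
# under plan A. Job names are from this issue; the helpers are illustrative.
DEVSERVER_JOBS: List[str] = ["devserver", "provision-workqueue"]


def restart_commands(jobs: Sequence[str] = DEVSERVER_JOBS) -> List[List[str]]:
    """Build one upstart `restart JOB` command per job (assumes sudo is needed)."""
    return [["sudo", "restart", job] for job in jobs]


def _run_checked(cmd: List[str]) -> None:
    # Raise CalledProcessError if a restart fails, so the push stops early.
    subprocess.run(cmd, check=True)


def restart_all(run: Callable[[List[str]], None] = _run_checked) -> None:
    """Restart every job in order; `run` can be swapped out for a dry run."""
    for cmd in restart_commands():
        run(cmd)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    restart_all(run=lambda cmd: print(" ".join(cmd)))
```

The weakness plan B avoids is visible here: DEVSERVER_JOBS is one more list that has to be kept in sync with whatever upstart jobs the devserver actually runs.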


Comment 2 by bugdroid1@chromium.org, Sep 28 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/27ea686fef5572bc5b06ebd03f67068d46848873

commit 27ea686fef5572bc5b06ebd03f67068d46848873
Author: Richard Barnette <jrbarnette@google.com>
Date: Thu Sep 28 00:28:13 2017

Owner: akes...@chromium.org
Status: Assigned (was: Started)
We'll need to do a push to devserver before we can completely
declare victory.

Passing this to the deputy second to be scheduled at his leisure.
... Can't be scheduled completely at leisure.  The change above
has to go out in a puppet push first.  An automated push should be
complete before 23:00 today, so the devserver push should probably
wait until tomorrow.

I'm waiting for https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/687955 to land before doing any pushes, as I'm afraid of breaking stats otherwise.
Cc: akes...@chromium.org
Owner: dgarr...@chromium.org
-> reassigning devserver push request to this week's secondary
Owner: jrbarnette@chromium.org
Declaring victory since most devservers were updated, but there were substantial issues.

Two devservers have locally committed git changes, which breaks the ability to update them.
 * Talk to team about not doing this.
ssh connectivity was very, very unreliable.
 * crbug.com/771258
I don't have access to these servers, and couldn't update them. They are not owned by Eng Labs.
 * 172.22.39.161
 * 172.22.39.162
 * 172.22.39.163
 * 172.22.39.164
Argh!  It seems the change didn't work:  Apparently, 'restart devserver'
doesn't emit a 'stopped' event.

> Argh!  It seems the change didn't work:  Apparently, 'restart devserver'
> doesn't emit a 'stopped' event.

I've performed some controlled experiments, and as best I can tell, the
code should have worked.  I don't fully understand what's going on yet...

In any event, I've confirmed that if you restart the provision-workqueue
job, and _then_ restart the devserver, all works as expected.
The behavior is coming from upstart.  I've confirmed that if you
change the start conditions for job A to depend on job B, then restart
job B, the dependency isn't recognized until job A is also restarted.
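The observed behavior can be reproduced with a toy Python model built on one assumption: a running job keeps the stop conditions it was started with, and only picks up edited conditions the next time it starts. This is an illustration of the symptom, not upstart's implementation; `Job`, `Init` and the event strings are hypothetical, loosely patterned on upstart's `start on started X` / `stop on stopping X` stanzas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Job:
    """Toy model of an upstart job (an assumption, not upstart's real logic)."""
    name: str
    start_on: Set[str] = field(default_factory=set)  # on-disk start conditions
    stop_on: Set[str] = field(default_factory=set)   # on-disk stop conditions
    running: bool = False
    starts: int = 0
    # Stop conditions captured when the job last started; later edits to
    # stop_on do not affect a running job until it starts again.
    active_stop_on: Set[str] = field(default_factory=set)

    def start(self) -> None:
        self.running = True
        self.starts += 1
        self.active_stop_on = set(self.stop_on)


class Init:
    def __init__(self, jobs: List[Job]) -> None:
        self.jobs: Dict[str, Job] = {job.name: job for job in jobs}

    def emit(self, event: str) -> None:
        for job in self.jobs.values():
            if job.running and event in job.active_stop_on:
                job.running = False
            elif not job.running and event in job.start_on:
                job.start()

    def restart(self, name: str) -> None:
        # Model a restart as stop + start that announces both transitions.
        self.emit(f"stopping {name}")
        self.jobs[name].start()
        self.emit(f"started {name}")


dev = Job("devserver")
wq = Job("provision-workqueue")  # old config: no dependency on devserver
init = Init([dev, wq])
dev.start()
wq.start()

# Edit the workqueue config on disk so it follows devserver (job A -> job B).
wq.start_on = {"started devserver"}
wq.stop_on = {"stopping devserver"}

init.restart("devserver")
print(wq.starts)  # still 1: the running job never saw the new stop condition

init.restart("provision-workqueue")  # picks up the new config
init.restart("devserver")            # now restarts the workqueue as well
print(wq.starts)  # 3
```

In this model, restarting job A once is enough to make every later restart of job B carry A along, which matches the manual fix.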

So... The fix here is to go through and manually restart provision-workqueue
anywhere it's out of date.
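One way to pick the hosts, assuming "out of date" means the provision-workqueue job was started before the new upstart config reached the host, is sketched below in Python. The host names, timestamps and helper names are hypothetical, and the ssh/sudo invocation is an assumption about how the lab devservers are reached.

```python
from datetime import datetime
from typing import Dict, Iterable, List

JOB = "provision-workqueue"


def out_of_date_hosts(job_started: Dict[str, datetime], config_changed: datetime) -> List[str]:
    """Hosts whose running job started before the new config was deployed.

    Such a job is still running with its old start/stop conditions, so it
    has to be restarted by hand once.
    """
    return sorted(host for host, started in job_started.items() if started < config_changed)


def remediation_commands(hosts: Iterable[str]) -> List[List[str]]:
    """One ssh command per host that restarts only the workqueue job."""
    return [["ssh", host, "sudo", "restart", JOB] for host in hosts]


if __name__ == "__main__":
    # Hypothetical host names and times, for illustration only.
    deployed = datetime(2017, 9, 28, 23, 0)
    started = {
        "devserver-a": datetime(2017, 9, 20, 12, 0),  # predates the change
        "devserver-b": datetime(2017, 9, 29, 9, 30),  # started after the change
    }
    for cmd in remediation_commands(out_of_date_hosts(started, deployed)):
        print(" ".join(cmd))
```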

Status: Fixed (was: Assigned)
I've gone through every devserver known to puppet, and restarted
the provision-workqueue job.  I then checked that all of the jobs
were running, with a start time as expected.
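A check like this could be scripted per host. The Python sketch below assumes the classic upstart `status JOB` output shape (`JOB GOAL/STATE, process PID`); `JobStatus`, `parse_status` and `is_running` are hypothetical helpers. The PID it extracts could then be passed to something like `ps -o lstart= -p PID` to read the process start time.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of upstart's `status JOB` output, e.g.
#   "provision-workqueue start/running, process 1234"
#   "provision-workqueue stop/waiting"
_STATUS_RE = re.compile(
    r"^(?P<job>\S+) (?P<goal>\w+)/(?P<state>\w+)(?:, process (?P<pid>\d+))?"
)


class JobStatus(NamedTuple):
    job: str
    goal: str
    state: str
    pid: Optional[int]


def parse_status(line: str) -> JobStatus:
    """Split one status line into job name, goal, state and optional PID."""
    m = _STATUS_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized status line: {line!r}")
    pid = m.group("pid")
    return JobStatus(m.group("job"), m.group("goal"), m.group("state"),
                     int(pid) if pid else None)


def is_running(status: JobStatus) -> bool:
    """True only for a job that wants to run, is running, and has a process."""
    return status.goal == "start" and status.state == "running" and status.pid is not None
```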

So...  On the basis of reliable reports, I believe that the next
time we push to devserver, the provision-workqueue job will be
restarted along with it.  However, I haven't _actually_ seen that
happen.  On that basis, marking this 'Fixed', but not 'Verified'...
