Baremetal slaves don't reconnect to masters after being disassociated/re-associated.
Issue description

A baremetal slave's startup kicks the BuildBot slave process exactly once. At that point the slave asks, "do I belong to a master?" If the answer is "no", the process quits and the slave sits idle. If the slave is later added to a master, it does not re-poll its state, so a trooper has to reboot the machine or kick the slave process manually.

This leads to unnecessary maintenance, especially on the CrOS waterfalls, where slaves are routinely detached from and re-attached to masters as part of typical builder layout adjustments. Troopers often forget the "kick new slaves" step, leaving parts of the waterfall offline for extended periods of time.

GCE builders have a monitor process wrapping BuildBot that kicks it periodically. It would be really great to have this on baremetal/VM systems too. Maybe BuildBot itself could be a subordinate of service manager?
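To make the request concrete, here is a minimal sketch of the kind of monitor loop described above. It is not the GCE monitor's actual code. The `SlaveMonitor` class, the `has_master` callable, and the command line are all hypothetical names chosen for illustration; how a slave really determines its master assignment is not shown here.

```python
import subprocess
import time
from typing import Callable, List


class SlaveMonitor:
    """Hypothetical wrapper that re-polls master assignment periodically."""

    def __init__(self, has_master: Callable[[], bool], command: List[str],
                 spawn=subprocess.Popen):
        self.has_master = has_master  # assumed check: "do I belong to a master?"
        self.command = command        # assumed command line for the slave process
        self.spawn = spawn            # injectable so the loop can be tested
        self.proc = None

    def poll_once(self) -> str:
        """Run one check and return what was done."""
        # A process handle whose poll() returns None is still alive.
        if self.proc is not None and self.proc.poll() is None:
            return "running"
        # Not running: only start it if the slave is assigned to a master.
        if not self.has_master():
            return "idle"
        self.proc = self.spawn(self.command)
        return "started"

    def run_forever(self, interval_s: float = 300.0) -> None:
        """Repeat the check forever instead of asking exactly once."""
        while True:
            self.poll_once()
            time.sleep(interval_s)
```

With a loop like this, a slave that is idle because it has no master would pick up a new assignment on the next poll, without a reboot or a manual kick. The sketch does not stop a running slave whose master assignment is later removed.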
Mar 28 2016
+dsansome fyi
Mar 29 2016
Yes, I'd love to run BuildBot under service manager. We'd probably need to add a per-service config option to control how often a service is restarted.
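As a rough illustration of what such a per-service option could look like, here is a sketch. The `ServiceConfig` type and its `restart_interval_s` field are hypothetical and do not reflect service manager's actual configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceConfig:
    """Hypothetical per-service settings; field names are assumptions."""
    name: str
    # Seconds between forced restarts; None means never restart periodically.
    restart_interval_s: Optional[float] = None


def restart_due(cfg: ServiceConfig, started_at: float, now: float) -> bool:
    """Return True once the service has run for at least its restart interval."""
    if cfg.restart_interval_s is None:
        return False
    return now - started_at >= cfg.restart_interval_s
```

Keeping the interval optional means existing services keep their current behavior, and only BuildBot would need to opt in.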
Apr 7 2016
Jun 6 2016
Comment 1 by pgervais@chromium.org, Mar 28 2016
Status: Assigned (was: Untriaged)