Exec slaves.cfgs on slave-side in a subprocess
Issue description
Mar 1 2017
Mar 1 2017
Apr 26 2017
Erik - can you help find an owner for this? It's a p1 from a PM.
May 24 2017
Erik? Who would be the best owner for this? Please update. Thanks!
May 24 2017
I'd like more info on what would need to be done here and why it prevents the issue in the postmortem.
May 24 2017
The issue is that slaves.cfg files are executed on masters to extract some configuration (probably the master->builder mapping; machenbach@ will know more). As a result, they can affect master processes and take them down, as happened in the outage described in the postmortem. If we executed slaves.cfg files in a subprocess, they would not be able to affect the global Python state of a master, which would prevent outages of this kind.
May 24 2017
Ok, so we still want to execute them master-side but in a subprocess, right?
May 27 2017
Yes.
May 29 2017
The other way around. The slave-side execution should be done in a subprocess. slaves.cfg is a master configuration file and it is standard buildbot logic to execute the slaves.cfg on the master. Chrome has extra code that also makes slaves execute the slaves.cfg on startup to figure out which master they belong to. This execution might go wrong and prevent the slave from starting up properly, which happened in the underlying bug.
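To make the proposal concrete, here is a minimal sketch of what running a slaves.cfg in a subprocess could look like. It is not the actual Chrome infra code: the helper name `load_slaves_cfg`, the timeout, and the assumption that the file defines a top-level `slaves` list of JSON-serializable dicts are all illustrative, and it is written for Python 3 rather than the Python 2 that buildbot used at the time. The child interpreter executes the file and passes the result back as JSON, so a crash, `sys.exit()`, or hang inside the config only kills the child.

```python
import json
import subprocess
import sys

# Hypothetical child-side script: executes the config file in a fresh
# interpreter and writes its `slaves` value to stdout as JSON.
_CHILD_SCRIPT = r"""
import json, sys
real_stdout = sys.stdout
sys.stdout = sys.stderr  # keep prints from the config out of the JSON stream
namespace = {}
with open(sys.argv[1]) as f:
    exec(compile(f.read(), sys.argv[1], "exec"), namespace)
real_stdout.write(json.dumps(namespace.get("slaves", [])))
"""


def load_slaves_cfg(path, timeout=30):
    """Return the `slaves` list defined by the file at `path`, or None on failure.

    The file is executed in a separate Python process, so exceptions,
    sys.exit() calls, or global-state changes in it cannot affect the caller.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD_SCRIPT, path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
            check=True,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # The config crashed, exited non-zero, or hung; the caller survives.
        return None
    try:
        return json.loads(proc.stdout.decode("utf-8"))
    except ValueError:
        # The child produced something that is not valid JSON.
        return None
```

A caller on the slave side could then treat a `None` result as "could not determine the master" and fall back or retry, instead of failing to start.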
Sep 7 2017
This is blocking closing of cit-pm-18. Any updates on this?
Sep 10 2017
No updates. We're pushing to get the LUCI migration done, so changes to buildbot will continue to be lower priority.
Aug 17
|||||||
Comment 1 by benhenry@chromium.org, Mar 1 2017