Implement read-only mode in "vpython".
Issue description

In "vpython", each user assumes ownership of their VirtualEnv. In a multi-user environment, each user would run their own VirtualEnvs. Three mutations are necessary:
- The ability to write to the VirtualEnv during initial setup.
- The ability to write lock files in the "vpython" root.
- The ability to delete a VirtualEnv when pruning.

In order to deploy "vpython" as a system utility (via Puppet) and use a VirtualEnv for system purposes (e.g., "service manager"), we need to support less ad-hoc control over "vpython":
1) We want Puppet (root) to create a single VirtualEnv at Puppet run-time and share it with other users.
   - This means no locks, since there's no sane way to share filesystem locks.
2) We need to ensure that after an installation, all subsequent VirtualEnv usage is offline. Since VirtualEnvs are not written to after creation, this seems feasible. However, some details need to be worked out, including pruning old VirtualEnvs.

One problem with pruning is that Puppet will run while "service manager" is still running. If a new VirtualEnv is specified, Puppet can't delete the old one, since a Python instance may still be using it. However, the Python instance that is using it can't mark its usage due to an inability to lock the root-owned VirtualEnv.

Two options come to mind:

== OPTION A: Offline CIPD Cache Only ==

No read-only mode; per-user "vpython", but a shared CIPD / wheel cache. This cache would be provisioned and maintained by Puppet using "vpython" against the latest "infra_python" spec file.

With this option, Puppet would create a read-only CIPD package cache at provision time for "vpython" to use. Each user would run "vpython" as normal, self-lock, and build their own VirtualEnv. Because they're using the offline CIPD package cache, they can do a fully offline VirtualEnv setup.

Puppet's workflow would be:
1) Install "vpython".
2) Install the "infra_python" bundle.
3) Run "vpython -spec /path/to/infra_python/spec.venv -cipd-cache /path/to/offline-cache -dev install"
4) System services would be run through: "vpython -spec /path/to/infra_python/spec.venv -cipd-cache /path/to/offline-cache -- ./run.py etc."

Pros:
- Simple. This will take very few modifications and should be completely safe.
- Standard pruning is still in play, so there is no need for shared locks, and no concern about pruning a Python environment that is in use.

Cons:
- The first time any system utility is executed, a VirtualEnv will need to be created, introducing delay.

Con Mitigations:
- For some users, Puppet could pre-provision their VirtualEnv, mitigating delay.
- Puppet can assert correct VirtualEnv construction as root, giving high assurance that per-user construction will succeed.

== OPTION B: Read-only "vpython" ==

Implement a read-only mode for "vpython", preventing it from locking or pruning. Services executing in read-only mode would assume that the only "vpython" instances managing their environment are also running in read-only mode, making it safe. Puppet would be responsible for provisioning and pruning the VirtualEnvs during "infra_python" package setup.

Some problems arise:
- How do we clean up old environments? Without locking, we can't tell what is in use.
- This *requires* that system services use the same Python interpreter that Puppet used to set up the VirtualEnv, since that is baked into the VirtualEnv's identity.

Puppet's role would be:
1) Install "vpython".
2) Install the "infra_python" bundle.
3) Run "vpython -spec /path/to/infra_python/spec.venv -cipd-cache /path/to/offline-cache -dev install"
4) System services would be run through: "vpython -spec /path/to/infra_python/spec.venv -root /path/to/shared/root -read-only -- ./run.py etc."

"vpython install" would bump the last-used timestamp of the touched instance(s), delaying their pruning.

Pros:
- No first-run setup delays or pruning delays on execution.

Cons:
- More complex Puppet role.
- More code changes to "vpython".
- If anything root-owned ever ran against the same VirtualEnv root in non-read-only mode, it could prune active environments.

We need to think about how a root-owned "vpython" would know whether or not a VirtualEnv can be pruned. It knows immediately that the current one (defined by the spec) should not be pruned. Currently, "vpython" allows 7 days after a VirtualEnv was last touched (last "install"ed by Puppet) before it prunes it. Therefore, if a system is running "service_manager", installs a new spec, and doesn't reboot for 7 days, that VirtualEnv may be destroyed.

Thoughts?
Comment 1 by iannucci@chromium.org, Jul 21 2017
The problem with the "Puppet fully manages" idea is that Puppet runs continuously throughout the lifecycle of the system. Imagine:
1) Puppet runs, installs VENV_A.
2) System runs "service_manager" @ VENV_A.
3) Puppet runs, installs VENV_B.
In this case, how would Puppet know that (2) is still running?
TBH I'm in favor of Option A b/c it is complete and simple. Vadim had some concern about initial VirtualEnv setup delay during first run and the potential for failure, though, which is how Option B came about.
Jul 21 2017
OTOH currently Puppet rips "infra_python" out from underneath of running instances. If the VirtualEnv changes, or if the package layout or code changes, it's entirely possible for "service_manager" to start, then later run some code that expects a file or import to work and find that it no longer exists.
On Linux systems, services shut down while they are being upgraded. Is it reasonable at all to have Puppet shut down all "infra_python" services when it determines that an upgrade is warranted?
Even w/ individual "vpython" and lock protection, we're still in a situation where the "infra_python" CIPD package contents, layout, etc., could change while "infra_python" code is running. If the code references any externals after it starts up, this is a fundamental problem. Maybe having root prune the environment after 7 days isn't so bad...
Jul 21 2017
A few independent thoughts:
1. As I said before multiple times, I'm against option A, since it introduces the possibility of "delayed" errors (when building the per-user environment) that happen at uncontrollable, not-well-monitored moments. As many provisioning steps as possible should happen within the Puppet run (all of them, ideally). E.g., if there's not enough disk space to build the venv, we want the Puppet run to fail (keeping all existing services intact), instead of "succeeding", only for service_manager to fail to start after the next reboot (which happens at an unknown moment in time, perhaps delayed for days).
2. > OTOH currently Puppet rips "infra_python" out from underneath of running instances
Yeah, it kind of does. But in practice it looks similar to running "gclient sync" while the Buildbot process is still up: it updates files in place and doesn't change the directory inode or anything like that. So if processes do not load modules at runtime, it is mostly harmless (well... mostly).
Now, IIUC, vpython will try to _delete_ all files and then recreate the new environment in a new directory with a new inode. This is a more destructive procedure. Python may freak out if it finds that its sys.prefix has been deleted. (Or maybe not.)
I think "gc old venvs after N days" is actually better than both what we had and what we have with existing vpython implementation. If this is exposed as "vpython gc", we can also try to hook it up to run after reboot (though this may be complicated).
3. I've been thinking about idempotency of vpython installs w.r.t. Puppet integration. CIPD achieves this with a special "cipd puppet-check-updates" subcommand, which is used as an "onlyif" guard in an 'exec' resource. This is required to:
* Make sure Puppet reruns cipd if the previous install is "dirty".
* Avoid Puppet emitting a "something has changed!!!" signal when nothing has really changed. Without it, Puppet would be restarting all services that depend on cipd-installed packages all the time (thinking their code has changed).
I believe vpython will need similar functionality: a subcommand that says whether the install root is good enough or needs more work.
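For illustration, if vpython grew a check subcommand analogous to "cipd puppet-check-updates" (call it "vpython verify" here; the name and behavior are made up), the Puppet side could mirror the CIPD pattern:

exec { 'vpython-ensure-venv':
  # Hypothetical: "verify" exits 0 only when the install root is missing or dirty,
  # so the install command runs only when there is actual work to do.
  command => '/usr/local/bin/vpython -spec /path/to/infra_python/spec.venv install',
  onlyif  => '/usr/local/bin/vpython -spec /path/to/infra_python/spec.venv verify',
}

That way Puppet only reports a change (and restarts dependent services) when the VirtualEnv actually had to be rebuilt.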
Also, I'm curious how you see the API for installing stuff via Puppet with vpython. I can imagine a few variants:
vpython { "/opt/python-venv":
manifest => "<vpython manifest in plain text>"
...
}
or
vpython { "/opt/python-venv":
bundle => "infra/venv/crappy-python-env"
version => "git_revision:...."
...
}
(where 'bundle' is a CIPD package with the vpython manifest and all the necessary wheels).
--
I guess I see vpython+Puppet as a tool to install isolated virtual environments, not as a tool that runs some programs which may install environments. I mean, there's no single script that can have the vpython manifest in it; the manifest is supplied from outside as a parameter...
Jul 21 2017
Hm, I forgot that we actually want to deploy non-wheel code too (for infra_python). Maybe:
vpython { "/opt/infra-python":
package => "infra/infra_python", # CIPD package with all the code and manifest
manifest => "run.py", # the file that embeds the manifest, or manifest itself
...
}
It will either use a cipd {...} resource inside to install 'infra/infra_python', or will do the necessary cipd calls itself. (I'm even considering the possibility of making vpython its own flavor of CIPD client, so it can be fetched from the CIPD backend directly, the same way we bootstrap the cipd client now.)
The property I would very much like to see in the Puppet-level API is "atomicity": if you want to install something managed by vpython, you declare it as a single resource.
Jul 25 2017
Expanding more on the "gc" idea, what about:
1) Add a "managed" boolean flag to the VirtualEnv protobuf. "vpython" automatic pruning will ignore managed VirtualEnvs.
2) Add a "vpython provision" subcommand that accepts one or more "spec" files, provisions those VirtualEnvs, marks them all "managed: true", and writes the set of VirtualEnvs to a manifest file.
3) Add a "vpython gc" subcommand that prunes any VirtualEnv, managed or otherwise, that is not in the current manifest.
4) Add support in "run.py" to read a local file (//infra/vpython_root) containing the "vpython" root to use.
We will rig things such that:
1) Puppet uses "vpython provision" to ensure that a set of VirtualEnvs is installed into a root.
2) Puppet drops a "//infra/vpython_root" file into the CIPD package that it provisions, to let "run.py" know to use the "infra_python" vpython root.
3) We run "vpython gc" on boot (as part of starting service manager?).
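A rough Puppet-side sketch of that plan (everything here is hypothetical: the "provision" subcommand, the "-root" usage, and the paths are illustrative only):

exec { 'vpython-provision':
  # Hypothetical: point 1 above; provision the spec'd VirtualEnvs and mark them managed.
  command => '/usr/local/bin/vpython -root /opt/infra-python/venvs provision /opt/infra-python/spec.venv',
}
file { '/opt/infra-python/infra/vpython_root':
  # Hypothetical: point 2 above; the "//infra/vpython_root" marker that run.py would read.
  ensure  => file,
  content => "/opt/infra-python/venvs\n",
}

Point 3, running "vpython gc" on boot, could then be hooked up along the lines of the reboot-cron idea discussed earlier in this thread.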
Aug 16
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot