The rollout process (currently being done on chromium-swarm-dev) is a bit clunky:
1. Modify bots.cfg first (adding require_gce_vm_token for ALL GCE bots that currently use require_luci_machine_token, but also keeping require_luci_machine_token).
2. Land the change, wait until it applies everywhere.
3. Modify bot_config.py to start sending GCE VM tokens if the bot runs on GCE.
4. Land the change, confirm everything still works and bots use GCE VM auth for real (via viceroy console).
5. Wait a month or so... (in case something breaks horribly).
6. Remove require_luci_machine_token{} fallbacks from bots.cfg.
7. Remove reading of luci token files from bot_config.py, remove luci_machine_tokend cron and other related stuff.
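For illustration, steps 3 and 4 could look roughly like the sketch below. This is not the actual bot_config.py change. It assumes the standard GCE metadata endpoint for VM identity tokens, and the header names X-Luci-Gce-Vm-Token and X-Luci-Machine-Token are assumptions here; the helper names (identity_token_url, fetch_gce_vm_token, auth_headers) and the read_luci_token callback are hypothetical.

```python
import urllib.parse
import urllib.request

# Standard GCE metadata server endpoint that returns a signed VM identity token.
METADATA_URL = (
    'http://metadata.google.internal/computeMetadata/v1/'
    'instance/service-accounts/default/identity')


def identity_token_url(audience):
    """Builds the metadata URL for a full-format identity token."""
    query = urllib.parse.urlencode({'audience': audience, 'format': 'full'})
    return METADATA_URL + '?' + query


def fetch_gce_vm_token(audience, timeout=5):
    """Fetches the token; raises OSError (e.g. URLError) when not on GCE."""
    req = urllib.request.Request(
        identity_token_url(audience),
        headers={'Metadata-Flavor': 'Google'})  # required by the metadata server
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode('ascii')


def auth_headers(audience, fetch_vm_token=fetch_gce_vm_token,
                 read_luci_token=lambda: None):
    """Prefers the GCE VM token, falls back to the LUCI machine token.

    Header names are assumptions, not confirmed by this thread.
    """
    try:
        token = fetch_vm_token(audience)
    except OSError:
        token = None  # not on GCE, or metadata server unreachable
    if token:
        return {'X-Luci-Gce-Vm-Token': token}
    luci_token = read_luci_token()  # hypothetical reader of the luci token file
    if luci_token:
        return {'X-Luci-Machine-Token': luci_token}
    return {}
```

Keeping the LUCI token fallback in auth_headers mirrors keeping require_luci_machine_token in bots.cfg: either side can be rolled back independently until the fallback is removed in the last two steps.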
Fully deployed to -dev and works fine. It even works on Windows GCE bots and on dockerized GCE bots (so it looks like the metadata server is reachable from inside docker containers).
Boring graph: http://shortn/_vwFvKDW4xj (the 2 QPS still using luci tokens come from non-GCE bots).
Next step: fully deploy this to prod (perhaps in stages) before the production freeze, keeping the fallback in place. Wait until mid-January, then remove the fallback.
Comment 1 by bugdroid1@chromium.org, Nov 5