Changes in injected bot_config.py via bot_config_script in bots.cfg don't force bots to update
Issue description
There's an ongoing problem where a luci-config change in infradata.git that *only* modifies an injected scripts/foo.py script is not correctly propagated to the bots.
Workaround: make a whitespace change to bot_config.py to force all bots to restart.
Resolving issue 706449 would likely help with this, but it should be fixable without that functionality, hence not marking it as a blocking bug.
Apr 10 2017
rev is the revision of the config set, not of an individual file.
Apr 10 2017
This is likely a bug in the Swarming server itself, not in the config service. The logic is in https://github.com/luci/luci-py/blob/master/appengine/swarming/server/bot_groups_config.py
Jan 9 2018
Looks like some patches are being picked up by the bots, e.g. https://crrev.com/i/540998 caused 62 bots to be quarantined, but revert https://crrev.com/i/542418 was not picked up until all affected bots were manually rebooted.
Jan 11 2018
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/config/+/464847672015d64f25f83fa1b9fa2ece2dcd649e
commit 464847672015d64f25f83fa1b9fa2ece2dcd649e
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Thu Jan 11 21:46:42 2018
Jan 11 2018
I just did a test change that touches only the android.py script on chromium-swarm-dev, and it propagated correctly.
Looking at the code, it should work: the body of the config script is used to build the bot_group_cfg_version digest, and bots restart whenever they detect a change to bot_group_cfg_version. Therefore they restart when the config script changes.
The worst that can happen is a bot picking up the new config, restarting, then picking up the old config, and then restarting again to pick up the new config. Eventually all bots should pick up the new config.
The duration of this flapping period can be shortened by reducing the _BotGroups cache expiration time (currently 30 sec). With the refactoring done to bot_groups_config in Issue 795168 this is relatively safe to do, so I'll probably reduce the caching duration to 1 sec.
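To illustrate why a script-only change should trigger a restart, here is a minimal sketch of a version digest that covers the script body. It is not the actual code from bot_groups_config.py: the function names, the choice of SHA-256, the "hash:" prefix and the input fields are all assumptions made for illustration.

```python
import hashlib


def compute_cfg_version(dimensions, owners, bot_config_script_body):
    """Hypothetical digest over a bot group's settings and its script body.

    Any change to the script body changes the digest, even when nothing
    else in the group config was touched.
    """
    h = hashlib.sha256()
    fields = (
        repr(sorted(dimensions)),
        repr(sorted(owners)),
        bot_config_script_body,
    )
    for field in fields:
        data = field.encode("utf-8")
        # Length-prefix each field so adjacent fields cannot blur together.
        h.update(str(len(data)).encode("ascii") + b":")
        h.update(data)
    return "hash:" + h.hexdigest()[:16]


def should_restart(version_seen_by_bot, version_on_server):
    """A bot restarts whenever the version it last saw differs (assumed rule)."""
    return version_seen_by_bot != version_on_server
```

With a digest like this, editing only android.py yields a new version string, which is enough for the bot to notice the change without any edit to bot_config.py.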
Jan 11 2018
Ok thanks. One thing we could do is to force fetching the bot_config_cfg_version via memcache instead of a local value. It'd reduce the performance a bit but would remove the flapping.
Jan 12 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/1a03254ca472d74a666bd93802bbed1d03781dad

commit 1a03254ca472d74a666bd93802bbed1d03781dad
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Fri Jan 12 15:16:16 2018

[swarming] Reduce the chance of "flapping" of bot config script propagation.

When we change additional bot config script, corresponding bots see the different bot_groups_cfg_version and proceed to restart. But bot_groups_cfg_version is part of _BotGroups tuple, cached in local GAE process memory for 30 sec. It means once bot restarts, it can hit some other GAE instance that still hold stale cache value of _BotGroups, and this will cause the bot to restart again.

To avoid this, we reduce the expiration time of bot groups config cache to 1 sec and at the same time, let bots sleep 2 sec before they restart. This makes sure restarted bots always hit a fresh cache on GAE side.

For the common code path, 1 sec expiration means each GAE instance each second will fetch a tiny BotsCfgHead entity (using ndb memcache), discover that nothing has changed, and will continue using existing in-memory cache it already holds.

R=maruel@chromium.org
BUG= 710033

Change-Id: Ifc072636b92a4e6fc2134ef66d1e1d0aeed4bbdc
Reviewed-on: https://chromium-review.googlesource.com/862842
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/1a03254ca472d74a666bd93802bbed1d03781dad/appengine/swarming/server/bot_groups_config.py
[modify] https://crrev.com/1a03254ca472d74a666bd93802bbed1d03781dad/appengine/swarming/swarming_bot/bot_code/bot_main.py
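The caching scheme described in the commit message can be sketched roughly as follows. This is not the luci-py implementation: the class name BotGroupsCache, the fetch_head_rev and fetch_full_config callbacks, and the injectable clock are hypothetical names used only to show the idea of a short-lived in-memory cache that re-validates itself against a cheap "head" record.

```python
import time


class BotGroupsCache:
    """Hypothetical per-process cache with a cheap revision check on expiry."""

    def __init__(self, fetch_head_rev, fetch_full_config,
                 expiration_sec=1.0, clock=time.monotonic):
        self._fetch_head_rev = fetch_head_rev        # cheap: returns a revision id
        self._fetch_full_config = fetch_full_config  # expensive: returns the config
        self._expiration_sec = expiration_sec
        self._clock = clock
        self._value = None
        self._rev = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._value is not None and now < self._expires_at:
            # Still fresh: serve from memory without any fetch.
            return self._value
        head = self._fetch_head_rev()
        if self._value is None or head != self._rev:
            # Revision changed (or first call): reload the full config.
            self._value = self._fetch_full_config()
            self._rev = head
        # Otherwise nothing changed: keep the existing in-memory value.
        self._expires_at = now + self._expiration_sec
        return self._value
```

Under these assumptions, a bot that sleeps 2 sec before restarting always comes back after every instance's 1 sec window has expired, so whichever instance it hits re-checks the head revision and serves the new config.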
Jan 22 2018
I believe this should be fixed. Next time you update android.py (or a similar script), please try NOT to touch bot_config.py, and report if the change doesn't propagate everywhere.
Note that updating bot_config_script doesn't change "Expected bot version", so there will be no yellow highlight for bots running the old config. They are still restarted to pick it up after they finish their current task. This is indicated by the line "Restarting to pick up new bots.cfg config" in the Events list on the bot page.
Jan 22 2018
yayyyy, thank you Vadim!
Jan 23 2018
Should we remove this block comment then to avoid confusion? https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/scripts/bot_config.py#34
Jan 23 2018
The following revision refers to this bug:
https://chrome-internal.googlesource.com/infradata/config/+/fa23ad73f9d146c27bbd0a2ee329b75376296c0f
commit fa23ad73f9d146c27bbd0a2ee329b75376296c0f
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Tue Jan 23 20:37:03 2018
Comment 1 by no...@chromium.org
, Apr 10 2017