Issue metadata
Sign in to add a comment
|
ts_mon may break running swarming task when upgrading luci-config schema on swarming |
||||||||||||||||||||||||
Issue descriptionRepro: - Add new field in https://github.com/luci/luci-py/blob/master/appengine/swarming/proto/config.proto - Roll new Swarming version to chromium-swarm.appspot.com - Push new config.cfg in the git repo fetched by luci-config Expected: Seamless upgrade. Actual: If there were tasks still running on the old Swarming version, ts_mon will cause an exception in task_update when flushing. The last error was: ConfigFormatError: 11:1 : Message type "SettingsCfg" has no field named "ui_client_id". when https://github.com/luci/luci-py/commit/e02bb087183bbf2323a0229eda4266b51d5d74d6#diff-803a4899c7fea28eed7aee90b7f44fa6 was rolled in. This is because ts_mon flushes on arbitrary versions of the code, not only on the default version. This killed the running tasks that hit this point repeatedly. Workaround: Wait for *all* task to complete when updating to a new schema before starting to use it. Which means a bit over 1 hour.
,
Aug 16 2016
By design, a bot that is running a task locks itself to the version that was used to get the task, e.g. 123-abc-dot-chromium-swarm.appspot.com. So yes, bots hit non-default version while running a task if the default version is changed while the task is running.
,
Aug 16 2016
I see. Well, another way to do this is to roll only backwards-compatible schema changes. E.g. don't just delete or rename a field, but add a new one first, switch all usage to the new field, and after some time, delete the now unused field. It's more work, but eliminates the potential downtime.
,
Sep 1 2016
,
Sep 12 2016
Removing Internals>Metrics since this is not related to UMA or experiments.
,
Sep 13 2016
it was a backward-compatible change from the proto perspective: add new field was added what needs to be done is to store the config in the datstore in binary format, not text format. Buildbucket had the same problem and this solution worked well. https://chromium-review.googlesource.com/c/367886/ https://chromium-review.googlesource.com/c/368355/ Swarming uses components.config's feature to store the config in the datastore, so fixing this bug requires changes to components.config https://cs.chromium.org/chromium/infra/luci/appengine/components/components/config/remote.py?q=store_last_good+file:remote&sq=package:chromium&l=121
,
Mar 29 2017
|
|||||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||||
Comment 1 by sergeybe...@chromium.org
, Aug 16 2016