Bots failing bot_update due to googlesource replication lag |
|||||
Issue descriptionhttps://build.chromium.org/p/chromium.android/builders/Android%20Cronet%20x86%20Builder%20%28dbg%29?numbuilds=200 Looks like this bot has failed bot_update with an exception four times in the last few days: https://build.chromium.org/p/chromium.android/builders/Android%20Cronet%20x86%20Builder%20%28dbg%29/builds/68393 https://build.chromium.org/p/chromium.android/builders/Android%20Cronet%20x86%20Builder%20%28dbg%29/builds/68385 https://build.chromium.org/p/chromium.android/builders/Android%20Cronet%20x86%20Builder%20%28dbg%29/builds/68361 https://build.chromium.org/p/chromium.android/builders/Android%20Cronet%20x86%20Builder%20%28dbg%29/builds/68211 The bot_update exception always looks like this: Traceback (most recent call last): File "/b/build/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/run.py", line 376, in _old_run recipe_result = recipe_script.run(api, properties) File "/b/build/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/loader.py", line 98, in run self.run_steps, properties, self.PROPERTIES, api=api) File "/b/build/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/loader.py", line 627, in invoke_with_properties **additional_args) File "/b/build/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/loader.py", line 588, in _invoke_with_properties return callable_obj(*props, **additional_args) File "/b/build/scripts/slave/recipes/cronet.py", line 189, in RunSteps chromium_apply_config=builder_config.get('chromium_apply_config')) File "/b/build/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/recipe_api.py", line 745, in _inner return func(*a, **kw) File "/b/build/scripts/slave/recipe_modules/cronet/api.py", line 54, in init_and_sync droid.init_and_sync(use_bot_update=True) File "/b/build/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/recipe_api.py", line 745, in _inner return func(*a, **kw) File "/b/build/scripts/slave/recipe_modules/chromium_android/api.py", line 133, in init_and_sync spec, refs=refs, with_branch_heads=with_branch_heads) File "/b/build/scripts/slave/.recipe_deps/recipe_engine/recipe_engine/recipe_api.py", line 745, in _inner return func(*a, **kw) File "/b/build/scripts/slave/.recipe_deps/depot_tools/recipes/recipe_modules/bot_update/api.py", line 327, in ensure_checkout co_root = result['root'] KeyError: 'root' CC'ing agable@ who made the last change to api.py and jbudorick@ because he always seems to know why Cronet's bots fail :)
,
Oct 6 2017
I should mention that this appears to be affecting many Cronet bots. For example here's another: https://build.chromium.org/p/chromium.android/builders/Android%20Cronet%20ARM64%20Builder%20%28dbg%29?numbuilds=200 The different bots are running on different slaves.
,
Oct 6 2017
Oh interesting, yep, it's happening in a bunch of places, but with a different hash every time.
,
Oct 6 2017
Uh oh, I found it. It's googlesource latency. Example: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.android%2FAndroid_Cronet_x86_Builder__dbg_%2F68361%2F%2B%2Frecipes%2Fsteps%2Fbot_update%2F0%2Fstdout Tries to check out 3df0d0574d1f53e831bd5d9f7ae9c825c0c8cb38 Runs 'git fetch' in the cache, and gets: From https://chromium.googlesource.com/chromium/src 0c0ed17d896b..71b25bc67d18 master -> master But 71b is the *parent* of 3df (https://chromium.googlesource.com/chromium/src/+/3df0d0574d1f53e831bd5d9f7ae9c825c0c8cb38). So even though 3df has landed in the repo and triggered a job to start, when that job fetches the repo from GoB it doesn't download that commit. That commit was landed at :49 The job was started at :52 The fetch started at least a minute and a half after that (given step timings prior to bot_update). So that's a GoB replication latency of ~5 minutes, which is unnaceptably slow. Filing an internal bug against them.
,
Oct 6 2017
b/67501786 Adding Trooper label so trooper is aware of this issue. If others report similar symptoms, please dupe into this bug.
,
Oct 19 2017
,
Oct 20 2017
New trooper checking in. Looks like the consistency hiccup is still occurring very occasionally: https://build.chromium.org/p/chromium.android/builders/Android%20Cronet%20x86%20Builder%20%28dbg%29/builds/69451 https://build.chromium.org/p/chromium.android/builders/Android%20Cronet%20x86%20Builder%20%28dbg%29/builds/69430 https://build.chromium.org/p/chromium.android/builders/Android%20Cronet%20x86%20Builder%20%28dbg%29/builds/69417 Not sure what else we can do here. Over to agable to decide if we should just close this as wontfix.
,
Oct 20 2017
On the day I noticed it, every time I looked at https://build.chromium.org/p/chromium.gpu.fyi/console there were 4 or 5 purple bots. If this could happen on CQ bots wouldn't you have a day of people unable to land changes? Unless there's reason to believe this is extremely rare and won't get worse it doesn't seem like something to be ignored.
,
Oct 20 2017
Please post updates on the internal bug; there's nothing we can do from here.
,
Dec 1 2017
Keeping this bug open isn't helping; all progress here is tracked in GoB team's code-yellow bug: b/35998605
,
Jan 3 2018
|
|||||
►
Sign in to add a comment |
|||||
Comment 1 by aga...@chromium.org
, Oct 6 2017