swarming.py collect --print-status-updates times out on swarming
Issue description
https://ci.chromium.org/p/infra/builders/luci.infra.try/Build%20Recipes%20Tester/b8939547300825591824 is a sample build. The specific step that fails is 'swarming.py collect'. I pass in --print-status-updates, but it only seems to print out one update. I looked at the code and, if I understand it correctly, it should print an update every 15 minutes. I also don't think the task has a timeout shorter than 15 minutes? There's an execution timeout of 30 minutes... Any ideas what's happening here? There's a separate issue where it's not actually collecting the task itself, although maybe there's a bug in the swarming.py collect code that causes it to hang and never collect the task.
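To make the expected behavior concrete, here is a minimal sketch of a collect loop that polls a task and prints a status line at most once per update interval. It is not swarming.py's actual code: `collect_with_status_updates`, the injected `get_state`/`now`/`sleep` callables, the `ACTIVE_STATES` tuple and the 60-second poll interval are all hypothetical; only the 15-minute update interval comes from the description above.

```python
import time
from typing import Callable, List, Optional

# States in which the sketch keeps waiting; an assumption, not taken from swarming.py.
ACTIVE_STATES = ("PENDING", "RUNNING")


def collect_with_status_updates(
    get_state: Callable[[], str],
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    update_interval: float = 15 * 60,
    poll_interval: float = 60,
) -> List[str]:
    """Poll a task until it leaves ACTIVE_STATES, printing throttled status lines.

    Hypothetical sketch: the clock and sleep are injected so the loop can be
    driven by a fake clock. Returns every line it printed.
    """
    printed: List[str] = []
    last_update: Optional[float] = None
    while True:
        state = get_state()
        if state not in ACTIVE_STATES:
            printed.append(f"task finished: {state}")
            print(printed[-1])
            return printed
        t = now()
        # Print at most one update per update_interval; the first poll always prints.
        if last_update is None or t - last_update >= update_interval:
            printed.append(f"[{t:.0f}s] still waiting, state={state}")
            print(printed[-1])
            last_update = t
        sleep(poll_interval)


if __name__ == "__main__":
    clock = [0.0]  # fake clock, in seconds

    def fake_sleep(seconds: float) -> None:
        clock[0] += seconds

    def fake_state() -> str:
        # Pretend the task is killed by a 30-minute execution timeout.
        return "RUNNING" if clock[0] < 30 * 60 else "TIMED_OUT"

    collect_with_status_updates(fake_state, now=lambda: clock[0], sleep=fake_sleep)
```

Under these assumptions, a task killed at 30 minutes yields only two progress lines (at 0s and 900s) before the final state. Note also that the sketch's loop exits only when the reported state leaves `ACTIVE_STATES`; nothing else bounds it, which is one way a collect step could appear to hang if the final state is never observed.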
Jul 30
Can someone familiar with swarming take a look?
Jul 31
Looks like a problem with the recipe. Tasks:
https://chromium-swarm.appspot.com/task?id=3f05918474816f10
https://chromium-swarm.appspot.com/task?id=3f05930303f1b010
In particular, the first task didn't print anything while the recipe ran. I think the right fix is to disable the I/O timeout. https://chromium-swarm.appspot.com/task?id=3f05918474816f10&show_raw=1
Jul 31
What do you mean it didn't print anything? https://chromium-swarm.appspot.com/task?id=3f05918474816f10&refresh=10&show_raw=1 doesn't print anything because the recipe is running; that's due to how kitchen works, afaik. I'm more confused about why `swarming.py collect` runs for 23 hours in https://chromium-swarm.appspot.com/task?id=3f05918474816f10&refresh=10 (same task). It doesn't print anything; it just sits there and hangs. And IIRC I've run this locally and it collected the result just fine. I'll double-check right now.
Jul 31
Don't be confused by the Milo output: the task https://chromium-swarm.appspot.com/task?id=3f05918474816f10 ran for exactly 30 minutes. That's why I recommend disabling the io_timeout.
Jul 31
Ah, but the task has no I/O timeout. It just shows "--" there, which I assume means there isn't one. Do I want to disable the execution timeout instead? Also, I'm not sure how to change that, since this task is triggered by buildbucket...
Jul 31
Errr, you're right. It has an execution timeout of 30m, though, which seems too low.
Jul 31
And yes, the default is low, to make sure people don't run long tasks without realizing it.
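To make the difference between the two limits concrete, here is a minimal sketch of the check a task watchdog could apply. `which_timeout_fires` and its parameters are hypothetical and are not Swarming's code; the sketch only models the behavior described in this thread, and treating a missing or zero I/O timeout as "disabled" is an assumption.

```python
from typing import Optional


def which_timeout_fires(
    elapsed: float,
    since_last_output: float,
    execution_timeout: float,
    io_timeout: Optional[float] = None,
) -> Optional[str]:
    """Return the limit a watchdog would enforce now, or None to keep running.

    Hypothetical sketch of the two limits discussed in the thread:
    - execution_timeout caps total run time, whether or not the task prints;
    - io_timeout caps time since the last output; None or 0 means disabled
      (an assumption for this sketch).
    All values are in seconds.
    """
    if elapsed >= execution_timeout:
        return "execution_timeout"
    if io_timeout and since_last_output >= io_timeout:
        return "io_timeout"
    return None
```

For a recipe that prints nothing while it runs (as under kitchen), `which_timeout_fires(elapsed=1800, since_last_output=1800, execution_timeout=1800)` returns `"execution_timeout"` even with the I/O timeout disabled. Under this model, disabling the I/O timeout alone would not change the outcome; the execution timeout itself has to be raised.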
Jul 31
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/687b3af44494680e7f4d0bacb45d09d6ec6dc084

commit 687b3af44494680e7f4d0bacb45d09d6ec6dc084
Author: Stephen Martinis <martiniss@chromium.org>
Date: Tue Jul 31 20:04:25 2018

tcmalloc builder: Remove cores dimension

Don't really need to specify cores, as only one bot has that builder dimension.

Bug: 868024, 869187
Change-Id: Ie59cf98fc59d4bbc869618bc2add13aeedf10759
Reviewed-on: https://chromium-review.googlesource.com/1157179
Commit-Queue: Stephen Martinis <martiniss@chromium.org>
Reviewed-by: Robbie Iannucci <iannucci@chromium.org>
Cr-Commit-Position: refs/heads/master@{#579539}
[modify] https://crrev.com/687b3af44494680e7f4d0bacb45d09d6ec6dc084/infra/config/global/cr-buildbucket.cfg
Comment 1 by martiniss@chromium.org, Jul 30