swarming.py hangs with no output until killed by builder.
Issue description

It started showing up this afternoon (12/08). See an example build: https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/tricky-tot-chrome-pfq-informational/builds/7230

Error message:

14:46:31: INFO: 14:46:31: ERROR: BaseException in _RunParallelStages <class 'chromite.lib.parallel.ProcessSilentTimeout'>: No output from <_BackgroundTask(_BackgroundTask-7:7:2, started)> for 8640 seconds
Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/cbuildbot/builders/generic_builders.py", line 120, in _RunParallelStages
    parallel.RunParallelSteps(steps)
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 679, in RunParallelSteps
    return [queue.get_nowait() for queue in queues]
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 676, in RunParallelSteps
    pass
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 562, in ParallelTasks
    raise BackgroundFailure(exc_infos=errors)
BackgroundFailure: <class 'chromite.lib.parallel.ProcessSilentTimeout'>: No output from <_BackgroundTask(_BackgroundTask-7:7:2, started)> for 8640 seconds
,
Dec 11 2017
Adding this week's folks. dgarrett: is this an infra issue? I'm not sure how we even know this is a "provision error"; the output is pretty opaque to me.
,
Dec 11 2017
,
Dec 11 2017
Oh... no, that's not a provision error. There is a long-standing problem with swarming.py incorrectly appearing to hang.
,
Dec 11 2017
I thought xixuan@ had fixed this previously.
,
Dec 12 2017
,
Dec 12 2017
Re #6, yes. I will prepare a fix later today.
,
Dec 13 2017
Issue 794125 has been merged into this issue.
,
Dec 14 2017
,
Dec 14 2017
More examples of this. It seems that we are hitting this a LOT now, and it's blocking a variety of things.
,
Dec 15 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/cd0856953252117831f4c4441093924d390ae14d

commit cd0856953252117831f4c4441093924d390ae14d
Author: Xixuan Wu <xixuan@chromium.org>
Date: Fri Dec 15 02:11:53 2017

cbuildbot: retry swarming commands if it's timed out.

BUG= chromium:793499
TEST=Run tryjob: https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/paladin/4811#

Change-Id: I70861c0b7f22312295ea3a71cd8b55b332168519
Reviewed-on: https://chromium-review.googlesource.com/823617
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/cd0856953252117831f4c4441093924d390ae14d/cbuildbot/swarming_lib.py
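For readers unfamiliar with the approach named in the commit subject, here is a minimal sketch of retrying a command when it times out. This is not the actual change to swarming_lib.py: the function name run_with_retry and its parameters timeout_sec and max_attempts are hypothetical, and it is written for Python 3.7+ rather than the Python 2.7 that chromite used at the time.

```python
import subprocess
import sys


def run_with_retry(cmd, timeout_sec, max_attempts=3):
    """Run cmd and return its stdout, retrying if it times out.

    Hypothetical helper for illustration only; not chromite's API.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if it has not finished within timeout_sec seconds.
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=timeout_sec,
                check=True,
            )
            return result.stdout
        except subprocess.TimeoutExpired as e:
            last_error = e
            print(
                "attempt %d/%d timed out after %s seconds"
                % (attempt, max_attempts, timeout_sec),
                file=sys.stderr,
            )
    # Every attempt timed out; surface the last timeout to the caller.
    raise last_error
```

A caller would pass the command as a list, for example run_with_retry([sys.executable, "-c", "print('ok')"], timeout_sec=30). Only timeouts are retried in this sketch; a non-zero exit status raises subprocess.CalledProcessError immediately, on the assumption that a real failure should not be masked by retries.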
,
Dec 15 2017
Hopefully, this is now fixed?
,
Dec 15 2017
I hope so... I will wait a few days to see whether any more reports come in.
,
Feb 10 2018
I have not seen this issue reported since.
Comment 1 by x...@chromium.org
, Dec 11 2017