swarming.py hangs with no output until killed by builder.
Project Member Reported by x...@chromium.org, Dec 8
It started showing up this afternoon (12/08). See an example build: https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/tricky-tot-chrome-pfq-informational/builds/7230

Error message:

14:46:31: INFO: 14:46:31: ERROR: BaseException in _RunParallelStages <class 'chromite.lib.parallel.ProcessSilentTimeout'>: No output from <_BackgroundTask(_BackgroundTask-7:7:2, started)> for 8640 seconds
Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/cbuildbot/builders/generic_builders.py", line 120, in _RunParallelStages
    parallel.RunParallelSteps(steps)
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 679, in RunParallelSteps
    return [queue.get_nowait() for queue in queues]
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 676, in RunParallelSteps
    pass
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 562, in ParallelTasks
    raise BackgroundFailure(exc_infos=errors)
BackgroundFailure: <class 'chromite.lib.parallel.ProcessSilentTimeout'>: No output from <_BackgroundTask(_BackgroundTask-7:7:2, started)> for 8640 seconds
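For readers unfamiliar with this kind of failure, the sketch below shows one way a "no output for N seconds" watchdog can be built. It is not chromite's `ProcessSilentTimeout` implementation. The names `SilentTimeout` and `run_with_silent_timeout` are hypothetical, and the sketch assumes Python 3 rather than the Python 2.7 shown in the traceback.

```python
import queue
import subprocess
import sys
import threading


class SilentTimeout(Exception):
    """Raised when a child prints nothing for too long (hypothetical name)."""


def run_with_silent_timeout(cmd, silent_secs):
    """Run cmd and return its output lines; kill it if it stays quiet too long."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    lines = queue.Queue()

    def pump():
        # Forward each line as it arrives; None marks the end of the output.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()

    output = []
    while True:
        try:
            # The timer restarts every time a line arrives.
            line = lines.get(timeout=silent_secs)
        except queue.Empty:
            proc.kill()
            proc.wait()
            raise SilentTimeout(
                "No output from %r for %s seconds" % (cmd, silent_secs))
        if line is None:
            proc.wait()
            return output
        output.append(line)


if __name__ == "__main__":
    print(run_with_silent_timeout([sys.executable, "-c", "print('hello')"], 30))
```

A watchdog like this cannot tell a hung command from one that is working but printing nothing, so a quiet but healthy command is killed just like a stuck one.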
Seems the error still shows up on the latest lumpy-chrome-pfq: https://luci-milo.appspot.com/buildbot/chromeos/lumpy-chrome-pfq/11085
Adding this week's folks. dgarrett: is this an infra issue? I'm not sure how we even know this is a "provision error"; the output is pretty opaque to me.
Oh... no, that's not a provision error. There is a long-standing problem with swarming.py incorrectly appearing to hang.
I thought xixuan@ had fixed this previously.
Is this the same as issue 772985, then?
Re #6, yes. I will prepare a fix later today.
Issue 794125 has been merged into this issue.
Issue 794859 has been merged into this issue.
More examples of this. It seems that we are hitting this a LOT now, and it's blocking a variety of things.
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/cd0856953252117831f4c4441093924d390ae14d

commit cd0856953252117831f4c4441093924d390ae14d
Author: Xixuan Wu <firstname.lastname@example.org>
Date: Fri Dec 15 02:11:53 2017

cbuildbot: retry swarming commands if it's timed out.

BUG=chromium:793499
TEST=Run tryjob: https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/paladin/4811#

Change-Id: I70861c0b7f22312295ea3a71cd8b55b332168519
Reviewed-on: https://chromium-review.googlesource.com/823617
Commit-Ready: Xixuan Wu <email@example.com>
Tested-by: Xixuan Wu <firstname.lastname@example.org>
Reviewed-by: Don Garrett <email@example.com>

[modify] https://crrev.com/cd0856953252117831f4c4441093924d390ae14d/cbuildbot/swarming_lib.py
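The idea in the commit message, retrying a command when it times out, can be sketched roughly as follows. This is not the code from `swarming_lib.py`, which is not shown in this thread. The helper name `run_with_retry` and its parameters are hypothetical, and the sketch assumes Python 3.7+ with only the standard `subprocess` module.

```python
import subprocess
import sys
import time


def run_with_retry(cmd, timeout_secs, max_attempts=3, sleep_secs=0.0):
    """Run cmd, retrying only when an attempt exceeds timeout_secs.

    Hypothetical helper for illustration; not the swarming_lib.py code.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # On timeout, subprocess.run kills the child and raises TimeoutExpired.
            return subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_secs)
        except subprocess.TimeoutExpired:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the timeout to the caller.
            time.sleep(sleep_secs)


if __name__ == "__main__":
    result = run_with_retry([sys.executable, "-c", "print('ok')"], timeout_secs=30)
    print(result.stdout.strip())
```

Only the timeout is retried here; a command that exits with a non-zero status is returned to the caller as-is, so real failures are not masked by the retry loop.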
Hopefully, this is now fixed?
I hope so... I will wait a few days to see whether any more reports come in.