Starred by 6 users

Issue metadata

Status: Verified
Owner:
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




swarming.py hangs with no output until killed by builder.

Project Member Reported by x...@chromium.org, Dec 8 2017

Issue description

This started showing up this afternoon (12/08).
See an example build:
https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/tricky-tot-chrome-pfq-informational/builds/7230

Error message:
14:46:31: INFO: 
14:46:31: ERROR: BaseException in _RunParallelStages <class 'chromite.lib.parallel.ProcessSilentTimeout'>: No output from <_BackgroundTask(_BackgroundTask-7:7:2, started)> for 8640 seconds
Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/cbuildbot/builders/generic_builders.py", line 120, in _RunParallelStages
    parallel.RunParallelSteps(steps)
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 679, in RunParallelSteps
    return [queue.get_nowait() for queue in queues]
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 676, in RunParallelSteps
    pass
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/b/c/cbuild/repository/chromite/lib/parallel.py", line 562, in ParallelTasks
    raise BackgroundFailure(exc_infos=errors)
BackgroundFailure: <class 'chromite.lib.parallel.ProcessSilentTimeout'>: No output from <_BackgroundTask(_BackgroundTask-7:7:2, started)> for 8640 seconds

 

Comment 1 by x...@chromium.org, Dec 11 2017

The error still seems to show up on the latest lumpy-chrome-pfq: https://luci-milo.appspot.com/buildbot/chromeos/lumpy-chrome-pfq/11085
Cc: deanliao@chromium.org dtor@chromium.org ecgh@chromium.org
Owner: dgarr...@chromium.org
Adding this week's folks. dgarrett: is this an infra issue? I'm not sure how we even know this is a "provision error"; the output is pretty opaque to me.
Cc: davidri...@chromium.org
Summary: swarming.py hangs with no output until killed by builder. (was: Hwtest provision error on several chrome PFQs and informational PFQs )
Oh... no, that's not a provision error.

There is a long-standing problem with swarming.py incorrectly appearing to hang.


Owner: xixuan@chromium.org
I thought xixuan@ had fixed this previously.
Cc: nxia@chromium.org
Is this the same as issue 772985, then?

Comment 7 by xixuan@chromium.org, Dec 12 2017

Status: Started (was: Untriaged)
Re #6, yes. I will prepare a fix later today.
Issue 794125 has been merged into this issue.
Cc: dgarr...@chromium.org
Issue 794859 has been merged into this issue.
More examples of this. It seems that we are hitting this a LOT now, and it's blocking a variety of things.

Comment 11 by bugdroid1@chromium.org, Dec 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/cd0856953252117831f4c4441093924d390ae14d

commit cd0856953252117831f4c4441093924d390ae14d
Author: Xixuan Wu <xixuan@chromium.org>
Date: Fri Dec 15 02:11:53 2017

cbuildbot: retry swarming commands if it's timed out.

BUG= chromium:793499 
TEST=Run tryjob:
https://luci-milo.appspot.com/buildbot/chromiumos.tryserver/paladin/4811#

Change-Id: I70861c0b7f22312295ea3a71cd8b55b332168519
Reviewed-on: https://chromium-review.googlesource.com/823617
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/cd0856953252117831f4c4441093924d390ae14d/cbuildbot/swarming_lib.py
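
For anyone reading this thread later: the commit title describes retrying a swarming command when it times out. The general pattern can be sketched as below. This is only an illustrative Python 3 sketch, not the actual change to swarming_lib.py; the names run_with_retry, timeout_sec, max_attempts, and sleep_sec are hypothetical and do not come from chromite.

```python
import subprocess
import sys
import time


def run_with_retry(cmd, timeout_sec, max_attempts=3, sleep_sec=0.0):
    """Run cmd, retrying whenever an attempt exceeds timeout_sec.

    Hypothetical helper for illustration only (not chromite code).
    Returns the CompletedProcess of the first attempt that finishes in time.
    Re-raises subprocess.TimeoutExpired if every attempt times out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # when the command runs longer than timeout_sec.
            return subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_sec
            )
        except subprocess.TimeoutExpired:
            if attempt == max_attempts:
                # Out of attempts: surface the timeout to the caller.
                raise
            # Optional pause before the next attempt.
            time.sleep(sleep_sec)


if __name__ == "__main__":
    # A command that finishes quickly succeeds on the first attempt.
    result = run_with_retry([sys.executable, "-c", "print('ok')"], timeout_sec=30)
    print(result.stdout.strip())
```

Bounding each attempt with a timeout means a hung command is killed and retried instead of blocking until the builder's much longer silent-output limit is hit. Whether a retry is safe depends on the command being idempotent, which is an assumption of this sketch.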

Hopefully, this is now fixed?
I hope so. I will wait a few days to see whether any more reports come in.
Status: Verified (was: Started)
This issue has not been reported again.
