New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 618159 link

Starred by 6 users

Issue metadata

Status: Verified
Owner:
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug



Sign in to add a comment

pre-cq-launcher and master-paladin failure: maximum recursion depth exceeded

Project Member Reported by sha...@chromium.org, Jun 8 2016

Issue description

The previous two pre-cq-launcher runs have failed:

https://uberchromegw.corp.google.com/i/chromeos/builders/pre-cq-launcher/builds/7090/steps/PreCQLauncher/logs/stdio

18:01:35: INFO: Running cidb query on pid 22222, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7f3dc6c49590>
18:01:36: INFO: RunCommand: git fetch -f https://chromium-review.googlesource.com/chromiumos/third_party/kernel refs/changes/47/350247/1 in /b/cbuild/internal_master/src/third_party/kernel/v4.4
18:02:34: INFO: Running cidb query on pid 22222, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7f3dc6471dd0>
18:02:35: INFO: Creating disjoint transactions.
Exception RuntimeError: RuntimeError('maximum recursion depth exceeded in __instancecheck__',) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored
Exception RuntimeError: RuntimeError('maximum recursion depth exceeded while calling a Python object',) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored

@@@STEP_FAILURE@@@
18:05:05: ERROR: <type 'exceptions.RuntimeError'>: maximum recursion depth exceeded
Traceback (most recent call last):
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 601, in TaskRunner
    task(*x, **task_kwargs)
  File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 799, in <lambda>
    fn = lambda idx, task_args: out_queue.put((idx, task(*task_args)))
  File "/b/cbuild/internal_master/chromite/lib/gerrit.py", line 109, in GetChangeDetail
    self.host, change_num, o_params=('CURRENT_REVISION', 'CURRENT_COMMIT'))
  File "/b/cbuild/internal_master/chromite/lib/gob_util.py", line 347, in GetChangeDetail
    return FetchUrlJson(host, path)
  File "/b/cbuild/internal_master/chromite/lib/gob_util.py", line 232, in FetchUrlJson
    fh = FetchUrl(*args, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/gob_util.py", line 224, in FetchUrl
    _FetchUrlHelper, sleep=SLEEP)
  File "/b/cbuild/internal_master/chromite/lib/retry_util.py", line 127, in RetryException
    return GenericRetry(exc_retry, max_retry, functor, *args, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/retry_util.py", line 88, in GenericRetry
    ret = functor(*args, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/gob_util.py", line 166, in _FetchUrlHelper
    body=body)
  File "/b/cbuild/internal_master/chromite/lib/gob_util.py", line 115, in CreateHttpConn
    git.GetGitRepoRevision(os.path.dirname(os.path.realpath(__file__))),
  File "/b/cbuild/internal_master/chromite/lib/git.py", line 190, in GetGitRepoRevision
    return RunGit(cwd, ['rev-parse', branch]).output.strip()
  File "/b/cbuild/internal_master/chromite/lib/git.py", line 829, in RunGit
    ['git'] + cmd, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/retry_util.py", line 88, in GenericRetry
    ret = functor(*args, **kwargs)
  File "/b/cbuild/internal_master/chromite/lib/cros_build_lib.py", line 500, in RunCommand
    stdout = _get_tempfile()
  File "/b/cbuild/internal_master/chromite/lib/cros_build_lib.py", line 481, in _get_tempfile
    return tempfile.TemporaryFile(bufsize=0)
  File "/usr/lib/python2.7/tempfile.py", line 493, in TemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
  File "/usr/lib/python2.7/tempfile.py", line 236, in _mkstemp_inner
    name = names.next()
  File "/usr/lib/python2.7/tempfile.py", line 138, in next
    choose = self.rng.choice
  File "/usr/lib/python2.7/tempfile.py", line 128, in rng
    self._rng = _Random()
RuntimeError: maximum recursion depth exceeded
 
Cc: akes...@chromium.org
Labels: -Pri-1 Pri-0
All pre-cq-launcher builds are failing on this.
Cc: vpalatin@chromium.org
Issue 618328 has been merged into this issue.
Summary: pre-cq-launcher and master-paladin failure: maximum recursion depth exceeded (was: pre-cq-launcher failure: maximum recursion depth exceeded)
Actually the master-paladin is also failing on the same issue :
e.g. 
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/11428/steps/CommitQueueSync/logs/stdio
A wild guess is that one of the patch creates an infinite recursion through CQ-DEPEND (is that even possible ?)
Actually, the one recursing infinitely on the CQ in the loop for gerrit deps only, so that's not a CQ-DEPEND...

  def _AddChangeToPlanWithDeps(self, change, plan, gerrit_deps_seen,
                               cq_deps_seen, limit_to=None,
                               include_cq_deps=True):
    """Add a change and its dependencies into a |plan|.

[...]

    # Only process the Gerrit dependencies for each change once. We prioritize
    # Gerrit dependencies over CQ dependencies, since Gerrit dependencies might
    # be required in order for the change to apply.
    old_plan_len = len(plan)
    if change not in gerrit_deps_seen:
      gerrit_deps = self._LookupUncommittedChanges(
          gerrit_deps, limit_to=limit_to)
      gerrit_deps_seen.Inject(change)
      for dep in gerrit_deps:
        self._AddChangeToPlanWithDeps(dep, plan, gerrit_deps_seen, cq_deps_seen,      <<<< recursion
                                      limit_to=limit_to, include_cq_deps=False)

Closed dependency graphs are supported by the CQ, so that shouldn't be a problem. Unless something changed recently...
error logs in master-paladin, seems not exactly the same:

https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin?numbuilds=100
While iterating (painfully) over CQ+1 CLs using gerrit CLI tool, I have noticed that CL 347462 fails strangely on the command line:

$ gerrit deps 347462
gerrit: Unhandled exception:
Traceback (most recent call last):
  File "/mnt/host/source/chromite/bin/gerrit", line 164, in <module>
    commandline.ScriptWrapperMain(FindTarget)
  File "/mnt/host/source/chromite/lib/commandline.py", line 834, in ScriptWrapperMain
    ret = target(argv[1:])
  File "/mnt/host/source/chromite/scripts/gerrit.py", line 522, in main
    functor(opts, *args)
  File "/mnt/host/source/chromite/scripts/gerrit.py", line 306, in UserActDeps
    visited_key=lambda cl: cl.gerrit_number)
  File "/mnt/host/source/chromite/scripts/gerrit.py", line 277, in _BreadthFirstSearch
    for child in children(node):
  File "/mnt/host/source/chromite/scripts/gerrit.py", line 300, in _Children
    change = _QueryChange(dep.ToGerritQueryText())[-1]
IndexError: list index out of range
Project Member

Comment 11 by bugdroid1@chromium.org, Jun 8 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/32e1e444aad172682bcab650177a866918aaf519

commit 32e1e444aad172682bcab650177a866918aaf519
Author: Aviv Keshet <akeshet@chromium.org>
Date: Wed Jun 08 16:49:41 2016

validiation_pool: log leader change when attempting transaction

BUG= chromium:618159 
TEST=unit tests pass

Change-Id: I391da542830fbcdc911d13d82c9fdf475c290228
Reviewed-on: https://chromium-review.googlesource.com/350781
Reviewed-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Shawn N <shawnn@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Vincent Palatin <vpalatin@chromium.org>

[modify] https://crrev.com/32e1e444aad172682bcab650177a866918aaf519/cbuildbot/validation_pool.py

Actually the CL is part of 392 patches deep stack of CL on kernel 4.4,
not sure our infra actually supports such a crazy size ...
Per discussion, that is almost certainly the reason we are hitting recursion depth limit / stack overflow. Sheriffs will work with owner of that giant stack to break it up into bite sized pieces.

Follow-up work: when we get a recursion depth limit exception like this in transaction creation, we should log the leader of the guilty stack and then continue, rather than crash.
Marked this CL (roughly the midpoint) as verified -1:

https://chromium-review.googlesource.com/#/c/345128/
We're fairly sure we've seen in the past large series being *automatically* broken into smaller chunks without any artificial -1. Has anything changed recently?
Marc, please provide an example of such a series so we can investigate.
> We're fairly sure we've seen in the past large series being *automatically* 
> broken into smaller chunks without any artificial -1.

it's actually failing before that step when retrieving all the CLs dependencies in gerrit
I think that is right, if the stack is small enough then the pre-cq knows how to break it into smaller pieces. I think this one was large enough that we hit the recursion limit in the code that just figures out what the stacks are (prior to breaking them up to smaller pieces).
Not that we're planning to submit such a large series again anytime soon, but what is the current limit on the number of patches in a series such that the recursion depth limit isn't hit?
Hard limit unknown. I have a CL in progress that will limit it to 150 to be safe and hopefully stay under the hard limit.
I don't think Gerrit makes sense for very large series. In the chromeos-3.18 past we went around it: https://chromium.googlesource.com/chromiumos/third_party/kernel/+log/chromeos-3.18/drivers/gpu/drm/i915?s=f4bf2e8329725476eb
> > Marc, please provide an example of such a series so we can investigate.

> if the stack is small enough then the pre-cq knows how to break it into [even] smaller pieces.

The size of these "even smaller pieces" seems to be around 50:
git log --pretty=fuller ^v4.4 cros/chromeos-4.4  | grep 'CommitDate: Thu Jun 9 05:' | wc -l
     49
git log --pretty=fuller ^v4.4 cros/chromeos-4.4  | grep 'CommitDate: Thu Jun 9 09:' | wc -l
     43

This seems to match older memories.
Project Member

Comment 23 by bugdroid1@chromium.org, Jun 14 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/d8c811db0b52d12ace8ed62d32f0120e64963e49

commit d8c811db0b52d12ace8ed62d32f0120e64963e49
Author: Aviv Keshet <akeshet@chromium.org>
Date: Wed Jun 08 19:15:04 2016

validation_pool.py: recursion depth limit for _AddChangeToPlanWithDeps

BUG= chromium:618159 
TEST=None

Change-Id: I67fbe0c140a04b87d1a5c86c740d719b028b25d8
Reviewed-on: https://chromium-review.googlesource.com/350872
Commit-Ready: Aviv Keshet <akeshet@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: David James <davidjames@chromium.org>

[modify] https://crrev.com/d8c811db0b52d12ace8ed62d32f0120e64963e49/cbuildbot/validation_pool.py

Project Member

Comment 24 by bugdroid1@chromium.org, Jun 14 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/ec962d571819bc930a3fdfe8fe19e0b5041c70cc

commit ec962d571819bc930a3fdfe8fe19e0b5041c70cc
Author: Aviv Keshet <akeshet@chromium.org>
Date: Wed Jun 08 18:01:26 2016

validation_pool.py: catch and better surface recursion depth exceptions

BUG= chromium:618159 
TEST=None

Change-Id: I51d1e4569f9354bd3ff8f76ea778a4df7dd91678
Reviewed-on: https://chromium-review.googlesource.com/350920
Commit-Ready: Aviv Keshet <akeshet@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: David James <davidjames@chromium.org>

[modify] https://crrev.com/ec962d571819bc930a3fdfe8fe19e0b5041c70cc/cbuildbot/validation_pool.py

Project Member

Comment 25 by sheriffbot@chromium.org, Jun 22 2016

Pri-0 bugs are critical regressions or serious emergencies, and this bug has not been updated in three days. Could you please provide an update, or adjust the priority to a more appropriate level if applicable?

If a fix is in active development, please set the status to Started.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Owner: akes...@chromium.org
Status: Fixed (was: Untriaged)
Status: Verified (was: Fixed)
Bulk verified

Sign in to add a comment