pre-cq-launcher and master-paladin failure: maximum recursion depth exceeded |
||||||
Issue descriptionThe previous two pre-cq-launcher runs have failed: https://uberchromegw.corp.google.com/i/chromeos/builders/pre-cq-launcher/builds/7090/steps/PreCQLauncher/logs/stdio 18:01:35: INFO: Running cidb query on pid 22222, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7f3dc6c49590> 18:01:36: INFO: RunCommand: git fetch -f https://chromium-review.googlesource.com/chromiumos/third_party/kernel refs/changes/47/350247/1 in /b/cbuild/internal_master/src/third_party/kernel/v4.4 18:02:34: INFO: Running cidb query on pid 22222, repr(query) starts with <sqlalchemy.sql.expression.Insert object at 0x7f3dc6471dd0> 18:02:35: INFO: Creating disjoint transactions. Exception RuntimeError: RuntimeError('maximum recursion depth exceeded in __instancecheck__',) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored Exception RuntimeError: RuntimeError('maximum recursion depth exceeded while calling a Python object',) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored @@@STEP_FAILURE@@@ 18:05:05: ERROR: <type 'exceptions.RuntimeError'>: maximum recursion depth exceeded Traceback (most recent call last): File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 601, in TaskRunner task(*x, **task_kwargs) File "/b/cbuild/internal_master/chromite/lib/parallel.py", line 799, in <lambda> fn = lambda idx, task_args: out_queue.put((idx, task(*task_args))) File "/b/cbuild/internal_master/chromite/lib/gerrit.py", line 109, in GetChangeDetail self.host, change_num, o_params=('CURRENT_REVISION', 'CURRENT_COMMIT')) File "/b/cbuild/internal_master/chromite/lib/gob_util.py", line 347, in GetChangeDetail return FetchUrlJson(host, path) File "/b/cbuild/internal_master/chromite/lib/gob_util.py", line 232, in FetchUrlJson fh = FetchUrl(*args, **kwargs) File "/b/cbuild/internal_master/chromite/lib/gob_util.py", line 224, in FetchUrl _FetchUrlHelper, sleep=SLEEP) File "/b/cbuild/internal_master/chromite/lib/retry_util.py", line 127, in RetryException return GenericRetry(exc_retry, max_retry, functor, *args, **kwargs) File "/b/cbuild/internal_master/chromite/lib/retry_util.py", line 88, in GenericRetry ret = functor(*args, **kwargs) File "/b/cbuild/internal_master/chromite/lib/gob_util.py", line 166, in _FetchUrlHelper body=body) File "/b/cbuild/internal_master/chromite/lib/gob_util.py", line 115, in CreateHttpConn git.GetGitRepoRevision(os.path.dirname(os.path.realpath(__file__))), File "/b/cbuild/internal_master/chromite/lib/git.py", line 190, in GetGitRepoRevision return RunGit(cwd, ['rev-parse', branch]).output.strip() File "/b/cbuild/internal_master/chromite/lib/git.py", line 829, in RunGit ['git'] + cmd, **kwargs) File "/b/cbuild/internal_master/chromite/lib/retry_util.py", line 88, in GenericRetry ret = functor(*args, **kwargs) File "/b/cbuild/internal_master/chromite/lib/cros_build_lib.py", line 500, in RunCommand stdout = _get_tempfile() File "/b/cbuild/internal_master/chromite/lib/cros_build_lib.py", line 481, in _get_tempfile return tempfile.TemporaryFile(bufsize=0) File "/usr/lib/python2.7/tempfile.py", line 493, in TemporaryFile (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags) File "/usr/lib/python2.7/tempfile.py", line 236, in _mkstemp_inner name = names.next() File "/usr/lib/python2.7/tempfile.py", line 138, in next choose = self.rng.choice File "/usr/lib/python2.7/tempfile.py", line 128, in rng self._rng = _Random() RuntimeError: maximum recursion depth exceeded
,
Jun 8 2016
All pre-cq-launcher builds are failing on this.
,
Jun 8 2016
,
Jun 8 2016
Issue 618328 has been merged into this issue.
,
Jun 8 2016
Actually the master-paladin is also failing on the same issue : e.g. https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/11428/steps/CommitQueueSync/logs/stdio
,
Jun 8 2016
A wild guess is that one of the patch creates an infinite recursion through CQ-DEPEND (is that even possible ?)
,
Jun 8 2016
Actually, the one recursing infinitely on the CQ in the loop for gerrit deps only, so that's not a CQ-DEPEND...
def _AddChangeToPlanWithDeps(self, change, plan, gerrit_deps_seen,
cq_deps_seen, limit_to=None,
include_cq_deps=True):
"""Add a change and its dependencies into a |plan|.
[...]
# Only process the Gerrit dependencies for each change once. We prioritize
# Gerrit dependencies over CQ dependencies, since Gerrit dependencies might
# be required in order for the change to apply.
old_plan_len = len(plan)
if change not in gerrit_deps_seen:
gerrit_deps = self._LookupUncommittedChanges(
gerrit_deps, limit_to=limit_to)
gerrit_deps_seen.Inject(change)
for dep in gerrit_deps:
self._AddChangeToPlanWithDeps(dep, plan, gerrit_deps_seen, cq_deps_seen, <<<< recursion
limit_to=limit_to, include_cq_deps=False)
,
Jun 8 2016
Closed dependency graphs are supported by the CQ, so that shouldn't be a problem. Unless something changed recently...
,
Jun 8 2016
error logs in master-paladin, seems not exactly the same: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin?numbuilds=100
,
Jun 8 2016
While iterating (painfully) over CQ+1 CLs using gerrit CLI tool, I have noticed that CL 347462 fails strangely on the command line:
$ gerrit deps 347462
gerrit: Unhandled exception:
Traceback (most recent call last):
File "/mnt/host/source/chromite/bin/gerrit", line 164, in <module>
commandline.ScriptWrapperMain(FindTarget)
File "/mnt/host/source/chromite/lib/commandline.py", line 834, in ScriptWrapperMain
ret = target(argv[1:])
File "/mnt/host/source/chromite/scripts/gerrit.py", line 522, in main
functor(opts, *args)
File "/mnt/host/source/chromite/scripts/gerrit.py", line 306, in UserActDeps
visited_key=lambda cl: cl.gerrit_number)
File "/mnt/host/source/chromite/scripts/gerrit.py", line 277, in _BreadthFirstSearch
for child in children(node):
File "/mnt/host/source/chromite/scripts/gerrit.py", line 300, in _Children
change = _QueryChange(dep.ToGerritQueryText())[-1]
IndexError: list index out of range
,
Jun 8 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/32e1e444aad172682bcab650177a866918aaf519 commit 32e1e444aad172682bcab650177a866918aaf519 Author: Aviv Keshet <akeshet@chromium.org> Date: Wed Jun 08 16:49:41 2016 validiation_pool: log leader change when attempting transaction BUG= chromium:618159 TEST=unit tests pass Change-Id: I391da542830fbcdc911d13d82c9fdf475c290228 Reviewed-on: https://chromium-review.googlesource.com/350781 Reviewed-by: Xixuan Wu <xixuan@chromium.org> Reviewed-by: Shawn N <shawnn@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Vincent Palatin <vpalatin@chromium.org> [modify] https://crrev.com/32e1e444aad172682bcab650177a866918aaf519/cbuildbot/validation_pool.py
,
Jun 8 2016
Actually the CL is part of 392 patches deep stack of CL on kernel 4.4, not sure our infra actually supports such a crazy size ...
,
Jun 8 2016
Per discussion, that is almost certainly the reason we are hitting recursion depth limit / stack overflow. Sheriffs will work with owner of that giant stack to break it up into bite sized pieces. Follow-up work: when we get a recursion depth limit exception like this in transaction creation, we should log the leader of the guilty stack and then continue, rather than crash.
,
Jun 8 2016
Marked this CL (roughly the midpoint) as verified -1: https://chromium-review.googlesource.com/#/c/345128/
,
Jun 8 2016
We're fairly sure we've seen in the past large series being *automatically* broken into smaller chunks without any artificial -1. Has anything changed recently?
,
Jun 8 2016
Marc, please provide an example of such a series so we can investigate.
,
Jun 8 2016
> We're fairly sure we've seen in the past large series being *automatically* > broken into smaller chunks without any artificial -1. it's actually failing before that step when retrieving all the CLs dependencies in gerrit
,
Jun 8 2016
I think that is right, if the stack is small enough then the pre-cq knows how to break it into smaller pieces. I think this one was large enough that we hit the recursion limit in the code that just figures out what the stacks are (prior to breaking them up to smaller pieces).
,
Jun 8 2016
Not that we're planning to submit such a large series again anytime soon, but what is the current limit on the number of patches in a series such that the recursion depth limit isn't hit?
,
Jun 8 2016
Hard limit unknown. I have a CL in progress that will limit it to 150 to be safe and hopefully stay under the hard limit.
,
Jun 9 2016
I don't think Gerrit makes sense for very large series. In the chromeos-3.18 past we went around it: https://chromium.googlesource.com/chromiumos/third_party/kernel/+log/chromeos-3.18/drivers/gpu/drm/i915?s=f4bf2e8329725476eb
,
Jun 9 2016
> > Marc, please provide an example of such a series so we can investigate.
> if the stack is small enough then the pre-cq knows how to break it into [even] smaller pieces.
The size of these "even smaller pieces" seems to be around 50:
git log --pretty=fuller ^v4.4 cros/chromeos-4.4 | grep 'CommitDate: Thu Jun 9 05:' | wc -l
49
git log --pretty=fuller ^v4.4 cros/chromeos-4.4 | grep 'CommitDate: Thu Jun 9 09:' | wc -l
43
This seems to match older memories.
,
Jun 14 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/d8c811db0b52d12ace8ed62d32f0120e64963e49 commit d8c811db0b52d12ace8ed62d32f0120e64963e49 Author: Aviv Keshet <akeshet@chromium.org> Date: Wed Jun 08 19:15:04 2016 validation_pool.py: recursion depth limit for _AddChangeToPlanWithDeps BUG= chromium:618159 TEST=None Change-Id: I67fbe0c140a04b87d1a5c86c740d719b028b25d8 Reviewed-on: https://chromium-review.googlesource.com/350872 Commit-Ready: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: David James <davidjames@chromium.org> [modify] https://crrev.com/d8c811db0b52d12ace8ed62d32f0120e64963e49/cbuildbot/validation_pool.py
,
Jun 14 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/ec962d571819bc930a3fdfe8fe19e0b5041c70cc commit ec962d571819bc930a3fdfe8fe19e0b5041c70cc Author: Aviv Keshet <akeshet@chromium.org> Date: Wed Jun 08 18:01:26 2016 validation_pool.py: catch and better surface recursion depth exceptions BUG= chromium:618159 TEST=None Change-Id: I51d1e4569f9354bd3ff8f76ea778a4df7dd91678 Reviewed-on: https://chromium-review.googlesource.com/350920 Commit-Ready: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: David James <davidjames@chromium.org> [modify] https://crrev.com/ec962d571819bc930a3fdfe8fe19e0b5041c70cc/cbuildbot/validation_pool.py
,
Jun 22 2016
Pri-0 bugs are critical regressions or serious emergencies, and this bug has not been updated in three days. Could you please provide an update, or adjust the priority to a more appropriate level if applicable? If a fix is in active development, please set the status to Started. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jun 22 2016
,
Jul 1 2016
Bulk verified |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by xixuan@chromium.org
, Jun 8 2016