New issue
Advanced search Search tips

Issue 796252 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Aug 29
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

Suite fails with "Exception waiting for results"

Reported by jrbarnette@chromium.org, Dec 19 2017

Issue description

The following paladin run failed:
    https://luci-milo.appspot.com/buildbot/chromeos/lumpy-paladin/30534

Below is a summary of the key characteristics of the failure:
  * On the build page, the error is reported like this:
        [Test-Logs]: bvt-inline: FAIL: Exception waiting for results
  * Following the link labeled "Link to suite", you find a suite
    that shows no test failures.  For the case above, this is the
    suite:
        http://cautotest-prod.corp.google.com/afe/#tab_id=view_job&object_id=163499548
  * Click the link labeled "All logs", and then follow to "debug/autoserv.DEBUG",
    and search for "Exception waiting".  That will turn up a traceback
    like with the message "Error decoding JSON reponse" (sic).
  * The cited JSON object is truncated.

 
Components: Infra>Client>ChromeOS
Labels: OS-Chrome
The full log entry for the traceback looks like this:

12/18 13:30:09.306 ERROR|             suite:1283| Exception waiting for results
Traceback (most recent call last):
  File "/usr/local/autotest/server/cros/dynamic_suite/suite.py", line 1272, in wait
    jobs = self._afe.get_jobs(parent_job_id=self._suite_job_id)
  File "/usr/local/autotest/server/frontend.py", line 606, in get_jobs
    jobs_data = self.run('get_jobs', **dargs)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 131, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 244, in GenericRetry
    return _run()
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 177, in _Wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 243, in _run
    return functor(*args, **kwargs)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 94, in _run
    return super(RetryingAFE, self).run(call, **dargs)
  File "/usr/local/autotest/server/frontend.py", line 108, in run
    result = utils.strip_unicode(rpc_call(**dargs))
  File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 141, in __call__
    raise JSONRPCException('Error decoding JSON reponse:\n' + respdata)
JSONRPCException: Error decoding JSON reponse:

The exception indicates the following sequence of events:
  * The master scheduler creates the suite job, and assigns it to a
    drone.
  * The suite begins running on the drone, and creates all test jobs.
  * Prior to beginning to wait for the created jobs to complete,
    the suite code calls back to the AFE to get the complete list of
    jobs assigned to the suite.
  * The JSON response to the get_jobs() RPC is truncated.

Historically, JSON truncation of this sort has happened because the RPC
server was overloaded, and closed the connection prematurely.  Probably,
that's what happens in this case.

Owner: xixuan@chromium.org
Status: Unconfirmed (was: Untriaged)
Not sure how relevant this still is.  In any case, the AFE is going away so this problem will go away
Status: WontFix (was: Unconfirmed)

Sign in to add a comment