Suite fails with "Exception waiting for results"
Reported by jrbarnette@chromium.org, Dec 19 2017
Issue description
The following paladin run failed:
https://luci-milo.appspot.com/buildbot/chromeos/lumpy-paladin/30534
Below is a summary of the key characteristics of the failure:
* On the build page, the error is reported like this:
[Test-Logs]: bvt-inline: FAIL: Exception waiting for results
* Following the link labeled "Link to suite", you find a suite
that shows no test failures. For the case above, this is the
suite:
http://cautotest-prod.corp.google.com/afe/#tab_id=view_job&object_id=163499548
* Click the link labeled "All logs", and then follow to "debug/autoserv.DEBUG",
and search for "Exception waiting". That will turn up a traceback
with the message "Error decoding JSON reponse" (sic).
* The cited JSON object is truncated.
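The log search in the steps above can also be scripted. The following is a minimal sketch, assuming the contents of "debug/autoserv.DEBUG" have already been read into a list of lines; the helper name `find_traceback` and the first sample line are hypothetical and not part of autotest.

```python
def find_traceback(log_lines, marker="Exception waiting", context=30):
    """Return the first line containing `marker` plus up to `context` following lines.

    Hypothetical helper for illustration; returns an empty list if the
    marker never appears.
    """
    for i, line in enumerate(log_lines):
        if marker in line:
            return log_lines[i:i + context + 1]
    return []


# Sample input: the first line is a made-up placeholder, the rest are
# abbreviated from the traceback quoted in this issue.
sample = [
    "(placeholder line before the error)",
    "12/18 13:30:09.306 ERROR| suite:1283| Exception waiting for results",
    "Traceback (most recent call last):",
    "JSONRPCException: Error decoding JSON reponse:",
]

hit = find_traceback(sample, context=2)
for line in hit:
    print(line)
```

Note that the search string must keep the misspelling "reponse" if you search for the JSON error message directly, since that is how it appears in the log.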
May 16 2018
Not sure how relevant this still is. In any case, the AFE is going away, so this problem will go away.
Aug 29
Comment 1 by jrbarnette@chromium.org, Dec 19 2017
Labels: OS-Chrome
The full log entry for the traceback looks like this:

12/18 13:30:09.306 ERROR| suite:1283| Exception waiting for results
Traceback (most recent call last):
  File "/usr/local/autotest/server/cros/dynamic_suite/suite.py", line 1272, in wait
    jobs = self._afe.get_jobs(parent_job_id=self._suite_job_id)
  File "/usr/local/autotest/server/frontend.py", line 606, in get_jobs
    jobs_data = self.run('get_jobs', **dargs)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 131, in run
    self, call, **dargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 244, in GenericRetry
    return _run()
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 177, in _Wrapper
    ret = func(*args, **kwargs)
  File "/usr/local/autotest/site-packages/chromite/lib/retry_util.py", line 243, in _run
    return functor(*args, **kwargs)
  File "/usr/local/autotest/server/cros/dynamic_suite/frontend_wrappers.py", line 94, in _run
    return super(RetryingAFE, self).run(call, **dargs)
  File "/usr/local/autotest/server/frontend.py", line 108, in run
    result = utils.strip_unicode(rpc_call(**dargs))
  File "/usr/local/autotest/frontend/afe/json_rpc/proxy.py", line 141, in __call__
    raise JSONRPCException('Error decoding JSON reponse:\n' + respdata)
JSONRPCException: Error decoding JSON reponse:

The exception indicates the following sequence of events:
* The master scheduler creates the suite job, and assigns it to a drone.
* The suite begins running on the drone, and creates all test jobs.
* Prior to beginning to wait for the created jobs to complete, the
  suite code calls back to the AFE to get the complete list of jobs
  assigned to the suite.
* The JSON response to the get_jobs() RPC is truncated.

Historically, JSON truncation of this sort has happened because the RPC server was overloaded, and closed the connection prematurely. Probably, that's what happens in this case.
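To illustrate the failure mode, here is a minimal standalone sketch of how a truncated response body fails JSON decoding, and how a caller might retry when that happens. It is not the autotest code: `TruncatedResponseError`, `decode_response`, `call_with_retry`, and the sample payload are all hypothetical, and it assumes Python 3 with only the standard `json` module.

```python
import json


class TruncatedResponseError(Exception):
    """Hypothetical error type: the response body was not valid JSON."""


def decode_response(respdata):
    """Decode a JSON response body, raising a descriptive error on failure."""
    try:
        return json.loads(respdata)
    except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
        raise TruncatedResponseError(
            "Error decoding JSON response (%d bytes received)" % len(respdata)
        ) from e


def call_with_retry(fetch, attempts=3):
    """Call fetch() up to `attempts` times until its result decodes cleanly."""
    last_error = None
    for _ in range(attempts):
        try:
            return decode_response(fetch())
        except TruncatedResponseError as e:
            last_error = e
    raise last_error


# Made-up payload standing in for an RPC response.
full = json.dumps({"result": [{"id": 1}, {"id": 2}], "error": None})

# Simulate a server that closes the connection early on the first call
# and answers completely on the second.
responses = iter([full[: len(full) // 2], full])
result = call_with_retry(lambda: next(responses))
print(result["result"])
```

A retry of this kind only helps if the truncation is transient. If the server is persistently overloaded, every attempt may fail the same way.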