
Issue 712464


Issue metadata

Status: WontFix
Owner:
Closed: Jun 2017
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Retry failure - tried to retry after a successful retry

Reported by jrbarnette@chromium.org, Apr 18 2017

Issue description

This slave paladin run failed:
    https://luci-milo.appspot.com/buildbot/chromeos/cyan-paladin/2277

It produced this suite job:
    http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=112754039

The suite job logs show this:
====
04/14 18:26:15.619 DEBUG|             suite:1095| Scheduling cheets_SELinuxTest, to retry afe job 112754081
04/14 18:26:15.945 DEBUG|             suite:1136| Job 112757619 created to retry job 112754081. Have retried for 1 time(s)
[ ... ]
04/14 18:26:15.948 DEBUG|             suite:1095| Scheduling cheets_SELinuxTest, to retry afe job 112754081
04/14 18:26:16.332 ERROR|             suite:1281| Exception waiting for results
Traceback (most recent call last):
  File "/usr/local/autotest/server/cros/dynamic_suite/suite.py", line 1278, in wait
    bug_template=bug_template)
  File "/usr/local/autotest/server/cros/dynamic_suite/suite.py", line 1306, in _record_result
    retry_for=result.id, ignore_errors=True)
  File "/usr/local/autotest/server/cros/dynamic_suite/suite.py", line 1131, in _schedule_test
    old_job_id=retry_for, new_job_id=job.id)
  File "/usr/local/autotest/server/cros/dynamic_suite/suite.py", line 155, in add_retry
    old_job_id)
ValueError: We have already retried or attempted to retry job 112754081
====

So, job 112754081 failed, we scheduled job 112757619 to retry it, and that
job passed. Then we tried to retry 112754081 again, and an internal check
stopped us.
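The internal check that fired in the traceback (add_retry in suite.py raising ValueError for a job it has already seen) can be sketched as below. This is a minimal illustration, assuming a plain dict keyed by the original job id. RetryTracker is a hypothetical name, and the real _retry_map in suite.py may track more state per job.

```python
class RetryTracker(object):
    """Hypothetical stand-in for the retry bookkeeping in suite.py."""

    def __init__(self):
        # Original (failed) job id -> id of the job created to retry it.
        self._retry_map = {}

    def add_retry(self, old_job_id, new_job_id):
        # Refuse a second retry for a job that already has one.
        if old_job_id in self._retry_map:
            raise ValueError('We have already retried or attempted to '
                             'retry job %d' % old_job_id)
        self._retry_map[old_job_id] = new_job_id


tracker = RetryTracker()
tracker.add_retry(112754081, 112757619)  # first retry is recorded
try:
    # Second attempt for the same job; this new job id is made up.
    tracker.add_retry(112754081, 112757620)
except ValueError as e:
    print(e)  # We have already retried or attempted to retry job 112754081
```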

So far, I think it's happened just this once, but I'm still looking.

 

Comment 1 by nxia@chromium.org, Apr 20 2017

Looks like the retry run passed, but the result wasn't handled properly.


https://luci-milo.appspot.com/buildbot/chromeos/cyan-paladin/2318

Comment 2 by aut...@google.com, May 16 2017

Do we need more investigation on this? 
Owner: dshi@chromium.org

Comment 4 by dshi@chromium.org, Jun 1 2017

Status: WontFix (was: Available)
This is more or less expected behavior. The suite job gets the test job status with a query like:
select * from tko_test_view_2 where job_tag like "112754081-%%"\G;
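To illustrate what that LIKE pattern matches, here is a sketch against an in-memory SQLite table standing in for the MySQL view. The column set, job_tag values, host names, and record names are all assumptions for illustration. The doubled `%%` in the query above is most likely the escaped form of `%` from a Python format string, which is how it is built here.

```python
import sqlite3

# Illustrative, in-memory stand-in for the TKO view; the real
# tko_test_view_2 lives in MySQL and has many more columns.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE tko_test_view_2 (job_tag TEXT, test_name TEXT, status TEXT)')

# Hypothetical rows: job_tag values and record names are assumptions.
rows = [
    ('112754081-chromeos-test/host1', 'SERVER_JOB', 'FAIL'),
    ('112754081-chromeos-test/host1', 'CLIENT_JOB.0', 'FAIL'),
    ('112754081-chromeos-test/host1', 'cheets_SELinuxTest', 'GOOD'),
    ('112757619-chromeos-test/host2', 'cheets_SELinuxTest', 'GOOD'),
]
conn.executemany('INSERT INTO tko_test_view_2 VALUES (?, ?, ?)', rows)

job_id = 112754081
# '%%' in a %-formatted string becomes a literal '%', the SQL LIKE wildcard,
# so the pattern is '112754081-%'. The '-' keeps a tag such as
# '1127540810-...' from matching.
pattern = '%d-%%' % job_id
results = conn.execute(
    'SELECT test_name, status FROM tko_test_view_2 WHERE job_tag LIKE ?',
    (pattern,)).fetchall()

for test_name, status in results:
    print(test_name, status)
```

All three records for job 112754081 come back, while the retry job's row (112757619) does not match.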

The code path looks like this:
results_generator = job_status.wait_for_child_results(
                        self._afe, self._tko, self._suite_job_id)
which calls 
job_status._yield_job_results
https://cs.corp.google.com/android/external/autotest/server/cros/dynamic_suite/job_status.py?q=wait_for_child_results&sq=package:%5E(android)$+file:(/%7C%5E)&dr=CSs&l=175

which calls
statuses = tko.get_job_test_statuses_from_db(job.id)
which runs the query here:
https://cs.corp.google.com/android/external/autotest/server/frontend.py?rcl=8f0c7263db293872c2d6c31e22f12864450864d4&l=145

Basically, the tko query returns 3 records: server job, client job, and test.
The test passed, but the server job and client job failed. Each failed job_status leads to a retry attempt, which is why there is a second attempt to retry job 112754081.
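A minimal sketch of why one job produces two retry attempts, assuming each record is a simple (job_id, test_name, status) tuple and that any status other than GOOD counts as a failure. Status and failed_results are hypothetical names, not the job_status API, and the record names are illustrative.

```python
from collections import namedtuple

# Hypothetical stand-in for one record returned by the tko query.
Status = namedtuple('Status', ['job_id', 'test_name', 'status'])


def failed_results(statuses):
    """Yield each record that is not GOOD; each one prompts a retry attempt."""
    for s in statuses:
        if s.status != 'GOOD':
            yield s


# Illustrative records for job 112754081 (names and statuses assumed).
statuses = [
    Status(112754081, 'SERVER_JOB', 'FAIL'),
    Status(112754081, 'CLIENT_JOB.0', 'FAIL'),
    Status(112754081, 'cheets_SELinuxTest', 'GOOD'),
]

retry_attempts = [s.job_id for s in failed_results(statuses)]
print(retry_attempts)  # [112754081, 112754081]
```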

The suite job uses a _retry_map to keep records, so 2 failed job_status entries for the same job won't lead to 2 new test jobs; only 1 test job is created.
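The de-duplication idea can be modeled as below. SuiteSketch, its methods, and the fake job ids starting at 1000 are hypothetical. For simplicity, this sketch consults the map before creating a job. The traceback in the description shows the real check happening inside add_retry, called from _schedule_test, so the ordering in suite.py may differ.

```python
import itertools
from collections import namedtuple

# Hypothetical stand-in for one record returned by the tko query.
Status = namedtuple('Status', ['job_id', 'test_name', 'status'])


class SuiteSketch(object):
    """Hypothetical model of retry de-duplication with a _retry_map."""

    def __init__(self, first_new_job_id):
        self._retry_map = {}  # original job id -> retry job id
        self._job_ids = itertools.count(first_new_job_id)
        self.created_jobs = []

    def _create_retry_job(self):
        # Stands in for creating a real AFE job; the ids here are fake.
        job_id = next(self._job_ids)
        self.created_jobs.append(job_id)
        return job_id

    def handle_result(self, result):
        if result.status == 'GOOD':
            return
        if result.job_id in self._retry_map:
            # This job already has a retry; skip the extra failed record.
            return
        self._retry_map[result.job_id] = self._create_retry_job()


suite = SuiteSketch(first_new_job_id=1000)
for record in [Status(112754081, 'SERVER_JOB', 'FAIL'),
               Status(112754081, 'CLIENT_JOB.0', 'FAIL'),
               Status(112754081, 'cheets_SELinuxTest', 'GOOD')]:
    suite.handle_result(record)
print(suite.created_jobs)  # [1000]
```

Two failed records arrive for the same job, but only the first creates a retry job.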
