TKO job record often misses several important columns
Issue description

Entries in tko_jobs often miss important columns like afe_parent_job_id, board and suite when a job fails early. Below are some example entries.

http://cautotest/afe/#tab_id=view_job&object_id=112840318
http://cautotest/afe/#tab_id=view_job&object_id=122218265
http://cautotest/afe/#tab_id=view_job&object_id=122126034

mysql> select * from tko_jobs where job_idx in (select job_idx from (select job_idx, label, afe_parent_job_id from tko_jobs order by job_idx desc limit 100) x where label = '' and afe_parent_job_id is null) \G
*************************** 1. row ***************************
          job_idx: 112840425
              tag: 122080685-chromeos-test/chromeos4-row4-rack13-host13
            label:
         username: chromeos-test
      machine_idx: 4168
      queued_time: 2017-06-07 20:20:36
     started_time: 2017-06-08 11:02:01
    finished_time: 2017-06-08 11:05:04
       afe_job_id: 122080685
afe_parent_job_id: NULL
            build: NULL
    build_version: NULL
            suite: NULL
            board: NULL
*************************** 2. row ***************************
          job_idx: 112840409
              tag: 122218265-abodeti/chromeos2-row3-rack9-host1
            label:
         username: chromeos-test
      machine_idx: 2929
      queued_time: 2017-06-08 10:54:10
     started_time: 2017-06-08 10:54:58
    finished_time: 2017-06-08 11:04:44
       afe_job_id: 122218265
afe_parent_job_id: NULL
            build: NULL
    build_version: NULL
            suite: NULL
            board: NULL
*************************** 3. row ***************************
          job_idx: 112840392
              tag: 122126034-chromeos-test/chromeos2-row8-rack10-host11
            label:
         username: chromeos-test
      machine_idx: 6836
      queued_time: 2017-06-07 23:57:02
     started_time: 2017-06-08 11:01:21
    finished_time: 2017-06-08 11:04:28
       afe_job_id: 122126034
afe_parent_job_id: NULL
            build: NULL
    build_version: NULL
            suite: NULL
            board: NULL
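To make the symptom reproducible outside production, here is a minimal sketch of the same check against a throwaway SQLite database. It assumes a cut-down tko_jobs schema containing only the columns the query touches and uses Python's sqlite3 module instead of the real MySQL server. The outer `job_idx in (...)` wrapper from the mysql query is folded into a direct select, which returns the same rows. The healthy sample row is made up for contrast.

```python
import sqlite3

# Hypothetical, cut-down stand-in for tko_jobs; the real table has more columns.
TKO_SCHEMA = """
CREATE TABLE tko_jobs (
    job_idx INTEGER PRIMARY KEY,
    label TEXT,
    afe_job_id INTEGER,
    afe_parent_job_id INTEGER,
    board TEXT,
    suite TEXT
)
"""

# Same filter as the mysql query: take the 100 newest rows, then keep the
# ones with an empty label and no parent job id.
BROKEN_ENTRIES_SQL = """
SELECT job_idx, afe_job_id
FROM (SELECT job_idx, label, afe_job_id, afe_parent_job_id
      FROM tko_jobs
      ORDER BY job_idx DESC
      LIMIT 100) AS recent
WHERE label = '' AND afe_parent_job_id IS NULL
ORDER BY job_idx DESC
"""


def find_broken_entries(conn):
    """Return (job_idx, afe_job_id) pairs for recent rows missing label/parent."""
    return conn.execute(BROKEN_ENTRIES_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(TKO_SCHEMA)
    conn.executemany(
        "INSERT INTO tko_jobs VALUES (?, ?, ?, ?, ?, ?)",
        [
            # Broken row modeled on the first example row above.
            (112840425, "", 122080685, None, None, None),
            # Hypothetical healthy row; values are illustrative only.
            (112840500, "lumpy-release/R60-9592.0.0/bvt-inline/dummy_Pass",
             122080700, 122080600, "lumpy", "bvt-inline"),
        ],
    )
    print(find_broken_entries(conn))
```

Only the broken row is reported; a row with an empty label but a non-NULL parent is not matched, just as with the original query.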
Jun 9 2017
Sorry, the first link was wrong. The correct links are:
http://cautotest/afe/#tab_id=view_job&object_id=122080685
http://cautotest/afe/#tab_id=view_job&object_id=122218265
http://cautotest/afe/#tab_id=view_job&object_id=122126034
Jun 9 2017
This seems to happen when provisioning fails. I'm not yet sure how provisioning is handled in autotest, but the test runs were likely aborted before dynamic_suite wrote its keyvals.
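Autotest keyval files hold one `key=value` pair per line. The sketch below illustrates why an aborted run ends up with a NULL afe_parent_job_id: if the keyvals were never written, the lookup has nothing to find. The key name `parent_job_id` and both helper functions are hypothetical, chosen for illustration; the real parser and key name may differ.

```python
from typing import Dict, Optional


def parse_keyvals(text: str) -> Dict[str, str]:
    """Parse 'key=value' lines, skipping blank and malformed lines."""
    keyvals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        keyvals[key] = value
    return keyvals


def parent_job_id(keyvals: Dict[str, str]) -> Optional[int]:
    """Return the parent AFE job id, or None if it was never recorded.

    'parent_job_id' is a hypothetical key name used for illustration.
    """
    raw = keyvals.get("parent_job_id")
    return int(raw) if raw else None
```

For a completed run, `parent_job_id(parse_keyvals("parent_job_id=123\n"))` yields 123. For a run aborted before the suite keyvals were written, the same call yields None, which would land in TKO as NULL.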
Jun 9 2017
Whoever ends up doing this, let's make sure we require good automated testing (unit tests + push-to-prod post-test validation) to guard the fix against regressions.
Jun 12 2017
Assigning to nya@; we can review as needed.
Jun 13 2017
I'll try, but I don't know much about provisioning, so I'd be very happy if someone from the infra team could also investigate.
Jun 13 2017
I'll help out wherever you get stuck, need help, or need a review. But it will work best if you drive this, since you know exactly why we need it and what we need. (Also, you're the most likely to stay focused on it, since you depend on it.)
Jun 16 2017
My motivation for this issue is building BigQuery tables (Issue 733103), and I now think I can simply exclude these broken entries. This issue happens when provisioning fails, so it is not actually a failure of the test itself: the test never ran. I think it makes sense to exclude entries that never ran.

That said, I think this issue is caused by the TKO parser running against provisioning logs, not test logs, when provisioning fails. For example, these two test jobs were triggered by the same suite job. The former succeeded, but the latter failed during provisioning, and its TKO entry is missing most fields.

A: http://cautotest/afe/#tab_id=view_job&object_id=123638738
B: http://cautotest/afe/#tab_id=view_job&object_id=123638741

Here are the logs of those jobs:

Aj: http://cautotest/tko/retrieve_logs.cgi?job=/results/123638738-chromeos-test
Bj: http://cautotest/tko/retrieve_logs.cgi?job=/results/123638741-chromeos-test

And here are the provisioning logs of those jobs:

Ap: http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack14-host9/537194-provision/
Bp: http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack9-host9/537330-provision/

Interestingly, (Bj) is mostly the same as (Bp), while (Aj) is completely different from (Ap). From this observation, I guess that provisioning logs are copied as job logs when provisioning fails (this is a guess, not based on the code). If my guess is correct, it explains why the TKO entry for (B) is missing fields: in case B, the TKO parser runs against provisioning logs, so it can't parse the build/board/version/suite info from the job label or afe_parent_job_id from the keyvals.
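A small sketch helps show why an empty label cascades into NULL build, board, build_version and suite columns. It assumes job labels shaped like `lumpy-release/R60-9592.0.0/bvt-inline/dummy_Pass` (build target / version / suite / test) and that the board is the part of the build target before the first `-`. `parse_job_label` and `LabelInfo` are hypothetical names; this is not the actual TKO parser code.

```python
from typing import NamedTuple, Optional


class LabelInfo(NamedTuple):
    build: Optional[str]
    board: Optional[str]
    build_version: Optional[str]
    suite: Optional[str]


EMPTY = LabelInfo(None, None, None, None)


def parse_job_label(label: str) -> LabelInfo:
    """Hypothetical parser for labels like
    'lumpy-release/R60-9592.0.0/bvt-inline/dummy_Pass'.

    Returns all-None fields when the label lacks the expected shape, which is
    what the provisioning-only TKO rows (label == '') end up with.
    """
    parts = label.split("/")
    if len(parts) < 4:
        return EMPTY
    build_target, build_version, suite = parts[0], parts[1], parts[2]
    # Assumption: the board is everything before the first '-' in the build
    # target, e.g. 'lumpy' in 'lumpy-release'.
    board = build_target.split("-", 1)[0]
    return LabelInfo(
        build=f"{build_target}/{build_version}",
        board=board,
        build_version=build_version,
        suite=suite,
    )
```

The first-hyphen rule is the weakest assumption here: a board name that itself contains a hyphen would be cut short, so a real implementation would need to know the set of build types.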
Nov 14 2017
We're working around this issue by joining AFE/TKO jobs. Releasing for now.
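The join workaround can be sketched as a LEFT JOIN from TKO to AFE on the AFE job id, preferring TKO values and falling back to the AFE record when TKO has none. This assumes an AFE table named afe_jobs with `id`, `name` and `parent_job_id` columns, and that the AFE job name carries the same string TKO stores as the label. Both schemas are cut-down stand-ins run on SQLite, not the production MySQL tables.

```python
import sqlite3

# Hypothetical, cut-down AFE and TKO job tables; names are assumptions.
JOIN_SCHEMA = """
CREATE TABLE afe_jobs (
    id INTEGER PRIMARY KEY,
    name TEXT,
    parent_job_id INTEGER
);
CREATE TABLE tko_jobs (
    job_idx INTEGER PRIMARY KEY,
    afe_job_id INTEGER,
    label TEXT,
    afe_parent_job_id INTEGER
);
"""

# Prefer the TKO values; fall back to the AFE job record when the TKO row
# has an empty label or a NULL parent (e.g. it was parsed from provision logs).
JOINED_SQL = """
SELECT t.job_idx,
       t.afe_job_id,
       COALESCE(NULLIF(t.label, ''), a.name) AS label,
       COALESCE(t.afe_parent_job_id, a.parent_job_id) AS parent_job_id
FROM tko_jobs AS t
LEFT JOIN afe_jobs AS a ON a.id = t.afe_job_id
ORDER BY t.job_idx
"""


def joined_jobs(conn):
    """Return (job_idx, afe_job_id, label, parent_job_id) with AFE fallbacks."""
    return conn.execute(JOINED_SQL).fetchall()
```

The LEFT JOIN keeps TKO rows even when no AFE record matches; in that case the fallback columns are NULL and the TKO values pass through unchanged.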
Jun 8 2018
Hi, this bug has not been updated recently. Please acknowledge the bug and provide status within two weeks (6/22/2018), or the bug will be archived. Thank you.
Comment 1 by pprabhu@chromium.org, Jun 8 2017