
Issue 731339


Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




TKO job record often misses several important columns

Project Member Reported by nya@chromium.org, Jun 8 2017

Issue description

Entries in tko_jobs often lack important columns such as afe_parent_job_id, board and suite when a job fails early. Below are some example entries.

http://cautotest/afe/#tab_id=view_job&object_id=112840318
http://cautotest/afe/#tab_id=view_job&object_id=122218265
http://cautotest/afe/#tab_id=view_job&object_id=122126034

mysql> select * from tko_jobs where job_idx in (select job_idx from (select job_idx, label, afe_parent_job_id from tko_jobs order by job_idx desc limit 100) x where label = '' and afe_parent_job_id is null) \G
*************************** 1. row ***************************
          job_idx: 112840425
              tag: 122080685-chromeos-test/chromeos4-row4-rack13-host13
            label:
         username: chromeos-test
      machine_idx: 4168
      queued_time: 2017-06-07 20:20:36
     started_time: 2017-06-08 11:02:01
    finished_time: 2017-06-08 11:05:04
       afe_job_id: 122080685
afe_parent_job_id: NULL
            build: NULL
    build_version: NULL
            suite: NULL
            board: NULL
*************************** 2. row ***************************
          job_idx: 112840409
              tag: 122218265-abodeti/chromeos2-row3-rack9-host1
            label:
         username: chromeos-test
      machine_idx: 2929
      queued_time: 2017-06-08 10:54:10
     started_time: 2017-06-08 10:54:58
    finished_time: 2017-06-08 11:04:44
       afe_job_id: 122218265
afe_parent_job_id: NULL
            build: NULL
    build_version: NULL
            suite: NULL
            board: NULL
*************************** 3. row ***************************
          job_idx: 112840392
              tag: 122126034-chromeos-test/chromeos2-row8-rack10-host11
            label:
         username: chromeos-test
      machine_idx: 6836
      queued_time: 2017-06-07 23:57:02
     started_time: 2017-06-08 11:01:21
    finished_time: 2017-06-08 11:04:28
       afe_job_id: 122126034
afe_parent_job_id: NULL
            build: NULL
    build_version: NULL
            suite: NULL
            board: NULL
 
Labels: -Restrict-View-Google
Nothing RVG here.

Comment 3 by nya@chromium.org, Jun 9 2017

This looks like it happens when provisioning fails. I'm not yet sure how provisioning is handled in autotest, but the test runs were likely aborted before dynamic_suite wrote the keyvals.
Whoever ends up doing this, let's make sure we require good automated testing (unittests + push-to-prod post-test validation) to protect this.

Comment 5 by aut...@google.com, Jun 12 2017

Owner: nya@chromium.org
Assigning to nya@; we can review as needed.

Comment 6 by nya@chromium.org, Jun 13 2017

I will try, but I don't know much about provisioning, so I'd be very happy if anybody from the infra team could also investigate.

Status: Assigned (was: Untriaged)
I'll help out wherever you're stuck, need help, or need a review. But it will work best if you drive this, since you know exactly why and what we need. (Also, you're most likely to have the attention span for this, since you depend on it.)

Comment 8 by nya@chromium.org, Jun 16 2017

My motivation for this issue is building BigQuery tables (Issue 733103), and I now feel I can simply exclude these broken entries. This issue happens when provisioning fails, so it is not a failure of the test itself; the test never ran. I think it makes sense to exclude entries for tests that never ran.
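A minimal Python sketch of that exclusion step, assuming the exported tko_jobs rows arrive as dicts keyed by column name. The "incomplete" signature used here (empty label plus NULL afe_parent_job_id, build, board and suite) is taken from the example rows in the description; the function names are hypothetical and not part of autotest or the BigQuery export.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

# Columns that were NULL in the broken tko_jobs rows above. Treating this
# pattern as "the test never ran" is an assumption, not an autotest rule.
REQUIRED_FIELDS = ("afe_parent_job_id", "build", "board", "suite")


def looks_incomplete(row: Row) -> bool:
    """Return True if the row has an empty label and every required field is None."""
    if row.get("label"):
        return False
    return all(row.get(field) is None for field in REQUIRED_FIELDS)


def drop_incomplete(rows: Iterable[Row]) -> List[Row]:
    """Keep only rows that do not match the broken pattern."""
    return [row for row in rows if not looks_incomplete(row)]
```

Matching on several NULL columns rather than on the label alone makes the filter a little stricter than the MySQL query in the description, which only checks label and afe_parent_job_id.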


That said, I think this issue occurs because the TKO parser runs against the provisioning logs, not the test logs, when provisioning fails.

For example, these two test jobs were triggered by the same suite job. The former succeeded, but the latter failed during provisioning, and its TKO entry is missing most fields.
A: http://cautotest/afe/#tab_id=view_job&object_id=123638738
B: http://cautotest/afe/#tab_id=view_job&object_id=123638741

Here are logs of those jobs:
Aj: http://cautotest/tko/retrieve_logs.cgi?job=/results/123638738-chromeos-test
Bj: http://cautotest/tko/retrieve_logs.cgi?job=/results/123638741-chromeos-test

And, here are provisioning logs of those jobs:
Ap: http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row2-rack14-host9/537194-provision/
Bp: http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row8-rack9-host9/537330-provision/

Interestingly, (Bj) is mostly the same as (Bp), while (Aj) is completely different from (Ap). From this observation, I guess that the provisioning logs are copied as the job logs when provisioning fails (this guess is based on the logs, not on reading the code).

Assuming my guess is correct, this is why the TKO entry for (B) is missing fields. In case B, the TKO parser runs against the provisioning logs, so it can't parse the build/board/version/suite info from the job label or afe_parent_job_id from the keyvals.
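To illustrate why an unparseable label leaves those columns NULL, here is a hypothetical Python sketch. It assumes a suite child job label of the form `<board>-<type>/<version>/<suite>/<test>` and keyval files made of `key=value` lines. Neither `parse_label` nor `parse_keyvals` is the actual TKO parser code.

```python
from typing import Dict, Optional


def parse_label(label: str) -> Dict[str, Optional[str]]:
    """Hypothetical parser for an assumed '<board>-<type>/<version>/<suite>/<test>'
    label. Every field stays None when the label does not match, which mirrors
    the NULL columns seen in case B."""
    fields: Dict[str, Optional[str]] = {
        "build": None, "build_version": None, "board": None, "suite": None}
    parts = label.split("/")
    if len(parts) < 4:
        return fields
    config, version, suite = parts[0], parts[1], parts[2]
    fields["build"] = f"{config}/{version}"
    fields["build_version"] = version
    # Assumes a single '-<type>' suffix such as '-release'; configs with
    # multi-part suffixes would be mis-split by this simple rule.
    fields["board"] = config.rsplit("-", 1)[0] if "-" in config else None
    fields["suite"] = suite
    return fields


def parse_keyvals(text: str) -> Dict[str, str]:
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        result[key.strip()] = value.strip()
    return result
```

With an empty label, as in the broken rows, every field comes back as None; likewise, if the keyval file read is the provisioning one, the parent job id key (whatever its real name is) would simply be absent.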

Comment 9 by nya@chromium.org, Nov 14 2017

Labels: -Pri-2 Pri-3
Owner: ----
Status: Available (was: Assigned)
We're working around this issue by joining the AFE and TKO job tables. Releasing for now.
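A sketch of that join workaround, using an in-memory SQLite database as a stand-in for the MySQL databases. The schemas are assumed minimal subsets (the real afe_jobs and tko_jobs tables have many more columns) and the rows are made-up values; the idea is to prefer the TKO value and fall back to the AFE value with `COALESCE`.

```python
import sqlite3
from typing import List, Tuple

# Assumed, minimal subsets of the AFE and TKO schemas.
SCHEMA = """
CREATE TABLE afe_jobs (id INTEGER PRIMARY KEY, name TEXT, parent_job_id INTEGER);
CREATE TABLE tko_jobs (job_idx INTEGER PRIMARY KEY, afe_job_id INTEGER,
                       label TEXT, afe_parent_job_id INTEGER);
"""

# Prefer TKO values; fall back to AFE when the TKO row is empty or NULL.
JOINED_QUERY = """
SELECT t.job_idx,
       COALESCE(NULLIF(t.label, ''), a.name) AS label,
       COALESCE(t.afe_parent_job_id, a.parent_job_id) AS parent_job_id
FROM tko_jobs AS t
LEFT JOIN afe_jobs AS a ON a.id = t.afe_job_id
ORDER BY t.job_idx
"""


def joined_jobs(conn: sqlite3.Connection) -> List[Tuple]:
    return conn.execute(JOINED_QUERY).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Made-up rows: job 1 mimics a broken TKO entry, job 2 a complete one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO afe_jobs VALUES (?, ?, ?)", [
        (10, "board-release/R00-0.0.0/suite/test_a", 100),
        (20, "board-release/R00-0.0.0/suite/test_b", 100),
    ])
    conn.executemany("INSERT INTO tko_jobs VALUES (?, ?, ?, ?)", [
        (1, 10, "", None),
        (2, 20, "board-release/R00-0.0.0/suite/test_b", 100),
    ])
    return conn


if __name__ == "__main__":
    for row in joined_jobs(build_demo_db()):
        print(row)
```

The LEFT JOIN keeps TKO rows that have no matching AFE job, so nothing is silently dropped; the recovered label can then be parsed for build, board and suite as usual.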

Hi, this bug has not been updated recently. Please acknowledge the bug and provide status within two weeks (6/22/2018), or the bug will be archived. Thank you.
