
Issue 796439

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 797849
Owner: ----
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug




nyan_big-paladin:3884 failed

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Dec 20 2017

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of bmgordon@google.com

bvt-inline finished 163820691-chromeos-test/chromeos4-row5-rack10-host5/provision_AutoUpdate.double at 15:13.  Then it sat for 75 minutes without printing any more messages about starting tests and timed out.

Builders failed on: 
- nyan_big-paladin: 
  https://luci-milo.appspot.com/buildbot/chromeos/nyan_big-paladin/3884


 
Issue 796438 has been merged into this issue.
Cc: pprabhu@chromium.org
Components: Infra>Client>ChromeOS
Labels: Type-Bug
OK.  I did some digging, and then some more digging.

The suite job is here:
    http://cautotest-prod.corp.google.com/afe/#tab_id=view_job&object_id=163820593

That shows every test completing successfully, except for one aborted
job:
    http://cautotest-prod.corp.google.com/afe/#tab_id=view_job&object_id=163820700

If you look in the suite job logs, you see that this job is never
listed as finished, and that every other job completed by 15:12:02 at
the latest.  Between that point and the suite abort at 16:31:29, there
were a whole lot of idle DUTs, and no testing.
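
As a quick check on how long the suite sat idle, here is a minimal sketch that
subtracts the two timestamps quoted above. It assumes both times fall on the
same day; the helper name `idle_gap` is hypothetical and not part of autotest.

```python
from datetime import datetime, timedelta


def idle_gap(last_finish: str, abort: str, fmt: str = "%H:%M:%S") -> timedelta:
    """Return the time between the last job finishing and the suite abort.

    Assumes both timestamps are on the same calendar day.
    """
    return datetime.strptime(abort, fmt) - datetime.strptime(last_finish, fmt)


# Timestamps taken from the suite job logs described above.
gap = idle_gap("15:12:02", "16:31:29")
print(gap)  # 1:19:27
```

That is a little under 80 minutes with nothing scheduled.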

Going to the shard, and looking in the scheduler logs, you see all
of the following entries:
12/19 15:02:38.441 INFO |  scheduler_models:0588| HQE: 164207534, for job: 163820700 and host: chromeos4-row5-rack9-host21 has status:Starting [active] -> Running
12/19 15:02:38.452 INFO |         rdb_hosts:0222| Host chromeos4-row5-rack9-host21 in Pending updating {'status': 'Running'} through rdb on behalf of: HQE: 164207534, for job: 163820700 
12/19 15:02:38.458 INFO |         rdb_hosts:0222| Host chromeos4-row5-rack9-host21 in Running updating {'dirty': 1} through rdb on behalf of: HQE: 164207534, for job: 163820700 
12/19 15:03:12.504 INFO |  scheduler_models:0588| HQE: 164207534, for job: 163820700 and host: chromeos4-row5-rack9-host21 has status:Running [active] -> Gathering
12/19 15:03:12.508 INFO |         rdb_hosts:0222| Host chromeos4-row5-rack9-host21 in Running updating {'status': 'Running'} through rdb on behalf of: HQE: 164207534, for job: 163820700 
12/19 15:03:13.021 INFO |  scheduler_models:0588| HQE: 164207534, for job: 163820700 and host: chromeos4-row5-rack9-host21 has status:Gathering [active] -> Parsing
12/19 15:03:13.027 INFO |         rdb_hosts:0222| Host chromeos4-row5-rack9-host21 in Running updating {'status': 'Ready'} through rdb on behalf of: HQE: 164207534, for job: 163820700 

So... The job ran.  To completion.  At first blush, without incident.
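
For anyone repeating this check on another job, the status transitions can be
pulled out of the scheduler log with a short script. This is only a sketch: the
regular expression is inferred from the `scheduler_models` lines quoted above,
not from the autotest source, and `hqe_transitions` is a hypothetical helper.

```python
import re

# Pattern inferred from the scheduler_models lines quoted above (an assumption,
# not taken from the autotest source).
TRANSITION_RE = re.compile(
    r"(?P<time>\d\d/\d\d \d\d:\d\d:\d\d\.\d+) INFO \|\s*scheduler_models:\d+\| "
    r"HQE: (?P<hqe>\d+), for job: (?P<job>\d+) and host: (?P<host>\S+) "
    r"has status:(?P<old>\w+) \[\w+\] -> (?P<new>\w+)"
)


def hqe_transitions(lines, job_id):
    """Return (time, old_status, new_status) tuples for one job, in log order."""
    found = []
    for line in lines:
        match = TRANSITION_RE.search(line)
        if match and match.group("job") == str(job_id):
            found.append((match.group("time"), match.group("old"), match.group("new")))
    return found


# Three transition lines and one rdb_hosts line, copied from the log above.
SAMPLE = [
    "12/19 15:02:38.441 INFO |  scheduler_models:0588| HQE: 164207534, for job: 163820700 and host: chromeos4-row5-rack9-host21 has status:Starting [active] -> Running",
    "12/19 15:02:38.458 INFO |         rdb_hosts:0222| Host chromeos4-row5-rack9-host21 in Running updating {'dirty': 1} through rdb on behalf of: HQE: 164207534, for job: 163820700 ",
    "12/19 15:03:12.504 INFO |  scheduler_models:0588| HQE: 164207534, for job: 163820700 and host: chromeos4-row5-rack9-host21 has status:Running [active] -> Gathering",
    "12/19 15:03:13.021 INFO |  scheduler_models:0588| HQE: 164207534, for job: 163820700 and host: chromeos4-row5-rack9-host21 has status:Gathering [active] -> Parsing",
]

for when, old, new in hqe_transitions(SAMPLE, 163820700):
    print(when, old, "->", new)
```

On the quoted lines this lists Starting -> Running -> Gathering -> Parsing,
which matches the reading that the job ran through to parsing on the shard.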

And the coup de grâce: on the shard, you can see this:
    chromeos-test@chromeos-server130:/usr/local/autotest/logs$ ls -d /usr/local/autotest/results/163820700*
    /usr/local/autotest/results/163820700-chromeos-test

My top theory is that the job wasn't properly transferred from the
shard to the master.  But I am only an egg...
Cc: bmgordon@chromium.org
Mergedinto: 797849
Status: Duplicate (was: Available)
