Issue 734690

Issue metadata

Status: Archived
Owner:
Closed: May 2018
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 736393



Job ran for 12+ hours on chromeos2-row8-rack1-host3

Project Member Reported by pprabhu@chromium.org, Jun 19 2017

Issue description

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/124102529-chromeos-test/chromeos2-row8-rack1-host3/debug/

The autoserv for the server side test ran from 06/18 16:37:41.300 to 06/19 05:13:43.091 when it was finally aborted (12+ hours)
The client.DEBUG logs run from 06/18 17:01:51.345 to 06/18 17:02:46.223

[Component is currently a placeholder]

Setting myself as owner for FIR.
 
There are repeated instances of rsync hanging for ~3 hours, like so:

06/18 17:44:02.905 DEBUG|      abstract_ssh:0357| Using Rsync.
06/18 17:44:02.906 DEBUG|             utils:0203| Running 'rsync -L  --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o ControlPath=/tmp/_autotmp_ejBJsMssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g root@chromeos2-row8-rack1-host3:"/usr/local/autotest/results/default/" "/usr/local/autotest/results/124102529-chromeos-test/chromeos2-row8-rack1-host3"'
06/18 20:27:01.238 INFO |      crashcollect:0071| Collecting /var/log...
06/18 20:27:01.240 DEBUG|          ssh_host:0286| Running (ssh) 'ls -ld /var/log | cut -d" " -f5'
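To find the other hangs in the same autoserv log, one option is to scan for large gaps between consecutive timestamps. A minimal sketch, assuming every log line starts with the "MM/DD HH:MM:SS.mmm" prefix seen above and that the year is 2017 (the logs do not record it). The names parse_ts and find_gaps are hypothetical helpers, not part of autotest, and the sample lines are the ones quoted above with the long rsync command abbreviated:

```python
import re
from datetime import datetime, timedelta

# Log prefix as seen in the excerpt, e.g. "06/18 17:44:02.905 DEBUG| ..."
TS_RE = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")


def parse_ts(line, year=2017):
    """Return the line's timestamp, or None if it has no prefix.

    The year is an assumption; the log lines do not include one.
    """
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(f"{year}/{m.group(1)}", "%Y/%m/%d %H:%M:%S.%f")


def find_gaps(lines, threshold=timedelta(minutes=30)):
    """Return (gap, line_before_gap) for each silence longer than threshold."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # continuation lines, blank lines, etc.
        if prev_ts is not None and ts - prev_ts > threshold:
            gaps.append((ts - prev_ts, prev_line))
        prev_ts, prev_line = ts, line
    return gaps


# Lines from the excerpt above; the rsync command is abbreviated here.
sample = [
    "06/18 17:44:02.905 DEBUG|      abstract_ssh:0357| Using Rsync.",
    "06/18 17:44:02.906 DEBUG|             utils:0203| Running 'rsync -L  --timeout=1800 ...'",
    "06/18 20:27:01.238 INFO |      crashcollect:0071| Collecting /var/log...",
    "06/18 20:27:01.240 DEBUG|          ssh_host:0286| Running (ssh) 'ls -ld /var/log | cut -d\" \" -f5'",
]

for gap, line in find_gaps(sample):
    print(gap, "after:", line)
```

The 30 minute threshold is arbitrary; it simply matches the --timeout=1800 passed to rsync, which the hang above exceeded several times over.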
That DUT (chromeos2-row8-rack1-host3) does not show a history of pathological failures of this type.
OTOH, that DUT does have a history of unexpected reset jobs against it: https://bugs.chromium.org/p/chromium/issues/detail?id=734701#c4
> [ ... ] unexpected reset jobs against it [ ... ]

Reset tasks run in between every test job; they're normal.
I don't know under what circumstances reset would be "unexpected"...

> The autoserv for the server side test ran from 06/18 16:37:41.300
> to 06/19 05:13:43.091 when it was finally aborted (12+ hours)

This doesn't indicate a problem DUT, it indicates a problem with the
scheduler.  The job had a 90 minute timeout, so it should have
terminated well before the 12 hour mark.
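For a sense of scale, the overrun can be worked out from the two autoserv timestamps quoted above. A small sketch, again assuming the year is 2017 and taking the 90 minute timeout from this comment rather than from the job's actual configuration; run_duration is a hypothetical helper:

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M:%S.%f"
YEAR = 2017  # assumption: the log timestamps carry no year


def run_duration(start, end, year=YEAR):
    """Elapsed time between two 'MM/DD HH:MM:SS.mmm' log timestamps."""
    t0 = datetime.strptime(f"{year}/{start}", FMT)
    t1 = datetime.strptime(f"{year}/{end}", FMT)
    return t1 - t0


# Start and abort times of the autoserv process, from the issue description.
elapsed = run_duration("06/18 16:37:41.300", "06/19 05:13:43.091")
timeout = timedelta(minutes=90)  # job timeout as stated in this comment
overrun = elapsed - timeout

print("elapsed:", elapsed)
print("past timeout by:", overrun)
```

By that arithmetic the job ran roughly 12.6 hours, about 11 hours beyond the point where the timeout should have fired.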

This is the problem job:
    https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=124102529

Blockedon: 736393
Status: Archived (was: Untriaged)