
Issue 870507


Issue metadata

Status: WontFix
Owner:
Closed: Aug 17
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




Daisy test job does not finish after the test fails; the device is disconnected and many mosys crashes are collected

Project Member Reported by ka...@chromium.org, Aug 2

Issue description

Host: chromeos15-row13a-rack1-host1
Host page: http://cros-full-0024.mtv.corp.google.com/afe/#tab_id=view_job&object_id=223241142
Job page: http://cros-full-0024.mtv.corp.google.com/afe/#tab_id=view_job&object_id=223034515


More than an hour after the test failed on the device, the test job is still hanging on ssh retries such as:
-------------------------------------------
stderr:
ssh: connect to host chromeos15-row13a-rack1-host1 port 22: Connection timed out)
08/02 16:15:58.164 DEBUG|          ssh_host:0301| Running (ssh) 'portageq owners / mosys| sed -e "s/^[^\t].*/@@@ & @@@/" -e "s/^\t//"| tr \\n \\0 | xargs -0 -r stat -L -c "%a %A %n" 2>&1' from 'get_crashdumps|get_site_crashdumps|report_bug_from_crash|find_package_of|run|run_very_slowly'
08/02 16:17:01.416 DEBUG|             utils:0286| [stderr] ssh: connect to host chromeos15-row13a-rack1-host1 port 22: Connection timed out
08/02 16:17:01.417 WARNI| site_crashcollect:0405| Crash detection failed with: ('ssh timed out', * Command: 
    /usr/bin/ssh -a -x   -o Protocol=2 -o StrictHostKeyChecking=no -o
    UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o
    ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4
    -l root -p 22 chromeos15-row13a-rack1-host1 "export LIBC_FATAL_STDERR_=1;
    if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\"
    \"server[stack::report_bug_from_crash|find_package_of|run] ->
    ssh_run(portageq owners / mosys| sed -e \\\"s/^[^\\\\t].*/@@@ & @@@/\\\"
    -e \\\"s/^\\\\t//\\\"| tr \\\\\\\\n \\\\\\\\0 | xargs -0 -r stat -L -c
    \\\"%a %A %n\\\" 2>&1)\";fi; portageq owners / mosys| sed -e
    \"s/^[^\\t].*/@@@ & @@@/\" -e \"s/^\\t//\"| tr \\\\n \\\\0 | xargs -0 -r
    stat -L -c \"%a %A %n\" 2>&1"
Exit status: 255
Duration: 63.2359609604
-------------------------------------------
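The failing command is the crash collector's `find_package_of` step, which runs `portageq owners / mosys` and post-processes its output. As far as can be read from the command itself, the two `sed` expressions wrap unindented lines (package atoms) in `@@@` markers and strip the leading tab from indented lines (owned file paths); `tr` then turns newlines into NULs so that `xargs -0` passes each token to `stat -L`. A minimal Python sketch of that text transformation follows. It assumes portageq prints package atoms unindented with the files they own on tab-indented lines, and that the remote `sed` treats `\t` as a tab (GNU behavior); both are inferences from the `sed` expressions, not checked against portageq. The function name is hypothetical.

```python
def mark_portageq_output(text: str) -> list[str]:
    """Mimic `sed -e "s/^[^\\t].*/@@@ & @@@/" -e "s/^\\t//" | tr '\\n' '\\0'`.

    Returns the NUL-separated tokens that `xargs -0` would hand to `stat`.
    Assumes package atoms are unindented and owned files are tab-indented.
    """
    lines = text.split("\n")
    if lines and lines[-1] == "":
        # A trailing newline becomes a trailing NUL, which xargs -0 treats as
        # a terminator rather than an extra empty argument.
        lines.pop()
    tokens = []
    for line in lines:
        if line and not line.startswith("\t"):
            # First expression: wrap package lines in @@@ markers.
            tokens.append(f"@@@ {line} @@@")
        elif line.startswith("\t"):
            # Second expression: drop one leading tab from file lines.
            tokens.append(line[1:])
        else:
            # Empty lines match neither expression and pass through unchanged.
            tokens.append(line)
    return tokens
```

For a hypothetical package `example/pkg-1.0` owning `/usr/sbin/mosys`, the tokens are `"@@@ example/pkg-1.0 @@@"` and `"/usr/sbin/mosys"`. `stat` fails on the marker token, and because of `2>&1` that error line ends up in the output, where it appears to serve as a package header for the file modes that follow.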

A huge number of mosys crashes were observed. Logs are at http://cros-full-0024.mtv.corp.google.com/results/223034515-chromeos-test/chromeos15-row13a-rack1-host1/debug/autoserv.DEBUG
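To quantify how long a job has been stuck on ssh retries, the autoserv.DEBUG lines can be scanned for connection timeouts. The sketch below assumes every line follows the layout of the excerpt above (`MM/DD HH:MM:SS.mmm LEVEL|  module:line| message`); the regex and the function name `ssh_timeout_summary` are hypothetical, not part of autotest.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Matches lines such as:
# 08/02 16:17:01.416 DEBUG|             utils:0286| [stderr] ssh: connect ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<level>\w+)\|"
    r"\s*(?P<src>[\w.]+:\d+)\| (?P<msg>.*)$"
)


def ssh_timeout_summary(lines: Iterable[str]) -> Tuple[int, Optional[timedelta]]:
    """Count 'Connection timed out' lines and the time span they cover."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and "Connection timed out" in m.group("msg"):
            # The log omits the year; 2000 is a leap-year placeholder so that
            # 02/29 still parses. Only differences between stamps matter, and
            # a log that crosses New Year would need real years.
            stamps.append(
                datetime.strptime("2000/" + m.group("ts"), "%Y/%m/%d %H:%M:%S.%f")
            )
    if not stamps:
        return 0, None
    return len(stamps), max(stamps) - min(stamps)
```

Usage: `ssh_timeout_summary(open("autoserv.DEBUG"))` returns the number of timed-out connection attempts and how long the job has been retrying.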

The provisioning job that ran after the failed test appears to be having problems too: http://cros-full-0024.mtv.corp.google.com/afe/#tab_id=view_job&object_id=223271495
 
Cc: sontis@chromium.org pgangishetty@chromium.org matthewjoseph@chromium.org
Components: -Internals>Network>Connectivity
Cc: -harpreet@chromium.org jkop@chromium.org
Components: -Blink>Infra Infra>Client>ChromeOS
Owner: jrbarnette@chromium.org
Fixed the wrong infra component.

Also, host page is http://cros-full-0024.mtv.corp.google.com/afe/#tab_id=view_host&object_id=8617

Logs from the currently running job, at
http://cros-full-0024.mtv.corp.google.com/results/223600227-chromeos-test/chromeos15-row13a-rack1-host1/debug/autoserv.DEBUG
also show the daisy DUT being disconnected, but the test does not exit.

Should autotest time out and start a repair job on the host?
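One possible shape for that behavior, sketched in Python with hypothetical names (`collect_until_unreachable`, `run`, `max_consecutive_failures` and `deadline_s` are illustrations, not autotest APIs): stop running crash-collection commands once ssh has failed several times in a row or an overall deadline has passed, and report that the host needs repair. ssh exits with status 255 when it hits its own error, such as the connection timeout in the log above, so that status is treated as "unreachable" here.

```python
import time
from typing import Callable, List, Sequence, Tuple

# ssh exits with 255 on its own errors (e.g. connection timed out). A remote
# command that itself exits 255 is indistinguishable; this sketch ignores that.
SSH_ERROR_STATUS = 255


def collect_until_unreachable(
    commands: Sequence[str],
    run: Callable[[str], int],
    max_consecutive_failures: int = 3,
    deadline_s: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[Tuple[str, int]], bool]:
    """Run commands via `run`; return (results, needs_repair)."""
    start = clock()
    failures = 0
    results: List[Tuple[str, int]] = []
    for cmd in commands:
        if clock() - start > deadline_s:
            # Overall budget exhausted: give up and hand the host to repair.
            return results, True
        status = run(cmd)
        results.append((cmd, status))
        if status == SSH_ERROR_STATUS:
            failures += 1
            if failures >= max_consecutive_failures:
                # Host looks unreachable; stop retrying.
                return results, True
        else:
            # Any successful connection resets the failure streak.
            failures = 0
    return results, False
```

Counting consecutive failures rather than total failures keeps a single flaky connection from triggering repair, while the deadline bounds the total time even when failures are interleaved with successes.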

Regarding the disconnect:
- One possibility is that the mosys crashes cause it.
- It could also be a bad device.

Components: -Infra>Client>ChromeOS Infra>Client>ChromeOS>Test
Labels: Hotlist-Deputy
Owner: jkop@chromium.org
Passing Hotlist-Deputy to this week's deputy.

Owner: xixuan@chromium.org
Passing on Hotlist-Deputy
Status: WontFix (was: Untriaged)
The host is back to Ready. Marking it as WontFix.
