some autoserv.DEBUG log timestamps seem wrong |
||
Issue descriptionHow can this happen? https://uberchromegw.corp.google.com/i/chromeos/builders/ninja-release/builds/866 https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/101766331-chromeos-test/chromeos4-row3-rack9-host2/debug/ file autoserv.DEBUG: 02/17 06:23:52.879 DEBUG| ssh_host:0284| Running (ssh) 'test ! -e /var/log/messages || cp -f /var/log/messages /var/tmp/messages.autotest_start' 02/17 06:23:52.951 INFO | abstract_ssh:0809| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_ajPK9Gssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row3-rack9-host2' 02/17 06:23:52.951 DEBUG| base_utils:0185| Running '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_ajPK9Gssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row3-rack9-host2' 02/17 06:23:58.014 INFO | abstract_ssh:0824| Timed out waiting for master-ssh connection to be established. 02/17 06:25:01.341 ERROR| base_utils:0280| [stderr] ssh: connect to host chromeos4-row3-rack9-host2 port 22: Connection timed out 02/17 06:25:01.343 INFO | remote:0073| Failed to copy /var/log/messages at startup: ('ssh timed out', * Command: /usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_ajPK9Gssh- master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row3-rack9-host2 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::create_target_machine|create_host|job_start] -> ssh_run(test ! -e /var/log/messages || cp -f /var/log/messages /var/tmp/messages.autotest_start)\";fi; test ! -e /var/log/messages || cp -f /var/log/messages /var/tmp/messages.autotest_start" Exit status: 255 Duration: 63.2917709351 There is a log entry at 06:23:51 about starting the master ssh connection, then about 6 seconds later at 06:23:58 another entry about the timeout. That's too soon, we would expect the delay to be at least 30 seconds. The next log line has a more reasonable time stamp.
,
Mar 2 2017
I can't explain this behavior, and at first blush, my estimate
is that the ratio of "likely value of a correct answer" divided
by "time required to find the correct answer" is too small to
justify an investigation.
Assuming the right response isn't "WontFix", what do we hope to
gain by resolving this?
As a practical matter, the logs for the DUT at the time of the failure
show this:
2017-02-17 06:51:49 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row3-rack9-host2/433453-provision/
2017-02-17 06:28:22 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row3-rack9-host2/433415-repair/
2017-02-17 06:23:48 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row3-rack9-host2/433401-provision/
2017-02-17 06:04:09 -- http://cautotest/tko/retrieve_logs.cgi?job=/results/101765844-chromeos-test/
2017-02-17 05:53:45 OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row3-rack9-host2/433338-provision/
Basically, the failure here happened because the DUT actually went
offline, and it was fixed by disconnecting, the reconnecting AC power
(ninja devices have no battery).
,
Mar 2 2017
Yeah we can close this, but I still need to understand the timeout logic.
,
Mar 9 2017
|
||
►
Sign in to add a comment |
||
Comment 1 by shuqianz@chromium.org
, Feb 21 2017