Early failures of autotestd on DUT leads to autoserv collecting logs from the (incorrect) last test run |
|||
Issue descriptionLifted from https://bugs.chromium.org/p/chromium/issues/detail?id=721867#c21 On this build the "graphics_dEQP" link from https://luci-milo.appspot.com/buildbot/chromeos/caroline-paladin/206 [Test-Logs]: graphics_dEQP: FAIL: Autotest client terminated unexpectedly: DUT rebooted during the test run. leads to "cheets_StartAndroid.stress" failure https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122453172-chromeos-test/chromeos2-row8-rack1-host20/cheets_StartAndroid.stress/ Retry or dashboard logic going wrong?
,
Jun 12 2017
The suite job also claims that that job_id belongs to 06/09 12:23:41.578 DEBUG| suite:1129| Scheduling graphics_dEQP.gles31.info 06/09 12:23:41.911 DEBUG| suite:1474| Adding job keyval for graphics_dEQP.gles31.info=122453172-chromeos-test
,
Jun 12 2017
Instead, we find logs for the test cheets_StartAndroid.stress.0 It's results should be in: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122453098-chromeos-test/chromeos2-row8-rack1-host20/ It's on the same DUT, and look into debug/autoserv.DEBUG, we find that pids are different, so this is not a case of incorrectly uploaded test logs from the same test twice.
,
Jun 12 2017
Oh fun, autoserv.DEBUG claims that it's running graphics_dEQP test. Also, autoserv ran ~13:08 on 06/09 but the client.DEBUG claims that it ran cheets_StartAndroid.stress at ~12:34 on 06/09 In fact, we went and collect the client results / logs from the DUT for an earlier test running cheets_StartAndroid.stress instead of graphis_dQEP.gles31
,
Jun 12 2017
- autoserv tried to launch autotestd on the DUT - failed to start (exit code 255) - autoserv tried to ping DUT / see if it's dead - autoserv found the DUT had died / rebooted - autoserv gave up - post-job tasks then collected logs from the DUT from /usr/local/autotest/results/default BUT /usr/local/autotest/results/default had never been setup properly, since we didn't get far enough to do that. As a result, we ended up collecting logs from the last test run on the DUT. Relevant log from autoserv: 06/09 12:53:24.545 DEBUG| ssh_host:0286| Running (ssh) '/usr/local/autotest/bin/autotestd_monitor /tmp/autoserv-hRB5MI 0 0' 06/09 12:53:25.000 DEBUG| site_autotest:0194| Entered autotestd_monitor. 06/09 12:53:25.001 INFO | autotest:1177| Entered autotestd_monitor. 06/09 12:53:25.001 DEBUG| site_autotest:0194| Finished launching tail subprocesses. 06/09 12:53:25.001 INFO | autotest:1177| Finished launching tail subprocesses. 06/09 12:53:25.002 DEBUG| site_autotest:0194| Finished waiting on autotestd to start. 06/09 12:53:25.002 INFO | autotest:1177| Finished waiting on autotestd to start. 06/09 12:53:25.534 DEBUG| site_autotest:0194| AUTOTEST_STATUS::START ---- ---- timestamp=1497038004 localtime=Jun 09 12:53:24 06/09 12:53:25.537 INFO | server_job:0199| START ---- ---- timestamp=1497038004 localtime=Jun 09 12:53:24 06/09 12:53:25.578 DEBUG| site_autotest:0194| AUTOTEST_STATUS:: START graphics_dEQP graphics_dEQP timestamp=1497038005 localtime=Jun 09 12:53:25 06/09 12:53:25.579 INFO | server_job:0199| START graphics_dEQP graphics_dEQP timestamp=1497038005 localtime=Jun 09 12:53:25 06/09 13:08:25.810 DEBUG| autotest:0815| Result exit status is 255. 06/09 13:08:25.811 DEBUG| utils:0203| Running 'ping chromeos2-row8-rack1-host20 -w1 -c1' 06/09 13:08:25.987 DEBUG| utils:0298| [stdout] PING chromeos2-row8-rack1-host20.cros.corp.google.com (100.115.230.141) 56(84) bytes of data. 06/09 13:08:25.988 DEBUG| utils:0298| [stdout] 64 bytes from 100.115.230.141: icmp_seq=1 ttl=59 time=0.810 ms 06/09 13:08:25.989 DEBUG| utils:0298| [stdout] 06/09 13:08:25.989 DEBUG| utils:0298| [stdout] --- chromeos2-row8-rack1-host20.cros.corp.google.com ping statistics --- 06/09 13:08:25.989 DEBUG| utils:0298| [stdout] 1 packets transmitted, 1 received, 0% packet loss, time 0ms 06/09 13:08:25.989 DEBUG| utils:0298| [stdout] rtt min/avg/max/mdev = 0.810/0.810/0.810/0.000 ms 06/09 13:08:25.992 DEBUG| ssh_host:0286| Running (ssh) 'if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi' 06/09 13:08:26.012 INFO | abstract_ssh:0795| Master ssh connection to chromeos2-row8-rack1-host20 is down. 06/09 13:08:26.013 DEBUG| abstract_ssh:0756| Nuking master_ssh_job. 06/09 13:08:26.013 DEBUG| abstract_ssh:0762| Cleaning master_ssh_tempdir. 06/09 13:08:26.014 INFO | abstract_ssh:0809| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_OOcoyyssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row8-rack1-host20' 06/09 13:08:26.014 DEBUG| utils:0203| Running '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_OOcoyyssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row8-rack1-host20' 06/09 13:08:27.924 DEBUG| utils:0298| [stdout] 7bac4261-36b9-40eb-9239-449dd002cd94 06/09 13:08:27.969 INFO | server_job:0199| FAIL ---- ---- timestamp=1497038907 localtime=Jun 09 13:08:27 Autotest client terminated unexpectedly: DUT rebooted during the test run. 06/09 13:08:27.970 INFO | server_job:0199| END FAIL ---- ---- timestamp=1497038907 localtime=Jun 09 13:08:27 06/09 13:08:27.971 INFO | server_job:0199| END GOOD ---- ---- timestamp=1497038907 localtime=Jun 09 13:08:27 06/09 13:08:27.971 DEBUG| autotest:0962| Autotest job finishes running. Below is the post-processing operations. ------------------------- - This will only occur when the test in question fails anyway - and only if autotestd dies very early. - This will not affect the correctness of the results. So adding it to Fixit. We have enough information to go from on this bug now.
,
Mar 31 2018
|
|||
►
Sign in to add a comment |
|||
Comment 1 by pprabhu@chromium.org
, Jun 12 2017Status: Started (was: Untriaged)