falco-chrome-pfq failing HWTests due to DUT rebooted during the test run |
|||
Issue descriptionhttps://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/5049 See comment #7 here: https://bugs.chromium.org/p/chromium/issues/detail?id=614719#c7 for details. All 196 test cases completed successfully on the DUT before it rebooted. Looking at eventlog.txt, there's a 'SUS Power Fail' followed by a 'System Reset' at the time the autoserv reported that the DUT had rebooted. In fact, there are multiple system resets, one of them is 'Thermal'. Is that DUT suffering from over-heat/power issues? 206 | 2016-12-06 03:51:06 | EC Event | Thermal 207 | 2016-12-06 03:57:28 | Kernel Event | Clean Shutdown 208 | 2016-12-06 03:57:29 | ACPI Enter | S5 209 | 2016-12-06 03:58:50 | System boot | 384 210 | 2016-12-06 03:58:50 | EC Event | Lid Open 211 | 2016-12-06 03:58:50 | SUS Power Fail 212 | 2016-12-06 03:58:50 | System Reset 213 | 2016-12-06 03:58:50 | ACPI Wake | S5 214 | 2016-12-06 03:58:50 | Wake Source | Power Button | 0 215 | 2016-12-06 03:58:50 | Wake Source | PCI Express | 0 216 | 2016-12-06 04:08:53 | Kernel Event | Clean Shutdown 217 | 2016-12-06 04:08:54 | ACPI Enter | S5 218 | 2016-12-06 04:10:57 | System boot | 385 219 | 2016-12-06 04:10:57 | EC Event | Lid Open 220 | 2016-12-06 04:10:57 | SUS Power Fail 221 | 2016-12-06 04:10:57 | System Reset 222 | 2016-12-06 04:10:57 | ACPI Wake | S5 223 | 2016-12-06 04:10:57 | Wake Source | Power Button | 0 224 | 2016-12-06 04:10:57 | Wake Source | PCI Express | 0
,
Dec 6 2016
Unfortunately it doesn't look like it was a clean panic because there is no console-ramoops. It could be a reset of some sort. https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/89269908-chromeos-test/chromeos2-row3-rack1-host13/debug/ 12/06 04:23:52.709 ERROR| logging_manager:0626| STATUS: END FAIL ---- ---- timestamp=1481027031 localtime=Dec 06 04:23:51 12/06 04:23:52.710 ERROR| logging_manager:0626| parsing test graphics_dEQP graphics_dEQP 12/06 04:23:52.711 ERROR| logging_manager:0626| ADD: FAIL 12/06 04:23:52.711 ERROR| logging_manager:0626| Subdir: graphics_dEQP 12/06 04:23:52.712 ERROR| logging_manager:0626| Testname: graphics_dEQP 12/06 04:23:52.712 ERROR| logging_manager:0626| Autotest client terminated unexpectedly: DUT rebooted during the test run. 12/06 04:23:52.713 ERROR| logging_manager:0626| 12/06 04:23:52.714 ERROR| logging_manager:0626| STATUS: END GOOD ---- ---- timestamp=1481027031 localtime=Dec 06 04:23:51 12/06 04:23:52.714 ERROR| logging_manager:0626| parsing test None CLIENT_JOB.0 12/06 04:23:52.715 ERROR| logging_manager:0626| ADD: GOOD 12/06 04:23:52.716 ERROR| logging_manager:0626| Subdir: None 12/06 04:23:52.716 ERROR| logging_manager:0626| Testname: CLIENT_JOB.0 12/06 04:23:52.717 ERROR| logging_manager:0626| 12/06 04:23:52.718 ERROR| logging_manager:0626| parsing test ---- SERVER_JOB 12/06 04:23:52.818 INFO | crashcollect:0154| Collecting crash information... 12/06 04:23:52.819 DEBUG| ssh_host:0177| Running (ssh) 'python -c 'import cPickle, glob, sys;cPickle.dump(glob.glob(sys.argv[1]), sys.stdout, 0)'' 12/06 04:23:52.838 INFO | abstract_ssh:0766| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_fHPgf9ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row3-rack1-host13' 12/06 04:23:52.839 DEBUG| base_utils:0185| Running '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_fHPgf9ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row3-rack1-host13' 12/06 04:23:53.960 INFO | site_crashcollect:0212| There are no orphaned crashes; deleting /usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13 12/06 04:23:53.973 DEBUG| ssh_host:0177| Running (ssh) 'true' 12/06 04:23:53.977 DEBUG| monitors:0191| ts_mon: sending 11 metrics to PubSub 12/06 04:23:54.199 INFO | crashcollect:0206| chromeos2-row3-rack1-host13 already up, collecting crash info 12/06 04:23:54.201 INFO | crashcollect:0068| Collecting /var/tmp/messages.autotest_start... 12/06 04:23:54.202 DEBUG| ssh_host:0177| Running (ssh) 'ls -ld /var/tmp/messages.autotest_start | cut -d" " -f5' 12/06 04:23:54.435 DEBUG| base_utils:0280| [stdout] 1105825 12/06 04:23:54.438 DEBUG| abstract_ssh:0337| get_file. source: /var/tmp/messages.autotest_start, dest: /usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13/messages.at_start, delete_dest: False,preserve_perm: False, preserve_symlinks:False 12/06 04:23:54.438 DEBUG| ssh_host:0177| Running (ssh) 'rsync --version' 12/06 04:23:54.670 DEBUG| abstract_ssh:0348| Using Rsync. 12/06 04:23:54.672 DEBUG| base_utils:0185| Running 'rsync -L --timeout=1800 --rsh='/usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_fHPgf9ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g root@chromeos2-row3-rack1-host13:"/var/tmp/messages.autotest_start" "/usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13/messages.at_start"' 12/06 04:23:55.510 INFO | crashcollect:0068| Collecting /var/log/messages... 12/06 04:23:55.511 DEBUG| ssh_host:0177| Running (ssh) 'ls -ld /var/log/messages | cut -d" " -f5' 12/06 04:23:55.740 DEBUG| base_utils:0280| [stdout] 1234553 12/06 04:23:55.742 DEBUG| abstract_ssh:0337| get_file. source: /var/log/messages, dest: /usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13/messages.raw, delete_dest: False,preserve_perm: False, preserve_symlinks:False 12/06 04:23:55.743 DEBUG| abstract_ssh:0348| Using Rsync. 12/06 04:23:55.743 DEBUG| base_utils:0185| Running 'rsync -L --timeout=1800 --rsh='/usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_fHPgf9ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g root@chromeos2-row3-rack1-host13:"/var/log/messages" "/usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13/messages.raw"' 12/06 04:23:56.306 INFO | crashcollect:0254| Collecting 'dmesg' ... 12/06 04:23:56.307 DEBUG| ssh_host:0177| Running (ssh) 'dmesg' 12/06 04:23:56.541 INFO | crashcollect:0068| Collecting /var/log... 12/06 04:23:56.541 DEBUG| ssh_host:0177| Running (ssh) 'ls -ld /var/log | cut -d" " -f5' 12/06 04:23:56.783 DEBUG| base_utils:0280| [stdout] 4096 12/06 04:23:56.786 DEBUG| abstract_ssh:0337| get_file. source: /var/log, dest: /usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13/var, delete_dest: False,preserve_perm: False, preserve_symlinks:False 12/06 04:23:56.786 DEBUG| abstract_ssh:0348| Using Rsync. 12/06 04:23:56.787 DEBUG| base_utils:0185| Running 'rsync -L --timeout=1800 --rsh='/usr/bin/ssh -a -x -o ControlPath=/tmp/_autotmp_fHPgf9ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g root@chromeos2-row3-rack1-host13:"/var/log" "/usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13/var"' 12/06 04:23:57.490 INFO | crashcollect:0068| Collecting /dev/pstore/console-ramoops... 12/06 04:23:57.491 DEBUG| ssh_host:0177| Running (ssh) 'ls -ld /dev/pstore/console-ramoops | cut -d" " -f5' 12/06 04:23:57.747 ERROR| base_utils:0280| [stderr] ls: cannot access /dev/pstore/console-ramoops: No such file or directory
,
Dec 6 2016
I think the "Thermal" event just indicates that it couldn't read the temperature sensors for some reason. (it's more of a warning[0]) If there were an overheating issue and the sensors were valid, I would expect the "Thermal Overload" event. [0] - https://chromium.googlesource.com/chromiumos/platform/ec/+/c77d95fa20027566a75c60a4465099ce8b68becc/common/thermal.c#135
,
Dec 7 2016
Not an infra issue, changing components
,
Dec 8 2016
Another instance on peach_pit a while ago: https://bugs.chromium.org/p/chromium/issues/detail?id=614719#c9
,
Jan 20 2017
Old |
|||
►
Sign in to add a comment |
|||
Comment 1 by semenzato@chromium.org
, Dec 6 2016