New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 671681 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner: ----
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 614719



Sign in to add a comment

falco-chrome-pfq failing HWTests due to DUT rebooted during the test run

Project Member Reported by afakhry@chromium.org, Dec 6 2016

Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/falco-chrome-pfq/builds/5049

See comment #7 here: https://bugs.chromium.org/p/chromium/issues/detail?id=614719#c7 for details.

All 196 test cases completed successfully on the DUT before it rebooted. 

Looking at eventlog.txt, there's a 'SUS Power Fail' followed by a 'System Reset' at the time the autoserv reported that the DUT had rebooted. In fact, there are multiple system resets, one of them is 'Thermal'. Is that DUT suffering from over-heat/power issues?

206 | 2016-12-06 03:51:06 | EC Event | Thermal
207 | 2016-12-06 03:57:28 | Kernel Event | Clean Shutdown
208 | 2016-12-06 03:57:29 | ACPI Enter | S5
209 | 2016-12-06 03:58:50 | System boot | 384
210 | 2016-12-06 03:58:50 | EC Event | Lid Open
211 | 2016-12-06 03:58:50 | SUS Power Fail
212 | 2016-12-06 03:58:50 | System Reset
213 | 2016-12-06 03:58:50 | ACPI Wake | S5
214 | 2016-12-06 03:58:50 | Wake Source | Power Button | 0
215 | 2016-12-06 03:58:50 | Wake Source | PCI Express | 0
216 | 2016-12-06 04:08:53 | Kernel Event | Clean Shutdown
217 | 2016-12-06 04:08:54 | ACPI Enter | S5
218 | 2016-12-06 04:10:57 | System boot | 385
219 | 2016-12-06 04:10:57 | EC Event | Lid Open
220 | 2016-12-06 04:10:57 | SUS Power Fail
221 | 2016-12-06 04:10:57 | System Reset
222 | 2016-12-06 04:10:57 | ACPI Wake | S5
223 | 2016-12-06 04:10:57 | Wake Source | Power Button | 0
224 | 2016-12-06 04:10:57 | Wake Source | PCI Express | 0


 
Cc: aaboagye@chromium.org
Aseda would you kindly help with this?  Or maybe help finding someone who may be familiar with EC thermal events?

In any case, offhand I don't see anything strange here except possibly the thermal event. System reset is (I believe) a normal part of reboot.

Ahmed, have you looked at the console-ramoops log for the DUT?  It's possible that the reboot is from a kernel panic.

Unfortunately it doesn't look like it was a clean panic because there is no console-ramoops.  It could be a reset of some sort.


https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/89269908-chromeos-test/chromeos2-row3-rack1-host13/debug/


12/06 04:23:52.709 ERROR|   logging_manager:0626| STATUS: END FAIL	----	----	timestamp=1481027031	localtime=Dec 06 04:23:51
12/06 04:23:52.710 ERROR|   logging_manager:0626| parsing test graphics_dEQP graphics_dEQP
12/06 04:23:52.711 ERROR|   logging_manager:0626| ADD: FAIL
12/06 04:23:52.711 ERROR|   logging_manager:0626| Subdir: graphics_dEQP
12/06 04:23:52.712 ERROR|   logging_manager:0626| Testname: graphics_dEQP
12/06 04:23:52.712 ERROR|   logging_manager:0626| Autotest client terminated unexpectedly: DUT rebooted during the test run.
12/06 04:23:52.713 ERROR|   logging_manager:0626| 
12/06 04:23:52.714 ERROR|   logging_manager:0626| STATUS: END GOOD	----	----	timestamp=1481027031	localtime=Dec 06 04:23:51
12/06 04:23:52.714 ERROR|   logging_manager:0626| parsing test None CLIENT_JOB.0
12/06 04:23:52.715 ERROR|   logging_manager:0626| ADD: GOOD
12/06 04:23:52.716 ERROR|   logging_manager:0626| Subdir: None
12/06 04:23:52.716 ERROR|   logging_manager:0626| Testname: CLIENT_JOB.0
12/06 04:23:52.717 ERROR|   logging_manager:0626| 
12/06 04:23:52.718 ERROR|   logging_manager:0626| parsing test ---- SERVER_JOB
12/06 04:23:52.818 INFO |      crashcollect:0154| Collecting crash information...
12/06 04:23:52.819 DEBUG|          ssh_host:0177| Running (ssh) 'python -c 'import cPickle, glob, sys;cPickle.dump(glob.glob(sys.argv[1]), sys.stdout, 0)''
12/06 04:23:52.838 INFO |      abstract_ssh:0766| Starting master ssh connection '/usr/bin/ssh -a -x   -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_fHPgf9ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row3-rack1-host13'
12/06 04:23:52.839 DEBUG|        base_utils:0185| Running '/usr/bin/ssh -a -x   -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_fHPgf9ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos2-row3-rack1-host13'
12/06 04:23:53.960 INFO | site_crashcollect:0212| There are no orphaned crashes; deleting /usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13
12/06 04:23:53.973 DEBUG|          ssh_host:0177| Running (ssh) 'true'
12/06 04:23:53.977 DEBUG|          monitors:0191| ts_mon: sending 11 metrics to PubSub
12/06 04:23:54.199 INFO |      crashcollect:0206| chromeos2-row3-rack1-host13 already up, collecting crash info
12/06 04:23:54.201 INFO |      crashcollect:0068| Collecting /var/tmp/messages.autotest_start...
12/06 04:23:54.202 DEBUG|          ssh_host:0177| Running (ssh) 'ls -ld /var/tmp/messages.autotest_start | cut -d" " -f5'
12/06 04:23:54.435 DEBUG|        base_utils:0280| [stdout] 1105825
12/06 04:23:54.438 DEBUG|      abstract_ssh:0337| get_file. source: /var/tmp/messages.autotest_start, dest: /usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13/messages.at_start, delete_dest: False,preserve_perm: False, preserve_symlinks:False
12/06 04:23:54.438 DEBUG|          ssh_host:0177| Running (ssh) 'rsync --version'
12/06 04:23:54.670 DEBUG|      abstract_ssh:0348| Using Rsync.
12/06 04:23:54.672 DEBUG|        base_utils:0185| Running 'rsync -L  --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o ControlPath=/tmp/_autotmp_fHPgf9ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g root@chromeos2-row3-rack1-host13:"/var/tmp/messages.autotest_start" "/usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13/messages.at_start"'
12/06 04:23:55.510 INFO |      crashcollect:0068| Collecting /var/log/messages...
12/06 04:23:55.511 DEBUG|          ssh_host:0177| Running (ssh) 'ls -ld /var/log/messages | cut -d" " -f5'
12/06 04:23:55.740 DEBUG|        base_utils:0280| [stdout] 1234553
12/06 04:23:55.742 DEBUG|      abstract_ssh:0337| get_file. source: /var/log/messages, dest: /usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13/messages.raw, delete_dest: False,preserve_perm: False, preserve_symlinks:False
12/06 04:23:55.743 DEBUG|      abstract_ssh:0348| Using Rsync.
12/06 04:23:55.743 DEBUG|        base_utils:0185| Running 'rsync -L  --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o ControlPath=/tmp/_autotmp_fHPgf9ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g root@chromeos2-row3-rack1-host13:"/var/log/messages" "/usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13/messages.raw"'
12/06 04:23:56.306 INFO |      crashcollect:0254| Collecting 'dmesg' ...
12/06 04:23:56.307 DEBUG|          ssh_host:0177| Running (ssh) 'dmesg'
12/06 04:23:56.541 INFO |      crashcollect:0068| Collecting /var/log...
12/06 04:23:56.541 DEBUG|          ssh_host:0177| Running (ssh) 'ls -ld /var/log | cut -d" " -f5'
12/06 04:23:56.783 DEBUG|        base_utils:0280| [stdout] 4096
12/06 04:23:56.786 DEBUG|      abstract_ssh:0337| get_file. source: /var/log, dest: /usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13/var, delete_dest: False,preserve_perm: False, preserve_symlinks:False
12/06 04:23:56.786 DEBUG|      abstract_ssh:0348| Using Rsync.
12/06 04:23:56.787 DEBUG|        base_utils:0185| Running 'rsync -L  --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o ControlPath=/tmp/_autotmp_fHPgf9ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g root@chromeos2-row3-rack1-host13:"/var/log" "/usr/local/autotest/results/89269908-chromeos-test/chromeos2-row3-rack1-host13/crashinfo.chromeos2-row3-rack1-host13/var"'
12/06 04:23:57.490 INFO |      crashcollect:0068| Collecting /dev/pstore/console-ramoops...
12/06 04:23:57.491 DEBUG|          ssh_host:0177| Running (ssh) 'ls -ld /dev/pstore/console-ramoops | cut -d" " -f5'
12/06 04:23:57.747 ERROR|        base_utils:0280| [stderr] ls: cannot access /dev/pstore/console-ramoops: No such file or directory
I think the "Thermal" event just indicates that it couldn't read the temperature sensors for some reason. (it's more of a warning[0]) If there were an overheating issue and the sensors were valid, I would expect the "Thermal Overload" event.

[0] - https://chromium.googlesource.com/chromiumos/platform/ec/+/c77d95fa20027566a75c60a4465099ce8b68becc/common/thermal.c#135

Components: -Infra>Client>ChromeOS OS>Kernel
Not an infra issue, changing components
Another instance on peach_pit a while ago: https://bugs.chromium.org/p/chromium/issues/detail?id=614719#c9
Status: WontFix (was: Untriaged)
Old

Sign in to add a comment