enterprise_CFM_USBPeripheralHotplugDetect.c920 / c930 - Detecting crashes on M-62 and M-63 |
|||||||||||||||||
Issue descriptionChromeOS build: R62-9888.0.0 Test: enterprise_CFM_USBPeripheralHotplugDetect.c920 Failure Reason: Following peripherals do not match: Speaker. No of Crashes: 25. Crashes: New Warning or Crash Detected before plugging in usb peripherals., New Warning or Crash detected after device enrolled into CFM. Logs are not yet available in google cloud but I did manage to get the debug logs from tko link via sponge/ https://paste.googleplex.com/5743448221024256 There are bunch of crashes during the test run along with peripheral mismatch. dsunkara@ - can you please extract the crash data from the device/logs and post it in this bug. Thanks
,
Aug 29 2017
Number of crashes across all boards is up drastically.. https://chromeperf.appspot.com/report?sid=ffcaedf3f246ab9738d5cbe76df56b44a093db6563d9ef19c52cacd210d7f8f3
,
Aug 29 2017
These crashes are generated when device is rebooted. 1. recover DUT with 9888.0.0 2. reboot. 3. check /var/spool/crash folder for these crash files. 4. clear tpm and reboot. 5. More crash files added to /var/spool/crash folder. All crash files collected could be found below: https://pantheon.corp.google.com/storage/browser/chromiumos-test-logs/bugfiles/cr/760236/
,
Sep 8 2017
dsunkara@ - are there still failures for device mismatch? if not, lets change the title of this bug to make it more focused on number of crashes.
,
Sep 8 2017
Hi Harpreet, This test is not failing anymore because of device mismatch. There are still crashes though on M-62.
,
Sep 16 2017
enterprise_CFM_USBPeripheralHotplugDetect.c920 / c930 test fails on guado on 9/15 with ChromeOS R62-9901.16.0. https://wmatrix.googleplex.com/testrun_log/unfiltered?test_ids=555424043 log: stderr: /bin/bash: minidump_stackwalk: command not found, 'Command returned non-zero exit status') 09/15 02:09:41.624 INFO | site_crashcollect:0111| Generating stack trace using devserver for /usr/local/autotest/results/142198393-chromeos-test/sysinfo/var/spool/crash/chrome.20170915.020450.5790.dmp 09/15 02:09:41.624 DEBUG| dev_server:2558| Get the least loaded <class 'autotest_lib.client.common_lib.cros.dev_server.CrashServer'> 09/15 02:09:41.624 DEBUG| dev_server:0678| Pick one healthy devserver from ['http://100.108.1.139:8082', 'http://100.108.1.152:8082', 'http://100.108.1.59:8082', 'http://100.109.175.143:8082', 'http://100.109.175.145:8082', 'http://172.24.190.211:8082', 'http://172.25.61.81:8082', 'http://172.25.65.207:8082', 'http://172.25.66.216:8082'] 09/15 02:09:41.624 DEBUG| dev_server:0682| Check health for http://100.108.1.139:8082 09/15 02:09:41.996 DEBUG| dev_server:0687| Pick http://100.108.1.139:8082 09/15 02:09:42.094 WARNI| retry:0228| <class 'urllib2.URLError'>(<urlopen error [Errno 111] Connection refused>) 09/15 02:09:42.095 WARNI| retry:0183| Retrying in 1.709657 seconds... 09/15 02:09:43.810 WARNI| retry:0228| <class 'urllib2.URLError'>(<urlopen error [Errno 111] Connection refused>) 09/15 02:09:43.810 ERROR| dev_server:0434| Devserver call failed: "http://172.25.65.207:8082/check_health?", timeout: 2.0 seconds, Error: <class 'urllib2.URLError'>: <urlopen error [Errno 111] Connection refused> 09/15 02:09:44.061 DEBUG| dev_server:0300| RPC call get_load has timed out on devserver 172.24.190.211. 09/15 02:09:44.062 ERROR| dev_server:0434| Devserver call failed: "http://172.24.190.211:8082/check_health?", timeout: 2.0 seconds, Error: retry exception (label="get_load"), timeout = 2s 09/15 02:09:44.077 DEBUG| dev_server:0300| RPC call get_load has timed out on devserver 172.25.61.81. 09/15 02:09:44.078 ERROR| dev_server:0434| Devserver call failed: "http://172.25.61.81:8082/check_health?", timeout: 2.0 seconds, Error: retry exception (label="get_load"), timeout = 2s 09/15 02:09:44.092 INFO | connectionpool:0207| Starting new HTTP connection (1): 100.108.1.59 09/15 02:10:07.480 DEBUG| connectionpool:0387| "POST /symbolicate_dump?archive_url=gs://chromeos-image-archive/guado-release/R62-9901.16.0 HTTP/1.1" 200 71729 09/15 02:10:07.537 INFO | site_crashcollect:0124| Generated stack trace for dump /usr/local/autotest/results/142198393-chromeos-test/sysinfo/var/spool/crash/chrome.20170915.020450.5790.dmp 09/15 02:10:07.546 INFO | server_job:0214| INFO ---- ---- timestamp=1505466607 localtime=Sep 15 02:10:07 Start crashcollection record 09/15 02:10:07.546 INFO | server_job:0214| INFO ---- New Crash Dump timestamp=1505466607 localtime=Sep 15 02:10:07 /usr/local/autotest/results/142198393-chromeos-test/crashinfo.chromeos1-row4-rack5-host3/chrome.20170915.020547.7270.dmp 09/15 02:10:07.546 INFO | server_job:0214| INFO ---- New Crash Dump timestamp=1505466607 localtime=Sep 15 02:10:07 /usr/local/autotest/results/142198393-chromeos-test/sysinfo/var/spool/crash/chrome.20170915.020351.3910.dmp 09/15 02:10:07.546 INFO | server_job:0214| INFO ---- New Crash Dump timestamp=1505466607 localtime=Sep 15 02:10:07 /usr/local/autotest/results/142198393-chromeos-test/sysinfo/var/spool/crash/chrome.20170915.020309.3755.dmp 09/15 02:10:07.547 INFO | server_job:0214| INFO ---- New Crash Dump timestamp=1505466607 localtime=Sep 15 02:10:07 /usr/local/autotest/results/142198393-chromeos-test/sysinfo/var/spool/crash/chrome.20170915.020304.664.dmp 09/15 02:10:07.547 INFO | server_job:0214| INFO ---- New Crash Dump timestamp=1505466607 localtime=Sep 15 02:10:07 /usr/local/autotest/results/142198393-chromeos-test/sysinfo/var/spool/crash/chrome.20170915.020412.4684.dmp 09/15 02:10:07.547 INFO | server_job:0214| INFO ---- New Crash Dump timestamp=1505466607 localtime=Sep 15 02:10:07 /usr/local/autotest/results/142198393-chromeos-test/sysinfo/var/spool/crash/chrome.20170915.020450.5790.dmp 09/15 02:10:07.547 INFO | server_job:0214| INFO ---- Orphaned Crash Dump timestamp=1505466607 localtime=Sep 15 02:10:07 /var/spool/crash/chrome.20170915.020547.7270.dmp 09/15 02:10:07.547 INFO | server_job:0214| INFO ---- Orphaned Crash Dump timestamp=1505466607 localtime=Sep 15 02:10:07 /var/spool/crash/chrome.20170915.020547.7270.meta 09/15 02:10:07.548 INFO | server_job:0214| INFO ---- Orphaned Crash Dump timestamp=1505466607 localtime=Sep 15 02:10:07 /var/spool/crash/chrome.20170915.020547.7270.core 09/15 02:10:07.548 INFO | server_job:0214| INFO ---- Orphaned Crash Dump timestamp=1505466607 localtime=Sep 15 02:10:07 /var/spool/crash/chrome.20170915.020547.7270.log 09/15 02:10:07.548 INFO | server_job:0214| INFO ---- ---- timestamp=1505466607 localtime=Sep 15 02:10:07 End crashcollection record 09/15 02:10:07.548 WARNI| site_crashcollect:0405| Crash detection failed with: [Errno 2] No such file or directory: '/var/spool/crash/chrome.20170915.020547.7270.meta' 09/15 02:10:07.549 WARNI| site_crashcollect:0405| Crash detection failed with: [Errno 2] No such file or directory: '/var/spool/crash/chrome.20170915.020547.7270.meta' 09/15 02:10:07.549 WARNI| site_crashcollect:0405| Crash detection failed with: [Errno 2] No such file or directory: '/var/spool/crash/chrome.20170915.020547.7270.meta' 09/15 02:10:07.549 WARNI| site_crashcollect:0405| Crash detection failed with: [Errno 2] No such file or directory: '/var/spool/crash/chrome.20170915.020547.7270.meta'
,
Sep 18 2017
Regarding #6. that test failure is mainly because of flakiness on skipping oobe after enrollment. Created a separate bug for it crbug.com/766366 This bug is for increase in number of crashes. kalin@ - who is the right owner for this to look at why there is such an increase in crashes in M62?
,
Oct 17 2017
Number of crashes seems to have subsided to ~ 2-3 during the test run down from ~30 at its peak in earlier M62 builds. https://chromeperf.appspot.com/report?sid=ffcaedf3f246ab9738d5cbe76df56b44a093db6563d9ef19c52cacd210d7f8f3 2-3 crashes are still too many compared to the past when there use to be 0 crashes for most run's. Still might be worth looking into.
,
Oct 18 2017
,
Oct 18 2017
Can you reproduce this with a device on a recovery image? If so, lest get these crashes uploaded to crash server, and see if there are already bugs assigned. What is the chrome crash failing thread stacktrace?
,
Oct 18 2017
Dinesh, please give it a try on a recovery image. For lab runs, chrome crash dumps should also be stored as part of test logs. See if you can find some and upload those to fireplace.
,
Oct 20 2017
Hi Harpreet, I had a look at the crashes reported in the lab and are mostly related to "service-failure" sig=ffe03e3a-imageloader exec_name=service-failure ver=10009.0.0 payload=/var/spool/crash/service_failure.20171011.095817.0.log payload_size=5574 done=1 We have also reported similar crashes from manual test here@ https://bugs.chromium.org/p/chromium/issues/detail?id=773760
,
Nov 18 2017
enterprise_CFM_USBPeripheralHotplugDetect.c920 report crashes like below in M63. https://wmatrix.googleplex.com/failures/unfiltered?suites=hotrod&tests=enterprise_CFM_USBPeripheralHotplugDetect.c920&days_back=7&hide_missing=True 11/17 07:50:11.964 INFO | crash_detector:0049| CRASH DETECTED in /var/spool/crash: service_failure.20171117.074823.0.meta 11/17 07:50:11.965 INFO | crash_detector:0049| CRASH DETECTED in /var/spool/crash: service_failure.20171117.074743.0.meta
,
Nov 18 2017
,
Nov 20 2017
Hi katie, Can you please help find an owner for this issue?
,
Nov 20 2017
,
Nov 21 2017
Same type of failure happens to 9901.77.1.
,
Nov 21 2017
Same type of failure happens to 9901.77.1 on Bluestreak setup.
,
Dec 20 2017
Same type of failure happens to R64-10176.22.0 on almost all CFMs platforms. mzhuo@minzhuo2:~/chromiumos/chroot/tmp/R64-10176.22.0$ mzhuo@minzhuo2:~/chromiumos/chroot/tmp/R64-10176.22.0$ grep "Start_verification:check_process_crash" 100.107.146.4/debug/test_that.INFO -A 3 12/20 14:10:37.419 INFO | test_runner_utils:0199| autoserv| Start_verification:check_process_crash 12/20 14:10:37.478 INFO | test_runner_utils:0199| autoserv| ---ls -latr /var/spool/crash 12/20 14:10:37.479 INFO | test_runner_utils:0199| autoserv| ---['total 20', 'drwxr-xr-x 5 root root 4096 Dec 11 17:25 ..', '-rw-r--r-- 1 root root 147 Dec 20 12:10 service_failure.20171220.121011.0.meta', '-rw-r--r-- 1 root root 6770 Dec 20 12:10 service_failure.20171220.121011.0.log', 'drwxr-xr-t 2 root root 4096 Dec 20 12:10 .'] 12/20 14:10:37.479 INFO | test_runner_utils:0199| autoserv| mzhuo@minzhuo2:~/chromiumos/chroot/tmp/R64-10176.22.0$ grep "Start_verification:check_process_crash" 100.107.146.22/debug/test_that.INFO -A 3 12/20 14:10:35.462 INFO | test_runner_utils:0199| autoserv| Start_verification:check_process_crash 12/20 14:10:35.522 INFO | test_runner_utils:0199| autoserv| ---ls -latr /var/spool/crash 12/20 14:10:35.522 INFO | test_runner_utils:0199| autoserv| ---['total 20', 'drwxr-xr-x 5 root root 4096 Dec 11 16:49 ..', '-rw-r--r-- 1 root root 147 Dec 20 12:10 service_failure.20171220.121004.0.meta', '-rw-r--r-- 1 root root 6528 Dec 20 12:10 service_failure.20171220.121004.0.log', 'drwxr-xr-t 2 root root 4096 Dec 20 12:10 .'] 12/20 14:10:35.522 INFO | test_runner_utils:0199| autoserv| mzhuo@minzhuo2:~/chromiumos/chroot/tmp/R64-10176.22.0$ grep "Start_verification:check_process_crash" 100.107.146.3/debug/test_that.INFO -A 3 12/20 14:10:41.623 INFO | test_runner_utils:0199| autoserv| Start_verification:check_process_crash 12/20 14:10:41.697 INFO | test_runner_utils:0199| autoserv| ---ls -latr /var/spool/crash 12/20 14:10:41.698 INFO | test_runner_utils:0199| autoserv| ---['total 20', 'drwxr-xr-x 5 root root 4096 Oct 20 00:25 ..', '-rw-r--r-- 1 root root 147 Dec 20 12:10 service_failure.20171220.121012.0.meta', '-rw-r--r-- 1 root root 6792 Dec 20 12:10 service_failure.20171220.121012.0.log', 'drwxr-xr-t 2 root root 4096 Dec 20 12:10 .'] 12/20 14:10:41.698 INFO | test_runner_utils:0199| autoserv| mzhuo@minzhuo2:~/chromiumos/chroot/tmp/R64-10176.22.0$ grep "Start_verification:check_process_crash" 100.107.146.13/debug/test_that.INFO -A 3 12/20 14:10:49.664 INFO | test_runner_utils:0199| autoserv| Start_verification:check_process_crash 12/20 14:10:49.728 INFO | test_runner_utils:0199| autoserv| ---ls -latr /var/spool/crash 12/20 14:10:49.729 INFO | test_runner_utils:0199| autoserv| ---['total 28', 'drwxr-xr-x 5 root root 4096 Dec 19 14:21 ..', '-rw-r--r-- 1 root root 199 Dec 20 12:09 os-release', '-rw-r--r-- 1 root root 768 Dec 20 12:09 lsb-release', '-rw-r--r-- 1 root root 147 Dec 20 12:10 service_failure.20171220.121020.0.meta', '-rw-r--r-- 1 root root 6940 Dec 20 12:10 service_failure.20171220.121020.0.log', 'drwxr-xr-t 2 root root 4096 Dec 20 12:10 .'] 12/20 14:10:49.729 INFO | test_runner_utils:0199| autoserv| mzhuo@minzhuo2:~/chromiumos/chroot/tmp/R64-10176.22.0$ grep "Start_verification:check_process_crash" 100.107.146.11/debug/test_that.INFO -A 3 12/20 14:11:03.482 INFO | test_runner_utils:0199| autoserv| Start_verification:check_process_crash 12/20 14:11:03.540 INFO | test_runner_utils:0199| autoserv| ---ls -latr /var/spool/crash 12/20 14:11:03.541 INFO | test_runner_utils:0199| autoserv| ---['total 20', 'drwxr-xr-x 5 root root 4096 Nov 1 15:40 ..', '-rw-r--r-- 1 root root 147 Dec 20 12:10 service_failure.20171220.121035.0.meta', '-rw-r--r-- 1 root root 6805 Dec 20 12:10 service_failure.20171220.121035.0.log', 'drwxr-xr-t 2 root root 4096 Dec 20 12:10 .'] 12/20 14:11:03.541 INFO | test_runner_utils:0199| autoserv| mzhuo@minzhuo2:~/chromiumos/chroot/tmp/R64-10176.22.0$ grep "Start_verification:check_process_crash" 100.107.146.12/debug/test_that.INFO -A 3 12/20 14:10:48.446 INFO | test_runner_utils:0199| autoserv| Start_verification:check_process_crash 12/20 14:10:48.502 INFO | test_runner_utils:0199| autoserv| ---ls -latr /var/spool/crash 12/20 14:10:48.503 INFO | test_runner_utils:0199| autoserv| ---['total 20', 'drwxr-xr-x 5 root root 4096 Nov 21 15:15 ..', '-rw-r--r-- 1 root root 147 Dec 20 12:10 service_failure.20171220.121015.0.meta', '-rw-r--r-- 1 root root 6794 Dec 20 12:10 service_failure.20171220.121015.0.log', 'drwxr-xr-t 2 root root 4096 Dec 20 12:10 .'] 12/20 14:10:48.503 INFO | test_runner_utils:0199| autoserv| mzhuo@minzhuo2:~/chromiumos/chroot/tmp/R64-10176.22.0$ grep "Start_verification:check_process_crash" 100.107.146.26/debug/test_that.INFO -A 3 12/20 14:11:24.922 INFO | test_runner_utils:0199| autoserv| Start_verification:check_process_crash 12/20 14:11:24.992 INFO | test_runner_utils:0199| autoserv| ---ls -latr /var/spool/crash 12/20 14:11:24.993 INFO | test_runner_utils:0199| autoserv| ---['total 20', 'drwxr-xr-x 5 root root 4096 Nov 22 15:43 ..', '-rw-r--r-- 1 root root 6838 Dec 20 12:10 service_failure.20171220.121031.0.log', '-rw-r--r-- 1 root root 147 Dec 20 12:10 service_failure.20171220.121031.0.meta', 'drwxr-xr-t 2 root root 4096 Dec 20 12:10 .'] 12/20 14:11:24.993 INFO | test_runner_utils:0199| autoserv| mzhuo@minzhuo2:~/chromiumos/chroot/tmp/R64-10176.22.0$ grep "Start_verification:check_process_crash" 100.107.146.24/debug/test_that.INFO -A 3 12/20 14:11:02.072 INFO | test_runner_utils:0199| autoserv| Start_verification:check_process_crash 12/20 14:11:02.141 INFO | test_runner_utils:0199| autoserv| ---ls -latr /var/spool/crash 12/20 14:11:02.142 INFO | test_runner_utils:0199| autoserv| ---['total 20', 'drwxr-xr-x 5 root root 4096 Dec 12 15:14 ..', '-rw-r--r-- 1 root root 147 Dec 20 12:10 service_failure.20171220.121029.0.meta', '-rw-r--r-- 1 root root 7211 Dec 20 12:10 service_failure.20171220.121029.0.log', 'drwxr-xr-t 2 root root 4096 Dec 20 12:10 .']
,
Jan 7 2018
with R63-10032.79.0 test enterprise_CFM_USBPeripheralHotplugDetect.c920 and enterprise_CFM_USBPeripheralHotplugDetect.c920 reports the same crash for Zako, Rikku and tricky. Link: https://stainless.corp.google.com/search?view=list&first_date=20180102&last_date=20180105&suite=%5Ehotrod%24&status=FAIL&status=ERROR&status=ABORT&reason=New+Warning+or+Crash+Detected+before&exclude_cts=true&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=true
,
Jan 7 2018
Same type of crash happens to Guado in our lab testing at high frequency too with R63, R64 and R65. With this crash no noticeable impact is observed. However, during longevity run either we bypass crashing detection, or script is aborted due to this crash.
,
Jan 11 2018
,
Jan 12 2018
,
Jan 16 2018
,
Jan 18 2018
,
Feb 6 2018
Is this user facing at all? Or, are the tests in this suite realistic? (i.e. resemble anything we might see in the wild) If we want to hold CfM devices to higher standards similar to the ones in this suite, we should: 1. enable most/all debug flags in kernel and cros 2. run these tests on those "special" devices, so it's easier to pinpoint where errors originate from and potentially be able to issue fixes. Thoughts?
,
Mar 14 2018
,
Apr 22 2018
I'm closing this due to age and IIUC, no symptoms except a crash file. |
|||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||
Comment 1 by harpreet@chromium.org
, Aug 29 2017