elm: reboot during StartAndroid stress test
Issue description

A CQ run for elm failed with an unexpected reboot during the cheets_StartAndroid.stress test: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8941193971303237376

Not sure if this is a problem introduced by one of the CLs or a spurious failure. At first glance I didn't notice any suspicious CL (elm runs a v3.18 kernel), and I recall occasional reboots during this test from earlier sheriff shifts.
Jul 16
elm-paladin has passed 26 runs in a row, so this is probably a flaky test case. Or, based on the "unexpected reboot" error, this was an intermittent kernel crash on elm that just happened to bite this test run.

https://stainless.corp.google.com/browse/chromeos-autotest-results/216510557-chromeos-test/

The log showed the reboot happened sometime after one of the restart-ui iterations:

07/12 10:13:44.263 INFO | browser:0207| Closing browser (pid=22180) ...
07/12 10:13:44.266 INFO | cros_interface:0575| (Re)starting the ui (logs the user out)
07/12 10:13:44.268 DEBUG| global_hooks:0056| ['sh', '-c', 'systemctl']
07/12 10:13:44.277 DEBUG| global_hooks:0056| ['sh', '-c', 'systemctl']
07/12 10:13:44.284 DEBUG| global_hooks:0056| ['sh', '-c', 'status ui']
07/12 10:13:44.299 DEBUG| cros_interface:0454| IsServiceRunning(ui)->True
07/12 10:13:44.299 DEBUG| cros_interface:0058| sh -c restart ui
07/12 10:13:44.300 DEBUG| global_hooks:0056| ['sh', '-c', 'restart ui']

The previous iterations looked like:

07/12 10:10:18.299 DEBUG| global_hooks:0056| ['sh', '-c', 'status ui']
07/12 10:10:18.314 DEBUG| cros_interface:0454| IsServiceRunning(ui)->True
07/12 10:10:18.314 DEBUG| cros_interface:0058| sh -c restart ui
07/12 10:10:18.315 DEBUG| global_hooks:0056| ['sh', '-c', 'restart ui']
07/12 10:10:20.370 DEBUG| cros_interface:0067| > stdout=[ui start/running, process 8010 ], stderr=[]
07/12 10:10:20.371 DEBUG| cros_interface:0058| sh -c cryptohome-path user 'test@test.test'
07/12 10:10:20.371 DEBUG| global_hooks:0056| ['sh', '-c', "cryptohome-path user 'test@test.test'"]
07/12 10:10:20.392 DEBUG| cros_interface:0067| > stdout=[/home/user/d7c8fbb19197956c83e86116580a285574878281 ], stderr=[]
07/12 10:10:20.392 DEBUG| cros_interface:0058| sh -c /bin/df --output=source,target /run/cryptohome/ephemeral_mount/d7c8fbb19197956c83e86116580a285574878281
07/12 10:10:20.393 DEBUG| global_hooks:0056| ['sh', '-c', '/bin/df --output=source,target /run/cryptohome/ephemeral_mount/d7c8fbb19197956c83e86116580a285574878281']
07/12 10:10:20.402 DEBUG| cros_interface:0067| > stdout=[], stderr=[df: /run/cryptohome/ephemeral_mount/d7c8fbb19197956c83e86116580a285574878281: No such file or directory ]

However, the console-ramoops in stainless shows a normal reboot, not a kernel crash. So the ramoops is probably from before this test, and the actual crash here was not recorded with the test result. ramoops shows a normal shutdown:

[ 21.811876] init: arc-kmsg-logger main process (2463) killed by TERM signal
[ 25.603799] mtk-afe-pcm 11220000.audio-controller: mtk_afe_dais_trigger DL1 cmd=0
[ 102.306546] EXT4-fs (mmcblk0p5): mounted filesystem without journal. Opts: (null)
[ 144.740762] init: shill main process (1528) terminated with status 143
[ 144.800159] Bluetooth: hci_core.c:skip_conditional_cmd() COND LE cmd (0x200a) is already 0 (chg 0), skip transition to 0
[ 144.800170] Bluetooth: hci_core.c:skip_conditional_cmd() COND call queue_work.
[ 144.802230] init: daisydog main process (948) terminated with status 1
[ 144.802681] init: chapsd main process (1391) killed by PIPE signal
[ 144.803058] init: powerd main process (1303) killed by TERM signal
[ 144.803560] init: crash-sender main process (1773) killed by TERM signal
[ 144.803784] init: apk-cache-cleaner main process (1774) killed by TERM signal
[ 144.804026] init: log-rotate main process (1798) killed by TERM signal
[ 144.805174] init: anomaly-collector main process (2177) killed by TERM signal
[ 144.805417] init: cros-machine-id-regen-periodic main process (2208) killed by TERM signal
[ 144.805667] EXT4-fs (dm-0): re-mounted. Opts:
[ 144.805888] init: activate_date main process (2485) killed by TERM signal
[ 144.829012] init: cryptohomed-client main process (3305) terminated with status 1
[ 144.840188] init: cryptohomed main process (1468) killed by TERM signal
[ 144.871010] init: cras main process (1848) terminated with status 143
[ 144.907933] init: recover_duts main process (2037) killed by TERM signal
[ 145.190841] Unsafe core_pattern used with suid_dumpable=2. Pipe handler or fully qualified core dump path required.
[ 146.681368] reboot: Restarting system

I'm tempted to just WontFix this one unless it re-occurs.
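The reasoning above (ramoops ending in "reboot: Restarting system" means a clean shutdown, not a panic) can be sketched as a small classifier. This is a minimal illustration, not anything the crash collector actually runs; the marker strings are assumptions based on common kernel log lines.

```python
# Hypothetical sketch: classify a console-ramoops dump as a clean reboot
# or a kernel crash. Marker lists are illustrative assumptions, not the
# actual heuristics used by the ChromeOS crash reporter.

CLEAN_MARKERS = ("reboot: Restarting system", "reboot: Power down")
CRASH_MARKERS = ("Kernel panic", "BUG:", "Oops:", "Unable to handle kernel")

def classify_ramoops(text: str) -> str:
    """Return 'crash', 'clean', or 'unknown' for a console-ramoops dump."""
    # A panic marker anywhere in the dump outweighs a clean-shutdown line.
    if any(m in text for m in CRASH_MARKERS):
        return "crash"
    if any(m in text for m in CLEAN_MARKERS):
        return "clean"
    return "unknown"

if __name__ == "__main__":
    sample = (
        "[ 144.907933] init: recover_duts main process (2037) killed by TERM signal\n"
        "[ 146.681368] reboot: Restarting system\n"
    )
    print(classify_ramoops(sample))  # clean
```

Under this reading, the dump above classifies as "clean", which is why the comment concludes the real crash was never captured with the test result.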
Jul 16
For the record: it's not elm-specific; it's a recurring issue across different platforms: https://bugs.chromium.org/p/chromium/issues/detail?id=863539
Comment 1 by jettrink@chromium.org, Jul 16