
Issue 863152

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
Closed: Jul 16
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




elm: reboot during StartAndroid stress test

Project Member Reported by mka@chromium.org, Jul 12

Issue description

A CQ run for elm failed with an unexpected reboot during the cheets_StartAndroid.stress test:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8941193971303237376

Not sure if this is a problem introduced by one of the CLs or a spurious failure. At first glance I didn't notice any suspicious CL (elm runs a v3.18 kernel), and I recall occasional reboots during this test from earlier sheriff shifts.
 
Cc: rrangel@chromium.org pmalani@chromium.org
Copying current sheriffs
Cc: drinkcat@chromium.org oak-img@chromium.org
Status: WontFix (was: Unconfirmed)
elm-paladin has passed 26 runs in a row, so this is probably a flaky test case. Or, based on the "unexpected reboot" error, this was an intermittent kernel crash on elm that just happened to bite this test run.

https://stainless.corp.google.com/browse/chromeos-autotest-results/216510557-chromeos-test/
The log showed the reboot happened sometime after one of the restart ui iterations:

07/12 10:13:44.263 INFO |           browser:0207| Closing browser (pid=22180) ...
07/12 10:13:44.266 INFO |    cros_interface:0575| (Re)starting the ui (logs the user out)
07/12 10:13:44.268 DEBUG|      global_hooks:0056| ['sh', '-c', 'systemctl']
07/12 10:13:44.277 DEBUG|      global_hooks:0056| ['sh', '-c', 'systemctl']
07/12 10:13:44.284 DEBUG|      global_hooks:0056| ['sh', '-c', 'status ui']
07/12 10:13:44.299 DEBUG|    cros_interface:0454| IsServiceRunning(ui)->True
07/12 10:13:44.299 DEBUG|    cros_interface:0058| sh -c restart ui 
07/12 10:13:44.300 DEBUG|      global_hooks:0056| ['sh', '-c', 'restart ui']
[log ends here in garbled, unreadable bytes]

The previous iterations looked like: 
07/12 10:10:18.299 DEBUG|      global_hooks:0056| ['sh', '-c', 'status ui']
07/12 10:10:18.314 DEBUG|    cros_interface:0454| IsServiceRunning(ui)->True
07/12 10:10:18.314 DEBUG|    cros_interface:0058| sh -c restart ui 
07/12 10:10:18.315 DEBUG|      global_hooks:0056| ['sh', '-c', 'restart ui']
07/12 10:10:20.370 DEBUG|    cros_interface:0067|  > stdout=[ui start/running, process 8010
], stderr=[]
07/12 10:10:20.371 DEBUG|    cros_interface:0058| sh -c cryptohome-path user 'test@test.test' 
07/12 10:10:20.371 DEBUG|      global_hooks:0056| ['sh', '-c', "cryptohome-path user 'test@test.test'"]
07/12 10:10:20.392 DEBUG|    cros_interface:0067|  > stdout=[/home/user/d7c8fbb19197956c83e86116580a285574878281
], stderr=[]
07/12 10:10:20.392 DEBUG|    cros_interface:0058| sh -c /bin/df --output=source,target /run/cryptohome/ephemeral_mount/d7c8fbb19197956c83e86116580a285574878281 
07/12 10:10:20.393 DEBUG|      global_hooks:0056| ['sh', '-c', '/bin/df --output=source,target /run/cryptohome/ephemeral_mount/d7c8fbb19197956c83e86116580a285574878281']
07/12 10:10:20.402 DEBUG|    cros_interface:0067|  > stdout=[], stderr=[df: /run/cryptohome/ephemeral_mount/d7c8fbb19197956c83e86116580a285574878281: No such file or directory
]
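The `global_hooks` lines show each command being dispatched through `sh -c`. A minimal sketch of that dispatch pattern, with a hypothetical helper name (the real cros_interface runs these commands on the DUT, typically over ssh, and logs stdout/stderr as seen above):

```python
import subprocess

def run_on_dut(cmd: str):
    """Hypothetical helper mirroring the ['sh', '-c', ...] invocations
    in the log; not the actual cros_interface implementation."""
    proc = subprocess.run(['sh', '-c', cmd], capture_output=True, text=True)
    return proc.stdout, proc.stderr
```

Each iteration of the test effectively chains such calls: `status ui`, `restart ui`, then `cryptohome-path user ...` and a `df` on the resulting mount path.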

However, the console-ramoops in stainless shows a normal reboot, not a kernel crash. So the ramoops is probably from before this test, and the actual crash is not recorded with the test result. The ramoops ends with a normal shutdown:
[   21.811876] init: arc-kmsg-logger main process (2463) killed by TERM signal
[   25.603799] mtk-afe-pcm 11220000.audio-controller: mtk_afe_dais_trigger DL1 cmd=0
[  102.306546] EXT4-fs (mmcblk0p5): mounted filesystem without journal. Opts: (null)
[  144.740762] init: shill main process (1528) terminated with status 143
[  144.800159] Bluetooth: hci_core.c:skip_conditional_cmd()   COND LE cmd (0x200a) is already 0 (chg 0), skip transition to 0
[  144.800170] Bluetooth: hci_core.c:skip_conditional_cmd()   COND call queue_work.
[  144.802230] init: daisydog main process (948) terminated with status 1
[  144.802681] init: chapsd main process (1391) killed by PIPE signal
[  144.803058] init: powerd main process (1303) killed by TERM signal
[  144.803560] init: crash-sender main process (1773) killed by TERM signal
[  144.803784] init: apk-cache-cleaner main process (1774) killed by TERM signal
[  144.804026] init: log-rotate main process (1798) killed by TERM signal
[  144.805174] init: anomaly-collector main process (2177) killed by TERM signal
[  144.805417] init: cros-machine-id-regen-periodic main process (2208) killed by TERM signal
[  144.805667] EXT4-fs (dm-0): re-mounted. Opts: 
[  144.805888] init: activate_date main process (2485) killed by TERM signal
[  144.829012] init: cryptohomed-client main process (3305) terminated with status 1
[  144.840188] init: cryptohomed main process (1468) killed by TERM signal
[  144.871010] init: cras main process (1848) terminated with status 143
[  144.907933] init: recover_duts main process (2037) killed by TERM signal
[  145.190841] Unsafe core_pattern used with suid_dumpable=2. Pipe handler or fully qualified core dump path required.
[  146.681368] reboot: Restarting system
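The clean-reboot-vs-crash distinction drawn here can be checked mechanically. A minimal sketch, assuming the console-ramoops text is available as a string; the marker strings are common kernel log patterns chosen for illustration, not an exhaustive list:

```python
# Illustrative markers only; real triage should still read the full dump.
CRASH_MARKERS = ("Kernel panic", "Oops:", "BUG:", "Unable to handle")
CLEAN_MARKERS = ("reboot: Restarting system", "reboot: Power down")

def classify_ramoops(text: str) -> str:
    """Return 'crash', 'clean', or 'unknown' for a console-ramoops dump."""
    if any(m in text for m in CRASH_MARKERS):
        return "crash"
    if any(m in text for m in CLEAN_MARKERS):
        return "clean"
    return "unknown"
```

On the dump quoted above this would return "clean", matching the conclusion that the recorded ramoops is from a normal shutdown rather than the crash under investigation.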


I'm tempted to just WontFix this one unless it recurs.
For the record: it's not elm-specific; it's a recurring issue across different platforms:

https://bugs.chromium.org/p/chromium/issues/detail?id=863539
