Issue 836305

Starred by 3 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



unclean unmounts lead to EXT4 lockups

Reported by briannorris@chromium.org (Project Member), Apr 24 2018

Issue description

OS: scarlet-factory/R65-10211.19.0

Forked from https://issuetracker.google.com/76121632

Reboot tests run with the factory toolkit previously didn't cleanly unmount the stateful partition, which led to various sorts of filesystem corruption (expected). What's unexpected is that we also saw the system occasionally lock up during shutdown, usually with plenty of EXT4 / block device code in the blame list. We could never reproduce this on a proper test image (without the factory packages); I suspect that's because we do a better job of cleanly unmounting there.
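
(For context, "cleanly unmounting" here just means flushing dirty data and detaching the filesystem before the reboot is issued. Here's a minimal user-space sketch of that sequence; the mount point path and the remount-read-only fallback are illustrative assumptions, not the factory toolkit's actual shutdown code.)

/*
 * Rough sketch of a clean unmount of the stateful partition before reboot.
 * Illustrative only; needs root, and the mount point is assumed.
 */
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

int main(void)
{
        const char *stateful = "/mnt/stateful_partition";  /* assumed mount point */

        sync();  /* flush dirty pages out to the block device */

        if (umount2(stateful, 0) != 0) {
                /*
                 * If the unmount fails (e.g. something still holds the mount
                 * busy), at least remount read-only so there are no dirty
                 * writes in flight when the reboot hits.
                 */
                if (mount(NULL, stateful, NULL, MS_REMOUNT | MS_RDONLY, NULL) != 0)
                        perror("remount ro");
        }
        return 0;
}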

Full console-ramoops attached, but here's a snippet:

[  240.115707] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.115720] umount          D ffffffc000213030     0  2618   2605 0x00400008
[  240.115745] Call trace:
[  240.115771] [<ffffffc000213030>] __switch_to+0x9c/0xa8
[  240.115792] [<ffffffc00092dc9c>] __schedule+0x3cc/0x840
[  240.115810] [<ffffffc00092d86c>] schedule+0x4c/0xb0
[  240.115826] [<ffffffc000930924>] schedule_timeout+0x44/0x5a0
[  240.115844] [<ffffffc00092eff4>] do_wait_for_common+0xcc/0x16c
[  240.115862] [<ffffffc00092ed2c>] wait_for_common+0x58/0x78
[  240.115879] [<ffffffc00092ecc8>] wait_for_completion+0x24/0x30 
[  240.115896] [<ffffffc000233464>] flush_work+0x15c/0x1a0
[  240.115914] [<ffffffc00030dd2c>] lru_add_drain_all+0x138/0x184 
[  240.115933] [<ffffffc000392024>] invalidate_bdev+0x2c/0x48
[  240.115952] [<ffffffc0003da59c>] ext4_put_super+0x1e8/0x278
[  240.115970] [<ffffffc0003699c8>] generic_shutdown_super+0x6c/0xd8
[  240.115986] [<ffffffc00036a9c4>] kill_block_super+0x2c/0x70
[  240.116002] [<ffffffc0003697fc>] deactivate_locked_super+0x58/0x84
[  240.116018] [<ffffffc0003698a0>] deactivate_super+0x38/0x44
[  240.116035] [<ffffffc00037fccc>] cleanup_mnt+0x40/0x78
[  240.116052] [<ffffffc00037fc30>] __cleanup_mnt+0x1c/0x28
[  240.116071] [<ffffffc0002daec4>] task_work_run+0x88/0xd0
[  240.116090] [<ffffffc000208dd8>] do_notify_resume+0x530/0x56c
[  240.116106] [<ffffffc000202d28>] work_pending+0x1c/0x20



Testers saw another repro case on a similar image, but this time it also put some RK3399 graphics code in the list of blocked tasks, as well as plenty of the same filesystem tasks:

[  240.115541] INFO: task rockchip_drm_at:163 blocked for more than 120 seconds.
[  240.115577]       Not tainted 4.4.111-12566-g462126335459c-dirty #1
[  240.115590] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.115602] rockchip_drm_at D ffffffc000213030     0   163      2 0x00000000
[  240.115629] Call trace:
[  240.115660] [<ffffffc000213030>] __switch_to+0x9c/0xa8
[  240.115681] [<ffffffc00092dc9c>] __schedule+0x3cc/0x840
[  240.115699] [<ffffffc00092d86c>] schedule+0x4c/0xb0
[  240.115716] [<ffffffc000930924>] schedule_timeout+0x44/0x5a0
[  240.115737] [<ffffffc000615098>] dma_fence_default_wait+0x128/0x214
[  240.115755] [<ffffffc000614cb4>] dma_fence_wait_timeout+0xb8/0x158
[  240.115777] [<ffffffc0004f3678>] rockchip_atomic_commit_complete+0x78/0x4a0
[  240.115798] [<ffffffc0005a5f60>] rockchip_drm_atomic_work+0x1c/0x28
[  240.115818] [<ffffffc0002db064>] kthread_worker_fn+0xe8/0x1b8
[  240.115837] [<ffffffc00023a5d8>] kthread+0xe0/0xf0
[  240.115855] [<ffffffc000202dd0>] ret_from_fork+0x10/0x40
...
[  240.116717] umount          D ffffffc000213030     0  2634   2621 0x00400008
[  240.116790] Call trace:
[  240.116830] [<ffffffc000213030>] __switch_to+0x9c/0xa8
[  240.116867] [<ffffffc00092dc9c>] __schedule+0x3cc/0x840
[  240.116889] [<ffffffc00092d86c>] schedule+0x4c/0xb0
[  240.116904] [<ffffffc000930924>] schedule_timeout+0x44/0x5a0
[  240.116923] [<ffffffc00092eff4>] do_wait_for_common+0xcc/0x16c
[  240.116940] [<ffffffc00092ed2c>] wait_for_common+0x58/0x78
[  240.116958] [<ffffffc00092ecc8>] wait_for_completion+0x24/0x30
[  240.116974] [<ffffffc000233464>] flush_work+0x15c/0x1a0
[  240.116993] [<ffffffc00030dd2c>] lru_add_drain_all+0x138/0x184
[  240.117012] [<ffffffc000392024>] invalidate_bdev+0x2c/0x48
[  240.117032] [<ffffffc0003da59c>] ext4_put_super+0x1e8/0x278
[  240.117050] [<ffffffc0003699c8>] generic_shutdown_super+0x6c/0xd8
[  240.117066] [<ffffffc00036a9c4>] kill_block_super+0x2c/0x70
[  240.117082] [<ffffffc0003697fc>] deactivate_locked_super+0x58/0x84
[  240.117098] [<ffffffc0003698a0>] deactivate_super+0x38/0x44
[  240.117116] [<ffffffc00037fccc>] cleanup_mnt+0x40/0x78
[  240.117132] [<ffffffc00037fc30>] __cleanup_mnt+0x1c/0x28
[  240.117150] [<ffffffc0002daec4>] task_work_run+0x88/0xd0
[  240.117170] [<ffffffc000208dd8>] do_notify_resume+0x530/0x56c
[  240.117186] [<ffffffc000202d28>] work_pending+0x1c/0x20
...
 
Attachments:
console-ramoops-0.1 (52.5 KB)
console-ramoops-0.2 (53.1 KB)

Labels: Kernel-4.4
Cc: sarthakkukreti@chromium.org
It's been a while since this issue was fresh in my mind... but wasn't this determined not to be entirely an EXT4 issue? It was another bug (the RTC driver, I think?), and the locked-up task just happened to sit on the same queues that EXT4 did.

I think mka@ looked into this, but IIUC he may have found that this work got split into separate queues in later kernels, so it wouldn't be as much of a problem there. I'm mostly working from memory; you might find more at https://issuetracker.google.com/76121632.
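
(To make the shared-queue theory concrete: in a 4.4 kernel, lru_add_drain_all() queues its per-CPU drain work on the shared system workqueue and then flush_work()s it, so umount can end up waiting behind a completely unrelated work item that never finishes. Below is a rough, hypothetical kernel-module sketch of that failure mode, not code from this bug; it uses an explicitly ordered workqueue to make the blocking deterministic, which is a simplification of the real concurrency-managed worker pools.)

/*
 * Hypothetical demo module: a quick work item's flush_work() stalls behind
 * an unrelated slow item queued on the same workqueue.
 */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

static struct workqueue_struct *shared_wq;

static void slow_fn(struct work_struct *work)
{
        /* Stands in for the unrelated stuck task (RTC, DRM, ...). */
        msleep(150 * 1000);
}

static void drain_fn(struct work_struct *work)
{
        /* Stands in for lru_add_drain_per_cpu(): trivial on its own. */
}

static DECLARE_WORK(slow_work, slow_fn);
static DECLARE_WORK(drain_work, drain_fn);

static int __init wqdemo_init(void)
{
        /* One shared, ordered queue: items run strictly one at a time. */
        shared_wq = alloc_ordered_workqueue("wqdemo_shared", 0);
        if (!shared_wq)
                return -ENOMEM;

        queue_work(shared_wq, &slow_work);
        queue_work(shared_wq, &drain_work);

        /*
         * Waits (uninterruptibly) until drain_work has run, i.e. until
         * slow_work finishes -- long enough for khungtaskd to print a
         * "blocked for more than 120 seconds" trace much like the ones above.
         */
        flush_work(&drain_work);
        return 0;
}

static void __exit wqdemo_exit(void)
{
        destroy_workqueue(shared_wq);
}

module_init(wqdemo_init);
module_exit(wqdemo_exit);
MODULE_LICENSE("GPL");

Moving the drain work onto its own queue, as the "split into separate queues in later kernels" note above describes, removes exactly that coupling.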

Anyway, if the above is true, this probably isn't worth spending a lot of time on, unless we're still seeing a lot of similar new failure modes.
