CleanupChroot hanging. |
||||||||||||
Issue descriptionI'm seeing some examples of the cbuildbot_launch stage CleanupChroot hanging or failing from time to time. An example: https://ci.chromium.org/p/chromeos/builders/luci.chromeos.general/Prod/b8943096145619074112 This is causing otherwise good builds to be reported as failures.
,
Jun 21 2018
,
Jun 21 2018
,
Jun 22 2018
,
Jun 22 2018
I'll be tomorrow's Bobby, so.... will probably need to start working this.
,
Jun 22 2018
,
Jun 22 2018
These are kernel panics. From the builder than ran this build: https://ci.chromium.org/p/chromeos/builders/luci.chromeos.general/Prod/b8943092094950437792 [56529.786894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 21 23:46:33 swarm-cros-5 kernel: [56529.583503] INFO: task jbd2/dm-9-8:12760 blocked for more than 120 seconds. Jun 21 23:46:33 swarm-cros-5 kernel: [56529.590640] Not tainted 3.13.0-147-generic #196-Ubuntu Jun 21 23:46:33 swarm-cros-5 kernel: [56529.596422] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604389] jbd2/dm-9-8 D ffff88336f373b00 0 12760 2 0x00000000 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604394] ffff882286b11b28 0000000000000046 ffff883303586000 0000000000013b00 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604398] ffff882286b11fd8 0000000000013b00 ffff883303586000 ffff88336f374398 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604401] ffff88343ffa70e8 0000000000000002 ffffffff81155d00 ffff882286b11ba0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604404] Call Trace: Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604412] [<ffffffff81155d00>] ? wait_on_page_read+0x60/0x60 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604422] [<ffffffff8173bcbd>] io_schedule+0x9d/0x130 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604427] [<ffffffff81155d0e>] sleep_on_page+0xe/0x20 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604430] [<ffffffff8173c134>] __wait_on_bit+0x64/0x90 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604433] [<ffffffff81155acf>] wait_on_page_bit+0x7f/0x90 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604437] [<ffffffff810b0d00>] ? autoremove_wake_function+0x40/0x40 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604440] [<ffffffff81155bd9>] filemap_fdatawait_range+0xf9/0x190 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604442] [<ffffffff81155c97>] filemap_fdatawait+0x27/0x30 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604447] [<ffffffff812949ef>] jbd2_journal_commit_transaction+0xb0f/0x1ba0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604452] [<ffffffff81079ecf>] ? try_to_del_timer_sync+0x4f/0x70 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604455] [<ffffffff8129934d>] kjournald2+0xbd/0x240 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604463] [<ffffffff810b0cc0>] ? prepare_to_wait_event+0x100/0x100 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604468] [<ffffffff81299290>] ? commit_timeout+0x10/0x10 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604471] [<ffffffff81090c9b>] kthread+0xcb/0xf0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604476] [<ffffffff81090bd0>] ? kthread_create_on_node+0x1c0/0x1c0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604480] [<ffffffff8174850e>] ret_from_fork+0x6e/0xa0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604482] [<ffffffff81090bd0>] ? kthread_create_on_node+0x1c0/0x1c0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.604523] INFO: task rm:23386 blocked for more than 120 seconds. Jun 21 23:46:33 swarm-cros-5 kernel: [56529.610835] Not tainted 3.13.0-147-generic #196-Ubuntu Jun 21 23:46:33 swarm-cros-5 kernel: [56529.616613] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624578] rm D ffff88336f493b00 0 23386 23368 0x00000000 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624583] ffff88016aea5c68 0000000000000086 ffff8819f1c91800 0000000000013b00 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624586] ffff88016aea5fd8 0000000000013b00 ffff8819f1c91800 ffff88336f494398 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624590] ffff88343ffb0ce8 0000000000000002 ffffffff81155d00 ffff88016aea5ce0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624593] Call Trace: Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624600] [<ffffffff81155d00>] ? wait_on_page_read+0x60/0x60 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624612] [<ffffffff8173bcbd>] io_schedule+0x9d/0x130 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624616] [<ffffffff81155d0e>] sleep_on_page+0xe/0x20 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624618] [<ffffffff8173c134>] __wait_on_bit+0x64/0x90 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624621] [<ffffffff81155acf>] wait_on_page_bit+0x7f/0x90 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624625] [<ffffffff810b0d00>] ? autoremove_wake_function+0x40/0x40 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624630] [<ffffffff8116582e>] truncate_inode_pages_range+0x2fe/0x5a0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624632] [<ffffffff81165ae5>] truncate_inode_pages+0x15/0x20 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624637] [<ffffffff8124f091>] ext4_evict_inode+0x131/0x510 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624640] [<ffffffff811e0d03>] evict+0xb3/0x1b0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624649] [<ffffffff811e1518>] iput+0xf8/0x190 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624653] [<ffffffff811d5e3e>] do_unlinkat+0x18e/0x2b0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624658] [<ffffffff811c71ae>] ? ____fput+0xe/0x10 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624662] [<ffffffff8108d9c4>] ? task_work_run+0xb4/0xd0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624667] [<ffffffff81014ed7>] ? do_notify_resume+0x97/0xb0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624670] [<ffffffff811d6d6b>] SyS_unlinkat+0x1b/0x40 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624673] [<ffffffff817486f0>] system_call_fastpath+0x1a/0x1f Jun 21 23:46:33 swarm-cros-5 kernel: [56529.624682] INFO: task x86_64-cros-lin:24543 blocked for more than 120 seconds. Jun 21 23:46:33 swarm-cros-5 kernel: [56529.632129] Not tainted 3.13.0-147-generic #196-Ubuntu Jun 21 23:46:33 swarm-cros-5 kernel: [56529.637919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645884] x86_64-cros-lin D ffffffff81155d00 0 24543 24042 0x00000000 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645890] ffff880122de1b38 0000000000000082 ffff88043f8a4800 0000000000013b00 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645896] ffff880122de1fd8 0000000000013b00 ffff88043f8a4800 ffff88336f554398 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645900] ffff88343ffb12e8 0000000000000002 ffffffff81155d00 ffff880122de1bb0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645904] Call Trace: Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645911] [<ffffffff81155d00>] ? wait_on_page_read+0x60/0x60 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645917] [<ffffffff8173bcbd>] io_schedule+0x9d/0x130 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645924] [<ffffffff81155d0e>] sleep_on_page+0xe/0x20 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645926] [<ffffffff8173c134>] __wait_on_bit+0x64/0x90 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645930] [<ffffffff81155acf>] wait_on_page_bit+0x7f/0x90 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645936] [<ffffffff810b0d00>] ? autoremove_wake_function+0x40/0x40 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645940] [<ffffffff8116582e>] truncate_inode_pages_range+0x2fe/0x5a0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645943] [<ffffffff81165ae5>] truncate_inode_pages+0x15/0x20 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645947] [<ffffffff8124f091>] ext4_evict_inode+0x131/0x510 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645950] [<ffffffff811e0d03>] evict+0xb3/0x1b0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645953] [<ffffffff811e1518>] iput+0xf8/0x190 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645957] [<ffffffff811dc3c8>] __dentry_kill+0x198/0x1f0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645960] [<ffffffff811dc4cd>] dput+0xad/0x190 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645962] [<ffffffff811d61f4>] SYSC_renameat+0x294/0x420 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645967] [<ffffffff8118772c>] ? vm_munmap+0x4c/0x60 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645969] [<ffffffff811d74eb>] SyS_rename+0x1b/0x20 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645973] [<ffffffff817486f0>] system_call_fastpath+0x1a/0x1f Jun 21 23:46:33 swarm-cros-5 kernel: [56529.645976] INFO: task x86_64-cros-lin:24567 blocked for more than 120 seconds. Jun 21 23:46:33 swarm-cros-5 kernel: [56529.653427] Not tainted 3.13.0-147-generic #196-Ubuntu Jun 21 23:46:33 swarm-cros-5 kernel: [56529.659329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667279] x86_64-cros-lin D ffff88336f253b00 0 24567 24173 0x00000000 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667284] ffff88331b919c80 0000000000000082 ffff88013c059800 0000000000013b00 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667288] ffff88331b919fd8 0000000000013b00 ffff88013c059800 ffff88010d1cb728 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667291] ffff88010d1cb72c ffff88013c059800 00000000ffffffff ffff88010d1cb730 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667294] Call Trace: Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667301] [<ffffffff8173bea9>] schedule_preempt_disabled+0x29/0x70 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667310] [<ffffffff8173dcf5>] __mutex_lock_slowpath+0x135/0x1b0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667314] [<ffffffff8173dd8f>] mutex_lock+0x1f/0x2f Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667319] [<ffffffff811d40c0>] do_last+0x2c0/0x1370 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667324] [<ffffffff812e251c>] ? security_file_alloc+0x1c/0x20 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667326] [<ffffffff811d522b>] path_openat+0xbb/0x650 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667329] [<ffffffff811d662a>] do_filp_open+0x3a/0x90 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667336] [<ffffffff811e3517>] ? __alloc_fd+0xa7/0x130 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667341] [<ffffffff811c4739>] do_sys_open+0x129/0x2a0 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667344] [<ffffffff811c48ce>] SyS_open+0x1e/0x20 Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667349] [<ffffffff817486f0>] system_call_fastpath+0x1a/0x1f Jun 21 23:46:33 swarm-cros-5 kernel: [56529.667352] INFO: task x86_64-cros-lin:24576 blocked for more than 120 seconds. Jun 21 23:46:33 swarm-cros-5 kernel: [56529.674783] Not tainted 3.13.0-147-generic #196-Ubuntu Jun 21 23:46:33 swarm-cros-5 kernel: [56529.680563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688522] x86_64-cros-lin D ffff88336f393b00 0 24576 23991 0x00000000 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688526] ffff8801529a3cf0 0000000000000086 ffff88013436c800 0000000000013b00 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688530] ffff8801529a3fd8 0000000000013b00 ffff88013436c800 ffff88010d1c8158 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688533] ffff88010d1c815c ffff88013436c800 00000000ffffffff ffff88010d1c8160 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688536] Call Trace: Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688543] [<ffffffff8173bea9>] schedule_preempt_disabled+0x29/0x70 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688552] [<ffffffff8173dcf5>] __mutex_lock_slowpath+0x135/0x1b0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688559] [<ffffffff811c7a69>] ? __sb_start_write+0x49/0xe0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688561] [<ffffffff8173dd8f>] mutex_lock+0x1f/0x2f Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688564] [<ffffffff811d0460>] lock_rename+0xc0/0xe0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688567] [<ffffffff811d610f>] SYSC_renameat+0x1af/0x420 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688572] [<ffffffff8118772c>] ? vm_munmap+0x4c/0x60 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688575] [<ffffffff811d74eb>] SyS_rename+0x1b/0x20 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688578] [<ffffffff817486f0>] system_call_fastpath+0x1a/0x1f Jun 21 23:46:34 swarm-cros-5 kernel: [56529.688585] INFO: task x86_64-cros-lin:24578 blocked for more than 120 seconds. Jun 21 23:46:34 swarm-cros-5 kernel: [56529.696015] Not tainted 3.13.0-147-generic #196-Ubuntu Jun 21 23:46:34 swarm-cros-5 kernel: [56529.701803] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709769] x86_64-cros-lin D ffff88336f3d3b00 0 24578 24232 0x00000000 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709774] ffff883304d8bb38 0000000000000086 ffff8801393e3000 0000000000013b00 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709778] ffff883304d8bfd8 0000000000013b00 ffff8801393e3000 ffff88336f3d4398 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709781] ffff88343ffa4ce8 0000000000000002 ffffffff81155d00 ffff883304d8bbb0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709784] Call Trace: Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709792] [<ffffffff81155d00>] ? wait_on_page_read+0x60/0x60 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709797] [<ffffffff8173bcbd>] io_schedule+0x9d/0x130 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709805] [<ffffffff81155d0e>] sleep_on_page+0xe/0x20 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709811] [<ffffffff8173c134>] __wait_on_bit+0x64/0x90 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709813] [<ffffffff81155acf>] wait_on_page_bit+0x7f/0x90 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709818] [<ffffffff810b0d00>] ? autoremove_wake_function+0x40/0x40 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709822] [<ffffffff8116582e>] truncate_inode_pages_range+0x2fe/0x5a0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709824] [<ffffffff81165ae5>] truncate_inode_pages+0x15/0x20 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709829] [<ffffffff8124f091>] ext4_evict_inode+0x131/0x510 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709832] [<ffffffff811e0d03>] evict+0xb3/0x1b0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709840] [<ffffffff811e1518>] iput+0xf8/0x190 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709846] [<ffffffff811dc3c8>] __dentry_kill+0x198/0x1f0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709848] [<ffffffff811dc4cd>] dput+0xad/0x190 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709853] [<ffffffff811d61f4>] SYSC_renameat+0x294/0x420 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709858] [<ffffffff8118772c>] ? vm_munmap+0x4c/0x60 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709860] [<ffffffff811d74eb>] SyS_rename+0x1b/0x20 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709864] [<ffffffff817486f0>] system_call_fastpath+0x1a/0x1f Jun 21 23:46:34 swarm-cros-5 kernel: [56529.709866] INFO: task x86_64-cros-lin:24608 blocked for more than 120 seconds. Jun 21 23:46:34 swarm-cros-5 kernel: [56529.717303] Not tainted 3.13.0-147-generic #196-Ubuntu Jun 21 23:46:34 swarm-cros-5 kernel: [56529.723086] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731047] x86_64-cros-lin D ffff88336f373b00 0 24608 24154 0x00000000 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731052] ffff883306e99cf0 0000000000000082 ffff8810d52b6000 0000000000013b00 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731056] ffff883306e99fd8 0000000000013b00 ffff8810d52b6000 ffff88010d1c8158 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731059] ffff88010d1c815c ffff8810d52b6000 00000000ffffffff ffff88010d1c8160 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731062] Call Trace: Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731069] [<ffffffff8173bea9>] schedule_preempt_disabled+0x29/0x70 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731072] [<ffffffff8173dcf5>] __mutex_lock_slowpath+0x135/0x1b0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731077] [<ffffffff811c7a69>] ? __sb_start_write+0x49/0xe0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731079] [<ffffffff8173dd8f>] mutex_lock+0x1f/0x2f Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731082] [<ffffffff811d0460>] lock_rename+0xc0/0xe0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731085] [<ffffffff811d610f>] SYSC_renameat+0x1af/0x420 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731090] [<ffffffff8118772c>] ? vm_munmap+0x4c/0x60 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731092] [<ffffffff811d74eb>] SyS_rename+0x1b/0x20 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731096] [<ffffffff817486f0>] system_call_fastpath+0x1a/0x1f Jun 21 23:46:34 swarm-cros-5 kernel: [56529.731099] INFO: task x86_64-cros-lin:24609 blocked for more than 120 seconds. Jun 21 23:46:34 swarm-cros-5 kernel: [56529.738557] Not tainted 3.13.0-147-generic #196-Ubuntu Jun 21 23:46:34 swarm-cros-5 kernel: [56529.744343] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752324] x86_64-cros-lin D ffff88336f233b00 0 24609 24243 0x00000000 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752329] ffff880013141cf0 0000000000000086 ffff882e1b678000 0000000000013b00 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752333] ffff880013141fd8 0000000000013b00 ffff882e1b678000 ffff88010d1c8158 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752336] ffff88010d1c815c ffff882e1b678000 00000000ffffffff ffff88010d1c8160 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752339] Call Trace: Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752347] [<ffffffff8173bea9>] schedule_preempt_disabled+0x29/0x70 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752355] [<ffffffff8173dcf5>] __mutex_lock_slowpath+0x135/0x1b0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752363] [<ffffffff811c7a69>] ? __sb_start_write+0x49/0xe0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752366] [<ffffffff8173dd8f>] mutex_lock+0x1f/0x2f Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752369] [<ffffffff811d0460>] lock_rename+0xc0/0xe0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752372] [<ffffffff811d610f>] SYSC_renameat+0x1af/0x420 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752376] [<ffffffff8118772c>] ? vm_munmap+0x4c/0x60 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752379] [<ffffffff811d74eb>] SyS_rename+0x1b/0x20 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752383] [<ffffffff817486f0>] system_call_fastpath+0x1a/0x1f Jun 21 23:46:34 swarm-cros-5 kernel: [56529.752390] INFO: task x86_64-cros-lin:24616 blocked for more than 120 seconds. Jun 21 23:46:34 swarm-cros-5 kernel: [56529.759826] Not tainted 3.13.0-147-generic #196-Ubuntu Jun 21 23:46:34 swarm-cros-5 kernel: [56529.765626] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773585] x86_64-cros-lin D ffff88336f2d3b00 0 24616 24256 0x00000000 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773590] ffff88003632bcf0 0000000000000086 ffff8830ff65b000 0000000000013b00 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773594] ffff88003632bfd8 0000000000013b00 ffff8830ff65b000 ffff88010d1c8158 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773597] ffff88010d1c815c ffff8830ff65b000 00000000ffffffff ffff88010d1c8160 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773600] Call Trace: Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773607] [<ffffffff8173bea9>] schedule_preempt_disabled+0x29/0x70 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773614] [<ffffffff8173dcf5>] __mutex_lock_slowpath+0x135/0x1b0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773620] [<ffffffff811c7a69>] ? __sb_start_write+0x49/0xe0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773622] [<ffffffff8173dd8f>] mutex_lock+0x1f/0x2f Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773625] [<ffffffff811d0460>] lock_rename+0xc0/0xe0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773628] [<ffffffff811d610f>] SYSC_renameat+0x1af/0x420 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773633] [<ffffffff8118772c>] ? vm_munmap+0x4c/0x60 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773636] [<ffffffff811d74eb>] SyS_rename+0x1b/0x20 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773643] [<ffffffff817486f0>] system_call_fastpath+0x1a/0x1f Jun 21 23:46:34 swarm-cros-5 kernel: [56529.773648] INFO: task x86_64-cros-lin:24618 blocked for more than 120 seconds. Jun 21 23:46:34 swarm-cros-5 kernel: [56529.781094] Not tainted 3.13.0-147-generic #196-Ubuntu Jun 21 23:46:34 swarm-cros-5 kernel: [56529.786894] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794892] x86_64-cros-lin D ffff88336f253b00 0 24618 24228 0x00000000 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794896] ffff880036089cf0 0000000000000082 ffff880194499800 0000000000013b00 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794900] ffff880036089fd8 0000000000013b00 ffff880194499800 ffff88010d1c8158 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794903] ffff88010d1c815c ffff880194499800 00000000ffffffff ffff88010d1c8160 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794907] Call Trace: Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794914] [<ffffffff8173bea9>] schedule_preempt_disabled+0x29/0x70 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794918] [<ffffffff8173dcf5>] __mutex_lock_slowpath+0x135/0x1b0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794928] [<ffffffff811c7a69>] ? __sb_start_write+0x49/0xe0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794933] [<ffffffff8173dd8f>] mutex_lock+0x1f/0x2f Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794936] [<ffffffff811d0460>] lock_rename+0xc0/0xe0 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794939] [<ffffffff811d610f>] SYSC_renameat+0x1af/0x420 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794943] [<ffffffff8118772c>] ? vm_munmap+0x4c/0x60 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794946] [<ffffffff811d74eb>] SyS_rename+0x1b/0x20 Jun 21 23:46:34 swarm-cros-5 kernel: [56529.794950] [<ffffffff817486f0>] system_call_fastpath+0x1a/0x1f Jun 21 23:47:00 swarm-cros-5 puppet-agent[32316]: Applying configuration version '1529649697' Jun 21 23:49:25 swarm-cros-5 puppet-agent[32316]: Finished catalog run in 145.24 seconds Jun 22 00:45:12 swarm-cros-5 puppet-agent[28311]: Retrieving pluginfacts Jun 22 00:45:12 swarm-cros-5 puppet-agent[28311]: Retrieving plugin Jun 22 00:45:13 swarm-cros-5 puppet-agent[28311]: Loading facts Jun 22 00:46:15 swarm-cros-5 puppet-agent[28311]: Caching catalog for swarm-cros-5.c.chromeos-bot.internal Jun 22 00:46:16 swarm-cros-5 puppet-agent[28311]: Applying configuration version '1529652687' Jun 22 00:46:20 swarm-cros-5 puppet-agent[28311]: Finished catalog run in 3.78 seconds Jun 22 01:45:10 swarm-cros-5 puppet-agent[2222]: Retrieving pluginfacts Jun 22 01:45:10 swarm-cros-5 puppet-agent[2222]: Retrieving plugin Jun 22 01:45:11 swarm-cros-5 puppet-agent[2222]: Loading facts Jun 22 01:45:36 swarm-cros-5 puppet-agent[2222]: Caching catalog for swarm-cros-5.c.chromeos-bot.internal Jun 22 01:45:37 swarm-cros-5 puppet-agent[2222]: Applying configuration version '1529656411' Jun 22 01:45:40 swarm-cros-5 puppet-agent[2222]: Finished catalog run in 3.92 seconds Jun 22 02:45:16 swarm-cros-5 puppet-agent[12426]: Retrieving pluginfacts Jun 22 02:45:16 swarm-cros-5 puppet-agent[12426]: Retrieving plugin Jun 22 02:45:17 swarm-cros-5 puppet-agent[12426]: Loading facts Jun 22 02:46:02 swarm-cros-5 puppet-agent[12426]: Caching catalog for swarm-cros-5.c.chromeos-bot.internal Jun 22 02:46:03 swarm-cros-5 puppet-agent[12426]: Applying configuration version '1529659880' Jun 22 02:46:08 swarm-cros-5 puppet-agent[12426]: Finished catalog run in 5.24 seconds Jun 22 03:25:58 swarm-cros-5 dhclient: DHCPREQUEST of 10.0.2.126 on eth0 to 169.254.169.254 port 67 (xid=0x4be5daae) Jun 22 03:25:58 swarm-cros-5 dhclient: DHCPACK of 10.0.2.126 from 169.254.169.254 Jun 22 03:25:58 swarm-cros-5 dhclient: bound to 10.0.2.126 -- renewal in 33439 seconds. Jun 22 03:45:39 swarm-cros-5 puppet-agent[2376]: Retrieving pluginfacts Jun 22 03:45:39 swarm-cros-5 puppet-agent[2376]: Retrieving plugin Jun 22 03:45:41 swarm-cros-5 puppet-agent[2376]: Loading facts Jun 22 03:46:15 swarm-cros-5 puppet-agent[2376]: Caching catalog for swarm-cros-5.c.chromeos-bot.internal Jun 22 03:46:51 swarm-cros-5 puppet-agent[2376]: Applying configuration version '1529663485' Jun 22 03:47:18 swarm-cros-5 puppet-agent[2376]: Finished catalog run in 26.90 seconds Fri Jun 22 11:25:55 UTC 2018: shutting down for reboot [up 73284s].
,
Jun 27 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/880ab82f2f2063bc9d7fb1b954eb2f6a540ae759 commit 880ab82f2f2063bc9d7fb1b954eb2f6a540ae759 Author: Don Garrett <dgarrett@google.com> Date: Tue Jun 26 23:59:31 2018 cros_sdk_lib: Add timeout for CleanupChrootMount. We are seeing kernel bugs that cause CleanupChrootMount to hang forever, which can tie up a build machine for up to 24 hours. I'm not certain if this will unhang the process or not, but it doesn't seem like it can hurt. BUG= chromium:855151 TEST=run_tests Change-Id: If5aa195dcbc539203af871c912069648ddee8131 Reviewed-on: https://chromium-review.googlesource.com/1112712 Commit-Ready: Don Garrett <dgarrett@chromium.org> Tested-by: Don Garrett <dgarrett@chromium.org> Reviewed-by: Don Garrett <dgarrett@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> [modify] https://crrev.com/880ab82f2f2063bc9d7fb1b954eb2f6a540ae759/lib/cros_sdk_lib.py
,
Jun 27 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/91a8cfc0b7666077c68c531b17619ad8a6b28b7c commit 91a8cfc0b7666077c68c531b17619ad8a6b28b7c Author: Don Garrett <dgarrett@google.com> Date: Tue Jun 26 23:59:31 2018 cbuildbot_launch: Add metrics for CleanupChroot stage. Capture metrics for the CleanupChroot stage so we can see how often it fails. BUG= chromium:855151 TEST=run_tests cros tryjob --local success-build Change-Id: Ia02a6e7a4ef9e0828bfa19477b47f7b7e0fd87c9 Reviewed-on: https://chromium-review.googlesource.com/1112713 Commit-Ready: Don Garrett <dgarrett@chromium.org> Tested-by: Don Garrett <dgarrett@chromium.org> Reviewed-by: Mike Nichols <mikenichols@chromium.org> [modify] https://crrev.com/91a8cfc0b7666077c68c531b17619ad8a6b28b7c/scripts/cbuildbot_launch.py
,
Jun 27 2018
In theory, it might, maybe, be fixed. If this doesn't work, the next step is much more drastic. We can disable to cleanup and depend on reboots, or we can stop using chroot.img. I have started testing an update to the image used by ChromeOS bots (to see if we can pull in a kernel update), but the newest images are having unrelated issues.
,
Jun 27 2018
Since updating the kernel may be our best bet, I'm blocking on the bug to fix the builder that generates our updated bot images.
,
Jun 27 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e3d2bf14a3ded568e72df70feac71980d34da10c commit e3d2bf14a3ded568e72df70feac71980d34da10c Author: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Date: Wed Jun 27 02:57:50 2018 Roll src/third_party/chromite 1b45f0dc972b..47bb72284a37 (3 commits) https://chromium.googlesource.com/chromiumos/chromite.git/+log/1b45f0dc972b..47bb72284a37 git log 1b45f0dc972b..47bb72284a37 --date=short --no-merges --format='%ad %ae %s' 2018-06-26 dgarrett@google.com lint: change type() checks to use isinstance() instead 2018-06-26 dgarrett@google.com cbuildbot_launch: Add metrics for CleanupChroot stage. 2018-06-26 dgarrett@google.com cros_sdk_lib: Add timeout for CleanupChrootMount. Created with: gclient setdep -r src/third_party/chromite@47bb72284a37 The AutoRoll server is located here: https://chromite-chromium-roll.skia.org Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. BUG=chromium:None,chromium:855151,chromium:855151 TBR=chrome-os-gardeners@chromium.org Change-Id: I7a982eceeabd1233818cc7545bc7cdcf2c04cefd Reviewed-on: https://chromium-review.googlesource.com/1116082 Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#570647} [modify] https://crrev.com/e3d2bf14a3ded568e72df70feac71980d34da10c/DEPS
,
Jun 27 2018
I upgraded the grunt-paladin builder to the 4.4 HWE kernel from Ubuntu a few weeks ago. If that machine hasn't seen any more hangs (it hadn't before I went on leave), that might be worth pushing to the rest. I hadn't seen any other bad side effects of the kernel upgrade.
,
Jun 27 2018
Working on being able to generate new GCE images again (to fix an unrelated bug). After that's fixed, I think we should update the kernel on the image, and update all of our builders.
,
Jun 27 2018
If that ends up taking longer than planned, you could probably also push out the equivalent of this with puppet to get the kernel everywhere: apt-get install --install-recommends linux-generic-lts-xenial
,
Jun 28 2018
,
Jul 3
Issue 859766 has been merged into this issue.
,
Jul 3
Looks like this is still happening but at least we are killing it after 4 minutes instead of 24 hours. We might have to accelerate the kernel rollout.
,
Jul 3
This is happening frequently on *-tot-chrome-pfq-informational builders now: http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8942003780206831536 http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8942008371026020672 http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8942033899737051312 etc. What does the timeline for fixing it currently look like?
,
Jul 3
We are also seeing problems with chroot cleanup during the start of the build, AND I'm having trouble getting an updated builder image that works. I'm thinking the best thing is to disable chroot.img until this can be sorted out again.
,
Jul 3
Sounds good to me. This just became a higher priority, because it's now making a bunch of the non-informational Chrome PFQ builders also fail.
,
Jul 3
I created this bug last night, which similar, except that is can also commonly happen at the start of the build: https://crbug.com/859984 It had a totally different cause, and I've chumped in a hopeful fix.
,
Jul 3
,
Jul 5
Issue 818874 has been merged into this issue.
,
Jul 5
issue 860017 is requesting an upgrade of the kernel version which would much quicker than rev'ing the OS image.
,
Jul 6
,
Jul 7
Based on analysis in issue 860508, I believe that the original kernel issue is solved and we are now facing a Swarming bug that just so happens to affect the exit status recorded on CleanupChroot because it happens to be the last one.
,
Jul 7
Thanks, that explanation seems likely to me too: when I've seen apparent CleanupChroot failures most recently, during the middle of the run the build page was always displaying the currently-running stage in purple. |
||||||||||||
►
Sign in to add a comment |
||||||||||||
Comment 1 by manojgupta@chromium.org
, Jun 21 2018Labels: OS-Chrome