cheets_StartAndroid.stress randomly reboots Intel 3.14 devices (yuna, lulu)
Issue description

cheets_StartAndroid (formerly known as cheets_CTSHelper) is a test which launches Chrome and starts Android. It is used as a basic test in all CTS tests. The .stress version launches Android 10 times.

I am digging through the failures here:
https://wmatrix.googleplex.com/unfiltered?hide_missing=True&releases=tot&tests=cheets_StartAndroid.stress&days_back=20

The DEBUG log shows great unhappiness:

05/05 02:33:37.846 INFO | arc_common:0037| Waiting for Android to boot completely.
05/05 02:33:37.846 DEBUG| utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:39.877 DEBUG| utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:41.974 DEBUG| utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:44.035 DEBUG| utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:46.092 DEBUG| utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:48.158 DEBUG| utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:50.218 DEBUG| utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:52.290 DEBUG| utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:52.371 INFO | arc_common:0043| Android has booted completely.
05/05 02:33:54.374 DEBUG| arc_util:0041| ARC is enabled in mode enabled
05/05 02:33:54.375 INFO | arc_util:0105| Saving Android dumpstate.
05/05 02:34:14.376 INFO | arc_util:0125| Android dumpstate successfully saved.
05/05 02:34:14.399 DEBUG| cros_interface:0363| ListProcesses(<predicate>)->[237 processes]
05/05 02:34:14.403 INFO | cros_interface:0546| (Re)starting the ui (logs the user out)
05/05 02:34:14.422 DEBUG| cros_interface:0439| IsServiceRunning(ui)->True
05/05 02:34:14.423 DEBUG| cros_interface:0058| sh -c restart ui

The crashinfo collected a kcrash:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/115860250-chromeos-test/chromeos4-row10-rack6-host5/crashinfo.chromeos4-row10-rack6-host5/

<1>[ 724.477784] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
<1>[ 724.477800] IP: [<ffffffffb677e725>] ep_unregister_pollwait.isra.5+0x1f/0x7c
<5>[ 724.477815] PGD 0
<5>[ 724.477821] Oops: 0000 [#1] PREEMPT SMP
<0>[ 724.480262] gsmi: Log Shutdown Reason 0x03
<5>[ 724.480269] Modules linked in: ip6t_REJECT xt_TCPMSS ip6table_mangle ip6table_raw veth uinput iwlmvm i2c_dev memc_x86 x86_pkg_temp_thermal iwlwifi smsc75xx cros_ec_accel kfifo_buf iio_trig_sysfs industrialio iwl7000_mac80211 snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_soc_sst_acpi snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep rfcomm ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat zram xt_mark bridge stp llc fuse cfg80211 ip6table_filter snd_seq_midi snd_seq_midi_event snd_rawmidi ip6_tables snd_seq snd_seq_device smsc95xx usbnet mii btusb btbcm btintel bluetooth uvcvideo videobuf2_vmalloc joydev
<5>[ 724.480390] CPU: 1 PID: 3141 Comm: chrome Not tainted 3.14.0 #1
<5>[ 724.480397] Hardware name: GOOGLE Auron_Yuna, BIOS Google_Auron_yuna.6301.59.8 04/02/2015
<5>[ 724.480408] task: ffff880072091240 ti: ffff880160314000 task.ti: ffff880160314000
<5>[ 724.480417] RIP: 0010:[<ffffffffb677e725>] [<ffffffffb677e725>] ep_unregister_pollwait.isra.5+0x1f/0x7c
<5>[ 724.480432] RSP: 0000:ffff880160315c08 EFLAGS: 00010217
<5>[ 724.480439] RAX: ffff880072091240 RBX: 0000000000000000 RCX: 0000000000000000
<5>[ 724.480448] RDX: ffffffffb72d9f50 RSI: ffff8801721bc338 RDI: ffff8801721bc338
<5>[ 724.480456] RBP: ffff880160315c20 R08: 0000000000000000 R09: 0000000000000000
<5>[ 724.480465] R10: ffff880160315be0 R11: 0000000000000000 R12: ffff8801721bc338
<5>[ 724.480474] R13: ffff8801721bc378 R14: ffff880171069c50 R15: ffff880171069b50
<5>[ 724.480483] FS: 000070924b52b780(0000) GS:ffff88017ed00000(0000) knlGS:0000000000000000
<5>[ 724.480493] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<5>[ 724.480501] CR2: 0000000000000008 CR3: 000000006145f000 CR4: 00000000000407e0
<5>[ 724.480509] Stack:
<5>[ 724.480513] ffff8801721bc338 ffff880172fc3000 0000000000000000 ffff880160315c48
<5>[ 724.480526] ffffffffb677e7a2 ffff8801721bc338 ffff880172fc3018 ffff880172fc3000
<5>[ 724.480539] ffff880160315c88 ffffffffb677ebc4 ffff8801721bc390 000000005658541a
<5>[ 724.480551] Call Trace:
<5>[ 724.480560] [<ffffffffb677e7a2>] ep_remove+0x20/0xc2
<5>[ 724.480569] [<ffffffffb677ebc4>] eventpoll_release_file+0x6c/0xb1
<5>[ 724.480579] [<ffffffffb6748e76>] __fput+0xa0/0x1c6
<5>[ 724.480587] [<ffffffffb6748fd2>] ____fput+0xe/0x10
<5>[ 724.480596] [<ffffffffb66789e0>] task_work_run+0x7d/0x93
<5>[ 724.480606] [<ffffffffb665fa89>] do_exit+0x40d/0x94d
<5>[ 724.480617] [<ffffffffb666aa2a>] ? __dequeue_signal+0x1a/0x136
<5>[ 724.480627] [<ffffffffb6660041>] do_group_exit+0x42/0xb0
<5>[ 724.480635] [<ffffffffb666d764>] get_signal_to_deliver+0x567/0x58d
<5>[ 724.480645] [<ffffffffb6601eb3>] do_signal+0x57/0x527
<5>[ 724.480656] [<ffffffffb6633d71>] ? __do_page_fault+0x35d/0x383
<5>[ 724.480665] [<ffffffffb668cade>] ? update_stats_wait_end+0x7c/0xd2
<5>[ 724.480677] [<ffffffffb6c22555>] ? _raw_spin_unlock_irq+0x17/0x22
<5>[ 724.480686] [<ffffffffb66848d7>] ? finish_task_switch+0x63/0xb6
<5>[ 724.480695] [<ffffffffb66023ac>] do_notify_resume+0x29/0x5b
<5>[ 724.480705] [<ffffffffb6c22bf9>] retint_signal+0x3d/0x74
<5>[ 724.480712] Code: 5a 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 55 48 89 e5 41 55 4c 8d 6f 40 41 54 53 49 89 fc 49 8b 5c 24 40 4c 39 eb 74 56 <48> 8b 43 08 48 8b 13 48 89 42 08 48 89 10 48 b8 00 01 10 00 00
<1>[ 724.480803] RIP [<ffffffffb677e725>] ep_unregister_pollwait.isra.5+0x1f/0x7c
<5>[ 724.480815] RSP <ffff880160315c08>
<5>[ 724.480820] CR2: 0000000000000008
<4>[ 724.480826] ---[ end trace 4ab72ed7d87b2919 ]---
<0>[ 724.489411] Kernel panic - not syncing: Fatal exception
<0>[ 724.489424] Kernel Offset: 0x35600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
<0>[ 724.489539] gsmi: Log Shutdown Reason 0x02

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/115751258-chromeos-test/chromeos4-row6-rack2-host1/crashinfo.chromeos4-row6-rack2-host1/

<7>[ 1133.914087] SELinux: initialized (dev proc, type proc), uses genfs_contexts
<1>[ 1137.403704] BUG: unable to handle kernel paging request at fffffffffffffff8
<1>[ 1137.403719] IP: [<ffffffff9837eba0>] eventpoll_release_file+0x51/0xb1
<5>[ 1137.403735] PGD 18e0d067 PUD 18e0f067 PMD 0
<5>[ 1137.403746] Oops: 0000 [#1] PREEMPT SMP
<0>[ 1137.406177] gsmi: Log Shutdown Reason 0x03
<5>[ 1137.406183] Modules linked in: ip6t_REJECT xt_TCPMSS ip6table_mangle ip6table_raw veth uinput i2c_dev iwlmvm memc_x86 x86_pkg_temp_thermal cros_ec_accel iio_trig_sysfs kfifo_buf industrialio iwlwifi iwl7000_mac80211 smsc75xx snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_soc_sst_acpi ipt_MASQUERADE snd_hda_intel snd_hda_controller zram snd_hda_codec iptable_nat snd_hwdep nf_nat_ipv4 rfcomm nf_nat xt_mark bridge stp llc fuse cfg80211 ip6table_filter ip6_tables snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device smsc95xx usbnet mii btusb btbcm btintel bluetooth uvcvideo videobuf2_vmalloc joydev
<5>[ 1137.406310] CPU: 0 PID: 4737 Comm: DownloadManager Not tainted 3.14.0 #1
<5>[ 1137.406318] Hardware name: GOOGLE Lulu, BIOS Google_Lulu.6301.136.57 03/28/2016
<5>[ 1137.406329] task: ffff880075e7a480 ti: ffff880032ffa000 task.ti: ffff880032ffa000
<5>[ 1137.406338] RIP: 0010:[<ffffffff9837eba0>] [<ffffffff9837eba0>] eventpoll_release_file+0x51/0xb1
<5>[ 1137.406352] RSP: 0000:ffff880032ffbe80 EFLAGS: 00210203
<5>[ 1137.406359] RAX: 0000000000000000 RBX: ffffffffffffffa8 RCX: 0000000000000000
<5>[ 1137.406368] RDX: ffffffff98ed9f50 RSI: ffffffff9837d837 RDI: ffff8800757a6718
<5>[ 1137.406376] RBP: ffff880032ffbeb0 R08: ffffea0000092900 R09: 0000000000000000
<5>[ 1137.406385] R10: 0000000000000000 R11: ffff88007856e000 R12: ffff8800757a6718
<5>[ 1137.406393] R13: ffff8800757a6700 R14: ffff880035089390 R15: ffff880035089290
<5>[ 1137.406402] FS: 0000000000000000(0000) GS:ffff88007bc00000(0063) knlGS:00000000e611f978
<5>[ 1137.406412] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
<5>[ 1137.406419] CR2: fffffffffffffff8 CR3: 0000000003ab5000 CR4: 00000000000407f0
<5>[ 1137.406427] Stack:
<5>[ 1137.406431] 0000000000000000 00000000927dc1d8 ffff880035089280 0000000000000008
<5>[ 1137.406444] ffff8800781d19f0 ffff88003e20eca8 ffff880032ffbef0 ffffffff98348e6d
<5>[ 1137.406458] ffff8800790388e0 ffff880075e7a480 ffffffff99049c20 0000000000000000
<5>[ 1137.406471] Call Trace:
<5>[ 1137.406480] [<ffffffff98348e6d>] __fput+0xa0/0x1c6
<5>[ 1137.406488] [<ffffffff98348fc9>] ____fput+0xe/0x10
<5>[ 1137.406497] [<ffffffff982789e0>] task_work_run+0x7d/0x93
<5>[ 1137.406506] [<ffffffff982023da>] do_notify_resume+0x57/0x5b
<5>[ 1137.406517] [<ffffffff988237ee>] int_signal+0x12/0x17
<5>[ 1137.406524] Code: 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 e8 b1 2a 4a 00 49 8b 06 48 89 45 d0 48 8b 5d d0 48 83 eb 58 48 8d 43 58 4c 39 f0 74 35 <4c> 8b 6b 50 4d 8d 65 18 4c 89 e7 e8 89 2a 4a 00 48 89 de 4c 89
<1>[ 1137.406615] RIP [<ffffffff9837eba0>] eventpoll_release_file+0x51/0xb1
<5>[ 1137.406626] RSP <ffff880032ffbe80>
<5>[ 1137.406631] CR2: fffffffffffffff8
<4>[ 1137.406637] ---[ end trace 042b51bc83c59bb6 ]---
<0>[ 1137.410305] Kernel panic - not syncing: Fatal exception
<0>[ 1137.410329] Kernel Offset: 0x17200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
<0>[ 1137.410441] gsmi: Log Shutdown Reason 0x02
,
May 6 2017
Might well be. I can look into it, but it will have to wait until Monday.
,
May 6 2017
Thanks. Monday is fine.
,
May 8 2017
https://chromium-review.googlesource.com/#/c/498852/ https://chromium-review.googlesource.com/#/c/498853/
,
May 10 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/050f1cadd5572aa0ab45f18c4ee4081f545a86be

commit 050f1cadd5572aa0ab45f18c4ee4081f545a86be
Author: Konstantin Khlebnikov <koct9i@gmail.com>
Date: Wed May 10 03:36:37 2017

UPSTREAM: epoll: fix use-after-free in eventpoll_release_file

This fixes use-after-free of epi->fllink.next inside list loop macro. This loop actually releases elements in the body. The list is rcu-protected but here we cannot hold rcu_read_lock because we need to lock mutex inside.

The obvious solution is to use list_for_each_entry_safe(). RCU-ness isn't essential because nobody can change this list under us, it's final fput for this file.

The bug was introduced by ae10b2b4eb01 ("epoll: optimize EPOLL_CTL_DEL using rcu")

BUG= chromium:719087
TEST=Run cheets_StartAndroid.stress

Change-Id: Iffe854de38a46665fc031e4bedc533ce6645b9d6
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Stable <stable@vger.kernel.org> # 3.13+
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit ebe06187bf2a)
Reviewed-on: https://chromium-review.googlesource.com/498852
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/050f1cadd5572aa0ab45f18c4ee4081f545a86be/fs/eventpoll.c
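The pattern this commit message describes (a loop whose body frees the element it is standing on) can be illustrated in userspace. The following is only a sketch under stated assumptions: `struct node`, `struct item`, `list_init`, `list_add_tail`, `list_del`, `fill`, and `release_all` are hypothetical stand-ins written for this example, not the kernel's `list_head` helpers or the code in fs/eventpoll.c. It shows the idea behind a "safe" iteration: read the next pointer before the body releases the current element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an intrusive doubly linked list. */
struct node {
    struct node *next, *prev;
};

/* Hypothetical element linked through an embedded node. */
struct item {
    int id;
    struct node link;
};

#define item_of(ptr) ((struct item *)((char *)(ptr) - offsetof(struct item, link)))

static void list_init(struct node *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Append up to n heap-allocated items; returns how many were added. */
static int fill(struct node *head, int n)
{
    int added = 0;
    for (int i = 0; i < n; i++) {
        struct item *it = malloc(sizeof(*it));
        if (!it)
            break;
        it->id = i;
        list_add_tail(&it->link, head);
        added++;
    }
    return added;
}

/*
 * Unlink and free every item; returns how many were freed.
 * The next pointer is saved BEFORE the body frees the current element.
 * Reading pos->next after free(item_of(pos)) would be a use-after-free.
 */
static int release_all(struct node *head)
{
    int freed = 0;
    struct node *pos = head->next;

    while (pos != head) {
        struct node *next = pos->next; /* saved while pos is still valid */
        list_del(pos);
        free(item_of(pos));
        freed++;
        pos = next;
    }
    assert(head->next == head && head->prev == head); /* list is empty again */
    return freed;
}
```

A caller would initialize a head with `list_init`, populate it with `fill`, and tear it down with `release_all`. A loop that advanced with `pos = pos->next` after the `free` call would read freed memory, which is the class of bug the commit message attributes to the original list loop macro.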
,
May 10 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/3e7034cf4cb02187504fe3f5c2d1cebfe62cd8f7

commit 3e7034cf4cb02187504fe3f5c2d1cebfe62cd8f7
Author: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Date: Wed May 10 03:36:38 2017

UPSTREAM: eventpoll: fix uninitialized variable in epoll_ctl

When calling epoll_ctl with operation EPOLL_CTL_DEL, structure epds is not initialized but ep_take_care_of_epollwakeup reads its event field. When this unintialized field has EPOLLWAKEUP bit set, a capability check is done for CAP_BLOCK_SUSPEND in ep_take_care_of_epollwakeup. This produces unexpected messages in the audit log, such as (on a system running SELinux):

type=AVC msg=audit(1408212798.866:410): avc: denied { block_suspend } for pid=7754 comm="dbus-daemon" capability=36 scontext=unconfined_u:unconfined_r:unconfined_t tcontext=unconfined_u:unconfined_r:unconfined_t tclass=capability2 permissive=1

type=SYSCALL msg=audit(1408212798.866:410): arch=c000003e syscall=233 success=yes exit=0 a0=3 a1=2 a2=9 a3=7fffd4d66ec0 items=0 ppid=1 pid=7754 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=3 comm="dbus-daemon" exe="/usr/bin/dbus-daemon" subj=unconfined_u:unconfined_r:unconfined_t key=(null)

("arch=c000003e syscall=233 a1=2" means "epoll_ctl(op=EPOLL_CTL_DEL)")

Remove use of epds in epoll_ctl when op == EPOLL_CTL_DEL.

BUG= chromium:719087
TEST=Run cheets_StartAndroid.stress

Change-Id: I35c200d749dad57aeeed34965b622dcef514b0ea
Fixes: 4d7e30d98939 ("epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit c680e41b3a2e)
Reviewed-on: https://chromium-review.googlesource.com/498853
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/3e7034cf4cb02187504fe3f5c2d1cebfe62cd8f7/fs/eventpoll.c
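For context on why the event structure can be left unset for EPOLL_CTL_DEL: from userspace, a delete does not need an event argument at all (the epoll_ctl man page notes that NULL is accepted for EPOLL_CTL_DEL since Linux 2.6.9). The sketch below is a userspace illustration only, not the kernel change itself; `add_then_del` is a hypothetical helper name, and the example assumes a Linux system.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Register the read end of a pipe with an epoll instance, then remove it
 * again passing NULL as the event. Returns 0 if both calls succeed and
 * -1 otherwise. Hypothetical helper, for illustration only.
 */
static int add_then_del(void)
{
    int fds[2];
    int epfd;
    int ret = -1;
    struct epoll_event ev = { .events = EPOLLIN };

    if (pipe(fds) != 0)
        return -1;

    epfd = epoll_create1(0);
    if (epfd < 0)
        goto out_pipe;

    ev.data.fd = fds[0];

    /* ADD needs a fully initialized event. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) != 0)
        goto out_epoll;

    /* DEL ignores the event argument, so NULL is acceptable here. */
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fds[0], NULL) != 0)
        goto out_epoll;

    ret = 0;

out_epoll:
    close(epfd);
out_pipe:
    close(fds[0]);
    close(fds[1]);
    return ret;
}
```

Because nothing is supplied for a delete, any kernel-side code that inspects the event's flags on that path would be looking at data the caller never provided, which matches the situation the commit message describes.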
,
May 10 2017
,
Jan 22 2018
Comment 1 by snanda@chromium.org, May 5 2017