Issue 719087

Issue metadata

Status: Archived
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



cheets_StartAndroid.stress randomly reboots Intel 3.14 devices (yuna, lulu)

Reported by ihf@chromium.org, May 5 2017

Issue description

cheets_StartAndroid (formerly known as cheets_CTSHelper) is a test which launches Chrome and starts Android. It is used as a basic test in all CTS tests. The .stress version launches Android 10 times.

I am digging through the failures here:
https://wmatrix.googleplex.com/unfiltered?hide_missing=True&releases=tot&tests=cheets_StartAndroid.stress&days_back=20


The DEBUG log shows great unhappiness; it stops abruptly right after the UI restart:
05/05 02:33:37.846 INFO |        arc_common:0037| Waiting for Android to boot completely.
05/05 02:33:37.846 DEBUG|             utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:39.877 DEBUG|             utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:41.974 DEBUG|             utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:44.035 DEBUG|             utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:46.092 DEBUG|             utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:48.158 DEBUG|             utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:50.218 DEBUG|             utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:52.290 DEBUG|             utils:0202| Running 'android-sh -c "getprop sys.boot_completed"'
05/05 02:33:52.371 INFO |        arc_common:0043| Android has booted completely.
05/05 02:33:54.374 DEBUG|          arc_util:0041| ARC is enabled in mode enabled
05/05 02:33:54.375 INFO |          arc_util:0105| Saving Android dumpstate.
05/05 02:34:14.376 INFO |          arc_util:0125| Android dumpstate successfully saved.
05/05 02:34:14.399 DEBUG|    cros_interface:0363| ListProcesses(<predicate>)->[237 processes]
05/05 02:34:14.403 INFO |    cros_interface:0546| (Re)starting the ui (logs the user out)
05/05 02:34:14.422 DEBUG|    cros_interface:0439| IsServiceRunning(ui)->True
05/05 02:34:14.423 DEBUG|    cros_interface:0058| sh -c restart ui 

The crashinfo collected a kcrash:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/115860250-chromeos-test/chromeos4-row10-rack6-host5/crashinfo.chromeos4-row10-rack6-host5/

<1>[  724.477784] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
<1>[  724.477800] IP: [<ffffffffb677e725>] ep_unregister_pollwait.isra.5+0x1f/0x7c
<5>[  724.477815] PGD 0 
<5>[  724.477821] Oops: 0000 [#1] PREEMPT SMP 
<0>[  724.480262] gsmi: Log Shutdown Reason 0x03
<5>[  724.480269] Modules linked in: ip6t_REJECT xt_TCPMSS ip6table_mangle ip6table_raw veth uinput iwlmvm i2c_dev memc_x86 x86_pkg_temp_thermal iwlwifi smsc75xx cros_ec_accel kfifo_buf iio_trig_sysfs industrialio iwl7000_mac80211 snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_soc_sst_acpi snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep rfcomm ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat zram xt_mark bridge stp llc fuse cfg80211 ip6table_filter snd_seq_midi snd_seq_midi_event snd_rawmidi ip6_tables snd_seq snd_seq_device smsc95xx usbnet mii btusb btbcm btintel bluetooth uvcvideo videobuf2_vmalloc joydev
<5>[  724.480390] CPU: 1 PID: 3141 Comm: chrome Not tainted 3.14.0 #1
<5>[  724.480397] Hardware name: GOOGLE Auron_Yuna, BIOS Google_Auron_yuna.6301.59.8 04/02/2015
<5>[  724.480408] task: ffff880072091240 ti: ffff880160314000 task.ti: ffff880160314000
<5>[  724.480417] RIP: 0010:[<ffffffffb677e725>]  [<ffffffffb677e725>] ep_unregister_pollwait.isra.5+0x1f/0x7c
<5>[  724.480432] RSP: 0000:ffff880160315c08  EFLAGS: 00010217
<5>[  724.480439] RAX: ffff880072091240 RBX: 0000000000000000 RCX: 0000000000000000
<5>[  724.480448] RDX: ffffffffb72d9f50 RSI: ffff8801721bc338 RDI: ffff8801721bc338
<5>[  724.480456] RBP: ffff880160315c20 R08: 0000000000000000 R09: 0000000000000000
<5>[  724.480465] R10: ffff880160315be0 R11: 0000000000000000 R12: ffff8801721bc338
<5>[  724.480474] R13: ffff8801721bc378 R14: ffff880171069c50 R15: ffff880171069b50
<5>[  724.480483] FS:  000070924b52b780(0000) GS:ffff88017ed00000(0000) knlGS:0000000000000000
<5>[  724.480493] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<5>[  724.480501] CR2: 0000000000000008 CR3: 000000006145f000 CR4: 00000000000407e0
<5>[  724.480509] Stack:
<5>[  724.480513]  ffff8801721bc338 ffff880172fc3000 0000000000000000 ffff880160315c48
<5>[  724.480526]  ffffffffb677e7a2 ffff8801721bc338 ffff880172fc3018 ffff880172fc3000
<5>[  724.480539]  ffff880160315c88 ffffffffb677ebc4 ffff8801721bc390 000000005658541a
<5>[  724.480551] Call Trace:
<5>[  724.480560]  [<ffffffffb677e7a2>] ep_remove+0x20/0xc2
<5>[  724.480569]  [<ffffffffb677ebc4>] eventpoll_release_file+0x6c/0xb1
<5>[  724.480579]  [<ffffffffb6748e76>] __fput+0xa0/0x1c6
<5>[  724.480587]  [<ffffffffb6748fd2>] ____fput+0xe/0x10
<5>[  724.480596]  [<ffffffffb66789e0>] task_work_run+0x7d/0x93
<5>[  724.480606]  [<ffffffffb665fa89>] do_exit+0x40d/0x94d
<5>[  724.480617]  [<ffffffffb666aa2a>] ? __dequeue_signal+0x1a/0x136
<5>[  724.480627]  [<ffffffffb6660041>] do_group_exit+0x42/0xb0
<5>[  724.480635]  [<ffffffffb666d764>] get_signal_to_deliver+0x567/0x58d
<5>[  724.480645]  [<ffffffffb6601eb3>] do_signal+0x57/0x527
<5>[  724.480656]  [<ffffffffb6633d71>] ? __do_page_fault+0x35d/0x383
<5>[  724.480665]  [<ffffffffb668cade>] ? update_stats_wait_end+0x7c/0xd2
<5>[  724.480677]  [<ffffffffb6c22555>] ? _raw_spin_unlock_irq+0x17/0x22
<5>[  724.480686]  [<ffffffffb66848d7>] ? finish_task_switch+0x63/0xb6
<5>[  724.480695]  [<ffffffffb66023ac>] do_notify_resume+0x29/0x5b
<5>[  724.480705]  [<ffffffffb6c22bf9>] retint_signal+0x3d/0x74
<5>[  724.480712] Code: 5a 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 44 00 00 55 48 89 e5 41 55 4c 8d 6f 40 41 54 53 49 89 fc 49 8b 5c 24 40 4c 39 eb 74 56 <48> 8b 43 08 48 8b 13 48 89 42 08 48 89 10 48 b8 00 01 10 00 00 
<1>[  724.480803] RIP  [<ffffffffb677e725>] ep_unregister_pollwait.isra.5+0x1f/0x7c
<5>[  724.480815]  RSP <ffff880160315c08>
<5>[  724.480820] CR2: 0000000000000008
<4>[  724.480826] ---[ end trace 4ab72ed7d87b2919 ]---
<0>[  724.489411] Kernel panic - not syncing: Fatal exception
<0>[  724.489424] Kernel Offset: 0x35600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
<0>[  724.489539] gsmi: Log Shutdown Reason 0x02

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/115751258-chromeos-test/chromeos4-row6-rack2-host1/crashinfo.chromeos4-row6-rack2-host1/

<7>[ 1133.914087] SELinux: initialized (dev proc, type proc), uses genfs_contexts
<1>[ 1137.403704] BUG: unable to handle kernel paging request at fffffffffffffff8
<1>[ 1137.403719] IP: [<ffffffff9837eba0>] eventpoll_release_file+0x51/0xb1
<5>[ 1137.403735] PGD 18e0d067 PUD 18e0f067 PMD 0 
<5>[ 1137.403746] Oops: 0000 [#1] PREEMPT SMP 
<0>[ 1137.406177] gsmi: Log Shutdown Reason 0x03
<5>[ 1137.406183] Modules linked in: ip6t_REJECT xt_TCPMSS ip6table_mangle ip6table_raw veth uinput i2c_dev iwlmvm memc_x86 x86_pkg_temp_thermal cros_ec_accel iio_trig_sysfs kfifo_buf industrialio iwlwifi iwl7000_mac80211 smsc75xx snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_soc_sst_acpi ipt_MASQUERADE snd_hda_intel snd_hda_controller zram snd_hda_codec iptable_nat snd_hwdep nf_nat_ipv4 rfcomm nf_nat xt_mark bridge stp llc fuse cfg80211 ip6table_filter ip6_tables snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device smsc95xx usbnet mii btusb btbcm btintel bluetooth uvcvideo videobuf2_vmalloc joydev
<5>[ 1137.406310] CPU: 0 PID: 4737 Comm: DownloadManager Not tainted 3.14.0 #1
<5>[ 1137.406318] Hardware name: GOOGLE Lulu, BIOS Google_Lulu.6301.136.57 03/28/2016
<5>[ 1137.406329] task: ffff880075e7a480 ti: ffff880032ffa000 task.ti: ffff880032ffa000
<5>[ 1137.406338] RIP: 0010:[<ffffffff9837eba0>]  [<ffffffff9837eba0>] eventpoll_release_file+0x51/0xb1
<5>[ 1137.406352] RSP: 0000:ffff880032ffbe80  EFLAGS: 00210203
<5>[ 1137.406359] RAX: 0000000000000000 RBX: ffffffffffffffa8 RCX: 0000000000000000
<5>[ 1137.406368] RDX: ffffffff98ed9f50 RSI: ffffffff9837d837 RDI: ffff8800757a6718
<5>[ 1137.406376] RBP: ffff880032ffbeb0 R08: ffffea0000092900 R09: 0000000000000000
<5>[ 1137.406385] R10: 0000000000000000 R11: ffff88007856e000 R12: ffff8800757a6718
<5>[ 1137.406393] R13: ffff8800757a6700 R14: ffff880035089390 R15: ffff880035089290
<5>[ 1137.406402] FS:  0000000000000000(0000) GS:ffff88007bc00000(0063) knlGS:00000000e611f978
<5>[ 1137.406412] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
<5>[ 1137.406419] CR2: fffffffffffffff8 CR3: 0000000003ab5000 CR4: 00000000000407f0
<5>[ 1137.406427] Stack:
<5>[ 1137.406431]  0000000000000000 00000000927dc1d8 ffff880035089280 0000000000000008
<5>[ 1137.406444]  ffff8800781d19f0 ffff88003e20eca8 ffff880032ffbef0 ffffffff98348e6d
<5>[ 1137.406458]  ffff8800790388e0 ffff880075e7a480 ffffffff99049c20 0000000000000000
<5>[ 1137.406471] Call Trace:
<5>[ 1137.406480]  [<ffffffff98348e6d>] __fput+0xa0/0x1c6
<5>[ 1137.406488]  [<ffffffff98348fc9>] ____fput+0xe/0x10
<5>[ 1137.406497]  [<ffffffff982789e0>] task_work_run+0x7d/0x93
<5>[ 1137.406506]  [<ffffffff982023da>] do_notify_resume+0x57/0x5b
<5>[ 1137.406517]  [<ffffffff988237ee>] int_signal+0x12/0x17
<5>[ 1137.406524] Code: 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 e8 b1 2a 4a 00 49 8b 06 48 89 45 d0 48 8b 5d d0 48 83 eb 58 48 8d 43 58 4c 39 f0 74 35 <4c> 8b 6b 50 4d 8d 65 18 4c 89 e7 e8 89 2a 4a 00 48 89 de 4c 89 
<1>[ 1137.406615] RIP  [<ffffffff9837eba0>] eventpoll_release_file+0x51/0xb1
<5>[ 1137.406626]  RSP <ffff880032ffbe80>
<5>[ 1137.406631] CR2: fffffffffffffff8
<4>[ 1137.406637] ---[ end trace 042b51bc83c59bb6 ]---
<0>[ 1137.410305] Kernel panic - not syncing: Fatal exception
<0>[ 1137.410329] Kernel Offset: 0x17200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
<0>[ 1137.410441] gsmi: Log Shutdown Reason 0x02
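The register values in the Lulu oops are consistent with a list walk that follows a NULL `next` pointer read from freed memory. The `Code:` bytes around the fault decode as `48 83 eb 58` (`sub rbx,0x58`) followed by the faulting `4c 8b 6b 50` (`mov r13,[rbx+0x50]`), which suggests that in this build `struct epitem` keeps `fllink` at offset 0x58 and `ep` at offset 0x50. Under that assumption, `container_of()` applied to a NULL `fllink.next` gives RBX = ffffffffffffffa8, and loading `epi->ep` from it faults at CR2 = fffffffffffffff8, matching the logged values. The Yuna oops (RBX = 0, faulting on `48 8b 43 08`, i.e. `mov rax,[rbx+0x8]`, CR2 = 0000000000000008) fits the same pattern of a small offset applied to a NULL list pointer. The sketch below only reproduces the arithmetic; the offsets are inferred from the disassembly and the helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical offsets inferred from the Lulu oops code bytes:
 * "sub rbx,0x58" (container_of over fllink) and "mov r13,[rbx+0x50]" (epi->ep).
 */
#define FLLINK_OFFSET 0x58u
#define EP_OFFSET     0x50u

/* container_of(): step back from the embedded list node to the enclosing struct.
 * uint64_t arithmetic wraps modulo 2^64, as the 64-bit registers do. */
static uint64_t container_of_addr(uint64_t node)
{
    return node - FLLINK_OFFSET;
}

/* Address dereferenced for epi->ep, given the value the loop read as fllink.next. */
static uint64_t ep_field_addr(uint64_t next)
{
    return container_of_addr(next) + EP_OFFSET;
}
```

With `next == 0`, `container_of_addr(0)` is 0xffffffffffffffa8 (the logged RBX) and `ep_field_addr(0)` is 0xfffffffffffffff8 (the logged CR2).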
 
Cc: groeck@chromium.org dtor@chromium.org
This is probably relevant:
https://lkml.org/lkml/2014/6/16/843

dtor, groeck, thoughts?

Comment 2 by groeck@google.com, May 6 2017

Might well be. I can look into it, but it will have to wait until Monday.

Cc: snanda@chromium.org
Owner: groeck@chromium.org
Thanks. Monday is fine.

Comment 5 by bugdroid1@chromium.org, May 10 2017

Labels: merge-merged-chromeos-3.14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/050f1cadd5572aa0ab45f18c4ee4081f545a86be

commit 050f1cadd5572aa0ab45f18c4ee4081f545a86be
Author: Konstantin Khlebnikov <koct9i@gmail.com>
Date: Wed May 10 03:36:37 2017

UPSTREAM: epoll: fix use-after-free in eventpoll_release_file

This fixes use-after-free of epi->fllink.next inside list loop macro.
This loop actually releases elements in the body.  The list is
rcu-protected but here we cannot hold rcu_read_lock because we need to
lock mutex inside.

The obvious solution is to use list_for_each_entry_safe().  RCU-ness
isn't essential because nobody can change this list under us, it's final
fput for this file.

The bug was introduced by ae10b2b4eb01 ("epoll: optimize EPOLL_CTL_DEL
using rcu")

BUG= chromium:719087 
TEST=Run cheets_StartAndroid.stress

Change-Id: Iffe854de38a46665fc031e4bedc533ce6645b9d6
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Stable <stable@vger.kernel.org> # 3.13+
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit ebe06187bf2a)
Reviewed-on: https://chromium-review.googlesource.com/498852
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/050f1cadd5572aa0ab45f18c4ee4081f545a86be/fs/eventpoll.c
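The core of this fix, that the loop body frees the element the loop macro is about to advance from, can be illustrated outside the kernel. Below is a minimal userspace sketch, assuming a hand-rolled intrusive list modeled on the kernel's `struct list_head`; `struct item` and `release_all()` are hypothetical stand-ins for `struct epitem` and the loop in `eventpoll_release_file()`. It shows the idea behind `list_for_each_entry_safe()`: cache the successor before the body frees the current node.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal doubly linked list in the style of the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Hypothetical stand-in for struct epitem; only the embedded link matters. */
struct item {
    int id;
    struct list_head link;
};

/*
 * Broken shape (do not do this): the increment reads pos->next after the
 * body has freed pos, the same kind of use-after-free as epi->fllink.next:
 *
 *   for (pos = head->next; pos != head; pos = pos->next) {
 *       list_del(pos);
 *       free(container_of(pos, struct item, link));
 *   }
 */

/* Safe shape: n holds the successor, so freeing pos never affects the walk. */
static int release_all(struct list_head *head)
{
    struct list_head *pos, *n;
    int released = 0;

    for (pos = head->next, n = pos->next; pos != head; pos = n, n = pos->next) {
        list_del(pos);
        free(container_of(pos, struct item, link));
        released++;
    }
    return released;
}
```

Caching `n` is what makes freeing `pos` inside the body safe. As the commit message notes, giving up RCU iteration here is acceptable only because nobody else can modify this list during the final `fput`.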

Comment 6 by bugdroid1@chromium.org, May 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/3e7034cf4cb02187504fe3f5c2d1cebfe62cd8f7

commit 3e7034cf4cb02187504fe3f5c2d1cebfe62cd8f7
Author: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Date: Wed May 10 03:36:38 2017

UPSTREAM: eventpoll: fix uninitialized variable in epoll_ctl

When calling epoll_ctl with operation EPOLL_CTL_DEL, structure epds is
not initialized but ep_take_care_of_epollwakeup reads its event field.
When this uninitialized field has EPOLLWAKEUP bit set, a capability check
is done for CAP_BLOCK_SUSPEND in ep_take_care_of_epollwakeup.  This
produces unexpected messages in the audit log, such as (on a system
running SELinux):

    type=AVC msg=audit(1408212798.866:410): avc:  denied
    { block_suspend } for  pid=7754 comm="dbus-daemon" capability=36
    scontext=unconfined_u:unconfined_r:unconfined_t
    tcontext=unconfined_u:unconfined_r:unconfined_t
    tclass=capability2 permissive=1

    type=SYSCALL msg=audit(1408212798.866:410): arch=c000003e syscall=233
    success=yes exit=0 a0=3 a1=2 a2=9 a3=7fffd4d66ec0 items=0 ppid=1
    pid=7754 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0
    fsgid=0 tty=(none) ses=3 comm="dbus-daemon"
    exe="/usr/bin/dbus-daemon"
    subj=unconfined_u:unconfined_r:unconfined_t key=(null)

("arch=c000003e syscall=233 a1=2" means "epoll_ctl(op=EPOLL_CTL_DEL)")

Remove use of epds in epoll_ctl when op == EPOLL_CTL_DEL.

BUG= chromium:719087 
TEST=Run cheets_StartAndroid.stress

Change-Id: I35c200d749dad57aeeed34965b622dcef514b0ea
Fixes: 4d7e30d98939 ("epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit c680e41b3a2e)
Reviewed-on: https://chromium-review.googlesource.com/498853
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>

[modify] https://crrev.com/3e7034cf4cb02187504fe3f5c2d1cebfe62cd8f7/fs/eventpoll.c
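The shape of this fix can be sketched in userspace: `epds` is only filled in for operations that carry an event, so any later read of it must be guarded by the same condition. This is a hypothetical model, not the kernel code: `needs_wakeup_check()` stands in for the path that reaches `ep_take_care_of_epollwakeup()`, `op_has_event()` mirrors the kernel's `ep_op_has_event()` helper, and the constants are local copies of the Linux values of `EPOLL_CTL_*` and `EPOLLWAKEUP`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the Linux values for EPOLL_CTL_* and EPOLLWAKEUP. */
enum { CTL_ADD = 1, CTL_DEL = 2, CTL_MOD = 3 };
#define WAKEUP_FLAG (1u << 29)

/* Hypothetical stand-in for struct epoll_event. */
struct event {
    uint32_t events;
    uint64_t data;
};

/* Mirrors ep_op_has_event(): only EPOLL_CTL_DEL carries no event. */
static bool op_has_event(int op) { return op != CTL_DEL; }

/*
 * Returns true if the EPOLLWAKEUP capability check would run for this call.
 * Before the fix, the flag test on epds ran for every op, including DEL,
 * where epds was never copied from userspace and held stack garbage.
 */
static bool needs_wakeup_check(int op, const struct event *user_ev)
{
    struct event epds;

    if (!op_has_event(op))
        return false;            /* DEL: epds is never read */
    if (user_ev == NULL)
        return false;            /* stand-in for a failed copy_from_user() */
    memcpy(&epds, user_ev, sizeof epds);
    return (epds.events & WAKEUP_FLAG) != 0;
}
```

With the guard in place, an `EPOLL_CTL_DEL` call can no longer trigger the spurious `block_suspend` capability check seen in the audit log, whatever happens to be on the stack.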

Comment 7 by groeck@chromium.org, May 10 2017

Status: Fixed (was: Started)

Comment 8 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)
