
Issue 858959 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Nov 9
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocking:
issue 850452




Shill memory leak caused kernel panic

Project Member Reported by vovoy@chromium.org, Jun 29 2018

Issue description

Comment 1 by vovoy@chromium.org, Jun 29 2018

Part of the kernel log in https://crash.corp.google.com/browse?q=ReportID%3D%27ed6613607f994561%27

<4>[1207620.133823] periodic_schedu invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=-1000
<6>[1207620.133853] periodic_schedu cpuset=/ mems_allowed=0
<4>[1207620.133895] CPU: 1 PID: 1518 Comm: periodic_schedu Not tainted 4.4.111-12522-g3bb47e2cd0a1-dirty #1
<4>[1207620.133912] Hardware name: GOOGLE Sumo, BIOS Google_Sumo.5216.382.5 08/08/2015
<4>[1207620.133929]  0000000000000286 b8edc008e798d594 ffff880070267b10 ffffffffaccfa02f
<4>[1207620.133968]  0000000000000000 ffffffffad617848 ffff880070267b70 ffffffffacb3f57a
<4>[1207620.134004]  ffffffffacb3f511 fffffffffffffc18 ffff880070267b40 ffffffffad1cf56a
<4>[1207620.134040] Call Trace:
<4>[1207620.134075]  [<ffffffffaccfa02f>] dump_stack+0x4d/0x63
<4>[1207620.134104]  [<ffffffffacb3f57a>] dump_header.isra.8+0x5f/0x1cf
<4>[1207620.134128]  [<ffffffffacb3f511>] ? find_lock_task_mm+0x9b/0xa5
<4>[1207620.134155]  [<ffffffffad1cf56a>] ? _raw_spin_unlock+0xe/0x20
<4>[1207620.134180]  [<ffffffffacb40092>] out_of_memory+0x227/0x2db
<4>[1207620.134206]  [<ffffffffacb7b19c>] __alloc_pages_nodemask+0x2a00/0x2a86
<4>[1207620.134236]  [<ffffffffaccb5cb4>] ? avc_has_perm+0xc0/0x135
<4>[1207620.134263]  [<ffffffffacb432d3>] alloc_kmem_pages_node+0x1b/0x1d
<4>[1207620.134286]  [<ffffffffacb1283a>] copy_process+0x18a/0x1c9d
<4>[1207620.134311]  [<ffffffffaccb65d8>] ? selinux_file_alloc_security+0x3a/0x5b
<4>[1207620.134337]  [<ffffffffad1cf56a>] ? _raw_spin_unlock+0xe/0x20
<4>[1207620.134361]  [<ffffffffacc5c020>] ? get_unused_fd_flags+0x2bf/0x2e5
<4>[1207620.134386]  [<ffffffffaca6ab80>] _do_fork+0x87/0x340
<4>[1207620.134411]  [<ffffffffacba28ff>] ? SYSC_pipe2+0x99/0xb9
<4>[1207620.134434]  [<ffffffffaca6aebf>] SyS_clone+0x19/0x1b
<4>[1207620.134458]  [<ffffffffad1cf948>] entry_SYSCALL_64_fastpath+0x1c/0x90
<4>[1207620.134518] Mem-Info:
<4>[1207620.134559] active_anon:290719 inactive_anon:97290 isolated_anon:0
<4>[1207620.134559]  active_file:17130 inactive_file:7240 isolated_file:0
<4>[1207620.134559]  unevictable:0 dirty:5 writeback:0 unstable:0
<4>[1207620.134559]  slab_reclaimable:11847 slab_unreclaimable:10368
<4>[1207620.134559]  mapped:2234 shmem:115 pagetables:4179 bounce:0
<4>[1207620.134559]  free:45814 free_pcp:1247 free_cma:0
<4>[1207620.134624] DMA free:15876kB min:540kB low:672kB high:808kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15996kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
<4>[1207620.134640] lowmem_reserve[]: 0 1869 3846 3846
<4>[1207620.134714] DMA32 free:89832kB min:65488kB low:81860kB high:98232kB active_anon:1126156kB inactive_anon:375424kB active_file:24256kB inactive_file:24032kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1996216kB managed:1917496kB mlocked:0kB dirty:4kB writeback:0kB mapped:3820kB shmem:264kB slab_reclaimable:25272kB slab_unreclaimable:15276kB kernel_stack:1312kB pagetables:7056kB unstable:0kB bounce:0kB free_pcp:2532kB local_pcp:624kB free_cma:0kB writeback_tmp:0kB pages_scanned:392 all_unreclaimable? yes
<4>[1207620.134730] lowmem_reserve[]: 0 0 1976 1976
<4>[1207620.134800] Normal free:77548kB min:69132kB low:86412kB high:103696kB active_anon:36720kB inactive_anon:13736kB active_file:44264kB inactive_file:4928kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2097152kB managed:2024244kB mlocked:0kB dirty:16kB writeback:0kB mapped:5116kB shmem:196kB slab_reclaimable:22116kB slab_unreclaimable:26196kB kernel_stack:1920kB pagetables:9660kB unstable:0kB bounce:0kB free_pcp:2456kB local_pcp:464kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
<4>[1207620.134816] lowmem_reserve[]: 0 0 0 0
<4>[1207620.134855] DMA: 1*4kB (U) 0*8kB 0*16kB 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (U) = 15876kB
<4>[1207620.135268] DMA32: 22450*4kB (UME) 4*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 89832kB
<4>[1207620.135372] Normal: 3879*4kB (UME) 7754*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 77548kB
<4>[1207620.135478] 25666 total pagecache pages
<4>[1207620.135494] 1181 pages in swap cache
<4>[1207620.135511] Swap cache stats: add 3622637, delete 3621456, find 7601088/8450885
<4>[1207620.135524] Free swap  = 0kB
<4>[1207620.135536] Total swap = 5797332kB
<4>[1207620.135549] 1027341 pages RAM
<4>[1207620.135561] 0 pages HighMem/MovableOnly
<4>[1207620.135574] 37929 pages reserved
<6>[1207620.135587] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
<6>[1207620.135625] [  166]     0   166     3260      269      12       3      157         -1000 udevd
<6>[1207620.135651] [  431]   202   431    74312      294      18       3      142         -1000 rsyslogd
<6>[1207620.135675] [  442]   201   442     2427      259      10       3       37         -1000 dbus-daemon
<6>[1207620.135699] [  700]   219   700     4500       87      12       3      132         -1000 wpa_supplicant
<6>[1207620.135724] [ 1120]   228  1120    26429      526      21       3      147         -1000 powerd
<6>[1207620.135747] [ 1144]     0  1144     9548      691      22       3      333         -1000 session_manager
<6>[1207620.135771] [ 1172]     0  1172     6582      133      15       3      171         -1000 debugd
<6>[1207620.135794] [ 1174]   207  1174    80901       39      25       4      131         -1000 tcsd
<6>[1207620.135817] [ 1178]   223  1178    46782        0      23       3      502         -1000 chapsd
<6>[1207620.135841] [ 1328]     0  1328  1830630   384642    3580      10  1438499         -1000 shill
<6>[1207620.135864] [ 1332]   202  1332     1625       12       8       3       37         -1000 logger
<6>[1207620.135887] [ 1450]     0  1450    83933       84      31       4      528         -1000 cryptohomed
<6>[1207620.135911] [ 1498]     0  1498     1307      225       7       3      202         -1000 periodic_schedu
<6>[1207620.135934] [ 1501]     0  1501   154058        0      34       4      426         -1000 esif_ufd
<6>[1207620.135957] [ 1518]     0  1518     1307      224       7       3      203         -1000 periodic_schedu
<6>[1207620.136008] [ 1537]   230  1537     7746       86      20       3      181         -1000 permission_brok
<6>[1207620.136032] [ 1542]   226  1542     9221        8      17       3      151         -1000 mtpd
<6>[1207620.136056] [ 1554]   241  1554    61011       34      21       3      295         -1000 ModemManager
<6>[1207620.136079] [ 1564]   238  1564     2902       42      10       3       33         -1000 avahi-daemon
<6>[1207620.136102] [ 1565]     0  1565     2169       54       9       3       46         -1000 minijail0
<6>[1207620.136126] [ 1578]   238  1578     2871        5       9       3       52         -1000 avahi-daemon
<6>[1207620.136149] [ 1589]   213  1589     8081      133      19       3      332         -1000 disks
<6>[1207620.136173] [ 1601]   600  1601    23713       56      14       3      402         -1000 cras
<6>[1207620.136196] [ 1613]     0  1613     2251       12       9       3       36         -1000 upstart-socket-
<6>[1207620.136219] [ 1639]     0  1639     2169       50       8       3       46         -1000 minijail0
<6>[1207620.136243] [ 1649]   232  1649     1752       59       9       3       62         -1000 conntrackd
<6>[1207620.136266] [ 1651]     0  1651    12386     1619      28       3     2644         -1000 update_engine
<6>[1207620.136290] [ 1789]   239  1789    26775      201      21       3      188         -1000 p2p-server
<6>[1207620.136314] [ 1941]     0  1941     1083        0       6       3       23         -1000 sh
<6>[1207620.136337] [ 1956]     0  1956     6600       83      16       3      134         -1000 anomaly_collect
<6>[1207620.136361] [ 1967]     0  1967     1307      214       7       3      211         -1000 periodic_schedu
<6>[1207620.136385] [ 2080]   234  2080     3313       81      10       3       64         -1000 tlsdated
<6>[1207620.136409] [ 2081]     0  2081     1064       37       7       3       47         -1000 logger
<6>[1207620.136432] [ 2083]     0  2083     2169       30       9       3       43         -1000 minijail0
<6>[1207620.136455] [ 2134]     0  2134     1307      224       7       3      196         -1000 periodic_schedu
<6>[1207620.136478] [ 2154]     0  2154     2790        5       9       3       90         -1000 tlsdated-setter
<6>[1207620.136502] [ 2166]   218  2166     7979        0      20       3      229         -1000 bluetoothd
<6>[1207620.136526] [18179]   239 18179   189283      114      33       4      115         -1000 p2p-http-server
<6>[1207620.136552] [18443]     0 18443     2378      261       8       3        0         -1000 sleep
<6>[1207620.136575] [18447]     0 18447     2378      238       8       3        0         -1000 sleep
<6>[1207620.136598] [18454]     0 18454     2378      265       9       3        0         -1000 sleep
<0>[1207620.136616] Kernel panic - not syncing: Out of memory and no killable processes...
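The process table above can be summarized with a short script. This is only a rough sketch, not something from the original report: it assumes 4 kB pages, uses three rows copied from the table, and the helper names (parse_oom_rows, footprint_kb) are made up for illustration.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages

# Three rows copied verbatim from the oom-killer process table above.
SAMPLE = """\
<6>[1207620.135841] [ 1328]     0  1328  1830630   384642    3580      10  1438499         -1000 shill
<6>[1207620.135864] [ 1332]   202  1332     1625       12       8       3       37         -1000 logger
<6>[1207620.136290] [ 1651]     0  1651    12386     1619      28       3     2644         -1000 update_engine
"""

# Columns: pid uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+(?P<nr_pmds>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<adj>-?\d+)\s+(?P<name>\S+)\s*$"
)


def parse_oom_rows(text):
    """Return one dict per process row; non-matching lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            rows.append(
                {k: (v if k == "name" else int(v)) for k, v in m.groupdict().items()}
            )
    return rows


def footprint_kb(row):
    """Resident plus swapped-out pages, converted to kB."""
    return (row["rss"] + row["swapents"]) * PAGE_KB


rows = parse_oom_rows(SAMPLE)
biggest = max(rows, key=footprint_kb)
# A process with oom_score_adj == -1000 is exempt from the oom-killer.
killable = [r["name"] for r in rows if r["adj"] > -1000]

print(biggest["name"], footprint_kb(biggest), "kB")
print("killable:", killable)
```

Under the 4 kB assumption, shill's swapents alone come to 5753996 kB of the 5797332 kB total swap reported above, and none of the sampled rows is killable, which is consistent with the "no killable processes" panic.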

upload_file_kcrash-ed6613607f994561.kcrash
149 KB Download
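The log also shows that the failing allocation was order=2 (four contiguous pages, 16 kB with 4 kB pages). The sketch below, which is an illustration and not part of the original report, parses the DMA32 and Normal free-list lines from the log and checks whether any free block is large enough. It deliberately ignores watermarks and lowmem_reserve, which the real allocator also considers; the function names are hypothetical.

```python
import re

# Free-list lines copied from the kernel log above (log prefix dropped).
BUDDY = """\
DMA32: 22450*4kB (UME) 4*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 89832kB
Normal: 3879*4kB (UME) 7754*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 77548kB
"""


def free_blocks(line):
    """Return (zone, {block_size_kb: free_count}) for one free-list line."""
    zone, rest = line.split(":", 1)
    counts = {int(size): int(n) for n, size in re.findall(r"(\d+)\*(\d+)kB", rest)}
    return zone.strip(), counts


def can_satisfy(counts, order, page_kb=4):
    """True if some free block is at least 2**order pages (simplified check)."""
    need_kb = page_kb << order
    return any(n > 0 for size, n in counts.items() if size >= need_kb)


zones = dict(free_blocks(line) for line in BUDDY.splitlines())
for zone, counts in zones.items():
    total_kb = sum(size * n for size, n in counts.items())
    print(zone, total_kb, "kB free, order-2 possible:", can_satisfy(counts, 2))
```

Both zones still have tens of MB free, but only in 4 kB and 8 kB blocks, so this simplified check suggests fragmentation (with swap already exhausted) contributed to the failed order-2 allocation in copy_process.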

Comment 2 by vovoy@chromium.org, Jun 29 2018

What's the UX impact of restarting shill? What's the range of reasonable shill memory usage (to set a proper memory limit)?

Comment 3 by vovoy@chromium.org, Jun 29 2018

Labels: -Pri-3 Pri-2
Hi Kirtika, are you the owner of the shill daemon? Could you help answer the questions in #2?
Moved to b/111666466.
Why? We have crbug; let's keep it in one place.
This issue has been pending for 3 weeks, so I was wondering whether I would get quicker response on buganizer~
"As shill can potentially leak memory, it should be killable by oom-killer to avoid the kernel panic."
--> Shill controls the DHCP client for all network interfaces, so killing shill means you lose network and the chromebook is unusable.

--> Impact of restarting shill: You'd lose network for ~10 seconds, in the worst case. Not too bad. 

I do not know about shill's memory limits or memory consumption.
Someone else was leading an effort earlier to find out where shill was consuming or leaking memory. Luigi might remember this. 
Cc: shills@google.com
Owner: snanda@chromium.org
Hi Sameer,

  Could you help to find an owner for this issue? Thanks!

> This issue has been pending for 3 weeks, so I was wondering whether I would get quicker response on buganizer~

No, please don't do that. Raise the priority of the bug and ping people instead. If it's a blocking issue, we have ReleaseBlock-xxx labels.
Would someone kindly clarify the "chromebook is unusable" part of comment #8?  If it's "unusable" for 10 seconds, we can cope with that if it's a rare event (better than panicking).

I would look at the "normal" size of shill, perhaps after walking around a bit in an area with many APs.  Then multiply it by 5.  See /etc/init/metrics_daemon.conf.

Of course the leak should be fixed too.
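The sizing heuristic above (measure the "normal" size, then multiply by 5) could be sketched roughly as follows. This is an assumption-laden illustration: it takes "size" to mean the VmSize field of /proc/<pid>/status, the sample text is invented rather than measured on a device, and the function names are hypothetical.

```python
def vm_size_kb(status_text):
    """Extract VmSize (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            # Line looks like "VmSize:     40960 kB".
            return int(line.split()[1])
    raise ValueError("no VmSize line found")


def suggested_limit_bytes(normal_kb, factor=5):
    """Apply the 'normal size times 5' heuristic and return bytes."""
    return normal_kb * 1024 * factor


# Invented sample, NOT a measurement from a real shill process.
SAMPLE_STATUS = "Name:\texample\nVmSize:\t   40960 kB\nVmRSS:\t    1234 kB\n"

normal_kb = vm_size_kb(SAMPLE_STATUS)
print(suggested_limit_bytes(normal_kb), "bytes")
```

On a device, the text would come from reading /proc/<pid>/status for the running daemon, ideally sampled several times after moving around an area with many APs, as suggested above.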


When shill is killed, upstart restarts it automatically (/etc/init/shill.conf specifies upstart respawn).
Cc: snanda@chromium.org
Owner: matthewmwang@chromium.org
Matthew, can you please look into both:
1. putting limits on shill's memory consumption.
2. root causing where the leak is happening.
CL for limiting shill size: https://chromium-review.googlesource.com/c/aosp/platform/system/connectivity/shill/+/1149148

I'm still investigating the root cause.

Comment 15 by bugdroid1@chromium.org, Jul 25

The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/9a68b3b2f016aca6438394b221d082b34e52b03d

commit 9a68b3b2f016aca6438394b221d082b34e52b03d
Author: Matthew Wang <matthewmwang@chromium.org>
Date: Wed Jul 25 07:15:00 2018

shill: limit daemon size and make it oom-killable

Did as semenzato@ recommended and walked around for a bit. Shill size
tends to be around 30MB-40MB. This change sets the VM size of shill
to be 200MB. It also makes it oom-killable by setting the score
adjustment to -100.

This is motivated by a shill memory leak.

BUG= chromium:858959 
TEST=deployed to device, checked /proc/<pid>/{limits,oom_score_adj}

Change-Id: I7da703bf727ed5fbb5209adf6b22b4e3f81dc63b
Reviewed-on: https://chromium-review.googlesource.com/1149148
Commit-Ready: Matthew Wang <matthewmwang@chromium.org>
Tested-by: Matthew Wang <matthewmwang@chromium.org>
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/9a68b3b2f016aca6438394b221d082b34e52b03d/init/shill.conf.in
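The TEST line in the commit message mentions checking /proc/<pid>/{limits,oom_score_adj}. A minimal sketch of such a check is shown below. It is not the author's actual test procedure: the helper names are hypothetical, the sample limits text is illustrative only, and the assumption is that the VM size limit shows up as the "Max address space" row.

```python
from pathlib import Path


def address_space_limit(limits_text):
    """Return (soft, hard) strings from the 'Max address space' row."""
    label = "Max address space"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max address space' row found")


def check_process(pid):
    """Read oom_score_adj and the address-space limit for a live pid (Linux only)."""
    proc = Path("/proc") / str(pid)
    adj = int((proc / "oom_score_adj").read_text())
    soft, hard = address_space_limit((proc / "limits").read_text())
    return adj, soft, hard


# Illustrative sample in the layout of /proc/<pid>/limits; values are made up.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max address space         200000000            unlimited            bytes
"""

print(address_space_limit(SAMPLE_LIMITS))
```

On a device, check_process(<shill pid>) would be expected to return an oom_score_adj of -100 per the commit message; the exact limit values depend on what init/shill.conf.in sets.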

Thanks for the fix
Status: Assigned (was: Available)
Status: Verified (was: Assigned)
