Shill memory leak caused kernel panic
Issue description

In the following 10 crash reports, shill consumed too much memory and caused a kernel panic ("Out of memory and no killable processes"):

https://crash.corp.google.com/browse?q=ReportID%3D%27ed6613607f994561%27
https://crash.corp.google.com/browse?q=ReportID%3D%27d7f2325310501f33%27
https://crash.corp.google.com/browse?q=ReportID%3D%274bdbaf6d947f52b0%27
https://crash.corp.google.com/browse?q=ReportID%3D%2783a9d5acf7959ebb%27
https://crash.corp.google.com/browse?q=ReportID%3D%27564b97e94d0ddc6b%27
https://crash.corp.google.com/browse?q=ReportID%3D%2781b9eb29c6dbda5c%27
https://crash.corp.google.com/browse?q=ReportID%3D%27150698cfb91a609e%27
https://crash.corp.google.com/browse?q=ReportID%3D%275c713be89a9c7dce%27
https://crash.corp.google.com/browse?q=ReportID%3D%27cbc51fd41094d36b%27
https://crash.corp.google.com/browse?q=ReportID%3D%27234979e52f999bd9%27

As shill can potentially leak memory, it should be killable by the oom-killer to avoid the kernel panic. Shill should also set a proper memory limit.

The following example CL sets a proper oom_score_adj and memory limit for metrics_daemon:
https://chromium-review.googlesource.com/c/chromiumos/platform2/+/865538

Doc with details on adjusting oom_score_adj:
https://docs.google.com/document/d/1NIul6tcKDfiC5J37q8_7hw1MrTz6mCqdvlUOzkwjKuc/edit
,
Jun 29 2018
What's the UX impact of restarting shill? What's the range of reasonable shill memory usage (to set a proper memory limit)?
,
Jun 29 2018
,
Jul 16
Hi Kirtika, are you the owner of the shill daemon? Could you help answer the questions in #2?
,
Jul 20
move to b/111666466
,
Jul 20
why? we have crbug, let's keep it in one place.
,
Jul 21
This issue has been pending for 3 weeks, so I was wondering whether I would get quicker response on buganizer~
,
Jul 21
> As shill can potentially leak memory, it should be killable by oom-killer to avoid the kernel panic.

Shill controls the DHCP client for all network interfaces, so killing shill means you don't have network and the chromebook is unusable.

Impact of restarting shill: you'd lose network for ~10 seconds in the worst case. Not too bad.

I do not know about shill's memory limits or memory consumption. Someone else was leading an effort earlier to find out where shill was consuming or leaking memory. Luigi might remember this.
,
Jul 21
Hi Sameer, Could you help to find an owner for this issue? Thanks!
,
Jul 21
> This issue has been pending for 3 weeks, so I was wondering whether I would get quicker response on buganizer~

no, please don't do that. raise the Priority of the bug and ping people instead. if it's a blocking issue, we have ReleaseBlock-xxx labels.
,
Jul 21
Would someone kindly clarify the "chromebook is unusable" part of comment #8? If it's "unusable" for 10 seconds, we can cope with that if it's a rare event (better than panicking). I would look at the "normal" size of shill, perhaps after walking around a bit in an area with many APs. Then multiply it by 5. See /etc/init/metrics_daemon.conf. Of course the leak should be fixed too.
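The sizing heuristic above (measure shill's normal size, then multiply by 5) can be sketched in Python. This is only an illustration, not a script anyone on this bug used: the helper names `parse_vm_kb`, `suggested_limit_bytes`, and `read_status` are hypothetical, and the sketch assumes the Linux `/proc/<pid>/status` format, where `VmSize` is reported in kB.

```python
import re


def parse_vm_kb(status_text, field="VmSize"):
    """Return the value (in kB) of a Vm* field from /proc/<pid>/status text."""
    pattern = rf"^{re.escape(field)}:\s+(\d+)\s+kB\s*$"
    match = re.search(pattern, status_text, re.MULTILINE)
    if match is None:
        raise ValueError(f"{field} not found in status text")
    return int(match.group(1))


def suggested_limit_bytes(observed_kb, factor=5):
    """Apply the 'normal size times 5' rule of thumb; the result is in bytes."""
    return observed_kb * 1024 * factor


def read_status(pid):
    """Read /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()
```

On a device, something like `suggested_limit_bytes(parse_vm_kb(read_status(pid)))` would give a candidate limit for the shill process. For example, an observed VmSize of 40960 kB (40 MB) with a factor of 5 gives 209715200 bytes (200 MB).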
,
Jul 21
When shill is killed, it will be restarted automatically by upstart (/etc/init/shill.conf specifies upstart respawn).
,
Jul 23
Matthew, can you please look into both: 1. putting limits on shill's memory consumption. 2. root causing where the leak is happening.
,
Jul 24
CL for limiting shill size: https://chromium-review.googlesource.com/c/aosp/platform/system/connectivity/shill/+/1149148 I'm still investigating the root cause.
,
Jul 25
The following revision refers to this bug:
https://chromium.googlesource.com/aosp/platform/system/connectivity/shill/+/9a68b3b2f016aca6438394b221d082b34e52b03d

commit 9a68b3b2f016aca6438394b221d082b34e52b03d
Author: Matthew Wang <matthewmwang@chromium.org>
Date: Wed Jul 25 07:15:00 2018

shill: limit daemon size and make it oom-killable

Did as semenzato@ recommended and walked around for a bit. Shill size tends to be around 30MB-40MB. This change sets the VM size of shill to be 200MB. It also makes it oom-killable by setting the score adjustment to -100.

This is motivated by a shill memory leak.

BUG= chromium:858959
TEST=deployed to device, checked /proc/<pid>/{limits,oom_score_adj}

Change-Id: I7da703bf727ed5fbb5209adf6b22b4e3f81dc63b
Reviewed-on: https://chromium-review.googlesource.com/1149148
Commit-Ready: Matthew Wang <matthewmwang@chromium.org>
Tested-by: Matthew Wang <matthewmwang@chromium.org>
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/9a68b3b2f016aca6438394b221d082b34e52b03d/init/shill.conf.in
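The TEST= step in the commit above (checking `/proc/<pid>/{limits,oom_score_adj}`) could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, it relies on the Linux `/proc/<pid>/limits` table layout with a "Max address space" row, and the thread does not show the exact byte value written to shill.conf.in, so any numbers fed to it are illustrative.

```python
def parse_address_space_limit(limits_text):
    """Return (soft, hard) for 'Max address space' from /proc/<pid>/limits text.

    Each value is an int in bytes, or None when the kernel reports 'unlimited'.
    """
    label = "Max address space"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("'Max address space' row not found")


def check_process(pid):
    """Read the address-space limit and oom_score_adj of a live process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        limits = parse_address_space_limit(f.read())
    with open(f"/proc/{pid}/oom_score_adj") as f:
        oom_score_adj = int(f.read().strip())
    return limits, oom_score_adj
```

Calling `check_process(pid)` with the PID of the running shill daemon returns the limits pair and the score adjustment, which can then be compared with the values the change intends (a roughly 200MB soft limit and -100).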
,
Jul 25
Thanks for the fix
,
Aug 2
,
Nov 9
Comment 1 by vovoy@chromium.org, Jun 29 2018