New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 861994 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Last visit 15 days ago
Closed: Jul 12
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

minijail_enter crash in various chromebox daemons (mimo-monitor, cecservice, atrusd, huddly-monitor)

Project Member Reported by lhchavez@chromium.org, Jul 10

Issue description

https://crash.corp.google.com/browse?q=product_name%3D%27ChromeOS%27+AND+product.Version+LIKE+%27108%25.0.0%27+AND+EXISTS+%28SELECT+1+FROM+UNNEST%28CrashedStackTrace.StackFrame%29+WHERE+FunctionName%3D%27minijail_enter%27%29+AND+NOT+EXISTS+%28SELECT+1+FROM+UNNEST%28productdata%29+WHERE+Key%3D%27exec_name%27+AND+Value+IN+%28%27apk-cache-cleaner%27%2C+%27memd%27%29%29#-property-selector,-productname:1000,stablesignature:50,hardwareclass:50,magicsignaturesorted:50

Sample crash:

https://crash.corp.google.com/e497235e7fa5d3f9
0x00007fe18e155dd2	(libc-2.23.so -raise.c:54 )	raise
0x00007fe18e157bf5	(libc-2.23.so -abort.c:89 )	abort
0x00007fe18e7f2501	(libminijailpreload.so -libminijail.c )	minijail_enter
0x00007fe18e7efb42	(libminijailpreload.so -libminijailpreload.c:81 )	fake_main
0x00007fe18e142735	(libc-2.23.so -libc-start.c:289 )	__libc_start_main
0x00007fe18e80d188	(mimo-monitor + 0x00002188 )	_start
0x00007ffcdbf32037		
0x00007fe18e80d15f	(mimo-monitor + 0x0000215f )	_init

This is only happening on zako, panther (which are using v3.8 kernels). By inspecting some of the service_failure logs[1], I was able to determine that the crash is caused due to the fact that libminijail is trying to enter a new cgroup namespace, which has not been backported to v3.8 kernels (only up to 3.14 AFAIK):

2018-07-08T10:14:51.689106-05:00 ERR cecservice[1311]: libminijail[2]: unshare(CLONE_NEWCGROUP) failed: Invalid argument

The solution is to conditionally add the -N flag to minijail, only if the kernel version is >= 3.14.

1: https://crash.corp.google.com/browse?q=product_name%3D%27ChromeOS%27+AND+EXISTS+%28SELECT+1+FROM+UNNEST%28productdata%29+WHERE+Key%3D%27exec_name%27+AND+Value%3D%27service-failure%27%29+AND+product.Version+LIKE+%27108%25.0.0%27+AND+stable_signature+LIKE+%27%25cecservice%25%27&stbtiq=&reportid=&index=0#2
2: 
 
Cc: vapier@chromium.org sque@chromium.org benchan@chromium.org
Owner: egemih@chromium.org
Assigning to egemih@ as the owner of mimo-monitor and huddly-monitor. This is the top crasher of M69.

One small inconvenience is that the easiest way of conditionally deciding whether or not to add the -N flag needs to be done in the exec/script stanza (since Upstart does not allow other stanzas to modify env variables: http://upstart.ubuntu.com/cookbook/#environment-variables ). that is at odds with the "expect fork" stanza in all the .conf scripts.

one alternative is to split the .conf in two such that one invokes the other with the correct set of flags, but it seems kind of clunky.

another alternative is to make minijail0 not abort() when asking to enter an invalid cgroup namespace, but no other namespace has this property of being able to soft-fail and I'd rather not open that can of worms.
Oh almost forgot, another alternative (probably the cleanest) is to make all these services raise the SIGSTOP signal and change the stanza from "expect fork" to "expect stop": http://upstart.ubuntu.com/cookbook/#expect-stop
for now, i would just drop the -N flag from any daemon that expects to be run on 3.8/3.10 devices.  it's what we've been doing elsewhere.

we opted to not change minijail to silently ignore -N on older systems.

does SIGSTOP allow for any forking of children ?  if not, that won't be as useful :(.
Cc: egemih@chromium.org
Owner: lhchavez@chromium.org
Status: Started (was: Untriaged)
Ugh, I misinterpreted the documentation to mean that 'expect stop' would both tell upstart what the PID to be tracked is AND the moment in which it has finished initializing, but that's not the case[1]. Dropping the -N flag seems straightforward, so https://chromium-review.googlesource.com/q/topic:%22fizz_cgroup_namespaces%22+(status:open%20OR%20status:merged) should fix all the crashes.

Mostly for my own reference, tryjob posted here: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8941375536239901072

1: https://bugs.launchpad.net/upstart-cookbook/+bug/1394744
Project Member

Comment 5 by bugdroid1@chromium.org, Jul 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/cfm-device-monitor/+/8e6c28caa1ab93e3efea4fe81ae03c29cd37b870

commit 8e6c28caa1ab93e3efea4fe81ae03c29cd37b870
Author: Luis Hector Chavez <lhchavez@google.com>
Date: Wed Jul 11 01:47:29 2018

Stop requesting entering a cgroup namespace

This change avoids passing the -N flag to minijail0 since both
huddly-monitor and mimo-monitor run on kernels that don't support cgroup
namespaces. This avoids a crash on such older devices.

BUG= chromium:861994 
TEST=fizz tryjob

Change-Id: If99ccdf2cd14b4effa8a4cf06ea3db387ac11468
Reviewed-on: https://chromium-review.googlesource.com/1131535
Commit-Ready: Luis Hector Chavez <lhchavez@chromium.org>
Tested-by: Luis Hector Chavez <lhchavez@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/8e6c28caa1ab93e3efea4fe81ae03c29cd37b870/init/huddly-monitor.conf
[modify] https://crrev.com/8e6c28caa1ab93e3efea4fe81ae03c29cd37b870/init/mimo-monitor.conf

Project Member

Comment 6 by bugdroid1@chromium.org, Jul 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/8c0535ca8583848f2be8a3cef554f9eba728ea1c

commit 8c0535ca8583848f2be8a3cef554f9eba728ea1c
Author: Luis Hector Chavez <lhchavez@google.com>
Date: Wed Jul 11 19:13:10 2018

cecservice: Stop requesting entering a cgroup namespace

This change avoids passing the -N flag to minijail0 since cecservice run
on kernels that don't support cgroup namespaces. This avoids a crash on
such older devices.

BUG= chromium:861994 
TEST=fizz tryjob

Change-Id: I669dc3b0186e8b8c3e63ea22745d089f1e98f84b
Reviewed-on: https://chromium-review.googlesource.com/1131540
Commit-Ready: Luis Hector Chavez <lhchavez@chromium.org>
Tested-by: Luis Hector Chavez <lhchavez@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/8c0535ca8583848f2be8a3cef554f9eba728ea1c/cecservice/share/cecservice.conf

Project Member

Comment 7 by bugdroid1@chromium.org, Jul 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/atrusctl/+/f2f9d8df9f307aea2f0c269c81ab7f104b8a4a20

commit f2f9d8df9f307aea2f0c269c81ab7f104b8a4a20
Author: Luis Hector Chavez <lhchavez@google.com>
Date: Thu Jul 12 18:35:40 2018

Clean up the minijail0 invocation

This change uses /var/empty as the chroot to match the way we invoke the
rest of the services. It also stops creating/deleting the chroot
directory.

BUG=b:65450844
BUG= chromium:849455 
BUG= chromium:861994 
TEST=fizz tryjob

Change-Id: I6a76cc92d93bdb8f7edf2990cb0cf219ac20f4ff
Reviewed-on: https://chromium-review.googlesource.com/1087681
Commit-Ready: Luis Hector Chavez <lhchavez@chromium.org>
Tested-by: Luis Hector Chavez <lhchavez@chromium.org>
Reviewed-by: Emil Lundmark <lndmrk@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/f2f9d8df9f307aea2f0c269c81ab7f104b8a4a20/init/atrusd.conf

Status: Fixed (was: Started)

Sign in to add a comment