
Issue 875467


Issue metadata

Status: Fixed
Owner:
Closed: Sep 11
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




memd is crashing with SIGSYS for tgkill

Project Member Reported by lhchavez@chromium.org, Aug 17

Issue description

Chrome OS: caroline R70-10974.0.0 (x86_64)

https://storage.cloud.google.com/chromeos-autotest-results/228219784-ssola/chromeos6-row2-rack21-host15/sysinfo/var/spool/crash/memd.20180817.124053.3419.dmp.txt

Operating system: Linux
                  0.0.0 Linux 3.18.0-18188-gb719ef63af35 #1 SMP PREEMPT Thu Aug 16 04:24:24 PDT 2018 x86_64
CPU: amd64
     family 6 model 78 stepping 3
     1 CPU

GPU: UNKNOWN

Crash reason:  SIGSYS
Crash address: 0x0
Process uptime: not available

Thread 0 (crashed)
 0  libc-2.23.so!raise [raise.c : 54 + 0x10]
    rax = 0x00000000000000ea   rdx = 0x0000000000000006
    rcx = 0x00005a9a8871a260   rbx = 0xffffffffbcf96af4
    rsi = 0x0000000000000002   rdi = 0x0000000000000002
    rbp = 0x00007fff71c0dfb0   rsp = 0x00007fff71c0e228
     r8 = 0x0000000000000000    r9 = 0x00005a9a8871c1e8
    r10 = 0x0000000000000008   r11 = 0x0000000000000202
    r12 = 0x00005a9a8a29bf20   r13 = 0x0000000000000005
    r14 = 0x00005a9a8a29b100   r15 = 0x00005a9a8a29b100
    rip = 0x000070b01a3ebdd2
    Found by: given as instruction pointer in context
 1  memd + 0x12618
    rbx = 0xffffffffbcf96af4   rbp = 0x00007fff71c0dfb0
    rsp = 0x00007fff71c0e230   r12 = 0x00005a9a8a29bf20
    r13 = 0x0000000000000005   r14 = 0x00005a9a8a29b100
    r15 = 0x00005a9a8a29b100   rip = 0x00005a9a886ca618
    Found by: call frame info

Loaded modules:
0x5a9a886b8000 - 0x5a9a88717fff  memd  ???  (main)  (WARNING: No symbols, memd, 8FC81524525B375520A3D850553BF6B50)
0x70b01a3b8000 - 0x70b01a558fff  libc-2.23.so  ???
0x70b01a763000 - 0x70b01a778fff  libgcc_s.so.1  ???
0x70b01a97a000 - 0x70b01a990fff  libpthread-2.23.so  ???
0x70b01ab97000 - 0x70b01ab9cfff  librt-2.23.so  ???
0x70b01ad9f000 - 0x70b01ada0fff  libdl-2.23.so  ???
0x70b01afa3000 - 0x70b01afc7fff  ld-2.23.so  ???
0x70b01b148000 - 0x70b01b14bfff  libattr.so.1.1.0  ???
0x70b01b14e000 - 0x70b01b151fff  libcap.so.2.24  ???
0x70b01b156000 - 0x70b01b199fff  libdbus-1.so.3.14.8  ???
0x70b01b1ab000 - 0x70b01b1bdfff  libminijailpreload.so  ???
0x7fff71d2d000 - 0x7fff71d2efff  linux-gate.so  ???

The x86_64 syscall convention puts the syscall number in rax. Here rax = 0xea = 234 (__NR_tgkill), which is consistent with the call coming from raise(3).
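
As a rough illustration of the register reading above, the sketch below decodes the values from frame 0 of the dump. The helper names and the tiny lookup tables are made up for this example; only the syscall and signal numbers are taken from the Linux x86_64 ABI.

```python
# Small lookup tables; only the entries needed for this dump are included.
X86_64_SYSCALLS = {62: "kill", 200: "tkill", 234: "tgkill"}
LINUX_SIGNALS = {6: "SIGABRT", 31: "SIGSYS"}


def syscall_from_rax(rax_hex):
    """Translate the rax value from a register dump into (number, name)."""
    number = int(rax_hex, 16)
    return number, X86_64_SYSCALLS.get(number, "unknown")


def decode_tgkill_args(rdi_hex, rsi_hex, rdx_hex):
    """tgkill(tgid, tid, sig) takes its arguments in rdi, rsi and rdx."""
    sig = int(rdx_hex, 16)
    return {
        "tgid": int(rdi_hex, 16),
        "tid": int(rsi_hex, 16),
        "sig": LINUX_SIGNALS.get(sig, str(sig)),
    }


# Register values copied from frame 0 of the dump above.
print(syscall_from_rax("0x00000000000000ea"))    # (234, 'tgkill')
print(decode_tgkill_args("0x2", "0x2", "0x6"))   # sig decodes to SIGABRT
```

If this reading is right, the blocked call was tgkill(2, 2, SIGABRT), so the process was apparently trying to abort when seccomp delivered SIGSYS instead.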
 
Cc: -derat@chromium.org
(I don't know anything about memd.)
Cc: samanthamiller@chromium.org vapier@chromium.org sonnyrao@chromium.org
Labels: -Pri-3 Pri-1
Status: Assigned (was: Untriaged)
I'm seeing memd crashes on grunt locally (R70-10987.0.0).

I also see them during HWTest (R70-10988.0.0):

https://stainless.corp.google.com/browse/chromeos-autotest-results/229803136-chromeos-test/

The messages log shows there was an SELinux violation:

2018-08-21T21:50:11.603637+00:00 WARNING memd[2392]: memd started
2018-08-21T21:50:11.984620+00:00 WARNING kernel: [   32.775848] kauditd_printk_skb: 274 callbacks suppressed
2018-08-21T21:50:11.984640+00:00 NOTICE kernel: [   32.775850] audit: type=1326 audit(1534888211.983:335): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=u:r:minijailed:s0 pid=2392 comm="memd" exe="/usr/bin/memd" sig=31 arch=c000003e syscall=234 compat=0 ip=0x7906d9063dd2 code=0x0
2018-08-21T21:50:12.012977+00:00 INFO crash_reporter[2393]: libminijail[2393]: mount /dev/log -> /dev/log type ''
2018-08-21T21:50:12.013410+00:00 INFO crash_reporter[2394]: libminijail[2394]: mount /dev/log -> /dev/log type ''
2018-08-21T21:50:12.055305+00:00 WARNING crash_reporter[2393]: Could not load the device policy file.
2018-08-21T21:50:12.055856+00:00 WARNING crash_reporter[2393]: [user] Received crash notification for memd[2392] sig 31, user 0 group 0 (developer build - not testing - always dumping)
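
For readers unfamiliar with the audit record in the log above, here is a minimal sketch that pulls out its key=value fields. The function name and regular expression are assumptions for illustration; the numeric meanings in the comments (type 1326 is the seccomp audit record type, signal 31 is SIGSYS, arch c000003e is 64-bit x86) come from the Linux headers.

```python
import re

# The kernel audit record quoted in the log above.
AUDIT_LINE = (
    'audit: type=1326 audit(1534888211.983:335): auid=4294967295 uid=0 gid=0 '
    'ses=4294967295 subj=u:r:minijailed:s0 pid=2392 comm="memd" '
    'exe="/usr/bin/memd" sig=31 arch=c000003e syscall=234 compat=0 '
    'ip=0x7906d9063dd2 code=0x0'
)


def parse_audit_fields(line):
    """Return the key=value pairs of an audit record as a dict of strings."""
    pairs = re.findall(r'(\w+)=("[^"]*"|\S+)', line)
    return {key: value.strip('"') for key, value in pairs}


fields = parse_audit_fields(AUDIT_LINE)
print(fields["type"])     # 1326: seccomp audit record
print(fields["sig"])      # 31: SIGSYS on x86_64 Linux
print(fields["arch"])     # c000003e: AUDIT_ARCH_X86_64
print(fields["syscall"])  # 234: tgkill on x86_64
```

The instruction pointer in the record (ip=0x7906d9063dd2) matches rip in the minidump below it, so both describe the same blocked call.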

selinux_violation.20180821.145012.0.meta:

sig=2c503b66-selinux----memd-
comm=memd
exec_name=selinux-violation
ver=10988.0.0
payload=/var/spool/crash/selinux_violation.20180821.145012.0.log
payload_size=187
done=1

minidump_stackwalk  229803136-chromeos-test%2Fchromeos2-row3-rack10-host5%2Fcrashinfo.chromeos2-row3-rack10-host5%2Fmemd.20180821.145012.2392.dmp

Thread 0 (crashed)
 0  libc-2.23.so + 0x33dd2
    rax = 0x00000000000000ea   rdx = 0x0000000000000006
    rcx = 0x00007906d9063dd2   rbx = 0x0000000000000001
    rsi = 0x0000000000000002   rdi = 0x0000000000000002
    rbp = 0x00007ffeda19c600   rsp = 0x00007ffeda19c4d8
     r8 = 0x0000000000000000    r9 = 0x000055d31b1c0148
    r10 = 0x0000000000000008   r11 = 0x0000000000000202
    r12 = 0x0000000000000001   r13 = 0x000055d31b1b37f0
    r14 = 0x0000000000000000   r15 = 0x00007ffeda19c658
    rip = 0x00007906d9063dd2
    Found by: given as instruction pointer in context


This has probably been happening since R70-10950.0.0, when memd was re-enabled by:

https://chromium-review.googlesource.com/1145740
Thank you DJ, will look now.

syscall 234 is tgkill (thread group kill).  The program is probably exiting with some run-time error.  I will look in the test results hoping that it logged something before exiting.  (Memd logs in /var/log/messages.)
The stainless page (https://stainless.corp.google.com/browse/chromeos-autotest-results/229803136-chromeos-test/) has links to the logs, but I get Error 403 for all the links I've tried.  Does anybody know what the deal is?
The pantheon page seems to work:

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/229803136-chromeos-test/chromeos2-row3-rack10-host5/sysinfo?angularJsUrl=%2Fstorage%2Fbrowser%2Fchromeos-autotest-results%2F229803136-chromeos-test&authuser=2

However, sysinfo/var/log/messages from that bucket ends at 21:45 and the crash seems to have happened at 21:50 according to #2.

Where did you get the messages file?  I will try to get around the authentication of stainless, but aren't those the same files?


OK, the correct messages file is from the crashinfo bucket:

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/229803136-chromeos-test/chromeos2-row3-rack10-host5/crashinfo.chromeos2-row3-rack10-host5/

but there isn't much info about the crash.  Maybe tgkill() would trigger a signal handler (if it could run)?  I will send a CL to allow tgkill().
(Trivial) CL is here:

https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1184238

I would be inclined to push this and then see if we get more results when it happens again.

DJ, did you say this is repeatable on grunt?  Only on grunt?
Next I will try (1) repeating on grunt and (2) getting a stack trace from the minidump.  Probably tomorrow though.

We should still push the CL in #9.
Found the problem(s).  CL is at https://crrev.com/c/1187190.  Does anybody want to review trivial Rust code changes?  My reviewer for this (sonnyrao) is on vacation.
Project Member

Comment 12 by bugdroid1@chromium.org, Aug 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/f6fbccb5b5541083fab5fd30e96629d47eab2fd4

commit f6fbccb5b5541083fab5fd30e96629d47eab2fd4
Author: Luigi Semenzato <semenzato@chromium.org>
Date: Thu Aug 23 20:21:00 2018

metrics: memd: seccomp: allow tgkill()

We see a denial to call tgkill() in a test log.
This is probably called from an assertion failure.
Allowing this call probably allows better debugging
information to be collected.

BUG= chromium:875467 
TEST=none

Change-Id: I43e40161fa4c60ee8c2c214e50a59359604d4562
Reviewed-on: https://chromium-review.googlesource.com/1184238
Commit-Ready: Luigi Semenzato <semenzato@chromium.org>
Tested-by: Luis Hector Chavez <lhchavez@chromium.org>
Tested-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org>
Reviewed-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/f6fbccb5b5541083fab5fd30e96629d47eab2fd4/metrics/memd/init/memd-seccomp-arm64.policy
[modify] https://crrev.com/f6fbccb5b5541083fab5fd30e96629d47eab2fd4/metrics/memd/init/memd-seccomp-amd64.policy
[modify] https://crrev.com/f6fbccb5b5541083fab5fd30e96629d47eab2fd4/metrics/memd/init/memd-seccomp-arm.policy
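
To make the change above concrete: minijail seccomp policy files list one syscall per line as `name: rule`, where a rule of `1` allows the call unconditionally. The sketch below checks a policy text for such an entry. The policy excerpt is hypothetical and is not the real memd policy, the helper name is made up, and stripping inline `#` comments is an assumption.

```python
def allowed_syscalls(policy_text):
    """Return the syscall names whose rule is the unconditional allow `1`."""
    allowed = set()
    for raw in policy_text.splitlines():
        # Drop comments (assumed to start with '#') and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        name, rule = (part.strip() for part in line.split(":", 1))
        if rule == "1":
            allowed.add(name)
    return allowed


# Hypothetical excerpt for illustration only.
SAMPLE_POLICY = """
# memd seccomp policy (illustrative excerpt)
read: 1
write: 1
tgkill: 1
"""

print("tgkill" in allowed_syscalls(SAMPLE_POLICY))  # True
```

A check like this could be run against each of the three per-architecture policy files to confirm they stay in sync.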

Project Member

Comment 13 by bugdroid1@chromium.org, Aug 31

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/715ad39efa5401e58db403c026e8c16ed7a0be95

commit 715ad39efa5401e58db403c026e8c16ed7a0be95
Author: Luigi Semenzato <semenzato@chromium.org>
Date: Fri Aug 31 18:20:15 2018

metrics: memd: initialize tracing and don't collect NR_PAGES_SCANNED

This fixes two bugs revealed by changes in behavior on 4.14 kernel.
Firstly, the code was not initializing tracing_enabled and tracing_on,
which it should have, and it was just luck that it worked on other
platforms.  Secondly, 4.14 no longer outputs nr_pages_scanned in
/proc/vmstat, which is just as well because apparently it was getting
reset in a way that made it not useful.

BUG= chromium:875467 
TEST=ran on device

Change-Id: Ie4f4d376822ab2d6e3a2e0139eed800b4fc1b5ea
Reviewed-on: https://chromium-review.googlesource.com/1187190
Commit-Ready: Luigi Semenzato <semenzato@chromium.org>
Tested-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org>

[modify] https://crrev.com/715ad39efa5401e58db403c026e8c16ed7a0be95/metrics/memd/src/main.rs
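
The second bug in the commit above comes from assuming a counter is always present in /proc/vmstat. The actual fix is in memd's Rust source and is not reproduced here. The Python sketch below only illustrates the general idea of tolerating a missing counter; the function names and sample values are invented.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def select_counters(counters, wanted):
    """Keep only the wanted counters that this kernel actually reports."""
    return {name: counters[name] for name in wanted if name in counters}


# Invented sample without nr_pages_scanned, as described for 4.14 kernels.
SAMPLE_VMSTAT = "nr_free_pages 12345\npgmajfault 7\n"

wanted = ["nr_free_pages", "nr_pages_scanned", "pgmajfault"]
print(select_counters(parse_vmstat(SAMPLE_VMSTAT), wanted))
```

Skipping absent names instead of failing on them keeps the reader working across kernels that report different sets of counters.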

Status: Fixed (was: Assigned)
