memd is crashing with SIGSYS for tgkill
Issue description

Chrome OS: caroline R70-10974.0.0 (x86_64)
https://storage.cloud.google.com/chromeos-autotest-results/228219784-ssola/chromeos6-row2-rack21-host15/sysinfo/var/spool/crash/memd.20180817.124053.3419.dmp.txt

Operating system: Linux 0.0.0 Linux 3.18.0-18188-gb719ef63af35 #1 SMP PREEMPT Thu Aug 16 04:24:24 PDT 2018 x86_64
CPU: amd64 family 6 model 78 stepping 3 1 CPU
GPU: UNKNOWN
Crash reason: SIGSYS
Crash address: 0x0
Process uptime: not available

Thread 0 (crashed)
 0 libc-2.23.so!raise [raise.c : 54 + 0x10]
   rax = 0x00000000000000ea rdx = 0x0000000000000006 rcx = 0x00005a9a8871a260 rbx = 0xffffffffbcf96af4
   rsi = 0x0000000000000002 rdi = 0x0000000000000002 rbp = 0x00007fff71c0dfb0 rsp = 0x00007fff71c0e228
   r8 = 0x0000000000000000 r9 = 0x00005a9a8871c1e8 r10 = 0x0000000000000008 r11 = 0x0000000000000202
   r12 = 0x00005a9a8a29bf20 r13 = 0x0000000000000005 r14 = 0x00005a9a8a29b100 r15 = 0x00005a9a8a29b100
   rip = 0x000070b01a3ebdd2
   Found by: given as instruction pointer in context
 1 memd + 0x12618
   rbx = 0xffffffffbcf96af4 rbp = 0x00007fff71c0dfb0 rsp = 0x00007fff71c0e230 r12 = 0x00005a9a8a29bf20
   r13 = 0x0000000000000005 r14 = 0x00005a9a8a29b100 r15 = 0x00005a9a8a29b100 rip = 0x00005a9a886ca618
   Found by: call frame info

Loaded modules:
0x5a9a886b8000 - 0x5a9a88717fff memd ??? (main) (WARNING: No symbols, memd, 8FC81524525B375520A3D850553BF6B50)
0x70b01a3b8000 - 0x70b01a558fff libc-2.23.so ???
0x70b01a763000 - 0x70b01a778fff libgcc_s.so.1 ???
0x70b01a97a000 - 0x70b01a990fff libpthread-2.23.so ???
0x70b01ab97000 - 0x70b01ab9cfff librt-2.23.so ???
0x70b01ad9f000 - 0x70b01ada0fff libdl-2.23.so ???
0x70b01afa3000 - 0x70b01afc7fff ld-2.23.so ???
0x70b01b148000 - 0x70b01b14bfff libattr.so.1.1.0 ???
0x70b01b14e000 - 0x70b01b151fff libcap.so.2.24 ???
0x70b01b156000 - 0x70b01b199fff libdbus-1.so.3.14.8 ???
0x70b01b1ab000 - 0x70b01b1bdfff libminijailpreload.so ???
0x7fff71d2d000 - 0x7fff71d2efff linux-gate.so ???
The x86_64 syscall convention puts the syscall number in rax. Here rax = 0xea, which is 234 (__NR_tgkill), and that checks out with this being invoked from raise(3).
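For anyone who wants to double-check the register reading, here is a minimal Rust sketch that decodes the values from the dump. The helper names (`parse_reg`, `syscall_name`) are made up for illustration, and the lookup table only covers the three signal-sending syscalls on x86_64. It also reads rdi, rsi and rdx as the tgkill(tgid, tid, sig) arguments, which is an assumption about what the registers held when the filter fired. Under that reading rdx = 6 would be SIGABRT on Linux.

```rust
/// Parse a register value as printed by minidump_stackwalk,
/// e.g. "0x00000000000000ea". Returns None if the "0x" prefix is missing.
fn parse_reg(text: &str) -> Option<u64> {
    let hex = text.trim().strip_prefix("0x")?;
    u64::from_str_radix(hex, 16).ok()
}

/// Tiny x86_64 lookup table (illustrative, not a full syscall table).
fn syscall_name(nr: u64) -> Option<&'static str> {
    match nr {
        62 => Some("kill"),
        200 => Some("tkill"),
        234 => Some("tgkill"),
        _ => None,
    }
}

fn main() {
    // Register values copied from frame 0 of the crash dump.
    let rax = parse_reg("0x00000000000000ea").unwrap();
    let rdi = parse_reg("0x0000000000000002").unwrap();
    let rsi = parse_reg("0x0000000000000002").unwrap();
    let rdx = parse_reg("0x0000000000000006").unwrap();

    let name = syscall_name(rax).unwrap_or("unknown");
    // Assumed argument order for tgkill: tgid (rdi), tid (rsi), sig (rdx).
    println!("syscall {} ({}): tgid={} tid={} sig={}", rax, name, rdi, rsi, rdx);
}
```

If the sig argument really is SIGABRT, that would be consistent with an abort() after a run-time failure rather than a deliberate kill.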
,
Aug 21
I'm seeing memd crashes on grunt, both locally (R70-10987.0.0) and during HWTest (R70-10988.0.0):
https://stainless.corp.google.com/browse/chromeos-autotest-results/229803136-chromeos-test/

messages shows there was an selinux violation:

2018-08-21T21:50:11.603637+00:00 WARNING memd[2392]: memd started
2018-08-21T21:50:11.984620+00:00 WARNING kernel: [ 32.775848] kauditd_printk_skb: 274 callbacks suppressed
2018-08-21T21:50:11.984640+00:00 NOTICE kernel: [ 32.775850] audit: type=1326 audit(1534888211.983:335): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=u:r:minijailed:s0 pid=2392 comm="memd" exe="/usr/bin/memd" sig=31 arch=c000003e syscall=234 compat=0 ip=0x7906d9063dd2 code=0x0
2018-08-21T21:50:12.012977+00:00 INFO crash_reporter[2393]: libminijail[2393]: mount /dev/log -> /dev/log type ''
2018-08-21T21:50:12.013410+00:00 INFO crash_reporter[2394]: libminijail[2394]: mount /dev/log -> /dev/log type ''
2018-08-21T21:50:12.055305+00:00 WARNING crash_reporter[2393]: Could not load the device policy file.
2018-08-21T21:50:12.055856+00:00 WARNING crash_reporter[2393]: [user] Received crash notification for memd[2392] sig 31, user 0 group 0 (developer build - not testing - always dumping)

selinux_violation.20180821.145012.0.meta:
sig=2c503b66-selinux----memd-
comm=memd
exec_name=selinux-violation
ver=10988.0.0
payload=/var/spool/crash/selinux_violation.20180821.145012.0.log
payload_size=187
done=1

minidump_stackwalk 229803136-chromeos-test%2Fchromeos2-row3-rack10-host5%2Fcrashinfo.chromeos2-row3-rack10-host5%2Fmemd.20180821.145012.2392.dmp

Thread 0 (crashed)
 0 libc-2.23.so + 0x33dd2
   rax = 0x00000000000000ea rdx = 0x0000000000000006 rcx = 0x00007906d9063dd2 rbx = 0x0000000000000001
   rsi = 0x0000000000000002 rdi = 0x0000000000000002 rbp = 0x00007ffeda19c600 rsp = 0x00007ffeda19c4d8
   r8 = 0x0000000000000000 r9 = 0x000055d31b1c0148 r10 = 0x0000000000000008 r11 = 0x0000000000000202
   r12 = 0x0000000000000001 r13 = 0x000055d31b1b37f0 r14 = 0x0000000000000000 r15 = 0x00007ffeda19c658
   rip = 0x00007906d9063dd2
   Found by: given as instruction pointer in context
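The interesting fields in the audit record above can be pulled out mechanically. The following is only a sketch: `parse_audit_fields` is a hypothetical helper, not an existing tool, and it does a naive whitespace split that works for this record but would break on quoted values containing spaces. For reference, type=1326 is the kernel's seccomp audit record type, sig=31 is SIGSYS on x86, and arch=c000003e is AUDIT_ARCH_X86_64.

```rust
use std::collections::HashMap;

/// The audit record from the messages file, without the syslog prefix.
const RECORD: &str = "audit: type=1326 audit(1534888211.983:335): \
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=u:r:minijailed:s0 pid=2392 \
comm=\"memd\" exe=\"/usr/bin/memd\" sig=31 arch=c000003e syscall=234 compat=0 \
ip=0x7906d9063dd2 code=0x0";

/// Split a record into key=value pairs, dropping tokens without '='
/// and stripping surrounding double quotes from values.
fn parse_audit_fields(record: &str) -> HashMap<&str, &str> {
    record
        .split_whitespace()
        .filter_map(|tok| tok.split_once('='))
        .map(|(k, v)| (k, v.trim_matches('"')))
        .collect()
}

fn main() {
    let fields = parse_audit_fields(RECORD);
    let is_seccomp = fields.get("type").copied() == Some("1326");
    println!(
        "seccomp={} comm={:?} sig={:?} syscall={:?} arch={:?}",
        is_seccomp,
        fields.get("comm"),
        fields.get("sig"),
        fields.get("syscall"),
        fields.get("arch"),
    );
}
```

The syscall=234 field matches the rax value in the minidump, so both sources point at the same denied call.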
,
Aug 21
This has probably been happening since R70-10950.0.0, when memd was re-enabled by: https://chromium-review.googlesource.com/1145740
,
Aug 22
Thank you DJ, will look now.
,
Aug 22
syscall 234 is tgkill (thread group kill). The program is probably exiting with some run-time error. I will look in the test results hoping that it logged something before exiting. (Memd logs in /var/log/messages.)
,
Aug 22
The stainless page (https://stainless.corp.google.com/browse/chromeos-autotest-results/229803136-chromeos-test/) has links to the logs, but I get Error 403 for all the links I've tried. Does anybody know what the deal is?
,
Aug 22
The pantheon page seems to work: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/229803136-chromeos-test/chromeos2-row3-rack10-host5/sysinfo?angularJsUrl=%2Fstorage%2Fbrowser%2Fchromeos-autotest-results%2F229803136-chromeos-test&authuser=2 However, sysinfo/var/log/messages from that bucket ends at 21:45 and the crash seems to have happened at 21:50 according to #2. Where did you get the messages file? I will try to get around the authentication of stainless, but aren't those the same files?
,
Aug 22
OK, the correct messages file is from the crashinfo bucket: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/229803136-chromeos-test/chromeos2-row3-rack10-host5/crashinfo.chromeos2-row3-rack10-host5/ but there isn't much info about the crash. Maybe tgkill() would trigger a signal handler (if it could run)? I will send a CL to allow tgkill().
,
Aug 22
(Trivial) CL is here: https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1184238 I would be inclined to push this and then see if we get more results when it happens again. DJ, did you say this is repeatable on grunt? Only on grunt?
,
Aug 22
Next I will try 1. repeating on grunt; 2. getting a stack trace from the minidump. Probably tomorrow though. We should still push the CL in #9.
,
Aug 23
Found the problem(s). CL is at https://crrev.com/c/1187190. Does anybody want to review trivial Rust code changes? My reviewer for this (sonnyrao) is on vacation.
,
Aug 23
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/f6fbccb5b5541083fab5fd30e96629d47eab2fd4

commit f6fbccb5b5541083fab5fd30e96629d47eab2fd4
Author: Luigi Semenzato <semenzato@chromium.org>
Date: Thu Aug 23 20:21:00 2018

metrics: memd: seccomp: allow tgkill()

We see a denial to call tgkill() in a test log. This is probably called from an assertion failure. Allowing this call probably allows better debugging information to be collected.

BUG= chromium:875467
TEST=none

Change-Id: I43e40161fa4c60ee8c2c214e50a59359604d4562
Reviewed-on: https://chromium-review.googlesource.com/1184238
Commit-Ready: Luigi Semenzato <semenzato@chromium.org>
Tested-by: Luis Hector Chavez <lhchavez@chromium.org>
Tested-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org>
Reviewed-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/f6fbccb5b5541083fab5fd30e96629d47eab2fd4/metrics/memd/init/memd-seccomp-arm64.policy
[modify] https://crrev.com/f6fbccb5b5541083fab5fd30e96629d47eab2fd4/metrics/memd/init/memd-seccomp-amd64.policy
[modify] https://crrev.com/f6fbccb5b5541083fab5fd30e96629d47eab2fd4/metrics/memd/init/memd-seccomp-arm.policy
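The policy files touched by this commit use minijail's seccomp policy syntax, where a line of the form `name: 1` allows a syscall unconditionally. As a rough illustration of what the change amounts to, here is a Rust sketch that checks whether a policy text allows a given syscall. The policy excerpts in it are hypothetical (the real file contents are not shown in this thread), `policy_allows` is a made-up helper, and it deliberately ignores argument-filter expressions and anything other than a plain `1`.

```rust
/// Return true if `policy` contains an unconditional allow ("name: 1")
/// for `syscall`. Blank lines and '#' comments are skipped.
fn policy_allows(policy: &str, syscall: &str) -> bool {
    policy
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once(':'))
        .any(|(name, rule)| name.trim() == syscall && rule.trim() == "1")
}

fn main() {
    // Hypothetical excerpt; not the actual memd-seccomp-amd64.policy.
    let before = "# memd policy (illustrative)\nread: 1\nwrite: 1\nexit_group: 1\n";
    // The same excerpt with the one-line addition the commit describes.
    let after = format!("{}tgkill: 1\n", before);

    println!("before: tgkill allowed = {}", policy_allows(before, "tgkill"));
    println!("after:  tgkill allowed = {}", policy_allows(&after, "tgkill"));
}
```

With tgkill allowed, a failing process should be able to deliver its own abort signal instead of being killed with SIGSYS, which is the debugging benefit the commit message is hoping for.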
,
Aug 31
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/715ad39efa5401e58db403c026e8c16ed7a0be95

commit 715ad39efa5401e58db403c026e8c16ed7a0be95
Author: Luigi Semenzato <semenzato@chromium.org>
Date: Fri Aug 31 18:20:15 2018

metrics: memd: initialize tracing and don't collect NR_PAGES_SCANNED

This fixes two bugs revealed by changes in behavior on 4.14 kernel. Firstly, the code was not initializing tracing_enabled and tracing_on, which it should have, and it was just luck that it worked on other platforms. Secondly, 4.14 no longer outputs nr_pages_scanned in /proc/vmstat, which is just as well because apparently it was getting reset in a way that made it not useful.

BUG= chromium:875467
TEST=ran on device

Change-Id: Ie4f4d376822ab2d6e3a2e0139eed800b4fc1b5ea
Reviewed-on: https://chromium-review.googlesource.com/1187190
Commit-Ready: Luigi Semenzato <semenzato@chromium.org>
Tested-by: Luigi Semenzato <semenzato@chromium.org>
Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org>

[modify] https://crrev.com/715ad39efa5401e58db403c026e8c16ed7a0be95/metrics/memd/src/main.rs
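To illustrate the second bug class (a /proc/vmstat field that exists on one kernel and not on another), here is a small Rust sketch of a lookup that tolerates a missing field. This is not the code from main.rs. `parse_vmstat` is a hypothetical helper, and the sample values are invented. Only the field names are real.

```rust
use std::collections::HashMap;

/// Parse "name value" lines in the style of /proc/vmstat.
/// Malformed lines are skipped rather than treated as fatal.
fn parse_vmstat(text: &str) -> HashMap<String, u64> {
    text.lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let name = parts.next()?;
            let value = parts.next()?.parse::<u64>().ok()?;
            Some((name.to_string(), value))
        })
        .collect()
}

fn main() {
    // Invented sample contents: an older kernel that still reports
    // nr_pages_scanned, and a newer one (such as 4.14) that does not.
    let older = "nr_free_pages 1000\nnr_pages_scanned 42\npgmajfault 7\n";
    let newer = "nr_free_pages 1000\npgmajfault 7\n";

    for (label, text) in [("older", older), ("newer", newer)] {
        let stats = parse_vmstat(text);
        // A missing field is reported as None instead of aborting.
        match stats.get("nr_pages_scanned") {
            Some(v) => println!("{}: nr_pages_scanned = {}", label, v),
            None => println!("{}: nr_pages_scanned not present", label),
        }
    }
}
```

Code that instead requires every expected field to be present would fail on the newer kernel, and such a failure ending in an abort would fit the tgkill denial seen in the logs. The actual fix took the simpler route of not collecting the field at all.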
,
Sep 11
Comment 1 by derat@chromium.org
, Aug 17