New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 872827 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Aug 13
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

telemetry_LoginTest fails on tricky-tot-chrome-pfq-informational

Project Member Reported by afakhry@chromium.org, Aug 9

Issue description

- tricky-tot-chrome-pfq-informational: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8938674262259406480

telemetry_LoginTest                     FAIL: Failed to get LoginStatus

08/09 07:38:12.176 DEBUG|             utils:0218| Running 'mkdir -p /usr/local/autotest/results/default/telemetry_LoginTest/sysinfo/var'
08/09 07:38:12.177 DEBUG|      global_hooks:0056| 'mkdir -p /usr/local/autotest/results/default/telemetry_LoginTest/sysinfo/var'
08/09 07:38:12.188 DEBUG|             utils:0218| Running 'rsync --no-perms --chmod=ugo+r -a --safe-links --exclude=/log/**autoserv* /var/log /usr/local/autotest/results/default/telemetry_LoginTest/sysinfo/var'
08/09 07:38:12.189 DEBUG|      global_hooks:0056| 'rsync --no-perms --chmod=ugo+r -a --safe-links --exclude=/log/**autoserv* /var/log /usr/local/autotest/results/default/telemetry_LoginTest/sysinfo/var'
08/09 07:38:12.201 INFO |              test:0928| after_hook completed
08/09 07:38:12.201 ERROR|          parallel:0026| child process failed
08/09 07:38:12.204 DEBUG|          parallel:0030| Traceback (most recent call last):
08/09 07:38:12.204 DEBUG|          parallel:0030|   File "/usr/local/autotest/bin/parallel.py", line 18, in fork_start
08/09 07:38:12.204 DEBUG|          parallel:0030|     l()
08/09 07:38:12.204 DEBUG|          parallel:0030|   File "/usr/local/autotest/bin/job.py", line 488, in <lambda>
08/09 07:38:12.205 DEBUG|          parallel:0030|     l = lambda : test.runtest(self, url, tag, args, dargs)
08/09 07:38:12.205 DEBUG|          parallel:0030|   File "/usr/local/autotest/bin/test.py", line 117, in runtest
08/09 07:38:12.205 DEBUG|          parallel:0030|     job.sysinfo.log_after_each_iteration)
08/09 07:38:12.205 DEBUG|          parallel:0030|   File "/usr/local/autotest/common_lib/test.py", line 919, in runtest
08/09 07:38:12.205 DEBUG|          parallel:0030|     mytest._exec(args, dargs)
08/09 07:38:12.205 DEBUG|          parallel:0030|   File "/usr/local/autotest/common_lib/test.py", line 631, in _exec
08/09 07:38:12.205 DEBUG|          parallel:0030|     _call_test_function(self.execute, *p_args, **p_dargs)
08/09 07:38:12.205 DEBUG|          parallel:0030|   File "/usr/local/autotest/common_lib/test.py", line 831, in _call_test_function
08/09 07:38:12.205 DEBUG|          parallel:0030|     return func(*args, **dargs)
08/09 07:38:12.205 DEBUG|          parallel:0030|   File "/usr/local/autotest/common_lib/test.py", line 495, in execute
08/09 07:38:12.205 DEBUG|          parallel:0030|     dargs)
08/09 07:38:12.205 DEBUG|          parallel:0030|   File "/usr/local/autotest/common_lib/test.py", line 362, in _call_run_once_with_retry
08/09 07:38:12.205 DEBUG|          parallel:0030|     postprocess_profiled_run, args, dargs)
08/09 07:38:12.206 DEBUG|          parallel:0030|   File "/usr/local/autotest/common_lib/test.py", line 400, in _call_run_once
08/09 07:38:12.206 DEBUG|          parallel:0030|     self.run_once(*args, **dargs)
08/09 07:38:12.206 DEBUG|          parallel:0030|   File "/usr/local/autotest/tests/telemetry_LoginTest/telemetry_LoginTest.py", line 41, in run_once
08/09 07:38:12.206 DEBUG|          parallel:0030|     (is_regular_user, email) = self._get_login_status(cr)
08/09 07:38:12.206 DEBUG|          parallel:0030|   File "/usr/local/autotest/tests/telemetry_LoginTest/telemetry_LoginTest.py", line 18, in _get_login_status
08/09 07:38:12.206 DEBUG|          parallel:0030|     raise error.TestFail('Failed to get LoginStatus')
08/09 07:38:12.206 DEBUG|          parallel:0030| TestFail: Failed to get LoginStatus

There are many "dash" crashes. I still don't know what that "dash" is and whether its crash has a bad effect of this test. It also fails to symbolize the crash dumps.



 
Nothing in the link or log mentions 'dash'?
Cc: vapier@chromium.org lhchavez@chromium.org jorgelo@chromium.org
Take a look at the logs: https://stainless.corp.google.com/browse/chromeos-autotest-results/225393226-chromeos-test/

You'll see a bunch of 

dash.*.dmp.txt  files.

Also if you look at `messages`, you'll see several:

2018-08-09T14:38:11.651905+00:00 INFO crash_reporter[17835]: libminijail[17835]: mount /dev/log -> /dev/log type ''
2018-08-09T14:38:11.675491+00:00 WARNING crash_reporter[17835]: [user] Received crash notification for dash[17834] sig 31, user 0 (developer build - not testing - always dumping)
2018-08-09T14:38:11.677760+00:00 INFO crash_reporter[17835]: State of crashed process [17834]: S (sleeping)
2018-08-09T14:38:11.678411+00:00 INFO crash_reporter[17835]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5'
2018-08-09T14:38:11.681302+00:00 INFO metrics_daemon[2199]: [INFO:metrics_daemon.cc(427)] Got org.chromium.CrashReporter.UserCrash D-Bus signal
2018-08-09T14:38:11.686982+00:00 INFO crash_reporter[17835]: Stored minidump to /var/spool/crash/dash.20180809.073811.17834.dmp
2018-08-09T14:38:11.687609+00:00 INFO crash_reporter[17835]: Leaving core file at /proc/self/fd/5/dash.20180809.073811.17834.core due to developer image
2018-08-09T14:38:11.732249+00:00 INFO session_manager[17849]: [INFO:session_manager_main.cc(212)] Will wait 3s for graceful browser exit.
Cc: geohsu@chromium.org
sig 31 is SIGSYS and that comes up when a seccomp filter is violated.  it's a bit weird that the shell is being run through seccomp though ... so guess you'll want to find out what's running it that way.
Cc: samanthamiller@chromium.org rrangel@chromium.org
One log: https://storage.cloud.google.com/chromeos-autotest-results/225393226-chromeos-test/chromeos4-row2-rack3-host12/debug/dash.20180809.073811.17834.dmp.txt

Operating system: Linux
                  0.0.0 Linux 3.8.11 #1 SMP Thu Aug 9 05:17:31 PDT 2018 x86_64
CPU: amd64
     family 6 model 69 stepping 1
     1 CPU

GPU: UNKNOWN

Crash reason:  SIGSYS
Crash address: 0x0
Process uptime: not available

Thread 0 (crashed)
 0  libc-2.23.so!pipe + 0x7
    rax = 0x0000000000000016   rdx = 0x00007f620360e5b0
    rcx = 0xffffffffffffffff   rbx = 0x00007ffc2c702690
    rsi = 0x00007ffc2c702690   rdi = 0x00007ffc2c702608
    rbp = 0x00007ffc2c702630   rsp = 0x00007ffc2c7025f8
     r8 = 0x00007f620457789f    r9 = 0x0000000000000000
    r10 = 0x0000000000000020   r11 = 0x0000000000000202
    r12 = 0x0000000000000001   r13 = 0x0000000000000101
    r14 = 0x00007f6204576ac8   r15 = 0x0000000000000001
    rip = 0x00007f620310f5f7

So it's crashing due to pipe(2) being blocked.

I kinda remember somebody telling me that mosys was being libminijail'd, and mosys logs appear to be interspersed between the failures. Adding mosys folks.
I don't know much about dash, and I would have expected mosys dumps if it were failing, but I'm happy to allow pipe on amd64 machines.
is mosys turning on a seccomp filter and then running programs like dash under that filter ?  that sounds like a recipe for destruction ...
Both mosys and dash were crashing in another test failure that can be seen in this bug 872113.
mosys is turning on a seccomp filter. I have no idea where dash is coming from. I don't see it mentioned anywhere in the mosys source code. I'm not convinced that this is mosys-related, but I'm willing to give it a try and see if it fixes the issue.
/bin/sh is a symlink to /bin/bash, so any time you execve(2) something that has #!/bin/sh, you are implicitly calling dash.
I see. Mosys does sometimes execve.

Did this problem just show up? The mosys change went in on July 31, in version 10924.0.0. Looking at the test history for the failing test, there don't seem to be any failures starting then.
if adding pipe to the seccomp filter unblocks this, that's fine. that's a generally safe syscall. and gives us more time to figure out a better way forward.
Cc: -samanthamiller@chromium.org
Owner: samanthamiller@chromium.org
Status: Assigned (was: Untriaged)
Assigning to you since you seem to be the right owner.
Can someone try the patch at crrev.com/c/1170859? If this fixes things, great. If not, the bug should probably be assigned to someone else.
I'm not sure how to test and verify this, since these crashes don't seem to be directly related to what a particular test is doing. So for example tricky-tot-chrome-pfq-informational cycled green already, but graphics_Sanity failed on peppy-chrome-pfq while observing the same crash (See issue 873203 for the full stack trace). 

If this change is reasonable I recommend we just land it.
Project Member

Comment 16 by bugdroid1@chromium.org, Aug 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/mosys/+/df6d2c65ade68ef4dbf6107a20ccf7e95b3a79a0

commit df6d2c65ade68ef4dbf6107a20ccf7e95b3a79a0
Author: Samantha Miller <samanthamiller@google.com>
Date: Sat Aug 11 05:43:54 2018

mosys: Add pipe to amd64 seccomp policy

Possibly, mosys is blocking something from using the pipe syscall.
This may fix the issue seen in the bug.

BUG= chromium:872827 
TEST=cherrypick and test bug

Change-Id: I3e970aa34265cda00b8611bddfbb459be21a69c5
Reviewed-on: https://chromium-review.googlesource.com/1170859
Commit-Ready: Samantha Miller <samanthamiller@google.com>
Tested-by: Samantha Miller <samanthamiller@google.com>
Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org>

[modify] https://crrev.com/df6d2c65ade68ef4dbf6107a20ccf7e95b3a79a0/seccomp/mosys-seccomp-amd64.policy

Status: Fixed (was: Assigned)
Looks like the test has been ok. Closing the bug.

Sign in to add a comment