telemetry_LoginTest fails on tricky-tot-chrome-pfq-informational |
||||||
Issue description- tricky-tot-chrome-pfq-informational: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8938674262259406480 telemetry_LoginTest FAIL: Failed to get LoginStatus 08/09 07:38:12.176 DEBUG| utils:0218| Running 'mkdir -p /usr/local/autotest/results/default/telemetry_LoginTest/sysinfo/var' 08/09 07:38:12.177 DEBUG| global_hooks:0056| 'mkdir -p /usr/local/autotest/results/default/telemetry_LoginTest/sysinfo/var' 08/09 07:38:12.188 DEBUG| utils:0218| Running 'rsync --no-perms --chmod=ugo+r -a --safe-links --exclude=/log/**autoserv* /var/log /usr/local/autotest/results/default/telemetry_LoginTest/sysinfo/var' 08/09 07:38:12.189 DEBUG| global_hooks:0056| 'rsync --no-perms --chmod=ugo+r -a --safe-links --exclude=/log/**autoserv* /var/log /usr/local/autotest/results/default/telemetry_LoginTest/sysinfo/var' 08/09 07:38:12.201 INFO | test:0928| after_hook completed 08/09 07:38:12.201 ERROR| parallel:0026| child process failed 08/09 07:38:12.204 DEBUG| parallel:0030| Traceback (most recent call last): 08/09 07:38:12.204 DEBUG| parallel:0030| File "/usr/local/autotest/bin/parallel.py", line 18, in fork_start 08/09 07:38:12.204 DEBUG| parallel:0030| l() 08/09 07:38:12.204 DEBUG| parallel:0030| File "/usr/local/autotest/bin/job.py", line 488, in <lambda> 08/09 07:38:12.205 DEBUG| parallel:0030| l = lambda : test.runtest(self, url, tag, args, dargs) 08/09 07:38:12.205 DEBUG| parallel:0030| File "/usr/local/autotest/bin/test.py", line 117, in runtest 08/09 07:38:12.205 DEBUG| parallel:0030| job.sysinfo.log_after_each_iteration) 08/09 07:38:12.205 DEBUG| parallel:0030| File "/usr/local/autotest/common_lib/test.py", line 919, in runtest 08/09 07:38:12.205 DEBUG| parallel:0030| mytest._exec(args, dargs) 08/09 07:38:12.205 DEBUG| parallel:0030| File "/usr/local/autotest/common_lib/test.py", line 631, in _exec 08/09 07:38:12.205 DEBUG| parallel:0030| _call_test_function(self.execute, *p_args, **p_dargs) 08/09 07:38:12.205 DEBUG| parallel:0030| File "/usr/local/autotest/common_lib/test.py", line 831, in _call_test_function 08/09 07:38:12.205 DEBUG| parallel:0030| return func(*args, **dargs) 08/09 07:38:12.205 DEBUG| parallel:0030| File "/usr/local/autotest/common_lib/test.py", line 495, in execute 08/09 07:38:12.205 DEBUG| parallel:0030| dargs) 08/09 07:38:12.205 DEBUG| parallel:0030| File "/usr/local/autotest/common_lib/test.py", line 362, in _call_run_once_with_retry 08/09 07:38:12.205 DEBUG| parallel:0030| postprocess_profiled_run, args, dargs) 08/09 07:38:12.206 DEBUG| parallel:0030| File "/usr/local/autotest/common_lib/test.py", line 400, in _call_run_once 08/09 07:38:12.206 DEBUG| parallel:0030| self.run_once(*args, **dargs) 08/09 07:38:12.206 DEBUG| parallel:0030| File "/usr/local/autotest/tests/telemetry_LoginTest/telemetry_LoginTest.py", line 41, in run_once 08/09 07:38:12.206 DEBUG| parallel:0030| (is_regular_user, email) = self._get_login_status(cr) 08/09 07:38:12.206 DEBUG| parallel:0030| File "/usr/local/autotest/tests/telemetry_LoginTest/telemetry_LoginTest.py", line 18, in _get_login_status 08/09 07:38:12.206 DEBUG| parallel:0030| raise error.TestFail('Failed to get LoginStatus') 08/09 07:38:12.206 DEBUG| parallel:0030| TestFail: Failed to get LoginStatus There are many "dash" crashes. I still don't know what that "dash" is and whether its crash has a bad effect of this test. It also fails to symbolize the crash dumps.
,
Aug 9
Take a look at the logs: https://stainless.corp.google.com/browse/chromeos-autotest-results/225393226-chromeos-test/ You'll see a bunch of dash.*.dmp.txt files. Also if you look at `messages`, you'll see several: 2018-08-09T14:38:11.651905+00:00 INFO crash_reporter[17835]: libminijail[17835]: mount /dev/log -> /dev/log type '' 2018-08-09T14:38:11.675491+00:00 WARNING crash_reporter[17835]: [user] Received crash notification for dash[17834] sig 31, user 0 (developer build - not testing - always dumping) 2018-08-09T14:38:11.677760+00:00 INFO crash_reporter[17835]: State of crashed process [17834]: S (sleeping) 2018-08-09T14:38:11.678411+00:00 INFO crash_reporter[17835]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5' 2018-08-09T14:38:11.681302+00:00 INFO metrics_daemon[2199]: [INFO:metrics_daemon.cc(427)] Got org.chromium.CrashReporter.UserCrash D-Bus signal 2018-08-09T14:38:11.686982+00:00 INFO crash_reporter[17835]: Stored minidump to /var/spool/crash/dash.20180809.073811.17834.dmp 2018-08-09T14:38:11.687609+00:00 INFO crash_reporter[17835]: Leaving core file at /proc/self/fd/5/dash.20180809.073811.17834.core due to developer image 2018-08-09T14:38:11.732249+00:00 INFO session_manager[17849]: [INFO:session_manager_main.cc(212)] Will wait 3s for graceful browser exit.
,
Aug 9
,
Aug 9
sig 31 is SIGSYS and that comes up when a seccomp filter is violated. it's a bit weird that the shell is being run through seccomp though ... so guess you'll want to find out what's running it that way.
,
Aug 9
One log: https://storage.cloud.google.com/chromeos-autotest-results/225393226-chromeos-test/chromeos4-row2-rack3-host12/debug/dash.20180809.073811.17834.dmp.txt Operating system: Linux 0.0.0 Linux 3.8.11 #1 SMP Thu Aug 9 05:17:31 PDT 2018 x86_64 CPU: amd64 family 6 model 69 stepping 1 1 CPU GPU: UNKNOWN Crash reason: SIGSYS Crash address: 0x0 Process uptime: not available Thread 0 (crashed) 0 libc-2.23.so!pipe + 0x7 rax = 0x0000000000000016 rdx = 0x00007f620360e5b0 rcx = 0xffffffffffffffff rbx = 0x00007ffc2c702690 rsi = 0x00007ffc2c702690 rdi = 0x00007ffc2c702608 rbp = 0x00007ffc2c702630 rsp = 0x00007ffc2c7025f8 r8 = 0x00007f620457789f r9 = 0x0000000000000000 r10 = 0x0000000000000020 r11 = 0x0000000000000202 r12 = 0x0000000000000001 r13 = 0x0000000000000101 r14 = 0x00007f6204576ac8 r15 = 0x0000000000000001 rip = 0x00007f620310f5f7 So it's crashing due to pipe(2) being blocked. I kinda remember somebody telling me that mosys was being libminijail'd, and mosys logs appear to be interspersed between the failures. Adding mosys folks.
,
Aug 9
I don't know much about dash, and I would have expected mosys dumps if it were failing, but I'm happy to allow pipe on amd64 machines.
,
Aug 9
is mosys turning on a seccomp filter and then running programs like dash under that filter ? that sounds like a recipe for destruction ...
,
Aug 9
Both mosys and dash were crashing in another test failure that can be seen in this bug 872113.
,
Aug 9
mosys is turning on a seccomp filter. I have no idea where dash is coming from. I don't see it mentioned anywhere in the mosys source code. I'm not convinced that this is mosys-related, but I'm willing to give it a try and see if it fixes the issue.
,
Aug 9
/bin/sh is a symlink to /bin/bash, so any time you execve(2) something that has #!/bin/sh, you are implicitly calling dash.
,
Aug 9
I see. Mosys does sometimes execve. Did this problem just show up? The mosys change went in on July 31, in version 10924.0.0. Looking at the test history for the failing test, there don't seem to be any failures starting then.
,
Aug 9
if adding pipe to the seccomp filter unblocks this, that's fine. that's a generally safe syscall. and gives us more time to figure out a better way forward.
,
Aug 10
Assigning to you since you seem to be the right owner.
,
Aug 10
Can someone try the patch at crrev.com/c/1170859? If this fixes things, great. If not, the bug should probably be assigned to someone else.
,
Aug 10
I'm not sure how to test and verify this, since these crashes don't seem to be directly related to what a particular test is doing. So for example tricky-tot-chrome-pfq-informational cycled green already, but graphics_Sanity failed on peppy-chrome-pfq while observing the same crash (See issue 873203 for the full stack trace). If this change is reasonable I recommend we just land it.
,
Aug 11
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/mosys/+/df6d2c65ade68ef4dbf6107a20ccf7e95b3a79a0 commit df6d2c65ade68ef4dbf6107a20ccf7e95b3a79a0 Author: Samantha Miller <samanthamiller@google.com> Date: Sat Aug 11 05:43:54 2018 mosys: Add pipe to amd64 seccomp policy Possibly, mosys is blocking something from using the pipe syscall. This may fix the issue seen in the bug. BUG= chromium:872827 TEST=cherrypick and test bug Change-Id: I3e970aa34265cda00b8611bddfbb459be21a69c5 Reviewed-on: https://chromium-review.googlesource.com/1170859 Commit-Ready: Samantha Miller <samanthamiller@google.com> Tested-by: Samantha Miller <samanthamiller@google.com> Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org> [modify] https://crrev.com/df6d2c65ade68ef4dbf6107a20ccf7e95b3a79a0/seccomp/mosys-seccomp-amd64.policy
,
Aug 13
Looks like the test has been ok. Closing the bug. |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by jkop@chromium.org
, Aug 9