Login tests failing due to trunksd not running: blocked syscall stat64 on arm |
||||||||
Issue descriptionIn the scarlet-paladin run at http://stainless/browse/chromeos-autotest-results/257744577-chromeos-test/, all Tast tests that require login failed due to what looks like a cryptohomed timeout. Here's the first failing test: 2018/11/15 21:29:46 Started test ui.ChromeCrashLoggedIn 2018/11/15 21:29:46 [21:29:45.241] Restarting ui job 2018/11/15 21:29:47 [21:29:46.496] Waiting for org.chromium.SessionManager D-Bus service 2018/11/15 21:29:47 [21:29:46.532] Asking session_manager to enable Chrome testing 2018/11/15 21:29:47 [21:29:46.537] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/11/15 21:29:48 [21:29:47.733] Removing cryptohome for testuser@gmail.com 2018/11/15 21:30:12 [21:30:11.296] Finding OOBE DevTools target 2018/11/15 21:30:12 [21:30:11.305] Connecting to Chrome at ws://127.0.0.1:37284/devtools/page/CE9FD1FD3E3E9B7A27EC8B4AB9580591 2018/11/15 21:30:12 [21:30:11.321] Waiting for OOBE 2018/11/15 21:30:12 [21:30:11.328] Logging in as user "testuser@gmail.com" 2018/11/15 21:30:12 [21:30:11.387] Waiting for cryptohome for user "testuser@gmail.com" 2018/11/15 21:31:46 [21:31:45.243] Failed to get cryptohome status 2018/11/15 21:31:46 [21:31:45.243] External command timed out 2018/11/15 21:31:46 [21:31:45.243] Command: cryptohome --action=status 2018/11/15 21:31:46 [21:31:45.243] Uncaptured output: 2018/11/15 21:31:46 [21:31:45.261] /sys/class/tpm/tpm0/device/owned contains "1" 2018/11/15 21:31:46 [21:31:45.264] Error at chrome_crash_logged_in.go:26: Chrome login failed: not mounted for testuser@gmail.com: context deadline exceeded; last error follows: /home/user/897ff8920e92a4a1359ad3adea101deb7d6a6486 not found ... So in addition to the directory not getting mounted, I think that "cryptohome --action=status" might've also timed out when we ran it later. It looks like there are a lot of problems in the messages file: 2018-11-16T05:29:45.095322+00:00 WARNING attestationd[7973]: Failed to initialize the trunks IPC proxy; trunksd is not ready. 2018-11-16T05:29:45.095352+00:00 ERR attestationd[7973]: Failed to initialize trunks. 2018-11-16T05:29:45.095938+00:00 CRIT attestationd[7973]: Check failed: default_tpm_utility_->Initialize(). #012 2018-11-16T05:29:45.121762+00:00 INFO crash_reporter[8090]: libminijail[8090]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001 2018-11-16T05:29:45.145791+00:00 WARNING crash_reporter[8090]: Could not load the device policy file. 2018-11-16T05:29:45.146679+00:00 WARNING crash_reporter[8090]: [user] Received crash notification for attestationd[7973] sig 6, user 247 group 247 (developer build - not testing - always dumping) 2018-11-16T05:29:45.150468+00:00 INFO crash_reporter[8090]: State of crashed process [7973]: D (disk sleep) 2018-11-16T05:29:45.151366+00:00 INFO crash_reporter[8090]: Accessing crash dir '/var/spool/crash' via symlinked handle '/proc/self/fd/5' 2018-11-16T05:29:45.152728+00:00 WARNING crash_reporter[8090]: Crash directory /var/spool/crash already full with 32 pending reports 2018-11-16T05:29:45.152836+00:00 ERR crash_reporter[8090]: Could not create crash directory 2018-11-16T05:29:45.152867+00:00 ERR crash_reporter[8090]: Unable to find/create process-specific crash path ... 2018-11-16T05:30:11.287556+00:00 ERR cryptohomed[1482]: AddDBusError(...): Domain=dbus, Code=org.freedesktop.DBus.Error.ServiceUnknown, Message=The name org.chromium.TpmManager was not provided by any .service files 2018-11-16T05:30:11.290059+00:00 WARNING cryptohomed[1482]: No valid keysets on disk for 897ff8920e92a4a1359ad3adea101deb7d6a6486 2018-11-16T05:30:11.333353+00:00 INFO tpm_managerd[8691]: Starting TPM Manager... 2018-11-16T05:30:11.343744+00:00 ERR cryptohomed[1482]: AddDBusError(...): Domain=dbus, Code=org.freedesktop.DBus.Error.ServiceUnknown, Message=The name org.chromium.TpmManager was not provided by any .service files 2018-11-16T05:30:11.346414+00:00 ERR cryptohomed[1482]: TpmIsReady: is not enabled. 2018-11-16T05:30:11.346493+00:00 ERR cryptohomed[1482]: HasAuthorization: TPM not ready. 2018-11-16T05:30:11.346532+00:00 ERR cryptohomed[1482]: Destroy() called with insufficient authorization. 2018-11-16T05:30:11.356548+00:00 WARNING cryptohomed[1482]: Could not load the device policy file. 2018-11-16T05:30:11.383314+00:00 WARNING cryptohomed[1482]: message repeated 2 times: [ Could not load the device policy file.] 2018-11-16T05:30:11.386020+00:00 INFO cryptohomed[1482]: Finalizing() 2 bytes. 2018-11-16T05:30:11.392612+00:00 INFO cryptohomed[1482]: InstallAttributes have been finalized. ... There are a bunch of attestationd crashes. Here's one: Crash reason: SIGABRT Crash address: 0x0 Process uptime: not available Thread 0 (crashed) 0 libc-2.27.so!__libc_do_syscall + 0x5 r0 = 0x00000000 r1 = 0xf2f35fcc r2 = 0x00000000 r3 = 0x00000008 r4 = 0xf3a33ac0 r5 = 0x00000000 r6 = 0xf2f35fcc r7 = 0x000000af r8 = 0xf2f36224 r9 = 0xf2f36704 r10 = 0xf2f36700 r12 = 0x000000af fp = 0xf3a33ac0 sp = 0xf2f35fc0 lr = 0xf337118f pc = 0xf33622d6 Found by: given as instruction pointer in context 1 libc-2.27.so!raise [nptl-signals.h : 80 + 0xf] r4 = 0xf3a33ac0 r5 = 0x00000000 r6 = 0xf2f35fcc r7 = 0x00000006 r8 = 0xf2f36224 r9 = 0xf2f36704 r10 = 0xf2f36700 fp = 0xf3a33ac0 sp = 0xf2f35fc8 pc = 0xf337118f Found by: call frame info 2 libc-2.27.so!abort [abort.c : 79 + 0x5] r4 = 0xf3442f08 r5 = 0xf2f39450 r6 = 0x00000000 r7 = 0xf2f36208 r8 = 0xf2f36224 r9 = 0xf2f36704 r10 = 0xf2f36700 fp = 0xf3a33ac0 sp = 0xf2f360e8 pc = 0xf3371f13 Found by: call frame info 3 libbase-core-456626.so!base::debug::BreakDebugger() [debugger_posix.cc : 221 + 0x3] r4 = 0xf3442f08 r5 = 0xf2f39450 r6 = 0x00000000 r7 = 0xf2f36208 r8 = 0xf2f36224 r9 = 0xf2f36704 r10 = 0xf2f36700 fp = 0xf3a33ac0 sp = 0xf2f36208 pc = 0xf37713bb Found by: call frame info 4 libbase-core-456626.so!logging::LogMessage::~LogMessage() [logging.cc : 774 + 0x3] r4 = 0xf3442f08 r5 = 0xf2f39450 r6 = 0x00000000 r7 = 0xf2f36650 r8 = 0xf2f36224 r9 = 0xf2f36704 r10 = 0xf2f36700 fp = 0xf3a33ac0 sp = 0xf2f36210 pc = 0xf3788aa9 Found by: call frame info 5 attestationd!attestation::AttestationService::InitializeTask() [attestation_service.cc : 419 + 0x1d] r4 = 0xffcae9d8 r5 = 0xf381ea28 r6 = 0x00000000 r7 = 0xf2f38720 r8 = 0x00000000 r9 = 0xf2f387c8 r10 = 0xf3a33ac0 fp = 0xf3a33ac0 sp = 0xf2f36658 pc = 0x0dbab43b Found by: call frame info ... The bob-paladin run at http://stainless/browse/chromeos-autotest-results/257746405-chromeos-test/ looks like it had similar problems, but it also captured some tpm_managerd crashes: Crash reason: SIGABRT Crash address: 0x0 Process uptime: not available Thread 0 (crashed) 0 libc-2.27.so!__libc_do_syscall + 0x5 r0 = 0x00000000 r1 = 0xee418f04 r2 = 0x00000000 r3 = 0x00000008 r4 = 0xeedc8ac0 r5 = 0x00000000 r6 = 0xee418f04 r7 = 0x000000af r8 = 0xee41915c r9 = 0xee41959c r10 = 0xee419598 r12 = 0x000000af fp = 0xeedc8ac0 sp = 0xee418ef8 lr = 0xee7e918f pc = 0xee7da2d6 Found by: given as instruction pointer in context 1 libc-2.27.so!raise [nptl-signals.h : 80 + 0xf] r4 = 0xeedc8ac0 r5 = 0x00000000 r6 = 0xee418f04 r7 = 0x00000006 r8 = 0xee41915c r9 = 0xee41959c r10 = 0xee419598 fp = 0xeedc8ac0 sp = 0xee418f00 pc = 0xee7e918f Found by: call frame info 2 libc-2.27.so!abort [abort.c : 79 + 0x5] r4 = 0xee8baf08 r5 = 0xee41a450 r6 = 0x00000000 r7 = 0xee419140 r8 = 0xee41915c r9 = 0xee41959c r10 = 0xee419598 fp = 0xeedc8ac0 sp = 0xee419020 pc = 0xee7e9f13 Found by: call frame info 3 libbase-core-456626.so!base::debug::BreakDebugger() [debugger_posix.cc : 221 + 0x3] r4 = 0xee8baf08 r5 = 0xee41a450 r6 = 0x00000000 r7 = 0xee419140 r8 = 0xee41915c r9 = 0xee41959c r10 = 0xee419598 fp = 0xeedc8ac0 sp = 0xee419140 pc = 0xeebb73bb Found by: call frame info 4 libbase-core-456626.so!logging::LogMessage::~LogMessage() [logging.cc : 774 + 0x3] r4 = 0xee8baf08 r5 = 0xee41a450 r6 = 0x00000000 r7 = 0xee419588 r8 = 0xee41915c r9 = 0xee41959c r10 = 0xee419598 fp = 0xeedc8ac0 sp = 0xee419148 pc = 0xeebceaa9 Found by: call frame info 5 libtrunks.so!trunks::TpmStateImpl::IsPlatformHierarchyEnabled() [tpm_state_impl.cc : 97 + 0x19] r4 = 0xedb03b30 r5 = 0xff9a4cec r6 = 0xedb01d20 r7 = 0xee419648 r8 = 0xeedc8ac0 r9 = 0xedb03b30 r10 = 0xeedc8ac0 fp = 0xeedc8ac0 sp = 0xee419590 pc = 0xeed76cb5 Found by: call frame info 6 libtrunks.so!trunks::TpmStateImpl::IsEnabled() [tpm_state_impl.cc : 115 + 0x7] r4 = 0xedb03b30 r5 = 0xff9a4cec r6 = 0xedb01d20 r7 = 0xee419658 r8 = 0xff9a4cd0 r9 = 0xee419668 r10 = 0xeedc8ac0 fp = 0xeedc8ac0 sp = 0xee419650 pc = 0xeed76f23 Found by: call frame info 7 tpm_managerd!tpm_manager::TpmManagerService::InitializeTask() [tpm_manager_service.cc : 116 + 0x7] r4 = 0xff9a4d1c r5 = 0xff9a4cec r6 = 0xedb01d20 r7 = 0xee419720 r8 = 0xff9a4cd0 r9 = 0xee419668 r10 = 0xeedc8ac0 fp = 0xeedc8ac0 sp = 0xee419660 pc = 0x0cd4e303 Found by: call frame info ...
,
Nov 16
Note: bob and scarlet paladins both had two green runs after that failed run. The failed runs are: - scarlet: http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8929733737596891872 - bob: http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8929733862480839104 In both cases the bvt-tast-cq issues unsurprisingly started with failed platform.TPMResponsive, and bvt-inline had even more login-related test failures. Looking at one of the previous failed tests on scarlet: http://cautotest-prod/tko/retrieve_logs.cgi?job=/results/257746439-chromeos-test/, /var/log/messages contains the reason for trunksd failing: 2018-11-16T05:24:25.456544+00:00 INFO trunksd[1404]: TPM: /dev/tpm0 opened successfully 2018-11-16T05:24:25.457123+00:00 WARNING trunksd[1404]: libminijail[1404]: allowing syscall: socket 2018-11-16T05:24:25.457221+00:00 WARNING trunksd[1404]: libminijail[1404]: allowing syscall: connect 2018-11-16T05:24:25.457265+00:00 WARNING trunksd[1404]: libminijail[1404]: allowing syscall: gettimeofday 2018-11-16T05:24:25.457306+00:00 WARNING trunksd[1404]: libminijail[1404]: allowing syscall: send 2018-11-16T05:24:25.457350+00:00 WARNING trunksd[1404]: libminijail[1404]: allowing syscall: writev 2018-11-16T05:24:25.457972+00:00 WARNING trunksd[1404]: libminijail[1404]: logging seccomp filter failures 2018-11-16T05:24:25.458440+00:00 ERR trunksd[1404]: libminijail[1404]: blocked syscall: stat64 Both bob & scarlet are aarch64 with TPM2.0 (so, trunks), so something must have changed for that arch in the underlying libs. I wonder why subsequent runs succeeded - maybe fixed already?
,
Nov 16
http://stainless/search?view=list&first_date=2018-11-15&last_date=2018-11-16&hostname=%5Echromeos6-row6-rack8-host10%24&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false lists all of the tests run on one of the DUTs; I've been able to use this before to find earlier snapshots of logs. But even going back to http://stainless/browse/chromeos-autotest-results/257738490-chromeos-test/ , the logs are truncated. I at least see references to trunksd crash files there, but it doesn't look like any of them got saved. No luck with the other host, either. It's the CQ, so it's possible someone was trying to land a broken change. I scanned through the changes at http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8929733737596891872 but didn't see anything obvious, though. I mostly just filed this to try to make sure this wasn't a problem that Tast can work around when running tests. WontFix-ing since it doesn't seem like there's anything actionable here. Thanks for the help looking at it!
,
Nov 20
I think that this is happening again, in both Autotest and Tast tests: bob-paladin: http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8929315887814567024, http://stainless/browse/chromeos-autotest-results/259439666-chromeos-test/ scarlet-paladin: http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8929315749720015040, http://stainless/browse/chromeos-autotest-results/259439657-chromeos-test/ There are a bunch of attestationd and tpm_managerd crashes. I've attached some stack traces. The platform.TPMResponsive Tast test, which just runs the "tpm_version" command and checks for "Version Info" in its output, is failing, so the problem is presumably lower-level than Chrome: 2018/11/20 12:51:28 Started test platform.TPMResponsive 2018/11/20 12:51:56 [12:51:56.418] External command failed: exit status 134 2018/11/20 12:51:56 [12:51:56.418] Command: tpm_version 2018/11/20 12:51:56 [12:51:56.419] Uncaptured output: [WARNING:trunks_factory_impl.cc(158)] Failed to initialize the trunks IPC proxy; trunksd is not ready. [FATAL:trunks_client.cc(353)] Check failed: factory.Initialize(). Failed to initialize trunks factory. Aborted (core dumped) 2018/11/20 12:51:56 [12:51:56.419] Error at tpm_responsive.go:28: Failed to run tpm_version: exit status 134 --- I don't know why trunksd is unhappy, though.
,
Nov 20
,
Nov 20
From messages in http://stainless/browse/chromeos-autotest-results/259439666-chromeos-test/ : 2018-11-20T20:20:26.057619+00:00 INFO trunksd[1347]: TPM: /dev/tpm0 opened successfully 2018-11-20T20:20:26.058723+00:00 WARNING trunksd[1347]: libminijail[1347]: allowing syscall: socket 2018-11-20T20:20:26.058963+00:00 WARNING trunksd[1347]: libminijail[1347]: allowing syscall: connect 2018-11-20T20:20:26.059061+00:00 WARNING trunksd[1347]: libminijail[1347]: allowing syscall: gettimeofday 2018-11-20T20:20:26.059143+00:00 WARNING trunksd[1347]: libminijail[1347]: allowing syscall: send 2018-11-20T20:20:26.059227+00:00 WARNING trunksd[1347]: libminijail[1347]: allowing syscall: writev 2018-11-20T20:20:26.061391+00:00 WARNING trunksd[1347]: libminijail[1347]: logging seccomp filter failures 2018-11-20T20:20:26.062291+00:00 ERR trunksd[1347]: libminijail[1347]: blocked syscall: stat64 2018-11-20T20:20:26.072261+00:00 INFO tpm_managerd[1386]: Starting TPM Manager... 2018-11-20T20:20:26.076387+00:00 NOTICE u2fd[1401]: Exiting, no device policy 2018-11-20T20:20:26.089374+00:00 INFO crash_reporter[1395]: libminijail[1395]: mount '/dev/log' -> '/dev/log' type '' flags 0x1001 2018-11-20T20:20:26.113063+00:00 WARNING crash_reporter[1395]: Could not load the device policy file. 2018-11-20T20:20:26.113815+00:00 WARNING crash_reporter[1395]: [user] Received crash notification for trunksd[1347] sig 6, user 251 group 251 (developer build - not testing - always dumping) ... 2018-11-20T20:20:26.208656+00:00 WARNING session_manager[1437]: [WARNING:libpolicy.cc(38)] Could not load the device policy file. 2018-11-20T20:20:26.219063+00:00 INFO crash_reporter[1395]: Stored minidump to /var/spool/crash/trunksd.20181120.202026.1347.dmp 2018-11-20T20:20:26.220744+00:00 INFO crash_reporter[1395]: Leaving core file at /proc/self/fd/5/trunksd.20181120.202026.1347.core due to developer image 2018-11-20T20:20:26.231571+00:00 INFO minijail0[1497]: libminijail[1497]: not running init loop, exiting immediately 2018-11-20T20:20:26.234942+00:00 WARNING kernel: [ 13.496069] init: trunksd main process (1347) killed by ABRT signal 2018-11-20T20:20:26.234964+00:00 WARNING kernel: [ 13.496106] init: trunksd main process ended, respawning ... 2018-11-20T20:20:26.399440+00:00 WARNING kernel: [ 13.661554] init: trunksd main process (1503) killed by ABRT signal 2018-11-20T20:20:26.399479+00:00 WARNING kernel: [ 13.661654] init: trunksd main process ended, respawning 2018-11-20T20:20:26.428916+00:00 INFO trunksd[1548]: TPM: /dev/tpm0 opened successfully 2018-11-20T20:20:26.429318+00:00 WARNING trunksd[1548]: libminijail[1548]: allowing syscall: socket 2018-11-20T20:20:26.429342+00:00 WARNING trunksd[1548]: libminijail[1548]: allowing syscall: connect 2018-11-20T20:20:26.429352+00:00 WARNING trunksd[1548]: libminijail[1548]: allowing syscall: gettimeofday 2018-11-20T20:20:26.429360+00:00 WARNING trunksd[1548]: libminijail[1548]: allowing syscall: send 2018-11-20T20:20:26.429371+00:00 WARNING trunksd[1548]: libminijail[1548]: allowing syscall: writev 2018-11-20T20:20:26.429648+00:00 WARNING trunksd[1548]: libminijail[1548]: logging seccomp filter failures 2018-11-20T20:20:26.429884+00:00 INFO trunksd[1548]: Trunks service started. 2018-11-20T20:20:26.430269+00:00 ERR trunksd[1548]: libminijail[1548]: blocked syscall: stat64 ... I don't know much about anything, but does stat64 or fstat64 need to be added to the seccomp policy? I see this in platform2/trunks: % grep stat64 *.policy trunksd-seccomp-arm.policy:fstat64: 1 trunksd-seccomp-mips.policy:fstat64: 1 trunksd-seccomp-x86.policy:fstat64: 1 trunksd-seccomp-x86.policy:stat64: 1
,
Nov 20
Re #6: and this is bob (which uses arm.policy, as does scarlet). Looks like something has indeed changed, and arm boards now call stat64 in some cases. I will submit a CL to add stat64 to trunksd-seccomp-arm.policy. Not sure if I should do it for other architectures, besides x86 and arm. So far, only bob and scarlet are failing.
,
Nov 20
Thanks! I'm happy to LGTM.
,
Nov 20
Submitted https://crrev.com/c/1344657 since we know stat64 is being called for arm, so at least this change is needed anyways. But given that we don't know what change led to stat64 being occasionally called, and it only happens infrequently: Is there any pattern like "tests fail only after running after Repair job on the device" or "only if there were no reboots since powerwash"? I'd like to repro to make sure we don't miss other unexpected syscalls.
,
Nov 21
Re patterns, I'm not sure. cautotest's UI for displaying a host's history is user-hostile enough that I always give up before I'm able to find the information that I'm looking for. :-( If you want to try, the failing test run on chromeos6-row4-rack13-host5 started at 2018/11/20 12:50:57 and the run on chromeos6-row2-rack7-host15 started at 2018/11/20 12:51:25.
,
Nov 21
Just a guess, nevertheless: Maybe this change in the set of employed syscalls is caused by the recent glibc update ( bug 834385 )?
,
Nov 21
> is caused by the recent glibc update Yes, quite likely. But we seem to lack a test that'd hit the branch with stat64 from trunksd consistently :( I don't see where we directly work with file paths in trunksd (besides /dev/tpm0), so if I had to guess, I'd say some code in libchrome/libbrillo - e.g. logger or some dbus handling, or even minijail->Enter - coupled with this glibc update... If only there was a way for libs like libchrome to list syscalls it needs for various sub-APIs, so that the users can include those in their seccomp filters. Otherwise, we risk releasing a version that occasionally crashes like here (say, when it needs to switch to a new logfile or something) but we don't catch it in tests. Or, we should stop using those libs from daemons with seccomp filters or become intimately familiar with what exact system functions they call.
,
Nov 21
RE comment 12: Yes, this situation seems risky, but it looks like that's the official policy in Chrome OS - it's probably expected that the autotest coverage would be sufficient to catch such errors, and that developers are monitoring autotests results for any SIGSYS-related crashes; see Mike's response to our question here: https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1163715/1/diagnostics/init/diagnosticsd.conf#37 +vapier@
,
Nov 21
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/9bb0baf5fc0f37ea5889e7cabaef8a80b72744b2 commit 9bb0baf5fc0f37ea5889e7cabaef8a80b72744b2 Author: Andrey Pronin <apronin@chromium.org> Date: Wed Nov 21 13:52:58 2018 trunks: allow stat64 for arm and mips Add stat64 to the list of allowed syscalls for arm and mips. BUG=chromium:906138 TEST=emerge-bob trunks; check that trunksd-seccomp.policy now contains stat64 Change-Id: Ib84cf23b14776b718629efbe300f5332fc532e62 Reviewed-on: https://chromium-review.googlesource.com/1344657 Commit-Ready: Andrey Pronin <apronin@chromium.org> Tested-by: Andrey Pronin <apronin@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> [modify] https://crrev.com/9bb0baf5fc0f37ea5889e7cabaef8a80b72744b2/trunks/trunksd-seccomp-mips.policy [modify] https://crrev.com/9bb0baf5fc0f37ea5889e7cabaef8a80b72744b2/trunks/trunksd-seccomp-arm.policy |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by apronin@chromium.org
, Nov 16