|security_SandboxedServices failing daisydog checks in VMTest|
|Project Member Reported by akes...@chromium.org, Nov 19 2016||Back to list|
First instance: https://uberchromegw.corp.google.com/i/chromeos/builders/x86-mario-paladin/builds/11919 Then we had a few green runs, followed by: https://uberchromegw.corp.google.com/i/chromeos/builders/x86-mario-paladin/builds/11923 After which it has been failing consistently. The relevant error log seems to be: 11/18 20:29:20.062 DEBUG| base_utils:0185| Running 'scanelf -qF'%s#F' -gs __asan_init `which debugd`' 11/18 20:29:20.076 DEBUG| asan:0026| running_on_asan(): symbol: '', _ASAN_SYMBOL: '__asan_init' 11/18 20:29:20.077 ERROR|security_Sandboxed:0270| daisydog: bad user: wanted "watchdog" but got "root" 11/18 20:29:20.081 WARNI|security_Sandboxed:0308| Stale baselines: set(['thermal.sh', 'attestationd', 'brcm_patchram_p', 'cromo', 'easy_unlock', 'sslh-fork', 'timberslide', 'wimax-manager', 'esif_ufd', 'arc-networkd', 'X', 'tpm_managerd', 'tcsd', 'arc-obb-mounter', 'lid_touchpad_he']) 11/18 20:29:20.083 WARNI|security_Sandboxed:0311| New services: set(['avahi-daemon', 'nacl_helper_non']) 11/18 20:29:20.085 ERROR|security_Sandboxed:0326| Failed sandboxing: ['daisydog'] 11/18 20:29:20.088 DEBUG| base_utils:0185| Running 'logger "autotest finished iteration /usr/local/autotest/results/default/security_SandboxedServices/sysinfo/iteration.1"' 11/18 20:29:20.091 WARNI| test:0606| Autotest caught exception when running test: Traceback (most recent call last): File "/usr/local/autotest/common_lib/test.py", line 600, in _exec _call_test_function(self.execute, *p_args, **p_dargs) File "/usr/local/autotest/common_lib/test.py", line 804, in _call_test_function return func(*args, **dargs) File "/usr/local/autotest/common_lib/test.py", line 461, in execute dargs) File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry postprocess_profiled_run, args, dargs) File "/usr/local/autotest/common_lib/test.py", line 376, in _call_run_once self.run_once(*args, **dargs) File "/usr/local/autotest/tests/security_SandboxedServices/security_SandboxedServices.py", line 327, in run_once raise error.TestFail("One or more processes failed sandboxing")
Nov 19 2016,
Nov 19 2016,
vapier knows about daisydog.
Nov 20 2016,
/var/log/messages indicates the watchdog device doesn't actually exist on mario 2016-11-19T01:54:38.298342+00:00 WARNING kernel: [ 4.297259] init: daisydog main process (475) terminated with status 1 2016-11-19T01:54:38.298367+00:00 WARNING kernel: [ 4.297297] init: daisydog main process ended, respawning 2016-11-19T01:54:38.894185+00:00 WARNING kernel: [ 4.893339] init: daisydog main process (574) terminated with status 1 2016-11-19T01:54:38.894202+00:00 WARNING kernel: [ 4.893414] init: daisydog main process ended, respawning 2016-11-19T01:54:39.738149+00:00 WARNING kernel: [ 5.714259] init: daisydog main process (631) terminated with status 1 2016-11-19T01:54:39.738347+00:00 WARNING kernel: [ 5.714299] init: daisydog main process ended, respawning 2016-11-19T01:54:40.527066+00:00 WARNING kernel: [ 6.526282] init: daisydog main process (738) terminated with status 1 2016-11-19T01:54:40.530057+00:00 WARNING kernel: [ 6.529907] init: daisydog main process ended, respawning i don't see platform_HWwatchdog being run. is that test properly configured ?
Nov 20 2016,
Release builder is also starting to fail: https://uberchromegw.corp.google.com/i/chromeos/builders/x86-mario-release/builds/5438 https://crosland.corp.google.com/log/9004.0.0..9005.0.0, I see one interesting minijail change: https://chromium-review.googlesource.com/#/c/412420/ (daisydog runs in minijail) That change was included in the first failure: https://uberchromegw.corp.google.com/i/chromeos/builders/x86-mario-paladin/builds/11919 Then the CL was committed as part of this master-paladin run: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/12926, where, for some reason, the test passed on x86-mario-paladin: https://uberchromegw.corp.google.com/i/chromeos/builders/x86-mario-paladin/builds/11922 Later builds starting failing frequently.
Nov 20 2016,
Nov 20 2016,
Revert here: https://chromium-review.googlesource.com/#/c/413160/
Nov 20 2016,
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/21d455364ac7091e72bcce63f4f137daeeb765ef commit 21d455364ac7091e72bcce63f4f137daeeb765ef Author: Nicolas Boichat <email@example.com> Date: Sun Nov 20 10:24:42 2016 Revert "minijail: bring in network namespace loopback support" This reverts commit e92f13937b1f096c490784b492a5f83c1d4ae99f, which appears to be causing failures in x86-mario-paladin. BUG= chromium:665649 BUG= chromium:667075 TEST=x86-mario-paladin VMTest security_SandboxedServices passes. Change-Id: Icfc559ef58dbe07843df1153425007a3c811deeb Reviewed-on: https://chromium-review.googlesource.com/413160 Commit-Queue: Nicolas Boichat <firstname.lastname@example.org> Commit-Queue: Sean Kau <email@example.com> Tested-by: Nicolas Boichat <firstname.lastname@example.org> Trybot-Ready: Nicolas Boichat <email@example.com> Reviewed-by: Nicolas Boichat <firstname.lastname@example.org> Reviewed-by: Sean Kau <email@example.com> [rename] https://crrev.com/21d455364ac7091e72bcce63f4f137daeeb765ef/chromeos-base/chromeos-minijail/chromeos-minijail-0.0.1-r1470.ebuild
Nov 20 2016,
Last 2 builds after revert seem fine: https://uberchromegw.corp.google.com/i/chromeos/builders/x86-mario-paladin/builds/11934 https://uberchromegw.corp.google.com/i/chromeos/builders/x86-mario-paladin/builds/11935 Closing, let's follow-up in Issue 665649 .
Nov 22 2016,
Mike, I don't think platform_HWwatchdog test is routinely run all platforms. Even though I wrote that test, I wasn't able to get it working on all possible platforms. My goal was to validate new platforms would have a working watchdog...but not all platforms use /dev/watchdog. :/
Nov 29 2016,
i'm not sure how this is passing today short of sheer luck. running daisydog in any VM doesn't make sense because we don't enable any virtual/pseudo watchdog drivers which means daisydog will always fail to start. the reason this test is failing is that we run it after we expect the system to be in a quiescent state. but daisydog is in a restart loop here. and when we launch via minijail, there's a short window where the daisydog process is execed as root because the minijail preload hasn't yet executed the chuid/chgid yet which means the ps output is root:root. what i'm not seeing is why daisydog is in a restart loop in the first place. i would have expected upstart to start it, fail, respawn a few times, and then give up. then this test wouldn't have failed because daisydog would have stopped spawning by the time it ran.
Nov 29 2016,
ok, grabbing the image and playing around, the issue is that the use of namespaces (plus running under a VM env which is naturally a little slower than hardware) is slowing the startup just enough to keep it below the threshold of the respawn cap. the default cap is 10 respawns within 5 seconds, but we're taking almost 1 second to startup+fail which means we stay below the threshold indefinitely (1 fail/sec is less than the required 2/fail per sec to be halted). i imagine my change to minijail where we bring up the loopback device slows things down just enough to go from breaking the threshold to staying under it. then the security_SandboxedServices autotest runs and gets upset. so the root cause is still that we shouldn't be spawning daisydog in the first place on VMs. or we should be providing a pseudo watchdog device.
Dec 6 2016,
Dec 12 2016,
Dec 13 2016,
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/daisydog/+/ed22845616b31b7e0116a354f7183572c7f912e6 commit ed22845616b31b7e0116a354f7183572c7f912e6 Author: Mike Frysinger <firstname.lastname@example.org> Date: Mon Dec 12 22:41:24 2016 init: fix startup check logic In my zeal to kill off pointless processes, I missed that this code needs to wait around for daisydog to return when checking for the previous boot status. Rework the env vars so that we don't pass -i during the check phase. BUG= chromium:667075 TEST=added some debugging and saw that the pre-script check phase processed output Change-Id: I005abffd4de489f8d0136491cb11b8515e60824a Reviewed-on: https://chromium-review.googlesource.com/419195 Commit-Ready: Mike Frysinger <email@example.com> Tested-by: Mike Frysinger <firstname.lastname@example.org> Reviewed-by: Mike Frysinger <email@example.com> [modify] https://crrev.com/ed22845616b31b7e0116a354f7183572c7f912e6/daisydog.conf
Dec 14 2016,
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/daisydog/+/8a756a4d5b30069c5ae5013648785c3b808b57eb commit 8a756a4d5b30069c5ae5013648785c3b808b57eb Author: Mike Frysinger <firstname.lastname@example.org> Date: Mon Dec 12 22:45:52 2016 init: halt service if initial check fails If the pre-startup phase fails, then there's a problem with the watchdog. Stop upstart from trying to respawn us all the time in that case. This fixes the restart storm/spam seen when we build images for real devices, but then boot them in a VM where there is no watchdog service. BUG= chromium:667075 TEST=startup still works on a real device TEST=startup halts in a VM Change-Id: Icbd22f04dd434ad1e1fe8ae42a2385232b419be0 Reviewed-on: https://chromium-review.googlesource.com/419215 Commit-Ready: Mike Frysinger <email@example.com> Tested-by: Mike Frysinger <firstname.lastname@example.org> Reviewed-by: Mattias Nissler <email@example.com> [modify] https://crrev.com/8a756a4d5b30069c5ae5013648785c3b808b57eb/daisydog.conf
Dec 15 2016,
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/0a157aaf3b45f79d914bf19980cb97ffce674245 commit 0a157aaf3b45f79d914bf19980cb97ffce674245 Author: Mike Frysinger <firstname.lastname@example.org> Date: Thu Nov 17 19:47:55 2016 minijail: bring in network namespace loopback support [reland] We've updated daisydog's init process to deal with failing in VMs, so the issue raised in chromium:667075 should be fixed now. Changes: 7559dfe9ed16 minijail: bring up loopback interface in new net namespaces 531ec3747e71 build: pass down CC from make env BUG= chromium:665649 BUG= chromium:667075 TEST=`sudo ./minijail0 -r -v -p -e /sbin/ip a s` shows loopback as up Change-Id: If8960e31b110f3a38ab2ef45810c42866ae1bfe7 Reviewed-on: https://chromium-review.googlesource.com/420383 Commit-Ready: Mike Frysinger <email@example.com> Tested-by: Mike Frysinger <firstname.lastname@example.org> Reviewed-by: Dylan Reid <email@example.com> Reviewed-by: Nicolas Boichat <firstname.lastname@example.org> [rename] https://crrev.com/0a157aaf3b45f79d914bf19980cb97ffce674245/chromeos-base/chromeos-minijail/chromeos-minijail-0.0.1-r1471.ebuild
Dec 15 2016,
should be all set now
Mar 4 2017,
Apr 17 2017,
May 30 2017,
|► Sign in to add a comment|