Project: chromium Issues People Development process History Sign in
New issue
Advanced search Search tips
Starred by 1 user
Status: Archived
Owner:
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 673501



Sign in to add a comment
security_SandboxedServices failing daisydog checks in VMTest
Project Member Reported by akes...@chromium.org, Nov 19 2016 Back to list
First instance:
https://uberchromegw.corp.google.com/i/chromeos/builders/x86-mario-paladin/builds/11919

Then we had a few green runs, followed by:
https://uberchromegw.corp.google.com/i/chromeos/builders/x86-mario-paladin/builds/11923

After which it has been failing consistently.


The relevant error log seems to be:

11/18 20:29:20.062 DEBUG|        base_utils:0185| Running 'scanelf -qF'%s#F' -gs __asan_init `which debugd`'
11/18 20:29:20.076 DEBUG|              asan:0026| running_on_asan(): symbol: '', _ASAN_SYMBOL: '__asan_init'
11/18 20:29:20.077 ERROR|security_Sandboxed:0270| daisydog: bad user: wanted "watchdog" but got "root"
11/18 20:29:20.081 WARNI|security_Sandboxed:0308| Stale baselines: set(['thermal.sh', 'attestationd', 'brcm_patchram_p', 'cromo', 'easy_unlock', 'sslh-fork', 'timberslide', 'wimax-manager', 'esif_ufd', 'arc-networkd', 'X', 'tpm_managerd', 'tcsd', 'arc-obb-mounter', 'lid_touchpad_he'])
11/18 20:29:20.083 WARNI|security_Sandboxed:0311| New services: set(['avahi-daemon', 'nacl_helper_non'])
11/18 20:29:20.085 ERROR|security_Sandboxed:0326| Failed sandboxing: ['daisydog']
11/18 20:29:20.088 DEBUG|        base_utils:0185| Running 'logger "autotest finished iteration /usr/local/autotest/results/default/security_SandboxedServices/sysinfo/iteration.1"'
11/18 20:29:20.091 WARNI|              test:0606| Autotest caught exception when running test:
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 600, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 804, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 461, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 376, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/security_SandboxedServices/security_SandboxedServices.py", line 327, in run_once
    raise error.TestFail("One or more processes failed sandboxing")
 
Labels: -Pri-2 Pri-1
Owner: vapier@chromium.org
vapier knows about daisydog.
Comment 3 by vapier@chromium.org, Nov 20 2016
Cc: grundler@chromium.org benzh@chromium.org
Owner: benzh@chromium.org
/var/log/messages indicates the watchdog device doesn't actually exist on mario

2016-11-19T01:54:38.298342+00:00 WARNING kernel: [    4.297259] init: daisydog main process (475) terminated with status 1
2016-11-19T01:54:38.298367+00:00 WARNING kernel: [    4.297297] init: daisydog main process ended, respawning
2016-11-19T01:54:38.894185+00:00 WARNING kernel: [    4.893339] init: daisydog main process (574) terminated with status 1
2016-11-19T01:54:38.894202+00:00 WARNING kernel: [    4.893414] init: daisydog main process ended, respawning
2016-11-19T01:54:39.738149+00:00 WARNING kernel: [    5.714259] init: daisydog main process (631) terminated with status 1
2016-11-19T01:54:39.738347+00:00 WARNING kernel: [    5.714299] init: daisydog main process ended, respawning
2016-11-19T01:54:40.527066+00:00 WARNING kernel: [    6.526282] init: daisydog main process (738) terminated with status 1
2016-11-19T01:54:40.530057+00:00 WARNING kernel: [    6.529907] init: daisydog main process ended, respawning

i don't see platform_HWwatchdog being run.  is that test properly configured ?
Owner: vapier@chromium.org
Status: Available
Release builder is also starting to fail: https://uberchromegw.corp.google.com/i/chromeos/builders/x86-mario-release/builds/5438

https://crosland.corp.google.com/log/9004.0.0..9005.0.0, I see one interesting minijail change: https://chromium-review.googlesource.com/#/c/412420/ (daisydog runs in minijail)

That change was included in the first failure:
https://uberchromegw.corp.google.com/i/chromeos/builders/x86-mario-paladin/builds/11919

Then the CL was committed as part of this master-paladin run:
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/12926, where, for some reason, the test passed on x86-mario-paladin: https://uberchromegw.corp.google.com/i/chromeos/builders/x86-mario-paladin/builds/11922

Later builds starting failing frequently.
Cc: drinkcat@chromium.org
Summary: security_SandboxedServices VMTest failing consistently on x86-mario-paladin (was: security_SandboxServices VMTest failing consistently on x86-mario-paladin)
Project Member Comment 7 by bugdroid1@chromium.org, Nov 20 2016
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/21d455364ac7091e72bcce63f4f137daeeb765ef

commit 21d455364ac7091e72bcce63f4f137daeeb765ef
Author: Nicolas Boichat <drinkcat@chromium.org>
Date: Sun Nov 20 10:24:42 2016

Revert "minijail: bring in network namespace loopback support"

This reverts commit e92f13937b1f096c490784b492a5f83c1d4ae99f, which
appears to be causing failures in x86-mario-paladin.

BUG= chromium:665649 
BUG= chromium:667075 
TEST=x86-mario-paladin VMTest security_SandboxedServices passes.

Change-Id: Icfc559ef58dbe07843df1153425007a3c811deeb
Reviewed-on: https://chromium-review.googlesource.com/413160
Commit-Queue: Nicolas Boichat <drinkcat@chromium.org>
Commit-Queue: Sean Kau <skau@chromium.org>
Tested-by: Nicolas Boichat <drinkcat@chromium.org>
Trybot-Ready: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Sean Kau <skau@chromium.org>

[rename] https://crrev.com/21d455364ac7091e72bcce63f4f137daeeb765ef/chromeos-base/chromeos-minijail/chromeos-minijail-0.0.1-r1470.ebuild

Mike,
I don't think platform_HWwatchdog test is routinely run all platforms. Even though I wrote that test, I wasn't able to get it working on all possible platforms. My goal was to validate new platforms would have a working watchdog...but not all platforms use /dev/watchdog. :/
i'm not sure how this is passing today short of sheer luck.  running daisydog in any VM doesn't make sense because we don't enable any virtual/pseudo watchdog drivers which means daisydog will always fail to start.

the reason this test is failing is that we run it after we expect the system to be in a quiescent state.  but daisydog is in a restart loop here.  and when we launch via minijail, there's a short window where the daisydog process is execed as root because the minijail preload hasn't yet executed the chuid/chgid yet which means the ps output is root:root.

what i'm not seeing is why daisydog is in a restart loop in the first place.  i would have expected upstart to start it, fail, respawn a few times, and then give up.  then this test wouldn't have failed because daisydog would have stopped spawning by the time it ran.
Status: Assigned
Summary: security_SandboxedServices failing daisydog checks in VMTest (was: security_SandboxedServices VMTest failing consistently on x86-mario-paladin)
ok, grabbing the image and playing around, the issue is that the use of namespaces (plus running under a VM env which is naturally a little slower than hardware) is slowing the startup just enough to keep it below the threshold of the respawn cap.  the default cap is 10 respawns within 5 seconds, but we're taking almost 1 second to startup+fail which means we stay below the threshold indefinitely (1 fail/sec is less than the required 2/fail per sec to be halted).

i imagine my change to minijail where we bring up the loopback device slows things down just enough to go from breaking the threshold to staying under it.  then the security_SandboxedServices autotest runs and gets upset.

so the root cause is still that we shouldn't be spawning daisydog in the first place on VMs.  or we should be providing a pseudo watchdog device.
Labels: -current-issue
Blockedon: 673501
Project Member Comment 14 by bugdroid1@chromium.org, Dec 13 2016
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/daisydog/+/ed22845616b31b7e0116a354f7183572c7f912e6

commit ed22845616b31b7e0116a354f7183572c7f912e6
Author: Mike Frysinger <vapier@chromium.org>
Date: Mon Dec 12 22:41:24 2016

init: fix startup check logic

In my zeal to kill off pointless processes, I missed that this code needs
to wait around for daisydog to return when checking for the previous boot
status.  Rework the env vars so that we don't pass -i during the check
phase.

BUG= chromium:667075 
TEST=added some debugging and saw that the pre-script check phase processed output

Change-Id: I005abffd4de489f8d0136491cb11b8515e60824a
Reviewed-on: https://chromium-review.googlesource.com/419195
Commit-Ready: Mike Frysinger <vapier@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/ed22845616b31b7e0116a354f7183572c7f912e6/daisydog.conf

Project Member Comment 15 by bugdroid1@chromium.org, Dec 14 2016
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/daisydog/+/8a756a4d5b30069c5ae5013648785c3b808b57eb

commit 8a756a4d5b30069c5ae5013648785c3b808b57eb
Author: Mike Frysinger <vapier@chromium.org>
Date: Mon Dec 12 22:45:52 2016

init: halt service if initial check fails

If the pre-startup phase fails, then there's a problem with the watchdog.
Stop upstart from trying to respawn us all the time in that case.  This
fixes the restart storm/spam seen when we build images for real devices,
but then boot them in a VM where there is no watchdog service.

BUG= chromium:667075 
TEST=startup still works on a real device
TEST=startup halts in a VM

Change-Id: Icbd22f04dd434ad1e1fe8ae42a2385232b419be0
Reviewed-on: https://chromium-review.googlesource.com/419215
Commit-Ready: Mike Frysinger <vapier@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Mattias Nissler <mnissler@chromium.org>

[modify] https://crrev.com/8a756a4d5b30069c5ae5013648785c3b808b57eb/daisydog.conf

Project Member Comment 16 by bugdroid1@chromium.org, Dec 15 2016
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/0a157aaf3b45f79d914bf19980cb97ffce674245

commit 0a157aaf3b45f79d914bf19980cb97ffce674245
Author: Mike Frysinger <vapier@chromium.org>
Date: Thu Nov 17 19:47:55 2016

minijail: bring in network namespace loopback support [reland]

We've updated daisydog's init process to deal with failing in VMs,
so the issue raised in chromium:667075 should be fixed now.

Changes:
7559dfe9ed16 minijail: bring up loopback interface in new net namespaces
531ec3747e71 build: pass down CC from make env

BUG= chromium:665649 
BUG= chromium:667075 
TEST=`sudo ./minijail0 -r -v -p -e /sbin/ip a s` shows loopback as up

Change-Id: If8960e31b110f3a38ab2ef45810c42866ae1bfe7
Reviewed-on: https://chromium-review.googlesource.com/420383
Commit-Ready: Mike Frysinger <vapier@chromium.org>
Tested-by: Mike Frysinger <vapier@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>

[rename] https://crrev.com/0a157aaf3b45f79d914bf19980cb97ffce674245/chromeos-base/chromeos-minijail/chromeos-minijail-0.0.1-r1471.ebuild

Status: Fixed
should be all set now
Comment 18 by dchan@google.com, Mar 4 2017
Labels: VerifyIn-58
Labels: VerifyIn-59
Labels: VerifyIn-60
Labels: VerifyIn-61
Status: Archived
Sign in to add a comment