evemu-device did not die resulting in security_SandboxedServices failing
Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Dec 14 2017
Filed by email@example.com on behalf of firstname.lastname@example.org
squawks-release:2042 failed
Builders failed on:
- squawks-release: https://luci-milo.appspot.com/buildbot/chromeos/squawks-release/2042
12/14 07:17:50.971 ERROR|security_Sandboxed:0334| New services are not allowed to run as root, but these are: ['evemu-device']
12/14 07:17:50.977 ERROR|security_Sandboxed:0338| Failed sandboxing: ['evemu-device']
12/14 07:17:51.538 ERROR| parallel:0026| child process failed
As part of investigations into why coral-release fails so often, I found this problem in one of the canaries: https://luci-milo.appspot.com/buildbot/chromeos/coral-release/810 Is anyone looking into this?
Seeing similar looking problem on veyron_minnie-tot-chrome-pfq-informational... see Issue 821185 ... dup?
Issue 821185 has been merged into this issue.
assuming we aren't running the tests in parallel (which i don't think we do in general), it sounds like the input_playback code is "leaking" this program. maybe we need to improve that code to make sure the program is killed?
What is "input_playback"?
some autotest code. it's the only thing i could find that spawns "evemu-device". https://chromium.googlesource.com/chromiumos/third_party/autotest/+/release-R66-10452.B/client/cros/input_playback/input_playback.py
Hey Katherine, looks like the input_playback code might be leaking some processes. Would it be possible for you to take a look?
As Mike points out, the test that's leaking the evemu-device process needs to reap the processes it creates.
This just happened on eve-tot-chrome-pfq-informational at http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8946114733070520976:
05/19 04:40:40.218 WARNI|security_Sandboxed:0325| Stale baselines: set(['thermal.sh', 'daisydog', 'cros_camera_service', 'brcm_patchram_p', 'tcsd', 'netfilter-queue', 'firewalld', 'app_process', 'wimax-manager', 'arc-networkd', 'upstart-udev-br', 'cromo', 'cros_camera_algo', 'arc-obb-mounter', 'lid_touchpad_he'])
05/19 04:40:40.221 WARNI|security_Sandboxed:0328| New services: set(['debuggerd64', 'debuggerd64:sig', 'cat', 'evemu-device', 'btdispatch', 'main'])
05/19 04:40:40.223 ERROR|security_Sandboxed:0339| New services are not allowed to run as root, but these are: ['evemu-device']
05/19 04:40:40.224 ERROR|security_Sandboxed:0343| Failed sandboxing: ['evemu-device']
05/19 04:40:40.246 WARNI| test:0637| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/security_SandboxedServices/security_SandboxedServices.py", line 344, in run_once
    raise error.TestFail('One or more processes failed sandboxing')
TestFail: One or more processes failed sandboxing
What's the state of this bug? I don't see any investigation here. I've tried to look through all the uses of InputPlayback in the autotest repo. Most of the tests that instantiate it call close() in their cleanup() methods, but there are some exceptions.
In client/cros/multimedia/arc_resource.py, ArcPlayVideoResource doesn't explicitly close the InputPlayback that it creates. Neither do the desktopui_CheckRlzPingSent or ui_AppLauncher tests. There's also client/cros/input_playback/stylus.py and client/cros/input_playback/keyboard.py. Stylus is closed correctly everywhere I looked, but Keyboard isn't explicitly closed by the power_Display test. I've uploaded https://crrev.com/c/1067074 as a speculative fix for the issues that I found, but I'm probably not a good owner for this; I never know which tests are expected to pass and which we don't care about anymore. Katherine, can you take that change over? I think we've had this discussion before, but I sort of feel like the right answer here is for security_SandboxedServices to become a server test that reboots the DUT first to make sure that it's in a known-good state.
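For anyone unfamiliar with the pattern being described, here is a minimal sketch of "close it in cleanup()". Everything in it is hypothetical: FakePlayer stands in for InputPlayback, ExampleTest is not a real autotest test, and run() is a toy harness that assumes cleanup() is always called, even when run_once() raises.

```python
class FakePlayer(object):
    """Hypothetical stand-in for InputPlayback; only records close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ExampleTest(object):
    """Hypothetical test with the initialize/run_once/cleanup shape."""

    def initialize(self):
        self.player = FakePlayer()

    def run_once(self):
        raise RuntimeError('simulated test failure')

    def cleanup(self):
        # Close the player here so it is released even if run_once() failed.
        player = getattr(self, 'player', None)
        if player is not None:
            player.close()


def run(test):
    """Toy harness (an assumption, not autotest's): always calls cleanup()."""
    test.initialize()
    try:
        test.run_once()
    except RuntimeError:
        pass
    finally:
        test.cleanup()
    return test.player.closed
```

The getattr() guard is there so cleanup() does not itself blow up if initialize() failed before the player was created.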
More failures: caroline-tot-chrome-pfq-informational: http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8946111767730469088 veyron_minnie-tot-chrome-pfq-informational: http://cros-goldeneye/chromeos/healthmonitoring/buildDetails?buildbucketId=8946106705233087760
I don't think the suggestion to reboot has come up before for this security test. the complaints have been that the test is a bit flaky in that it can catch transient bad cases, but not reliably. so people assume it's the test flaking (false positive) rather than the test only sometimes catching the bad behavior. I get the argument here about getting a clean state, but at the same time, leaking resources is a problem for all tests ... and I don't think we want to make them all reboot at the start.
This test in particular runs in the bvt-inline and smoke suites (ATTRIBUTES = "suite:bvt-inline, suite:smoke"), so we clearly care about it passing =). Notice that the failures are in informational PFQ suites. I agree with Mike: hopefully we can prevent the test from leaking processes. This is a pretty fast-running test, and I'd hate to have to make it a server test.
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/a87122c1102365d354fcacf1a320982759e15574
commit a87122c1102365d354fcacf1a320982759e15574
Author: Daniel Erat <email@example.com>
Date: Wed May 23 01:45:40 2018
autotest: Try to fix potential evemu-device process leaks.
security_SandboxedServices is failing sporadically due to unexpected evemu-device processes running as root. These look like they're started by the InputPlayback test, and they're presumably being left behind by earlier tests.
- Add an __exit__ method to the Keyboard class so it can be used in 'with' blocks.
- Make power_Display explicitly close Keyboard.
- Make ArcPlayVideoResource, desktopui_CheckRlzPingSent, and ui_AppLauncher explicitly close InputPlayback.
Also make security_SandboxedServices include process names in errors.
BUG= chromium:795128
TEST=none
Change-Id: I1eb25cf7129b00108d7f4753118e3967eb7abf81
Reviewed-on: https://chromium-review.googlesource.com/1067074
Commit-Ready: ChromeOS CL Exonerator Bot <firstname.lastname@example.org>
Tested-by: Dan Erat <email@example.com>
Reviewed-by: Katherine Threlkeld <firstname.lastname@example.org>
[modify] https://crrev.com/a87122c1102365d354fcacf1a320982759e15574/client/site_tests/desktopui_CheckRlzPingSent/desktopui_CheckRlzPingSent.py
[modify] https://crrev.com/a87122c1102365d354fcacf1a320982759e15574/client/cros/input_playback/keyboard.py
[modify] https://crrev.com/a87122c1102365d354fcacf1a320982759e15574/client/site_tests/security_SandboxedServices/security_SandboxedServices.py
[modify] https://crrev.com/a87122c1102365d354fcacf1a320982759e15574/client/site_tests/power_Display/power_Display.py
[modify] https://crrev.com/a87122c1102365d354fcacf1a320982759e15574/client/cros/multimedia/arc_resource.py
[modify] https://crrev.com/a87122c1102365d354fcacf1a320982759e15574/client/site_tests/ui_AppLauncher/ui_AppLauncher.py
Might be fixed now, might not be. If it's fixed, it'll probably regress again since it's really easy to forget to call close(). :-(
Looks like the change above broke my test. Failing 100% initializing inputPlayback(): https://stainless.corp.google.com/search?exclude_retried=true&first_date=2018-05-18&master_builder_name=&builder_name_number=&shard=&exclude_acts=true&builder_name=&master_builder_name_number=&owner=&retry=&exclude_cts=true&exclude_non_production=false&hostname=&board=&test=%5Erlz_CheckPing%24&suite=&build=%5ER68%5C-10707%5C.0%5C.0%24&status=FAIL&status=ERROR&status=ABORT&reason=&waterfall=&exclude_not_run=false&last_date=2018-05-24&exclude_non_release=true&exclude_au=true&model=&view=list
#17: Sorry about that. Looking...
Nearly every time I try to run rlz_CheckPing on a DUT in the lab using test_that, autotest just hangs on me with a useless log message that doesn't describe what it's actually doing:
$ test_that --board=cave chromeos2-row8-rack7-host19.cros desktopui_CheckRlzPingSent
...
14:38:46 INFO | autoserv| INFO ---- ---- kernel=3.18.0-17844-g7b8a3c85d4e1 localtime=May 24 14:38:46 timestamp=1527197926
14:38:46 INFO | autoserv| Installing autotest on chromeos2-row8-rack7-host19.cros
14:38:47 INFO | autoserv| Using installation dir /usr/local/autotest
14:38:48 INFO | autoserv| Installation of autotest completed from /build/cave/usr/local/build/autotest/client/
14:38:48 INFO | autoserv| Installing updated global_config.ini.
14:38:49 INFO | autoserv| Executing /usr/local/autotest/bin/autotest /usr/local/autotest/control phase 0
14:38:50 INFO | autoserv| Entered autotestd_monitor.
14:38:50 INFO | autoserv| Finished launching tail subprocesses.
14:38:50 INFO | autoserv| Finished waiting on autotestd to start.
14:38:50 INFO | autoserv| START ---- ---- timestamp=1527197930 localtime=May 24 14:38:50
14:38:50 INFO | autoserv| START desktopui_CheckRlzPingSent desktopui_CheckRlzPingSent timestamp=1527197930 localtime=May 24 14:38:50
14:38:51 INFO | autoserv| Bundling /build/cave/usr/local/build/autotest/client/site_tests/desktopui_CheckRlzPingSent into test-desktopui_CheckRlzPingSent.tar.bz2
It failed once, though.
Here's the full trace:
14:32:14.227 WARNI| test:0637| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 631, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 837, in _call_test_function
    raise error.UnhandledTestFail(e)
UnhandledTestFail: Unhandled AttributeError: __enter__
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 831, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 495, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 362, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 400, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/desktopui_CheckRlzPingSent/desktopui_CheckRlzPingSent.py", line 61, in run_once
    self._check_url_for_rlz(cr)
  File "/usr/local/autotest/tests/desktopui_CheckRlzPingSent/desktopui_CheckRlzPingSent.py", line 42, in _check_url_for_rlz
    with input_playback.InputPlayback() as player:
AttributeError: __enter__
I'd assumed that the presence of an __exit__ method on InputPlayback meant that it was usable in 'with' statements. But no, that doesn't appear to be the case. I'll try to fix it.
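The failure mode in that trace can be reproduced in isolation. This is only a sketch: OnlyExit is a hypothetical class, not the real InputPlayback. It catches both AttributeError and TypeError because the trace above shows AttributeError, while Python 3.11 and later raise TypeError for the same mistake.

```python
class OnlyExit(object):
    """Hypothetical class that defines __exit__ but not __enter__."""

    def close(self):
        pass

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()


def try_with(obj):
    """Return 'entered' if obj works in a with statement, else the error name."""
    try:
        with obj:
            return 'entered'
    except (AttributeError, TypeError) as e:
        # The with statement needs both __enter__ and __exit__ on the type.
        return type(e).__name__
```

The 'with' statement looks up both methods before running the body, so having __exit__ alone is never enough.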
Sent https://crrev.com/c/1072794, but I still can't get this test to run to the point where it either passes or fails.
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/f3b4b8ed9b4a4c643ce0d8596da12fbea0d236b0
commit f3b4b8ed9b4a4c643ce0d8596da12fbea0d236b0
Author: Daniel Erat <email@example.com>
Date: Tue May 29 05:46:27 2018
autotest: Fix __enter__/__exit__ in input playback classes.
Update the InputPlayback and Keyboard classes to define both __enter__ and __exit__ methods so they can be used in 'with' statements, e.g. 'with InputPlayback() as playback:'.
BUG= chromium:795128
TEST=none, since test_that almost always just hangs instead of running desktopui_CheckRlzPingSent when i use test_that :-/
Change-Id: Id2f8b7921637f85caf12e2f0fc7614b3ec83583d
Reviewed-on: https://chromium-review.googlesource.com/1072794
Commit-Ready: Dan Erat <firstname.lastname@example.org>
Tested-by: Dan Erat <email@example.com>
Reviewed-by: David Haddock <firstname.lastname@example.org>
[modify] https://crrev.com/f3b4b8ed9b4a4c643ce0d8596da12fbea0d236b0/client/cros/input_playback/keyboard.py
[modify] https://crrev.com/f3b4b8ed9b4a4c643ce0d8596da12fbea0d236b0/client/cros/input_playback/input_playback.py
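As an illustration of what defining both methods buys, here is a rough sketch of a context manager that owns a child process and reaps it on exit. It is not the code from the CL: ChildProcess is a hypothetical class, and a sleeping Python interpreter stands in for evemu-device.

```python
import subprocess
import sys


class ChildProcess(object):
    """Hypothetical wrapper that owns one child process and reaps it."""

    def __init__(self):
        # Stand-in for evemu-device: a child that would outlive the test.
        self._proc = subprocess.Popen(
            [sys.executable, '-c', 'import time; time.sleep(60)'])

    def is_running(self):
        return self._proc.poll() is None

    def close(self):
        if self._proc.poll() is None:
            self._proc.terminate()
        # wait() reaps the child so it is not left behind.
        self._proc.wait()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        # Returning None lets any exception from the body propagate.


def demo():
    """Run a failing 'test body' and report the child's state before/after."""
    try:
        with ChildProcess() as child:
            was_running = child.is_running()
            raise ValueError('simulated test failure')
    except ValueError:
        pass
    return was_running, child.is_running()
```

The point of the 'with' form is that close() runs even when the body raises, which is the case an explicit close() call at the end of a test tends to miss.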
I'm going to tentatively call this fixed now.
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/3defff0054ea6590ae07889e366c324e715e862b
commit 3defff0054ea6590ae07889e366c324e715e862b
Author: Daniel Erat <email@example.com>
Date: Mon Jul 09 19:42:43 2018
autotest: Add proc exclusions to security_SandboxedServices.
Exclude "cras_test_clien" and "evemu-device" in security_SandboxedServices. These processes can be left behind by earlier misbehaving or aborted tests.
BUG= chromium:795128 , chromium:853804 , chromium:860107
TEST=none
Change-Id: I5db939d08ddda1709fde0b2b671d0389f66d8e33
Reviewed-on: https://chromium-review.googlesource.com/1125513
Commit-Ready: Dan Erat <firstname.lastname@example.org>
Tested-by: Dan Erat <email@example.com>
Reviewed-by: Jorge Lucangeli Obes <firstname.lastname@example.org>
[modify] https://crrev.com/3defff0054ea6590ae07889e366c324e715e862b/client/site_tests/security_SandboxedServices/exclude
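To make the effect of an exclusion list concrete, here is a small sketch of how excluded names could be filtered out before reporting new root services. The function names, the one-name-per-line format, and the '#' comment handling are all assumptions for illustration; the real exclude file and the test's parsing may differ.

```python
def parse_exclusions(text):
    """Parse an exclusion list: one process name per line (assumed format)."""
    names = set()
    for line in text.splitlines():
        # Treating '#' as a comment marker is an assumption of this sketch.
        line = line.split('#', 1)[0].strip()
        if line:
            names.add(line)
    return names


def unexpected_root_services(running, baseline, exclusions):
    """Return running names that are neither baselined nor excluded."""
    return sorted(
        name for name in running
        if name not in baseline and name not in exclusions)


exclusions = parse_exclusions(
    '# left behind by earlier tests\ncras_test_clien\nevemu-device\n')
flagged = unexpected_root_services(
    ['evemu-device', 'debuggerd64', 'tcsd'], {'tcsd'}, exclusions)
```

Note that the excluded name is "cras_test_clien", not "cras_test_client": the names in these logs are already truncated, so an exclusion has to match the truncated form.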