hwtests failing on wolf-tot-paladin, but not wolf-paladin
Issue description
https://uberchromegw.corp.google.com/i/chromeos/builders/wolf-tot-paladin

wolf-tot-paladin is repeatedly failing these tests:

[Test-Logs]: security_SandboxedServices: retry_count: 2, FAIL: One or more processes failed sandboxing
[Test-Logs]: security_StatefulPermissions: retry_count: 2, FAIL: Unexpected files/perms in stateful

That means these are real failures at TOT. However, the same tests on wolf-paladin have been passing. This leaves me very confused.
Comment 1 by dgarr...@chromium.org, Aug 8 2017
This is causing the CQ to make bad decisions for all runs that don't exit early.
Aug 8 2017
This is the first wolf-tot build to fail. https://luci-milo.appspot.com/buildbot/chromeos/wolf-tot-paladin/10903 That blamelist contains a lot of CLs. I don't yet see how any of them could be it, but....
Aug 8 2017
08/04 17:12:56.188 DEBUG| utils:0212| Running 'scanelf -qF'%s#F' -gs __asan_init `which debugd`'
08/04 17:12:56.209 DEBUG| asan:0026| running_on_asan(): symbol: '', _ASAN_SYMBOL: '__asan_init'
08/04 17:12:56.210 ERROR|security_Sandboxed:0276| cryptohomed: bad user: wanted "root" but got "cryptohome"
08/04 17:12:56.220 WARNI|security_Sandboxed:0320| Stale baselines: set(['thermal.sh', 'daisydog', 'attestationd', 'brcm_patchram_p', 'cromo', 'easy_unlock', 'sslh-fork', 'timberslide', 'wimax-manager', 'esif_ufd', 'lid_touchpad_he', 'arc_camera_serv', 'arc-networkd', 'tpm_managerd', 'arc-obb-mounter', 'conntrackd'])
08/04 17:12:56.223 ERROR|security_Sandboxed:0338| Failed sandboxing: ['cryptohomed']
08/04 17:12:56.227 DEBUG| test:0389| Test failed due to One or more processes failed sandboxing. Exception log follows the after_iteration_hooks.
08/04 17:12:56.227 DEBUG| test:0392| starting after_iteration_hooks
08/04 17:12:56.228 DEBUG| utils:0212| Running 'mkdir -p /usr/local/autotest/results/default/security_SandboxedServices/sysinfo/iteration.1/var/spool'
08/04 17:12:56.234 DEBUG| utils:0212| Running 'rsync --no-perms --chmod=ugo+r -a --safe-links --exclude=/crash/**autoserv* --exclude=/crash/*.core /var/spool/crash /usr/local/autotest/results/default/security_SandboxedServices/sysinfo/iteration.1/var/spool'
08/04 17:12:56.242 DEBUG| utils:0212| Running 'rm -rf /var/spool/crash/*'
08/04 17:12:56.249 DEBUG| utils:0212| Running 'logger "autotest finished iteration /usr/local/autotest/results/default/security_SandboxedServices/sysinfo/iteration.1"'
08/04 17:12:56.258 DEBUG| test:0395| after_iteration_hooks completed
08/04 17:12:56.259 WARNI| test:0612| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 606, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 806, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 470, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 380, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/security_SandboxedServices/security_SandboxedServices.py", line 339, in run_once
    raise error.TestFail('One or more processes failed sandboxing')
That's one failure example; we probably need to involve the sheriffs to look into the failures.
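For context on the "bad user" line above: the test compares each running service against a per-board baseline of expected users and sandbox features. A minimal sketch of that kind of check follows (the field names, baseline format, and ps invocation are illustrative assumptions, not the actual security_SandboxedServices code):

import csv
import subprocess

def load_baseline(path):
    # Hypothetical baseline format: CSV rows naming each service and the
    # user it is expected to run as.
    with open(path) as f:
        return {row['exe']: row['euser'] for row in csv.DictReader(f)}

def running_services():
    # Ask ps for the command name and effective user of every process.
    out = subprocess.check_output(['ps', '--no-headers', '-eo', 'comm,euser'])
    procs = {}
    for line in out.decode().splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            procs[parts[0]] = parts[1].strip()
    return procs

def check_sandboxing(baseline_path):
    baseline = load_baseline(baseline_path)
    procs = running_services()
    failures = []
    for exe, wanted in baseline.items():
        got = procs.get(exe)
        if got is not None and got != wanted:
            # Produces messages like:
            #   cryptohomed: bad user: wanted "root" but got "cryptohome"
            failures.append('%s: bad user: wanted "%s" but got "%s"'
                            % (exe, wanted, got))
    # Baseline entries with no matching running process would show up as the
    # "Stale baselines" warning seen in the log above.
    return failures

A mismatch like the one above simply means the baseline and the image under test disagree about which user cryptohomed should run as.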
Aug 8 2017
Out of sheer confusion, I've reinstanced (wiped) the wolf-tot builder, in case some bad state was being left behind on it.
Aug 9 2017
stimim: the host (chromeos4-row1-rack3-host5) is running old autotest; it does not have https://crosreview.com/596670
Aug 9 2017
If the test is server-side packaged, then the SSP isn't being updated properly, somehow. Otherwise, that CL might not have been pushed to all drones in the lab.
Aug 9 2017
We already have a lab push requested for the morning. I'll also track down which server wasn't updated to make sure it is now.
Aug 9 2017
Reinstancing the wolf-tot builder (which wiped the build state on that builder) fixed the problem. To me, this suggests an ebuild uprev problem: perhaps the wrong revision was cached away on the builder.
Aug 9 2017
Working theory.... A local uprev was performed and cached. Then an actual uprev was generated and submitted with the same revision. The locally cached version kept being used, which hid the CL in the real uprev. Wiping the builder wiped the cache, forced the actual ebuild to be used, and fixed the tests.
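As an illustration of that theory (the package name, revision number, and cache layout below are hypothetical, not the actual builder implementation), the failure mode amounts to a build artifact cache keyed only by the package version string, which cannot tell a stale locally-generated uprev apart from the submitted one when both land on the same revision:

# Hypothetical model of a build artifact cache keyed only by the package
# version string (package name and revision below are made up).
cache = {}

def build_or_reuse(package, version, source):
    key = (package, version)           # the key ignores what the sources were
    if key not in cache:
        cache[key] = 'binpkg built from ' + source
    return cache[key]

# A local uprev gets built and cached first, from sources without the fix.
build_or_reuse('chromeos-base/cryptohome', '0.0.1-r3001', 'old tree')

# The real uprev is submitted later with the same revision number; the cache
# keeps handing back the stale artifact, hiding the CL it carries.
print(build_or_reuse('chromeos-base/cryptohome', '0.0.1-r3001', 'new tree with the CL'))
# -> binpkg built from old tree

Under that model, wiping the builder is equivalent to clearing the cache, which is consistent with the fix observed above.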