
Issue 753526

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




hwtests failing on wolf-tot-paladin, but not wolf-paladin

Project Member Reported by dgarr...@chromium.org, Aug 8 2017

Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/wolf-tot-paladin

wolf-tot-paladin is repeatedly failing these tests:

[Test-Logs]: security_SandboxedServices: retry_count: 2, FAIL: One or more processes failed sandboxing
[Test-Logs]: security_StatefulPermissions: retry_count: 2, FAIL: Unexpected files/perms in stateful

That means that these are real failures in TOT. However, the same tests on wolf-paladin have been passing.

This leaves me very confused.
 
Summary: hwtests failing on wolf-tot-paladin, but not wolf-paladin (was: hwtests failing on TOT)
Labels: -Pri-3 Pri-1
This is causing the CQ to make bad decisions for all runs that don't exit early.
Cc: rajatja@chromium.org
Owner: denniskempin@chromium.org
This is the first wolf-tot build to fail.

https://luci-milo.appspot.com/buildbot/chromeos/wolf-tot-paladin/10903

That blamelist contains a lot of CLs. I don't yet see how any of them could be it, but....
Cc: nxia@chromium.org

Comment 6 by nxia@chromium.org, Aug 8 2017

08/04 17:12:56.188 DEBUG|             utils:0212| Running 'scanelf -qF'%s#F' -gs __asan_init `which debugd`'
08/04 17:12:56.209 DEBUG|              asan:0026| running_on_asan(): symbol: '', _ASAN_SYMBOL: '__asan_init'
08/04 17:12:56.210 ERROR|security_Sandboxed:0276| cryptohomed: bad user: wanted "root" but got "cryptohome"
08/04 17:12:56.220 WARNI|security_Sandboxed:0320| Stale baselines: set(['thermal.sh', 'daisydog', 'attestationd', 'brcm_patchram_p', 'cromo', 'easy_unlock', 'sslh-fork', 'timberslide', 'wimax-manager', 'esif_ufd', 'lid_touchpad_he', 'arc_camera_serv', 'arc-networkd', 'tpm_managerd', 'arc-obb-mounter', 'conntrackd'])
08/04 17:12:56.223 ERROR|security_Sandboxed:0338| Failed sandboxing: ['cryptohomed']
08/04 17:12:56.227 DEBUG|              test:0389| Test failed due to One or more processes failed sandboxing. Exception log follows the after_iteration_hooks.
08/04 17:12:56.227 DEBUG|              test:0392| starting after_iteration_hooks
08/04 17:12:56.228 DEBUG|             utils:0212| Running 'mkdir -p /usr/local/autotest/results/default/security_SandboxedServices/sysinfo/iteration.1/var/spool'
08/04 17:12:56.234 DEBUG|             utils:0212| Running 'rsync --no-perms --chmod=ugo+r -a --safe-links --exclude=/crash/**autoserv* --exclude=/crash/*.core /var/spool/crash /usr/local/autotest/results/default/security_SandboxedServices/sysinfo/iteration.1/var/spool'
08/04 17:12:56.242 DEBUG|             utils:0212| Running 'rm -rf /var/spool/crash/*'
08/04 17:12:56.249 DEBUG|             utils:0212| Running 'logger "autotest finished iteration /usr/local/autotest/results/default/security_SandboxedServices/sysinfo/iteration.1"'
08/04 17:12:56.258 DEBUG|              test:0395| after_iteration_hooks completed
08/04 17:12:56.259 WARNI|              test:0612| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 606, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 806, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 470, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 380, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/security_SandboxedServices/security_SandboxedServices.py", line 339, in run_once
    raise error.TestFail('One or more processes failed sandboxing')

This is one example failure. We probably need to involve the sheriffs to look into the failures.
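
For triaging more of these logs, here is a minimal sketch that pulls the "bad user" mismatches out of a test log. The pattern is modeled only on the single ERROR line quoted above, so the exact message format is an assumption, and `find_bad_users` is a hypothetical helper, not part of autotest.

```python
import re

# Pattern modeled on the one ERROR line quoted in this comment. Treat the
# message format as an assumption, not a documented autotest log format.
BAD_USER_RE = re.compile(
    r'(?P<proc>\S+): bad user: wanted "(?P<wanted>[^"]*)" but got "(?P<got>[^"]*)"'
)


def find_bad_users(log_text):
    """Return (process, wanted_user, actual_user) tuples found in log_text."""
    return [
        (m.group("proc"), m.group("wanted"), m.group("got"))
        for m in BAD_USER_RE.finditer(log_text)
    ]


# Two lines copied from the log above, used as sample input.
SAMPLE = (
    '08/04 17:12:56.210 ERROR|security_Sandboxed:0276| '
    'cryptohomed: bad user: wanted "root" but got "cryptohome"\n'
    "08/04 17:12:56.223 ERROR|security_Sandboxed:0338| "
    "Failed sandboxing: ['cryptohomed']\n"
)

if __name__ == "__main__":
    for proc, wanted, got in find_bad_users(SAMPLE):
        print(f"{proc}: wanted {wanted!r}, got {got!r}")
```

Running this over logs from both wolf-paladin and wolf-tot-paladin would show whether the two builders disagree on the same process.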
Out of sheer confusion, I've reinstanced (wiped) the wolf-tot builder, in case some bad state was being left behind on it.
stimim: 

the host (chromeos4-row1-rack3-host5) is running old autotest
it does not have https://crosreview.com/596670
If the test is server-side packaged (SSP), then the SSP somehow isn't being updated properly.

Otherwise, that CL might not have been pushed to all drones in the lab.
We already have a lab push requested for the morning.

I'll also track down which server wasn't updated to make sure it is now.
Cc: stimim@chromium.org
Reinstancing the wolf-tot builder, which wiped the build state on that builder, fixed the problem.

To me, this suggests an ebuild uprev problem. Perhaps the wrong revision was cached away on the builder.
Owner: dgarr...@chromium.org
Working theory....

A local uprev was performed and cached. Then an actual uprev was generated and submitted with the same revision. The locally cached version kept being used, which hid the CL in the real uprev.

Wiping the builder wiped the cache, forced the actual ebuild to be used, and fixed the tests.
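
A rough sketch of how such a collision could be detected, assuming (hypothetically) that the cached and the submitted ebuilds sit in two flat directories. The real cache layout on the builder is not described in this thread, and the package name and file contents below are made up for illustration. The idea: the same ebuild filename (so the same revision) with different contents means the cached copy is hiding whatever the real uprev contains.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of the file at path."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def find_revision_collisions(cached_dir, source_dir):
    """Return ebuild filenames present in both dirs whose contents differ.

    Identical filenames mean identical package revisions, so a content
    mismatch is the situation described in the working theory.
    """
    collisions = []
    for name in sorted(os.listdir(cached_dir)):
        if not name.endswith(".ebuild"):
            continue
        source_path = os.path.join(source_dir, name)
        if not os.path.isfile(source_path):
            continue
        cached_path = os.path.join(cached_dir, name)
        if file_digest(cached_path) != file_digest(source_path):
            collisions.append(name)
    return collisions


def demo():
    """Build two throwaway directories with one colliding revision."""
    name = "example-pkg-0.0.1-r42.ebuild"  # hypothetical package and revision
    with tempfile.TemporaryDirectory() as cached, \
            tempfile.TemporaryDirectory() as source:
        with open(os.path.join(cached, name), "wb") as f:
            f.write(b'COMMIT="local-uprev"\n')      # placeholder contents
        with open(os.path.join(source, name), "wb") as f:
            f.write(b'COMMIT="submitted-uprev"\n')  # placeholder contents
        return find_revision_collisions(cached, source)


if __name__ == "__main__":
    print(demo())
```

A check like this, run before the build reuses cached state, would flag the stale revision without wiping the whole builder.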
Status: Fixed (was: Untriaged)