security_SandboxedServices keeps failing in wolf-tot-paladin
Issue description
Master-paladin: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/16273
wolf-tot-paladin: https://luci-milo.appspot.com/buildbot/chromeos/wolf-tot-paladin/11348
09/15 00:12:48.425 ERROR|security_Sandboxed:0276| cryptohomed: bad user: wanted "cryptohome" but got "root"
The "baseline" file in the autotest contains cryptohomed,root,root,No,No,No,No, which suggests that the expected euser is root. It looks strange to me that the error message says "cryptohome" was wanted. Mike, you have maintained this autotest over the past year; would you please help take a look? Thanks!
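The `wanted` value in this message appears to come from the baseline the test actually loaded, so "wanted cryptohome" would mean the test ran with an older baseline than the `cryptohomed,root,root,...` entry in the tree. A minimal Python sketch of such a check, assuming the baseline is plain CSV whose first three columns are exe, euser, and egroup; `parse_baseline` and `check_user` are hypothetical helpers, not the autotest's real code, and the `cryptohome` row only stands in for a stale baseline:

```python
# Hypothetical sketch: how a CSV baseline check can produce the
# 'bad user: wanted X but got Y' error. Column meanings are assumptions,
# not taken from security_SandboxedServices itself.
import csv
import io
from typing import Dict, Optional


def parse_baseline(text: str) -> Dict[str, Dict[str, str]]:
    """Map process name -> expected euser/egroup from baseline text.

    Assumed column order: exe, euser, egroup, then Yes/No flags.
    """
    entries: Dict[str, Dict[str, str]] = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines, comment lines, and rows too short to use.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        entries[row[0]] = {"euser": row[1], "egroup": row[2]}
    return entries


def check_user(baseline: Dict[str, Dict[str, str]], exe: str,
               actual_euser: str) -> Optional[str]:
    """Return an error message if the running euser differs from the baseline."""
    wanted = baseline[exe]["euser"]
    if wanted != actual_euser:
        return '%s: bad user: wanted "%s" but got "%s"' % (exe, wanted, actual_euser)
    return None


# The expected user in the message is whatever the loaded baseline says.
current = parse_baseline("cryptohomed,root,root,No,No,No,No\n")
stale = parse_baseline("cryptohomed,cryptohome,cryptohome,No,No,No,No\n")
print(check_user(current, "cryptohomed", "root"))  # None
print(check_user(stale, "cryptohomed", "root"))
# cryptohomed: bad user: wanted "cryptohome" but got "root"
```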
,
Sep 15 2017
The wolf-tot-paladin has failed continuously; raising the priority.
,
Sep 15 2017
The background is that cryptohome was running as the `cryptohome` user, but we reverted that and went back to running as root. I updated the test; in this case I believe the old test is running against the new changes. crbug.com/764540
,
Sep 15 2017
Hi, I was wondering if you could list the ebuild which is related to this test? That way, we can figure out whether this is an uprev logic problem (less likely), or a stale artifact problem (more likely, since this has happened before and I don't know the root cause).
,
Sep 15 2017
The commits landed for this test are:

The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/211deb059ef5d04c12f43edb8dd0cd5e141d2ff3

commit 211deb059ef5d04c12f43edb8dd0cd5e141d2ff3
Author: Greg Kerr <kerrnel@chromium.org>
Date: Fri Sep 15 01:01:49 2017

Revert cryptohome sandboxing changes.

This reverts the CLs to sandbox cryptohome which cause regressions.

This reverts commit f7ff65bd0e20ce532d97cc511e1c0ff1749ae91d
Author: Matthew Denton <mpdenton@google.com>
Date: Thu Jul 20 07:36:56 2017

Update cryptohome ebuild to create cryptohome user

Updated the cryptohome ebuild file with a pkg_preinst to create the new user and group "cryptohome".

This reverts commit 3bb107b6d68d92f23927b99643934be3554a6668
Author: Matthew Denton <mpdenton@google.com>
Date: Fri Aug 04 22:39:04 2017

upstart: Create dircrypto keyring with owner "cryptohome"

This CL creates the dircrypto keyring with owner user and group as "cryptohome". This is necessary in order to run cryptohome as non-root user "cryptohome", as cryptohome needs to create keys in the dircrypto keyring. The keyring was originally owned (and could only be modified by) root.

BUG=chromium:741786,chromium:764540
TEST=pre-cq

Change-Id: I24a9fdd6c6251001fb807bbd5cf3674dbafca3e0
Reviewed-on: https://chromium-review.googlesource.com/665345
Commit-Ready: Greg Kerr <kerrnel@chromium.org>
Tested-by: Greg Kerr <kerrnel@chromium.org>
Reviewed-by: Matthew Denton <mpdenton@google.com>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/211deb059ef5d04c12f43edb8dd0cd5e141d2ff3/sys-apps/upstart/files/upstart-1.2-dircrypto.patch
[modify] https://crrev.com/211deb059ef5d04c12f43edb8dd0cd5e141d2ff3/chromeos-base/cryptohome/cryptohome-9999.ebuild
[rename] https://crrev.com/211deb059ef5d04c12f43edb8dd0cd5e141d2ff3/sys-apps/upstart/upstart-1.2-r21.ebuild

The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/0ec311758db7f10181886053eeab88366d55a2c2

commit 0ec311758db7f10181886053eeab88366d55a2c2
Author: Greg Kerr <kerrnel@chromium.org>
Date: Fri Sep 15 01:01:49 2017

Revert commits for cryptohome baseline test changes.

This reverts commit 0be215d51744c5a682ab6d5b70f24f4505f0e2f3
Author: Matthew Denton <mpdenton@google.com>
Date: Thu Jul 20 07:36:56 2017

Add cryptohome to the baseline accounts test.

This adds cryptohome to the baseline accounts test to note its new group.

This reverts commit 2f972f0c098113976403925fadd77b43c754b76f
Author: Matthew Denton <mpdenton@google.com>
Date: Fri Aug 04 22:39:05 2017

Update security tests for non-root cryptohomed

This updates the security_SandboxedServices, security_ProfilePermissions, and security_StatefulPermissions tests to reflect the fact that cryptohomed runs and mounts directories under the "cryptohome" user instead of "root".

BUG=chromium:741786,chromium:764540
TEST=pre-cq
CQ-DEPEND=CL:665279,CL:665345

Change-Id: I6cde77c984bbee7fbc4ab99f3c527d5bbf176215
Reviewed-on: https://chromium-review.googlesource.com/666018
Commit-Ready: Greg Kerr <kerrnel@chromium.org>
Tested-by: Greg Kerr <kerrnel@chromium.org>
Reviewed-by: Matthew Denton <mpdenton@google.com>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/0ec311758db7f10181886053eeab88366d55a2c2/client/site_tests/security_AccountsBaseline/baseline.group
[modify] https://crrev.com/0ec311758db7f10181886053eeab88366d55a2c2/client/site_tests/security_ProfilePermissions/security_ProfilePermissions.py
[modify] https://crrev.com/0ec311758db7f10181886053eeab88366d55a2c2/client/site_tests/security_StatefulPermissions/security_StatefulPermissions.py
[modify] https://crrev.com/0ec311758db7f10181886053eeab88366d55a2c2/client/site_tests/security_SandboxedServices/baseline
[modify] https://crrev.com/0ec311758db7f10181886053eeab88366d55a2c2/client/site_tests/security_AccountsBaseline/baseline.passwd
,
Sep 15 2017
vapier@, do you have an answer for #6? kerrnel@, should I expect the error to go away after the revert?
,
Sep 15 2017
No, those CLs aren't causing the error; the error is happening because the test changes required by those CLs aren't being updated on the bot. So the bot needs to run the latest version of the test, and then it should pass.
,
Sep 15 2017
Looking at the Uprev stage for https://chromium-review.googlesource.com/666018, which makes the changes for security_SandboxedServices, we can see that the uprev logic detects a change in the relevant directory:
16:04:57: INFO: Rev: Determined that one+ of the ebuild autotest-tests-security rev_subdirs was touched ['client/site_tests/security_RendererSandbox', 'client/site_tests/security_SessionManagerDbusEndpoints', 'client/site_tests/security_Minijail_seccomp', 'client/site_tests/security_SeccompSyscallFilters', 'client/site_tests/security_AccountsBaseline', 'client/site_tests/security_AltSyscall', 'client/site_tests/security_ASLR', 'client/site_tests/security_ChromiumOSLSM', 'client/site_tests/security_CroshModules', 'client/site_tests/security_DbusOwners', 'client/site_tests/security_DeviceJail_AllowDeny', 'client/site_tests/security_DeviceJail_Detach', 'client/site_tests/security_DeviceJail_Filesystem', 'client/site_tests/security_DeviceJail_Lockdown', 'client/site_tests/security_Firewall', 'client/site_tests/security_HardlinkRestrictions', 'client/site_tests/security_Minijail0', 'client/site_tests/security_ModuleLocking', 'client/site_tests/security_mprotect', 'client/site_tests/security_OpenFDs', 'client/site_tests/security_OpenSSLBlacklist', 'client/site_tests/security_ProtocolFamilies', 'client/site_tests/security_ptraceRestrictions', 'client/site_tests/security_RootCA', 'client/site_tests/security_RootfsOwners', 'client/site_tests/security_RootfsStatefulSymlinks', 'client/site_tests/security_RunOci', 'client/site_tests/security_RuntimeExecStack', 'client/site_tests/security_SandboxedServices', 'client/site_tests/security_StatefulPermissions', 'client/site_tests/security_SuidBinaries', 'client/site_tests/security_SymlinkRestrictions', 'client/site_tests/security_SysLogPermissions', 'client/site_tests/security_SysVIPC', 'client/site_tests/security_x86Registers', 'client/site_tests/security_x86Registers']
And it subsequently uprevs autotest-tests-security:
Marking 9999 ebuild for chromeos-base/autotest-tests-security as stable.
So the issue is definitely that the version of the autotest artifact used by the test is old. The same pathologies remain: Pre-CQ runs seem to get the correct version of the artifacts, while CQ runs, which re-use old chroots, have old versions. As before, the following information is needed to debug this further:
- Where are the CQ autotests pulling the artifacts from (some artifact server)? If so, what version are they referencing?
- Where are the logs for when the generated artifacts get uploaded to cloud storage?
My current suspicion is that the old artifacts never get overwritten, and therefore the autotests keep using the old version. However, without the above information, I don't have any suggestion except disabling this again.
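The log above shows the uprev decision itself working: a changed file under one of the ebuild's rev_subdirs triggers an uprev. A simplified Python sketch of that check, assuming the decision reduces to a path-prefix test (`touched_subdirs` is a hypothetical name, not chromite's API). If this step behaves as the log suggests, the stale result must come from a later step, when the artifact is built, uploaded, or fetched:

```python
# Simplified, hypothetical model of the "rev_subdirs was touched" check;
# this is not chromite's implementation.
import posixpath
from typing import Iterable, List


def touched_subdirs(changed_files: Iterable[str],
                    rev_subdirs: Iterable[str]) -> List[str]:
    """Return each rev_subdir (deduplicated, in order) containing a changed file."""
    files = [posixpath.normpath(f) for f in changed_files]
    hits = []
    for subdir in dict.fromkeys(rev_subdirs):  # drops repeats such as security_x86Registers
        # Trailing "/" so security_Foo does not match security_FooBar.
        prefix = posixpath.normpath(subdir) + "/"
        if any(f.startswith(prefix) for f in files):
            hits.append(subdir)
    return hits


subdirs = [
    "client/site_tests/security_SandboxedServices",
    "client/site_tests/security_x86Registers",
    "client/site_tests/security_x86Registers",
]
changed = ["client/site_tests/security_SandboxedServices/baseline"]
print(touched_subdirs(changed, subdirs))
# ['client/site_tests/security_SandboxedServices']
```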
,
Sep 15 2017
Don, Aviv, do you know the answer to comment 10?
,
Sep 15 2017
>#9, "the bot needs to run the latest version of the test and it should pass": I guess you mean the lab servers need to run the latest version of the test, which means we need a push-to-prod to update the lab servers?
,
Sep 15 2017
I think the fix in #7 needs a push-to-prod, right?
,
Sep 15 2017
jrbarnette@ was deputy the last time we had an autotest uprev problem and should have more background (I wasn't heavily involved). However, last time it happened, we were leaving chroots in a bad state that could be fixed by wiping the builder (cros-beefy414-c2). I just ssh'd in and rm'd /b/c/cbuild/.cbuildbot_launch_state, which is a hack that should cause cbuildbot_launch to wipe EVERYTHING when the next build starts. If that isn't needed, it doesn't hurt anything other than slowing the start of the next build.
,
Sep 15 2017
Talked with kerrnel@ offline. Since this only happens on one builder, we agree with pmalani@ that the uprev logic is broken on this builder, which causes it to always use the stale artifact.
,
Sep 15 2017
thanks, don!
,
Sep 15 2017
I've marked the wolf paladin as experimental, since it is just this build that has issues with the uprev logic. We still need to come back to this issue next Monday and try to fix it.
,
Sep 18 2017
Is it actually experimental? The latest runs still say important=True
,
Sep 18 2017
I see, it's experimental in tree status but not in the source. I'm going to mark it experimental in source now.
,
Sep 18 2017
Is it not easier to just clear the artifacts on the bot and redo it?
,
Sep 18 2017
I'd like to understand whether it would make sense to clear out only those packages we have detected as upreved. It seems like the build server ends up taking whatever version is in the chroot, so selectively clearing out the stale packages might be more efficient than clearing all artifacts (or, in the worst case, the entire chroot).
,
Sep 18 2017
We tried that; it did now fix the problem on the builder. Marking the builder as experimental is just mitigation until we can get the real bug fixed, and is usually easy to do.
,
Sep 18 2017
Sorry, I meant "it did not fix the problem".
,
Sep 18 2017
Discussed with dgarrett@ on IM. There are a few hypotheses at the moment:
- wolf-tot-paladin builders don't have binary prebuilt uploads enabled, so the newly upreved test artifacts never get uploaded.
- wolf-tot-paladin and wolf-paladin builders might be building the same ebuild version but different versions of the code. As a result, they could be uploading the tarball to the same place and overwriting each other with the same name but different artifact contents (though that still wouldn't explain why many other paladin builders have shown the same issue in the past).
- For the failing autotest runs, the autotest is using an old / stale version of the artifacts (again, we need to know where the autotest pulls these artifacts from: does it just pull whatever is in the local chroot, or does it pull them from the GS bucket?).
- This issue of stale autotest artifacts could have been masked earlier because autotest ebuilds used to get upreved a lot more frequently (so versions with the most recent code base always made their way to GS in time). Since autotest ebuilds don't get upreved all the time now, the issue is being uncovered.
dgarrett@ mentioned that the test artifact upload shouldn't even be happening from the autotest ebuild (since "various assumptions will be thrown off about off-builder side effects"). I don't completely understand what the side effects are in this case. As always, knowing where the upload of the artifacts occurs on the builder will help (logs of where that happens, what the artifacts contain, where they are uploaded to, and which version the autotest run uses). Otherwise, I'll go ahead and revert the uprev logic (so that autotests start getting upreved a lot more, like they did before).
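The second and fourth hypotheses above both come down to artifact naming: if an uploaded tarball is keyed only by package name and ebuild version, then a second builder with different code, or a later build without an uprev, lands on the same key. A toy Python illustration, assuming a simple key-value store in place of the real GS bucket; `ArtifactStore`, `version_key`, and `content_key` are hypothetical, and the key formats are invented for the example:

```python
# Hypothetical illustration of the overwrite hypothesis: an in-memory dict
# stands in for a shared bucket; the key layouts are assumptions, not the
# real GS paths.
import hashlib
from typing import Dict


class ArtifactStore:
    """Toy shared store where a later upload to the same key wins."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def upload(self, key: str, data: bytes) -> None:
        self.objects[key] = data  # silently replaces any earlier object


def version_key(package: str, version: str) -> str:
    """Key by package and ebuild version only (collides across builders)."""
    return "%s-%s.tbz2" % (package, version)


def content_key(package: str, version: str, data: bytes) -> str:
    """Key that also includes a content hash, so different contents get different keys."""
    digest = hashlib.sha256(data).hexdigest()[:12]
    return "%s-%s-%s.tbz2" % (package, version, digest)


store = ArtifactStore()
key = version_key("autotest-tests-security", "0.0.1-r3179")
store.upload(key, b"cryptohomed,root,root")              # builder A, newer code
store.upload(key, b"cryptohomed,cryptohome,cryptohome")  # builder B, older code
print(store.objects[key])  # b'cryptohomed,cryptohome,cryptohome'
```

Keying on a content hash (or on the builder and build ID) is one way to make such collisions visible; whether the real upload path does anything like this is exactly what the hypotheses above would need to check.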
,
Sep 18 2017
I believe this is the cause of the chell-incremental builder failing.
First bad build: Sep 13 21:47 #1305
https://uberchromegw.corp.google.com/i/chromeos/builders/chell-incremental/builds/1353/steps/VMTest%20%28attempt%202%29/logs/stdio
/tmp/cbuildbot8Lo6zH/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests/results-32-security_SandboxedServices FAIL: One or more processes failed sandboxing
/tmp/cbuildbot8Lo6zH/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests/results-32-security_SandboxedServices/security_SandboxedServices [ FAILED ]
/tmp/cbuildbot8Lo6zH/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests/results-32-security_SandboxedServices/security_SandboxedServices FAIL: One or more processes failed sandboxing
/tmp/cbuildbot8Lo6zH/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests/results-32-security_SandboxedServices/security_SandboxedServices 09/18 13:34:34.338 ERROR|security_Sandboxed:0276| cryptohomed: bad user: wanted "cryptohome" but got "root"
/tmp/cbuildbot8Lo6zH/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests/results-32-security_SandboxedServices/security_SandboxedServices 09/18 13:34:34.366 ERROR|security_Sandboxed:0338| Failed sandboxing: ['cryptohomed']
,
Sep 18 2017
A couple of things I don't understand:
- https://chromium-review.googlesource.com/666018 failed its CQ run (https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/16270). No artifacts were uploaded during this run, yet the patch was still committed. Why did that happen? If there is any failure, the patch shouldn't be committed. Also, if it is being committed, the relevant artifacts should be pushed to the relevant GS bucket.
- https://chromium-review.googlesource.com/665345 passed its CQ run, yet still no artifacts were uploaded during this run. Why did that happen?
amstan@, could you kindly redirect this to someone who could answer the above questions?
,
Sep 18 2017
To make sure I understand this correctly, you're saying that those CLs failed their CQ run but landed anyhow? The reason I'm asking is because I need to backport the CLs so if they failed their initial CQ run, that's not good.
,
Sep 18 2017
Comment #27: Yes, that's what the CQ page suggests. I don't know what the rules are regarding CQ runs being deemed successes, so this could be a reasonable outcome.
,
Sep 18 2017
CLs are submitted despite failures in the CQ for multiple reasons, including:
- the CL is deemed irrelevant to the failed config
- the CL passed on the failed config in previous runs
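The two rules above can be sketched as a single predicate. This is a simplified, hypothetical model (`CLStatus` and `may_submit` are not CQ code); note that it decides only whether a CL lands and says nothing about whether its artifacts were uploaded, which is a separate question:

```python
# Simplified, hypothetical model of the two submission exceptions listed
# above; the real CQ decision considers more than this.
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CLStatus:
    relevant_configs: Set[str]                              # configs the CL is deemed relevant to
    passed_configs: Set[str] = field(default_factory=set)  # configs it passed in earlier runs


def may_submit(cl: CLStatus, failed_configs: Set[str]) -> bool:
    """Allow submission if every failed config is irrelevant or previously passed."""
    return all(
        config not in cl.relevant_configs or config in cl.passed_configs
        for config in failed_configs
    )


cl = CLStatus(relevant_configs={"lumpy-paladin"})
print(may_submit(cl, {"wolf-tot-paladin"}))  # True: the failure is deemed irrelevant
```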
,
Sep 18 2017
@ Comment #29: Cool, thanks. In that case, why weren't any artifacts uploaded (not only autotest artifacts, but any others either)? An uprev happened, so the corresponding autotest package artifact should be getting uploaded. Is there something about certain configs that prevents upload of artifacts? One hypothesis is that this patch was the last time an uprev happened to autotest-tests-security (and the new artifact didn't get uploaded). If no uprev has happened to autotest-tests-security since then, the artifact will continue to be the stale one.
,
Sep 18 2017
Also, re Comment #1: drilling a bit further down into the individual failures in the failing master-paladin CQ link (https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/16273):
1. The lumpy-paladin has the correct autotest artifacts, i.e. the user is root, not cryptohome.
2. The wolf-tot-paladin (which is ostensibly from the same CQ run) has stale autotest artifacts, i.e. the user is cryptohome.
Point 1 suggests that the artifacts have been generated correctly somewhere, and that the wolf-tot-paladin builder is using the stale artifacts for some reason. Would anyone happen to know why that is the case?
,
Sep 18 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/6acd49fbf3a6f4afa3135d7a535546e301e85398

commit 6acd49fbf3a6f4afa3135d7a535546e301e85398
Author: Aviv Keshet <akeshet@chromium.org>
Date: Mon Sep 18 23:48:50 2017

chromeos_config: mark wolf-tot-paladin experimental

BUG=chromium:765565
TEST=None

Change-Id: Iea66a11d9488cc65a24c80cf233623743df10b93
Reviewed-on: https://chromium-review.googlesource.com/671459
Commit-Ready: Aviv Keshet <akeshet@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Puneet Kumar <puneetster@chromium.org>

[modify] https://crrev.com/6acd49fbf3a6f4afa3135d7a535546e301e85398/cbuildbot/config_dump.json
[modify] https://crrev.com/6acd49fbf3a6f4afa3135d7a535546e301e85398/cbuildbot/chromeos_config.py
,
Sep 18 2017
wolf-tot-paladin is a special snowflake which doesn't run with the latest changes being tested on the CQ run; instead it runs from current ToT and is used to verify infrastructure. The logs for wolf-tot-paladin/11378
https://viceroy.corp.google.com/chromeos/build_details?build_id=1860529
https://storage.cloud.google.com/chromeos-autotest-results/143007708-chromeos-test/chromeos4-row1-rack3-host13/debug/autoserv.DEBUG?_ga=2.220086554.-2088893480.1505768831
seem to show all accesses to be of R63-9953.0.0-rc3, which is the correct version. It remains to be seen what is in that package and whether it has the correct contents. I recommend downloading the autotest package and looking at it directly.
,
Sep 19 2017
> wolf-tot-paladin is a special snowflake which doesn't run with the latest changes being tested on the CQ run, instead it runs from current ToT and is used to verify infrastructure. Is this the case for chell too?
,
Sep 19 2017
chell-incremental yes. chell-paladin no.
,
Sep 19 2017
RE: Comment #33: I checked the autotest package for wolf-tot-paladin/11378 and confirmed it does *not* contain the right version of the package (the user in baseline is still cryptohome). What's interesting is that the change to make the user "root" in the baseline is not in the current CQ run; it got merged on Sep 14, so it should be part of current ToT. So the wolf-tot-paladin version of current ToT is not in sync with the actual current ToT: https://cs.corp.google.com/chromeos_internal/src/third_party/autotest/files/client/site_tests/security_SandboxedServices/baseline?rcl=9d1db417848092bdb5add994421ea43a15d38a14&l=17
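Downloading the package and checking the baseline by hand can also be scripted. A minimal Python sketch using the standard `tarfile` module, assuming the test archive (for example test-security_SandboxedServices.tar.bz2) contains a CSV file named `baseline` whose first two columns are the process name and euser; `baseline_euser` is a hypothetical helper:

```python
# Hypothetical helper: read the cryptohomed euser out of a downloaded
# autotest test archive. The archive layout (a file named "baseline"
# somewhere inside) is an assumption.
import posixpath
import tarfile
from typing import Optional


def baseline_euser(archive_path: str, exe: str = "cryptohomed") -> Optional[str]:
    """Return the euser listed for `exe` in any archived baseline file, else None."""
    with tarfile.open(archive_path, "r:*") as tar:  # "r:*" autodetects bz2/gz/xz
        for member in tar.getmembers():
            if not member.isfile() or posixpath.basename(member.name) != "baseline":
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            for line in handle.read().decode("utf-8").splitlines():
                fields = line.split(",")
                if len(fields) >= 2 and fields[0] == exe:
                    return fields[1]
    return None
```

Against a stale archive like the one described above this would return `cryptohome`; against one built from current ToT it should return `root`.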
,
Sep 19 2017
It feels like the symptom here is a lot like the symptom in bug 716151. Is there a difference?
,
Sep 19 2017
Could be related. In both cases the latest autotest artifact package is stale.
,
Sep 19 2017
Here are the logs from the wolf bot:
- 20170915-013629.log: when the bot updated originally but failed to rebuild/update /build/$BOARD/usr/local/build/autotest/client/site_tests/security_SandboxedServices/test-security_SandboxedServices.tar.bz2
- 20170919-061211.log: me manually running `emerge-wolf autotest-tests-security` on the bot and the archive getting updated
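The first log shows the archive under /build/$BOARD/usr/local/build/autotest/ not being rebuilt even though the test sources had changed. A rough Python heuristic for spotting that condition, assuming both the archive and the test's source directory are reachable on the bot; `newest_mtime` and `archive_is_stale` are hypothetical helpers, and modification times can mislead (for example after a fresh checkout), so comparing contents is the more reliable check:

```python
# Hypothetical staleness heuristic: flag a test archive that is missing or
# older than the newest file in its source directory. Paths are assumptions;
# treat a result as a hint, not proof.
import os


def newest_mtime(root: str) -> float:
    """Latest modification time of any file under `root` (0.0 if none)."""
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            newest = max(newest, os.path.getmtime(os.path.join(dirpath, name)))
    return newest


def archive_is_stale(archive_path: str, source_dir: str) -> bool:
    """True if the archive is missing or older than some source file."""
    if not os.path.exists(archive_path):
        return True
    return os.path.getmtime(archive_path) < newest_mtime(source_dir)
```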
,
Sep 19 2017
My manual rebuilds on the two bots have recovered them, but there's still something fundamentally wrong in the autotest logic. We can't trust CLs running through the tree.
,
Sep 19 2017
,
Sep 19 2017
May I ask what is different between your manual rebuild and the normal "emerge" commands that run on wolf-tot-paladin (or indeed, other builders)? The autotest logic modification shifted the tar-ing to the src_compile() phase (instead of the upload phase).
,
Sep 19 2017
Comparing the logs, the manual emerge log has the following output, which isn't in the buildbot logs:
>>> Merging chromeos-base/autotest-tests-security-0.0.1-r3179 to /build/wolf/
 * Running stacked hooks for pre_pkg_preinst
 *    wrap_old_config_scripts ...  [ ok ]
--- /build/wolf/usr/
--- /build/wolf/usr/lib/
--- /build/wolf/usr/lib/debug/
--- /build/wolf/usr/lib/debug/usr/
--- /build/wolf/usr/lib/debug/usr/local/
--- /build/wolf/usr/lib/debug/usr/local/build/
--- /build/wolf/usr/lib/debug/usr/local/build/autotest/
--- /build/wolf/usr/lib/debug/usr/local/build
......
.....
>>> Safely unmerging already-installed instance...
No package files given... Grabbing a set.
--- replaced dir /build/wolf/usr/local/build/autotest/server/tests
--- replaced dir /build/wolf/usr/local/build/autotest/server/site_tests
--- replaced dir /build/wolf/usr/local/build/autotest/server
--- replaced obj /build/wolf/usr/local/build/autotest/quickmerge/chromeos-base/autotest-tests-security
--- replaced dir /build/wolf/usr/local/build/autotest/quickmer
.....
.....
Why do the "Merging chromeos-base/autotest-tests-security..." and "Safely unmerging already-installed instance" steps get executed when we run a manual emerge but not on the buildbot runs?
,
Sep 19 2017
The above almost makes it seem as if the builder's emerge uses the --buildpkgonly option:
From man emerge:
--buildpkgonly (-B)
Creates binary packages for all ebuilds processed without actually merging the packages. This comes with the caveat that all build-time dependencies must already be emerged on the system.
,
Sep 19 2017
,
Sep 20 2017
I'd have to dig through the build_packages code to be sure, but it wouldn't surprise me if we do a --buildpkgonly first and then do targeted emerges of that binary package into multiple locations (the board-specific section of the chroot, the image, etc.). It's certain that we emerge the binary package more than once.
,
Sep 21 2017
I went through build_packages.py and I didn't see multiple invocations of "emerge". One thing that did stand out is the use of the "--reuse_pkgs_from_local_boards" flag. There seems to be at least some version of the cbuildbot commands that use this flag (https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/commands.py?rcl=fbbccec70e97d1679475eb6936c28e04da28002a&l=53) so I'm wondering if it might be related.
,
Sep 23 2017
,
Feb 21 2018
Comment 1 by vapier@chromium.org
, Sep 15 2017