cros tryjob --local --yes chromiumos-sdk doesn't work (or "how does one replicate chromiumos-sdk locally?")
Issue description

The chromiumos-sdk builder broke recently: https://bugs.chromium.org/p/chromium/issues/detail?id=757147

While working on this, we discovered that cros tryjob --local --yes chromiumos-sdk --buildroot /path doesn't work. It fails with:

  AccessDeniedException: 403 ********@google.com does not have storage.objects.create access to bucket chromeos-image-archive.

This makes it hard to reproduce a builder's issue locally.
,
Aug 24 2017
I just wiped my workstation for gLinux and saw similar errors until I restored my ".netrc" and ".gitcookies" files. It's been so long that I have no idea how I generated those files in the first place.
,
Aug 24 2017
,
Aug 24 2017
Jacob, do you remember where the docs are for generating those files?
,
Aug 24 2017
For the gitcookies file, http://www.chromium.org/chromium-os/developer-guide/gerrit-guide has the information. I never had a .netrc file, so maybe it isn't required.
,
Aug 24 2017
.gitcookies is documented on the Gerrit pages, and referred from the developer guide: https://www.chromium.org/chromium-os/developer-guide/gerrit-guide I have never had to set up my .netrc (that *used* to be the way we accessed Gerrit, but no more). I was one developer who ran into this problem. I'm trying the suggestion from #1 to see if that's also a problem. I'm pretty sure that some part of the SDK stage was explicitly trying to upload stats or manifest info to GS. But I don't have the full log from my failed chromiumos-sdk run at the moment.
,
Aug 24 2017
I also don't have any .netrc file at all, anywhere in my tree, so +1 to that not being required.
,
Aug 24 2017
(By "Gerrit pages" I mean, it's documented in the Gerrit settings, if you visit chrom{ium,e-internal}-review.googlesource.com.)
,
Aug 24 2017
See: https://chromium-review.googlesource.com/470346

sdk_stages.py:

  def _SendPerfValues(self, buildroot, sdk_tarball, buildbot_uri_log, version,
                      platform_name):
    """Generate & upload perf data for the build"""
    ...
    # Due to limitations in the perf dashboard, we have to create an integer
    # based on the current timestamp. This field only accepts integers, and
    # the perf dashboard accepts this or CrOS+Chrome official versions.
    revision = int(version.replace('.', ''))
    perf_values = perf_uploader.LoadPerfValues(perf_path)
    self._UploadPerfValues(perf_values, platform_name, test_name,
                           revision=revision)
    ...

generic_stages.py:

  def _UploadPerfValues(self, *args, **kwargs):
    """Helper for uploading perf values.

    This currently handles common checks only. We could make perf values more
    integrated in the overall stage running process in the future though if we
    had more stages that cared about this.
    """
    # Only upload perf data for buildbots as the data from local tryjobs
    # probably isn't useful to us.
    if not self._run.options.buildbot:
      return

    try:
      retry_util.RetryException(perf_uploader.PerfUploadingError, 3,
                                perf_uploader.UploadPerfValues,
                                *args, **kwargs)
    except perf_uploader.PerfUploadingError:
      logging.exception('Uploading perf data failed')

Perhaps that isn't properly skipping 'local' tryjobs with 'cros tryjob'?
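For reference, the guard quoted above is meant to turn the perf upload into a no-op whenever the buildbot option is unset, which is the case for local tryjobs. Below is a minimal, chromite-free Python sketch of that intended behaviour. RunOptions, upload_perf_values and version_to_revision are hypothetical stand-ins, and the plain retry loop replaces retry_util.RetryException. The sketch does not explain where the chromeos-image-archive write actually came from. It only shows what the guard should do.

```python
"""Sketch of the skip-unless-buildbot guard quoted from generic_stages.py.

RunOptions, upload_perf_values and version_to_revision are hypothetical
stand-ins for illustration; they are not chromite APIs.
"""
import logging
from dataclasses import dataclass
from typing import Callable


class PerfUploadingError(Exception):
    """Stand-in for perf_uploader.PerfUploadingError."""


@dataclass
class RunOptions:
    buildbot: bool = False  # Assumed True only on real builders.


def version_to_revision(version: str) -> int:
    # Mirrors int(version.replace('.', '')) from _SendPerfValues.
    return int(version.replace('.', ''))


def upload_perf_values(options: RunOptions, upload: Callable[[], None],
                       retries: int = 3) -> bool:
    """Returns True only if an upload was attempted and succeeded."""
    # Local tryjobs should never reach the upload, so bail out early.
    if not options.buildbot:
        return False
    for attempt in range(retries):
        try:
            upload()
            return True
        except PerfUploadingError:
            logging.warning('Perf upload attempt %d failed', attempt + 1)
    # Like the original, a failed upload is logged rather than fatal.
    logging.error('Uploading perf data failed')
    return False
```

With buildbot=False the upload callable is never invoked, which is the behaviour a local tryjob would need from every stage that writes to GS, not only this one.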
,
Aug 24 2017
"cros tryjob --local --yes --buildroot /path lumpy-compile-only-pre-cq" has the same error.
,
Aug 24 2017
dgarrett@ and I have both upgraded our machines to the new Goobuntu. I'm running a test to see whether the upgrade or the missing .netrc is the more likely culprit.
,
Aug 24 2017
FWIW, I'm still running Trusty, and seeing these problems. So that shouldn't be relevant? (Or maybe I'm misunderstanding your statement.)
,
Aug 24 2017
Oh... wait, I misread your initial error. I was expecting to see the same thing I'd just seen; the Gerrit-related credential issues are a red herring. "access to bucket chromeos-image-archive": this comes from your .boto file, and as members of the deputy rotation, Jacob and I have elevated privileges there. That may be why I'm not seeing the errors you are. I wouldn't have expected local tryjobs to write to that bucket, and resolving that is non-trivial.
,
Aug 25 2017
Had the same problem here when I built my builder locally. Is there any setting to disable all the upload logic in the builder via cros tryjob?
,
Aug 25 2017
Nope. Local tryjobs will only work for a limited number of people (that we don't want to grow). This is not a change in behavior from cbuildbot --local. I started to remove support for them totally (thinking they were generally unused), but vapier@ stopped me, saying that they were.
,
Aug 25 2017
,
Aug 25 2017
> I wouldn't have expected local tryjobs to write to that bucket, and resolving that is non-trivial.

is it? i thought we put a lot of those behind knobs like "if buildbot", which isn't set here? i guess it broke at some point and we didn't notice :/.
,
Aug 25 2017
That's why I don't really like local tryjobs. We end up having to write/maintain code to work around the fact the workstations are configured differently from builders.
,
Aug 28 2017
So, the conventional wisdom is "don't use local tryjobs unless you're part of the elite squad"? So we should make sure there are easy alternative ways for developers to reproduce rough equivalents on their local machines for things we can expect them to care about. In this case, nobody provided clear instructions on how to do the equivalent of chromiumos-sdk's build/test. In the end, I *think* I scraped the build stages to come up with the equivalent here: https://bugs.chromium.org/p/chromium/issues/detail?id=756240#c16 But I'm still not 100% sure that's quite right, as no one has reviewed it AFAIK. I also see now that the SDK build is partially documented here: https://www.chromium.org/chromium-os/build/sdk-creation but that's still not sufficient for debugging a problem. What can be improved here? Just documentation? Or do we need better helper scripts? Or is this just a rare couple-times-a-year problem that requires reverse engineering the builders, and we're happy as-is?
,
Aug 28 2017
re #17, I totally agree. I thought that we could always execute a cbuildbot --local and it would check if it was inside the "lab" environment and do the right thing (by not trying to push things into the official locations). Please don't get rid of local tryjobs. We don't use them that frequently but it is a great way to reproduce problems.
,
Aug 30 2017
Is this the right thing? https://chromium-review.googlesource.com/641984 It gets me much further along on a local lumpy trybot, at least.
,
Aug 30 2017
Just to ask... what's useful about a local tryjob compared to a remote tryjob? I work on cbuildbot continuously, and never use them.
,
Aug 30 2017
Debugging. That was the impetus for opening this bug.
,
Aug 30 2017
I mean, what information can you get that you can't get otherwise? Are you looking at the chroot contents? Or is it something else? If it's feasible to make remote tryjobs serve your needs, I'd rather do that and pull --local, since it's not heavily used and not well maintained.
,
Aug 30 2017
Would tarring/archiving the chroot for tryjobs (perhaps behind a flag) be a reasonable solution?
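One possible shape for the chroot-archiving idea, offered only as a sketch: archive_chroot and its excludes parameter are hypothetical and are not an existing cbuildbot stage or flag. The sketch uses the standard tarfile module and prunes excluded subtrees through the filter callback of TarFile.add. When the callback returns None for a directory, tarfile skips that directory and everything under it.

```python
"""Hypothetical sketch: archive a chroot so a remote tryjob could expose it.

archive_chroot and its excludes parameter are illustrative only; nothing
like this is claimed to exist in cbuildbot.
"""
import tarfile
from typing import Iterable, Optional


def archive_chroot(chroot_dir: str, output_path: str,
                   excludes: Iterable[str] = ()) -> str:
    """Writes chroot_dir into a gzip tarball, skipping excluded subtrees.

    excludes holds paths relative to the chroot root, e.g. 'tmp'.
    """
    excludes = tuple(excludes)

    def _filter(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
        # info.name looks like 'chroot/tmp/x'; strip the 'chroot/' prefix.
        rel = info.name.split('/', 1)[1] if '/' in info.name else ''
        for prefix in excludes:
            if rel == prefix or rel.startswith(prefix + '/'):
                return None  # Returning None drops this entry and its children.
        return info

    with tarfile.open(output_path, 'w:gz') as tar:
        # arcname='chroot' keeps builder-specific absolute paths out.
        tar.add(chroot_dir, arcname='chroot', filter=_filter)
    return output_path
```

A real chroot is very large and contains device nodes and root-owned files. In practice this would need to run with root privileges and exclude caches. The sketch ignores those concerns.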
,
Aug 30 2017
In this case, the root cause of the failure was an ancient packaging error in chromeos-base/chromeos-base that Portage swallowed because it was in the non-fatal pkg_postinst ebuild region. This was discovered by hacking the tryjob to stop at the point of failure and manually invoking the exact step with the exact environment and capturing Portage tmp logs for suspect ebuilds. As long as we have systemic logging shortcomings and spooky action at a distance in our build system, I don't see how we'll be able to avoid running these locally. As an aside, coming from 6 years in google3 where almost everything that runs on Forge is hermetic and well-logged, there have been many, many times where I've had to reproduce a Forge failure on my workstation. For example: to inspect the process tree, to inspect file descriptor state between processes, to inspect a local database running in the same process jail, to change the process jail security semantics, etc. Experience tells me that we will never be able to rely solely on remote tryjobs.
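The "manually invoking the exact step with the exact environment" technique can be approximated by recording a step's environment and replaying it later. The following is a minimal sketch that assumes the environment was captured as a JSON dict. save_env and replay_step are hypothetical helpers and are not part of chromite or Portage.

```python
"""Hypothetical sketch: re-run one build step with a recorded environment."""
import json
import subprocess
from typing import Dict, List, Optional


def save_env(env: Dict[str, str], path: str) -> None:
    """Records the environment a step ran with, e.g. dict(os.environ)."""
    with open(path, 'w') as f:
        json.dump(env, f, sort_keys=True)


def replay_step(argv: List[str], env_path: str,
                cwd: Optional[str] = None) -> subprocess.CompletedProcess:
    """Runs argv with exactly the environment saved at env_path."""
    with open(env_path) as f:
        env = json.load(f)
    # env= replaces the inherited environment instead of extending it,
    # so the step sees only what was recorded.
    return subprocess.run(argv, env=env, cwd=cwd, capture_output=True,
                          text=True, check=False)
```

Passing env= to subprocess.run intentionally drops the workstation's own variables. This matters for reproducing builder-only failures, but it also means PATH must be part of the recorded environment or argv[0] must be an absolute path.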
,
Aug 30 2017
Alright, I give. Just understand that it doesn't get first class support, and so won't be as robust.
,
Aug 30 2017
fwiw, issue 709532 should be resolved soon (this week?), which should address the inability to view logs for the ebuilds processed in each run.
,
Aug 30 2017
For me, it is important for reproducibility, as in the case jclinton found. Another option is to log on to the machine where the build failed and reproduce the failure there, but that only works if another build has not taken the machine. It occurs to me that you could add an option to the trybot execution to lock the machine after the build is done, so that you can go and diagnose the failure there.
,
Aug 31 2017
Don's on it.
,
Sep 1 2017
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/223541b051fac5dbcce0edeb0899b50f4fd133f2

commit 223541b051fac5dbcce0edeb0899b50f4fd133f2
Author: Don Garrett <dgarrett@google.com>
Date: Fri Sep 01 21:41:50 2017

cros tryjob: Explicitly use --debug for local tryjobs.

When launching a local tryjob, always pass in --debug explicitly.

BUG= chromium:758652
TEST=run_tests

Change-Id: I67c04e5114a5096c431f18d87fa231fc78809faa
Reviewed-on: https://chromium-review.googlesource.com/644310
Commit-Ready: Don Garrett <dgarrett@chromium.org>
Tested-by: Don Garrett <dgarrett@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Paul Hobbs <phobbs@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>

[modify] https://crrev.com/223541b051fac5dbcce0edeb0899b50f4fd133f2/cli/cros/cros_tryjob_unittest.py
[modify] https://crrev.com/223541b051fac5dbcce0edeb0899b50f4fd133f2/cli/cros/cros_tryjob.py
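The change above makes cros tryjob always add --debug when it launches a local tryjob. The sketch below illustrates the idea only. build_cbuildbot_args is a hypothetical helper, not the actual cros_tryjob.py code. It assumes, consistent with the confirmation later in this thread, that --debug is what keeps a local run from writing to production buckets such as chromeos-image-archive.

```python
"""Hypothetical sketch of forcing --debug for local tryjobs.

build_cbuildbot_args is illustrative; it is not the cros_tryjob.py code.
"""
from typing import List, Sequence


def build_cbuildbot_args(build_configs: Sequence[str], buildroot: str,
                         local: bool,
                         extra_args: Sequence[str] = ()) -> List[str]:
    """Assembles a cbuildbot command line for a tryjob."""
    args = ['cbuildbot', '--buildroot', buildroot]
    if local and '--debug' not in extra_args:
        # Workstations lack builder credentials, so a local run is forced
        # into debug mode rather than relying on the user to pass it.
        args.append('--debug')
    args.extend(extra_args)
    args.extend(build_configs)
    return args
```

Adding the flag unconditionally for local runs, instead of depending on each stage to check something like options.buildbot, puts the protection in one place.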
,
Sep 1 2017
If you sync, I BELIEVE this will now be fixed. Can you please confirm?
,
Sep 1 2017
Confirmed working!
Comment 1 by dgarr...@chromium.org, Aug 24 2017