
Issue 758652 link


Issue metadata

Status: Verified
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




cros tryjob --local --yes chromiumos-sdk doesn't work (or "how does one replicate chromiumos-sdk locally?")

Project Member Reported by manojgupta@chromium.org, Aug 24 2017

Issue description

The chromiumos-sdk builder broke recently:
https://bugs.chromium.org/p/chromium/issues/detail?id=757147

While working on this, I found that "cros tryjob --local --yes chromiumos-sdk --buildroot /path" doesn't work. It fails with: AccessDeniedException: 403 ********@google.com does not have storage.objects.create access to bucket chromeos-image-archive.
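
To check whether a saved build log contains this same failure, a small sketch like the following can pull the account, permission, and bucket out of the error line. The pattern is inferred only from the single line quoted above, and the function name parse_access_denied is hypothetical; other gsutil errors may be worded differently.

```python
import re

# Pattern inferred from the one error line quoted in this report (assumption).
# Bucket names start and end with a letter or digit, so the trailing period
# of the sentence is not captured.
_ACCESS_DENIED = re.compile(
    r"AccessDeniedException: 403 (?P<account>\S+) does not have "
    r"(?P<permission>\S+) access to bucket "
    r"(?P<bucket>[a-z0-9][a-z0-9._-]*[a-z0-9])"
)


def parse_access_denied(line):
    """Return a dict with account, permission and bucket, or None if no match."""
    match = _ACCESS_DENIED.search(line)
    return match.groupdict() if match else None


sample = (
    "AccessDeniedException: 403 ********@google.com does not have "
    "storage.objects.create access to bucket chromeos-image-archive."
)
info = parse_access_denied(sample)
print(info["permission"], info["bucket"])
```

For the sample line this prints the missing permission (storage.objects.create) and the bucket (chromeos-image-archive), which is the detail that matters later in the thread.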

This hinders the ability to reproduce a builder's issue locally.
 
Just to check....

Does "cros tryjob --local --yes --buildroot /path lumpy-compile-only-pre-cq" work?
I just wiped my workstation while moving to gLinux, and saw similar errors until I restored my ".netrc" and ".gitcookies" files.

It's been so long that I have no idea how I generated those files in the first place.
Owner: manojgupta@chromium.org
Cc: jkop@chromium.org
Jacob, do you remember where the docs are for generating those files?
For the gitcookies file, http://www.chromium.org/chromium-os/developer-guide/gerrit-guide has the information.

I never had a .netrc file so maybe it is not required.


.gitcookies is documented on the Gerrit pages, and referred from the developer guide:

https://www.chromium.org/chromium-os/developer-guide/gerrit-guide

I have never had to set up my .netrc (that *used* to be the way we accessed gerrit, but no more).

I was one developer who ran into this problem. I'm trying the suggestion from #1 to see if that's also a problem.

I'm pretty sure part of the SDK stage explicitly tries to upload some stats or manifest info to GS, but I don't have the full log from my failed chromiumos-sdk attempt at the moment.

Comment 7 by jkop@chromium.org, Aug 24 2017

I also don't have any .netrc file at all, anywhere in my tree, so +1 to that not being required.
(By "Gerrit pages" I mean, it's documented in the Gerrit settings, if you visit chrom{ium,e-internal}-review.googlesource.com.)
See: https://chromium-review.googlesource.com/470346


sdk_stages.py:
  def _SendPerfValues(self, buildroot, sdk_tarball, buildbot_uri_log, version,
                      platform_name):
    """Generate & upload perf data for the build"""
    ...
    
    # Due to limitations in the perf dashboard, we have to create an integer
    # based on the current timestamp.  This field only accepts integers, and
    # the perf dashboard accepts this or CrOS+Chrome official versions.
    revision = int(version.replace('.', ''))
    perf_values = perf_uploader.LoadPerfValues(perf_path)
    self._UploadPerfValues(perf_values, platform_name, test_name,
                           revision=revision)

...

generic_stages.py:
  def _UploadPerfValues(self, *args, **kwargs):
    """Helper for uploading perf values.

    This currently handles common checks only.  We could make perf values more
    integrated in the overall stage running process in the future though if we
    had more stages that cared about this.
    """
    # Only upload perf data for buildbots as the data from local tryjobs
    # probably isn't useful to us.
    if not self._run.options.buildbot:
      return

    try:
      retry_util.RetryException(perf_uploader.PerfUploadingError, 3,
                                perf_uploader.UploadPerfValues,
                                *args, **kwargs)
    except perf_uploader.PerfUploadingError:
      logging.exception('Uploading perf data failed')


Perhaps that isn't properly skipping 'local' tryjobs with 'cros tryjob'?
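
The guard-then-retry pattern in the excerpt above can be exercised in isolation. This is a minimal sketch, not the chromite code: PerfUploadingError, retry_exception, and upload_perf_values are simplified stand-ins written for illustration, and the retry semantics (max_retry retries after the first attempt) are an assumption about how retry_util.RetryException behaves.

```python
import logging


class PerfUploadingError(Exception):
    """Stand-in for perf_uploader.PerfUploadingError (hypothetical here)."""


def retry_exception(exc_type, max_retry, func, *args, **kwargs):
    """Call func, retrying up to max_retry more times when exc_type is raised.

    Simplified stand-in for retry_util.RetryException; assumed semantics.
    """
    for attempt in range(max_retry + 1):
        try:
            return func(*args, **kwargs)
        except exc_type:
            if attempt == max_retry:
                raise


def upload_perf_values(is_buildbot, uploader, *args, **kwargs):
    """Return True if the upload ran and succeeded, False if skipped or failed."""
    # Mirrors the 'if not self._run.options.buildbot: return' guard.
    if not is_buildbot:
        return False
    try:
        retry_exception(PerfUploadingError, 3, uploader, *args, **kwargs)
    except PerfUploadingError:
        # As in the excerpt, a failed upload is logged and not re-raised.
        logging.exception('Uploading perf data failed')
        return False
    return True


attempts = []


def always_failing_uploader(values):
    """Simulates an upload that is denied every time."""
    attempts.append(values)
    raise PerfUploadingError('simulated access denied')


print(upload_perf_values(False, always_failing_uploader, {'size': 1}), len(attempts))
print(upload_perf_values(True, always_failing_uploader, {'size': 1}), len(attempts))
```

With the guard off (is_buildbot False) the uploader is never called. With it on, the uploader is tried four times and the failure is only logged. So if this guard works as written, a perf upload alone should not abort a local run, which suggests the write to the bucket comes from somewhere else.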
Owner: dgarr...@chromium.org
"cros tryjob --local --yes --buildroot /path lumpy-compile-only-pre-cq" has the same error.

Comment 11 by jkop@chromium.org, Aug 24 2017

dgarrett@ and I have upgraded our machines to the new Goobuntu; I'm running a test to see whether the upgrade or the absent .netrc is the more likely culprit.
FWIW, I'm still running Trusty, and seeing these problems. So that shouldn't be relevant? (Or maybe I'm misunderstanding your statement.)
Oh... wait, I misread your initial error. I was expecting to see the same thing I'd just seen, but the Gerrit-related credential issues are a red herring.

"access to bucket chromeos-image-archive"

This comes from your .boto file, and as members of the deputy rotation, Jacob and I have elevated privileges there. That may be why I'm not seeing the errors you are.

I wouldn't have expected local tryjobs to write to that bucket, and resolving that is non-trivial.


Comment 14 by pwang@chromium.org, Aug 25 2017

Cc: pwang@chromium.org
Had the same problem here when I built my builder locally.
Is there any setting to disable all the upload logic in the builder via cros tryjob?
Cc: -apronin@chromium.org vapier@chromium.org apro...@chromium.rg
Nope.

Local tryjobs will only work for a limited number of people (that we don't want to grow). This is not a change in behavior from cbuildbot --local.

I started to remove support for them totally (thinking they were generally unused), but vapier@ stopped me, saying that they were in fact used.
Cc: -apro...@chromium.rg apronin@chromium.org
> I wouldn't have expected local tryjobs to write to that bucket, and resolving that is non-trivial.

is it ?  i thought we put a lot of those behind knobs like "if buildbot" which isn't set here ?  i guess it broke at some point and we didn't notice :/.
That's why I don't really like local tryjobs. We end up having to write/maintain code to work around the fact that workstations are configured differently from builders.
Summary: cros tryjob --local --yes chromiumos-sdk doesn't work (or "how does one replicate chromiumos-sdk locally?") (was: cros tryjob --local --yes chromiumos-sdk doesn't work)
So, the conventional wisdom is "don't use local tryjobs unless you're part of the elite squad"? So we should make sure there are easy alternative ways for developers to reproduce rough equivalents on their local machines for things we can expect them to care about. In this case, nobody provided clear instructions on how to do the equivalent of chromiumos-sdk's build/test. In the end, I *think* I scraped the build stages to come up with the equivalent here:

https://bugs.chromium.org/p/chromium/issues/detail?id=756240#c16

But I'm still not 100% sure that's quite right, as no one has reviewed it AFAIK.

I also see now that the SDK build is partially documented here:

https://www.chromium.org/chromium-os/build/sdk-creation

but that's still not sufficient for debugging a problem.

What can be improved here? Just documentation? Or do we need better helper scripts? Or is this just a rare couple-times-a-year problem that requires reverse engineering the builders, and we're happy as-is?
re #17, 

I totally agree. I thought that we could always execute a cbuildbot --local and it would check if it was inside the "lab" environment and do the right thing (by not trying to push things into the official locations). 

Please don't get rid of local tryjobs. We don't use them that frequently but it is a great way to reproduce problems.

Cc: dgarr...@chromium.org
Owner: briannorris@chromium.org
Status: Started (was: Untriaged)
Is this the right thing?

https://chromium-review.googlesource.com/641984

It gets me much further along on a local lumpy trybot, at least.
Just to ask.... what's useful about a local tryjob compared to a remote tryjob?

I work on cbuildbot continuously, and never use them.
Debugging. That was the impetus for opening this bug.
I mean, what information can you get that you can't get otherwise?

Are you looking at the chroot contents? Or is it something else?

If it's feasible to make remote tryjobs serve your needs, I'd rather do that and pull --local, since it's not heavily used and not well maintained.
Would tarring/archiving the chroot for tryjobs (perhaps behind a flag) be a reasonable solution?
In this case, the root cause of the failure was an ancient packaging error in chromeos-base/chromeos-base that Portage swallowed because it was in the non-fatal pkg_postinst ebuild region. This was discovered by hacking the tryjob to stop at the point of failure and manually invoking the exact step with the exact environment and capturing Portage tmp logs for suspect ebuilds.

As long as we have systemic logging shortcomings and spooky action at a distance in our build system, I don't see how we'll be able to avoid running these locally.

As an aside, coming from 6 years in google3 where almost everything that runs on Forge is hermetic and well-logged, there have been many, many times where I've had to reproduce a Forge failure on my workstation. For example: to inspect the process tree, to inspect file descriptor state between processes, to inspect a local database running in the same process jail, to change the process jail security semantics, etc. Experience tells me that we will never be able to rely solely on remote tryjobs.

Alright, I give.

Just understand that it doesn't get first class support, and so won't be as robust.
fwiw, issue 709532 should be resolved soon (this week?), so that should address the inability to view logs of ebuilds that were processed in every run
For me it is important for reproducibility.
Like in the case found for jclinton.
A different way to do it is to log on to the machine where the build failed and reproduce there, but that only works if the machine has not been taken by another build.
It occurs to me that you could add an option to the trybot execution to lock the machine after the build is done so that you can go and diagnose the failure there.
Owner: dgarr...@chromium.org
Don's on it.
Project Member

Comment 31 by bugdroid1@chromium.org, Sep 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/223541b051fac5dbcce0edeb0899b50f4fd133f2

commit 223541b051fac5dbcce0edeb0899b50f4fd133f2
Author: Don Garrett <dgarrett@google.com>
Date: Fri Sep 01 21:41:50 2017

cros tryjob: Explicitly use --debug for local tryjobs.

When launching a local tryjob, always pass in --debug explicitly.

BUG= chromium:758652 
TEST=run_tests

Change-Id: I67c04e5114a5096c431f18d87fa231fc78809faa
Reviewed-on: https://chromium-review.googlesource.com/644310
Commit-Ready: Don Garrett <dgarrett@chromium.org>
Tested-by: Don Garrett <dgarrett@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Paul Hobbs <phobbs@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>

[modify] https://crrev.com/223541b051fac5dbcce0edeb0899b50f4fd133f2/cli/cros/cros_tryjob_unittest.py
[modify] https://crrev.com/223541b051fac5dbcce0edeb0899b50f4fd133f2/cli/cros/cros_tryjob.py
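
The behavior described in the commit message ("always pass in --debug explicitly" for local tryjobs) can be sketched as below. This is an illustration only, not the code in cli/cros/cros_tryjob.py: the function build_tryjob_args and its parameters are hypothetical, and only the --debug and --buildroot flag names come from this thread.

```python
def build_tryjob_args(build_configs, local, buildroot=None, passthrough=()):
    """Assemble an argument list for a tryjob run (hypothetical helper).

    For local runs, --debug is always present, whether or not the caller
    asked for it.
    """
    args = list(passthrough)
    if local and '--debug' not in args:
        args.append('--debug')
    if buildroot is not None:
        args.extend(['--buildroot', buildroot])
    return args + list(build_configs)


print(build_tryjob_args(['chromiumos-sdk'], local=True, buildroot='/path'))
```

Forcing the flag at the point where the command line is assembled means a local run does not depend on the user remembering to pass it.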

Owner: manojgupta@chromium.org
If you sync, I BELIEVE this will now be fixed. Can you please confirm?
Status: Verified (was: Started)
Confirmed working!
