
Issue 707456

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




export_to_gcloud from BuildPackages failing

Project Member Reported by davidri...@chromium.org, Apr 1 2017

Issue description

The export_to_gcloud call from parallel_emerge is failing, I believe because of gcloud/virtualenv issues.

https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fguado_moblab-paladin%2F5499%2F%2B%2Frecipes%2Fsteps%2FBuildPackages%2F0%2Fstdout

  File "/b/cbuild/internal_master/chromite/bin/export_to_gcloud", line 99, in <module>
    main()
  File "/b/cbuild/internal_master/chromite/bin/export_to_gcloud", line 36, in main
    wrapper.DoMain()
  File "/b/cbuild/internal_master/chromite/scripts/wrapper.py", line 164, in DoMain
    commandline.ScriptWrapperMain(FindTarget)
  File "/b/cbuild/internal_master/chromite/lib/commandline.py", line 816, in ScriptWrapperMain
    target = find_target_func(target)
  File "/b/cbuild/internal_master/chromite/scripts/wrapper.py", line 139, in FindTarget
    module = cros_import.ImportModule(target)
  File "/b/cbuild/internal_master/chromite/lib/cros_import.py", line 43, in ImportModule
    module = __import__(target)
  File "/b/cbuild/internal_master/chromite/scripts/export_to_gcloud.py", line 9, in <module>
    from gcloud import datastore
  File "/home/chrome-bot/.cache/cros_venv/venv-2.7.6-5addca6cf590166d7b70e22a95bea4a0/local/lib/python2.7/site-packages/gcloud/__init__.py", line 19, in <module>
    __version__ = get_distribution('gcloud').version
  File "/b/build/third_party/setuptools-0.6c11/pkg_resources.py", line 311, in get_distribution
    if isinstance(dist,Requirement): dist = get_provider(dist)
  File "/b/build/third_party/setuptools-0.6c11/pkg_resources.py", line 197, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File "/b/build/third_party/setuptools-0.6c11/pkg_resources.py", line 666, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/b/build/third_party/setuptools-0.6c11/pkg_resources.py", line 565, in resolve
    raise DistributionNotFound(req)  # XXX put more info here
pkg_resources.DistributionNotFound: gcloud
15:57:10: WARNING: Unable to export to datastore: return code: 1; command: /b/cbuild/internal_master/chromite/bin/export_to_gcloud /creds/service_accounts/service-account-chromeos-datastore-writer-prod.json /b/cbuild/internal_master/buildbot_archive/guado_moblab-paladin/R59-9418.0.0-rc4/build-events.json --parent_key "('Build', 1422700, 'BuildStage', 41541274L)"
cmd=['/b/cbuild/internal_master/chromite/bin/export_to_gcloud', '/creds/service_accounts/service-account-chromeos-datastore-writer-prod.json', '/b/cbuild/internal_master/buildbot_archive/guado_moblab-paladin/R59-9418.0.0-rc4/build-events.json', '--parent_key', "('Build', 1422700, 'BuildStage', 41541274L)"]

 
Please verify that this (and the other three instances of build data being exported to datastore) is working end to end before closing this bug. I keep having to chase down things that break here.
>(and the other three instances of build data being exported to datastore)

Which?
There's one from parallel_emerge, one from the report stages along with the metadata upload, and one from the TKO parse.
(Sorry, three in total, not three others.)
Weird:

ayatane@cros-beefy361-c2:~$ cd /b/cbuild/internal_master/chromite
ayatane@cros-beefy361-c2:/b/cbuild/internal_master/chromite$ bin/export_to_gcloud 
usage: export_to_gcloud [-h]
                        [--log-level {fatal,critical,error,warning,notice,info,debug}]
                        [--log_format LOG_FORMAT] [--debug] [--nocolor]
                        [--project_id PROJECT_ID] [--namespace NAMESPACE]
                        [--parent_key PARENT_KEY]
                        service_acct_json entities
export_to_gcloud: error: too few arguments
The TKO upload isn't failing; it's just the two calls from the cbuildbot workflow.

It looks like something is injecting foreign dependencies into the virtualenv, and they are shadowing its setuptools.
chrome-bot@cros-beefy361-c2:(Linux 14.04):/b/build$ PYTHONPATH=/b/build/third_party/setuptools-0.6c11/ /b/cbuild/internal_master/chromite/bin/export_to_gcloud
Traceback (most recent call last):
  File "/b/cbuild/internal_master/chromite/bin/export_to_gcloud", line 99, in <module>
    main()
  File "/b/cbuild/internal_master/chromite/bin/export_to_gcloud", line 36, in main
    wrapper.DoMain()
  File "/b/cbuild/internal_master/chromite/scripts/wrapper.py", line 164, in DoMain
    commandline.ScriptWrapperMain(FindTarget)
  File "/b/cbuild/internal_master/chromite/lib/commandline.py", line 816, in ScriptWrapperMain
    target = find_target_func(target)
  File "/b/cbuild/internal_master/chromite/scripts/wrapper.py", line 139, in FindTarget
    module = cros_import.ImportModule(target)
  File "/b/cbuild/internal_master/chromite/lib/cros_import.py", line 43, in ImportModule
    module = __import__(target)
  File "/b/cbuild/internal_master/chromite/scripts/export_to_gcloud.py", line 9, in <module>
    from gcloud import datastore
  File "/home/chrome-bot/.cache/cros_venv/venv-2.7.6-5addca6cf590166d7b70e22a95bea4a0/local/lib/python2.7/site-packages/gcloud/__init__.py", line 19, in <module>
    __version__ = get_distribution('gcloud').version
  File "/b/build/third_party/setuptools-0.6c11/pkg_resources.py", line 311, in get_distribution
    if isinstance(dist,Requirement): dist = get_provider(dist)
  File "/b/build/third_party/setuptools-0.6c11/pkg_resources.py", line 197, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File "/b/build/third_party/setuptools-0.6c11/pkg_resources.py", line 666, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/b/build/third_party/setuptools-0.6c11/pkg_resources.py", line 565, in resolve
    raise DistributionNotFound(req)  # XXX put more info here
pkg_resources.DistributionNotFound: gcloud
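The repro above is consistent with ordinary import-path precedence: PYTHONPATH entries are placed on sys.path ahead of site-packages, so a module found there wins over the virtualenv's own copy. The sketch below illustrates that rule with two throwaway directories. It is written in Python 3 (chromite ran Python 2.7 at the time), and the module name `fake_pkg_resources` and both directories are hypothetical stand-ins for the bundled setuptools-0.6c11 and the virtualenv's site-packages, not the real layout.

```python
import importlib
import importlib.machinery
import os
import tempfile

# Hypothetical module name standing in for pkg_resources.
MODULE = "fake_pkg_resources"


def resolve_module(name, path_entries):
    """Return the file `import name` would load, searching path_entries in order."""
    importlib.invalidate_caches()  # make sure freshly written files are seen
    spec = importlib.machinery.PathFinder.find_spec(name, path_entries)
    return spec.origin if spec else None


def demo():
    # One directory plays the PYTHONPATH entry (old bundled setuptools),
    # the other plays the virtualenv's site-packages.
    with tempfile.TemporaryDirectory() as pythonpath_dir, \
            tempfile.TemporaryDirectory() as venv_site:
        for d in (pythonpath_dir, venv_site):
            with open(os.path.join(d, MODULE + ".py"), "w") as f:
                f.write("# stand-in module\n")
        # PYTHONPATH inherited: its entry precedes site-packages and wins.
        inherited = resolve_module(MODULE, [pythonpath_dir, venv_site])
        # PYTHONPATH scrubbed: only site-packages is searched.
        scrubbed = resolve_module(MODULE, [venv_site])
        return (os.path.dirname(inherited) == pythonpath_dir,
                os.path.dirname(scrubbed) == venv_site)
```

`demo()` returning `(True, True)` means the first matching directory on the search path is the one that gets imported, which is why an inherited PYTHONPATH can shadow packages inside the virtualenv.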

We should probably not be inheriting PYTHONPATH in the virtualenv.
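A minimal sketch of what not inheriting PYTHONPATH could look like, assuming the wrapper launches the virtualenv's interpreter as a child process. `scrubbed_env` and `run_in_venv` are hypothetical names; the actual change to scripts/virtualenv_wrapper.py may be implemented differently.

```python
import os
import subprocess
import sys


def scrubbed_env(environ=None):
    """Return a copy of environ (default: os.environ) without PYTHONPATH."""
    env = dict(os.environ if environ is None else environ)
    env.pop("PYTHONPATH", None)  # keep outside paths from shadowing venv packages
    return env


def run_in_venv(args, python=sys.executable, environ=None):
    """Run `python args...` with a PYTHONPATH-free environment.

    `python` should be the virtualenv's interpreter; defaulting to the
    current interpreter is only for illustration.
    """
    return subprocess.run([python, *args], env=scrubbed_env(environ),
                          capture_output=True, text=True, check=False)
```

For example, `run_in_venv(["-c", "import os; print(os.environ.get('PYTHONPATH'))"], environ=dict(os.environ, PYTHONPATH="/b/build/third_party/setuptools-0.6c11"))` should print `None` in the child. Passing `-E` to the interpreter is another option; it ignores PYTHONPATH but also every other PYTHON* variable.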
>We should probably not be inheriting PYTHONPATH in the virtualenv.

1. This statement is true, but we have been inheriting PYTHONPATH in the virtualenv since the beginning of virtualenv deployment.
2. I wonder why we're only seeing this issue now (or did we miss it before?). Nothing on the virtualenv side has changed that would affect PYTHONPATH. Did something change elsewhere in the environment?
davidriley: Can we surface export_to_gcloud failures somehow?
I didn't mean to imply the TKO upload is failing, but I'd like you to verify that it keeps working with whatever changes you make, so we don't end up in a situation where the fix breaks some other usage again.

I'm not sure what you mean by "surface." We could fail builds, but I'm not comfortable failing builds while this keeps breaking and is unreliable. We want the data, but it's not critical, and it isn't worth making our build infrastructure flakier.

c#10: No idea; I'm really just a consumer of all of this.
Project Member

Comment 14 by bugdroid1@chromium.org, Apr 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/82b4840db284ab826b52aef9017c133a80590d9e

commit 82b4840db284ab826b52aef9017c133a80590d9e
Author: Allen Li <ayatane@chromium.org>
Date: Mon Apr 03 22:41:51 2017

Scrub PYTHONPATH from virtualenv scripts

BUG= chromium:707456 
TEST=Run PYTHONPATH=/usr/lib/python3/dist-packages bin/export_to_gcloud

Change-Id: I20ab158ead3ca3f6921d84c30ced911e685ffc1f
Reviewed-on: https://chromium-review.googlesource.com/465550
Commit-Ready: Allen Li <ayatane@chromium.org>
Tested-by: Allen Li <ayatane@chromium.org>
Reviewed-by: David Riley <davidriley@chromium.org>

[modify] https://crrev.com/82b4840db284ab826b52aef9017c133a80590d9e/scripts/virtualenv_wrapper.py

Status: Fixed (was: Untriaged)
I checked the latest moblab paladin run and this error no longer appears (assuming it wasn't a flaky error to begin with).
Can we add tests to avoid this in the future?
Tests in what sense?  Ultimately, the only way to ensure that export_to_gcloud doesn't break in all of the exact environments that run it is to actually run it in all of those environments and make sure it doesn't break.

As for regression tests to keep this particular case from recurring: yes, we can add them. I recall adding unit tests for it, although a quick check suggests I'm misremembering.
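A regression test along those lines could poison PYTHONPATH and check that a child interpreter launched with the scrubbed environment never sees the poisoned entry. This is a sketch assuming a hypothetical `scrubbed_env` helper that mirrors the fix; it exercises only the scrubbing step, not export_to_gcloud itself, so it would not replace running the real tool in each environment.

```python
import json
import os
import subprocess
import sys
import tempfile
import unittest


def scrubbed_env(environ):
    # Hypothetical helper mirroring the fix: drop PYTHONPATH before
    # launching the virtualenv interpreter.
    env = dict(environ)
    env.pop("PYTHONPATH", None)
    return env


def child_sys_path(env):
    """Return sys.path as seen by a child interpreter launched with env."""
    out = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env, capture_output=True, text=True, check=True).stdout
    return json.loads(out)


class PythonpathScrubTest(unittest.TestCase):

    def test_poisoned_pythonpath_does_not_leak(self):
        with tempfile.TemporaryDirectory() as poison:
            env = dict(os.environ, PYTHONPATH=poison)
            # Sanity check: without scrubbing, the poisoned entry is visible.
            self.assertIn(poison, child_sys_path(env))
            # With scrubbing, it must not reach the child's sys.path.
            self.assertNotIn(poison, child_sys_path(scrubbed_env(env)))
```

Run it with `python -m unittest` against the file that contains it.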
I do not want to make export_to_gcloud failures fail the build at this time, especially given how unreliable things have been.  That being said, I do not want more changes to slip through which break things in undetected ways.  I'm open to tests that achieve these two goals across the known usages of virtualenv.
I think we should only whitelist specific errors from export_to_gcloud that we know are caused by expected flake.  There are a lot of ways for it to fail that are not limited to virtualenv, and we should catch all of them except the small subset we know are bogus.

Probably the best approach is to add a flag to export_to_gcloud that makes it exit with 0 on flake errors; then we can treat every non-zero exit as a real issue.
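One way the proposed flag could look, sketched with argparse. The flag name `--exit_zero_on_flake` (spelled in the underscore style of the existing `--project_id` and `--parent_key` options), `TransientDatastoreError`, and the `FLAKE_ERRORS` whitelist are all hypothetical; which errors actually count as flake would have to come from experience with the real datastore client.

```python
import argparse
import sys


class TransientDatastoreError(Exception):
    """Hypothetical stand-in for an error known to be flaky (e.g. a timeout)."""


# Hypothetical whitelist of exception types treated as expected flake.
FLAKE_ERRORS = (TransientDatastoreError,)


def export(entities_path):
    """Placeholder for the real upload logic (hypothetical)."""
    raise NotImplementedError(entities_path)


def main(argv, export_fn=export):
    parser = argparse.ArgumentParser(prog="export_to_gcloud")
    parser.add_argument("entities")
    parser.add_argument("--exit_zero_on_flake", action="store_true",
                        help="Exit with 0 if the failure is a known flake.")
    opts = parser.parse_args(argv)
    try:
        export_fn(opts.entities)
    except FLAKE_ERRORS as e:
        print("export_to_gcloud: known flake: %s" % e, file=sys.stderr)
        # Known flake: report success only if the caller opted in.
        return 0 if opts.exit_zero_on_flake else 1
    # Any other exception propagates, so the process exits non-zero
    # with a traceback.
    return 0
```

A script entry point would call `sys.exit(main(sys.argv[1:]))`. Only whitelisted errors are downgraded; anything else, such as the DistributionNotFound in this bug, still propagates and exits non-zero.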
No, even then I'm not sure if we want to fail the build.  In particular if a change lands that causes all export_to_gcloud calls to fail on some subset of builds, is it worth failing builds?  Since this data isn't critical, I don't think we should be.

I'd much rather have good tests that ensure that it doesn't land in the first place, instead of using the canaries and other builders as guinea pigs.  

Once we have good tests where we think it's very unlikely for bad changes to slip through, then we can entertain making builds fail based on unsuccessful invocations.
Put another way, more succinctly: if export_to_gcloud is the only part of a CQ run that fails, it should not cause developers' changes to be rejected by a failed CQ run.
If a developer breaks export_to_gcloud, they need to fix their change.  That's the point of the CQ, isn't it?  I'm not talking about flake, I'm talking about any random thing that could break export_to_gcloud, like someone adding an innocuous import in a particular file that causes the house of cards to collapse.
