Issue 712349 link

Starred by 0 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 702930



ChromeOS telemetry bots cannot access cloud storage for perf tests

Project Member Reported by achuith@chromium.org, Apr 17 2017

Issue description

Sample failing run:
https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/amd64-generic-telemetry/builds/11299

Stack:

04/17 12:27:59.383 ERROR|      archive_info:0093| You either aren't authenticated or don't have permission to use the archives for this page set.
You may need to run gsutil config.
You can find instructions for gsutil config at: http://www.chromium.org/developers/telemetry/upload_to_cloud_storage
04/17 12:27:59.391 INFO |run_chromeos_tests:0052| benchmarks.system_health_smoke_test.load_tests() failed: Attempted to access a file from Cloud Storage but you have no configured credentials. To configure your credentials:
04/17 12:27:59.391 INFO |run_chromeos_tests:0052|   1. Run "HOME=/home/chromeos-test/ /usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gsutil config" and follow its instructions.
04/17 12:27:59.391 INFO |run_chromeos_tests:0052|   2. If you have a @google.com account, use that account.
04/17 12:27:59.392 INFO |run_chromeos_tests:0052|   3. For the project-id, just enter 0.
04/17 12:27:59.449 WARNI|              test:0615| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 609, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 817, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 470, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 380, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/telemetry_UnitTests/telemetry_UnitTests.py", line 35, in run_once
    raise error.TestFail(error_str)
TestFail: The unit tests of /usr/local/telemetry/src/tools/perf failed.


Richard, Don, looks like the GS account on the bot needs access to the telemetry buckets. I can request access. What is the GS account?
 
Blocking: 702930
Cc: achuith@chromium.org
Owner: afakhry@chromium.org
Cc: glevin@chromium.org
I have no idea how to handle this. Any guidance will be helpful!
Cc: chingcodes@chromium.org
You'll need help from someone in infra - maybe the deputy can help?
chingcodes, do you know how and where I can run the instructions in #0 for this builder?
Owner: friedman@chromium.org
afakhry: The instructions only apply to normal users. These credentials should have been pre-installed on these builders by puppet.

Elliot, "build246-m2" on the "chromiumos.chromium" waterfall doesn't seem to have the proper .boto file installed.

build246-m2:/home/chrome-bot# grep '^# Boto file' .boto
# Boto file for chromeos.bot@gmail.com

Which matches nodes.yaml in puppet:
https://chrome-internal.googlesource.com/infra/puppet/+/master/puppetm/opt/puppet/conf/nodes.yaml#580
    chrome_infra::credentials::boto:
      boto: chromeos.bot@gmail.com
Owner: achuith@chromium.org
Ah... it does have the boto (I should have logged in to check, sorry Elliot). In fact, there are successful gsutil commands further down in the failing stage.

Looking more carefully at the logs, it seems like the test itself is trying to upload to GS? That seems wrong. Is the test trying to do this from the virtual machine? If so, then this failure is expected. We don't put gsutil credentials on the VM.
Cc: kbr@chromium.org nedngu...@google.com
Ned, do you know who could answer Don's question? Kenneth?
The test was trying to download WPR files from cloud storage.
And if anyone can show the exact gsutil command that's failing, it might shed light on the problem.
Has something changed maybe 2 months ago with the permissions? This used to work.
Nothing changed afaik. But why is the suspect range so large (2 months)?
Hum... could it simply be that "chromeos.bot@gmail.com" (the account our builders are using) no longer has the needed bucket permissions?

I was really hoping for the gsutil command line used more than the code. That would let me re-run it by hand on a builder and see if I get different results.
Yup, let me see if I can get that for you
This appears to be the failure:
HOME=/home/chromeos-test/ /usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gsutil cp gs://chrome-partner-telemetry/7b49a582a4dcf33bd2cb7a7aebab3f25c70ee89e /usr/local/telemetry/src/tools/perf/page_sets/data/tmpM8Z84H

ServiceException: 401 Anonymous users does not have storage.objects.get access to object chrome-partner-telemetry/7b49a582a4dcf33bd2cb7a7aebab3f25c70ee89e.

Ah.... the problem is that it sets "HOME".

The .boto file is stored in the home directory, and /home/chromeos-test doesn't exist on the builders. That means no .boto, and that means you are "Anonymous".

From our point of view, just stop setting HOME, but I'm sure it was added for a valid reason somewhere else.
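
A minimal sketch of the lookup described above, assuming a simplified rule: an explicit BOTO_CONFIG wins, otherwise the per-user file is $HOME/.boto. The function names are hypothetical and this is not gsutil's actual code; it only illustrates why overriding HOME can leave the command "Anonymous".

```python
import os


def candidate_boto_paths(environ):
    """Return the credential file paths this simplified lookup would try.

    Assumption: BOTO_CONFIG, if set, is used on its own; otherwise the
    per-user file is $HOME/.boto.
    """
    explicit = environ.get("BOTO_CONFIG")
    if explicit:
        return [explicit]
    home = environ.get("HOME", "")
    return [os.path.join(home, ".boto")]


def is_anonymous(environ, exists=os.path.exists):
    """True when none of the candidate credential files exist."""
    return not any(exists(path) for path in candidate_boto_paths(environ))
```

With HOME=/home/chromeos-test/ the only candidate is /home/chromeos-test/.boto, so if that directory is missing the request goes out without credentials.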
This isn't running on the builders - it's running in the VM.

Without HOME, we have:

localhost ~ # /usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gsutil cp gs://chrome-partner-telemetry/7b49a582a4dcf33bd2cb7a7aebab3f25c70ee89e /usr/local/telemetry/src/tools/perf/page_sets/data/tmpM8Z84H
Traceback (most recent call last):
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gsutil", line 22, in <module>
    gsutil.RunMain()
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gsutil.py", line 106, in RunMain
    import gslib.__main__
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gslib/__main__.py", line 53, in <module>
    from gslib import wildcard_iterator
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gslib/wildcard_iterator.py", line 38, in <module>
    from gslib.util import UTF8
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gslib/util.py", line 308, in <module>
    CreateDirIfNeeded(GetGsutilStateDir())
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gslib/util.py", line 272, in GetGsutilStateDir
    CreateDirIfNeeded(config_file_dir)
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gslib/util.py", line 228, in CreateDirIfNeeded
    os.makedirs(dir_path, mode)
  File "/usr/local/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/root/.gsutil'

Cc: tbarzic@chromium.org
+Toni who was the original author of the code to make cloud storage work in the ChromeOS VM.
Owner: afakhry@chromium.org
Achuith and Toni, what do we need exactly from our side or infra to fix this issue?
The test on the VM is not able to access the page set data in cloud storage. See c17. I'm not sure what the right fix is.
Cc: afakhry@chromium.org
Owner: ihf@chromium.org
Off to the next gardener.
Owner: lpique@chromium.org
It looks like the fix should be to set the BOTO_CONFIG environment variable to point at /root/.boto before running the gsutil command.

Taking this on as this week's gardener.
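
A rough sketch of that proposal, not the code in the actual CL: the helper names `gsutil_env` and `run_gsutil` are hypothetical, and the only assumption about gsutil is that it honors the BOTO_CONFIG environment variable.

```python
import os
import subprocess


def gsutil_env(boto_path, base_env=None):
    """Return a copy of the environment with BOTO_CONFIG pointing at boto_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["BOTO_CONFIG"] = boto_path
    return env


def run_gsutil(gsutil_path, args, boto_path):
    """Run gsutil with an explicit credentials file; returns the exit code."""
    return subprocess.call([gsutil_path] + list(args),
                           env=gsutil_env(boto_path))
```

Copying the environment rather than mutating os.environ keeps the override local to the one gsutil invocation. This only helps if the file at boto_path actually exists on the machine where the command runs.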
Landed the fix in https://codereview.chromium.org/2923123006/

Now I'm waiting for https://build.chromium.org/p/chromiumos.chromium/builders/amd64-generic-telemetry/builds/11664. That build should start at about 5pm today, and should finish about 7:30pm.
Oh, I guess there needs to be a catapult roll before it will show up. I don't know how often those are done.
There are multiple rolls a day, so every few hours?

Were you able to verify locally that the perf tests run?

Excellent work, btw. Thank you!
This seemed like an issue on the VM slave, so I wasn't able to test locally.

It is very possible there are still broken tests. crbug.com/702930 lists some, but was blocked on this one.
Sorry, I should have provided repro instructions.

Use go/cros-vm to launch a ChromeOS VM. It looks like TOT VM is busted :( I'm using --version=9592.0.0

Start the VM:
cros_vm --start

Run telemetry perf tests:
cros_vm --cmd 'python /usr/local/telemetry/src/tools/perf/run_tests'


I tried your CL, and now I get:
benchmarks.system_health_smoke_test.load_tests() failed: 'module' object has no attribute 'SystemHealthStorySet'

So I guess the cloud storage issue is resolved?
Thanks! And that does sound like it is fixed, but I'll still wait for the buildbot to run before I close this.
I came back to check on this bot. It is still red, even though it appears to be using the updated telemetry source file.

After taking a deeper look at what is going on and launching the VM test myself, I think that while the buildbot might have a .boto file in the path I hard-coded, the VM itself probably does not.

The simplest thing to do might be to somehow copy the .boto file from the host into the VM (to /home/chromeos-test) before running the autotest. I can look into doing that. I don't think there are any security problems with doing so -- we aren't running arbitrary code in the VM AFAIK.

The alternative would appear to be to set up a test-local .boto file (as per the firmware_TouchMTB test -- see PUBLIC_BOTO in cros_gs.py). While perhaps safer, I don't think we need to do so in this case.

I will revert my patch from a few days ago, since it is the wrong fix, and adds confusion.
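
The copy-in, run, clean-up idea described above can be sketched roughly as follows. This is an illustration only: `temporary_boto` is a hypothetical name, and a local file copy stands in for whatever mechanism the autotest server side uses to push a file to the DUT/VM.

```python
import contextlib
import os
import shutil


@contextlib.contextmanager
def temporary_boto(src, dest):
    """Copy a credentials file into place, and remove it when done.

    Assumption: src and dest are local paths; a real server-side test
    would copy to the DUT/VM instead.
    """
    shutil.copyfile(src, dest)
    os.chmod(dest, 0o600)  # keep the credentials readable by the owner only
    try:
        yield dest
    finally:
        # Remove the copy even if the tests raised.
        if os.path.exists(dest):
            os.remove(dest)
```

Using a context manager means the credentials are removed whether or not the tests inside the `with` block fail.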
Project Member

Comment 34 by bugdroid1@chromium.org, Jun 23 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/aa067ca807e7ef4af9822750969837d49cd115fc

commit aa067ca807e7ef4af9822750969837d49cd115fc
Author: Lloyd Pique <lpique@google.com>
Date: Fri Jun 23 00:03:57 2017

[telemetry_UnitTests] Add telemetry_UnitTestsServer

telemetry_UnitTestsServer handles an additional step of copying a .boto
file to the DUT, and otherwise just runs telemetry_UnitTests on the DUT.

BUG= chromium:712349 
TEST=test_that -b amd64-generic localhost:9222 suite:telemetry_unit_server

Change-Id: I6fc8649723dbbdb7d5b133d9d0375c1b7abd0353
Reviewed-on: https://chromium-review.googlesource.com/541839
Commit-Ready: Lloyd Pique <lpique@google.com>
Tested-by: Lloyd Pique <lpique@google.com>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>

[modify] https://crrev.com/aa067ca807e7ef4af9822750969837d49cd115fc/chromeos-base/autotest-chrome/autotest-chrome-9999.ebuild

Project Member

Comment 35 by bugdroid1@chromium.org, Jun 23 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0

commit 863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0
Author: Lloyd Pique <lpique@google.com>
Date: Fri Jun 23 00:03:58 2017

[telemetry_UnitTests] Copy .boto file to dut/VM

Creates telemetry_UnitTestsServer, which runs as a server-side test, and
copies a .boto file over to the DUT/VM before running
telemetry_UnitTests.

The .boto file is only copied for the two tests that need it (user,
perf), and is removed once the tests are done.

The telemetry tests need a valid .boto file to access telemetry related
test files using gsutil. This change ensures one is available.

BUG= chromium:712349 
TEST=test_that -b amd64-generic localhost:9222 suite:telemetry_unit_server

Change-Id: I12a58caf132a20b1ac62600ce48fbb86e4631f2c
Reviewed-on: https://chromium-review.googlesource.com/531661
Commit-Ready: Lloyd Pique <lpique@google.com>
Tested-by: Lloyd Pique <lpique@google.com>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>

[modify] https://crrev.com/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0/site_utils/attribute_whitelist.txt
[add] https://crrev.com/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0/server/site_tests/telemetry_UnitTestsServer/telemetry_UnitTestsServer.py
[add] https://crrev.com/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0/server/site_tests/telemetry_UnitTestsServer/control.guest
[add] https://crrev.com/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0/server/site_tests/telemetry_UnitTestsServer/control.user
[add] https://crrev.com/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0/test_suites/control.telemetry_unit_server
[add] https://crrev.com/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0/server/site_tests/telemetry_UnitTestsServer/control.perf

Project Member

Comment 36 by bugdroid1@chromium.org, Jun 29 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/6982e96d94b825be1d181faa046f95cd669844b3

commit 6982e96d94b825be1d181faa046f95cd669844b3
Author: Lloyd Pique <lpique@google.com>
Date: Thu Jun 29 21:16:38 2017

cbuildbot: Switch to telemetry_unit_server

telemetry_unit_server does some extra setup work needed on the bots
before running telemetry_unit.

BUG= chromium:712349 
TEST=cbuildbot --remote amd64-generic-telemetry

Change-Id: I7dd495a9059140f10f983f40dc7435ad939e2499
Reviewed-on: https://chromium-review.googlesource.com/549025
Commit-Ready: Lloyd Pique <lpique@google.com>
Tested-by: Lloyd Pique <lpique@google.com>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>

[modify] https://crrev.com/6982e96d94b825be1d181faa046f95cd669844b3/cbuildbot/commands.py

Labels: Hotlist-CrOS-Gardener
Owner: ----
Status: Available (was: Assigned)
Hmm, my change was picked up in build 11808:

https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/amd64-generic-telemetry/builds/11808

The logs do mention copying the .boto file, but then the test still fails as it cannot access the files it needs. Looking back at the test runs I did before submitting the final change, I should have caught it there (same failure), but I missed seeing the obvious credentials error in the logs for telemetry_UnitTestsServer_perf.

It's entirely possible that the .boto file the builders have just doesn't have the right access tokens for this test, but I'm not familiar enough with the tests to say for sure.

I'm going to be out on vacation for a few weeks, so I'm handing this back to the gardener hotlist, and hopefully someone else can take another look.
Cc: erosky@chromium.org
It seems that this issue has been solved? I don't see the same auth failure error in the most recent failed build. Should we close this?
Owner: achuith@chromium.org
Status: Assigned (was: Available)
It does look like it. Let me dig a little deeper.
Status: Fixed (was: Assigned)
This has been fixed

Comment 41 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Comment 42 by dchan@chromium.org, Jan 23 2018

Status: Fixed (was: Archived)
