ChromeOS telemetry bots cannot access cloud storage for perf tests
Issue description

Sample failing run: https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/amd64-generic-telemetry/builds/11299

Stack:
04/17 12:27:59.383 ERROR| archive_info:0093| You either aren't authenticated or don't have permission to use the archives for this page set. You may need to run gsutil config. You can find instructions for gsutil config at: http://www.chromium.org/developers/telemetry/upload_to_cloud_storage
04/17 12:27:59.391 INFO |run_chromeos_tests:0052| benchmarks.system_health_smoke_test.load_tests() failed: Attempted to access a file from Cloud Storage but you have no configured credentials. To configure your credentials:
04/17 12:27:59.391 INFO |run_chromeos_tests:0052| 1. Run "HOME=/home/chromeos-test/ /usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gsutil config" and follow its instructions.
04/17 12:27:59.391 INFO |run_chromeos_tests:0052| 2. If you have a @google.com account, use that account.
04/17 12:27:59.392 INFO |run_chromeos_tests:0052| 3. For the project-id, just enter 0.
04/17 12:27:59.449 WARNI| test:0615| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 609, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 817, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 470, in execute
    dargs)
  File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 380, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/telemetry_UnitTests/telemetry_UnitTests.py", line 35, in run_once
    raise error.TestFail(error_str)
TestFail: The unit tests of /usr/local/telemetry/src/tools/perf failed.

Richard, Don, looks like the GS account on the bot needs access to the telemetry buckets.
I can request access. What is the GS account?
,
May 15 2017
,
May 15 2017
I have no idea how to handle this. Any guidance will be helpful!
,
May 15 2017
You'll need help from someone in infra - maybe the deputy can help?
,
May 15 2017
chingcodes, do you know how and where I can run the instructions in #0 for this builder?
,
May 15 2017
afakhry: The instructions only apply to normal users. These credentials should have been pre-installed on these builders by puppet. Elliot, "build246-m2" on the "chromiumos.chromium" waterfall doesn't seem to have the proper .boto file installed.
,
May 15 2017
build246-m2:/home/chrome-bot# grep '^# Boto file' .boto
# Boto file for chromeos.bot@gmail.com

Which matches nodes.yaml in puppet: https://chrome-internal.googlesource.com/infra/puppet/+/master/puppetm/opt/puppet/conf/nodes.yaml#580

chrome_infra::credentials::boto:
  boto: chromeos.bot@gmail.com
,
May 15 2017
Ah... it does have the boto (I should have logged in to check, sorry Elliot). In fact, there are successful gsutil commands further down in the failing stage. Looking more carefully at the logs, it seems like the test itself is trying to upload to GS? That seems wrong. Is the test trying to do this from the virtual machine? If so, then this failure is expected. We don't put gsutil credentials on the VM.
,
May 15 2017
Ned, do you know who could answer Don's question? Kenneth?
,
May 15 2017
The test was trying to download WPR files from cloud storage.
,
May 15 2017
And if anyone can show the exact gsutil command that's failing, it might shed light on the problem.
,
May 15 2017
Has something changed maybe 2 months ago with the permissions? This used to work.
,
May 15 2017
Nothing changed afaik. But why is the suspect range so large? (2 months)
,
May 15 2017
These FYI bots have unfortunately been red for ~2 months.

This is the failing command: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/wpr/archive_info.py?l=86
It calls this: https://cs.chromium.org/chromium/src/third_party/catapult/common/py_utils/py_utils/cloud_storage.py?l=348
And _RunCommand is here: https://cs.chromium.org/chromium/src/third_party/catapult/common/py_utils/py_utils/cloud_storage.py?l=122
It's possible some of the WAR for chromeos broke: https://cs.chromium.org/chromium/src/third_party/catapult/common/py_utils/py_utils/cloud_storage.py?l=130-133
,
May 15 2017
Hum... could it simply be that "chromeos.bot@gmail.com" (the account our builders are using) no longer has the needed bucket permissions? I was really hoping for the gsutil command line used more than the code. That would let me re-run it by hand on a builder and see if I get different results.
,
May 15 2017
Yup, let me see if I can get that for you
,
May 16 2017
This appears to be the failure:

HOME=/home/chromeos-test/ /usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gsutil cp gs://chrome-partner-telemetry/7b49a582a4dcf33bd2cb7a7aebab3f25c70ee89e /usr/local/telemetry/src/tools/perf/page_sets/data/tmpM8Z84H
ServiceException: 401 Anonymous users does not have storage.objects.get access to object chrome-partner-telemetry/7b49a582a4dcf33bd2cb7a7aebab3f25c70ee89e.
,
May 16 2017
Ah.... the problem is that it sets "HOME". The .boto file is stored in the home directory, and /home/chromeos-test doesn't exist on the builders. That means no .boto, and that means you are "Anonymous". From our point of view, just stop setting HOME, but I'm sure it was added for a valid reason somewhere else.
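
To illustrate the point above, here is a minimal sketch of how the credentials lookup depends on HOME. It assumes a simplified model in which BOTO_CONFIG, when set, takes precedence and otherwise $HOME/.boto is used. The helper names resolve_boto_path and has_credentials are hypothetical and are not part of gsutil or catapult.

```python
import os


def resolve_boto_path(env):
    """Return the .boto path a gsutil-like tool would look for.

    Simplified model (an assumption, not gsutil's full lookup logic):
    BOTO_CONFIG wins if set; otherwise the file is $HOME/.boto.
    """
    explicit = env.get("BOTO_CONFIG")
    if explicit:
        return explicit
    home = env.get("HOME", os.path.expanduser("~"))
    return os.path.join(home, ".boto")


def has_credentials(env):
    """True if the resolved .boto file exists; if not, requests go out anonymously."""
    return os.path.isfile(resolve_boto_path(env))


if __name__ == "__main__":
    # Overriding HOME moves the lookup to a directory that may not exist.
    env = dict(os.environ, HOME="/home/chromeos-test/")
    env.pop("BOTO_CONFIG", None)
    print(resolve_boto_path(env), has_credentials(env))
```

With HOME overridden to a directory that has no .boto, has_credentials returns False, which corresponds to the "Anonymous" request seen in the 401 error.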
,
May 16 2017
This isn't running on the builders - it's running in the VM. Without HOME, we have:

localhost ~ # /usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gsutil cp gs://chrome-partner-telemetry/7b49a582a4dcf33bd2cb7a7aebab3f25c70ee89e /usr/local/telemetry/src/tools/perf/page_sets/data/tmpM8Z84H
Traceback (most recent call last):
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gsutil", line 22, in <module>
    gsutil.RunMain()
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gsutil.py", line 106, in RunMain
    import gslib.__main__
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gslib/__main__.py", line 53, in <module>
    from gslib import wildcard_iterator
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gslib/wildcard_iterator.py", line 38, in <module>
    from gslib.util import UTF8
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gslib/util.py", line 308, in <module>
    CreateDirIfNeeded(GetGsutilStateDir())
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gslib/util.py", line 272, in GetGsutilStateDir
    CreateDirIfNeeded(config_file_dir)
  File "/usr/local/telemetry/src/third_party/catapult/third_party/gsutil/gslib/util.py", line 228, in CreateDirIfNeeded
    os.makedirs(dir_path, mode)
  File "/usr/local/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/root/.gsutil'
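
The traceback shows gsutil trying to create its state directory under the current user's home (here /root/.gsutil) and failing because that location is read-only. Below is a minimal sketch of a pre-flight check, assuming the caller is free to choose among several candidate HOME directories. The helper first_writable_dir is hypothetical and is not part of gsutil or catapult.

```python
import os
import tempfile


def first_writable_dir(candidates):
    """Return the first existing directory we can create files in, or None."""
    for path in candidates:
        # W_OK fails on a read-only file system even for root.
        if os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK):
            return path
    return None


if __name__ == "__main__":
    home = first_writable_dir(
        ["/root", "/home/chromeos-test", tempfile.gettempdir()])
    print("HOME to use for gsutil:", home)
```

A check like this only addresses the state-directory failure; the chosen directory still needs a valid .boto file for authenticated access.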
,
May 16 2017
+Toni who was the original author of the code to make cloud storage work in the ChromeOS VM.
,
May 18 2017
,
May 18 2017
Achuith and Toni, what do we need exactly from our side or infra to fix this issue?
,
May 19 2017
The test on the VM is not able to access the page set data in cloud storage. See c17. I'm not sure what the right fix is.
,
May 20 2017
Off to the next gardener.
,
Jun 6 2017
It looks like the fix should be to set the BOTO_CONFIG environment variable to point at /root/.boto before running the gsutil command. Taking this on as this week's gardener.
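
A rough sketch of that idea, assuming gsutil is launched as a subprocess. The helper names gsutil_env and run_gsutil are hypothetical; this is not the code from the actual CL.

```python
import os
import subprocess


def gsutil_env(boto_path, base_env=None):
    """Return a copy of the environment with BOTO_CONFIG pointing at boto_path."""
    env = dict(os.environ if base_env is None else base_env)
    env["BOTO_CONFIG"] = boto_path
    return env


def run_gsutil(gsutil_path, args, boto_path):
    """Run gsutil with an explicit credentials file.

    Returns (returncode, stdout, stderr).
    """
    proc = subprocess.Popen(
        [gsutil_path] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=gsutil_env(boto_path),
    )
    stdout, stderr = proc.communicate()
    return proc.returncode, stdout, stderr
```

Note that this only helps if a .boto file actually exists at that path on the machine where gsutil runs.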
,
Jun 6 2017
Landed the fix in https://codereview.chromium.org/2923123006/ Now I'm waiting for https://build.chromium.org/p/chromiumos.chromium/builders/amd64-generic-telemetry/builds/11664. That build should start at about 5pm today, and should finish about 7:30pm.
,
Jun 6 2017
Oh, I guess there needs to be a catapult roll before it will show up. I don't know how often those are done.
,
Jun 6 2017
There's multiple rolls a day, so every few hours? Were you able to verify locally that the perf tests run? Excellent work, btw. Thank you!
,
Jun 6 2017
This seemed like an issue on the VM slave, so I wasn't able to test locally. It is very possible there are still broken tests. crbug.com/702930 lists some, but it was blocked on this one.
,
Jun 6 2017
Sorry, I should have provided repro instructions. Use go/cros-vm to launch a ChromeOS VM. It looks like TOT VM is busted :( I'm using --version=9592.0.0

Start the VM:
cros_vm --start

Run telemetry perf tests:
cros_vm --cmd 'python /usr/local/telemetry/src/tools/perf/run_tests'
,
Jun 7 2017
I tried your CL, and now I get:

benchmarks.system_health_smoke_test.load_tests() failed: 'module' object has no attribute 'SystemHealthStorySet'

So I guess the cloud storage issue is resolved?
,
Jun 7 2017
Thanks! And that does sound like it is fixed, but I'll still wait for the buildbot to run before I close this.
,
Jun 9 2017
I came back to check on this bot. It is still red, even though it appears to be using the updated telemetry source file.

After taking a deeper look at what is going on and launching the VM test myself, I see that while the buildbot might have a .boto file in the path I hard-coded, the VM itself probably does not.

The simplest thing to do might be to somehow copy the .boto file from the host into the VM (to /home/chromeos-test) before running the autotest. I can look into doing that. I don't think there are any security problems with doing so -- we aren't running arbitrary code in the VM AFAIK.

The alternative would appear to be to set up a test-local .boto file (as per the firmware_TouchMTB test -- see PUBLIC_BOTO in cros_gs.py). While perhaps safer, I don't think we need to do so in this case.

I will revert my patch from a few days ago, since it is the wrong fix and adds confusion.
,
Jun 23 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/aa067ca807e7ef4af9822750969837d49cd115fc

commit aa067ca807e7ef4af9822750969837d49cd115fc
Author: Lloyd Pique <lpique@google.com>
Date: Fri Jun 23 00:03:57 2017

[telemetry_UnitTests] Add telemetry_UnitTestsServer

telemetry_UnitTestsServer handles an additional step of copying a .boto file to the DUT, and otherwise just runs telemetry_UnitTests on the DUT.

BUG= chromium:712349
TEST=test_that -b amd64-generic localhost:9222 suite:telemetry_unit_server

Change-Id: I6fc8649723dbbdb7d5b133d9d0375c1b7abd0353
Reviewed-on: https://chromium-review.googlesource.com/541839
Commit-Ready: Lloyd Pique <lpique@google.com>
Tested-by: Lloyd Pique <lpique@google.com>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>

[modify] https://crrev.com/aa067ca807e7ef4af9822750969837d49cd115fc/chromeos-base/autotest-chrome/autotest-chrome-9999.ebuild
,
Jun 23 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0

commit 863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0
Author: Lloyd Pique <lpique@google.com>
Date: Fri Jun 23 00:03:58 2017

[telemetry_UnitTests] Copy .boto file to dut/VM

Creates telemetry_UnitTestsServer, which runs as a server-side test, and copies a .boto file over to the DUT/VM before running telemetry_UnitTests. The .boto file is only copied for the two tests that need it (user, perf), and is removed once the tests are done.

The telemetry tests need a valid .boto file to access telemetry related test files using gsutil. This change ensures one is available.

BUG= chromium:712349
TEST=test_that -b amd64-generic localhost:9222 suite:telemetry_unit_server

Change-Id: I12a58caf132a20b1ac62600ce48fbb86e4631f2c
Reviewed-on: https://chromium-review.googlesource.com/531661
Commit-Ready: Lloyd Pique <lpique@google.com>
Tested-by: Lloyd Pique <lpique@google.com>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>

[modify] https://crrev.com/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0/site_utils/attribute_whitelist.txt
[add] https://crrev.com/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0/server/site_tests/telemetry_UnitTestsServer/telemetry_UnitTestsServer.py
[add] https://crrev.com/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0/server/site_tests/telemetry_UnitTestsServer/control.guest
[add] https://crrev.com/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0/server/site_tests/telemetry_UnitTestsServer/control.user
[add] https://crrev.com/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0/test_suites/control.telemetry_unit_server
[add] https://crrev.com/863a2d8c7dc7c73c50c1d0d33a9d320acebce6d0/server/site_tests/telemetry_UnitTestsServer/control.perf
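
The copy-then-remove behaviour described in this commit message can be sketched roughly as a context manager. This is an illustration only: it uses a plain local file copy as a stand-in for transferring the file to the DUT/VM, and temporary_boto is a hypothetical name, not the code in telemetry_UnitTestsServer.py.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temporary_boto(src_boto, dest_dir):
    """Copy src_boto to dest_dir/.boto for the duration of the block, then remove it."""
    dest = os.path.join(dest_dir, ".boto")
    shutil.copyfile(src_boto, dest)
    try:
        yield dest
    finally:
        # Remove the credentials even if the tests raise.
        if os.path.exists(dest):
            os.remove(dest)


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    src = os.path.join(work, "source.boto")
    with open(src, "w") as f:
        f.write("[Credentials]\n")
    with temporary_boto(src, work) as path:
        print("credentials available at", path)
    print("still present afterwards:", os.path.exists(os.path.join(work, ".boto")))
```

Wrapping the test run in a try/finally keeps the credentials from lingering on the device when a test fails partway through.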
,
Jun 29 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/6982e96d94b825be1d181faa046f95cd669844b3

commit 6982e96d94b825be1d181faa046f95cd669844b3
Author: Lloyd Pique <lpique@google.com>
Date: Thu Jun 29 21:16:38 2017

cbuildbot: Switch to telemetry_unit_server

telemetry_unit_server does some extra setup work needed on the bots before running telemetry_unit.

BUG= chromium:712349
TEST=cbuildbot --remote amd64-generic-telemetry

Change-Id: I7dd495a9059140f10f983f40dc7435ad939e2499
Reviewed-on: https://chromium-review.googlesource.com/549025
Commit-Ready: Lloyd Pique <lpique@google.com>
Tested-by: Lloyd Pique <lpique@google.com>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>

[modify] https://crrev.com/6982e96d94b825be1d181faa046f95cd669844b3/cbuildbot/commands.py
,
Jun 30 2017
Hmm, my change was picked up in build 11808: https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/amd64-generic-telemetry/builds/11808

The logs do mention copying the .boto file, but the test still fails because it cannot access the files it needs. Looking back at the test runs I did before submitting the final change, I should have caught it there (same failure), but I missed the obvious credentials error mentioned in the logs for telemetry_UnitTestsServer_perf.

It's entirely possible that the .boto file that the builders have just doesn't have the right access tokens for this test, but I'm not familiar enough with the tests to say for sure.

I'm going to be out on vacation for a few weeks, so I'm handing this back to the gardener hotlist, and hopefully someone else can take another look.
,
Jul 26 2017
It seems that this issue has been solved? I don't see the same auth failure error in the most recent failed build. Should we close this?
,
Jul 26 2017
It does look like it. Let me dig a little deeper.
,
Sep 7 2017
This has been fixed
,
Jan 22 2018
,
Jan 23 2018
Comment 1 by achuith@chromium.org
, May 15 2017