poppy paladin failing on gsutil unit_tests.tar |
||||
Issue descriptionPoppy-paladin has failed twice over the weekend with: poppy-paladin: The UnitTest stage failed: <class 'chromite.lib.timeout_util.TimeoutError'>: Timeout occurred https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/19318 https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/19315 One example of the logs looks to be: 06:30:23: INFO: RunCommand: tar --sparse -I cat -cf /b/c/cbuild/repository/buildbot_archive/poppy-paladin/R70-10922.0.0-rc1/unit_tests.tar . in /b/c/cbuild/repository/chroot/build/poppy/test-packages 06:31:58: INFO: Refreshing due to a 401 (attempt 1/2) 06:31:58: INFO: Refreshing access_token 06:34:15: INFO: RunCommand: /b/c/cbuild/repository/.cache/common/gsutil_4.30.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/c/cbuild/repository/buildbot_archive/poppy-paladin/R70-10922.0.0-rc1/unit_tests.tar gs://chromeos-image-archive/poppy-paladin/R70-10922.0.0-rc1/unit_tests.tar 06:35:04: INFO: RetriableHttp: attempt 1 receiving status 503, will retry 06:37:53: INFO: Refreshing due to a 401 (attempt 1/2) 06:37:53: INFO: Refreshing access_token 06:42:59: INFO: Refreshing due to a 401 (attempt 1/2) 06:42:59: INFO: Refreshing access_token 07:32:08: INFO: Refreshing due to a 401 (attempt 1/2) 07:32:08: INFO: Refreshing access_token 07:38:00: INFO: Refreshing due to a 401 (attempt 1/2) 07:38:00: INFO: Refreshing access_token 07:43:04: INFO: Refreshing due to a 401 (attempt 1/2) 07:43:04: INFO: Refreshing access_token 08:03:11: INFO: RetriableHttp: attempt 1 receiving status 503, will retry 08:32:11: INFO: Refreshing due to a 401 (attempt 1/2) 08:32:11: INFO: Refreshing access_token [1;31m08:34:16: ERROR: <class 'chromite.lib.timeout_util.TimeoutError'>: Timeout occurred- waited 7200.0 seconds. Traceback (most recent call last): File "/b/c/cbuild/repository/chromite/lib/failures_lib.py", line 230, in wrapped_functor return functor(*args, **kwargs) File "/b/c/cbuild/repository/chromite/cbuildbot/commands.py", line 2124, in _UploadPathToGS recursive=True) File "/b/c/cbuild/repository/chromite/lib/gs.py", line 728, in CopyInto **kwargs) File "/b/c/cbuild/repository/chromite/lib/gs.py", line 1006, in Copy raise File "/b/c/cbuild/repository/chromite/lib/gs.py", line 983, in Copy result = self.DoCommand(cmd, **kwargs) File "/b/c/cbuild/repository/chromite/lib/gs.py", line 917, in DoCommand extra_env=extra_env, **kwargs) File "/b/c/cbuild/repository/chromite/lib/retry_stats.py", line 180, in RetryWithStats *args, **kwargs) File "/b/c/cbuild/repository/chromite/lib/retry_util.py", line 249, in GenericRetry return _run() File "/b/c/cbuild/repository/chromite/lib/retry_util.py", line 182, in _Wrapper ret = func(*args, **kwargs) File "/b/c/cbuild/repository/chromite/lib/retry_util.py", line 248, in _run return functor(*args, **kwargs) File "/b/c/cbuild/repository/chromite/lib/retry_stats.py", line 167, in wrapper result = functor(*args, **kwargs) File "/b/c/cbuild/repository/chromite/lib/cros_build_lib.py", line 614, in RunCommand (cmd_result.output, cmd_result.error) = proc.communicate(input) File "/usr/lib/python2.7/subprocess.py", line 796, in communicate self.wait() File "/usr/lib/python2.7/subprocess.py", line 1376, in wait pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0) File "/usr/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call return func(*args) File "/b/c/cbuild/repository/chromite/lib/timeout_util.py", line 88, in kill_us raise TimeoutError(error_message % {'time': max_run_time}) TimeoutError: Timeout occurred- waited 7200.0 seconds.
,
Jul 30
Looks like an issue uploading to GS, I think gsutil is timing out. The 401 errors (and maybe the 503) just mean 'auto-refreshing oauth2 token', it's just not the best wording choice... I'm a bit stumped though, I'll see if I can catch it during a timeout and poke around the process tree a bit.
,
Jul 31
Looks like another build timed out (slightly different error) on gsutil again but it was after I left yesterday :( I'll keep an eye on it this morning and try to catch it again.
,
Aug 1
Another incidence (from nautilus-release): https://luci-logdog.appspot.com/v/?s=chromeos/buildbucket/cr-buildbucket.appspot.com/8939420414099766000/+/steps/Archive/0/stdout And a poppy-paladin failure: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8939405810805476224 Alec, instead of hoping to catch it, should we just add some logging right after the timeout is triggered?
,
Aug 1
Agreed. I'll try to make a bit of time to do so, I'm a bit backlogged on bugs though :)
,
Sep 12
,
Sep 12
the timeout here is from running `gsutil cp` and gsutil itself took longer than 2 hours to run. the chromite traceback is kind of deep, but it's just retry/timeout logic around that gsutil cp command.
,
Sep 17
,
Sep 17
I don't know that this is still relevant. +gmeinke incase you want to investigate as on-call. |
||||
►
Sign in to add a comment |
||||
Comment 1 by evgreen@chromium.org
, Jul 30