New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 869119 link

Starred by 1 user

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

poppy paladin failing on gsutil unit_tests.tar

Project Member Reported by evgreen@chromium.org, Jul 30

Issue description

Poppy-paladin has failed twice over the weekend with:

poppy-paladin: The UnitTest stage failed: <class 'chromite.lib.timeout_util.TimeoutError'>: Timeout occurred

https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/19318
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/19315

One example of the logs looks to be:
06:30:23: INFO: RunCommand: tar --sparse -I cat -cf /b/c/cbuild/repository/buildbot_archive/poppy-paladin/R70-10922.0.0-rc1/unit_tests.tar . in /b/c/cbuild/repository/chroot/build/poppy/test-packages
06:31:58: INFO: Refreshing due to a 401 (attempt 1/2)
06:31:58: INFO: Refreshing access_token
06:34:15: INFO: RunCommand: /b/c/cbuild/repository/.cache/common/gsutil_4.30.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/c/cbuild/repository/buildbot_archive/poppy-paladin/R70-10922.0.0-rc1/unit_tests.tar gs://chromeos-image-archive/poppy-paladin/R70-10922.0.0-rc1/unit_tests.tar
06:35:04: INFO: RetriableHttp: attempt 1 receiving status 503, will retry
06:37:53: INFO: Refreshing due to a 401 (attempt 1/2)
06:37:53: INFO: Refreshing access_token
06:42:59: INFO: Refreshing due to a 401 (attempt 1/2)
06:42:59: INFO: Refreshing access_token
07:32:08: INFO: Refreshing due to a 401 (attempt 1/2)
07:32:08: INFO: Refreshing access_token
07:38:00: INFO: Refreshing due to a 401 (attempt 1/2)
07:38:00: INFO: Refreshing access_token
07:43:04: INFO: Refreshing due to a 401 (attempt 1/2)
07:43:04: INFO: Refreshing access_token
08:03:11: INFO: RetriableHttp: attempt 1 receiving status 503, will retry
08:32:11: INFO: Refreshing due to a 401 (attempt 1/2)
08:32:11: INFO: Refreshing access_token
08:34:16: ERROR: <class 'chromite.lib.timeout_util.TimeoutError'>: Timeout occurred- waited 7200.0 seconds.
Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/lib/failures_lib.py", line 230, in wrapped_functor
    return functor(*args, **kwargs)
  File "/b/c/cbuild/repository/chromite/cbuildbot/commands.py", line 2124, in _UploadPathToGS
    recursive=True)
  File "/b/c/cbuild/repository/chromite/lib/gs.py", line 728, in CopyInto
    **kwargs)
  File "/b/c/cbuild/repository/chromite/lib/gs.py", line 1006, in Copy
    raise
  File "/b/c/cbuild/repository/chromite/lib/gs.py", line 983, in Copy
    result = self.DoCommand(cmd, **kwargs)
  File "/b/c/cbuild/repository/chromite/lib/gs.py", line 917, in DoCommand
    extra_env=extra_env, **kwargs)
  File "/b/c/cbuild/repository/chromite/lib/retry_stats.py", line 180, in RetryWithStats
    *args, **kwargs)
  File "/b/c/cbuild/repository/chromite/lib/retry_util.py", line 249, in GenericRetry
    return _run()
  File "/b/c/cbuild/repository/chromite/lib/retry_util.py", line 182, in _Wrapper
    ret = func(*args, **kwargs)
  File "/b/c/cbuild/repository/chromite/lib/retry_util.py", line 248, in _run
    return functor(*args, **kwargs)
  File "/b/c/cbuild/repository/chromite/lib/retry_stats.py", line 167, in wrapper
    result = functor(*args, **kwargs)
  File "/b/c/cbuild/repository/chromite/lib/cros_build_lib.py", line 614, in RunCommand
    (cmd_result.output, cmd_result.error) = proc.communicate(input)
  File "/usr/lib/python2.7/subprocess.py", line 796, in communicate
    self.wait()
  File "/usr/lib/python2.7/subprocess.py", line 1376, in wait
    pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
  File "/usr/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call
    return func(*args)
  File "/b/c/cbuild/repository/chromite/lib/timeout_util.py", line 88, in kill_us
    raise TimeoutError(error_message % {'time': max_run_time})
TimeoutError: Timeout occurred- waited 7200.0 seconds.
 
Summary: poppy paladin failing on gsutil unit_tests.tar (was: poppy paladin failing)
Also failed on Link as jrbarnette noted:

https://luci-milo.appspot.com/buildbot/chromeos/link-paladin/32444

It's roughly the same log on all three, except the one I pasted is the only one with the status 503 in there.
Looks like an issue uploading to GS, I think gsutil is timing out. The 401 errors (and maybe the 503) just mean 'auto-refreshing oauth2 token', it's just not the best wording choice...

I'm a bit stumped though, I'll see if I can catch it during a timeout and poke around the process tree a bit.
Looks like another build timed out (slightly different error) on gsutil again but it was after I left yesterday :( I'll keep an eye on it this morning and try to catch it again.
Another incidence (from nautilus-release):
https://luci-logdog.appspot.com/v/?s=chromeos/buildbucket/cr-buildbucket.appspot.com/8939420414099766000/+/steps/Archive/0/stdout

And a poppy-paladin failure:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8939405810805476224

Alec, instead of hoping to catch it, should we just add some logging right after the timeout is triggered?
Agreed. I'll try to make a bit of time to do so, I'm a bit backlogged on bugs though :)
Components: -Infra>Client>ChromeOS Infra>Client>ChromeOS>Build
the timeout here is from running `gsutil cp` and gsutil itself took longer than 2 hours to run.  the chromite traceback is kind of deep, but it's just retry/timeout logic around that gsutil cp command.
Owner: ----
Cc: gmeinke@chromium.org
I don't know that this is still relevant. +gmeinke incase you want to investigate as on-call.

Sign in to add a comment