
Issue 846372

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




crc32c signature mismatch reported when trying to stage artifacts

Reported by jrbarnette@chromium.org, May 24 2018

Issue description

This CQ run failed:
    https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/18691

One of the reasons was kevin-paladin:
    https://luci-milo.appspot.com/buildbot/chromeos/kevin-paladin/4686

The provision suite failed because of a problem staging the build on the
target devserver.  These are the relevant parts of the error message:
  GSCommandError: return code: 1; command: /home/chromeos-test/chromeos-cache/common/gsutil_4.30.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' cp -v -- gs://chromeos-image-archive/kevin-paladin/R68-10715.0.0-rc2/autotest_packages.tar /home/chromeos-test/images/kevin-paladin/R68-10715.0.0-rc2/
  Copying gs://chromeos-image-archive/kevin-paladin/R68-10715.0.0-rc2/autotest_packages.tar...
[ ... ]
  Failure: crc32c signature computed for local file (+WK2jQ==) doesn't match cloud-supplied digest (p4T7TQ==). Local file (/home/chromeos-test/images/kevin-paladin/R68-10715.0.0-rc2/autotest_packages.tar) will be deleted..

This is (apparently) a problem with GS, not with the devserver, per se.
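
If this recurs, a quick cross-check is to recompute the local file's crc32c
in the same base64 form gsutil prints and compare it against the
cloud-supplied digest in the error.  This is only a hedged sketch, not
devserver code: it assumes the crcmod package (which gsutil itself uses)
and a local copy that gsutil hasn't already deleted.

import base64
import crcmod.predefined

def local_crc32c_digest(path, chunk_size=1 << 20):
    """Return the base64-encoded crc32c of a local file, gsutil-style."""
    crc = crcmod.predefined.Crc('crc-32c')
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            crc.update(chunk)
    # gsutil reports the digest as base64 of the big-endian CRC bytes.
    return base64.b64encode(crc.digest()).decode('ascii')

print(local_crc32c_digest(
    '/home/chromeos-test/images/kevin-paladin/'
    'R68-10715.0.0-rc2/autotest_packages.tar'))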

 
Owner: shuqianz@chromium.org
The primary deputy is back, so she gets first crack.

I guess this is network flake, since I manually ran the command on the devserver with no issues:

chromeos-test@chromeos2-devserver5:~$ /home/chromeos-test/chromeos-cache/common/gsutil_4.30.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' cp -v -- gs://chromeos-image-archive/kevin-paladin/R68-10715.0.0-rc2/autotest_packages.tar /home/chromeos-test/images/kevin-paladin/R68-10715.0.0-rc2/
A newer version of gsutil (4.31) is available than the version you are
running (4.30). A detailed log of gsutil release changes is available
at https://pub.storage.googleapis.com/gsutil_ReleaseNotes.txt if you
would like to read them before updating.

Would you like to update [Y/n]? n
Copying gs://chromeos-image-archive/kevin-paladin/R68-10715.0.0-rc2/autotest_packages.tar...
Created: file:///home/chromeos-test/images/kevin-paladin/R68-10715.0.0-rc2/autotest_packages.tar

Operation completed over 1 objects/1.5 GiB.                      
> I guess this is network flake, since I manually ran the command on the devserver with no issues:

This happened once before, and I also tried running the command
manually.  It worked for me, too.

I'm not convinced that running the command manually like that is
sufficient to reproduce:  Can you try re-invoking the staging RPC,
to see if the problem happens that way?

How do I invoke the staging RPC? I think the next run will prove whether it is flake or not.
This peach_pit-paladin run from last week also failed with the same symptom:
    https://uberchromegw.corp.google.com/i/chromeos/builders/peach_pit-paladin/builds/19273

> How do I invoke the staging RPC? I think the next run will prove whether it is flake or not.

The RPC call for staging should be visible in the logs; there should
be an ssh command invoking "curl" somewhere; search the logs for "curl".
Run that command for yourself.

As for flake:  This _is_ flake; the next run will succeed.  But it's
flake that's happened twice within 7 days.  We need to understand
what's going on, and whether we need to escalate to GS.

I've tried all the curl commands I could find; none of them triggers this download.
chromeos-test@chromeos2-devserver5:~/images/kevin-paladin$ curl "http://100.115.245.197:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/kevin-paladin/R68-10715.0.0-rc2"
False

And if we believe this is a GS issue, I don't think the RPC call will behave any differently from the manual retry on the devserver?



> I've tried all the curl commands I could find; none of them triggers this download.

Oh, right.  This came up yesterday.  The staging call is made
from run_suite.py, and it isn't logged...  I'll rustle up how to
make the call.

> And if we believe this is a GS issue, I don't think the RPC
> call will behave any differently from the manual retry on the devserver?

We want to run this manually to determine whether the problem is
still with the data in GS now.  Also, to determine whether there's
a different gsutil (with different problems) installed on the
affected devserver.
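
If we want to know whether the object's crc32c in GS itself looks wrong at
the time of a failure, one option is to read the object metadata directly
and compare it with the digests quoted in the original error.  A hedged
sketch, assuming the google-cloud-storage Python client and credentials
for the bucket (neither of which is necessarily present on the devserver):

from google.cloud import storage

def cloud_crc32c(bucket_name, object_name):
    """Return the base64 crc32c that GCS currently advertises."""
    client = storage.Client()  # assumes default credentials are available
    blob = client.bucket(bucket_name).get_blob(object_name)
    return None if blob is None else blob.crc32c

print(cloud_crc32c(
    'chromeos-image-archive',
    'kevin-paladin/R68-10715.0.0-rc2/autotest_packages.tar'))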

> Oh, right.  This came up yesterday.  The staging call is made
> from run_suite.py, and it isn't logged...  I'll rustle up how to
> make the call.

Actually, I take it back.  The calls are logged.
This is the first call:
    05/24 07:56:16.813 DEBUG|             utils:0215| Running 'ssh 100.115.245.197 'curl "http://100.115.245.197:8082/stage?artifacts=full_payload,stateful,autotest_packages&files=&async=True&archive_url=gs://chromeos-image-archive/kevin-paladin/R68-10715.0.0-rc2"''

After that, the code goes into a polling loop, waiting for "?is_staged" to
return success.  But the polling loop fails.  This is the last logged call:
    05/24 08:00:30.931 DEBUG|             utils:0215| Running 'ssh 100.115.245.197 'curl "http://100.115.245.197:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/kevin-paladin/R68-10715.0.0-rc2"''
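
For reference, a minimal sketch of that stage-and-poll flow, reconstructed
from the two logged calls above (this is not the actual run_suite.py code;
the devserver address and archive_url come from the logs, the requests
library is assumed, and the timeout and poll interval are only illustrative):

import time
import requests

DEVSERVER = 'http://100.115.245.197:8082'
PARAMS = {
    'artifacts': 'full_payload,stateful,autotest_packages',
    'files': '',
    'archive_url': ('gs://chromeos-image-archive/'
                    'kevin-paladin/R68-10715.0.0-rc2'),
}

def stage_and_wait(timeout_secs=900, poll_secs=10):
    # Kick off the asynchronous staging job ("stage?...&async=True").
    stage_params = dict(PARAMS)
    stage_params['async'] = 'True'
    requests.get(DEVSERVER + '/stage', params=stage_params)
    # Poll "is_staged" until the devserver reports True; the failed run
    # gave up while this still returned False.
    deadline = time.time() + timeout_secs
    while time.time() < deadline:
        reply = requests.get(DEVSERVER + '/is_staged', params=PARAMS)
        if reply.text.strip() == 'True':
            return True
        time.sleep(poll_secs)
    return False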

I've re-run the staging call, and then checked the "is_staged" results.
It took quite some time, but the autotest_packages.tar file did eventually
get downloaded and staged.

So, whatever happened in GS didn't persist.  The failure is likely some
transient condition on the server side.

Labels: -Pri-1 Pri-2
Owner: ----
Status: Available (was: Assigned)
