discrepancy between staged.timestamp and test job's view of staging time |
|||||||
Issue descriptionExtracted from crbug.com/621136 dshi writes: I can't explain this. From the image folder: chromeos-test@chromeos2-devserver5:~/images/lumpy-paladin/R53-8466.0.0-rc2$ ll total 1173036 drwxr-xr-x 8 chromeos-test chromeos-test 4096 Jun 17 15:18 autotest -rw-r--r-- 1 chromeos-test chromeos-test 382709760 Jun 17 15:18 autotest_packages.tar -rw-r--r-- 1 chromeos-test chromeos-test 2734080 Jun 17 13:51 control_files.tar -rw-r--r-- 1 chromeos-test chromeos-test 0 Jun 17 13:54 staged.timestamp -rw-r--r-- 1 chromeos-test chromeos-test 264100342 Jun 17 13:54 stateful.tgz -rw-r--r-- 1 chromeos-test chromeos-test 27835 Jun 17 13:51 test_suites.tar.bz2 -rw-r--r-- 1 chromeos-test chromeos-test 551591990 Jun 17 13:54 update.gz -rw-r--r-- 1 chromeos-test chromeos-test 183 Jun 17 15:19 update.meta staged.timestamp shows the staging completed at 13:54 However, the test log show stage didn't complete until 15:24 06/17 15:24:45.823 INFO | dev_server:0961| Finished staging artifacts: build=lumpy-paladin/R53-8466.0.0-rc2, artifacts=['full_payload', 'stateful', 'autotest_packages'], files=, archive_url=gs://chromeos-image-archive/lumpy-paladin/R53-8466.0.0-rc2 Need to look into devserver code more.
,
Jun 19 2016
I'm seeing another occurrence of something like this. In this recent provision job: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row6-rack9-host1/56671653-provision/provision_AutoUpdate/debug/ the is_staged call starts being made at 16:49, and loops for 2 hours, always returning false. However, if I look at the server.log on that devserver I see: [18/Jun/2016:16:47:55] GOOGLE_STORAGE_DOWNLOADER Invoking background download of artifacts for [control_files.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2, autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2] [18/Jun/2016:16:47:55] GOOGLE_STORAGE_DOWNLOADER Downloading artifacts test_suites.tar.bz2->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2. [18/Jun/2016:16:47:55] GOOGLE_STORAGE_DOWNLOADER Invoking background download of artifacts for [test_suites.tar.bz2->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2] ::ffff:127.0.0.1 - - [18/Jun/2016:16:47:55] "GET /stage?artifacts=test_suites&files=&async=True&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 7 "" "curl/7.35.0" ::ffff:127.0.0.1 - - [18/Jun/2016:16:47:57] "GET /is_staged?artifacts=test_suites&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 5 "" "curl/7.35.0" ::ffff:127.0.0.1 - - [18/Jun/2016:16:48:03] "GET /is_staged?artifacts=test_suites&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 4 "" "curl/7.35.0" ::ffff:127.0.0.1 - - [18/Jun/2016:16:48:04] "GET /controlfiles?suite_name=&build=cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 5246 "" "curl/7.35.0" ::ffff:127.0.0.1 - - [18/Jun/2016:16:48:05] "GET /controlfiles?control_path=test_suites/control.suite_attr_wrapper&build=cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 1807 "" "curl/7.35.0" [18/Jun/2016:16:51:46] GOOGLE_STORAGE_DOWNLOADER Invoking background download of artifacts for [autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2] [18/Jun/2016:16:51:46] GOOGLE_STORAGE_DOWNLOADER Downloading artifacts control_files.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 test_suites.tar.bz2->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2. [18/Jun/2016:16:51:46] GOOGLE_STORAGE_DOWNLOADER Invoking background download of artifacts for [control_files.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2, test_suites.tar.bz2->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2] ::ffff:127.0.0.1 - - [18/Jun/2016:16:51:46] "GET /stage?artifacts=control_files,test_suites&files=&async=True&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 7 "" "curl/7.35.0" [18/Jun/2016:16:51:46] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. according to the devserver, the aftifacts were already staged by 16:51. The server logs shows a large number of is_stages requests, though I don't see it logging its return value ::ffff:127.0.0.1 - - [18/Jun/2016:18:21:13] "GET /is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 5 "" "curl/7.35.0" ::ffff:127.0.0.1 - - [18/Jun/2016:18:21:14] "GET /is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 5 "" "curl/7.35.0" ::ffff:127.0.0.1 - - [18/Jun/2016:18:21:17] "GET /is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 5 "" "curl/7.35.0" ::ffff:127.0.0.1 - - [18/Jun/2016:18:21:17] "GET /is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 5 "" "curl/7.35.0" [18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] ARTIFACT stateful.tgz->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] ARTIFACT stateful.tgz->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] ARTIFACT stateful.tgz->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
,
Jun 19 2016
If this is a devserver bug, it would certainly explain why everything in the lab is failing so often.
,
Jun 19 2016
I see messages about stateful.tgz being staged, and autotest_packages.tar being already staged, but nothing about "full_payload". What artifact is that referring to? chromeos-test@chromeos4-devserver6:/var/log/devserver$ grep cyan-cheets-paladin/R53-8471.0.0-rc2 server.log | grep full_payload | grep already -> no results
,
Jun 19 2016
Oh, that's probably an alias for AU_TEST_PAYLOAD. chromeos-test@chromeos4-devserver6:/var/log/devserver$ grep cyan-cheets-paladin/R53-8471.0.0-rc2 server.log | grep already [18/Jun/2016:16:51:46] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:16:51:46] BUNDLED_ARTIFACT control_files.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:16:51:46] BUNDLED_ARTIFACT test_suites.tar.bz2->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:17:00:14] A_U_TEST_PAYLOAD *_full_*->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:17:00:14] A_U_TEST_PAYLOAD *_full_*->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:17:00:14] A_U_TEST_PAYLOAD *_full_*->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] ARTIFACT stateful.tgz->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] ARTIFACT stateful.tgz->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] ARTIFACT stateful.tgz->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged. [18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
,
Jun 19 2016
When I run the rpc locally on the devserver, it is returning True: chromeos-test@chromeos4-devserver6:~/chromiumos/src/platform/dev$ curl "http://100.107.160.6:8082/is_staged?artifacts=full_payload,stateful,autotest_pack&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2" True So, I think the problem must be in the ssh tunneling.
,
Jun 19 2016
From cautotest/
chromeos-test@chromeos-server2:~$ ssh 100.107.160.6 'curl "http://100.107.160.6:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2"'
Warning: Permanently added '100.107.160.6' (ECDSA) to the list of known hosts.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 4 100 4 0 0 1416 0 --:--:-- --:--:-- --:--:-- 2000True
Is this confusing the code in dev_server.py ?
try:
result = self.run_call(call)
logging.debug('whether artifact is staged: %r', result)
return result == 'True'
,
Jun 19 2016
Shouldn't be. Looks like all that chaff is on stderr. chromeos-test@chromeos-server2:~$ ssh 100.107.160.6 'curl "http://100.107.160.6:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2"' 2>/dev/null True
,
Jun 19 2016
Hmm. I'm stumped then. Maybe the devserver started reporting True to is_staged only after a few hours had passed, even after it was staged?
,
Jun 19 2016
Enhanced logging: https://chromium-review.googlesource.com/#/c/353777/
,
Jun 19 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/dev-util/+/57d181772c8d3cf99b383ccdb9864f2ffc1cbade commit 57d181772c8d3cf99b383ccdb9864f2ffc1cbade Author: Aviv Keshet <akeshet@chromium.org> Date: Sun Jun 19 03:39:09 2016 devserver: add logging around is_staged rpc BUG= chromium:621317 TEST=None Change-Id: Ide6a928c74a22d96b67e2d91240d61ca5e4641d5 Reviewed-on: https://chromium-review.googlesource.com/353777 Tested-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> [modify] https://crrev.com/57d181772c8d3cf99b383ccdb9864f2ffc1cbade/build_artifact.py [modify] https://crrev.com/57d181772c8d3cf99b383ccdb9864f2ffc1cbade/devserver.py
,
Jun 19 2016
The above CL is pushed to prod. Hopefully next time this occurs, the devserver's server.log can reveal why we are getting this False is_staged result.
,
Jun 20 2016
I'm waiting to see this recur, so that I can examine this new logging.
,
Jun 21 2016
,
Jun 21 2016
,
Nov 3 2016
Clearly not a P0 if it's been months without action. xixuan@ is this related to the recent timestamp issue we discussed? If not, perhaps this is relevant and can be closed.
,
Dec 9 2016
It's because of several reasons: 1. The devserver log is too long for us to determine when the downloading is finished from devserver.log. 2. The timestamp for devserver and drone/shard is different, which may cause 2 minutes difference, and also may causes some tar errors. The latter is fixed already. Each of them is addressed in other bugs. So close it for now. |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by akes...@chromium.org
, Jun 18 2016