New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 621317 link

Starred by 3 users

Issue metadata

Status: WontFix
Owner:
Closed: Dec 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocking:
issue 621136



Sign in to add a comment

discrepancy between staged.timestamp and test job's view of staging time

Project Member Reported by akes...@chromium.org, Jun 18 2016

Issue description

Extracted from crbug.com/621136

dshi writes:
I can't explain this. From the image folder:
chromeos-test@chromeos2-devserver5:~/images/lumpy-paladin/R53-8466.0.0-rc2$ ll
total 1173036
drwxr-xr-x 8 chromeos-test chromeos-test      4096 Jun 17 15:18 autotest
-rw-r--r-- 1 chromeos-test chromeos-test 382709760 Jun 17 15:18 autotest_packages.tar
-rw-r--r-- 1 chromeos-test chromeos-test   2734080 Jun 17 13:51 control_files.tar
-rw-r--r-- 1 chromeos-test chromeos-test         0 Jun 17 13:54 staged.timestamp
-rw-r--r-- 1 chromeos-test chromeos-test 264100342 Jun 17 13:54 stateful.tgz
-rw-r--r-- 1 chromeos-test chromeos-test     27835 Jun 17 13:51 test_suites.tar.bz2
-rw-r--r-- 1 chromeos-test chromeos-test 551591990 Jun 17 13:54 update.gz
-rw-r--r-- 1 chromeos-test chromeos-test       183 Jun 17 15:19 update.meta

staged.timestamp shows the staging completed at 13:54
However, the test log show stage didn't complete until 15:24
06/17 15:24:45.823 INFO |        dev_server:0961| Finished staging artifacts: build=lumpy-paladin/R53-8466.0.0-rc2, artifacts=['full_payload', 'stateful', 'autotest_packages'], files=, archive_url=gs://chromeos-image-archive/lumpy-paladin/R53-8466.0.0-rc2

Need to look into devserver code more.
 
Blocking: 621136
I'm seeing another occurrence of something like this.

In this recent provision job:

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row6-rack9-host1/56671653-provision/provision_AutoUpdate/debug/

the is_staged call starts being made at 16:49, and loops for 2 hours, always returning false. However, if I look at the server.log on that devserver I see:

[18/Jun/2016:16:47:55] GOOGLE_STORAGE_DOWNLOADER Invoking background download of artifacts for [control_files.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2, autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2]
[18/Jun/2016:16:47:55] GOOGLE_STORAGE_DOWNLOADER Downloading artifacts test_suites.tar.bz2->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2.
[18/Jun/2016:16:47:55] GOOGLE_STORAGE_DOWNLOADER Invoking background download of artifacts for [test_suites.tar.bz2->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2]
::ffff:127.0.0.1 - - [18/Jun/2016:16:47:55] "GET /stage?artifacts=test_suites&files=&async=True&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 7 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [18/Jun/2016:16:47:57] "GET /is_staged?artifacts=test_suites&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 5 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [18/Jun/2016:16:48:03] "GET /is_staged?artifacts=test_suites&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 4 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [18/Jun/2016:16:48:04] "GET /controlfiles?suite_name=&build=cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 5246 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [18/Jun/2016:16:48:05] "GET /controlfiles?control_path=test_suites/control.suite_attr_wrapper&build=cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 1807 "" "curl/7.35.0"
[18/Jun/2016:16:51:46] GOOGLE_STORAGE_DOWNLOADER Invoking background download of artifacts for [autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2]
[18/Jun/2016:16:51:46] GOOGLE_STORAGE_DOWNLOADER Downloading artifacts control_files.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 test_suites.tar.bz2->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2.
[18/Jun/2016:16:51:46] GOOGLE_STORAGE_DOWNLOADER Invoking background download of artifacts for [control_files.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2, test_suites.tar.bz2->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2]
::ffff:127.0.0.1 - - [18/Jun/2016:16:51:46] "GET /stage?artifacts=control_files,test_suites&files=&async=True&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 7 "" "curl/7.35.0"
[18/Jun/2016:16:51:46] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.


according to the devserver, the aftifacts were already staged by 16:51.


The server logs shows a large number of is_stages requests, though I don't see it logging its return value

::ffff:127.0.0.1 - - [18/Jun/2016:18:21:13] "GET /is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 5 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [18/Jun/2016:18:21:14] "GET /is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 5 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [18/Jun/2016:18:21:17] "GET /is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 5 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [18/Jun/2016:18:21:17] "GET /is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2 HTTP/1.1" 200 5 "" "curl/7.35.0"
[18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] ARTIFACT stateful.tgz->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] ARTIFACT stateful.tgz->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] ARTIFACT stateful.tgz->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
Labels: -Pri-1 Pri-0
If this is a devserver bug, it would certainly explain why everything in the lab is failing so often.
I see messages about stateful.tgz being staged, and autotest_packages.tar being already staged, but nothing about "full_payload". What artifact is that referring to?

chromeos-test@chromeos4-devserver6:/var/log/devserver$ grep cyan-cheets-paladin/R53-8471.0.0-rc2 server.log | grep full_payload | grep already
-> no results

Oh, that's probably an alias for AU_TEST_PAYLOAD.

chromeos-test@chromeos4-devserver6:/var/log/devserver$ grep cyan-cheets-paladin/R53-8471.0.0-rc2 server.log | grep already
[18/Jun/2016:16:51:46] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:16:51:46] BUNDLED_ARTIFACT control_files.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:16:51:46] BUNDLED_ARTIFACT test_suites.tar.bz2->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:17:00:14] A_U_TEST_PAYLOAD *_full_*->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:17:00:14] A_U_TEST_PAYLOAD *_full_*->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:17:00:14] A_U_TEST_PAYLOAD *_full_*->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] ARTIFACT stateful.tgz->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] ARTIFACT stateful.tgz->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] ARTIFACT stateful.tgz->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
[18/Jun/2016:18:39:42] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/cyan-cheets-paladin/R53-8471.0.0-rc2 is already staged.
When I run the rpc locally on the devserver, it is returning True:

chromeos-test@chromeos4-devserver6:~/chromiumos/src/platform/dev$ curl "http://100.107.160.6:8082/is_staged?artifacts=full_payload,stateful,autotest_pack&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2"
True



So, I think the problem must be in the ssh tunneling.

From cautotest/

chromeos-test@chromeos-server2:~$ ssh 100.107.160.6 'curl "http://100.107.160.6:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2"'
Warning: Permanently added '100.107.160.6' (ECDSA) to the list of known hosts.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     4  100     4    0     0   1416      0 --:--:-- --:--:-- --:--:--  2000True

Is this confusing the code in dev_server.py ?

            try:
                result = self.run_call(call)
                logging.debug('whether artifact is staged: %r', result)
                return result == 'True'

Shouldn't be. Looks like all that chaff is on stderr.

chromeos-test@chromeos-server2:~$ ssh 100.107.160.6 'curl "http://100.107.160.6:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/cyan-cheets-paladin/R53-8471.0.0-rc2"' 2>/dev/null
True
Hmm. I'm stumped then. Maybe the devserver started reporting True to is_staged only after a few hours had passed, even after it was staged?
Project Member

Comment 11 by bugdroid1@chromium.org, Jun 19 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/dev-util/+/57d181772c8d3cf99b383ccdb9864f2ffc1cbade

commit 57d181772c8d3cf99b383ccdb9864f2ffc1cbade
Author: Aviv Keshet <akeshet@chromium.org>
Date: Sun Jun 19 03:39:09 2016

devserver: add logging around is_staged rpc

BUG= chromium:621317 
TEST=None

Change-Id: Ide6a928c74a22d96b67e2d91240d61ca5e4641d5
Reviewed-on: https://chromium-review.googlesource.com/353777
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/57d181772c8d3cf99b383ccdb9864f2ffc1cbade/build_artifact.py
[modify] https://crrev.com/57d181772c8d3cf99b383ccdb9864f2ffc1cbade/devserver.py

The above CL is pushed to prod. Hopefully next time this occurs, the devserver's server.log can reveal why we are getting this False is_staged result.
Owner: akes...@chromium.org
Status: Started (was: Untriaged)
I'm waiting to see this recur, so that I can examine this new logging.
Labels: -current-issue
Labels: -dut-health Hotlist-CrOS-DutHealth
Labels: -Pri-0 Pri-2
Owner: xixuan@chromium.org
Clearly not a P0 if it's been months without action.

xixuan@ is this related to the recent timestamp issue we discussed? If not, perhaps this is relevant and can be closed.
Status: WontFix (was: Started)
It's because of several reasons:

1. The devserver log is too long for us to determine when the downloading is finished from devserver.log.

2. The timestamp for devserver and drone/shard is different, which may cause 2 minutes difference, and also may causes some tar errors. The latter is fixed already.

Each of them is addressed in other bugs. So close it for now.

Sign in to add a comment