New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 692694 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner: ----
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

provision failure with perfectly good DUT and perfectly good devserver

Project Member Reported by semenzato@chromium.org, Feb 15 2017

Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/samus-release/builds/4027

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/101469640-chromeos-test/chromeos2-row3-rack9-host3/debug/

autoserv.DEBUG:

02/15 08:14:22.643 DEBUG|        base_utils:0185| Running 'ssh 100.115.245.197 'curl "http://100.115.245.197:8082/get_au_status?build_name=samus-release/R58-9286.0.0&force_update=True&pid=11046&host_name=chromeos2-row3-rack9-host3&full_update=False"''
02/15 08:14:23.977 DEBUG|        dev_server:1794| Current CrOS auto-update status: CrOS update is just started.
02/15 08:14:34.027 DEBUG|        base_utils:0185| Running 'ssh 100.115.245.197 'curl "http://100.115.245.197:8082/get_au_status?build_name=samus-release/R58-9286.0.0&force_update=True&pid=11046&host_name=chromeos2-row3-rack9-host3&full_update=False"''
02/15 08:14:35.421 DEBUG|        dev_server:1856| Failed to trigger auto-update process on devserver

Devserver is chromeos2-devserver5.  It's working fine at that time, lightly loaded.

Below is the relevant content of server.log for that time frame:

$ grep R58-9286 server.log | grep samus

[15/Feb/2017:08:13:34] GOOGLE_STORAGE_DOWNLOADER Downloading artifacts *_full_*->/home/chromeos-test/images/samus-release/R58-9286.0.0 stateful.tgz->/home/chromeos-test/images/samus-release/R58-9286.0.0 autotest_packages.tar->/home/chromeos-test/images/samus-release/R58-9286.0.0.
[15/Feb/2017:08:13:34] GOOGLE_STORAGE_DOWNLOADER Invoking background download of artifacts for [*_full_*->/home/chromeos-test/images/samus-release/R58-9286.0.0, stateful.tgz->/home/chromeos-test/images/samus-release/R58-9286.0.0, autotest_packages.tar->/home/chromeos-test/images/samus-release/R58-9286.0.0]
::ffff:127.0.0.1 - - [15/Feb/2017:08:13:34] "GET /stage?artifacts=full_payload,stateful,autotest_packages&files=&async=True&archive_url=gs://chromeos-image-archive/samus-release/R58-9286.0.0 HTTP/1.1" 200 7 "" "curl/7.35.0"
[15/Feb/2017:08:13:34] A_U_TEST_PAYLOAD ArtifactStaged() -> yes, *_full_*->/home/chromeos-test/images/samus-release/R58-9286.0.0 is staged.
[15/Feb/2017:08:13:34] A_U_TEST_PAYLOAD *_full_*->/home/chromeos-test/images/samus-release/R58-9286.0.0 is already staged.
[15/Feb/2017:08:13:34] ARTIFACT ArtifactStaged() -> yes, stateful.tgz->/home/chromeos-test/images/samus-release/R58-9286.0.0 is staged.
[15/Feb/2017:08:13:34] ARTIFACT stateful.tgz->/home/chromeos-test/images/samus-release/R58-9286.0.0 is already staged.
[15/Feb/2017:08:13:34] AUTOTEST_TARBALL ArtifactStaged() -> yes, autotest_packages.tar->/home/chromeos-test/images/samus-release/R58-9286.0.0 is staged.
[15/Feb/2017:08:13:34] AUTOTEST_TARBALL autotest_packages.tar->/home/chromeos-test/images/samus-release/R58-9286.0.0 is already staged.
[15/Feb/2017:08:13:46] A_U_TEST_PAYLOAD ArtifactStaged() -> yes, *_full_*->/home/chromeos-test/images/samus-release/R58-9286.0.0 is staged.
[15/Feb/2017:08:13:46] ARTIFACT ArtifactStaged() -> yes, stateful.tgz->/home/chromeos-test/images/samus-release/R58-9286.0.0 is staged.
[15/Feb/2017:08:13:46] AUTOTEST_TARBALL ArtifactStaged() -> yes, autotest_packages.tar->/home/chromeos-test/images/samus-release/R58-9286.0.0 is staged.
[15/Feb/2017:08:13:46] DEVSERVER Responding to is_staged {'artifacts': u'full_payload,stateful,autotest_packages', 'files': u'', 'archive_url': u'gs://chromeos-image-archive/samus-release/R58-9286.0.0'} request with 'True'
::ffff:127.0.0.1 - - [15/Feb/2017:08:13:46] "GET /is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/samus-release/R58-9286.0.0 HTTP/1.1" 200 4 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [15/Feb/2017:08:13:47] "GET /list_image_dir?archive_url=gs://chromeos-image-archive/samus-release/R58-9286.0.0 HTTP/1.1" 200 3542 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [15/Feb/2017:08:13:54] "GET /cros_au?build_name=samus-release/R58-9286.0.0&async=True&force_update=True&host_name=chromeos2-row3-rack9-host3&full_update=False HTTP/1.1" 200 13 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [15/Feb/2017:08:13:56] "GET /get_au_status?build_name=samus-release/R58-9286.0.0&force_update=True&pid=11046&host_name=chromeos2-row3-rack9-host3&full_update=False HTTP/1.1" 200 87 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [15/Feb/2017:08:14:12] "GET /get_au_status?build_name=samus-release/R58-9286.0.0&force_update=True&pid=11046&host_name=chromeos2-row3-rack9-host3&full_update=False HTTP/1.1" 200 87 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [15/Feb/2017:08:14:23] "GET /get_au_status?build_name=samus-release/R58-9286.0.0&force_update=True&pid=11046&host_name=chromeos2-row3-rack9-host3&full_update=False HTTP/1.1" 200 87 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [15/Feb/2017:08:14:35] "GET /get_au_status?build_name=samus-release/R58-9286.0.0&force_update=True&pid=11046&host_name=chromeos2-row3-rack9-host3&full_update=False HTTP/1.1" 200 1033 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [15/Feb/2017:08:15:11] "GET /cros_au?build_name=samus-release/R58-9286.0.0&async=True&force_update=True&host_name=100.115.226.90&full_update=False HTTP/1.1" 200 13 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [15/Feb/2017:08:15:12] "GET /get_au_status?build_name=samus-release/R58-9286.0.0&force_update=True&pid=20691&host_name=100.115.226.90&full_update=False HTTP/1.1" 200 87 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [15/Feb/2017:08:15:24] "GET /get_au_status?build_name=samus-release/R58-9286.0.0&force_update=True&pid=20691&host_name=100.115.226.90&full_update=False HTTP/1.1" 200 87 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [15/Feb/2017:08:15:35] "GET /get_au_status?build_name=samus-release/R58-9286.0.0&force_update=True&pid=20691&host_name=100.115.226.90&full_update=False HTTP/1.1" 200 87 "" "curl/7.35.0"
::ffff:127.0.0.1 - - [15/Feb/2017:08:15:57] "GET /get_au_status?build_name=samus-release/R58-9286.0.0&force_update=True&pid=20691&host_name=100.115.226.90&full_update=False HTTP/1.1" 200 1033 "" "curl/7.35.0"


I also did a grep -A 4 to see a bit of extra context, but I don't see any errors.

So what happened?

 
Ah I forgot to add the actual error:

02/15 08:14:38.844 DEBUG|        dev_server:1955| Exception raised on auto_update attempt #1:
 Traceback (most recent call last):
  File "/home/chromeos-test/chromiumos/src/platform/dev/cros_update.py", line 179, in TriggerAU
    yes=True)
  File "/home/chromeos-test/chromiumos/chromite/lib/auto_updater.py", line 663, in __init__
    clobber_stateful=clobber_stateful, yes=yes)
  File "/home/chromeos-test/chromiumos/chromite/lib/auto_updater.py", line 202, in __init__
    self.device_dev_dir = os.path.join(self.device.work_dir, 'src')
  File "/home/chromeos-test/chromiumos/chromite/lib/remote_access.py", line 652, in work_dir
    capture_output=True).output.strip()
  File "/home/chromeos-test/chromiumos/chromite/lib/remote_access.py", line 849, in BaseRunCommand
    return self.GetAgent().RemoteSh(cmd, **kwargs)
  File "/home/chromeos-test/chromiumos/chromite/lib/remote_access.py", line 343, in RemoteSh
    raise SSHConnectionError(e.result.error)
SSHConnectionError: Connection to 100.115.226.90 timed out while waiting to read


So I am betting that "1033" is the size of the http payload, which is the stack dump above.

But the network was not loaded.  There is nothing wrong in /var/log/syslog on the devserver either.

Feb 15 08:11:50 chromeos2-devserver5 snmpd[25423]: Connection from UDP: [172.24.185.111]:60556->[100.115.245.197]:161
Feb 15 08:11:50 chromeos2-devserver5 snmpd[25423]: Connection from UDP: [172.24.185.111]:45717->[100.115.245.197]:161
Feb 15 08:11:50 chromeos2-devserver5 snmpd[25423]: Connection from UDP: [172.24.185.111]:58156->[100.115.245.197]:161
Feb 15 08:11:54 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:12:09 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:12:24 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:12:40 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:12:53 chromeos2-devserver5 kernel: [7332826.716926] init: apache_log_metrics main process (12952) terminated with status 1
Feb 15 08:12:53 chromeos2-devserver5 kernel: [7332826.716954] init: apache_log_metrics main process ended, respawning
Feb 15 08:12:55 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:13:10 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:13:25 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:13:40 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:13:55 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:13:56 chromeos2-devserver5 kernel: [7332890.032890] init: apache_log_metrics main process (551) terminated with status 1
Feb 15 08:13:56 chromeos2-devserver5 kernel: [7332890.032914] init: apache_log_metrics main process ended, respawning
Feb 15 08:14:10 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:14:25 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:14:41 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:14:56 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:15:01 chromeos2-devserver5 CRON[19549]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Feb 15 08:15:09 chromeos2-devserver5 ag: autotest server[stack::_kill_remote_process|devserver_down|_remote_process_alive] -> ssh_run(test -e /proc/5553)
Feb 15 08:15:09 chromeos2-devserver5 ag: autotest server[stack::_update_via_test_payloads|stop_devserver|_kill_remote_process] -> ssh_run(kill -s SIGTERM 5553)
Feb 15 08:15:09 chromeos2-devserver5 ag: autotest server[stack::poll_for_condition|devserver_down|_remote_process_alive] -> ssh_run(test -e /proc/5553)
Feb 15 08:15:09 chromeos2-devserver5 ag: autotest server[stack::_get_devserver_stdout|_get_devserver_file_content|run_output] -> ssh_run(cat /tmp/devserver-stdout.rlVZs9)
Feb 15 08:15:10 chromeos2-devserver5 ag: autotest server[stack::_get_devserver_log|_get_devserver_file_content|run_output] -> ssh_run(cat /tmp/devserver-log.ZGUyo8)
Feb 15 08:15:10 chromeos2-devserver5 ag: autotest server[stack::_update_via_test_payloads|stop_devserver|_cleanup_devserver_files] -> ssh_run(rm -f /tmp/devserver-log.ZGUyo8)
Feb 15 08:15:10 chromeos2-devserver5 kernel: [7332964.227859] init: apache_log_metrics main process (11196) terminated with status 1
Feb 15 08:15:10 chromeos2-devserver5 kernel: [7332964.227885] init: apache_log_metrics main process ended, respawning
Feb 15 08:15:10 chromeos2-devserver5 ag: autotest server[stack::_update_via_test_payloads|stop_devserver|_cleanup_devserver_files] -> ssh_run(rm -f /tmp/devserver-stdout.rlVZs9)
Feb 15 08:15:11 chromeos2-devserver5 ag: autotest server[stack::_update_via_test_payloads|stop_devserver|_cleanup_devserver_files] -> ssh_run(rm -f /tmp/devserver-port.ZLKlUp)
Feb 15 08:15:11 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:15:11 chromeos2-devserver5 ag: autotest server[stack::_update_via_test_payloads|stop_devserver|_cleanup_devserver_files] -> ssh_run(rm -f /tmp/devserver-pid.RVXq2N)
Feb 15 08:15:11 chromeos2-devserver5 ag: autotest server[stack::_update_via_test_payloads|stop_devserver|_cleanup_devserver_files] -> ssh_run(rm -rf /tmp/devserver-static.jXRMCx)
Feb 15 08:15:26 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:15:41 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:15:52 chromeos2-devserver5 ag: autotest server[stack::_kill_remote_process|devserver_down|_remote_process_alive] -> ssh_run(test -e /proc/23975)
Feb 15 08:15:52 chromeos2-devserver5 ag: autotest server[stack::cleanup|stop_devserver|_kill_remote_process] -> ssh_run(kill -s SIGTERM 23975)
Feb 15 08:15:53 chromeos2-devserver5 ag: autotest server[stack::poll_for_condition|devserver_down|_remote_process_alive] -> ssh_run(test -e /proc/23975)
Feb 15 08:15:53 chromeos2-devserver5 ag: autotest server[stack::_get_devserver_stdout|_get_devserver_file_content|run_output] -> ssh_run(cat /tmp/devserver-stdout.2wbo8Z)
Feb 15 08:15:53 chromeos2-devserver5 ag: autotest server[stack::_get_devserver_log|_get_devserver_file_content|run_output] -> ssh_run(cat /tmp/devserver-log.KMbDsF)
Feb 15 08:15:53 chromeos2-devserver5 ag: autotest server[stack::cleanup|stop_devserver|_cleanup_devserver_files] -> ssh_run(rm -f /tmp/devserver-log.KMbDsF)
Feb 15 08:15:53 chromeos2-devserver5 ag: autotest server[stack::cleanup|stop_devserver|_cleanup_devserver_files] -> ssh_run(rm -f /tmp/devserver-stdout.2wbo8Z)
Feb 15 08:15:54 chromeos2-devserver5 ag: autotest server[stack::cleanup|stop_devserver|_cleanup_devserver_files] -> ssh_run(rm -f /tmp/devserver-port.xl4Anj)
Feb 15 08:15:54 chromeos2-devserver5 ag: autotest server[stack::cleanup|stop_devserver|_cleanup_devserver_files] -> ssh_run(rm -f /tmp/devserver-pid.2v7ilw)
Feb 15 08:15:54 chromeos2-devserver5 ag: autotest server[stack::cleanup|stop_devserver|_cleanup_devserver_files] -> ssh_run(rm -rf /tmp/devserver-static.GVCHXd)
Feb 15 08:15:56 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:16:11 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:16:26 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:16:42 chromeos2-devserver5 ag: autotest server[stack::num_devserver_processes|execute_command|runner] -> ssh_run(ps -C devserver.py| wc -l)
Feb 15 08:16:46 chromeos2-devserver5 snmpd[25423]: Connection from UDP: [172.24.185.111]:36383->[100.115.245.197]:161
Feb 15 08:16:46 chromeos2-devserver5 snmpd[25423]: Connection from UDP: [172.24.185.111]:43912->[100.115.245.197]:161
Feb 15 08:16:46 chromeos2-devserver5 snmpd[25423]: Connection from UDP: [172.24.185.111]:43286->[100.115.245.197]:161
Feb 15 08:16:46 chromeos2-devserver5 snmpd[25423]: Connection from UDP: [172.24.185.111]:34150->[100.115.245.197]:161


Summary: provision failure with perfectly good DUT and perfectly good devserver (was: provision failure with good DUT and good devserver)
So is this just bad luck?  Some unexplained network glitch twice in a row, within 1m 21s? :P

Comment 3 by aut...@google.com, Feb 22 2017

Cc: jrbarnette@chromium.org
Are you seeing this regularly? 

jrbarnette@ is making some changes to provision workflow which will improve debugging for provision

Comment 4 by autumn@chromium.org, Mar 15 2017

Status: WontFix (was: Untriaged)

Sign in to add a comment