
Issue 854003

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




platform_InstallTestImage is failing with DevServerException: stage_artifacts timed out:

Project Member Reported by dsunk...@chromium.org, Jun 19 2018

Issue description

Schedule the platform_InstallTestImage test using cautotest. The test fails with the error below:

DevServerException: stage_artifacts timed out: build=veyron_minnie-release/R66-10359.0.0, artifacts=['autotest_packages'], files=, archive_url=gs://chromeos-image-archive/veyron_minnie-release/R66-10359.0.0
https://ubercautotest.corp.google.com/afe/#tab_id=view_job&object_id=209570957

06/18 12:38:15.399 ERROR|         traceback:0013| Traceback (most recent call last):
06/18 12:38:15.400 ERROR|         traceback:0013|   File "/usr/local/autotest/server/autoserv", line 603, in run_autoserv
06/18 12:38:15.400 ERROR|         traceback:0013|     use_packaging=(not no_use_packaging))
06/18 12:38:15.400 ERROR|         traceback:0013|   File "/usr/local/autotest/server/server_job.py", line 777, in run
06/18 12:38:15.400 ERROR|         traceback:0013|     namespace)
06/18 12:38:15.401 ERROR|         traceback:0013|   File "/usr/local/autotest/server/server_job.py", line 1328, in _execute_code
06/18 12:38:15.401 ERROR|         traceback:0013|     execfile(code_file, namespace, namespace)
06/18 12:38:15.401 ERROR|         traceback:0013|   File "/usr/local/autotest/server/control_segments/verify_job_repo_url", line 10, in <module>
06/18 12:38:15.401 ERROR|         traceback:0013|     job.parallel_simple(install, machines, log=False)
06/18 12:38:15.401 ERROR|         traceback:0013|   File "/usr/local/autotest/server/server_job.py", line 613, in parallel_simple
06/18 12:38:15.401 ERROR|         traceback:0013|     log=log, timeout=timeout, return_results=return_results)
06/18 12:38:15.401 ERROR|         traceback:0013|   File "/usr/local/autotest/server/subcommand.py", line 98, in parallel_simple
06/18 12:38:15.402 ERROR|         traceback:0013|     function(arg)
06/18 12:38:15.402 ERROR|         traceback:0013|   File "/usr/local/autotest/server/control_segments/verify_job_repo_url", line 7, in install
06/18 12:38:15.402 ERROR|         traceback:0013|     host.verify_job_repo_url(job.tag)
06/18 12:38:15.402 ERROR|         traceback:0013|   File "/usr/local/autotest/server/hosts/cros_host.py", line 382, in verify_job_repo_url
06/18 12:38:15.403 ERROR|         traceback:0013|     ds.stage_artifacts(image_name, ['autotest_packages'])
06/18 12:38:15.403 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 376, in metrics_wrapper
06/18 12:38:15.403 ERROR|         traceback:0013|     return wrapper()
06/18 12:38:15.403 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/retry.py", line 218, in func_retry
06/18 12:38:15.403 ERROR|         traceback:0013|     remaining_time)
06/18 12:38:15.403 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/retry.py", line 123, in timeout
06/18 12:38:15.403 ERROR|         traceback:0013|     default_result = func(*args, **kwargs)
06/18 12:38:15.404 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 370, in wrapper
06/18 12:38:15.404 ERROR|         traceback:0013|     return method(*args, **kwargs)
06/18 12:38:15.404 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 1567, in stage_artifacts
06/18 12:38:15.404 ERROR|         traceback:0013|     self._stage_artifacts(image, artifacts, files, archive_url)
06/18 12:38:15.404 ERROR|         traceback:0013|   File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 1267, in _stage_artifacts
06/18 12:38:15.404 ERROR|         traceback:0013|     'stage_artifacts timed out: %s' % staging_info)
06/18 12:38:15.404 ERROR|         traceback:0013| DevServerException: stage_artifacts timed out: build=veyron_minnie-release/R66-10359.0.0, artifacts=['autotest_packages'], files=, archive_url=gs://chromeos-image-archive/veyron_minnie-release/R66-10359.0.0
06/18 12:38:15.418 INFO |            client:0570| Attempting refresh to obtain initial access_token
06/18 12:38:15.463 INFO |            client:0872| Refreshing access_token
06/18 12:38:15.932 ERROR|          autoserv:0809| Uncaught SystemExit with code 1
Traceback (most recent call last):
  File "/usr/local/autotest/server/autoserv", line 805, in main
    use_ssp)
  File "/usr/local/autotest/server/autoserv", line 627, in run_autoserv
    sys.exit(exit_code)
SystemExit: 1
06/18 12:38:16.051 DEBUG|   logging_manager:0627| Logging subprocess finished
06/18 12:38:16.051 DEBUG|   logging_manager:0627| Logging subprocess finished

 

Comment 1 by jkop@chromium.org, Jun 19 2018

Potentially caused by crbug.com/854061, investigating.

Comment 2 by jkop@chromium.org, Jun 19 2018

Status: Unconfirmed (was: Untriaged)
Correction, that's not a plausible cause. This is probably transient; devserver load is spiky and poorly load-balanced, which causes symptoms like this. Reassign to me if it still recurs.
Owner: harpreet@chromium.org
Looking at the logs, the devserver URL was this:
    http://100.115.127.249:8082/static/veyron_minnie-release/R66-10359.0.0/autotest/packages

Checking on that devserver:
    $ ssh 100.115.127.249 hostname
    ssh: connect to host 100.115.127.249 port 22: Connection timed out

So, the devserver is down.
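
The same reachability check can be scripted. This is only a rough sketch, assuming the devserver host and port are taken from the static URL in the logs; the helper names devserver_endpoint and is_reachable are hypothetical and are not part of autotest.

```python
import socket
from urllib.parse import urlsplit


def devserver_endpoint(static_url):
    """Return (host, port) parsed from a devserver static URL."""
    parts = urlsplit(static_url)
    return parts.hostname, parts.port


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only probes the TCP port and says nothing
    about whether the devserver process behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False


# URL copied from the job logs quoted above.
url = ("http://100.115.127.249:8082/static/"
       "veyron_minnie-release/R66-10359.0.0/autotest/packages")
host, port = devserver_endpoint(url)
print(host, port)
```

Calling is_reachable(host, port) from a machine inside the lab network would then probe the devserver's own port (8082) rather than ssh's port 22, which is closer to what stage_artifacts depends on.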

shadow_config.ini on cautotest has a reverse map for the IP address:
    $ grep '^chromeos.* = 100.115.127.249' shadow_config.ini 
    chromeos15-infra-devserver2.cros.corp.google.com = 100.115.127.249

Whoever owns the chromeos15 lab needs to take care of it.
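
The reverse lookup done with grep above could be sketched in Python as follows. This assumes the relevant shadow_config.ini lines have the plain "hostname = ip" form shown in the grep output; the function name hostnames_for_ip and the second sample line are made up for illustration.

```python
def hostnames_for_ip(config_text, ip, prefix="chromeos"):
    """Return hostnames from `name = ip` lines whose value equals ip.

    Mirrors grep '^chromeos.* = <ip>': only names that start with
    `prefix` are considered.
    """
    matches = []
    for line in config_text.splitlines():
        name, sep, value = line.partition("=")
        name = name.strip()
        if sep and name.startswith(prefix) and value.strip() == ip:
            matches.append(name)
    return matches


# First line is taken from the grep output above; the second line is a
# made-up entry (documentation IP range) included only for contrast.
sample = """\
chromeos15-infra-devserver2.cros.corp.google.com = 100.115.127.249
chromeos-example-host.invalid = 192.0.2.10
"""

print(hostnames_for_ip(sample, "100.115.127.249"))
```

In practice the text would come from reading shadow_config.ini on the cautotest server instead of the inline sample.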

Cc: dschimmels@chromium.org
Owner: jashur@chromium.org
Joe / David, can you check the devserver?



Also, is there any alert mechanism in place to detect issues like this and alert appropriate owners? 
Cc: cra...@chromium.org
> Also, is there any alert mechanism in place to detect issues
> like this and alert appropriate owners?

<sigh> Yes, there's an alert mechanism.  No, it doesn't alert the
appropriate owners.  :-(

Status: Assigned (was: Unconfirmed)
This issue has an owner, a component and a priority, but is still listed as untriaged or unconfirmed. By definition, this bug is triaged. Changing status to "assigned". Please reach out to me if you disagree with how I've done this.
