
Issue 900101

Issue metadata

Status: Closed
Owner:
Closed: Oct 30
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sanity HWTest timing out on peppy-chrome-pfq and cyan-chrome-pfq

Project Member Reported by martis@chromium.org, Oct 30

Issue description

peppy-chrome-pfq: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8931294908214728048
cyan-chrome-pfq: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8931294919197651552

Example error logs are:
20:54:06: ERROR: Timeout occurred- waited 19813 seconds, failing. Timeout reason: This build has reached the timeout deadline set by the master. Either this stage or a previous one took too long (see stage timing historical summary in ReportStage) or the build failed to start on time.
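
For scale, the 19813-second wait in the log line above works out to roughly five and a half hours. A minimal sketch of the conversion follows; `format_wait` is a hypothetical helper written for this note, not part of chromite or autotest.

```python
from datetime import timedelta


def format_wait(seconds):
    """Render a wait given in seconds as H:MM:SS (hypothetical helper)."""
    return str(timedelta(seconds=seconds))


# 19813 is the wait reported in the timeout error above.
print(format_wait(19813))
```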

The reported error is the same as crbug.com/900092.

Assigning to infra deputy.
 
Labels: -Pri-2 Pri-1
Raising to P1 since this is causing PFQ failures.
Status: Assigned (was: Untriaged)
It seems like dummy_PassServer.sanity is failing, but there is not much information in the logs.

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8931294908810675632

From autoserv.DEBUG, it seems that dummy_PassServer.sanity is queued but never run, and is then killed due to the timeout. So this looks like an infra issue (possibly related to the ganeti move, issue 882562).

10/29 19:41:17.344 INFO |      suite_common:0341| Parsed 1 child test control files.
10/29 19:41:17.345 DEBUG|             suite:1059| Discovered 1 tests.
10/29 19:41:17.346 INFO |        server_job:0217| INFO	----	Start sanity	timestamp=1540867277	localtime=Oct 29 19:41:17	
10/29 19:41:17.347 DEBUG|             suite:1007| Scheduling dummy_PassServer.sanity
10/29 19:41:17.941 DEBUG|             suite:1289| Adding job keyval for dummy_PassServer.sanity=253446205-chromeos-test
10/29 19:41:17.941 DEBUG|             suite:1079| Scheduled 1 tests, writing the total to keyval.
10/29 19:41:17.941 DEBUG|             suite:1089| Initializing RetryHandler for suite sanity.
10/29 19:41:17.942 DEBUG|             suite:0114| Test dummy_PassServer.sanity has no retries
10/29 19:41:17.942 DEBUG|             suite:1094| retry map created: {} 
10/29 19:41:17.942 DEBUG|     dynamic_suite:0608| Waiting on suite.
10/29 19:42:57.806 ERROR|   logging_manager:0626| Current thread 0x00007f8fff117740:
10/29 19:42:57.807 ERROR|   logging_manager:0626|   File "/usr/local/autotest/server/cros/dynamic_suite/job_status.py", line 144 in _sleep
10/29 19:42:57.807 ERROR|   logging_manager:0626|   File "/usr/local/autotest/server/cros/dynamic_suite/job_status.py", line 136 in wait_for_results
10/29 19:42:57.807 ERROR|   logging_manager:0626|   File "/usr/local/autotest/server/cros/dynamic_suite/suite.py", line 1170 in wait
10/29 19:42:57.808 ERROR|   logging_manager:0626|   File "/usr/local/autotest/server/cros/dynamic_suite/dynamic_suite.py", line 609 in _run_suite
10/29 19:42:57.808 DEBUG|          autoserv:0376| Received SIGTERM
10/29 19:42:57.808 ERROR|   logging_manager:0626|   File "/usr/local/autotest/server/cros/dynamic_suite/dynamic_suite.py", line 573 in _run_suite_with_spec
10/29 19:42:57.808 DEBUG|          autoserv:0379| Finished writing to pid_file. Killing process.
10/29 19:42:57.808 ERROR|   logging_manager:0626|   File "/usr/local/autotest/server/cros/dynamic_suite/dynamic_suite.py", line 559 in _perform_reimage_and_run
10/29 19:42:57.808 ERROR|   logging_manager:0626|   File "/usr/local/autotest/server/cros/dynamic_suite/dynamic_suite.py", line 513 in reimage_and_run
10/29 19:42:57.808 ERROR|   logging_manager:0626|   File "/usr/local/autotest/results/253441733-chromeos-test/hostless/control.srv", line 55 in <module>
10/29 19:42:57.808 ERROR|   logging_manager:0626|   File "/usr/local/autotest/server/server_job.py", line 1340 in _execute_code
10/29 19:42:57.809 ERROR|   logging_manager:0626|   File "/usr/local/autotest/server/server_job.py", line 817 in run
10/29 19:42:57.809 ERROR|   logging_manager:0626|   File "/usr/local/autotest/server/autoserv", line 592 in run_autoserv
10/29 19:42:57.809 ERROR|   logging_manager:0626|   File "/usr/local/autotest/server/autoserv", line 806 in main
10/29 19:42:57.809 ERROR|   logging_manager:0626|   File "/usr/local/autotest/server/autoserv", line 823 in <module>
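
To make the gap between scheduling and the SIGTERM in the excerpt above easier to read off, here is a minimal sketch of a log scan. It is not part of autotest: `parse_line`, `seconds_between`, and the regex are assumptions based only on the line format visible in this excerpt, and the default year is inferred from the `timestamp=1540867277` keyval in the log rather than stated anywhere in the lines themselves.

```python
import re
from datetime import datetime

# Assumed line shape, taken only from the excerpt above:
#   "MM/DD HH:MM:SS.mmm LEVEL|   module:line| message"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) +(\w+) *\| *([\w.]+):(\d+)\| (.*)$"
)


def parse_line(line, year=2018):
    """Return (timestamp, level, module, message), or None if the line does not match.

    The log lines carry no year; 2018 is an assumption inferred from the
    epoch timestamp in the excerpt.
    """
    match = LINE_RE.match(line)
    if match is None:
        return None
    stamp, level, module, _lineno, message = match.groups()
    when = datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S.%f")
    return when, level, module, message


def seconds_between(lines, start_marker, end_marker):
    """Seconds from the first line containing start_marker to the next one containing end_marker."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, _level, _module, message = parsed
        if start is None:
            if start_marker in message:
                start = when
        elif end_marker in message:
            return (when - start).total_seconds()
    return None


# Three lines copied from the excerpt above.
SAMPLE = [
    "10/29 19:41:17.347 DEBUG|             suite:1007| Scheduling dummy_PassServer.sanity",
    "10/29 19:41:17.942 DEBUG|     dynamic_suite:0608| Waiting on suite.",
    "10/29 19:42:57.808 DEBUG|          autoserv:0376| Received SIGTERM",
]

gap = seconds_between(SAMPLE, "Scheduling dummy_PassServer.sanity", "Received SIGTERM")
print(gap)
```

On the sample lines this gives a gap of about 100 seconds between the test being scheduled and the suite job receiving SIGTERM, which is consistent with the child test never getting a chance to run.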
Status: Closed (was: Assigned)
Failures are expected for builds started yesterday due to a machine migration in the lab (issue 882562). I expect the timeout issue to be resolved at this point.

The latest cyan run succeeded and the latest peppy run failed with an honest test failure rather than an infra failure.