New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 781985 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Reks-release failed due to hwtest timed out

Project Member Reported by gkihumba@google.com, Nov 6 2017

Issue description

Reks has been failing for about a week:

1/06 01:28:34.965 ERROR|   logging_manager:0626| tko parser: The following lines were ignored:
11/06 01:28:34.965 ERROR|   logging_manager:0626| tko parser:   , 1) SSHConnectionError: ssh: connect to host 100.115.202.98 port 22: Connection timed out
11/06 01:28:34.966 ERROR|   logging_manager:0626| 
11/06 01:28:34.966 ERROR|   logging_manager:0626| tko parser:   , 
11/06 01:28:34.966 ERROR|   logging_manager:0626| 
11/06 01:28:34.967 ERROR|   logging_manager:0626| tko parser:   Traceback (most recent call last):
11/06 01:28:34.967 ERROR|   logging_manager:0626| 
11/06 01:28:34.967 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/client/common_lib/test.py", line 806, in _call_test_function
11/06 01:28:34.968 ERROR|   logging_manager:0626| 
11/06 01:28:34.968 ERROR|   logging_manager:0626| tko parser:       return func(*args, **dargs)
11/06 01:28:34.968 ERROR|   logging_manager:0626| 
11/06 01:28:34.969 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/client/common_lib/test.py", line 470, in execute
11/06 01:28:34.969 ERROR|   logging_manager:0626| 
11/06 01:28:34.970 ERROR|   logging_manager:0626| tko parser:       dargs)
11/06 01:28:34.970 ERROR|   logging_manager:0626| 
11/06 01:28:34.970 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry
11/06 01:28:34.971 ERROR|   logging_manager:0626| 
11/06 01:28:34.971 ERROR|   logging_manager:0626| tko parser:       postprocess_profiled_run, args, dargs)
11/06 01:28:34.971 ERROR|   logging_manager:0626| 
11/06 01:28:34.972 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/client/common_lib/test.py", line 380, in _call_run_once
11/06 01:28:34.972 ERROR|   logging_manager:0626| 
11/06 01:28:34.973 ERROR|   logging_manager:0626| tko parser:       self.run_once(*args, **dargs)
11/06 01:28:34.973 ERROR|   logging_manager:0626| 
11/06 01:28:34.973 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/server/site_tests/autoupdate_EndToEndTest/autoupdate_EndToEndTest.py", line 423, in run_once
11/06 01:28:34.974 ERROR|   logging_manager:0626| 
11/06 01:28:34.974 ERROR|   logging_manager:0626| tko parser:       cros_device.install_source_image(test_conf['source_payload_uri'])
11/06 01:28:34.974 ERROR|   logging_manager:0626| 
11/06 01:28:34.975 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/server/cros/update_engine/chromiumos_test_platform.py", line 104, in install_source_image
11/06 01:28:34.975 ERROR|   logging_manager:0626| 
11/06 01:28:34.975 ERROR|   logging_manager:0626| tko parser:       self._install_version(source_payload_uri, clobber_stateful=True)
11/06 01:28:34.976 ERROR|   logging_manager:0626| 
11/06 01:28:34.976 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/server/cros/update_engine/chromiumos_test_platform.py", line 53, in _install_version
11/06 01:28:34.977 ERROR|   logging_manager:0626| 
11/06 01:28:34.977 ERROR|   logging_manager:0626| tko parser:       clobber_stateful=clobber_stateful)
11/06 01:28:34.978 ERROR|   logging_manager:0626| 
11/06 01:28:34.978 ERROR|   logging_manager:0626| tko parser:     File "/usr/local/autotest/client/common_lib/cros/dev_server.py", line 2380, in auto_update
11/06 01:28:34.978 ERROR|   logging_manager:0626| 
11/06 01:28:34.979 ERROR|   logging_manager:0626| tko parser:       error_msg % (host_name, real_error))
11/06 01:28:34.979 ERROR|   logging_manager:0626| 
11/06 01:28:34.979 ERROR|   logging_manager:0626| tko parser:   DevServerException: CrOS auto-update failed for host chromeos4-row11-rack7-host2: 0) SSHConnectionError: ssh: connect to host chromeos4-row11-rack7-host2 port 22: Connection timed out
11/06 01:28:34.980 ERROR|   logging_manager:0626| 
11/06 01:28:34.980 ERROR|   logging_manager:0626| tko parser:   , 1) SSHConnectionError: ssh: connect to host 100.115.202.98 port 22: Connection timed out
11/06 01:28:34.981 ERROR|   logging_manager:0626| 
11/06 01:28:34.981 ERROR|   logging_manager:0626| tko parser:   , 


Link to builds page:
https://uberchromegw.corp.google.com/i/chromeos_release/builders/reks-release%20release-R63-10032.B
 
The DUT chromeos4-row11-rack7-host2 looks problematic, I'm locking it temporarily.

$ dut-status -f chromeos4-row11-rack7-host2
chromeos4-row11-rack7-host2
    2017-11-06 22:19:06  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2025203-repair/
    2017-11-06 22:18:24  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2025198-reset/
    2017-11-06 21:54:44  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154476506-chromeos-test/
    2017-11-06 21:53:50  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2025082-reset/
    2017-11-06 21:32:33  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2025015-repair/
    2017-11-06 21:31:47  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2025009-reset/
    2017-11-06 16:51:27  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154437049-chromeos-test/
    2017-11-06 16:50:34  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024292-reset/
    2017-11-06 16:24:17  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024222-repair/
    2017-11-06 16:23:32  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024219-reset/
    2017-11-06 15:42:47  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154419533-chromeos-test/
    2017-11-06 15:41:59  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024116-reset/
    2017-11-06 15:39:55  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410902-chromeos-test/
    2017-11-06 15:39:02  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024103-reset/
    2017-11-06 15:37:44  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410896-chromeos-test/
    2017-11-06 15:36:52  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024095-reset/
    2017-11-06 15:35:15  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410889-chromeos-test/
    2017-11-06 15:34:29  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024089-reset/
    2017-11-06 15:33:14  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410883-chromeos-test/
    2017-11-06 15:32:29  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024083-reset/
    2017-11-06 15:31:23  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410873-chromeos-test/
    2017-11-06 15:30:33  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024077-reset/
    2017-11-06 15:29:00  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410860-chromeos-test/
    2017-11-06 15:28:17  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024070-reset/
    2017-11-06 15:27:02  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410831-chromeos-test/
    2017-11-06 15:26:15  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024063-reset/
    2017-11-06 15:25:07  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410829-chromeos-test/
    2017-11-06 15:24:21  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024060-reset/
    2017-11-06 15:22:54  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410820-chromeos-test/
    2017-11-06 15:21:57  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024051-reset/
    2017-11-06 15:20:35  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410812-chromeos-test/
    2017-11-06 15:19:43  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024034-reset/
    2017-11-06 15:18:19  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410782-chromeos-test/
    2017-11-06 15:17:12  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024015-reset/
    2017-11-06 15:15:40  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410770-chromeos-test/
    2017-11-06 15:14:48  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2024003-reset/
    2017-11-06 15:13:10  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410758-chromeos-test/
    2017-11-06 15:11:44  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2023979-reset/
    2017-11-06 15:08:45  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154410827-chromeos-test/
    2017-11-06 14:51:14  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2023873-provision/
    2017-11-06 14:29:58  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2023799-repair/
    2017-11-06 14:11:28  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2023735-provision/
    2017-11-06 11:34:57  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2023327-cleanup/
    2017-11-06 11:31:05  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154395968-chromeos-test/
    2017-11-06 11:30:19  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2023318-reset/
    2017-11-06 11:29:10  OK http://cautotest/tko/retrieve_logs.cgi?job=/results/hosts/chromeos4-row11-rack7-host2/2023305-cleanup/
    2017-11-06 11:27:36  -- http://cautotest/tko/retrieve_logs.cgi?job=/results/154367496-chromeos-test/
Issue 780745 has been merged into this issue.
Labels: Deputy
Cc: akes...@chromium.org
Owner: gkihumba@google.com
Still an issue? If so, please reassign to me.
Status: Fixed (was: Assigned)
Reks' last run was green. Thanks
Issue 780745 still remains same in M64.
Could you have a look?
Owner: pmanavalan@chromium.org
Status: Assigned (was: Fixed)
Please link to recent occurrence.
Cc: -akes...@chromium.org kkashamsetty@chromium.org kkan...@chromium.org rohi...@chromium.org nnita@chromium.org raraghavendra@chromium.org pmanavalan@chromium.org
Owner: akes...@chromium.org
w.r.t. Issue 780745,

10113.0.0 was the most recent failure(Build status FAIL)

https://cros-goldeneye.corp.google.com/chromeos/console/listBuild?boards=reks&milestone=64&chromeOsVersion=&chromeVersion=&startTimeFrom=&startTimeTo=#/


Issue 780745 covers the rest.


Components: Infra>Client>ChromeOS
It's more helpful to link to the *specific failure*.

E.g. in this case I believe https://luci-milo.appspot.com/buildbot/chromeos/reks-release/1660 ?
Owner: ----
Status: Available (was: Assigned)
I don't see evidence that ^ is an infrastructure failure. At this point it's also hard to associate a particular event with a particular repair log.

@akeshet Yes true. it is always a good practice to provide enough info from testing side to triage the bug.

Due to restricted access, TVC's/I are/am refrained from accessing the link  you referenced in # 10  and build status "FAIL" link in Golden eye.
That said, it would be helpful if you share some insight on the gap that make it fruitful( preliminary check and logs of interest from dev perspective for such issues)

Rohit,
As per our guideline, Reks continues to have  "FAIL" build status in M64 and M63 and the same have been reported via  Issue 780745 which then duplicated to the current bug . This is Red(FAIL build status) most of the time and remains same till now. This affects our cts monitoring . Could you please guide us on the approach to take it forward?

Owner: shuqianz@chromium.org
+ deputy
Cc: xixuan@chromium.org
"/tmp/tmp60f18J"'
11/14 05:01:34.158 DEBUG|              test:0183| before_hook ends running.
11/14 05:01:34.215 DEBUG|              test:0362| starting before_iteration_hooks
11/14 05:01:34.216 DEBUG|              test:0365| before_iteration_hooks completed
11/14 05:01:34.217 DEBUG|              test:0379| starting test(run_once()), test details follow
()
11/14 05:01:34.217 DEBUG|provision_AutoUpda:0061| Start provisioning <remote host: chromeos4-row11-rack7-host5> to reks-release/R64-10127.0.0.
11/14 05:01:34.218 INFO |        dev_server:0707| Getting devservers for host: chromeos4-row11-rack7-host5
11/14 05:01:34.278 DEBUG|        dev_server:0728| The host chromeos4-row11-rack7-host5 (100.115.202.100) is in a restricted subnet. Try to locate a devserver inside subnet 100.115.192.0:19.
11/14 05:01:34.282 DEBUG|        dev_server:0679| Pick one healthy devserver from ['http://100.115.219.129:8082', 'http://100.115.219.130:8082', 'http://100.115.219.132:8082', 'http://100.115.219.133:8082', 'http://100.115.219.134:8082', 'http://100.115.219.136:8082', 'http://100.115.219.139:8082', 'http://100.115.219.135:8082']
11/14 05:01:34.283 DEBUG|        dev_server:0683| Check health for http://100.115.219.134:8082
11/14 05:01:34.284 DEBUG|             utils:0212| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/check_health?"''
11/14 05:01:34.397 DEBUG|        dev_server:0936| Error occurred with exit_code 255 when executing the ssh call: ssh_exchange_identification: Connection closed by remote host
.
11/14 05:01:34.400 WARNI|             retry:0228| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.134 'curl "http://100.115.219.134:8082/check_health?"'> failed, rc=255, Command returned non-zero exit status
* Command: 
    ssh 100.115.219.134 'curl "http://100.115.219.134:8082/check_health?"'
Exit status: 255
Duration: 0.0241811275482

stderr:
ssh_exchange_identification: Connection closed by remote host)
11/14 05:01:34.404 WARNI|             retry:0183| Retrying in 2.741414 seconds...
11/14 05:01:37.160 DEBUG|             utils:0212| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/check_health?"''
11/14 05:01:45.915 DEBUG|        dev_server:0688| Pick http://100.115.219.134:8082
11/14 05:01:45.916 INFO |        dev_server:1153| Staging artifacts on devserver http://100.115.219.134:8082: build=reks-release/R64-10127.0.0, artifacts=['full_payload', 'stateful', 'autotest_packages'], files=, archive_url=gs://chromeos-image-archive/reks-release/R64-10127.0.0
11/14 05:01:45.918 DEBUG|             utils:0212| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/stage?artifacts=full_payload,stateful,autotest_packages&files=&async=True&archive_url=gs://chromeos-image-archive/reks-release/R64-10127.0.0"''
11/14 05:01:54.459 DEBUG|        dev_server:1095| response for RPC: 'Success'
11/14 05:01:54.460 DEBUG|             utils:0212| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/is_staged?artifacts=full_payload,stateful,autotest_packages&files=&archive_url=gs://chromeos-image-archive/reks-release/R64-10127.0.0"''
11/14 05:02:02.993 DEBUG|        dev_server:1051| whether artifact is staged: 'True'
11/14 05:02:02.998 INFO |        dev_server:1171| Finished staging artifacts: build=reks-release/R64-10127.0.0, artifacts=['full_payload', 'stateful', 'autotest_packages'], files=, archive_url=gs://chromeos-image-archive/reks-release/R64-10127.0.0
11/14 05:02:03.004 INFO |        dev_server:1489| Requesting contents from devserver http://100.115.219.134:8082 for image reks-release/R64-10127.0.0
11/14 05:02:03.005 DEBUG|             utils:0212| Running 'ssh 100.115.219.134 'curl "http://100.115.219.134:8082/list_image_dir?archive_url=gs://chromeos-image-archive/reks-release/R64-10127.0.0"''
11/14 05:02:03.166 DEBUG|        dev_server:0936| Error occurred with exit_code 255 when executing the ssh call: ssh_exchange_identification: Connection closed by remote host
.
11/14 05:02:03.168 WARNI|             retry:0228| <class 'autotest_lib.client.common_lib.error.CmdError'>(Command <ssh 100.115.219.134 'curl "http://100.115.219.134:8082/list_image_dir?archive_url=gs://chromeos-image-archive/reks-release/R64-10127.0.0"'> failed, rc=255, Command returned non-zero exit status
* Command: 
    ssh 100.115.219.134 'curl
    "http://100.115.219.134:8082/list_image_dir?archive_url=gs://chromeos-
    image-archive/reks-release/R64-10127.0.0"'
Exit status: 255
Duration: 0.0392081737518

stderr:
ssh_exchange_identification: Connection closed by remote host)

I find this on the recent logging of the failed HWTest.
Cc: benchan@chromium.org
Owner: nxia@chromium.org
Pass to current deputy.

Comment 17 by nxia@chromium.org, Nov 22 2017

https://uberchromegw.corp.google.com/i/chromeos_release/builders/reks-release%20release-R63-10032.B/builds/41


https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/157829732-chromeos-test/chromeos4-row11-rack6-host20/autoupdate_EndToEndTest.paygen_au_dev_full/debug/

The latest build failed at 

 Traceback (most recent call last):
  File "/home/chromeos-test/chromiumos/src/platform/dev/cros_update.py", line 269, in TriggerAU
    clobber_stateful=self.clobber_stateful)
  File "/home/chromeos-test/chromiumos/chromite/lib/auto_updater.py", line 1020, in __init__
    payload_filename=payload_filename)
  File "/home/chromeos-test/chromiumos/chromite/lib/auto_updater.py", line 252, in __init__
    self.device_dev_dir = os.path.join(self.device.work_dir, 'src')
  File "/home/chromeos-test/chromiumos/chromite/lib/remote_access.py", line 677, in work_dir
    capture_output=True).output.strip()
  File "/home/chromeos-test/chromiumos/chromite/lib/remote_access.py", line 959, in BaseRunCommand
    return self.GetAgent().RemoteSh(cmd, **kwargs)
  File "/home/chromeos-test/chromiumos/chromite/lib/remote_access.py", line 349, in RemoteSh
    raise SSHConnectionError(e.result.error)
SSHConnectionError: ssh: connect to host 100.115.202.94 port 22: Connection timed out
11/22 01:13:33.747 DEBUG|        dev_server:2337| Please see error details in log /usr/local/autotest/results/157829732-chromeos-test/chromeos4-row11-rack6-host20/autoupdate_logs/CrOS_update_100.115.202.94_18149.log
11/22 01:13:33.763 CRITI|chromiumos_test_pl:0055| ERROR: Failed to install image on the DUT.


11/22 01:13:33.765 DEBUG|              test:0410| Test failed due to CrOS auto-update failed for host chromeos4-row11-rack6-host20: 0) Could not copy /home/chromeos-test/images/dev-channel/reks/10032.50.0/payloads/chromeos_10032.50.0_reks_dev-channel_full_test.bin-a8b9705409820a6bcec6b1d4787b0c49 to device., 1) SSHConnectionError: ssh: connect to host 100.115.202.94 port 22: Connection timed out, . Exception log follows the after_iteration_hooks.


Comment 18 Deleted

Comment 19 by nxia@chromium.org, Nov 22 2017

But the dut chromeos4-row11-rack6-host20 seems able to run tests for R64 branch, only failed at provision jobs when the previous run was for R63. 

Comment 20 by nxia@chromium.org, Nov 22 2017

Going to lock the dut, will see if it helps 

Comment 21 by nxia@chromium.org, Nov 27 2017

Owner: xixuan@chromium.org
reks was passing in the past few days, the last failed one was because the lab was closed. passing this to the current deputy to monitor.
Cc: -xixuan@chromium.org dgarr...@chromium.org
Owner: pho...@chromium.org
Pass to current deputy & secondary, recently failed builds are due to hwtest timed out. Should we add more DUTs to bvt?
Summary: Reks-release failed due to hwtest timed out (was: Reks dev server SSH timeout on M63 branch)
Status: Fixed (was: Available)

Sign in to add a comment