
Issue 917895


Issue metadata

Status: WontFix
Owner:
Closed: Dec 31
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug


Tast failed to reconnect to DUT to collect system info

Project Member Reported by dhanyaganesh@chromium.org, Dec 26

Issue description

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8926077091826811776

I have no idea what went wrong here. Both Tast-related CLs included in this build passed when built separately. Hopefully derat@ can shed some light.
 
Cc: hidehiko@chromium.org akes...@chromium.org jclinton@chromium.org nya@chromium.org
Summary: Tast failed to reconnect to DUT to collect system info (was: Tast flake failure)
It looks like the DUT didn't respond when tast tried to SSH to it to collect system information after running tests:

2018/12/26 06:09:43 Collecting system information
2018/12/26 06:09:43 Connecting to chromeos4-row5-rack10-host3:22
...
2018/12/26 06:09:53 Results saved to /usr/local/autotest/results/lxc_job_folder/tast/results
2018/12/26 06:09:53 Failed to write results: dial tcp: i/o timeout

Oddly, the tast process was able to connect to it just before then. There are some kernel warnings, but I don't know if they're related or not:

2018-12-26T06:09:42.807150-08:00 INFO sshd[3115]: Accepted publickey for root from 100.116.60.160 port 58750 ssh2: RSA SHA256:Fp1qWjFLyK1cTpiI5rdk7iEJwoK9lcnYAgbQtGC3jzU
2018-12-26T06:09:43.447158-08:00 WARNING kernel: [   24.032079] host1x 50000000.host1x: start latency exceeded, new value 55917 ns
2018-12-26T06:09:51.447148-08:00 WARNING kernel: [   32.000434] host1x 50000000.host1x: start latency exceeded, new value 71917 ns
2018-12-26T06:09:51.447171-08:00 INFO kernel: [   32.009930] mwifiex_sdio mmc1:0001:1: info: scan: num_probes = 4
2018-12-26T06:09:56.576770-08:00 WARNING kernel: [   37.111173] host1x 50000000.host1x: start latency exceeded, new value 508750 ns
2018-12-26T06:09:56.986843-08:00 WARNING kernel: [   37.519219] gk20a 57000000.gk20a: stop latency exceeded, new value 1159166 ns
2018-12-26T06:10:01.446810-08:00 INFO kernel: [   41.961028] mwifiex_sdio mmc1:0001:1: info: scan: num_probes = 4

Maybe there was a brief network issue. I'm not excited about the idea of adding retries, since we really depend on network connections to DUTs being reliable.

Has this only happened once?
So far, yes. I'll report here if I see any more this week.
Status: WontFix (was: Untriaged)
Closing for now, but please reopen this if it happens again.
