Issue 869452

Starred by 2 users

Issue metadata

Status: WontFix
Owner: ----
Closed: Nov 1
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 1
Type: Bug-Regression


'Couldn't connect using SSH' flakes from Fuchsia/x64 FYI bot since latest SDK roll

Project Member Reported by w...@chromium.org, Jul 31

Issue description

Individual test suite runs have been flaking on the FYI bot since the latest SDK roll (https://chromium-review.googlesource.com/c/chromium/src/+/1155791) with:

...
2018-07-30 20:54:56,491:INFO:root:[00001.968] 04472.04485> netstack: started
2018-07-30 20:54:56,491:INFO:root:[00001.969] 04472.04485> netstack: socket dispatcher started
2018-07-30 20:54:56,491:INFO:root:[00001.983] 04472.04781> netstack: starting http pprof server on 0.0.0.0:6060
2018-07-30 20:54:56,491:INFO:root:[00001.987] 04472.04798> netstack: watching for ethernet devices
2018-07-30 20:55:44,584:ERROR:root:Timeout limit reached.
Traceback (most recent call last):
  File "/b/s/w/ir/build/fuchsia/test_runner.py", line 117, in <module>
    sys.exit(main())
  File "/b/s/w/ir/build/fuchsia/test_runner.py", line 91, in main
    target.Start()
  File "/b/s/w/ir/build/fuchsia/qemu_target.py", line 140, in Start
    self._WaitUntilReady();
  File "/b/s/w/ir/build/fuchsia/target.py", line 211, in _WaitUntilReady
    raise FuchsiaTargetException('Couldn\'t connect using SSH.')
target.FuchsiaTargetException: Couldn't connect using SSH.
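
For readers unfamiliar with the failure mode: the traceback ends in a wait-until-ready step that gives up after a timeout. A minimal sketch of that general pattern (poll a readiness probe until a deadline passes) is below. This is not the code in build/fuchsia/target.py; `wait_until_ready`, `tcp_port_open`, `TargetNotReadyError`, and all timeout values are hypothetical names and numbers chosen for illustration.

```python
import socket
import time


class TargetNotReadyError(Exception):
    """Hypothetical stand-in for the FuchsiaTargetException in the traceback."""


def wait_until_ready(probe, timeout_secs=30.0, interval_secs=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or timeout_secs has elapsed.

    Returns the number of attempts made; raises TargetNotReadyError on timeout.
    """
    deadline = clock() + timeout_secs
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return attempts
        if clock() >= deadline:
            raise TargetNotReadyError("Couldn't connect using SSH.")
        sleep(interval_secs)


def tcp_port_open(host, port, connect_timeout_secs=2.0):
    """Example probe: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout_secs):
            return True
    except OSError:
        return False
```

A caller might use `wait_until_ready(lambda: tcp_port_open(host, 22))`. An open port only shows that something is listening; it does not prove that SSH authentication will succeed, so a real readiness check would likely need to run an actual SSH command.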

We're not yet seeing these flakes on the main waterfall or the ARM bot, but those bots run fewer test suites, which may explain the difference.
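
To illustrate why the number of suites could matter: if each suite run independently has a small chance of hitting the SSH timeout, the chance that a build sees at least one flake grows quickly with the number of suites. The sketch below assumes independence, and the 2% per-suite rate and the suite counts are made-up numbers, not measurements from these bots.

```python
def p_any_flake(p_per_suite, n_suites):
    """Probability that at least one of n independent suite runs flakes."""
    return 1.0 - (1.0 - p_per_suite) ** n_suites


# Hypothetical per-suite flake rate and suite counts, for illustration only.
for n in (5, 20, 50):
    print(f"{n} suites: {p_any_flake(0.02, n):.3f}")
```

Under these assumptions, a bot running ten times as many suites would flake visibly more often even if the underlying per-connection failure rate were identical on every bot.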
 
Project Member

Comment 1 by bugdroid1@chromium.org, Jul 31

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/200790be63c5ac395adaf3f666c75dc5d52ba441

commit 200790be63c5ac395adaf3f666c75dc5d52ba441
Author: Wez <wez@chromium.org>
Date: Tue Jul 31 17:08:00 2018

Revert "Roll Fuchsia SDK from be1455d02b09 to 0e8488b5f1c0"

This reverts commit 68793bfafef08559f6aca41751e93442b4b28142.

Reason for revert: SDK appears to make SSH connection to VM flaky.

Original change's description:
> Roll Fuchsia SDK from be1455d02b09 to 0e8488b5f1c0
>
>
> The AutoRoll server is located here: https://fuchsia-sdk-chromium-roll.skia.org
>
> Documentation for the AutoRoller is here:
> https://skia.googlesource.com/buildbot/+/master/autoroll/README.md
>
> If the roll is causing failures, please contact the current sheriff, who should
> be CC'd on the roll, and stop the roller if necessary.
>
>
> CQ_INCLUDE_TRYBOTS=luci.chromium.try:fuchsia_arm64_cast_audio;luci.chromium.try:fuchsia_x64_cast_audio
> TBR=cr-fuchsia+bot@chromium.org
>
> Change-Id: Ife1784c580c2b4fb9e19f6a24fcd4111dc0e54e3
> Reviewed-on: https://chromium-review.googlesource.com/1155791
> Reviewed-by: Fuchsia SDK Autoroller <fuchsia-sdk-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
> Commit-Queue: Fuchsia SDK Autoroller <fuchsia-sdk-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
> Cr-Commit-Position: refs/heads/master@{#579219}

TBR=fuchsia-sdk-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com,cr-fuchsia+bot@chromium.org

Change-Id: Ib5f8aab6e5bdf123c9c57b3c29aee1fd0e157f96
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  869452 
Cq-Include-Trybots: luci.chromium.try:fuchsia_arm64_cast_audio;luci.chromium.try:fuchsia_x64_cast_audio
Reviewed-on: https://chromium-review.googlesource.com/1155616
Reviewed-by: Wez <wez@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#579441}
[modify] https://crrev.com/200790be63c5ac395adaf3f666c75dc5d52ba441/build/fuchsia/sdk.sha1

Cc: sergeyu@chromium.org
Filed NET-1226 for this issue.

Status: ExternalDependency (was: Started)

Owner: sergeyu@chromium.org
Status: Assigned (was: ExternalDependency)
This just failed again in https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/fuchsia-fyi-x64-rel/1122, which is after I reverted the SDK roll.

sergeyu@, do you think this might actually be related to your runner-script change to have it start up the loglistener?

Components: -Infra Infra>Client>Chrome

Status: WontFix (was: Assigned)
Not seeing this any longer, AFAIK.

Owner: scottmg@chromium.org
Status: Assigned (was: WontFix)
I get this frequently locally, after ~50% of device reboots. But after re-netbooting one or more times it works again. Once it has connected over SSH once, it works consistently until the device is netbooted again, so I assume it's a race in credential setup (somehow?) rather than a local networking situation.

I had assumed everyone was suffering this but I guess not. :) I'll take a look and try to track down what's happening then.

This bug was specifically about the FYI bot flaking, which I don't _think_ we're seeing any more, since that SDK roll.

IIUC your issue is with a device deployment, which I know various folks have had trouble with, I think because of problems in the authentication step.

Cc: scottmg@chromium.org
Owner: ----
Status: WontFix (was: Assigned)
Oh, OK. I was just going based on "Couldn't connect using SSH". I can investigate elsewhere anyway.
