
Issue 814493

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




Flaky GCETests on lakitu-release

Project Member Reported by norvez@chromium.org, Feb 21 2018

Issue description

https://luci-milo.appspot.com/buildbot/chromeos/lakitu-release/3130 passed, but needed a retry.
https://luci-milo.appspot.com/buildbot/chromeos/lakitu-release/3129 failed, however.

Some of the failures are similar:
"
/tmp/cbuildbotZMsa6q/gce-smoke/test_harness/all/SimpleTestVerify/2_autotest_tests/results-10-kubernetes_StandaloneKubeletServer/kubernetes_StandaloneKubeletServer         [  FAILED  ]
/tmp/cbuildbotZMsa6q/gce-smoke/test_harness/all/SimpleTestVerify/2_autotest_tests/results-10-kubernetes_StandaloneKubeletServer/kubernetes_StandaloneKubeletServer           FAIL: Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: 
/tmp/cbuildbotZMsa6q/gce-smoke/test_harness/all/SimpleTestVerify/2_autotest_tests/results-10-kubernetes_StandaloneKubeletServer/kubernetes_StandaloneKubeletServer           02/21 14:19:21.281 ERROR|             utils:0282| [stderr] Warning: Permanently added '35.202.139.64' (RSA) to the list of known hosts.
/tmp/cbuildbotZMsa6q/gce-smoke/test_harness/all/SimpleTestVerify/2_autotest_tests/results-10-kubernetes_StandaloneKubeletServer/kubernetes_StandaloneKubeletServer           02/21 14:20:05.608 ERROR|             utils:0282| [stderr] ssh: connect to host 35.202.139.64 port 22: Connection timed out
"
 

Comment 1 by edjee@google.com, Feb 27 2018

Owner: wonderfly@google.com
Assigning to Daniel Wang, who has modified the test a few times. Please feel free to reassign if you're not the right assignee.
Cc: lakitu-dev@google.com
Thanks for noticing and reporting. We are aware of this, and a few other flaky tests, and will address them, though with P2 priority.

Is it blocking anything on the chromeos side, or causing any trouble for you? If not, I'm going to close this and follow up in my internal bug tracker.

Comment 3 by norvez@chromium.org, Feb 27 2018

Cc: ejcaruso@chromium.org matthewmwang@chromium.org
+sheriffs FYI

It's not blocking, as long as it's only in the -release builder and not in the -paladin builder.
However, it does trigger alerts that the build sheriffs are supposed to investigate. If it's known to be flaky, can you mark the builder as experimental? That way it won't send alerts.
We have deliberately taken these flaky tests out of the paladin builders so they should be fairly stable. And if not, feel free to file bugs against us and mark them experimental as necessary.

The release builders are configured to run all tests. I am not sure there is an "experimental" tag for release builders, but we definitely want to be included if the sheriff is looking at some global breakage.

Comment 5 by norvez@chromium.org, Feb 27 2018

There is an experimental tag for release builders that are known to be unreliable (see, for example, lakitu-nc-release on go/crosbuild), so they don't crowd out the failures of the "believed stable" builders.

If the builders are experimental, they're still listed on the waterfall so sheriffs can look at them (though usually they won't by default), but they're not reported in go/som, which is the main monitoring tool. That way the sheriffs don't have to waste time sorting out the "expected" failures from the unexpected failures.
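
As a rough illustration of the behaviour described above, here is a minimal Python sketch of how an "experimental" flag could keep a builder visible on the waterfall while excluding it from alerts. This is not the actual go/som or waterfall implementation; the `BuildResult` type, the function names, and the `lakitu-paladin` builder name are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    """Hypothetical record of one builder's latest run."""
    builder: str
    passed: bool
    experimental: bool = False  # assumed per-builder flag


def builds_on_waterfall(results):
    """Every builder stays listed, experimental or not."""
    return list(results)


def builds_to_alert(results):
    """Only failures from non-experimental builders raise alerts."""
    return [r for r in results if not r.passed and not r.experimental]


# Illustrative data only; these outcomes are made up for the sketch.
results = [
    BuildResult("lakitu-release", passed=False),
    BuildResult("lakitu-nc-release", passed=False, experimental=True),
    BuildResult("lakitu-paladin", passed=True),
]

alerting = [r.builder for r in builds_to_alert(results)]
listed = [r.builder for r in builds_on_waterfall(results)]
```

Under these assumptions, a failed experimental builder still appears in `listed` but is absent from `alerting`, which is the trade-off being discussed: less noise for sheriffs, at the cost of the builder's failures no longer being surfaced by default.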
Good to know. We do want to stay important for our other release builders though. Our oncall actually gets alerted too if something fails. It's their job to triage flaky tests and get the builders back green. So please continue to file bugs when you see a failed build on lakitu-release, and assign them to our oncall - there is a link to our oncall contact, right on go/crosbuild, named "Lakitu Stormchaser:".

Instead of marking the builder experimental, what we typically do is disable the flaky test. In this particular case, the test shows some flakiness, but not to an unacceptable degree, so we haven't done so. I'll file a bug internally and have somebody look at it.
Status: Fixed (was: Assigned)
The bug was resolved internally and those tests have been passing so far. We do still see test flakes from time to time, but we won't need this bug to keep track of that.
