Flaky GCETests on lakitu-release
Issue description

https://luci-milo.appspot.com/buildbot/chromeos/lakitu-release/3130 passed, but needed a retry. https://luci-milo.appspot.com/buildbot/chromeos/lakitu-release/3129 failed, however. Some of the failures are similar:

/tmp/cbuildbotZMsa6q/gce-smoke/test_harness/all/SimpleTestVerify/2_autotest_tests/results-10-kubernetes_StandaloneKubeletServer/kubernetes_StandaloneKubeletServer [ FAILED ]
/tmp/cbuildbotZMsa6q/gce-smoke/test_harness/all/SimpleTestVerify/2_autotest_tests/results-10-kubernetes_StandaloneKubeletServer/kubernetes_StandaloneKubeletServer FAIL: Unhandled AutoservSSHTimeout: ('ssh timed out', * Command:
/tmp/cbuildbotZMsa6q/gce-smoke/test_harness/all/SimpleTestVerify/2_autotest_tests/results-10-kubernetes_StandaloneKubeletServer/kubernetes_StandaloneKubeletServer 02/21 14:19:21.281 ERROR| utils:0282| [stderr] Warning: Permanently added '35.202.139.64' (RSA) to the list of known hosts.
/tmp/cbuildbotZMsa6q/gce-smoke/test_harness/all/SimpleTestVerify/2_autotest_tests/results-10-kubernetes_StandaloneKubeletServer/kubernetes_StandaloneKubeletServer 02/21 14:20:05.608 ERROR| utils:0282| [stderr] ssh: connect to host 35.202.139.64 port 22: Connection timed out
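
The failure is an SSH connection timeout from the test harness to the GCE instance, so the kubernetes_StandaloneKubeletServer test body never gets to run. As a rough illustration of what the harness is running into (a sketch only, not the actual autotest/Autoserv code; the "root@" user, timeout, and retry values are assumptions), the snippet below probes SSH reachability in a similar way:

  # Hypothetical sketch: probe SSH reachability before giving up with an
  # AutoservSSHTimeout-style error. Not the real autotest implementation.
  import subprocess
  import time

  def ssh_is_reachable(host, connect_timeout=30, retries=3, delay=10):
      """Return True if 'ssh host true' succeeds within the retry budget."""
      cmd = [
          "ssh",
          "-o", "BatchMode=yes",              # never prompt for a password
          "-o", "StrictHostKeyChecking=no",   # matches the "Permanently added ..." warning
          "-o", "ConnectTimeout=%d" % connect_timeout,
          host,
          "true",
      ]
      for attempt in range(1, retries + 1):
          if subprocess.call(cmd) == 0:
              return True
          print("attempt %d/%d: ssh to %s failed" % (attempt, retries, host))
          time.sleep(delay)
      return False

  if __name__ == "__main__":
      # 35.202.139.64 is the instance IP from the log above; "root@" is an assumption.
      if not ssh_is_reachable("root@35.202.139.64"):
          raise SystemExit("ssh timed out")  # roughly what AutoservSSHTimeout reports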
Comment 1 by edjee@google.com, Feb 27 2018
Thanks for noticing and reporting. We are aware of this, and a few other flaky tests, and will address them, though with P2 priority. Is it blocking anything on the chromeos side, or causing any trouble for you guys? If not, I'm going to close this and follow up in my internal bug tracker.
Feb 27 2018
+sheriffs FYI. It's not blocking, as long as it's only in the -release builder and not in the -paladin builder. However, it does trigger alerts that the build sheriffs are supposed to investigate. If it's known to be flaky, can you mark the builder as experimental? That way it won't send alerts.
Feb 27 2018
We have deliberately taken these flaky tests out of the paladin builders, so the paladins should be fairly stable. If they are not, feel free to file bugs against us and mark them experimental as necessary. The release builders are configured to run all tests. I am not sure there is an "experimental" tag for release builders, but we definitely want to be included if the sheriff is looking at some global breakage.
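
To make the paladin/release split concrete, here is a deliberately simplified, hypothetical sketch of per-builder test selection. The BuilderConfig class, the field names, and the gce-sanity suite name are invented for illustration and are not the real chromite/cbuildbot configuration schema; only gce-smoke comes from the log above:

  # Hypothetical illustration of "paladins skip the flaky suites, release runs everything".
  # None of these class or field names come from the real ChromeOS builder configs.
  ALL_SUITES = ["gce-smoke", "gce-sanity"]   # gce-sanity is a made-up placeholder
  FLAKY_SUITES = {"gce-smoke"}               # pretend this is the flaky suite

  class BuilderConfig(object):
      def __init__(self, name, suites):
          self.name = name
          self.suites = suites

  # Paladin (CQ) builder: flaky suites deliberately left out so the CQ stays stable.
  lakitu_paladin = BuilderConfig(
      "lakitu-paladin", [s for s in ALL_SUITES if s not in FLAKY_SUITES])

  # Release builder: configured to run all tests, flaky or not.
  lakitu_release = BuilderConfig("lakitu-release", list(ALL_SUITES))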
Feb 27 2018
There is an experimental tag for release builders that are known to be unreliable (see, for example, lakitu-nc-release on go/crosbuild), so that they don't crowd out the failures of the "believed stable" builders. Experimental builders are still listed on the waterfall, so sheriffs can look at them (though usually they won't by default), but they're not reported in go/som, which is the main monitoring tool. That way the sheriffs don't have to waste time sorting the "expected" failures out from the unexpected ones.
Feb 28 2018
Good to know. We do want to stay important for our other release builders, though. Our oncall actually gets alerted too if something fails; it's their job to triage flaky tests and get the builders back to green. So please continue to file bugs when you see a failed build on lakitu-release, and assign them to our oncall: there is a link to our oncall contact right on go/crosbuild, named "Lakitu Stormchaser:". Instead of marking the builder experimental, what we typically do is disable the flaky test. In this particular case, the test shows some flakiness, but not an unacceptable amount, so we haven't done so. I'll file a bug internally and have somebody look at it.
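
One common way to "disable the flaky test" is a small deny-list that the suite selection consults, so a test can be switched off without touching the builder's experimental status. The sketch below is a generic illustration, not the actual autotest or lakitu tooling; the helper name and the second test name are hypothetical:

  # Hypothetical sketch of skipping known-flaky tests at suite-selection time.
  KNOWN_FLAKY = {
      "kubernetes_StandaloneKubeletServer",  # the test from this report
  }

  def select_tests(all_tests, include_flaky=False):
      """Drop known-flaky tests unless the caller explicitly asks for them."""
      if include_flaky:
          return list(all_tests)
      return [t for t in all_tests if t not in KNOWN_FLAKY]

  if __name__ == "__main__":
      suite = ["kubernetes_StandaloneKubeletServer", "gce_instance_boots"]  # second name is made up
      print(select_tests(suite))                       # flaky test skipped
      print(select_tests(suite, include_flaky=True))   # run everything, as on -release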
Mar 20 2018
The bug was resolved internally and those tests have been passing so far. We do still see test flakes from time to time, but we won't need this bug to keep track of that.