
Issue 813139


Issue metadata

Status: Fixed
Owner:
Closed: Jul 19
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug




test_installer seems flaky

Reported by martiniss@chromium.org, Feb 16 2018

Issue description

https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/101605 is an example of a build where this is flaky.

It doesn't seem to be uploading to the flakiness dashboard, but I ran some Dremel queries, and it looks like it flakes at about a 0.1% rate: ~1 per day, with 1000 builds a day. So this isn't urgent, but I thought I'd let you know this is happening.

It only seems to be happening on Windows.
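As a rough check of the arithmetic above, assume each build fails independently at a fixed 0.1% rate (an assumption; real flakes often cluster). The expected number of failures per day is then rate × builds, and the chance of seeing at least one failure on a given day is 1 − (1 − rate)^builds. The function name below is hypothetical.

```python
def daily_flake_stats(flake_rate: float, builds_per_day: int) -> tuple:
    """Return (expected failures per day, probability of at least one failure per day).

    Assumes (hypothetically) that every build fails independently with
    probability flake_rate.
    """
    expected = flake_rate * builds_per_day
    p_at_least_one = 1.0 - (1.0 - flake_rate) ** builds_per_day
    return expected, p_at_least_one


if __name__ == "__main__":
    expected, p_any = daily_flake_stats(0.001, 1000)
    print(f"expected failures/day: {expected:.2f}")
    print(f"P(at least one failure in a day): {p_any:.3f}")
```

So a 0.1% rate over ~1000 builds gives about one failure per day on average, with roughly a two-in-three chance of seeing at least one on any given day.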
 

Comment 1 by grt@chromium.org, Feb 28 2018

I don't see test_installer on the flakiness dashboard, nor can I see the test output for the build you referenced. Could you help me find the log from a more recent run of the test? Thanks.

Comment 3 by grt@chromium.org, Apr 6 2018

Labels: -Pri-3 Pri-2
Owner: mmeade@chromium.org
Status: Assigned (was: Available)
None of those logs load for me.

I've just looked through hundreds of win7_chromium_rel_ng runs, and each test_installer failure I see is a legit problem with Chrome rather than something flaky with the test. Recent failures include log output from Chrome showing DCHECKs being hit; e.g., https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/137635.

Is there a way to find all failing runs to be sure that enough actionable information is being reported by the test?

Assigning to mmeade@, who has been making huge improvements to this test harness.

Comment 4 by mmeade@chromium.org, Apr 10 2018

Is there any update on this? I'm not able to see any of those logs either. Are we sure this is a flake and not valid failures like Greg mentioned? We found legitimate bugs in CLs within 12 hours of uploading, so it's possible that a 0.1% "flake" rate is actually a legitimate failure rate. Can you post your Dremel queries so I can try to monitor for it?

Comment 5 by mmenke@chromium.org, Apr 11 2018

Cc: mmenke@chromium.org
I've just gotten 4 test_installer failures in a row on Windows:
https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/143779

If this is broken, can we disable it until it's fixed?

I see no clear failure in the log. The CL in question removes a function in net/ and modifies its one consumer (the Chrome omnibox search code, which I don't think the installer test depends on): https://chromium-review.googlesource.com/c/chromium/src/+/1004855
win7_chromium_rel_ng is being very flaky today; see issue 831585. So this might be a fluke? Just FYI.
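To put the four consecutive failures in perspective: if each run failed independently at the ~0.1% rate estimated in the issue description, four failures in a row would be vanishingly unlikely, which points to a shared cause (such as an infrastructure problem) rather than ordinary per-run flakiness. A minimal sketch, assuming independence between runs; the function name is hypothetical.

```python
def streak_probability(flake_rate: float, streak_length: int) -> float:
    """Probability that streak_length independent runs all fail,
    each with probability flake_rate (independence is an assumption)."""
    if not 0.0 <= flake_rate <= 1.0:
        raise ValueError("flake_rate must be in [0, 1]")
    if streak_length < 0:
        raise ValueError("streak_length must be non-negative")
    return flake_rate ** streak_length


if __name__ == "__main__":
    # Four failures in a row at a 0.1% per-run rate.
    print(streak_probability(0.001, 4))
```

At 0.1% per run this comes out to about one in a trillion, so a streak like this is better explained by correlated failures than by the background flake rate.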

Comment 7 by mmeade@chromium.org, Apr 11 2018

Cc: dpranke@chromium.org
+cc dpranke@

There seems to be a problem just after the test runner finishes the test and uploads the output to the server. The logs are there, and they show that the test itself passed. It looks like the runner lost connectivity to a server, which caused it to crash (which is why nothing ran after this). It doesn't look like this is the win7_chromium_rel_ng bug, but it may be related.

This doesn't seem to be a bug in the test since it completed successfully and the runner uploaded the logs. I'm cc'ing Dirk to see if he can see something I don't.


Here is what I found in the logs:

@@@SEED_STEP@test_installer (with patch)@@@
@@@STEP_CURSOR@test_installer (with patch)@@@
This build is configured to send log data exclusively to LogDog. Please use the LogDog link on the build page to view this log stream.
@@@STEP_STARTED@@@
@@@STEP_LINK@stdout-->stdio@https://logs.chromium.org/v/?s=chromium%2Fbb%2Ftryserver.chromium.win%2Fwin7_chromium_rel_ng%2F143779%2F%2B%2Frecipes%2Fsteps%2Ftest_installer__with_patch_%2F0%2Fstdout@@@
LogDog Link [stdio]: https://logs.chromium.org/v/?s=chromium%2Fbb%2Ftryserver.chromium.win%2Fwin7_chromium_rel_ng%2F143779%2F%2B%2Frecipes%2Fsteps%2Ftest_installer__with_patch_%2F0%2Fstdout

remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]
There was a network hiccup today that caused a bunch of machines to lose connectivity with the buildbot master. That's probably what this is; issue 831730.
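One way to triage failures like this automatically is to scan a failed step's log for signs of lost bot connectivity before treating the failure as a test problem. The sketch below is hypothetical: the function name and the signature list are assumptions, and the only signature it knows is the `twisted.internet.error.ConnectionLost` line from the log quoted in this issue.

```python
import re

# Hypothetical list of infrastructure-failure signatures. The single entry
# comes from the log in this issue; more would be added as they are found.
INFRA_SIGNATURES = (
    re.compile(r"twisted\.internet\.error\.ConnectionLost"),
)


def classify_step_log(log_text: str) -> str:
    """Return "infra" if the log matches a known infrastructure signature,
    otherwise "needs-triage" (possibly a real test or product failure)."""
    for signature in INFRA_SIGNATURES:
        if signature.search(log_text):
            return "infra"
    return "needs-triage"


if __name__ == "__main__":
    sample = (
        "remoteFailed: [Failure instance: Traceback (failure with no frames): "
        "<class 'twisted.internet.error.ConnectionLost'>: Connection to the "
        "other side was lost in a non-clean fashion."
    )
    print(classify_step_log(sample))
```

Defaulting to "needs-triage" rather than "flaky" keeps unrecognized failures in front of a human, which matters here because most test_installer failures examined in this thread turned out to be real problems in Chrome.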

Comment 9 by mmeade@chromium.org, Apr 11 2018

Ah, that would be it. Thanks for finding it.

I'm assigning back to martiniss to monitor and hopefully close.
Owner: martiniss@chromium.org
Take 2
Status: Fixed (was: Assigned)
