Unclear that Findit is doing something
Issue description

Page URL: https://findit-for-me.appspot.com/waterfall/flake?redirect=1&key=ag9zfmZpbmRpdC1mb3ItbWVyjwELEhdNYXN0ZXJGbGFrZUFuYWx5c2lzUm9vdCJZY2hyb21pdW0ud2luL1dpbjcgVGVzdHMgKGRiZykoMSkvNjIxNTkvYnJvd3Nlcl90ZXN0cy9Rbkp2ZDNObGNsUmxjM1F1VjJsdVpHOTNUM0JsYmtOc2IzTmwMCxITTWFzdGVyRmxha2VBbmFseXNpcxgBDA

Description: After entering the information about the flake I'm interested in, I get a somewhat coarse graph, and then nothing happens. How can I tell that the tool is actually doing something?
Comment 1 by st...@chromium.org, Jan 23 2018 (Status: Assigned, was: Unconfirmed)
Hey jochen@, in the top right there's a status bar for the analysis; I circled what mine looks like here: https://screenshot.googleplex.com/7Q0Mnc3OfKu.png I can see why it's hard to tell that the analysis is running, though. It's tough because analyses take around two hours and we don't know exactly when they'll finish, which more or less rules out a progress bar. Any suggestions on how we can improve this would be greatly appreciated.
Jan 23 2018
Apologies. Looks like this is possibly related to a dead-loop case I'm looking into right now.
Jan 23 2018
Looks like it's stuck on build 62138.
Jan 23 2018
The root cause of this was fixed in https://chromium-review.googlesource.com/c/infra/infra/+/615095 I think we should purge the old App Engine instances, and that should take care of it.
Jan 24 2018
Re #3, this specific analysis was not running into the dead loop. Instead, the analysis completed (it may have bailed out for some reason), but the status was not updated accordingly.
Jan 24 2018
It's still showing "running" for me. But if it bailed out, should I start a new run? Could it also show me a link to the current Swarming jobs, so I can at least check for myself that the analysis is progressing?
Jan 31 2018
There's definitely some stuff we can do here; I opened an issue so we can investigate moving forward. In the meantime, I reran the analysis you were looking at. Rerun: https://findit-for-me.appspot.com/waterfall/flake?key=ag9zfmZpbmRpdC1mb3ItbWVyjwELEhdNYXN0ZXJGbGFrZUFuYWx5c2lzUm9vdCJZY2hyb21pdW0ud2luL1dpbjcgVGVzdHMgKGRiZykoMSkvNjIxNTkvYnJvd3Nlcl90ZXN0cy9Rbkp2ZDNObGNsUmxjM1F1VjJsdVpHOTNUM0JsYmtOc2IzTmwMCxITTWFzdGVyRmxha2VBbmFseXNpcxgCDA
Feb 1 2018
Thanks for the rerun. From the results, however, it doesn't look like it actually found a culprit :/
Feb 2 2018
Apologies that the results weren't helpful. Findit did identify a regression range: https://chromium.googlesource.com/chromium/src/+log/b4c871c98d03bf300220b1d6f27bf868eac718fe..64cae83473fce17e2d013ba1ca89d80b5e08c586?pretty=fuller I looked through the commits, but I'm not particularly convinced that this regression range is correct. Mind taking a look to confirm? The pass rate here is just slightly over our threshold for what constitutes flaky.
Feb 2 2018
I filed an issue to track this. We use an upper bound of 98% pass rate to decide whether something is flaky; I lowered it for the rerun I did, to avoid this sort of thing: https://findit-for-me.appspot.com/waterfall/flake?key=ag9zfmZpbmRpdC1mb3ItbWVyjwELEhdNYXN0ZXJGbGFrZUFuYWx5c2lzUm9vdCJZY2hyb21pdW0ud2luL1dpbjcgVGVzdHMgKGRiZykoMSkvNjIxNTkvYnJvd3Nlcl90ZXN0cy9Rbkp2ZDNObGNsUmxjM1F1VjJsdVpHOTNUM0JsYmtOc2IzTmwMCxITTWFzdGVyRmxha2VBbmFseXNpcxgDDA I'll follow up once this analysis completes to see whether the change has the desired effect.
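For context, a minimal sketch of the kind of threshold check being described here. The 98% upper bound is the value mentioned in this thread; the lower bound and all names are hypothetical, not Findit's actual code:

```python
# Hypothetical sketch of a pass-rate flakiness check, not Findit's real code.
# The 98% upper bound comes from this thread; the lower bound is assumed.
LOWER_FLAKE_THRESHOLD = 0.02  # at or below: consistently failing, not flaky
UPPER_FLAKE_THRESHOLD = 0.98  # at or above: considered stable, not flaky

def is_flaky(pass_rate):
    """True when a build's pass rate falls inside the 'flaky' band."""
    return LOWER_FLAKE_THRESHOLD < pass_rate < UPPER_FLAKE_THRESHOLD

# A test passing 99 of 100 runs sits just above the 98% bound, so the
# analysis treats it as stable and bails out; lowering the bound widens
# the band of pass rates treated as flaky.
print(is_flaky(0.99))  # False with the 98% bound
print(is_flaky(0.95))  # True
```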
Feb 5 2018
Thanks for rerunning the analysis. I agree that the regression range looks unlikely. It seems that the graph stops going further into the past as soon as it finds one build that is above the threshold? Is it possible to go further back, to verify that the test was indeed stable before?
Feb 5 2018
No problem; sorry Findit couldn't get answers for you. The commit log coming out of the rerun looks more likely (if for no other reason than that it's larger): https://chromium.googlesource.com/chromium/src/+log/22acad3fa9d770e7a5f841fb25bb3a7649e554db..3b38e2d000fc51de3f573172c4912437acc628d4?pretty=fuller It's not clear to me what exactly is making your test flake, since it times out on failure. See this test log for details on the test runs: https://chromium-swarm.appspot.com/task?id=3b72187b49288510&refresh=10&show_raw=1

> It seems that the graph stops going into the past as soon as it finds one build that is above the threshold? Is it possible to go further to verify that the test is indeed stable before?

Findit assumes that one point we know is stable is equivalent to multiple adjacent points being stable. This regression range looks pretty convincing to me. The list is pretty long, but if try-jobs were run against this range, we might find the culprit. The problem is that we currently don't have a way to force try-jobs to run. That's actionable on my end; I'll file a bug for it.
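To make that stopping behavior concrete, a sketch under stated assumptions (this is not Findit's actual implementation; the threshold value, data layout, and names are made up):

```python
# Sketch of the stop-at-first-stable-build behavior described above.
# Assumes one known-stable build implies all earlier builds were stable.
STABLE_PASS_RATE = 0.98  # assumed, per the 98% bound cited in this thread

def find_regression_range(pass_rates):
    """pass_rates maps build number -> pass rate for that build.

    Walks from the newest analyzed build toward the past and stops at the
    first build whose pass rate clears the stable threshold; earlier
    builds are never examined, which is the assumption asked about above.
    """
    builds = sorted(pass_rates, reverse=True)  # newest -> oldest
    previous_flaky = builds[0]
    for build in builds:
        if pass_rates[build] >= STABLE_PASS_RATE:
            # Blame the span from the first stable build found back here
            # to the oldest flaky build seen, then stop looking further.
            return (build, previous_flaky)
        previous_flaky = build
    return None  # no stable build in the analyzed window

# Build 62137 fully passing, 62138 onward flaky: the walk stops at 62137
# without ever verifying that builds before it were also stable.
print(find_regression_range({62136: 1.0, 62137: 1.0, 62138: 0.9, 62139: 0.85}))
```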
Feb 5 2018
Filed a bug to let admins force try-jobs to run. In a case like this, where your flakiness problem might be upstream, forcing try-jobs to run would help confirm what's wrong.
Feb 5 2018
We've moved forward with a change to run try-jobs for these cases regardless of confidence. That'll help determine the true culprit here. Hopefully it will be deployed tomorrow!
Feb 7 2018
I can repro the failures locally, and it seems that the timeouts are just because the test sometimes takes a long time, i.e., increasing the timeout will make it pass reliably.
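That matches a long-tailed runtime distribution. Here's an illustrative simulation (all runtimes and the timeout are made-up numbers, not measurements of this test) of how a tail past the harness timeout shows up as flakiness, and how raising the timeout makes the test pass reliably:

```python
# Illustrative simulation of a timeout-driven flake; the runtime
# distribution and timeout below are invented for demonstration.
import random

TIMEOUT_SEC = 30

def simulated_runtime():
    # Usually ~20s, with an occasional slow run in the tail.
    if random.random() < 0.95:
        return random.gauss(20, 2)
    return random.gauss(35, 4)

runs = [simulated_runtime() for _ in range(10_000)]
for timeout in (TIMEOUT_SEC, 2 * TIMEOUT_SEC):
    pass_rate = sum(t <= timeout for t in runs) / len(runs)
    print(f"pass rate with {timeout}s timeout: {pass_rate:.1%}")
# With the 30s timeout the slow tail times out (test looks flaky);
# with 60s the same test passes essentially every time.
```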
Feb 9 2018
Missed this week's deployment; we're now thinking next week. I posted a regression range earlier that might be useful in the meantime: https://chromium.googlesource.com/chromium/src/+log/22acad3fa9d770e7a5f841fb25bb3a7649e554db..3b38e2d000fc51de3f573172c4912437acc628d4?pretty=fuller
Mar 21 2018
I looked at BrowserTest.WindowOpenClose3 (https://findit-for-me.appspot.com/waterfall/flake?key=ag9zfmZpbmRpdC1mb3ItbWVykwELEhdNYXN0ZXJGbGFrZUFuYWx5c2lzUm9vdCJdY2hyb21pdW0ud2luL1dpbjcgVGVzdHMgKGRiZykoMSkvNjU5MzIvYnJvd3Nlcl90ZXN0cy9Rbkp2ZDNObGNsUmxjM1F1VjJsdVpHOTNUM0JsYmtOc2IzTmxNdz09DAsSE01hc3RlckZsYWtlQW5hbHlzaXMYAQw); however, the regression range there is incorrect as well :/ I split up WindowOpenClose into WindowOpenClose{1,2,3}, and 3 is the one that's flaky on Windows, so the correct build to blame would be the one where I split up the test, but Findit points to a random other build.
Mar 21 2018
Thanks for pointing that out! The analysis you linked to has only 40% confidence, so its findings shouldn't be considered correct. That said, it looks like the test was fully passing on build 65922 and slightly flaky on the next build, 65923, so according to Findit that's a good reason to treat these two builds as the regression range. From what I can tell, it might have been the chrome-release-bot change that introduced the flakiness, at least from Findit's perspective: https://chromium.googlesource.com/chromium/src/+/9b1c4662c8d16ef7a84ce3ec47ff160a5293bb7b When did you split the test up? This analysis is roughly a month old.
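For illustration, a sketch of the confidence gating being described; the 40% figure and the 65922/65923 transition are from this thread, while the trust cutoff and all names are assumed placeholders, not Findit's actual values:

```python
# Hypothetical sketch of gating a regression-range finding on confidence.
# The 40% confidence is this analysis's; the 60% cutoff is an assumption.
MIN_TRUSTED_CONFIDENCE = 0.60

def describe_finding(regression_range, confidence):
    low, high = regression_range
    verdict = ("worth acting on" if confidence >= MIN_TRUSTED_CONFIDENCE
               else "should not be considered correct")
    return (f"regression range {low}..{high} "
            f"({confidence:.0%} confidence): {verdict}")

# The fully-passing -> slightly-flaky transition between adjacent builds
# 65922 and 65923 becomes the range, but at 40% it is not trusted.
print(describe_finding((65922, 65923), 0.40))
```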