
Issue 850319

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 11
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




Single build can trigger too many analyses

Project Member Reported by robert...@chromium.org, Jun 7 2018

Issue description

e.g.

https://findit-for-me.appspot.com/waterfall/build-failure?url=https://build.chromium.org/p/chromium.memory/builders/Linux%20MSan%20Tests/builds/10255

Having too many analyses running simultaneously can degrade performance, especially when they are analyzing different tests in the same build.

There are several ways to mitigate this. I can think of a couple:
- We could stagger or delay subsequent analyses for the same build/target 
- We could lower the priority of swarming tasks and tryjobs used by the later analyses.
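
A minimal sketch of the second option, lowering the priority of each later analysis for the same build. All names and numbers here (`priority_for_analysis`, `assign_priorities`, the base of 100, the step of 20) are hypothetical, and the sketch assumes a scheduler where a smaller number is served first and 255 is the lowest priority; it is not Findit code.

```python
# Hypothetical sketch: names and numbers are illustrative, not Findit's.
HIGHEST_PRIORITY = 0    # assumption: a smaller number is served first
LOWEST_PRIORITY = 255   # assumption: upper bound of the priority range


def priority_for_analysis(index, base=100, step=20):
    """Return the task priority for the index-th analysis of one build.

    Index 0 is the first analysis triggered for the build. Each later
    analysis is pushed `step` further down, clamped to the valid range.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    return min(LOWEST_PRIORITY, max(HIGHEST_PRIORITY, base + index * step))


def assign_priorities(test_names, base=100, step=20):
    """Map each test name to a priority, in trigger order."""
    return {
        name: priority_for_analysis(i, base, step)
        for i, name in enumerate(test_names)
    }
```

With these assumed defaults, the first analysis keeps the base priority and every later one yields to it, so a build with many flaky tests cannot crowd out analyses of other builds.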

 

Comment 1 by st...@chromium.org, Jun 13 2018

Another option is NOT to trigger an analysis if the flake couldn't be reproduced in the 30 reruns of the deflake Swarming task, which is triggered before the tryjob-based analysis. I suspect that a lot of them are NOT reproducible in the 30 reruns.

Comment 2 by chanli@chromium.org, Jun 13 2018

Roberto and I just had a discussion on this. Maybe we can do something like:
Trigger analyses like we do right now, but give each analysis a lower priority than the previous one. We can also add a new pipeline after a culprit is found, which runs a single swarming task to confirm whether that culprit also causes the other flakiness. For the tests whose flakiness is caused by the culprit, we update their MasterFlakeAnalysis with the culprit and finalize the analyses, and leave the other analyses running.
After discussion with the team, the most reasonable solution would be to base results purely on the deflake task: if all 30/30 reruns pass, do not trigger flake analyses for those tests. Statistically, a test that fails 4x in a row on the main builder and then passes 30/30 may point to something being wrong on the bot, not necessarily with the test.
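
To put a rough number on the statistical argument: the sketch below assumes every run is independent and fails with one constant rate p, which is a simplification and not something Findit measures. Under that assumption, the chance of 4 failures followed by 30 passes is p^4 * (1 - p)^30, and it is largest at p = 4/34. The function names are hypothetical.

```python
def prob_fail_then_pass(p, failures=4, passes=30):
    """Probability of `failures` failures followed by `passes` passes,
    assuming independent runs with a constant failure rate p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return (p ** failures) * ((1.0 - p) ** passes)


def most_favourable_rate(failures=4, passes=30):
    """Failure rate that maximizes the probability above.

    Setting the derivative of p**f * (1 - p)**n to zero gives f / (f + n).
    """
    return failures / (failures + passes)


best_p = most_favourable_rate()
upper_bound = prob_fail_then_pass(best_p)

# Compare a few candidate failure rates against the best case.
for p in (0.05, best_p, 0.5):
    print(f"p={p:.3f} -> {prob_fail_then_pass(p):.2e}")
```

Even at the most favourable failure rate the probability stays below one in 100,000, so a constant-rate flaky test is an unlikely explanation for this pattern. Something that changed between the waterfall run and the reruns, such as the state of the bot, fits better.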

Comment 4 by st...@chromium.org, Jun 14 2018

Are we working on a fix?

Comment 5 by chanli@chromium.org, Jun 14 2018

Owner: chanli@chromium.org
Status: Assigned (was: Untriaged)

Comment 6 by bugdroid1@chromium.org, Jun 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/ea38bfd4549333e8b1ab78c00b5ca787390efb3d

commit ea38bfd4549333e8b1ab78c00b5ca787390efb3d
Author: Chan <chanli@chromium.org>
Date: Fri Jun 15 17:59:43 2018

[Findit] Don't run flake analysis on a test if it passes all runs in deflake swarming task.

If a test failed on waterfall but passes all runs in deflake swarming task, don't run flake analysis on it, but still report it as flaky.

Bug:  850319 
Change-Id: I8fcaaf9ff692930d5a49fa505b7301ce8a7e2fbf
Reviewed-on: https://chromium-review.googlesource.com/1102087
Reviewed-by: Shuotao Gao <stgao@chromium.org>
Commit-Queue: Chan Li <chanli@chromium.org>

[modify] https://crrev.com/ea38bfd4549333e8b1ab78c00b5ca787390efb3d/appengine/findit/model/wf_swarming_task.py
[modify] https://crrev.com/ea38bfd4549333e8b1ab78c00b5ca787390efb3d/appengine/findit/model/test/wf_swarming_task_test.py
[modify] https://crrev.com/ea38bfd4549333e8b1ab78c00b5ca787390efb3d/appengine/findit/services/test_failure/test/test_swarming_test.py
[modify] https://crrev.com/ea38bfd4549333e8b1ab78c00b5ca787390efb3d/appengine/findit/services/test_failure/test_swarming.py
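
The rule in the commit message above can be sketched roughly as follows. This is an illustration only, not the code from the change: `DeflakeResult`, `classify`, `split_tests`, and the label strings are hypothetical, and the "reliable_failure" label for tests that fail every rerun is an assumption that the commit message does not state.

```python
from collections import namedtuple

# Hypothetical record of one test's deflake reruns; not a Findit model.
DeflakeResult = namedtuple("DeflakeResult", ["test_name", "passes", "total_runs"])


def classify(result):
    """Classify a test that failed on the waterfall by its deflake reruns."""
    if result.total_runs <= 0:
        return "unknown"
    if result.passes == result.total_runs:
        # Passed every rerun: still flaky, but the flake was not reproduced.
        return "flaky_not_reproduced"
    if result.passes == 0:
        # Assumption: failing every rerun means a consistent failure.
        return "reliable_failure"
    return "flaky_reproduced"


def split_tests(results):
    """Return (tests to report as flaky, tests to run flake analysis on)."""
    reported_flaky = []
    to_analyze = []
    for r in results:
        label = classify(r)
        if label in ("flaky_not_reproduced", "flaky_reproduced"):
            reported_flaky.append(r.test_name)
        if label == "flaky_reproduced":
            to_analyze.append(r.test_name)
    return reported_flaky, to_analyze
```

For example, a test with 30 passes out of 30 reruns ends up in the first list only, while a test with 25 passes out of 30 ends up in both.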

Comment 7 by chanli@chromium.org, Jun 20 2018

Owner: lijeffrey@chromium.org
Assign back to lijeffrey@ for further changes
Is there anything actionable still here?
Status: Fixed (was: Assigned)
I think we are going with Chan's mitigation.
