Single build can trigger too many analyses
Issue description
e.g. https://findit-for-me.appspot.com/waterfall/build-failure?url=https://build.chromium.org/p/chromium.memory/builders/Linux%20MSan%20Tests/builds/10255
Having too many analyses running simultaneously can degrade performance, especially when they are analyzing different tests in the same build. I can think of a couple of ways to mitigate this:
- We could stagger or delay subsequent analyses for the same build/target.
- We could lower the priority of the swarming tasks and tryjobs used by the later analyses.
,
Jun 13 2018
Roberto and I just had a discussion on this. Maybe we can do something like this: trigger analyses as we do right now, but give each analysis a lower priority than the previous one. We can also add a new pipeline that runs after a culprit is found; it would run a single swarming task per test to confirm whether that culprit also causes the other flakiness. For the tests whose flakiness is caused by the culprit, we update their MasterFlakeAnalysis with the culprit and finalize the analyses. The other analyses are left to keep running.
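A minimal sketch of the "each analysis gets a lower priority than the previous one" idea. It assumes the Swarming convention that a larger priority number means a less urgent task, with 255 as the lowest priority. The base value, the step, and the function name are hypothetical and are not taken from Findit.

```python
# Hypothetical values: Findit's real defaults are not stated in this thread.
BASE_PRIORITY = 150    # priority used for the first analysis of a build
PRIORITY_STEP = 10     # how much less urgent each later analysis becomes
LOWEST_PRIORITY = 255  # assumed numeric ceiling (least urgent)


def priority_for_analysis(index, base=BASE_PRIORITY, step=PRIORITY_STEP,
                          lowest=LOWEST_PRIORITY):
    """Return the task priority for the index-th analysis of one build.

    index is 0 for the first analysis triggered by the build, 1 for the
    second, and so on. Larger return values mean lower urgency.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    # Clamp so that very late analyses never exceed the assumed ceiling.
    return min(base + index * step, lowest)


if __name__ == "__main__":
    for i in range(5):
        print(i, priority_for_analysis(i))
```

With a scheme like this the first analysis keeps its usual priority, and each later one for the same build is scheduled behind it.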
,
Jun 13 2018
After discussion with the team, the most reasonable solution would be to base results purely on the deflake task: if all 30/30 reruns pass, do not trigger flake analyses for those tests. Statistically, a test that fails 4x in a row on the main builder and then passes 30/30 may point to something being wrong on the bot, not necessarily with the test.
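The rule above can be sketched roughly as follows. This is an illustration only, not the code that landed: the function names and the input shape (test name mapped to a (passes, total_runs) pair) are hypothetical, and treating a test with no rerun data as still needing analysis is an assumption. The last helper illustrates the statistical remark, assuming every run fails independently with the same probability.

```python
def should_run_flake_analysis(passes, total_runs):
    """Return False only when every deflake rerun passed (e.g. 30/30)."""
    if total_runs <= 0:
        # Assumption: with no rerun data, fall back to analyzing the test.
        return True
    return passes < total_runs


def triage(deflake_results):
    """Split tests into (to_analyze, report_only).

    deflake_results maps test name -> (passes, total_runs). Tests in
    report_only are still reported as flaky but get no flake analysis.
    """
    to_analyze, report_only = [], []
    for test_name in sorted(deflake_results):
        passes, total_runs = deflake_results[test_name]
        if should_run_flake_analysis(passes, total_runs):
            to_analyze.append(test_name)
        else:
            report_only.append(test_name)
    return to_analyze, report_only


def chance_of_fails_then_passes(failure_rate, fails=4, passes=30):
    """Probability of `fails` failures followed by `passes` passes,
    assuming independent runs with a fixed failure rate."""
    return failure_rate ** fails * (1.0 - failure_rate) ** passes


if __name__ == "__main__":
    results = {"Suite.TestA": (30, 30), "Suite.TestB": (27, 30)}
    print(triage(results))
    # The failure rate that makes the observed sequence most likely is 4/34.
    print(chance_of_fails_then_passes(4.0 / 34.0))
```

Under that independence assumption, even the failure rate that best fits the observation makes 4 failures followed by 30 passes very unlikely, which is consistent with suspecting the bot rather than the test.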
,
Jun 14 2018
Are we working on a fix?
,
Jun 14 2018
,
Jun 15 2018
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra/+/ea38bfd4549333e8b1ab78c00b5ca787390efb3d

commit ea38bfd4549333e8b1ab78c00b5ca787390efb3d
Author: Chan <chanli@chromium.org>
Date: Fri Jun 15 17:59:43 2018

[Findit] Don't run flake analysis on a test if it passes all runs in deflake swarming task.

If a test failed on waterfall but passes all runs in deflake swarming task, don't run flake analysis on it, but still report it as flaky.

Bug: 850319
Change-Id: I8fcaaf9ff692930d5a49fa505b7301ce8a7e2fbf
Reviewed-on: https://chromium-review.googlesource.com/1102087
Reviewed-by: Shuotao Gao <stgao@chromium.org>
Commit-Queue: Chan Li <chanli@chromium.org>

[modify] https://crrev.com/ea38bfd4549333e8b1ab78c00b5ca787390efb3d/appengine/findit/model/wf_swarming_task.py
[modify] https://crrev.com/ea38bfd4549333e8b1ab78c00b5ca787390efb3d/appengine/findit/model/test/wf_swarming_task_test.py
[modify] https://crrev.com/ea38bfd4549333e8b1ab78c00b5ca787390efb3d/appengine/findit/services/test_failure/test/test_swarming_test.py
[modify] https://crrev.com/ea38bfd4549333e8b1ab78c00b5ca787390efb3d/appengine/findit/services/test_failure/test_swarming.py
,
Jun 20 2018
Assigning back to lijeffrey@ for further changes.
,
Sep 11
Is there anything actionable still here?
,
Sep 11
I think we are going with Chan's mitigation.
Comment 1 by st...@chromium.org, Jun 13 2018