gtest-based test suites should have independent test runs.
Issue description

Independent retries mean that flaky tests no longer cause test suites to be flaky. For example, if a test flakes 50% of the time, retries are independent, and there are 10 retries, then the probability that the test fails on all retries is 2^-10 (about 0.1%). When I introduced independent retries to webkit layout tests, it essentially eliminated test suite flakiness: https://bugs.chromium.org/p/chromium/issues/detail?id=889036

According to go/top-cq-flakes, as well as my standalone script, the most flaky test suites are now gtest-based test suites [unit_tests, webrunner_browsertests, content_unittests] and Android test suites [which I haven't investigated yet]. This is likely because test retries are not independent: tests are run in the same process [the exact details depend on the test runner], which allows bad state to be carried between tests. I investigated one example [Issue 908481]: a flaky test [~10% failure rate] appears to deterministically fail on retries on the bots. I cannot reproduce this behavior locally, but it resembles problems I observed with webkit layout tests.

There are three ways we could make retries independent:
1) Update the test runners to support independent retries. We'd need to update the process/forking model, possibly with different implementations for different test runners.
2) Add a python wrapper around the test runner that relaunches the test runner on failures. Since all test runners support the same interface [gtest, with some small additions], we could reuse the wrapper for all the test runners (see the sketch below).
3) Make all test runs independent.

For webkit_layout_tests, I originally attempted to implement the analog of (1), but eventually determined that the analog of (2) was simpler and produced more independent results. (3) was determined to add too much overhead (between 2X and 5X).

+ nednguyen, owner of client test runners
+ dpranke, jbudorick -- who have the most experience in this area
+ stgao -- theoretically, these changes should have no effect on Find-It, as they should be transparent to both recipes and swarming.
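To make option (2) concrete, here is a minimal sketch of such a wrapper, assuming a plain gtest binary that honors the standard --gtest_output and --gtest_filter flags; the retry count, report parsing, and binary path are illustrative assumptions, not the actual Chromium test-runner interface:

```python
#!/usr/bin/env python3
# Minimal sketch of approach (2): relaunch the test binary in a fresh
# process for each retry so that no state survives between attempts.
# Everything beyond the standard gtest flags is an assumption.
import json
import subprocess
import sys
import tempfile

MAX_RETRIES = 10

def run(binary, test_filter=None):
    """Runs the binary in a brand-new process; returns names of failed tests."""
    with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as out:
        report_path = out.name
    cmd = [binary, "--gtest_output=json:" + report_path]
    if test_filter:
        cmd.append("--gtest_filter=" + test_filter)
    subprocess.call(cmd)
    with open(report_path) as f:
        report = json.load(f)  # schema as emitted by recent googletest
    failed = []
    for suite in report.get("testsuites", []):
        for test in suite.get("testsuite", []):
            if test.get("failures"):  # only present when the test failed
                failed.append(suite["name"] + "." + test["name"])
    return failed

def main():
    failing = run(sys.argv[1])  # initial full run
    for _ in range(MAX_RETRIES):
        if not failing:
            return 0
        # Each retry is a fresh process running only the still-failing
        # tests -- this is what makes the retries independent.
        failing = run(sys.argv[1], test_filter=":".join(failing))
    return 1

if __name__ == "__main__":
    sys.exit(main())
```

Because the wrapper only speaks the gtest command-line interface, the same script could in principle wrap any of the gtest-based suites listed above.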
Comment 1 by nedngu...@google.com, Nov 26
Nov 27
Can you better describe what independent retry means? Most test suites that have retry logic do so by way of new processes, so that bad state is not carried between runs.
Nov 28
> Can you better describe what independent retry means?

I am observing examples [e.g. Issue 908481] where a flaky test [it flakes 10% of the time when I run it locally] appears to sometimes deterministically pass or fail on the trybots [e.g. fails 40 times in a row]. My current best guess is that this occurs because state is leaking from the parent test runner process into child processes [maybe from ChromeTestSuite::Initialize(?)]. I have not yet confirmed this. The only other possibility I can think of is that there is machine-specific state causing the test to deterministically fail, which seems less likely to me.
Nov 28
unit_tests do run in the same process, which means previous state can certainly impact tests. I believe retries are started in a new process, hence the pass on retry. This shouldn't impact browser_tests though, as browser_tests always run in a new process.
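As a toy illustration of the model described above (this is not the actual Chromium test launcher, just a simulation of its batching behavior), here is how batching lets state leak between tests, while per-failure retries, each in a fresh process, do not:

```python
import collections

BATCH_LIMIT = 10  # the default batch size discussed later in this thread

# A toy test: passes_clean is its result in a pristine process; pollutes
# marks whether it leaves bad state behind for later tests in the batch.
Test = collections.namedtuple("Test", ["name", "passes_clean", "pollutes"])

def run_batch(tests):
    """One simulated child process: all tests in the batch share its state."""
    polluted = False
    failed = []
    for t in tests:
        ok = t.passes_clean and not polluted  # leaked state fails later tests
        if not ok:
            failed.append(t)
        polluted = polluted or t.pollutes
    return failed

def run_suite(tests):
    failed = []
    for i in range(0, len(tests), BATCH_LIMIT):  # initial batched run
        failed += run_batch(tests[i:i + BATCH_LIMIT])
    # Each retry gets its own fresh process, so a test that failed only
    # because a batch-mate polluted shared state will now pass.
    return [t for t in failed if run_batch([t])]

tests = [Test("A", True, True), Test("B", True, False)]
# B fails in the shared batch but passes on its independent retry:
print([t.name for t in run_suite(tests)])  # -> []
```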
Nov 28
I finished investigating this particular source of flakiness; see Issue 908481. The test was incorrectly written: it relied on the timing of a background thread being woken up and executing a task, which is entirely at the whims of the OS scheduler and the availability of other cores on the device. I imagine that if there are no other available cores, the test will very likely fail. This seems more likely to happen on the trybots, as those run on VM images with hardware sharing. I'm going to leave this bug open and Available for now, as there are still improvements we could make to the test suite runner itself to make retries more independent.
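For illustration only (the actual test in Issue 908481 is C++ and its details are not reproduced here), a distilled version of this anti-pattern, and the fix of synchronizing on completion instead of sleeping, looks roughly like:

```python
import threading
import time

def test_flaky():
    # Anti-pattern: assume the background thread has already run by the
    # time we assert. Whether it has is up to the OS scheduler and the
    # availability of other cores.
    done = []
    threading.Thread(target=lambda: done.append(True)).start()
    time.sleep(0.01)  # hope the thread got scheduled in time
    assert done       # fails whenever it didn't

def test_deterministic():
    # Fix: explicitly wait for the work to complete.
    done = []
    finished = threading.Event()
    def work():
        done.append(True)
        finished.set()
    threading.Thread(target=work).start()
    assert finished.wait(timeout=5)
    assert done
```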
Nov 28
As a quick test, I ran unit_tests on Linux, macOS, and Windows with --test-launcher-batch-limit=1, which causes a new process to be spawned for every test. Note that the default batch size is 10: https://cs.chromium.org/chromium/src/base/test/launcher/unit_test_launcher.cc?l=47

======Linux [building for CrOS]======
[target_os = "chromeos", same GN config as the CrOS rel trybot: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux-chromeos-rel/144853]
On my Linux 440 device [12 cores w/ HT], with the default batch limit of 10:

$ time ./out/gn/unit_tests --test-launcher-bot-mode --cfi-diag=0
real 1m26.581s
user 6m9.316s
sys 1m45.496s

With --test-launcher-batch-limit=1:

$ time ./out/gn/unit_tests --test-launcher-bot-mode --cfi-diag=0 --test-launcher-batch-limit=1
real 5m14.393s
user 31m12.820s
sys 7m45.952s

======macOS======
[24 cores w/ HT; release, static, symbol_level = 0]

$ time ./out/gn/unit_tests --test-launcher-bot-mode --cfi-diag=0
real 0m59.826s
user 4m35.727s
sys 2m25.196s

$ time ./out/gn/unit_tests --test-launcher-bot-mode --cfi-diag=0 --test-launcher-batch-limit=1
real 5m47.229s
user 25m28.403s
sys 11m15.812s

======Windows======
[28 cores w/ HT; release, static, symbol_level = 1, 32-bit binary. Same config as the win7_chromium_rel_ng trybot]

$ Measure-Command {.\out\gn\unit_tests --test-launcher-bot-mode --cfi-diag=0}
...
TotalSeconds : 194.6314689

$ Measure-Command {.\out\gn\unit_tests --test-launcher-bot-mode --cfi-diag=0 --test-launcher-batch-limit=1}
...
TotalSeconds : 1914.8142326

======Conclusion======
Setting a batch size of 1 increases wall-clock run time by roughly 3.6X on Linux, 5.8X on macOS, and 9.8X on Windows, i.e. ~4-10X. Given how quickly unit_tests runs even with --test-launcher-batch-limit=1, this seems like acceptable overhead for independent test results.
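For anyone wanting to reproduce these measurements on Linux or macOS, a hypothetical harness might look like the following (the flags are the ones used in the commands above; the binary path and the timing approach are assumptions):

```python
import subprocess
import time

BINARY = "./out/gn/unit_tests"  # adjust to your build directory

for batch_limit in (10, 1):  # default batch size vs. one process per test
    cmd = [BINARY, "--test-launcher-bot-mode", "--cfi-diag=0",
           "--test-launcher-batch-limit=%d" % batch_limit]
    start = time.monotonic()
    subprocess.run(cmd)
    print("batch-limit=%d: %.1fs" % (batch_limit, time.monotonic() - start))
```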
Dec 5
Note: retries are already spawned in separate processes, but the original test runs are run in batches of 10.
Dec 5
IIRC, 10 is the default, but it could be as high as 60. I saw 60 in a local run of some gtest suite.
Jan 11
Available, but no owner or component? Please find a component, as no one will ever find this without one.