Compute effectiveness of 'retry with patch'
Issue description
We want to compute the utility of CQ full retries [when a build fails, the CQ retries it]. To do so, we measure how often a retry returns a failure and how often it returns a success. Based on an audit of slow CLs, we theorize that full retries are currently quite useful [many FAILURE -> SUCCESS transitions], and that they may become less useful once we add recipe-based retries. For more context, see the design doc: https://docs.google.com/document/d/1YF3SAd7f9ekiwV-6CxYsf7f0FthY6ClAw9znS7N8xU0/edit?ts=5b92d3c4#
Pseudocode:
"""
retry_fail = 0
retry_succeed = 0
for cq_attempt in rows of `chrome-infra-events.aggregated.cq_attempts` where cq_name = "chromium/chromium/src":
    # Build IDs are monotonically decreasing: newer builds have smaller IDs.
    # Sorting in descending order puts the oldest build first.
    build_ids = sorted(cq_attempt.contributing_bbucket_ids, reverse=True)
    build_dict = {}
    for build in build_ids:
        # fetch_build is a placeholder for:
        # curl https://cr-buildbucket.appspot.com/_ah/api/buildbucket/v1/builds/<build>
        stats = fetch_build(build)
        builder_name = stats["parameters_json"]["builder_name"]
        result = stats["result"]
        build_dict.setdefault(builder_name, []).append(result)
    for builder_name, results in build_dict.items():
        # We care about retries, which show up as lists with more than one entry.
        # We want to know how often a failed build is followed by another failed
        # build, and how often it is followed by a successful build.
        # Successful builds typically don't get retried, so we ignore them.
        last_result = None
        for result in results:
            if last_result == 'FAILURE':
                if result == 'SUCCESS':
                    retry_succeed += 1
                elif result == 'FAILURE':
                    retry_fail += 1
            last_result = result
"""
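The counting step of the pseudocode can be made runnable on in-memory data. This is a minimal sketch, assuming the BigQuery and buildbucket lookups have already been done and each build of one CQ attempt is represented as a hypothetical (build_id, builder_name, result) tuple; the build IDs in the example are made up.

```python
from collections import defaultdict


def count_retry_transitions(builds):
    """Count FAILURE->SUCCESS and FAILURE->FAILURE transitions per builder.

    `builds` is a list of (build_id, builder_name, result) tuples for one CQ
    attempt. Buildbucket IDs decrease over time, so sorting by ID in
    descending order yields the oldest build first.
    """
    results_by_builder = defaultdict(list)
    for _, builder_name, result in sorted(builds, key=lambda b: b[0], reverse=True):
        results_by_builder[builder_name].append(result)

    retry_succeed = 0
    retry_fail = 0
    for results in results_by_builder.values():
        # Only a build that follows a FAILURE on the same builder is a retry
        # outcome we care about.
        for previous, current in zip(results, results[1:]):
            if previous != "FAILURE":
                continue
            if current == "SUCCESS":
                retry_succeed += 1
            elif current == "FAILURE":
                retry_fail += 1
    return retry_succeed, retry_fail


# Hypothetical CQ attempt: win7 failed (older, larger ID), then passed on retry.
attempt = [
    (101, "win7_chromium_rel_ng", "SUCCESS"),
    (102, "win7_chromium_rel_ng", "FAILURE"),
    (103, "linux_chromium_rel_ng", "SUCCESS"),
]
print(count_retry_transitions(attempt))  # (1, 0)
```

Pairing each result with its successor via `zip` is equivalent to tracking `last_result` in the loop above.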
,
Sep 11
Sorry -- still learning the labels in Ops. My current plan is to write a 20-line Python script that returns the relevant data, and to save the script and its results here. If we later decide that this is a stat we want to track/alert on, we can make a metric and formalize it.
,
Sep 11
1) Install the BQ Python API:
pip install --upgrade google-cloud-bigquery
2) Make a service account, download its private key, and give it access to the chrome-infra-events project. See https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python
export GOOGLE_APPLICATION_CREDENTIALS=/Users/erikchen/Desktop/chrome-infra-events-service-account.json
3) Run the script.
This script uses the buildbucket API. It takes a while to run because it accesses each buildbucket result sequentially. It's likely faster to use the buildbucket BQ table; see https://groups.google.com/a/google.com/forum/#!topic/luci-announce/IVdte3E-FRw
I'm currently running the script locally. Will report back with results.
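Because each buildbucket lookup is an independent network round trip, the sequential loop spends most of its time waiting. A thread pool can overlap those requests. This is a minimal sketch, assuming `fetch_build` is a hypothetical callable that returns one build's details and that the API tolerates a modest number of concurrent requests; it is an alternative to, not a replacement for, switching to the buildbucket BQ table.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_builds(build_ids, fetch_build, max_workers=16):
    """Fetch many builds concurrently instead of one at a time.

    `fetch_build` is a hypothetical callable that takes a buildbucket build ID
    and returns that build's details. Results come back in the same order as
    `build_ids`; an exception in any fetch is re-raised here.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_build, build_ids))


# Usage with a stand-in fetcher; a real one would issue an HTTP request.
fake_results = {101: "SUCCESS", 102: "FAILURE"}
print(fetch_all_builds([102, 101], fake_results.get))  # ['FAILURE', 'SUCCESS']
```

`Executor.map` preserves input order, so the oldest-first ordering from the sorted build IDs is kept.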
,
Sep 11
This script is similar to the previous one, but uses BQ for both the aggregated.cq_attempts and buildbucket queries. This requires creating a service account [in this case erikchen-dev@chrome-cq-retry-calculator.iam.gserviceaccount.com] that has been given access to both tables.
,
Sep 12
Unfortunately, both of my scripts ran into network issues, resulting in premature failures. That said, the second script has intermediate-stage logging, so we have:
total_try_jobs: 77372
retry_succeed: 527
retry_fail: 400
A CQ run normally has 27 try-jobs, although there may be more since some get retried on failure. This suggests that over ~2865 CQ runs [77372 / 27], there were 927 failures followed by retries [527 + 400]. This is about 32% [927 / 2865]. Of these retries, more than half [527 / 927, about 57%] produced a different result: FAILURE -> SUCCESS. This suggests that CQ full retries currently provide quite a bit of utility.
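The estimate in this comment can be worked through directly. A small sketch, assuming (as the comment does) that a CQ run normally has 27 try-jobs, so the number of CQ runs is estimated as total try-jobs / 27:

```python
def estimate_retry_share(total_try_jobs, retry_succeed, retry_fail, jobs_per_cq_run=27):
    """Estimate CQ runs from the try-job count, and retries per CQ run."""
    est_cq_runs = total_try_jobs / jobs_per_cq_run
    retries = retry_succeed + retry_fail
    return {
        "est_cq_runs": est_cq_runs,
        "retries": retries,
        "retries_per_cq_run": retries / est_cq_runs,
        "retry_success_rate": retry_succeed / retries,
    }


stats = estimate_retry_share(77372, 527, 400)
print(int(stats["est_cq_runs"]))                # 2865
print(stats["retries"])                         # 927
print(round(stats["retries_per_cq_run"], 2))    # 0.32
print(round(stats["retry_success_rate"], 2))    # 0.57
```

Since some try-jobs are themselves retries, the true number of CQ runs is somewhat below total / 27, so this is only a rough estimate.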
,
Sep 12
Ah, the network issues are because the computer went to sleep. :(
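Since a multi-hour run can be killed by anything from a sleeping laptop to a transient server error, one option is to persist progress periodically so the script can resume where it left off. This is a minimal sketch, assuming the running totals and the IDs of already-processed CQ attempts fit in a small JSON file; the file layout and function names are hypothetical.

```python
import json
import os


def load_checkpoint(path):
    """Return saved progress, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    # Hypothetical layout: processed CQ attempt IDs plus running totals.
    return {"processed_ids": [], "retry_succeed": 0, "retry_fail": 0}


def save_checkpoint(path, state):
    """Write state to a temp file, then atomically replace the checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    # os.replace is atomic, so a crash mid-write never leaves a corrupt file.
    os.replace(tmp_path, path)
```

In the main loop, the script would skip any CQ attempt whose ID is already in `processed_ids`, update the totals, append the ID, and call `save_checkpoint` every so often. This does not keep the machine awake; it only makes restarts cheap.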
,
Sep 12
""" total_try_jobs: 141072 retry_succeed: 962 retry_fail: 745 """ Full results. Took a few hours to get stats on the 1-weeks of chromium CQ data. Will rerun with some more details stats.
,
Sep 12
New script, more detailed results.
Take aways:
1/3 of all CQ attempts trigger full retries. Of these full retries, more than half subsequently succeed. This means that at least ~1/6 of all CQ attempts hit a failure due to flakiness [or infra issues that retriggering a full build fixes]. Keep in mind that a failed retry could also be caused by flakiness twice in a row, so these are lower bounds on flakiness.
Looking at the builds that have the most flakiness:
android-kitkat-arm-rel: 144 retries result in success. 39 result in failure. This means that a failure in "android-kitkat-arm-rel" is 78%+ likely to be a flake.
win7_chromium_rel_ng: 88 retries result in success. 25 in failure. A failure is 77%+ likely to be a flake.
chromium_presubmit: 0 retries result in success. 119 result in failure. A failure is 0%+ likely to be a flake.
Anecdotally, this matches my expectations. win7_chromium_rel_ng and android-kitkat-arm-rel flake a lot. Now we have a rough estimate of the magnitude.
=====================================
total_cq_runs: 5139
total_cq_runs_at_least_one_failure_retry: 1719
total_try_jobs: 141072
retry_succeed: 962
retry_fail: 745
success dictionary {u'ios-simulator': 24, u'android_arm64_dbg_recipe': 1, u'win7_chromium_rel_ng': 88, u'mac_chromium_rel_ng': 65, u'win10_chromium_x64_rel_ng': 42, u'linux_trusty_blink_rel': 1, u'fuchsia_x64': 56, u'android-kitkat-arm-rel': 144, u'cast_shell_linux': 37, u'android-marshmallow-arm64-rel': 228, u'linux_chromium_rel_ng': 40, u'chromeos-daisy-rel': 1, u'chromeos-amd64-generic-rel': 84, u'linux_chromium_asan_rel_ng': 17, u'mac_optional_gpu_tests_rel': 5, u'linux_chromium_tsan_rel_ng': 15, u'mac_chromium_compile_dbg_ng': 1, u'linux_layout_tests_slimming_paint_v2': 1, u'linux_optional_gpu_tests_rel': 5, u'android_clang_dbg_recipe': 1, u'linux-blink-gen-property-trees': 1, u'ios-simulator-full-configs': 4, u'linux-ozone-rel': 37, u'linux-chromeos-rel': 56, u'win_chromium_compile_dbg_ng': 2, u'linux_chromium_headless_rel': 1}
failure dictionary {u'ios-simulator': 22, u'android_arm64_dbg_recipe': 4, u'win7_chromium_rel_ng': 25, u'mac_chromium_rel_ng': 28, u'win10_chromium_x64_rel_ng': 19, u'linux_trusty_blink_rel': 2, u'linux_mojo': 1, u'android-kitkat-arm-rel': 39, u'android_compile_dbg': 2, u'cast_shell_linux': 31, u'linux_chromium_compile_dbg_ng': 8, u'android-marshmallow-arm64-rel': 112, u'linux_chromium_rel_ng': 27, u'chromeos-daisy-rel': 8, u'fuchsia_arm64': 6, u'chromium_presubmit': 119, u'linux_chromium_asan_rel_ng': 9, u'cast_shell_android': 4, u'win_optional_gpu_tests_rel': 2, u'chromeos-amd64-generic-rel': 34, u'android_cronet': 1, u'fuchsia_x64': 39, u'linux_layout_tests_layout_ng': 1, u'linux_chromium_tsan_rel_ng': 4, u'linux-jumbo-rel': 19, u'mac_chromium_compile_dbg_ng': 9, u'linux_layout_tests_slimming_paint_v2': 1, u'linux_optional_gpu_tests_rel': 6, u'android_clang_dbg_recipe': 2, u'linux-blink-gen-property-trees': 2, u'ios-simulator-full-configs': 14, u'linux-ozone-rel': 28, u'linux-chromeos-rel': 77, u'win_chromium_compile_dbg_ng': 22, u'linux_chromium_headless_rel': 15}
success dictionary counts retries that succeeded. failure dictionary counts retries that failed.
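The per-builder percentages in this comment are lower bounds computed as retry successes / (retry successes + retry failures). A short sketch of that calculation over the success and failure dictionaries, using only three of the entries above; flooring the percentage matches the "at least" reading:

```python
import math


def flake_lower_bounds(success, failure):
    """Lower bound on the fraction of failures that were flakes, per builder.

    A retry that succeeds shows the first failure was a flake; a retry that
    fails may still have been flaky twice, so succeed / (succeed + fail) is a
    lower bound.
    """
    bounds = {}
    for builder in set(success) | set(failure):
        succeeded = success.get(builder, 0)
        failed = failure.get(builder, 0)
        if succeeded + failed:
            bounds[builder] = succeeded / (succeeded + failed)
    return bounds


success = {"android-kitkat-arm-rel": 144, "win7_chromium_rel_ng": 88}
failure = {"android-kitkat-arm-rel": 39, "win7_chromium_rel_ng": 25, "chromium_presubmit": 119}
bounds = flake_lower_bounds(success, failure)
for builder, bound in sorted(bounds.items(), key=lambda kv: -kv[1]):
    print(f"{builder}: {math.floor(bound * 100)}%+")
# android-kitkat-arm-rel: 78%+
# win7_chromium_rel_ng: 77%+
# chromium_presubmit: 0%+
```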
,
Sep 12
+maruel, tandrii, martiniss, who will likely be interested in these results. See c#8 for some interesting stats on full retries.
,
Sep 12
The findings here match the data in go/top-cq-flakes.
1. win7_chromium_rel_ng, android-kitkat-arm-rel and android-marshmallow-arm64-rel are the top flaky builders due to flaky tests.
2. android-kitkat-arm-rel and android-marshmallow-arm64-rel are the top flaky builders due to invalid test results (expired/timed-out Swarming tasks, no Android devices, etc.). See bug 881991.
You can switch to a different "Build Failure Type" to dig further.
,
Sep 13
Thanks a lot Erik for extracting the data! + Ben and John for android-kitkat-arm-rel
,
Sep 14
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build/+/150f1b24841ed7d1cbcb38e4fb2b484ff0726bfc
commit 150f1b24841ed7d1cbcb38e4fb2b484ff0726bfc
Author: erikchen <erikchen@chromium.org>
Date: Fri Sep 14 21:10:40 2018
Add 'retry with patch' step to win7_chromium_rel_ng trybots.
Prior research at https://bugs.chromium.org/p/chromium/issues/detail?id=882969#c8 suggests that at least 77% of retries in win7_chromium_rel_ng are flakes. This CL turns on the 'retry with patch' step which retries only the failing tests in the chromium trybot recipe. This should greatly reduce cycle time on win7_chromium_rel_ng by [frequently] skipping full retries caused by flakiness.
This change has been tested via led at https://chromium-swarm.appspot.com/task?id=3ff2793164cdda10&refresh=10.
Bug: 882969 , 883321
Change-Id: Ibb85e3d13cb464aa3aef0db30ba339368917c82d
Reviewed-on: https://chromium-review.googlesource.com/1226313
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
[modify] https://crrev.com/150f1b24841ed7d1cbcb38e4fb2b484ff0726bfc/scripts/slave/recipe_modules/chromium_tests/trybots.py
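The reason 'retry with patch' is cheaper than a full CQ retry is that it reruns only the tests that failed, rather than the whole build. As a rough mental model only (this is not the chromium_tests recipe code, which involves more steps), the idea can be sketched with a hypothetical `run_tests` callable that returns the set of failing test names:

```python
def run_with_retry_of_failures(tests, run_tests):
    """Simplified model of retrying only the failing tests.

    `run_tests` is a hypothetical callable that runs the given test names and
    returns the set of names that failed. Tests that fail initially are rerun
    once; only tests that fail both times are reported as failures.
    """
    first_failures = run_tests(tests)
    if not first_failures:
        return set()
    retry_failures = run_tests(sorted(first_failures))
    return first_failures & retry_failures


def make_fake_runner(always_failing, flaky_once):
    """Hypothetical runner: `flaky_once` tests fail only on their first run."""
    runs = {}

    def run_tests(tests):
        failed = set()
        for test in tests:
            runs[test] = runs.get(test, 0) + 1
            if test in always_failing or (test in flaky_once and runs[test] == 1):
                failed.add(test)
        return failed

    return run_tests


runner = make_fake_runner(always_failing={"broken_test"}, flaky_once={"flaky_test"})
print(run_with_retry_of_failures(["ok_test", "flaky_test", "broken_test"], runner))
# {'broken_test'}
```

Here the flaky test is absorbed by the cheap rerun, while the consistently failing test still fails the build, which is the case a full CQ retry would otherwise have to handle.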
,
Sep 17
To get an idea of the impact of the change in c#12, I ran a query for the number of retries/flakes in win7_chromium_rel_ng in the period from 9/07 - 9/14.
total_cq_runs: 5745
total_cq_runs_at_least_one_failure_retry: 1901
# of retries in win7_chromium_rel_ng that went from failure->success [flaky]: 125
# of retries in win7_chromium_rel_ng that went from failure->failure: 35
flaky percentage: 78% [minimum]
,
Sep 17
The period was 9/07 [inclusive] to 9/14 [exclusive]. So it does not include any changes from 9/14, which is when the change landed.
,
Sep 17
,
Sep 17
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build/+/e10399ad98774a1fd5fcf127f30110941ea79a10
commit e10399ad98774a1fd5fcf127f30110941ea79a10
Author: erikchen <erikchen@chromium.org>
Date: Mon Sep 17 18:27:35 2018
Add 'retry with patch' step to android-marshmallow-arm64-rel trybots.
This change has been tested via led at https://chromium-swarm.appspot.com/task?id=3ff2c4bea8af1d10.
Bug: 882969 , 883321
Change-Id: I91a27452091fc7d6457993b96de1d089fb309170
Reviewed-on: https://chromium-review.googlesource.com/1226314
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
[modify] https://crrev.com/e10399ad98774a1fd5fcf127f30110941ea79a10/scripts/slave/recipe_modules/chromium_tests/trybots.py
,
Sep 17
There were two instances in the last day where a CQ-layer retry for win7_chromium_rel_ng turned the result from FAILURE->SUCCESS:
1) https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/86385
2) https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/86368
In both cases, a flaky crash in the system dependency checker for webkit_layout_tests caused the test runner to return INVALID_TEST_RESULTS, which in turn prevented the 'retry with patch' logic from applying. I'm working on fixing this in Issue 884776.
Separately, INVALID_TEST_RESULTS should not cause the CQ run to abort. There are currently several different causes of this error code: https://bugs.chromium.org/p/chromium/issues/detail?id=881991
In each case, the trybot recipe should retry the failing test suite. This will make the trybot recipe more robust to test runner/infra outages.
,
Sep 18
I went through every example in the last 48 hours [there were 5] where a retry of a win7_chromium_rel_ng build changed the result from FAIL->SUCCESS. In every case, the initial failure was due to an irrecoverable "TEST RESULTS WERE INVALID".
In one case, this was due to an exception thrown in test_installer:
https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8935114539403456640/+/steps/test_installer__with_patch_/0/stdout
This appears to be a bug in test_installer: something that should have been emitted as a normal test error was instead emitted as "TEST RESULTS WERE INVALID".
In every other case, webkit_layout_tests experienced "TEST RESULTS WERE INVALID".
,
Sep 19
I have found one example where 'retry with patch' failed to catch a flake, but a full CQ retry caught it:
https://chromium-cq-status.appspot.com/v2/patch-status/chromium-review.googlesource.com/1229737/5
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/88226
It looks like the test "virtual/threaded/fast/events/touch/gesture/touch-gesture-scroll-page-zoomed.html" flaked in both the original 'with patch' run and the 'retry with patch' run. Investigating.
,
Sep 19
Here's what happens:
1) The test "virtual/threaded/fast/events/touch/gesture/touch-gesture-scroll-page-zoomed.html" flakes.
2) It's retried three times at the end of the test suite. It fails all three times.
3) We roll the checkout, rebuild, and then test again. It fails once [no retries].
So I tried running it locally and got exactly the same failure:
"""
C:\src\chromium\src>.\out\gn\content_shell.exe --run-web-tests --no-sandbox --enable-threaded-compositing C:\src\chromium\src\third_party\WebKit\LayoutTests\fast\events\touch\gesture\touch-gesture-scroll-page-zoomed.html
C:\src\chromium\src>#READY
DevTools listening on ws://127.0.0.1:56880/devtools/browser/18e55026-38fb-461c-aa45-2028861f006d
Content-Type: text/plain
This is a testharness.js-based test.
FAIL This tests scroll gesture event scrolling on a whole page with browserzoom. promise_test: Unhandled rejection with value: "Reaches the maximum frames."
Harness: the test ran to completion.
#EOF
#EOF
#EOF
[10824:21340:0919/094215.367:WARNING:discardable_shared_memory_manager.cc(438)] Some MojoDiscardableSharedMemoryManagerImpls are still alive. They will be leaked.
"""
,
Sep 19
,
Sep 24
To get a more representative sample of the CQ, I ran the script over the weekend on a month's worth of CQ runs [8/10-9/10 -- prior to any of my changes]. The script encountered an appengine error near the end and errored out, but it has most of the data [19800 CQ runs].
total_cq_runs: 19800
total_cq_runs_at_least_one_failure_retry: 6406
total_try_jobs: 540322
retry_succeed: 3285
retry_fail: 3104
success dictionary {u'ios-simulator': 85, u'android_arm64_dbg_recipe': 56, u'mac_optional_gpu_tests_rel': 5, u'win7_chromium_rel_ng': 404, u'mac_chromium_rel_ng': 366, u'win10_chromium_x64_rel_ng': 279, u'linux_trusty_blink_rel': 11, u'linux_mojo': 2, u'android-kitkat-arm-rel': 368, u'android_compile_dbg': 7, u'cast_shell_linux': 135, u'linux_chromium_compile_dbg_ng': 9, u'android-marshmallow-arm64-rel': 633, u'linux_chromium_rel_ng': 111, u'chromeos-daisy-rel': 10, u'chromeos-amd64-generic-rel': 102, u'chromium_presubmit': 3, u'linux_chromium_asan_rel_ng': 46, u'cast_shell_android': 2, u'win_optional_gpu_tests_rel': 7, u'fuchsia_arm64': 2, u'fuchsia_x64': 168, u'linux_chromium_tsan_rel_ng': 81, u'linux-jumbo-rel': 10, u'mac_chromium_compile_dbg_ng': 6, u'android_cronet_tester': 1, u'linux_layout_tests_slimming_paint_v2': 2, u'linux_optional_gpu_tests_rel': 14, u'android_clang_dbg_recipe': 21, u'win_angle_rel_ng': 1, u'linux-blink-gen-property-trees': 2, u'ios-simulator-full-configs': 18, u'linux-ozone-rel': 40, u'linux-chromeos-rel': 250, u'win_chromium_compile_dbg_ng': 20, u'linux_chromium_headless_rel': 8}
failure dictionary {u'ios-simulator': 106, u'android_cronet': 1, u'mac_optional_gpu_tests_rel': 2, u'win7_chromium_rel_ng': 202, u'mac_chromium_rel_ng': 137, u'ios-simulator-cronet': 10, u'win10_chromium_x64_rel_ng': 103, u'linux_vr': 1, u'linux_trusty_blink_rel': 10, u'linux_mojo': 5, u'android-kitkat-arm-rel': 269, u'android_compile_dbg': 20, u'cast_shell_linux': 127, u'linux_chromium_compile_dbg_ng': 20, u'android-marshmallow-arm64-rel': 297, u'linux_chromium_rel_ng': 183, u'chromeos-daisy-rel': 52, u'chromeos-amd64-generic-rel': 58, u'chromium_presubmit': 468, u'linux_chromium_asan_rel_ng': 31, u'cast_shell_android': 14, u'win_optional_gpu_tests_rel': 3, u'obbs_fyi': 1, u'android_arm64_dbg_recipe': 40, u'fuchsia_arm64': 23, u'fuchsia_x64': 120, u'linux_layout_tests_layout_ng': 14, u'linux_chromium_tsan_rel_ng': 20, u'linux-jumbo-rel': 34, u'mac_chromium_compile_dbg_ng': 49, u'android_cronet_tester': 5, u'linux_layout_tests_slimming_paint_v2': 7, u'linux_optional_gpu_tests_rel': 15, u'android_clang_dbg_recipe': 7, u'linux-blink-gen-property-trees': 6, u'ios-simulator-full-configs': 65, u'linux-ozone-rel': 37, u'linux-chromeos-rel': 420, u'win_chromium_compile_dbg_ng': 68, u'linux_chromium_headless_rel': 54}
We see that win7_chromium_rel_ng succeeded on retry 404 times and failed on retry 202 times. This means that at least 67% of all win7_chromium_rel_ng failures were due to flakes. This is slightly lower than the 77% we observed in c#8, but still in the same ballpark.
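This month-long run died on an appengine error near the end. One common way to survive transient server errors is to retry each request with exponential backoff. This is a minimal sketch, assuming the wrapped call (a hypothetical zero-argument `fn`) is safe to repeat; in practice `retriable` should be narrowed to the client's transient error types rather than catching every `Exception`.

```python
import random
import time


def call_with_retries(fn, attempts=5, base_delay=1.0, retriable=(Exception,)):
    """Call fn(), retrying with exponential backoff and jitter on failure.

    Waits roughly base_delay * 2**attempt seconds between tries (plus up to
    10% random jitter) and re-raises the last error once attempts run out.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay / 10))
```

For example, `call_with_retries(lambda: fetch_build(build_id))` would retry one buildbucket lookup, where `fetch_build` is again a placeholder for the real request.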
,
Sep 24
====================Raw Data===============================
From 8/10-9/10:
total_cq_runs: 19800 [a small amount of data was not processed due to an appengine error]
win7_chromium_rel_ng [retry success]: 404
win7_chromium_rel_ng [retry failure]: 202
win10_chromium_x64_rel_ng [retry success]: 279
win10_chromium_x64_rel_ng [retry failure]: 103
android-marshmallow-arm64-rel [retry success]: 633
android-marshmallow-arm64-rel [retry failure]: 297
android-kitkat-arm-rel [retry success]: 368
android-kitkat-arm-rel [retry failure]: 269
We see that 3.1% of all CQ runs failed in win7_chromium_rel_ng, and 2/3rds of those were flakes.
We see that 4.6% of all CQ runs failed in android-marshmallow-arm64-rel, and 2/3rds of those were flakes.
1.9% of all CQ runs failed in win10_chromium_x64_rel_ng.
3.2% of all CQ runs failed in android-kitkat-arm-rel.
From 9/20 - 9/25 ['retry with patch' enabled for win7 and android-marshmallow]:
total_cq_runs: 3090
win7_chromium_rel_ng [retry success]: 4
win7_chromium_rel_ng [retry failure]: 3
win10_chromium_x64_rel_ng [retry success]: 30
win10_chromium_x64_rel_ng [retry failure]: 30
android-marshmallow-arm64-rel [retry success]: 28
android-marshmallow-arm64-rel [retry failure]: 7
android-kitkat-arm-rel [retry success]: 82
android-kitkat-arm-rel [retry failure]: 29
0.2% of all CQ runs failed in win7_chromium_rel_ng, and 57% of those were flakes.
1.1% of all CQ runs failed in android-marshmallow-arm64-rel, and 80% of those were flakes.
1.9% of all CQ runs failed in win10_chromium_x64_rel_ng.
3.5% of all CQ runs failed in android-kitkat-arm-rel.
====================Analysis===============================
The introduction of 'retry with patch', together with fixes for several causes of flakiness, has reduced failures caused by win7_chromium_rel_ng by 15X [3.1% -> 0.2%]. No changes were made to win10_chromium_x64_rel_ng, and its failures have stayed constant at 1.9%.
The introduction of 'retry with patch' [no flakiness fixes] to android-marshmallow-arm64-rel reduced failures by 4X [4.6% -> 1.1%]. No changes were made to android-kitkat-arm-rel, and its failures stayed relatively constant [3.2% -> 3.5%].
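The percentages in this comparison come from two ratios per builder: retried CQ runs / total CQ runs, and retry successes / retried CQ runs. A small sketch of that arithmetic for win7_chromium_rel_ng, using the raw counts above; computing the reduction from the raw counts rather than the rounded percentages gives about 13.5X instead of 15X, which does not change the conclusion.

```python
def builder_failure_stats(retry_success, retry_failure, total_cq_runs):
    """Return (share of CQ runs where the builder was retried, flaky share).

    The flaky share is a lower bound: a retry that fails may itself be a flake.
    """
    retried = retry_success + retry_failure
    return retried / total_cq_runs, retry_success / retried


# win7_chromium_rel_ng, before (8/10-9/10) and after (9/20-9/25) the change.
before = builder_failure_stats(404, 202, 19800)
after = builder_failure_stats(4, 3, 3090)
print(f"retried: {before[0]:.1%} -> {after[0]:.1%}")  # retried: 3.1% -> 0.2%
print(f"reduction: {before[0] / after[0]:.1f}X")      # reduction: 13.5X
```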
,
Sep 24
Do these runs include compile failures? I don't remember offhand whether we will try to retry a compile failure. I imagine these numbers also don't include jobs where retries were disabled?
,
Sep 24
> Do these runs include compile failures?
Kind of. This script looks at all CQ runs where a build was retried. The CQ doesn't distinguish between failure reasons [e.g. compile vs. flaky tests vs. invalid test results] and will retry failing builds up to the retry limit. The numbers above for win7_chromium_rel_ng look at CQ runs where win7_chromium_rel_ng was retried [implying that the first build failed and that the retry limit had not yet been hit]. This means that a generic compile error will usually not be included, since the win7 builder is typically slower than its linux counterparts. However, a win-specific compile error will most likely be included, since the retry limit will not have been hit.
So the most accurate interpretation of the numbers above is: in the period 8/10-9/10, there were ~20k CQ runs. Of those, win7_chromium_rel_ng was retried 3.1% of the time. Failures that did not result in a retry [which can only happen when the CQ hits the retry limit] are not included. 9/20-9/25 had ~3K CQ runs. Of those, win7_chromium_rel_ng was retried 0.2% of the time.
> I imagine these numbers also don't include jobs where retries were disabled?
Are you referring to retries at the recipe layer? The CQ-layer retry does not know about recipe-layer retry-disabling.
,
Sep 24
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build/+/6f802d525fd8cc689be520f4340d3cd51af2f94d
commit 6f802d525fd8cc689be520f4340d3cd51af2f94d
Author: erikchen <erikchen@chromium.org>
Date: Mon Sep 24 22:24:00 2018
All chromium tests should use 'retry with patch' by default.
'retry with patch' is much cheaper than full CQ retries, since it only reruns failing tests. For the two builders that fail most frequently, 'retry with patch' reduces failures by 4-11X. Turning it on for all builders should both reduce fleet utilization, and reduce false rejects. For more details, see https://crbug.com/882969#c23 .
Bug: 882969 , 883321
Change-Id: I03bc3c2680e488891688587878e00a093286aefb
Reviewed-on: https://chromium-review.googlesource.com/1241655
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/invalid_results.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/swarming_test_failure.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/dynamic_swarmed_isolated_script_test_failure_no_result_json.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/swarming_trigger_failure.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipe_modules/chromium_tests/api.py
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/dynamic_isolated_script_test_on_trybot_failing.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/swarmed_webkit_tests_interrupted.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/swarmed_webkit_tests_unexpected_error.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/swarmed_layout_tests_too_many_failures_for_retcode.json
,
Sep 24
> Are you referring to retries at the recipe layer?
Yes, I was, both for this and for the compile failure question. You're right that if you're looking at CQ-level retries, you wouldn't even know about this. The point is that when the tree is legitimately broken, a "flaky" try job might not actually be a flake; it might be a real failure. For example, if the patch had recipe-level retries disabled and a test was failing at tip-of-tree (or the compile failed), the try job would also fail. Once the tree was fixed, the try job would then succeed. If I'm understanding your methodology correctly, you don't have a good way of detecting these situations, right?
,
Sep 25
> Yes, I was, both for this and for the compile failure question.
A compile failure in 'with patch' will immediately error out the recipe. I'm considering changing that. See: https://bugs.chromium.org/p/chromium/issues/detail?id=888734#c1
> If I'm understanding your methodology correctly, you don't have a good way of detecting these situations, right?
Correct.
,
Sep 25
Here are some snapshots from go/top-cq-flakes that also show a very large improvement on win7_chromium_rel_ng: Not sure why the INVALID TEST RESULTS graph stops at 9/21 -- maybe because there have been 0 occurrences since then.
,
Sep 25
* Not sure why the INVALID TEST RESULTS graph stops at 9/21. There have been 0 instances of INVALID TEST RESULTS since then that have caused flakes, so I suspect the graph is just failing to render these.
,
Sep 25
Improvements on android-marshmallow are even larger according to go/top-cq-flakes. In my earlier measurements I started my query from 9/20, but this CL landed on 9/20:
https://chromium-review.googlesource.com/c/chromium/tools/build/+/1234437
so I actually needed to start the query on 9/21.
,
Sep 26
False rejects have stayed under 5% for over a week. This has pretty much never happened before. :)
,
Sep 26
\o/ Thanks a lot for this.
,
Sep 26
Amazing work! Thank you for this!!!
,
Sep 26
Woo!
,
Sep 28
Update for c#23. More accurate stats for android-marshmallow-arm64-rel from 9/22-9/28 [since 9/20 was prior to the CL landing]:
Total CQ runs: 5389
retry succeeds: 7
retry fails: 17
The failure rate went from 4.6% -> 0.44%. Known flakes dropped from 66% to 29%.
,
Sep 28
Wow, awesome!
,
Sep 28
This is amazing work! Now I need to change my query to monitor flakes hidden by the introduction of 'retry with patch'!
,
Oct 30
I'm changing the title of this crbug to match its contents. 'retry with patch' has roughly reduced false rejects due to flakiness by 10X. See c#23 and c#37 for details. CQ full retries still provide different results ~25-50% of the time. As such, removing them would cause a significant degradation to "false rejects" and shouldn't be done yet.
Comment 1 by tandrii@chromium.org
, Sep 11