Issue 882969

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 30
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 884776
issue 884882
issue 886894




Compute effectiveness of 'retry with patch'

Project Member Reported by erikc...@chromium.org, Sep 11

Issue description

We want to compute the utility of CQ full retries [when a build fails, it is retried]. To do so, we measure how often the retry returns a failure and how often it returns a success.

We theorize that this is currently quite useful [a lot of failure -> success] based on an audit of slow CLs. We theorize that this may become less useful once we add recipe-based retries.

For more context, see design doc:
https://docs.google.com/document/d/1YF3SAd7f9ekiwV-6CxYsf7f0FthY6ClAw9znS7N8xU0/edit?ts=5b92d3c4#

Pseudocode:
"""
retry_fail = 0
retry_succeed = 0

for CQ_attempt in `chrome-infra-events.aggregated.cq_attempts` where cq_name = "chromium/chromium/src":
  # Build IDs are monotonically decreasing: newer builds have smaller IDs.
  # Sort in descending order so the oldest build comes first.
  build_ids = sorted(CQ_attempt.contributing_bbucket_ids, reverse=True)
  build_dict = {}
  for build in build_ids:
    stats = curl https://cr-buildbucket.appspot.com/_ah/api/buildbucket/v1/builds/<build>
    builder_name = stats["parameters_json"]["builder_name"]
    result = stats["result"]
    build_dict.setdefault(builder_name, []).append(result)

  for builder_name, results in build_dict.items():
    # We care about retries, which show up as lists with more than 1 entry.
    # We want to know how often a failed build is followed by a failed build,
    # and how often a failed build is followed by a successful build.
    # Successful builds typically don't get retried, so we ignore them.
    last_result = None
    for result in results:
      if last_result == 'FAILURE':
        if result == 'SUCCESS':
          retry_succeed += 1
        elif result == 'FAILURE':
          retry_fail += 1
      last_result = result
"""
 
Components: -Infra>Platform>CQ Infra>Client>Chrome
This is very specific to the chromium/src CQ, as opposed to, say, v8 and other projects, so let's keep this and similar issues in ICC.
That's not to say I'm against doing my bit on the CQ daemon side if necessary to support your work.
Sorry -- still learning the labels in Ops. 

My current plan is to write a 20-line python script that returns us the relevant data, and to save the script and results here. If we later decide that this is a stat we want to track/alert on, then we can make a metric and formalize it.
1) Install BQ python API.
pip install --upgrade google-cloud-bigquery

2) Make a service account. Download the private key. Give it access to the chrome-infra-events project.

https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python
export GOOGLE_APPLICATION_CREDENTIALS=/Users/erikchen/Desktop/chrome-infra-events-service-account.json

3) Run script.

This script uses the buildbucket API. It takes a while to run as it sequentially accesses each buildbucket result. It's likely faster to use the buildbucket BQ table, see 
https://groups.google.com/a/google.com/forum/#!topic/luci-announce/IVdte3E-FRw



I'm currently running the script locally. Will report back with results.
retry_utility.py (2.4 KB)
This script is similar to the previous one, but uses BQ for both aggregated.cq_attempts and buildbucket queries.

This requires creation of a service account [in this case erikchen-dev@chrome-cq-retry-calculator.iam.gserviceaccount.com] which has been given access to both tables.
retry_utility2.py (2.4 KB)
Unfortunately, both of my scripts ended up running into network issues, resulting in premature failures. That being said, the second script has intermediate stage logging, so we have:

total_try_jobs: 77372
retry_succeed: 527
retry_fail: 400

A CQ run normally has 27 try-jobs, although there may be more as some get retried on failure.

This suggests that over ~2865 CQ runs [77372 / 27], there were 927 failures followed by retries. That is about 32% [927 / 2865]. Of these retries, more than half [527] provide a different result: FAILURE -> SUCCESS. This suggests that CQ full retries currently provide quite a bit of utility.
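The estimate can be reproduced from the logged counters. This sketch only restates the arithmetic. It assumes exactly 27 try-jobs per CQ run, so the run count is a slight overestimate, since retried builds add try-jobs.

```python
# Counters logged by retry_utility2.py before the network failure.
total_try_jobs = 77372
retry_succeed = 527
retry_fail = 400

# Typical try-jobs per CQ run; retried builds make real runs slightly larger,
# so dividing by 27 slightly overestimates the number of runs.
TRY_JOBS_PER_RUN = 27

estimated_cq_runs = total_try_jobs // TRY_JOBS_PER_RUN
retries = retry_succeed + retry_fail

# Retries per run (a run with two retried builders counts twice).
retries_per_run = retries / estimated_cq_runs
# Fraction of retries that flipped FAILURE -> SUCCESS.
flipped_fraction = retry_succeed / retries

print(estimated_cq_runs, retries)   # 2865 927
print(f"{retries_per_run:.1%}")     # 32.4%
print(f"{flipped_fraction:.1%}")    # 56.9%
```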
Ah, the network issues are because the computer went to sleep. :(
"""
total_try_jobs: 141072
retry_succeed: 962
retry_fail: 745
"""

Full results. It took a few hours to get stats on 1 week of chromium CQ data. Will rerun with more detailed stats.
Cc: liaoyuke@chromium.org st...@chromium.org
New script, more detailed results.

Takeaways:
1/3 of all CQ attempts trigger full retries. Of these full retries, more than half subsequently succeed. This means that at least ~1/6 of all CQ attempts hit a failure caused by flakiness [or by infra issues that retriggering a full build fixes]. Keep in mind that a retry can also fail because of flakiness twice in a row, so these are lower bounds on flakiness.

Looking at the builds that have the most flakiness:

android-kitkat-arm-rel: 144 retries result in success. 39 result in failure. This means that a failure in "android-kitkat-arm-rel" is 78%+ likely to be a flake.
win7_chromium_rel_ng: 88 retries result in success. 25 in failure. A failure is 77%+ likely to be a flake.
chromium_presubmit: 0 retries result in success. 119 result in failure. A failure is 0%+ likely to be a flake.

Anecdotally, this matches my expectations. win7_chromium_rel_ng and android-kitkat-arm-rel flake a lot. Now we have a rough estimate of the magnitude. 


=====================================

total_cq_runs: 5139
total_cq_runs_at_least_one_failure_retry: 1719
total_try_jobs: 141072
retry_succeed: 962
retry_fail: 745
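The takeaways can be checked against these totals. The sketch below multiplies the two fractions to get the rough "1/6" figure. That step assumes the FAILURE -> SUCCESS rate applies evenly across retried runs, so it is only a rough estimate.

```python
total_cq_runs = 5139
runs_with_retry = 1719
retry_succeed = 962
retry_fail = 745

# Share of CQ runs with at least one CQ-level retry ("1/3").
retried_fraction = runs_with_retry / total_cq_runs
# Share of retries that flipped FAILURE -> SUCCESS ("more than half").
flip_fraction = retry_succeed / (retry_succeed + retry_fail)

# Rough estimate of runs hit by a flake; assumes the flip rate is uniform.
flaky_run_fraction = retried_fraction * flip_fraction

print(f"{retried_fraction:.3f} {flip_fraction:.3f} {flaky_run_fraction:.3f}")
```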

success dictionary {u'ios-simulator': 24, u'android_arm64_dbg_recipe': 1, u'win7_chromium_rel_ng': 88, u'mac_chromium_rel_ng': 65, u'win10_chromium_x64_rel_ng': 42, u'linux_trusty_blink_rel': 1, u'fuchsia_x64': 56, u'android-kitkat-arm-rel': 144, u'cast_shell_linux': 37, u'android-marshmallow-arm64-rel': 228, u'linux_chromium_rel_ng': 40, u'chromeos-daisy-rel': 1, u'chromeos-amd64-generic-rel': 84, u'linux_chromium_asan_rel_ng': 17, u'mac_optional_gpu_tests_rel': 5, u'linux_chromium_tsan_rel_ng': 15, u'mac_chromium_compile_dbg_ng': 1, u'linux_layout_tests_slimming_paint_v2': 1, u'linux_optional_gpu_tests_rel': 5, u'android_clang_dbg_recipe': 1, u'linux-blink-gen-property-trees': 1, u'ios-simulator-full-configs': 4, u'linux-ozone-rel': 37, u'linux-chromeos-rel': 56, u'win_chromium_compile_dbg_ng': 2, u'linux_chromium_headless_rel': 1}

failure dictionary {u'ios-simulator': 22, u'android_arm64_dbg_recipe': 4, u'win7_chromium_rel_ng': 25, u'mac_chromium_rel_ng': 28, u'win10_chromium_x64_rel_ng': 19, u'linux_trusty_blink_rel': 2, u'linux_mojo': 1, u'android-kitkat-arm-rel': 39, u'android_compile_dbg': 2, u'cast_shell_linux': 31, u'linux_chromium_compile_dbg_ng': 8, u'android-marshmallow-arm64-rel': 112, u'linux_chromium_rel_ng': 27, u'chromeos-daisy-rel': 8, u'fuchsia_arm64': 6, u'chromium_presubmit': 119, u'linux_chromium_asan_rel_ng': 9, u'cast_shell_android': 4, u'win_optional_gpu_tests_rel': 2, u'chromeos-amd64-generic-rel': 34, u'android_cronet': 1, u'fuchsia_x64': 39, u'linux_layout_tests_layout_ng': 1, u'linux_chromium_tsan_rel_ng': 4, u'linux-jumbo-rel': 19, u'mac_chromium_compile_dbg_ng': 9, u'linux_layout_tests_slimming_paint_v2': 1, u'linux_optional_gpu_tests_rel': 6, u'android_clang_dbg_recipe': 2, u'linux-blink-gen-property-trees': 2, u'ios-simulator-full-configs': 14, u'linux-ozone-rel': 28, u'linux-chromeos-rel': 77, u'win_chromium_compile_dbg_ng': 22, u'linux_chromium_headless_rel': 15}

success dictionary counts retries that succeeded. failure dictionary counts retries that failed.
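The per-builder percentages quoted earlier (78%+, 77%+, 0%+) can be derived from the two dictionaries. This is a sketch; `flake_lower_bounds` is a hypothetical helper, not part of retry_utility3.py. A builder missing from one dictionary (e.g. chromium_presubmit has no successes) simply had zero retries with that outcome. The percentage is rounded down because the ratio is itself a lower bound.

```python
import math


def flake_lower_bounds(success, failure):
    """Rank builders by the lower bound on P(a failure is a flake).

    Missing keys default to 0: the builder had no retries with that outcome.
    """
    bounds = {}
    for builder in set(success) | set(failure):
        s = success.get(builder, 0)
        f = failure.get(builder, 0)
        if s + f:
            bounds[builder] = s / (s + f)
    return sorted(bounds.items(), key=lambda item: item[1], reverse=True)


# Subset of the 1-week dictionaries above.
success = {"android-kitkat-arm-rel": 144, "win7_chromium_rel_ng": 88}
failure = {"android-kitkat-arm-rel": 39, "win7_chromium_rel_ng": 25,
           "chromium_presubmit": 119}

for builder, bound in flake_lower_bounds(success, failure):
    # Round down, since the ratio is already a lower bound on flakiness.
    print(f"{builder}: {math.floor(bound * 100)}%+")
# android-kitkat-arm-rel: 78%+
# win7_chromium_rel_ng: 77%+
# chromium_presubmit: 0%+
```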

retry_utility3.py (3.3 KB)
Cc: tandrii@chromium.org mar...@chromium.org martiniss@chromium.org
+maruel, tandrii, martiniss as people who will likely be interested in these results. See c#8 for some interesting stats on full retries.
The findings here match the data in go/top-cq-flakes.
1. win7_chromium_rel_ng, android-kitkat-arm-rel and android-marshmallow-arm64-rel are the top flaky builders due to flaky tests.
2. android-kitkat-arm-rel and android-marshmallow-arm64-rel are the top flaky builders due to invalid test results (expired/timed-out Swarming tasks, no Android devices, etc). See bug 881991.

You may switch to different "Build Failure Type" to dig further.
Cc: bpastene@chromium.org
Thanks a lot Erik for extracting the data!
+ Ben and John for android-kitkat-arm-rel
Project Member

Comment 12 by bugdroid1@chromium.org, Sep 14

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/150f1b24841ed7d1cbcb38e4fb2b484ff0726bfc

commit 150f1b24841ed7d1cbcb38e4fb2b484ff0726bfc
Author: erikchen <erikchen@chromium.org>
Date: Fri Sep 14 21:10:40 2018

Add 'retry with patch' step to win7_chromium_rel_ng trybots.

Prior research at
https://bugs.chromium.org/p/chromium/issues/detail?id=882969#c8 suggests that at
least 77% of retries in win7_chromium_rel_ng are flakes. This CL turns on the
'retry with patch' step which retries only the failing tests in the chromium
trybot recipe.

This should greatly reduce cycle time on win7_chromium_rel_ng by [frequently]
skipping full retries caused by flakiness.

This change has been tested via led at
https://chromium-swarm.appspot.com/task?id=3ff2793164cdda10&refresh=10.

Bug: 882969, 883321
Change-Id: Ibb85e3d13cb464aa3aef0db30ba339368917c82d
Reviewed-on: https://chromium-review.googlesource.com/1226313
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>

[modify] https://crrev.com/150f1b24841ed7d1cbcb38e4fb2b484ff0726bfc/scripts/slave/recipe_modules/chromium_tests/trybots.py

To get an idea for the impact of the change in c#12, I ran a query for # of retries/flakes in win7_chromium_rel_ng in the period from 9/07 - 9/14. 

total_cq_runs: 5745
total_cq_runs_at_least_one_failure_retry: 1901
# of retries in win7_chromium_rel_ng that went from failure->success [flaky]: 125
# of retries in win7_chromium_rel_ng that went from failure->failure:  35

flaky percentage: 78% [minimum]

The period was 9/07 [inclusive] to 9/14 [exclusive]. So it does not include any changes from 9/14, which is when the change landed.
Blockedon: 884776
Project Member

Comment 16 by bugdroid1@chromium.org, Sep 17

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/e10399ad98774a1fd5fcf127f30110941ea79a10

commit e10399ad98774a1fd5fcf127f30110941ea79a10
Author: erikchen <erikchen@chromium.org>
Date: Mon Sep 17 18:27:35 2018

Add 'retry with patch' step to android-marshmallow-arm64-rel trybots.

This change has been tested via led at
https://chromium-swarm.appspot.com/task?id=3ff2c4bea8af1d10.

Bug: 882969, 883321
Change-Id: I91a27452091fc7d6457993b96de1d089fb309170
Reviewed-on: https://chromium-review.googlesource.com/1226314
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>

[modify] https://crrev.com/e10399ad98774a1fd5fcf127f30110941ea79a10/scripts/slave/recipe_modules/chromium_tests/trybots.py

There were two instances in the last day where a CQ-layer retry for win7_chromium_rel_ng caused the result to turn from FAILURE->SUCCESS.

1) https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/86385
2) https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/86368

In both cases, a flaky crash in the system dependency checker for webkit_layout_tests caused the test runner to return INVALID_TEST_RESULTS, which in turn prevented the 'retry with patch' logic from applying. I'm working on fixing this in Issue 884776.

Separately -- INVALID_TEST_RESULTS should not cause the CQ run to abort. There are currently several different causes of this error code:
https://bugs.chromium.org/p/chromium/issues/detail?id=881991

In each case, the trybot recipe should retry the failing test suite. This will make the trybot recipe more robust to test runner/infra outages.
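The proposal above can be sketched as a small decision function. Everything here is hypothetical. `SuiteResult` and `plan_retry` are illustrative names, and only the status strings come from this thread. This is not the chromium_tests recipe's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class SuiteResult:
    """Hypothetical summary of one test suite's 'with patch' run."""
    name: str
    status: str  # "SUCCESS", "FAILURE", or "INVALID_TEST_RESULTS"
    failing_tests: list = field(default_factory=list)


def plan_retry(suite):
    """Return what a 'retry with patch' step would rerun for this suite.

    - SUCCESS: nothing to retry.
    - FAILURE: rerun only the failing tests (the cheap path).
    - INVALID_TEST_RESULTS: the failing tests are unknown, so rerun the
      whole suite instead of aborting the build.
    """
    if suite.status == "SUCCESS":
        return None
    if suite.status == "FAILURE":
        return ("tests", list(suite.failing_tests))
    if suite.status == "INVALID_TEST_RESULTS":
        return ("suite", suite.name)
    raise ValueError(f"unknown status: {suite.status}")


print(plan_retry(SuiteResult("webkit_layout_tests", "INVALID_TEST_RESULTS")))
# ('suite', 'webkit_layout_tests')
```

Rerunning the whole suite costs more than rerunning only the failing tests. It is still far cheaper than a full CQ retry, which rebuilds and reruns every suite.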
Blockedon: 884882
I went through every example in the last 48 hours [there were 5] where a retry of a win7_chromium_rel_ng build changed result from FAIL->SUCCESS.

In every case, the initial failure was due to irrecoverable "TEST RESULTS WERE INVALID". In one case, this was due to an exception thrown in test_installer. https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8935114539403456640/+/steps/test_installer__with_patch_/0/stdout

This appears to be a bug in test_installer: something that should have been emitted as a normal test error was instead emitted as "TEST RESULTS WERE INVALID".

In every other case, webkit_layout_tests experienced "TEST RESULTS WERE INVALID".
I have discovered one example where 'retry with patch' failed to catch a flake, but a full CQ retry caught it:

https://chromium-cq-status.appspot.com/v2/patch-status/chromium-review.googlesource.com/1229737/5

https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/88226

It looks like the test "virtual/threaded/fast/events/touch/gesture/touch-gesture-scroll-page-zoomed.html" flaked in both the original 'with patch' run and the 'retry with patch' run. Investigating.
Here's what happens. 

1) This test "virtual/threaded/fast/events/touch/gesture/touch-gesture-scroll-page-zoomed.html" flakes.
2) It's retried three times at the end of the test suite. It fails all three times.
3) We roll checkout, rebuild, and then test again. It fails once [no retries].

So I tried running it locally. Exact same results. Same failure.

"""
C:\src\chromium\src>.\out\gn\content_shell.exe --run-web-tests --no-sandbox --enable-threaded-compositing C:\src\chromium\src\third_party\WebKit\LayoutTests\fast\events\touch\gesture\touch-gesture-scroll-page-zoomed.html

C:\src\chromium\src>#READY

DevTools listening on ws://127.0.0.1:56880/devtools/browser/18e55026-38fb-461c-aa45-2028861f006d
Content-Type: text/plain
This is a testharness.js-based test.
FAIL This tests scroll gesture event scrolling on a whole page with browserzoom. promise_test: Unhandled rejection with value: "Reaches the maximum frames."
Harness: the test ran to completion.

#EOF
#EOF
#EOF
[10824:21340:0919/094215.367:WARNING:discardable_shared_memory_manager.cc(438)] Some MojoDiscardableSharedMemoryManagerImpls are still alive. They will be leaked.
"""
Blockedon: 886894
Going to split this investigation into a separate bug. See Issue 886894.
To get a more representative sample of the CQ, I ran the script over the weekend on a month's worth of CQ runs [8/10-9/10 -- prior to any of my changes]. The script encountered an appengine error near the end and errored out, but it has most of the data [19800 CQ runs]. 

total_cq_runs: 19800
total_cq_runs_at_least_one_failure_retry: 6406
total_try_jobs: 540322
retry_succeed: 3285
retry_fail: 3104
success dictionary {u'ios-simulator': 85, u'android_arm64_dbg_recipe': 56, u'mac_optional_gpu_tests_rel': 5, u'win7_chromium_rel_ng': 404, u'mac_chromium_rel_ng': 366, u'win10_chromium_x64_rel_ng': 279, u'linux_trusty_blink_rel': 11, u'linux_mojo': 2, u'android-kitkat-arm-rel': 368, u'android_compile_dbg': 7, u'cast_shell_linux': 135, u'linux_chromium_compile_dbg_ng': 9, u'android-marshmallow-arm64-rel': 633, u'linux_chromium_rel_ng': 111, u'chromeos-daisy-rel': 10, u'chromeos-amd64-generic-rel': 102, u'chromium_presubmit': 3, u'linux_chromium_asan_rel_ng': 46, u'cast_shell_android': 2, u'win_optional_gpu_tests_rel': 7, u'fuchsia_arm64': 2, u'fuchsia_x64': 168, u'linux_chromium_tsan_rel_ng': 81, u'linux-jumbo-rel': 10, u'mac_chromium_compile_dbg_ng': 6, u'android_cronet_tester': 1, u'linux_layout_tests_slimming_paint_v2': 2, u'linux_optional_gpu_tests_rel': 14, u'android_clang_dbg_recipe': 21, u'win_angle_rel_ng': 1, u'linux-blink-gen-property-trees': 2, u'ios-simulator-full-configs': 18, u'linux-ozone-rel': 40, u'linux-chromeos-rel': 250, u'win_chromium_compile_dbg_ng': 20, u'linux_chromium_headless_rel': 8}

failure dictionary {u'ios-simulator': 106, u'android_cronet': 1, u'mac_optional_gpu_tests_rel': 2, u'win7_chromium_rel_ng': 202, u'mac_chromium_rel_ng': 137, u'ios-simulator-cronet': 10, u'win10_chromium_x64_rel_ng': 103, u'linux_vr': 1, u'linux_trusty_blink_rel': 10, u'linux_mojo': 5, u'android-kitkat-arm-rel': 269, u'android_compile_dbg': 20, u'cast_shell_linux': 127, u'linux_chromium_compile_dbg_ng': 20, u'android-marshmallow-arm64-rel': 297, u'linux_chromium_rel_ng': 183, u'chromeos-daisy-rel': 52, u'chromeos-amd64-generic-rel': 58, u'chromium_presubmit': 468, u'linux_chromium_asan_rel_ng': 31, u'cast_shell_android': 14, u'win_optional_gpu_tests_rel': 3, u'obbs_fyi': 1, u'android_arm64_dbg_recipe': 40, u'fuchsia_arm64': 23, u'fuchsia_x64': 120, u'linux_layout_tests_layout_ng': 14, u'linux_chromium_tsan_rel_ng': 20, u'linux-jumbo-rel': 34, u'mac_chromium_compile_dbg_ng': 49, u'android_cronet_tester': 5, u'linux_layout_tests_slimming_paint_v2': 7, u'linux_optional_gpu_tests_rel': 15, u'android_clang_dbg_recipe': 7, u'linux-blink-gen-property-trees': 6, u'ios-simulator-full-configs': 65, u'linux-ozone-rel': 37, u'linux-chromeos-rel': 420, u'win_chromium_compile_dbg_ng': 68, u'linux_chromium_headless_rel': 54}

We see that win7_chromium_rel_ng succeeded on retry 404 times, and failed on retry 202 times. This means that at least 67% of all win7_chromium_rel_ng failures were due to flakes. This is slightly lower than the 77% we observed in c#8, but still in the same ballpark.
====================Raw Data===============================
From 8/10-9/10 
total_cq_runs: 19800  [a small amount of data was not processed due to appengine error]
win7_chromium_rel_ng [retry success]: 404
win7_chromium_rel_ng [retry failure]: 202
win10_chromium_x64_rel_ng [retry success]: 279
win10_chromium_x64_rel_ng [retry failure]: 103

android-marshmallow-arm64-rel [retry success]: 633
android-marshmallow-arm64-rel [retry failure]: 297
android-kitkat-arm-rel [retry success]: 368
android-kitkat-arm-rel [retry failure]: 269

We see that 3.1% of all CQ runs failed in win7_chromium_rel_ng, and 2/3rds of those were flakes.
We see that 4.6% of all CQ runs failed in android-marshmallow-arm64-rel, and 2/3rds of those were flakes.

1.9% of all CQ runs failed in win10_chromium_x64_rel_ng.
3.2% of all CQ runs failed in android-kitkat-arm-rel.

From 9/20 - 9/25. 

[retry with patch enabled for win7 and android-marshmallow].

total_cq_runs: 3090
win7_chromium_rel_ng [retry success]: 4
win7_chromium_rel_ng [retry failure]: 3
win10_chromium_x64_rel_ng [retry success]: 30
win10_chromium_x64_rel_ng [retry failure]: 30

android-marshmallow-arm64-rel [retry success]: 28
android-marshmallow-arm64-rel [retry failure]: 7
android-kitkat-arm-rel [retry success]: 82
android-kitkat-arm-rel [retry failure]: 29

0.2% of all CQ runs failed in win7_chromium_rel_ng, and 58% of those were flakes.
1.1% of all CQ runs failed in android-marshmallow-arm64-rel, and 74% of those were flakes.

1.9% of all CQ runs failed in win10_chromium_x64_rel_ng.
3.5% of all CQ runs failed in android-kitkat-arm-rel.

====================Analysis===============================

The introduction of 'retry with patch', together with fixes for several causes of flakiness, has reduced failures caused by win7_chromium_rel_ng by 15X [3.1% -> 0.2%]. No changes were made to win10_chromium_x64_rel_ng, and its failures have stayed constant at 1.9%.

The introduction of 'retry with patch' [no flakiness fixes] to android-marshmallow-arm64-rel reduced failures by 4X [4.6% -> 1.1%]. No changes were made to android-kitkat-arm-rel, and its failures stayed relatively constant [3.2% -> 3.5%].
Status: Started (was: Assigned)
Do these runs include compile failures? I don't remember offhand whether we will try to retry a compile failure.

I imagine these numbers also don't include jobs where retries were disabled?
> Do these runs include compile failures? 
Kind of.

This script looks at all CQ runs where a build was retried. The CQ doesn't distinguish between failure reasons [e.g. compile vs flaky tests vs invalid test results] and will retry failing builds up to the retry limit. The numbers above for win7_chromium_rel_ng are looking at CQ runs where win7_chromium_rel_ng was retried [implying that the first build was a failure and that the retry limit has not yet been hit]. 

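This means that a generic compile error will usually not be included. The win7 builder is usually slower than its linux counterparts, so the faster builders fail and get retried first, and by the time win7_chromium_rel_ng fails, the retry limit has usually been hit. However, a win-specific compile error will most likely be included, since the retry limit will not have been hit.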
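A toy model of this reasoning is below. It assumes a single hypothetical per-run retry budget (`retry_budget`). The CQ's real retry-limit rules are not spelled out in this thread, so treat the budget value as illustrative only. Failures are processed in the order the builds finish, each retry consumes one unit of budget, and the first failure that finds no budget left rejects the run.

```python
def retried_builders(failures_in_finish_order, retry_budget):
    """Return the builders whose failures get a CQ-level retry.

    `failures_in_finish_order` lists builder names in the order their
    failing builds complete. Each retry consumes one unit of the
    (hypothetical) budget; the first failure that finds no budget left
    rejects the CQ run, so no later failure is retried.
    """
    retried = []
    for builder in failures_in_finish_order:
        if retry_budget == 0:
            break  # run rejected; remaining failures are never retried
        retry_budget -= 1
        retried.append(builder)
    return retried


# Generic compile error: every builder fails, fast Linux builders first.
generic = ["linux_chromium_rel_ng", "linux-chromeos-rel", "linux-ozone-rel",
           "win7_chromium_rel_ng"]
print(retried_builders(generic, retry_budget=2))
# ['linux_chromium_rel_ng', 'linux-chromeos-rel']

# Windows-specific compile error: only the Windows builder fails.
print(retried_builders(["win7_chromium_rel_ng"], retry_budget=2))
# ['win7_chromium_rel_ng']
```

In the generic case win7_chromium_rel_ng never appears in the retried list, so it never shows up in the retry statistics. In the Windows-specific case it does.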

So the most accurate interpretation of the #s above is:

From the period 8/10-9/10, there were ~20k CQ runs. Of those, win7_chromium_rel_ng was retried 3.1% of the time. Failures that did not result in retry [which can only happen due to CQ hitting the retry limit] are not included.

9/20-9/25 had ~3K CQ runs. Of those, win7_chromium_rel_ng was retried 0.2% of the time.

> I imagine these numbers also don't include jobs where retries were disabled?

Are you referring to retries at the recipe layer? The CQ-layer retry does not know about recipe-layer retry-disabling.

Project Member

Comment 26 by bugdroid1@chromium.org, Sep 24

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/6f802d525fd8cc689be520f4340d3cd51af2f94d

commit 6f802d525fd8cc689be520f4340d3cd51af2f94d
Author: erikchen <erikchen@chromium.org>
Date: Mon Sep 24 22:24:00 2018

All chromium tests should use 'retry with patch' by default.

'retry with patch' is much cheaper than full CQ retries, since it only
reruns failing tests. For the two builders that fail most frequently, 'retry
with patch' reduces failures by 4-11X. Turning it on for all builders should
both reduce fleet utilization, and reduce false rejects.

For more details, see https://crbug.com/882969#c23.

Bug: 882969, 883321
Change-Id: I03bc3c2680e488891688587878e00a093286aefb
Reviewed-on: https://chromium-review.googlesource.com/1241655
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>

[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/invalid_results.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/swarming_test_failure.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/dynamic_swarmed_isolated_script_test_failure_no_result_json.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/swarming_trigger_failure.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipe_modules/chromium_tests/api.py
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/dynamic_isolated_script_test_on_trybot_failing.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/swarmed_webkit_tests_interrupted.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/swarmed_webkit_tests_unexpected_error.json
[modify] https://crrev.com/6f802d525fd8cc689be520f4340d3cd51af2f94d/scripts/slave/recipes/chromium_trybot.expected/swarmed_layout_tests_too_many_failures_for_retcode.json

> Are you referring to retries at the recipe layer?

Yes, I was, both for this and for the compile failure question. You're right, that if you're looking at CQ-level retries you wouldn't even know about this.

The point is that for cases where the tree is legitimately broken a "flaky" try job might not actually be a flake, it might be a real failure. For example, if the patch had recipe-level retries disabled, and a test was failing at tip-of-tree (or if the compile failed), the try job would also fail. Once the tree was fixed, the try job would then succeed. 

If I'm understanding your methodology correctly, you don't have a good way of detecting these situations, right?
> Yes, I was, both for this and for the compile failure question.

A compile failure in 'with patch' will immediately error out the recipe. I'm considering changing that. See:
https://bugs.chromium.org/p/chromium/issues/detail?id=888734#c1

> If I'm understanding your methodology correctly, you don't have a good way of detecting these situations, right?

Correct

Comment 29 Deleted

Here are some snapshots from go/top-cq-flakes that also show a very large improvement on win7_chromium_rel_ng:

Not sure why the INVALID TEST RESULTS graph stops at 9/21 -- maybe because there have been 0 occurrences since then.

Screen Shot 2018-09-25 at 2.16.43 PM.png (42.1 KB)
Screen Shot 2018-09-25 at 2.16.46 PM.png (41.8 KB)
* Not sure why the INVALID TEST RESULTS graph stops at 9/21 -- there have been 0 instances of INVALID TEST RESULTS since then that have caused flakes -- I suspect the graph is just failing to render these.
Improvements on android-marshmallow are even larger according to go/top-cq-flakes. In my earlier measurements I was starting my query from 9/20, but this CL landed on 9/20: https://chromium-review.googlesource.com/c/chromium/tools/build/+/1234437 so I actually needed to start the query on 9/21.


Screen Shot 2018-09-25 at 2.31.48 PM.png (53.9 KB)
Screen Shot 2018-09-25 at 2.32.57 PM.png (40.4 KB)
False rejects have stayed under 5% for over a week. This has pretty much never happened before. :)
Screen Shot 2018-09-26 at 1.28.48 PM.png (61.4 KB)
\o/ Thanks a lot for this.
Amazing work! Thank you for this!!!
Woo!
Update for c#23. More accurate stats for android-marshmallow-arm64-rel from 9/22-9/28 [since 9/20 was prior to CL landing].

Total CQ runs: 5389
retry succeeds: 7
retry fails: 17

Failure rate went from 4.6% -> 0.44%. 
Known flakes dropped to 29% from 66%.
Screen Shot 2018-09-28 at 3.42.08 PM.png (64.7 KB)
Wow, awesome!
This is an amazing work!

Now I need to change my query to monitor flakes hidden by the introduction of 'retry with patch'!
Status: Fixed (was: Started)
Summary: Compute effectiveness of 'retry with patch' (was: Compute utility of CQ full retries.)
I'm changing the title of this crbug to match its contents.

'retry with patch' has reduced false rejects due to flakiness by roughly 10X. See c#23 and c#37 for details.

CQ full retries still provide different results ~25-50% of the time. As such, removing them would cause a significant degradation to "false rejects" and shouldn't be done yet.
