
Issue 906166

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Dec 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 883321

Blocking:
issue 902904
issue 905519
issue 916762




"retry with patch" causing WebGL conformance failures to slip through the CQ

Project Member Reported by kbr@chromium.org, Nov 16

Issue description

See this tryjob:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel/11820

From this CL:
https://chromium-review.googlesource.com/c/chromium/src/+/1338159/4

Several tests failed in the dry run, and passed without the patch, but the overall run was incorrectly marked as succeeding.

Most likely this is related to the work on reapplying the patch in Issue 883321.

This could be allowing major regressions to slip into the product, and could be the reason why the CL causing Issue 905519 made it through the CQ. Marking as a P0 regression.

 
Cc: rjkroege@chromium.org piman@chromium.org
More generally, please see this trybot:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel?limit=200

There are a large number of runs where webgl2_conformance_tests failed but the run isn't red. I only triaged one which definitely should not be green. It's possible that the others failed the same tests during the initial run and the retry.

In this particular case, the specific tests which failed seem to be passing in the "retry with patch".

This is concerning because it seems there's a clear bug in https://chromium-review.googlesource.com/c/chromium/src/+/1338159/4, but the associated tests which it broke are passing when run individually.

Blocking: 902904
Owner: erikc...@chromium.org
Status: Assigned (was: Untriaged)
That's concerning. I believe we had always assumed that this wouldn't happen. Our retry logic isn't very good at dealing with false positives like this.

Erik, can you take a look?
Owner: ----
Status: Available (was: Assigned)
Erik is OOO today :(
Cc: dpranke@chromium.org
Hoping someone else can help us investigate this in the meantime.

I think that in our group's test harness, which uses the entire browser and navigates from page to page, tests aren't isolated hermetically enough to catch the same failures in the "retry with patch" step.

Is it possible to turn this behavior off for individual test suites, relying instead on the results from the "retry without patch" step? I realize this will amplify flakiness, but we're allowing breaking changes through the CQ right now.

We don't have the ability to turn this off very granularly right now. I can make a recipe change to do that, although it'll be somewhat complicated.
To be clear: I think that the "retry with patch" functionality is probably working correctly. This one particular test suite doesn't seem to work well with that functionality though (the failures go away when the individual tests are run) so I think we need to turn it off.

Labels: -Pri-0 Pri-1
I don't think this is a P0, though I do think it merits revisiting. This logic is working as intended but doesn't handle this particular class of failure well. In theory, though, the retry mechanism inside the task would get this result regardless (if retries were sufficiently independent).


@kbr - am I understanding you correctly that you're saying that if a test failed in the initial run, passed without the patch, and then passed when retried with the patch (which is what appeared to have happened in, say, https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel/11831 ), you'd want that to fail the build?

From my point of view, that should *not* have failed the build, because it was a flaky failure.
Patches that introduce flakiness should not land?
@piman - yes, that is a fair objection, and perhaps that's what kbr is asking for. 

However, currently, that's not the way the system is set up. We don't really support turning off retries at any layer, and as a result we very clearly bias towards false positives (letting patches through that shouldn't have landed) over false negatives (rejecting patches that should've landed).

We do want to fix that, to be clear, to add more knobs. This is something we've been talking about a lot with erikchen@ and stgao@ in particular.

Also, it's generally hard to tell when something is introducing flakiness vs. the flakiness already existing, unless you adopt a zero-tolerance policy for flakes. You need access to history that the bots currently don't have.
dpranke@: I want to be able to turn off the "retry with patch" functionality on a per-step basis. It should be possible to add this as a field to the testing/buildbot JSON files, and for the recipe to honor it when determining whether to attempt retry with patch for a particular failing step.
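Roughly, I'm imagining something like this on the recipe side (a sketch only; the field name and config shape here are hypothetical, not an existing API):

  # Hypothetical recipe-side check, consulted for each failing step.
  def should_run_retry_with_patch(step_config):
      # Default to today's behavior; suites opt out explicitly.
      return step_config.get('retry_with_patch', True)

  # Usage sketch: only schedule the 'retry with patch' step if allowed.
  # for step in failing_steps:
  #     if should_run_retry_with_patch(step.config):
  #         schedule_retry_with_patch(step)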

Basically: our group's tests act as integration tests and not just unit tests, and many of the flaky failures and crashes they catch in the browser unfortunately don't reproduce when the tests are rerun individually.

Because we have evidence of the "retry with patch" retries allowing at least one major bug through the CQ (Issue 905519), and another which would have made it through if we hadn't caught it manually (https://chromium-review.googlesource.com/c/chromium/src/+/1338159/4), I feel it's both warranted and urgent to provide this knob.

> It should be possible to add this as a field to the testing/buildbot JSON files

Sure, I agreed that it should be possible. I disagree with the urgency; put differently, I'm fine with this being a P1, but not a P0. 

It's also not clear to me that being able to turn off retries is the most urgent problem in this area. We're not going to turn off retries on the layout tests or browser tests any time soon, so any tooling that might help catch new flakes getting through on *both* those suites and the gpu tests might be more urgent, for example (though probably not, because it's probably more work to do well).
Labels: -Restrict-View-Google -Type-Bug-Regression Type-Bug
-Restrict-View-Google; as far as I know, there's no need for that label.

P1 is fine. At the time of filing I thought there might be a bug in the recipe.

Being able to disable the "retry with patch" mechanism is urgent in order to maintain stability of the GPU tests. It's allowing bugs to slip past the CQ which are causing reliable failures on the waterfall bots. Please see this screenshot of https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28NVIDIA%29?limit=200 . This bot used to be reliably green.

[Attachment: Screenshot from 2018-11-16 17-00-55.png, 516 KB]
I see now (I think) that your tests are *also* not retrying at the individual step level. Am I correct that that's intentional (not simply missing functionality in the script), and that that's why you're calling this a regression?
Also, having a test that reliably fails when run as part of a series but doesn't reliably fail when run in isolation seems like a clear problem with the test harness, not an issue of something being a flake or an integration test. 

Are you sure there's nothing you can do here?
The GPU test harness deliberately does not retry all tests, only ones explicitly marked flaky. Our group takes a no-tolerance policy on flaky failures, and by doing so has caught dozens, if not hundreds, of bugs in areas throughout the browser. We will not change this strategy.

Unfortunately, I also don't know of any way to catch the failures that would have been accidentally introduced in https://chromium-review.googlesource.com/c/chromium/src/+/1338159/4 . They might be happening upon page navigation or something similar, which is an operation outside the purview of any single test in our harness, but part of the overall integration test.

Is there some state that is being preserved from test to test?
The GPU process stays alive between the tests. In the browser, it stays alive between page navigations, too. So if we're seeing GPU process crashes while running these tests, that means that end users will see them, too, when navigating around the web.

Well, end users will see them too assuming your test is faithfully reproducing what an end user does (of course; I assume that's what you're saying, just being clear).

At any rate, I don't want to debate the merits of relying on bugs discovered this way too much more; we're wandering too far off-topic. Perhaps we should discuss this more elsewhere, though, because I often hear you say that catching this sort of an issue this way is a good thing, and from my point of view, the downsides (the failure not being reproducible, the failure not having anything to do with the test in question) outweigh the upsides.
> the failure not having anything to do with the test in question) outweigh the
> upsides.

Not for our team. I agree that it's suboptimal that these issues are being caught in this manner, but papering over them by retrying all of our group's tests is a huge downside, and has resulted in shipping real bugs to customers in the past.

Perhaps the thing to do is to write tests that explicitly target catching such issues, rather than relying on them as a confusing side effect of a different system?

I understand (and agree) that it's good to catch tests before they get to customers, but relying on infrastructure that isn't intended to do that (reliably) is papering over missing coverage in a different way.
er, good to catch bugs, obviously :).
Maybe, if the assumption of this particular test suite is that the overall state of the process is important to its integrity, the suite should ignore requests to run a subset of the tests and instead always run all of them?

This seems, overall, worse than making the tests reliably fail when run in isolation, but maybe it's the thing to do for now to prevent further leaks? (And it wouldn't require changing the recipe.)
(basically it sounds to me that the test suite doesn't properly implement running subsets of tests, since requesting a subset of tests affects their outcome, which should never be the case)
We can also allow the logic which only retries certain tests to be configurable, so that every time this test suite is retried, the list of failed tests is ignored. That filtering is currently done just as a performance/capacity optimization.
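A sketch of that knob (names hypothetical): when a suite opts out of the failed-test filter, a retry reruns everything:

  # Hypothetical sketch of a configurable retry filter.
  def tests_to_retry(suite_config, all_tests, failed_tests):
      # Filtering to failed tests is only a capacity optimization; a
      # suite can opt out so order-dependent failures still reproduce.
      if suite_config.get('retry_full_suite', False):
          return list(all_tests)
      return list(failed_tests)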
> Perhaps the thing to do is to be writing tests that explicitly target
> catching such issues, rather than relying on them as a confusing side
> effect of a different system?

The Chrome team as a whole does a good job in creating minimized regression tests for the issues that are uncovered by these test suites, but we need to continue to catch these issues.

iannucci@, martiniss@: it's fairly expensive to run shards of these test suites, so I'd lean in favor of being able to disable the "retry with patch" step on a suite-by-suite basis; but whatever approach you prefer to address this issue and prevent tryjobs like those from https://chromium-review.googlesource.com/c/chromium/src/+/1338159/4 from being incorrectly green is fine with me.

Summary: "retry with patch" causing WebGL conformance failures to slip through the CQ (was: Trybot failures being reported as successes)
Components: Blink>WebGL Internals>GPU>Testing
Cc: chanli@chromium.org
Owner: erikc...@chromium.org
Status: Assigned (was: Available)
Here's my first pass response. Happy to GVC to discuss further.

====More explanation on 'retry with patch'====
Letting through bad CLs is unavoidable as long as we only run successful tests once.

e.g., imagine a CL that introduces a test that fails with 20% probability.
  EXPECT_NE(rand() % 5, 0);  // fails when rand() % 5 == 0, i.e. ~20% of the time

With 80% probability, this CL will land without the author knowing any better. Subsequently, 20% of all CLs [which cause the test target to be run] will fail. If we don't retry failing tests, then when such a CL lands, it will cause a drastic increase in false rejects in the CQ. This severely hampers all other Chrome developers.
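To make the numbers concrete, here's a quick simulation [illustrative only, not real CQ code] of how a single landed 20%-flaky test inflates the false-reject rate with 0, 1, or 2 test-level retries:

  import random

  def build_is_green(retries):
      # The flaky test fails with 20% probability; the build is red
      # only if the test fails on the initial run and on every retry.
      return any(random.random() >= 0.2 for _ in range(1 + retries))

  trials = 100000
  for retries in (0, 1, 2):
      greens = sum(build_is_green(retries) for _ in range(trials))
      # Expected false-reject rates: ~20%, ~4%, ~0.8%.
      print(retries, 1 - greens / trials)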

webgl2_conformance_tests does not retry failing tests [only those marked flaky]. "Our group takes a no-tolerance policy on flaky failures, and by doing so has caught dozens, if not hundreds, of bugs in areas throughout the browser."

However, as noted with the example above, the intention of retrying failures is not to prevent flaky tests from landing [to do that, we would have to retry successes]. The intention is to lessen the impact of landed flaky tests on other Chromium developers.

Because we do not retry failing tests at the webgl2_conformance_tests layer [which would take O(seconds)], we instead trigger retries at the CQ layer [the whole build gets retried]. Based on go/cq-top-flakes, we see several examples where webgl2_conformance_tests fails [both the original run and 'retry with patch'] but then succeeds when the build is retried. This indicates that webgl2_conformance_tests does have flakiness.

Example build 1 with failures: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel/131220
Example build 2 [retry] with success: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel/131251

'retry with patch' is a temporary placeholder. The goal is to leverage it to reduce full build retries. Eventually, I'd like to get rid of it in favor of test-suite layer retries.

====Find-it/flakiness detection====
As noted above, flaky tests are going to land in our codebase. We absolutely should be investigating them, as they frequently mask underlying bugs. We currently have some tooling to do this, and stgao's team is working on making this even better. One of the side effects of both 'retry with patch' and full build retries is that this gives us more information about whether a test is deterministically failing or flaky.

===='retry with patch' and issues we're seeing in this CL====
This CL brings up a specific example. I'll simplify it to the following:
(1) If we run test B after test A, then test B fails.
(2) If we run test B by itself, test B succeeds.

This example is a common problem with test suites that leak state between tests. We've observed it with both blink layout_tests and content_unittests. 
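Here's a distilled illustration [hypothetical code, not from the actual suites] of case (1): leaked module state makes test B order-dependent:

  # Hypothetical distillation of leaked state between tests.
  _leaked_state = []

  def test_a():
      _leaked_state.append('stale entry')  # pollutes the shared process

  def test_b():
      # Fails when run after test_a in the same process; passes when a
      # retry runs it alone in a fresh process (as 'retry with patch' does).
      assert not _leaked_state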

There's a complementary problem that we've observed in blink layout tests:
(1) If we run test B after test A, then test B succeeds.
(2) If we run test B by itself, test B fails.

Put another way, we have a test that sometimes fails, and sometimes passes. This is the essence of a flaky test. Right now, we don't have much sophistication and we let all flaky tests pass the CQ. Reiterating -- the goal is not to prevent flaky tests from landing. That is going to happen regardless. The goal is to prevent flaky tests from causing false rejects for the Chromium developers.

We eventually want to be in a state where we have the sophistication to say: Prior to this CL, test B was never flaky. During this CL run, test B was flaky. This means that the CL introduced flakiness, and we should reject it.
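Sketched as a predicate [the helpers and data shapes here are purely hypothetical]:

  # Hypothetical flakiness-gating predicate; `history` is assumed to
  # expose per-test results from runs before this CL.
  def cl_introduced_flakiness(history, cl_attempts, test):
      flaky_before = history.was_ever_flaky(test)      # assumed helper
      mixed_results = len(set(cl_attempts[test])) > 1  # both pass and fail seen
      return mixed_results and not flaky_before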

====Moving retries into test suite runners====
We currently have 3 retry layers: test suite, 'retry with patch', full build retry. Each of these is at least an order of magnitude more expensive than the previous. I would like to eventually move us toward a world where we only have 1 set of retries, which happens in the test suite runner. 

====Disabling 'retry with patch' for webgl2_conformance_tests====
I don't think that removing 'retry with patch' will have the desired effect, as we have yet another, more expensive retry layer that will kick in instead.

There are examples such as this one where 'retry with patch' will allow a flaky test to land, where it previously would have been rejected by the full build retry layer. [Notably, retrying individual tests causes different behavior from running all tests in sequence]. If we think it's appropriate, we can special case 'retry with patch' to rerun all tests on any failure in webgl2_conformance_tests. I don't think that this moves us in the right direction. We're implicitly relying on the assumption that state being carried over between tests is deterministic. We're also doing a lot of potentially unnecessary work [retrying succeeding tests]. 

kbr, what are your thoughts?


@erikchen your #c32 makes it sound like 'retry with patch' didn't make things worse, but I think it actually does (from the point of view of landing a test that fails initially as part of a group, at least).

Previously, because we didn't retry at the step level, retrying at the CQ level meant that every time the test ran it would only be run as part of the group, meaning the step would *always* fail and the patch wouldn't land. Now, with 'retry with patch', you'll retry the test in isolation, and so the patch will land. Right?
> Now, with 'retry with patch', you'll retry the test in isolation, and so the patch will land. Right?

Correct; see the section "Disabling 'retry with patch' for webgl2_conformance_tests". One proposal is: "we can special case 'retry with patch' to rerun all tests on any failure in webgl2_conformance_tests", which would fix this particular issue. As I note above, I'm not sure this is the right approach moving forward, but I'm willing to consider it as a short-term option.
Erik, thanks for your writeup. While I agree with most of your specifics, I disagree with the high-level conclusion.

For better or worse, the WebGL conformance tests on the CQ act as integration tests and not just unit tests. State in the browser, renderer and GPU process is implicitly and necessarily carried over between tests, because the tests just drive a given tab to new URLs over and over. If we changed the way these tests run in order to isolate them more strongly, we would be blind to certain kinds of changes that cause reliability problems for end users. As I've pointed out, the way these tests run has uncovered hundreds of bugs in areas like Chrome's GPU stack, V8, metrics, the media stack, and many others. They add significant value to Chrome's ecosystem.

Some browser changes cause random tests in this suite to fail. It's somewhat rare, but this failure mode does happen - Issue 905519 is a recent one which prompted filing this bug. It is critically important to surface these kinds of failures to the developer writing the CL, and not allow these changes to reach the source tree. They're tedious and often very difficult to track down. It's a bad idea to retry these sorts of failures *at all* because they're by nature intermittent. For this reason the webgl_conformance_integration_test.py harness (and the other GPU test harnesses) does not retry tests by default, only ones observed to be flaky. The expectation is that the tests and the browser are reliable, and this is in general true. Our team's bots typically run >100 green builds in a row. When unexpected flakiness does arise, however, we treat it as a P1 (or P0 in some teams' eyes), and work to get the flakiness resolved immediately.

Even if retries with patch re-ran all of the tests, it's still a retry step that would increase the likelihood that a broken CL will make it into the tree, and I do not want to apply it to any of the GPU test suites.

Have I explained clearly enough why we want to be able to disable this on a per-test-suite basis? Especially if "retry with patch" is intended to be a temporary measure, can we please make it configurable per test suite? I'll gladly take responsibility for making the changes to test_suites.pyl to turn it off for our group's test suites, once that's possible.

Thanks.

Thanks for the detailed description, kbr. I think this will be easiest to resolve off thread. I'll schedule something.

I think what kbr is asking for (and others like wez@ have asked for in the past) is the concept that a test suite shouldn't tolerate any potential flakiness, and we shouldn't do any retries, in order to maximize our chances of catching new flakiness without increasing execution time.

I'm fine with adding that tool to our toolbox. 

I think doing so raises the risk that someone will land a patch due to a flaky *pass* and then other people's CLs will be unduly affected because there will be no retry logic to guard against that. 

In other words, this setting theoretically trades off increased false rejects in favor of less flakiness being introduced into the code base. As long as that's true, I think that's a good tradeoff to make *for suitably scrutinized test steps* like the webgl suites where kbr@ and his team are on top of things. I wouldn't turn this on for any arbitrary test suite just because it didn't seem to be flaky at a point in time.

And, I think to that end we'll eventually want some sort of monitoring to help better triage this tradeoff if it causes significant increases in false rejects.

@erikchen - hopefully that makes sense. If you schedule a discussion off-thread, please include at least stgao and myself in the conversation along with kbr. 
> I think what kbr is asking for ... is the concept that a test suite shouldn't tolerate any potential flakiness, and we shouldn't do any retries ... this setting theoretically trades off increased false rejects in favor of less flakiness being introduced into the code base

kbr -- can you confirm that this is what you're asking for?

If so, then I understand the ask and agree that these are the tradeoffs. 

Note 1: disabling 'retry with patch' does not accomplish the ask. We currently have a CQ-layer retry mechanism which triggers on any test failure, including webgl conformance tests. I'm working on progressively removing that retry mechanism. To accomplish the ask, we'd need to disable both 'retry with patch' and the CQ-layer retry mechanism. 

Note 2: There currently does exist flakiness in the WebGL conformance tests [see example in c#32]. I am not yet aware of any process that the WebGL team has in place to follow up on these flakes [or even detect that they're happening]. kbr -- could you provide more details on this?


> Note 1: disabling 'retry with patch' does not accomplish the ask. 
> We currently have a CQ-layer retry mechanism which triggers on any
> test failure, including webgl conformance tests.

I think it does?

The problem with 'retry with patch' is that it only retries the individual failing tests, not the whole suite. The CQ-layer retry retries the whole suite.

This means that 'retry with patch' allows things through that the CQ-layer retry doesn't. I.e., any failure that reproducibly happens as part of the whole suite, but not when the failing tests are run individually, will get through with 'retry with patch' but not with the CQ-layer retry.

That (as I understand it) is the point of this bug.


> I think it does?

If we changed 'retry with patch' to retry the whole suite for WebGL test failures, then we could still disable CQ-layer retries for WebGL test failures.

> That (as I understand it) is the point of this bug.

Then I'm missing something. I think there are two separate issues: 
* 'retry with patch' retries individual tests whereas 'with patch' runs the whole suite.
* We currently retry WebGL test failures at the CQ layer.
> > I think it does?

> If we changed 'retry with patch' to retry the whole suite ...

Agreed, that would fix it.

I don't think retrying WebGL failures at the CQ layer is an issue (any more than retrying any other failure is). The point of the CQ-layer retry is as much to work around infra issues as anything else.

The "flaky" failures in c#32:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/android-marshmallow-arm64-rel/131220

happened because of two CLs that were independently tested on the CQ, but collided when they landed. See Issue 905682.

Chrome's GPU team runs a sheriffing rotation called pixel wrangling:
https://chromium.googlesource.com/chromium/src/+/master/docs/gpu/pixel_wrangling.md

the goal of which is to eliminate flakiness on the waterfall and the trybots. All members do an excellent job filing and fixing flakiness seen on the waterfall, using tools like sheriff-o-matic and FindIt's Flake Detector. rjkroege@, as the most recent pixel wrangler, did a superb job handling the newly introduced flakiness on the waterfall and driving things back to a stable state:
https://bugs.chromium.org/p/chromium/issues/list?can=1&q=reporter%3Arjkroege%40chromium.org+&colspec=ID+Pri+M+Stars+ReleaseBlock+Component+Status+Owner+Summary+OS+Modified&x=m&y=releaseblock&cells=ids

It's due to the hard work of all of the team members that bots on these waterfalls:
https://ci.chromium.org/p/chromium/g/chromium.gpu/console
https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console

typically run >100 green builds at a time. It's also due to the CQ catching many kinds of bugs that break these tests before they actually reach the source tree.

The importance of stamping out flakiness in these tests is impressed upon all team members:
https://chromium.googlesource.com/chromium/src/+/master/docs/gpu/gpu_testing.md#Stamping-out-Flakiness

Ultimately, yes, I would like to disable all CQ-level retries for Chromium's GPU test suites (including the WebGL conformance tests), because they often cover up real problems in the browser. That's a secondary issue at this point. It sounds like "retry with patch" is intended to allow complete removal of CQ-level retries, an effort which I wholeheartedly support.

The immediate issue is the behavior of "retry with patch". Rather than special-casing it to rerun the entire WebGL conformance suite in case of any trybot failures, I would rather add the knob which allows disabling of "retry with patch" on a per-step basis. It can be encoded in test_suites.pyl and honored by the Chromium recipe. This will make more visible any flakiness that developers accidentally introduce in their CLs.

Let me point out one more difference between "retry with patch" and CQ-level retries: developers that do dry runs of their CLs would like to see flakiness as early as possible. "retry with patch" hides test flakiness that is newly introduced by their CL, because the retry is hidden within the single trybot run, and if the initial try failed but the second one succeeded, there won't be any red or orange indicator on the trybot's status bubble pointing out that an error occurred.

+1 to that. I find it incredibly useful that with CQ-level retry I can at least see the red X next to the green run. I would very much like to see these surfaced, e.g. with a similar "green but with a red box next to it".
All levels of retries unfortunately hide flakiness. CQ-level retries hide the least, but we need to get better about all of them.
Thanks for the context, kbr. I was not previously aware of the pixel wrangling rotation. That makes me feel much better about removing both 'retry with patch' and CQ-level retries for WebGL conformance tests.

In the short term, I'll add a flag to disable 'retry with patch' at a test-suite level. 

Moving forward, it sounds like there are also no concerns about disabling CQ-level retries if the only problem with a build is test failures in WebGL conformance tests.
> Moving forward, it sounds like there are also no concerns about disabling
> CQ-level retries if the only problem with a build is test failures
> in WebGL conformance tests.

I might have a few almost pro forma concerns that could be satisfied by data, i.e., proof that we're not seeing too many false rejects that way.

Otherwise, sounds good.
Project Member

Comment 48 by bugdroid1@chromium.org, Nov 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5

commit dce44174721a0a7a9db41dc2b7e1eb0116c65cb5
Author: erikchen <erikchen@chromium.org>
Date: Fri Nov 30 21:38:46 2018

Implement the flag should_retry_with_patch.

The flag can be set on test suites to prevent 'retry with patch' from running
for failures on that test suite.

Bug:  906166 
Change-Id: I348788fec8d99681435cae181fc2203764434b97
Reviewed-on: https://chromium-review.googlesource.com/c/1355712
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Erik Chen <erikchen@chromium.org>

[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/recipes/blink_downstream.py
[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/recipe_modules/chromium_tests/tests/steps/generate_gtest.expected/swarming_plus_optional_dimension.json
[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/recipe_modules/chromium_tests/tests/steps/generate_gtest.py
[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/recipe_modules/chromium_tests/tests/steps/generate_gtest.expected/merge.json
[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/recipe_modules/chromium_tests/steps.py
[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/README.recipes.md
[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/recipe_modules/chromium_tests/tests/steps/generate_gtest.expected/set_up and tear down.json
[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/recipe_modules/test_utils/api.py
[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/recipe_modules/chromium_tests/tests/steps/generate_gtest.expected/basic.json
[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/recipe_modules/chromium_tests/tests/steps/generate_gtest.expected/trigger_script.json
[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/recipe_modules/chromium_tests/api.py
[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/recipe_modules/chromium_tests/tests/steps/generate_gtest.expected/swarming.json
[modify] https://crrev.com/dce44174721a0a7a9db41dc2b7e1eb0116c65cb5/scripts/slave/recipe_modules/chromium_tests/tests/api/run_tests_on_tryserver.py

Super! Thank you Erik for implementing this. Are we at a state where we can start specifying that flag in Chromium's src/testing/buildbot files?

Correct. You should be good to go. 
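For reference, here's roughly the shape of such an entry in .pyl form [illustrative only; the exact schema and placement in the real files may differ, and the generated testing/buildbot JSON gets a matching "should_retry_with_patch": false]:

  # Illustrative .pyl-style entry; exact schema/placement may differ.
  'webgl2_conformance_tests': {
      'should_retry_with_patch': False,
  },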
Project Member

Comment 51 by bugdroid1@chromium.org, Dec 3

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/44910c3086529709879bac11f407d4c31ba3ffeb

commit 44910c3086529709879bac11f407d4c31ba3ffeb
Author: Kenneth Russell <kbr@chromium.org>
Date: Mon Dec 03 23:35:11 2018

Set should_retry_with_patch=false for Chromium GPU tests.

The GPU tests act much like integration tests for the entire browser,
and tend to uncover flakiness bugs more readily than other test
suites. In order to surface any flakiness more readily to the
developer of the CL which is introducing it, we disable retries with
patch on the commit queue.

Bug:  906166 
Change-Id: Iee56100a774377b9172325627da4694ce0a15009
Reviewed-on: https://chromium-review.googlesource.com/c/1357569
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Erik Chen <erikchen@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#613319}
[modify] https://crrev.com/44910c3086529709879bac11f407d4c31ba3ffeb/testing/buildbot/chromium.gpu.fyi.json
[modify] https://crrev.com/44910c3086529709879bac11f407d4c31ba3ffeb/testing/buildbot/chromium.gpu.json
[modify] https://crrev.com/44910c3086529709879bac11f407d4c31ba3ffeb/testing/buildbot/client.v8.fyi.json
[modify] https://crrev.com/44910c3086529709879bac11f407d4c31ba3ffeb/testing/buildbot/generate_buildbot_json.py
[modify] https://crrev.com/44910c3086529709879bac11f407d4c31ba3ffeb/testing/buildbot/generate_buildbot_json_unittest.py

Status: Fixed (was: Assigned)
Should be fixed.

Blocking: 916762
