
Issue 917486

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 917122

Blocking:
issue 915319


Removal of CQ-level build retries for GPU integration tests.

Project Member Reported by erikc...@chromium.org, Dec 21

Issue description

We've disabled 'retry with patch' for GPU integration tests, but we still run CQ-level build retries. I conferred with Ken to come up with the best way to turn down CQ-level build retries.

Observations:
  * GPU integration tests have device affinity [e.g. GPUs wear down over time]
  * We don't know the frequency, but GPU integration tests do occasionally flake due to Chrome bugs.
  * Test ordering is not stable. Even so, we want to run tests in the same order as much as possible; we do not want to retry tests with a different ordering.

Proposal:
  * We run GPU tests once during 'with patch'. 
  * If there is a failure, we redispatch the same swarming task [possibly N times]. 
  * We mark the test run as a success as long as there are [M successes].
  * We never trigger 'retry with patch' or CQ-level build retries.
The most likely values for N and M are [N=1, M=1] or [N=3, M=2]. The implementation will likely be shared with Issue 917122.
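A minimal sketch of the proposed pass/fail decision, for illustration only. It assumes one reading of the proposal: if the single 'with patch' run passes, the step passes; otherwise the identical swarming task is redispatched up to N times, and the step passes once M of those redispatches succeed. `run_with_redispatch`, `run_task` and `pass_probability` are hypothetical names, not existing recipe or swarming APIs. `pass_probability` also assumes every run fails independently with probability q. The device affinity noted above breaks that assumption, so treat its numbers as rough.

```python
from math import comb
from typing import Callable


def run_with_redispatch(run_task: Callable[[], bool], n: int, m: int) -> bool:
    """Decide whether a GPU test step passes (hypothetical sketch).

    run_task dispatches the *same* swarming task (same tests, same
    ordering) and returns True if it passed.
    """
    # Single run during 'with patch'.
    if run_task():
        return True
    successes = 0
    for attempt in range(n):
        remaining = n - attempt  # redispatches left, including this one
        # Stop early once M successes can no longer be reached.
        if successes + remaining < m:
            break
        if run_task():
            successes += 1
            if successes >= m:
                return True
    # Never fall back to 'retry with patch' or a CQ-level build retry.
    return False


def pass_probability(q: float, n: int, m: int) -> float:
    """P(step passes) if each run fails independently with probability q."""
    p = 1.0 - q
    # Passing needs at least m successes among the n redispatches.
    # Stopping early does not change that outcome.
    at_least_m = sum(comb(n, k) * p**k * q ** (n - k) for k in range(m, n + 1))
    return p + q * at_least_m


# Usage: first run fails, then 2 of 3 redispatches pass with N=3, M=2.
outcomes = iter([False, True, False, True])
print(run_with_redispatch(lambda: next(outcomes), n=3, m=2))  # True
```

Under the independence assumption, a run with a 10% flake rate passes with probability 0.99 for [N=1, M=1] and 0.9972 for [N=3, M=2]. The larger setting tolerates more flakiness but costs up to three extra tasks per failing shard.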

Other proposals, discarded:
  * In 'with patch', dispatch N tasks. Only let CL pass if all N tasks pass.
    * PRO: The probability that newly introduced flakiness lands shrinks exponentially with N.
    * CON: Insufficient device capacity.
    * CON: Task/device-affine flakiness will cause a large number of false rejects.
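A rough worked version of the PRO and the second CON, assuming each of the N tasks fails independently. Here p is the per-run failure probability of flakiness newly introduced by the CL, and q is the per-run failure probability of pre-existing flakiness on a correct CL (both symbols are illustrative, not from the original):

```latex
P(\text{new flake lands}) = (1-p)^{N},
\qquad
P(\text{false reject}) = 1 - (1-q)^{N}.

% e.g. p = 0.5, N = 3:  (0.5)^3 = 0.125
%      q = 0.05, N = 3: 1 - 0.95^3 = 0.142625 \approx 14\%
```

The first term decays exponentially in N, but the false-reject rate grows with N. Device-affine failures are correlated rather than independent, so real false-reject rates could be worse than this estimate.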
 
Blockedon: 917122
