
Issue 780922


Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




[Findit] Flake Analyzer - RecursiveFlakeTryJobPipeline has ended in error

Reported by wylieb@chromium.org, Nov 2 2017

Issue description

This error is occurring more and more frequently; most of the flake analyses that reach this pipeline end in error.
 
Owner: lijeffrey@chromium.org
Status: Assigned (was: Untriaged)
Most of the analyses that ended in error did so because they tried to access the report from the try job, but the field is not there, as captured in issue 774172, which apparently isn't fully fixed.

4 options from here:
1. Get infra to fix the report being unavailable immediately. However, the original owner has since left the infra team, so a new owner would need to be found.
2. Implement retry with backoff to get the report. Caveat: the returned status code is HTTP 200, indicating everything is OK, so the returned content must be validated before continuing and the request retried if the content is not satisfactory.
3. Get Flake Analyzer to detect that the try job proposed for recycling doesn't have a report, and report that accordingly. This does not fundamentally solve the problem, but it acknowledges the problem better than "RecursiveFlakeTryJobPipeline was aborted unexpectedly".
4. Do nothing: in many cases the analyses are still able to identify a culprit, and RecursiveFlakeTryJobPipeline is to be deprecated altogether.
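Option 2 hinges on not trusting the HTTP 200 alone. A minimal sketch of that idea follows, assuming a hypothetical `fetch` callable that returns `(status_code, parsed_json)` and a hypothetical `report` key in the response; neither is taken from Findit or the buildbucket API. It validates the body on every attempt and doubles the delay between retries.

```python
import time
from typing import Callable, Optional, Tuple


def get_report_with_retry(
    fetch: Callable[[], Tuple[int, Optional[dict]]],
    max_attempts: int = 5,
    initial_delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Polls fetch() until it yields HTTP 200 *and* a usable report.

    `fetch` and the 'report' key are hypothetical placeholders. Returns the
    report, or None if every attempt came back without one.
    """
    delay = initial_delay_seconds
    for attempt in range(max_attempts):
        status_code, content = fetch()
        # HTTP 200 does not guarantee the report is present, so check the body.
        if status_code == 200 and content and content.get('report'):
            return content['report']
        if attempt < max_attempts - 1:
            sleep(delay)
            delay *= 2  # Exponential backoff between attempts.
    return None
```

Injecting `sleep` keeps the sketch testable without real waits; a `None` result is where an informative error (option 3) could be raised instead of a generic abort.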

Solution 3 is easy to implement but doesn't fundamentally solve the problem: the root cause is not actually Findit's fault, and the reported error message is simply uninformative.

Solution 2 should also be considered, but it is out of the scope of Flake Analyzer, since compile try jobs are impacted as well and should be fixed separately.

Comment 3 by bugdroid1@chromium.org, Nov 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/fe07e9d460991d475a5cab221ba634f075aa4562

commit fe07e9d460991d475a5cab221ba634f075aa4562
Author: Jeffrey Li <lijeffrey@chromium.org>
Date: Fri Nov 03 22:54:34 2017

[Findit] Flake Analyzer - Do not reuse try jobs without usable data

Many analyses are ending in "error" which didn't actually have errors
due to recycling try job data from previous runs that ended in "error"
due to report not being available. This change disregards those try jobs
and reruns new ones.

Bug:  780922 
Change-Id: Idd666c3c2e034f0b0ef3a7daf0970cb0126a12e9
Reviewed-on: https://chromium-review.googlesource.com/752165
Reviewed-by: Brandon Wylie <wylieb@chromium.org>
Commit-Queue: Brandon Wylie <wylieb@chromium.org>

[modify] https://crrev.com/fe07e9d460991d475a5cab221ba634f075aa4562/appengine/findit/waterfall/flake/process_flake_try_job_result_pipeline.py
[modify] https://crrev.com/fe07e9d460991d475a5cab221ba634f075aa4562/appengine/findit/waterfall/flake/test/process_flake_try_job_result_pipeline_test.py
[modify] https://crrev.com/fe07e9d460991d475a5cab221ba634f075aa4562/appengine/findit/waterfall/flake/recursive_flake_try_job_pipeline.py
[modify] https://crrev.com/fe07e9d460991d475a5cab221ba634f075aa4562/appengine/findit/services/flake_failure/flake_try_job_service.py
[modify] https://crrev.com/fe07e9d460991d475a5cab221ba634f075aa4562/appengine/findit/services/flake_failure/test/flake_try_job_service_test.py
[modify] https://crrev.com/fe07e9d460991d475a5cab221ba634f075aa4562/appengine/findit/waterfall/flake/test/recursive_flake_try_job_pipeline_test.py
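The commit's rule, "do not reuse try jobs without usable data", can be illustrated with a small filter. This is a minimal sketch only: `TryJobRecord`, its `status`/`report` fields, and the `'completed'` status string are hypothetical stand-ins, not Findit's actual models or the code in the change above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TryJobRecord:
    """Hypothetical stand-in for a stored flake try job (not Findit's model)."""
    revision: str
    status: str
    report: Optional[dict] = None


def has_usable_data(try_job: TryJobRecord) -> bool:
    # A previous run that errored, or that never received its report,
    # should not be recycled.
    return try_job.status == 'completed' and bool(try_job.report)


def find_reusable_try_job(
    candidates: List[TryJobRecord], revision: str
) -> Optional[TryJobRecord]:
    """Returns an earlier try job at `revision` with usable data, else None.

    None tells the caller to trigger a fresh try job instead of recycling.
    """
    for try_job in candidates:
        if try_job.revision == revision and has_usable_data(try_job):
            return try_job
    return None
```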

Status: Fixed (was: Assigned)
This is technically fixed, though other errors with different root causes remain; this was just one of them.