[Findit] Flake Analyzer - RecursiveFlakeTryJobPipeline has ended in error
Issue description: This error is occurring more and more frequently; most of the flake analyses that make it to this pipeline end in error.
Nov 2 2017
Most of the analyses that ended in error failed while trying to access the report from the try job when that field is not there, as captured in issue 774172, which apparently isn't fully fixed. There are 4 options from here:
1. Get infra to fix the report being unavailable immediately, though the original owner has since left the infra team and a new owner would need to be found.
2. Implement retry with backoff to get the report. The caveat is that the status code returned is HTTP 200, indicating things are OK, so the returned content should be validated before continuing and the request retried if the content is not satisfactory.
3. Get the flake analyzer to detect that the try job proposed for recycling doesn't have a report, and report it accordingly. This does not fundamentally solve the problem, but does a better job of acknowledging it than "RecursiveFlakeTryJobPipeline was aborted unexpectedly".
4. Do nothing. In many cases the analyses are still able to make it to identifying a culprit, and RecursiveFlakeTryJobPipeline is to be deprecated altogether.
Solution 3 is easy to implement but doesn't fundamentally solve the problem: the root cause is in fact not Findit's fault, but the error message reported is just not informative. Solution 2 should also be considered, but is out of the scope of Flake Analyzer, as compile try jobs are impacted as well and should be fixed separately.
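As a rough illustration of option 2, the sketch below retries a fetch with exponential backoff and validates the returned content instead of trusting the HTTP 200 status. It is not Findit's actual code: `fetch`, `has_usable_report`, `fetch_report_with_backoff`, and the `"report"` key layout are all hypothetical names chosen for this example.

```python
import time
from typing import Any, Callable, Dict, Optional


def has_usable_report(response: Optional[Dict[str, Any]]) -> bool:
    """Return True only if the response carries a non-empty 'report' dict.

    The 'report' key is an assumption about the response shape.
    """
    if not isinstance(response, dict):
        return False
    report = response.get("report")
    return isinstance(report, dict) and bool(report)


def fetch_report_with_backoff(
    fetch: Callable[[], Optional[Dict[str, Any]]],
    max_attempts: int = 5,
    base_delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, Any]]:
    """Call fetch() until the content validates or attempts run out.

    A successful status code is not enough here, so the payload itself is
    checked on every attempt. Returns the report, or None if it never
    became available.
    """
    for attempt in range(max_attempts):
        response = fetch()
        if has_usable_report(response):
            return response["report"]
        if attempt < max_attempts - 1:
            # Delay doubles each time: base, 2 * base, 4 * base, ...
            sleep(base_delay_seconds * (2 ** attempt))
    return None
```

Returning `None` rather than raising leaves the caller free to surface a specific "report unavailable" error, which is roughly what option 3 asks for.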
Nov 3 2017
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra/+/fe07e9d460991d475a5cab221ba634f075aa4562

commit fe07e9d460991d475a5cab221ba634f075aa4562
Author: Jeffrey Li <lijeffrey@chromium.org>
Date: Fri Nov 03 22:54:34 2017

[Findit] Flake Analyzer - Do not reuse try jobs without usable data

Many analyses are ending in "error" which didn't actually have errors due to recycling try job data from previous runs that ended in "error" due to report not being available. This change disregards those try jobs and reruns new ones.

Bug: 780922
Change-Id: Idd666c3c2e034f0b0ef3a7daf0970cb0126a12e9
Reviewed-on: https://chromium-review.googlesource.com/752165
Reviewed-by: Brandon Wylie <wylieb@chromium.org>
Commit-Queue: Brandon Wylie <wylieb@chromium.org>

[modify] https://crrev.com/fe07e9d460991d475a5cab221ba634f075aa4562/appengine/findit/waterfall/flake/process_flake_try_job_result_pipeline.py
[modify] https://crrev.com/fe07e9d460991d475a5cab221ba634f075aa4562/appengine/findit/waterfall/flake/test/process_flake_try_job_result_pipeline_test.py
[modify] https://crrev.com/fe07e9d460991d475a5cab221ba634f075aa4562/appengine/findit/waterfall/flake/recursive_flake_try_job_pipeline.py
[modify] https://crrev.com/fe07e9d460991d475a5cab221ba634f075aa4562/appengine/findit/services/flake_failure/flake_try_job_service.py
[modify] https://crrev.com/fe07e9d460991d475a5cab221ba634f075aa4562/appengine/findit/services/flake_failure/test/flake_try_job_service_test.py
[modify] https://crrev.com/fe07e9d460991d475a5cab221ba634f075aa4562/appengine/findit/waterfall/flake/test/recursive_flake_try_job_pipeline_test.py
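The idea behind this change (skip cached try jobs that have no usable data and run a fresh one instead) might look roughly like the sketch below. This is not the code from the commit: `is_reusable`, `get_try_job_result`, and the `"error"` and `"report"` keys are hypothetical stand-ins used only to illustrate the decision.

```python
from typing import Any, Callable, Dict, Optional


def is_reusable(try_job_data: Optional[Dict[str, Any]]) -> bool:
    """A cached try job is reusable only if it has no error and has a report.

    The 'error' and 'report' keys are assumptions for this example.
    """
    if not try_job_data:
        return False
    if try_job_data.get("error"):
        return False
    return bool(try_job_data.get("report"))


def get_try_job_result(
    cached: Optional[Dict[str, Any]],
    run_new_try_job: Callable[[], Dict[str, Any]],
) -> Dict[str, Any]:
    """Reuse the cached result when it is usable; otherwise rerun."""
    if is_reusable(cached):
        return cached
    return run_new_try_job()
```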
Nov 15 2017
This is technically fixed, though there are other errors with different root causes. This one is just one of them.
Comment 1 by lijeffrey@chromium.org, Nov 2 2017
Status: Assigned (was: Untriaged)