[Findit] Flake Analyzer - New task requests to swarming can fail silently
Issue description

There are cases where we get back errors from swarming but still assume the request succeeded and continue anyway.

Analysis: https://findit-for-me.appspot.com/waterfall/flake?key=ag9zfmZpbmRpdC1mb3ItbWVy6AELEhdNYXN0ZXJGbGFrZUFuYWx5c2lzUm9vdCKxAWNocm9taXVtLm1lbW9yeS9MaW51eCBDaHJvbWl1bSBPUyBBU2FuIExTYW4gVGVzdHMgKDEpLzI0NzUwL2NvbnRlbnRfYnJvd3NlcnRlc3RzL1UybDBaVkJsY2xCeWIyTmxjM05DY205M2MyVnlWR1Z6ZEM1RFlXNWpaV3hYYUdWbGJGTmpjbTlzYkVKMVltSnNhVzVuVDI1WGFHVmxiRlJoY21kbGRFUmxiR1YwYVc5dQwLEhNNYXN0ZXJGbGFrZUFuYWx5c2lzGAEM

Status: https://findit-for-me.appspot.com/_ah/pipeline/status?root=1c8f93a31e784b90a84307ca5caa87a7&auto=false#pipeline-bc4de46a49c8423d90d83eefa99e894c

LOG:
2017-11-16 06:37:06.388 PST got response status 200 for url https://chromium-swarm.appspot.com/_ah/api/swarming/v1/task/39dcebf1c312d010/request (/base/data/home/apps/s~findit-for-me/waterfall-backend:12746-fb6e95f.405435860884942934/libs/http/interceptor.py:123)
2017-11-16 06:37:06.410 PST got exception <class 'google.appengine.api.urlfetch_errors.ConnectionClosedError'>("Connection closed unexpectedly by server at URL: https://chromium-swarm.appspot.com/_ah/api/swarming/v1/tasks/new") for url https://chromium-swarm.appspot.com/_ah/api/swarming/v1/tasks/new (/base/data/home/apps/s~findit-for-me/waterfall-backend:12746-fb6e95f.405435860884942934/libs/http/interceptor.py:130)
2017-11-16 06:37:06.411 PST Retrying connection to https://chromium-swarm.appspot.com/_ah/api/swarming/v1/tasks/new in 60 seconds (/base/data/home/apps/s~findit-for-me/waterfall-backend:12746-fb6e95f.405435860884942934/waterfall/swarming_util.py:181)
2017-11-16 06:38:06.701 PST got response status 200 for url https://chromium-swarm.appspot.com/_ah/api/swarming/v1/tasks/new

This pipeline returned a task_id of 'no task - exception'. The whole stack fails silently.
,
Nov 16 2017
,
Nov 21 2017
The root cause appears to be that when a build number is determined to be invalid at triggering time (for example, because it can't be compiled), a task id of 'no task' is returned instead of an error. This bubbles up to UpdateFlakeSwarmingTaskDataPointsPipeline.
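For illustration, here is a minimal sketch of how a caller could fail loudly on such a placeholder id instead of continuing. It assumes the trigger step returns a string task id and that placeholder ids start with 'no task'; the names TaskTriggerError, NO_TASK_PREFIX and ensure_valid_task_id are hypothetical and are not taken from the Findit code.

```python
class TaskTriggerError(Exception):
    """Raised when no swarming task was actually created (hypothetical name)."""


# Assumption: placeholder ids start with this prefix, e.g. 'no task - exception'.
NO_TASK_PREFIX = 'no task'


def ensure_valid_task_id(task_id):
    """Returns task_id unchanged, or raises if it is empty or a placeholder."""
    if not task_id or task_id.lower().startswith(NO_TASK_PREFIX):
        raise TaskTriggerError(
            'Swarming task was not triggered: %r' % (task_id,))
    return task_id
```

Calling a check like this right after the trigger step would surface the failure at its source, before the placeholder reaches downstream pipelines.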
,
Nov 22 2017
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra/+/0e3a203b90d7e8c306fe2a7312dd6c93fdbf617e

commit 0e3a203b90d7e8c306fe2a7312dd6c93fdbf617e
Author: Brandon Wylie <wylieb@chromium.org>
Date: Wed Nov 22 22:11:11 2017

[Findit] Flake Analyzer - Find nearby build that's valid.

Bug: 786028
Change-Id: Iff26cf2278dd763a5637e0907e0f4c19e15288d0
Reviewed-on: https://chromium-review.googlesource.com/780731
Commit-Queue: Brandon Wylie <wylieb@chromium.org>
Reviewed-by: Jeffrey Li <lijeffrey@chromium.org>
Reviewed-by: Shuotao Gao <stgao@chromium.org>

[modify] https://crrev.com/0e3a203b90d7e8c306fe2a7312dd6c93fdbf617e/appengine/findit/waterfall/build_util.py
[modify] https://crrev.com/0e3a203b90d7e8c306fe2a7312dd6c93fdbf617e/appengine/findit/waterfall/flake/test/recursive_flake_pipeline_test.py
[modify] https://crrev.com/0e3a203b90d7e8c306fe2a7312dd6c93fdbf617e/appengine/findit/waterfall/flake/recursive_flake_pipeline.py
[modify] https://crrev.com/0e3a203b90d7e8c306fe2a7312dd6c93fdbf617e/appengine/findit/waterfall/test/build_util_test.py
[modify] https://crrev.com/0e3a203b90d7e8c306fe2a7312dd6c93fdbf617e/appengine/findit/waterfall/test/swarming_util_test.py
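The commit title describes finding a nearby build that is valid. The change itself is not reproduced on this page, so the following is only a sketch of one way such a search could work, not the code that landed. The function name, the is_valid callback, and the bounds are all hypothetical.

```python
def find_nearest_valid_build(build_number, is_valid, lower_bound, upper_bound):
    """Returns the valid build number closest to build_number, or None.

    Searches outward from build_number within [lower_bound, upper_bound].
    On a tie, the earlier build is preferred. is_valid is a caller-supplied
    predicate (an assumption for this sketch).
    """
    max_offset = max(build_number - lower_bound, upper_bound - build_number)
    for offset in range(max_offset + 1):
        for candidate in (build_number - offset, build_number + offset):
            if lower_bound <= candidate <= upper_bound and is_valid(candidate):
                return candidate
    return None
```

Returning None when nothing in range is valid forces the caller to handle that case explicitly, rather than passing a placeholder value along.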
,
Nov 27 2017
,
Nov 29 2017
Comment 1 by wylieb@chromium.org
, Nov 16 2017