Slave failure reporting broken on branched swarming master.
Issue description

https://ci.chromium.org/p/chromeos/builders/luci.chromeos.general/Prod/b8939603459422467472

10:02:35: INFO: Running cidb query on pid 16830, repr(query) starts with u'SELECT id, build_stage_id, outer_failure_id, exception_type, exception_message, exception_category
10:02:35: ERROR: <type 'exceptions.KeyError'>: ''
Traceback (most recent call last):
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/failures_lib.py", line 230, in wrapped_functor
    return functor(*args, **kwargs)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/cbuildbot/stages/report_stages.py", line 364, in PerformStage
    waterfall_url = waterfall.WATERFALL_TO_DASHBOARD[failure.waterfall]
KeyError: ''

10:02:35: INFO: Translating result <type 'exceptions.KeyError'>: ''
Traceback (most recent call last):
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/lib/failures_lib.py", line 230, in wrapped_functor
    return functor(*args, **kwargs)
  File "/b/swarming/w/ir/cache/cbuild/repository/chromite/cbuildbot/stages/report_stages.py", line 364, in PerformStage
    waterfall_url = waterfall.WATERFALL_TO_DASHBOARD[failure.waterfall]
KeyError: ''
 to fail.
,
Jul 31
I think this might be blocking https://bugs.chromium.org/p/chromium/issues/detail?id=866768 which is blocking the automated running of various test suites (like CTS) against ToT...
,
Jul 31
To be more specific, this seems to happen on master-release, not just branches.

https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=master-release&buildBranch=master
--> https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8939493988213925904
--> https://logs.chromium.org/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8939493988213925904/+/steps/SlaveFailureSummary/0/stdout

The last time we saw this pass was 7/2 with
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8942091101558299760

First failure exhibiting this was
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8942060899699577824
--> https://ci.chromium.org/p/chromeos/builders/luci.chromeos.general/Prod/b8942060899699577824
--> https://logs.chromium.org/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8942060899699577824/+/steps/SlaveFailureSummary/0/stdout
,
Aug 1
The relevant code is buildbot specific. I'm updating it to generate Legoland links. My expectation is that the error will only be seen if one of the slaves failed, and so won't cause a master to fail that should have passed. Um... it might cause a master to fail if an experimental slave failed.
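For illustration, the failure in the traceback from the issue description can be reproduced in a few lines. This is a minimal sketch, not the chromite code: the dictionary contents and the helper name `dashboard_url_for` are assumptions made up for the example, and it is written for Python 3. Only the lookup `WATERFALL_TO_DASHBOARD[failure.waterfall]` and the empty-string key come from the log.

```python
# Hypothetical stand-in for waterfall.WATERFALL_TO_DASHBOARD; the real
# mapping lives in chromite and its contents are not shown in this bug.
WATERFALL_TO_DASHBOARD = {
    "example_waterfall": "https://dashboard.example.invalid/example_waterfall",
}


def dashboard_url_for(waterfall_name):
    """Return the dashboard URL for a waterfall, or None if it is unknown.

    The log suggests the failed slave's waterfall name was '' on swarming,
    which a plain WATERFALL_TO_DASHBOARD[...] lookup turns into KeyError: ''.
    """
    return WATERFALL_TO_DASHBOARD.get(waterfall_name)


# A direct lookup with the empty name raises, as in the traceback.
try:
    WATERFALL_TO_DASHBOARD[""]
    direct_lookup_failed = False
except KeyError:
    direct_lookup_failed = True

# The tolerant lookup returns None instead of raising.
fallback_result = dashboard_url_for("")
```

A tolerant lookup like this would only hide the missing link; it is shown to make the failure mode concrete, not as the fix that was chosen.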
,
Aug 1
We need to fix this today.
,
Aug 1
Can we bump this to a P0 then?
,
Aug 1
I've prepared this CL: https://crrev.com/c/1157700
,
Aug 1
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/78210c66c23618073485c61dcdc827d7409e3033

commit 78210c66c23618073485c61dcdc827d7409e3033
Author: Don Garrett <dgarrett@google.com>
Date: Wed Aug 01 19:25:49 2018

report_stage: Use Legoland for failed slave builders.

We were trying to generate buildbot links on master for linking to failed slave builds. This was able to point directly at the stage that failed.

Replace those links with Legoland links to the slave builds in question, which work for swarming builds, even if they aren't stage specific.

BUG= chromium:869430
TEST=run_tests

Change-Id: I45cdceb00cee6f5bec059f8317391f92457902ee
Reviewed-on: https://chromium-review.googlesource.com/1157700
Tested-by: Don Garrett <dgarrett@chromium.org>
Trybot-Ready: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Alec Thilenius <athilenius@google.com>
Reviewed-by: Bernie Thompson <bhthompson@chromium.org>
Commit-Queue: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/78210c66c23618073485c61dcdc827d7409e3033/cbuildbot/stages/report_stages.py
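As a rough illustration of the approach the commit message describes, a per-build link can be derived from the buildbucket ID alone, with no waterfall lookup involved. This is a sketch under assumptions, not the code from the CL: the helper name `legoland_build_url` and the constant name are hypothetical, and the example targets Python 3. The URL prefix and query parameter are copied from the buildDetails links quoted earlier in this bug.

```python
from urllib.parse import urlencode

# URL prefix copied from the buildDetails links quoted earlier in this bug.
LEGOLAND_BUILD_DETAILS = (
    "https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails"
)


def legoland_build_url(buildbucket_id):
    """Build a buildDetails link for one slave build (hypothetical helper)."""
    query = urlencode({"buildbucketId": buildbucket_id})
    return LEGOLAND_BUILD_DETAILS + "?" + query


# Buildbucket ID taken from one of the links quoted earlier in this bug.
example_url = legoland_build_url(8939493988213925904)
```

Such a link points at the slave build as a whole rather than at the failed stage, which matches the trade-off noted in the commit message.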
,
Aug 2
Green run of master-release last night :) Marking this closed.
,
Aug 2
Here is the cherry-pick CL: https://crrev.com/c/1160785
,
Aug 2
This bug requires manual review: M69 has already been promoted to the beta branch, so this requires manual review.

Please contact the milestone owner if you have questions.
Owners: amineer@(Android), kariahda@(iOS), cindyb@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Aug 3
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/9cf06250b2a7801b72a9868693e4280d217762d1

commit 9cf06250b2a7801b72a9868693e4280d217762d1
Author: Don Garrett <dgarrett@google.com>
Date: Fri Aug 03 16:17:22 2018

report_stage: Use Legoland for failed slave builders.

We were trying to generate buildbot links on master for linking to failed slave builds. This was able to point directly at the stage that failed.

Replace those links with Legoland links to the slave builds in question, which work for swarming builds, even if they aren't stage specific.

BUG= chromium:869430
TEST=run_tests

Change-Id: I45cdceb00cee6f5bec059f8317391f92457902ee
Reviewed-on: https://chromium-review.googlesource.com/1157700
Tested-by: Don Garrett <dgarrett@chromium.org>
Trybot-Ready: Don Garrett <dgarrett@chromium.org>
Reviewed-by: Alec Thilenius <athilenius@google.com>
Reviewed-by: Bernie Thompson <bhthompson@chromium.org>
Commit-Queue: Don Garrett <dgarrett@chromium.org>
(cherry picked from commit 78210c66c23618073485c61dcdc827d7409e3033)
Reviewed-on: https://chromium-review.googlesource.com/1160785

[modify] https://crrev.com/9cf06250b2a7801b72a9868693e4280d217762d1/cbuildbot/stages/report_stages.py
,
Aug 3
I got a TPM +2, but not merge approval. I treated that as good enough, and submitted.
Comment 1 by athilenius@chromium.org
, Jul 31