All slave cbuildbots are reported as failing when in fact only one fails.
Issue description
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/16385

master-paladin kept failing in CommitQueueCompletion with the error:
The master destructed itself and stopped waiting for the following slaves

I took a look at some of the failed builders and they were in an exception state. Most of them failed in the UnitTest step; their logs were incomplete and showed no error. It looks like they were interrupted, so the issue is more likely in master-paladin itself.

I'm going to close the tree, as the CQ builders are not likely to succeed.
,
Sep 25 2017
yes
,
Sep 25 2017
Changing the name to better reflect the issue. Reporting all cbuildbots as failing is confusing. Not sure what can be done here though, can they all keep going and report success? Assigning to xixuan@ for proper routing.
,
Sep 25 2017
,
Sep 25 2017
,
Sep 25 2017
This was affected by jkop@'s recent changes. As I commented in #44 (https://bugs.chromium.org/p/chromium/issues/detail?id=753189#c44), the logging should be changed.
,
Sep 25 2017
Yeah, accidentally pulled my fix out of the most recent CL while I was changing other aspects of it. Will get that addressed.
,
Sep 25 2017
Investigated. 'cbuildbot failed' is the default message when a build is failing but has no error messages reported.

One option is to add a "Master self-destructed" error class and attach it to each canceled build. I expect this would have negative side effects, such as spuriously blaming CLs in builds which were canceled. But it's an option.

Another would be to mark canceled builds as ABORTED, which apparently doesn't happen by default in CIDB when builds are canceled in Buildbucket. This may be tricky but would be my preferred option.

Another would be to change the default message to reflect that cancellation is the more likely cause. This is less confusing but not any more useful.
,
Sep 26 2017
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chrome/tools/build/+/5d3148bd66e2abfc3c4908d0ad7fdf37203b9b50

commit 5d3148bd66e2abfc3c4908d0ad7fdf37203b9b50
Author: Ningning Xia <nxia@google.com>
Date: Tue Sep 26 19:13:55 2017
,
Sep 26 2017
#9 is actually for crbug.com/753189 . Sorry for the confusion.
,
Nov 1 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/a4e877eee3974f2b1194588c590bb84e4772f9ed

commit a4e877eee3974f2b1194588c590bb84e4772f9ed
Author: Jacob Kopczynski <jkop@google.com>
Date: Wed Nov 01 00:31:06 2017

Make build message for cancelled slaves useful

When slaves have no listed error messages, check CIDB for BuildMessages and pass back a cancellation error if it was canceled by master self-destruction

Split the check for cancellation by master into a lib function to factor it out for cases where CIDB is not connected. This also improves test separation.

TEST=tryjob
BUG= chromium:768313

Change-Id: I276763a76a68139d0b4db00772083fe646c78f9f
Reviewed-on: https://chromium-review.googlesource.com/692980
Commit-Ready: Jacob Kopczynski <jkop@chromium.org>
Tested-by: Jacob Kopczynski <jkop@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/a4e877eee3974f2b1194588c590bb84e4772f9ed/lib/builder_status_lib.py
[modify] https://crrev.com/a4e877eee3974f2b1194588c590bb84e4772f9ed/lib/fake_cidb.py
[modify] https://crrev.com/a4e877eee3974f2b1194588c590bb84e4772f9ed/cbuildbot/stages/generic_stages.py
[modify] https://crrev.com/a4e877eee3974f2b1194588c590bb84e4772f9ed/lib/builder_status_lib_unittest.py
[modify] https://crrev.com/a4e877eee3974f2b1194588c590bb84e4772f9ed/cbuildbot/stages/report_stages_unittest.py
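
For readers following along, the message-selection behavior this commit describes can be sketched roughly as below. This is an illustration only, not the code from the CL: the function name, both constants, and the wording of the cancellation message are hypothetical. Only the 'cbuildbot failed' default comes from this thread.

```python
# Hypothetical names throughout; this is not chromite's actual API.
DEFAULT_FAILURE_MESSAGE = "cbuildbot failed"
CANCELED_MESSAGE = "canceled by master self-destruction"  # assumed wording


def summarize_slave(error_messages, canceled_by_master):
    """Pick the one-line summary reported for a failed slave build."""
    if error_messages:
        # Real errors always win over the fallbacks.
        return "; ".join(error_messages)
    if canceled_by_master:
        # No errors recorded, but the master canceled the build.
        return CANCELED_MESSAGE
    # Nothing else is known about the failure.
    return DEFAULT_FAILURE_MESSAGE
```

Checking for real error messages first means a slave that failed on its own before being canceled still reports its own error.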
,
Nov 2 2017
https://luci-milo.appspot.com/buildbot/chromeos/master-paladin/16789 does not seem to be working as expected.
,
Nov 2 2017
Saw failures in the report stage.
https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F16789%2F%2B%2F%2A%2A%2Fstdout&s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F16789%2F%2B%2F%2A%2A%2Fstderr

  File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (OperationalError) (1054, "Unknown column 'None' in 'where clause'") 'SELECT build_id, build_config, waterfall, builder_name, build_number, message_type, message_subtype, message_value, timestamp, board FROM buildMessageTable c JOIN buildTable b ON build_id = b.id WHERE build_id = None' ()
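
The `WHERE build_id = None` in that statement suggests a Python `None` ended up in the SQL text, where the database reads it as a column name. The thread does not show how the query was built, so the following is only a small reproduction of that class of error. It uses the standard-library `sqlite3` module as a stand-in for MySQL, and a cut-down, hypothetical two-column version of `buildMessageTable`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the real table; the columns here are illustrative.
conn.execute(
    "CREATE TABLE buildMessageTable (build_id INTEGER, message_value TEXT)")
conn.execute("INSERT INTO buildMessageTable VALUES (1, 'example')")

build_id = None  # the value the failing query was effectively given

# Formatting None into the statement produces "... WHERE build_id = None",
# and the database parses the bare word None as a column reference.
try:
    conn.execute(
        "SELECT message_value FROM buildMessageTable WHERE build_id = %s"
        % build_id)
    interpolation_error = None
except sqlite3.OperationalError as e:
    interpolation_error = str(e)

# Binding the value instead sends SQL NULL. "build_id = NULL" matches no
# rows, so the query runs without error and returns an empty result.
bound_rows = conn.execute(
    "SELECT message_value FROM buildMessageTable WHERE build_id = ?",
    (build_id,)).fetchall()
```

Binding avoids the crash but silently returns nothing, so a caller with no build id may still want to skip the query entirely.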
,
Nov 2 2017
,
Nov 2 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/97475597f4f8bdb22eea74395440e4becd15cc6b

commit 97475597f4f8bdb22eea74395440e4becd15cc6b
Author: Ningning Xia <nxia@chromium.org>
Date: Thu Nov 02 04:28:02 2017

builder_status_lib: fix GetBuilderStatusFromCIDB and slave builder logs.

1) when the CQ master called GetBuilderStatusFromCIDB to create BuilderStatus, it failed at AbortedBySelfDestruction as master_build_id is None.
2) Check if slaves were aborted by self-destruction when master is creating slave messages in SlaveBuilderStatus._GetMessage.

BUG= chromium:768313
TEST=unit_tests

Change-Id: I5d779969f448e04868af3f2682678a43159bbf8d
Reviewed-on: https://chromium-review.googlesource.com/750053
Trybot-Ready: Ningning Xia <nxia@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Commit-Queue: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/97475597f4f8bdb22eea74395440e4becd15cc6b/lib/builder_status_lib.py
[modify] https://crrev.com/97475597f4f8bdb22eea74395440e4becd15cc6b/lib/builder_status_lib_unittest.py
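
Point 1 of this commit amounts to not running the self-destruction check when there is no master build id to query with. A minimal sketch of such a guard follows. It is not the code from the CL: the function signature, the `fetch_messages` callable, and the `"self_destruction"` subtype string are all assumptions made for illustration. The `message_subtype` and `message_value` keys mirror column names in the failing query above.

```python
def aborted_by_self_destruction(fetch_messages, build_id, master_build_id):
    """Return True if the master recorded a self-destruction message for build_id.

    fetch_messages is a hypothetical callable that takes a master build id and
    returns a list of dicts with 'message_subtype' and 'message_value' keys.
    """
    if build_id is None or master_build_id is None:
        # Nothing to look up; skip the database instead of querying with None.
        return False
    return any(
        m.get("message_subtype") == "self_destruction"  # assumed subtype name
        and m.get("message_value") == str(build_id)
        for m in fetch_messages(master_build_id)
    )
```

Returning False here treats "no master" as "not canceled by a master", which matches the case in the commit message where the CQ master itself has no master_build_id.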
,
Nov 14 2017
,
Nov 14 2017
,
Jan 22 2018
,
Jan 23 2018
Comment 1 by chinyue@chromium.org
, Sep 25 2017