
Issue 768313

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



All slave cbuildbots are reported as failing when in fact only one fails.

Project Member Reported by chinyue@chromium.org, Sep 25 2017

Issue description

https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/16385

master-paladin kept failing in CommitQueueCompletion with error:

The master destructed itself and stopped waiting for the following slaves


I took a look at some of the failed builders and they were in an exception state. Most of them failed in the UnitTest step; their logs were incomplete and showed no error. It looks like they were interrupted, so the issue is more likely in master-paladin itself.

I'm going to close the tree as the CQ builders are not likely to succeed.

 
The master-paladin's report says:

FAIL CommitQueueCompletion (0:15:16) with ImportantBuilderFailedException


And the slave builders are:

wolf-paladin: cbuildbot failed
betty-paladin: cbuildbot failed
veyron_rialto-paladin: cbuildbot failed
caroline-paladin: cbuildbot failed
kevin-paladin: cbuildbot failed
whirlwind-paladin: cbuildbot failed
lumpy-paladin: cbuildbot failed
falco-full-compile-paladin: cbuildbot failed
parrot-paladin: cbuildbot failed
tidus-paladin: cbuildbot failed
monroe-paladin: cbuildbot failed
guado-paladin: cbuildbot failed
stumpy-paladin: cbuildbot failed
veyron_mighty-paladin: cbuildbot failed
zoombini-paladin: cbuildbot failed
glados-paladin: cbuildbot failed
amd64-generic-paladin: cbuildbot failed
betty-arc64-paladin: cbuildbot failed
cave-paladin: cbuildbot failed
fizz-paladin: cbuildbot failed
oak-paladin: cbuildbot failed
veyron_speedy-paladin: cbuildbot failed
scarlet-paladin: cbuildbot failed
stout-paladin: cbuildbot failed
nyan_kitty-paladin: cbuildbot failed
hana-paladin: cbuildbot failed
eve-paladin: cbuildbot failed
auron_yuna-paladin: cbuildbot failed
daisy_skate-paladin: cbuildbot failed
sentry-paladin: cbuildbot failed
quawks-paladin: cbuildbot failed
daisy-paladin: cbuildbot failed
kip-paladin: cbuildbot failed
leon-paladin: cbuildbot failed
veyron_jaq-paladin: cbuildbot failed
chell-nowithdebug-paladin: cbuildbot failed
daisy_spring-paladin: cbuildbot failed
samus-paladin: cbuildbot failed
veyron_minnie-paladin: cbuildbot failed
panther-paladin: cbuildbot failed
wizpig-paladin: cbuildbot failed
reef-paladin: cbuildbot failed
peach_pit-paladin: cbuildbot failed
veyron_tiger-paladin: cbuildbot failed
lakitu-paladin: cbuildbot failed
winky-paladin: cbuildbot failed
tricky-paladin: cbuildbot failed
nyan-full-compile-paladin: cbuildbot failed
elm-paladin: cbuildbot failed
link-paladin: cbuildbot failed
chell-paladin: cbuildbot failed
coral-paladin: The BuildImage stage failed: ./build_image failed (code=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
edgar-paladin: cbuildbot failed
poppy-paladin: cbuildbot failed
falco-paladin: cbuildbot failed
bob-paladin: cbuildbot failed
veyron_jerry-paladin: cbuildbot failed
cyan-paladin: cbuildbot failed
reef-uni-paladin: cbuildbot failed
peppy-paladin: cbuildbot failed
butterfly-paladin: cbuildbot failed
nyan_big-paladin: cbuildbot failed
guado_moblab-paladin: cbuildbot failed
gale-paladin: cbuildbot failed
arm-generic-paladin: cbuildbot failed


Was master-paladin affected because coral-paladin failed?
If so, then we need to fix the coral issue 768280.

Comment 2 by xixuan@chromium.org, Sep 25 2017

Status: WontFix (was: Untriaged)
yes
Owner: xixuan@chromium.org
Status: Available (was: WontFix)
Summary: All slave cbuildbots are reported as failing when in fact only one fails. (was: master-paladin failed in CommitQueueCompletion)
Changing the name to better reflect the issue. 

Reporting all cbuildbots as failing is confusing. I'm not sure what can be done here, though. Can they all keep going and report success?

Assigning to xixuan@ for proper routing.

Comment 4 by xixuan@chromium.org, Sep 25 2017

Cc: xixuan@chromium.org
Owner: nxia@chromium.org

Comment 5 by xixuan@chromium.org, Sep 25 2017

Cc: vbendeb@chromium.org

Comment 6 by nxia@chromium.org, Sep 25 2017

Cc: nxia@chromium.org
Owner: jkop@chromium.org
This was affected by jkop@'s recent changes. As I commented in #44 (https://bugs.chromium.org/p/chromium/issues/detail?id=753189#c44), the logging should be changed.

Comment 7 by jkop@chromium.org, Sep 25 2017

Yeah, accidentally pulled my fix out of the most recent CL while I was changing other aspects of it. Will get that addressed.

Comment 8 by jkop@chromium.org, Sep 25 2017

Status: Started (was: Available)
Investigated. 'cbuildbot failed' is the default message if there aren't any error messages reported for a build but it's failing anyway.

One option is to add a "Master self-destructed" error class and add it to each canceled build. This would probably have negative side effects, such as spuriously blaming CLs in builds that were canceled. But it's an option.

Another would be to mark canceled builds as ABORTED, which apparently doesn't happen by default in CIDB when builds are canceled in Buildbucket. This may be tricky but would be my preferred option.

Another would be to change the default message to reflect cancellation being more likely. This is less confusing but not any more useful.
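
To make the second option concrete, here is a minimal Python sketch. It is not chromite code: the status strings ("fail", "pass", "aborted"), the Buildbucket result string "CANCELED", and the helper names `reconcile_status` and `describe_build` are all assumptions made up for illustration.

```python
# Hypothetical sketch: none of these names are real chromite APIs.
DEFAULT_FAILURE_MESSAGE = "cbuildbot failed"


def reconcile_status(cidb_status, buildbucket_result):
    """Return the status to record for a build.

    Assumption: a build that Buildbucket reports as canceled, and that
    did not already pass, should be recorded as aborted.
    """
    if buildbucket_result == "CANCELED" and cidb_status != "pass":
        return "aborted"
    return cidb_status


def describe_build(status, error_messages):
    """Return the text a master report would show for one slave build."""
    if status == "aborted":
        return "aborted (canceled before completion)"
    if error_messages:
        return "; ".join(error_messages)
    # No recorded errors and not aborted: fall back to the default text.
    return DEFAULT_FAILURE_MESSAGE
```

With this shape, a canceled slave that has no recorded errors is reported as aborted instead of falling through to the default 'cbuildbot failed' text, while a slave with a real error keeps its own message.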

Comment 9 by bugdroid1@chromium.org, Sep 26 2017

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chrome/tools/build/+/5d3148bd66e2abfc3c4908d0ad7fdf37203b9b50

commit 5d3148bd66e2abfc3c4908d0ad7fdf37203b9b50
Author: Ningning Xia <nxia@google.com>
Date: Tue Sep 26 19:13:55 2017

Comment 10 by nxia@chromium.org, Sep 26 2017

#9 is actually for crbug.com/753189. Sorry for the confusion.

Comment 11 by bugdroid1@chromium.org, Nov 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/a4e877eee3974f2b1194588c590bb84e4772f9ed

commit a4e877eee3974f2b1194588c590bb84e4772f9ed
Author: Jacob Kopczynski <jkop@google.com>
Date: Wed Nov 01 00:31:06 2017

Make build message for cancelled slaves useful

When slaves have no listed error messages, check CIDB for BuildMessages
 and pass back a cancellation error if it was canceled by master
 self-destruction
Split the check for cancellation by master into a lib function to factor
 it out for cases where CIDB is not connected.
 This also improves test separation.

TEST=tryjob
BUG= chromium:768313 

Change-Id: I276763a76a68139d0b4db00772083fe646c78f9f
Reviewed-on: https://chromium-review.googlesource.com/692980
Commit-Ready: Jacob Kopczynski <jkop@chromium.org>
Tested-by: Jacob Kopczynski <jkop@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/a4e877eee3974f2b1194588c590bb84e4772f9ed/lib/builder_status_lib.py
[modify] https://crrev.com/a4e877eee3974f2b1194588c590bb84e4772f9ed/lib/fake_cidb.py
[modify] https://crrev.com/a4e877eee3974f2b1194588c590bb84e4772f9ed/cbuildbot/stages/generic_stages.py
[modify] https://crrev.com/a4e877eee3974f2b1194588c590bb84e4772f9ed/lib/builder_status_lib_unittest.py
[modify] https://crrev.com/a4e877eee3974f2b1194588c590bb84e4772f9ed/cbuildbot/stages/report_stages_unittest.py
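
The approach described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from the CL: the `BuildMessage` tuple, the message type and subtype strings, and the function names `canceled_by_master` and `slave_failure_message` are assumptions, as is the idea that the message value carries the canceled slave's build id.

```python
from typing import Iterable, List, NamedTuple

# Hypothetical message type/subtype values used only in this sketch.
IGNORED_REASON = "ignored_reason"
SELF_DESTRUCTION = "self_destruction"


class BuildMessage(NamedTuple):
    build_id: int          # build that wrote the message (here: the master)
    message_type: str
    message_subtype: str
    message_value: str     # assumed to hold the affected slave's build id


def canceled_by_master(master_messages: Iterable[BuildMessage],
                       slave_build_id: int) -> bool:
    """Return True if the master recorded canceling this slave build."""
    return any(
        m.message_type == IGNORED_REASON
        and m.message_subtype == SELF_DESTRUCTION
        and m.message_value == str(slave_build_id)
        for m in master_messages
    )


def slave_failure_message(error_messages: List[str],
                          master_messages: Iterable[BuildMessage],
                          slave_build_id: int) -> str:
    """Pick the text to report for one failed slave build."""
    if error_messages:
        return "; ".join(error_messages)
    if canceled_by_master(master_messages, slave_build_id):
        return "canceled by master self-destruction"
    return "cbuildbot failed"


# Example data: master build 100 canceled slave build 201.
example_messages = [BuildMessage(100, IGNORED_REASON, SELF_DESTRUCTION, "201")]
```

Keeping the cancellation check in its own function means it can be exercised with plain in-memory messages, without a database connection.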

Comment 13 by nxia@chromium.org, Nov 2 2017

Saw failures in the report stage.

https://logs.chromium.org/v/?s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F16789%2F%2B%2F%2A%2A%2Fstdout&s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F16789%2F%2B%2F%2A%2A%2Fstderr

  File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (OperationalError) (1054, "Unknown column 'None' in 'where clause'") 'SELECT build_id, build_config, waterfall, builder_name, build_number, message_type, message_subtype, message_value, timestamp, board FROM buildMessageTable c JOIN buildTable b ON build_id = b.id  WHERE build_id = None' ()
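
The error text is consistent with a Python None being formatted straight into the SQL string, so the database reads `None` as a column name. The sketch below reproduces that class of failure and shows one possible guard. It uses the standard-library sqlite3 module as a stand-in for MySQL, a cut-down table, and a hypothetical `get_build_messages` helper, so it is an illustration and not the actual CIDB code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for buildMessageTable; the real table has more columns.
conn.execute(
    "CREATE TABLE buildMessageTable ("
    "build_id INTEGER, message_type TEXT, message_subtype TEXT)"
)
conn.execute(
    "INSERT INTO buildMessageTable VALUES (1, 'ignored_reason', 'self_destruction')"
)

# Formatting None into the query text yields "... WHERE build_id = None",
# which the database parses as a reference to a column called None.
error_text = None
try:
    conn.execute("SELECT * FROM buildMessageTable WHERE build_id = %s" % None)
except sqlite3.OperationalError as exc:
    error_text = str(exc)


def get_build_messages(connection, build_id):
    """Hypothetical helper: return (type, subtype) rows for a build.

    Returns an empty list when there is no build id to look up, and
    binds the id as a query parameter instead of formatting it in.
    """
    if build_id is None:
        return []
    cursor = connection.execute(
        "SELECT message_type, message_subtype "
        "FROM buildMessageTable WHERE build_id = ?",
        (build_id,),
    )
    return cursor.fetchall()
```

Calling `get_build_messages(conn, None)` returns an empty list without touching the database, which is the kind of early exit needed when no master build id is available.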


Comment 14 by nxia@chromium.org, Nov 2 2017

Cc: ayatane@chromium.org

Comment 15 by bugdroid1@chromium.org, Nov 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/97475597f4f8bdb22eea74395440e4becd15cc6b

commit 97475597f4f8bdb22eea74395440e4becd15cc6b
Author: Ningning Xia <nxia@chromium.org>
Date: Thu Nov 02 04:28:02 2017

builder_status_lib: fix GetBuilderStatusFromCIDB and slave builder logs.

1) when the CQ master called GetBuilderStatusFromCIDB to create
BuilderStatus, it failed at AbortedBySelfDestruction as master_build_id
is None.

2) Check if slaves were aborted by self-destruction when master is
creating slave messages in SlaveBuilderStatus._GetMessage.

BUG= chromium:768313 
TEST=unit_tests

Change-Id: I5d779969f448e04868af3f2682678a43159bbf8d
Reviewed-on: https://chromium-review.googlesource.com/750053
Trybot-Ready: Ningning Xia <nxia@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Commit-Queue: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/97475597f4f8bdb22eea74395440e4becd15cc6b/lib/builder_status_lib.py
[modify] https://crrev.com/97475597f4f8bdb22eea74395440e4becd15cc6b/lib/builder_status_lib_unittest.py

Comment 16 by jkop@chromium.org, Nov 14 2017

Owner: nxia@chromium.org

Comment 17 by nxia@chromium.org, Nov 14 2017

Status: Fixed (was: Started)

Comment 18 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Comment 19 by dchan@chromium.org, Jan 23 2018

Status: Fixed (was: Archived)
