New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 707894 link

Starred by 2 users

Issue metadata

Status: Archived
Owner:
Last visit > 30 days ago
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Master builder should list ignored slaves in build_message table.

Project Member Reported by dgarr...@chromium.org, Apr 3 2017

Issue description

Whenever a CQ master builder self-destructs, it should explicitly list information about which slaves it is ignoring / aborting in the build_messages table in CIDB.
 
To confirm, these buildMessageTable rows will have build_id == slave, right?

Comment 2 by aut...@google.com, Apr 4 2017

Labels: -current-issue

Comment 3 by nxia@chromium.org, Apr 7 2017

Plan to insert the value (build_id = 'slave_build_id', message_type = 'abort_reason', message_value='self_destruction'). any opinions?
I think it would be useful to somehow distinguish between "early success" and "early failure" in the message value.

Comment 5 by nxia@chromium.org, Apr 7 2017

Re#4 that information could be got by querying the build status of the master build. 
Not quite. It's possible for the master to abort slaves thinking it's passing, then fail during a cleanup/report stage.

Comment 7 by nxia@chromium.org, Apr 12 2017

After discussion in the gerrit thread, we decided to insert the message in the format of ( build_id=master_build_id, message_type='abort_slave', message_subtype='self_destruction', mesage_value=slave_build_id).

Re #6, this happens rarely, but if it's required, a single message indicating 'early success' or 'early failure' can be inserted in the master completion stage, as the message is related to the master-build, not the ignored-slave-builds. 
Project Member

Comment 8 by bugdroid1@chromium.org, Apr 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/c034a38e7c53335bf844b4912aa54560739c3a1d

commit c034a38e7c53335bf844b4912aa54560739c3a1d
Author: Ningning Xia <nxia@chromium.org>
Date: Thu Apr 13 22:19:01 2017

When CQ-master destructs itself, it needs to record ignored slaves.

When self-destruction is triggered, for each uncompleted slave builds,
the master build will insert a message with the slave build_id as the
value into the buildMessageTable. The slave builds may pass, fail or
abort after the master destructs itself.

BUG= chromium:707894 
TEST=unit_tests

Change-Id: I75e4d946cd51600343ebda94fe58438a6837d159
Reviewed-on: https://chromium-review.googlesource.com/471849
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/c034a38e7c53335bf844b4912aa54560739c3a1d/cbuildbot/build_status_unittest.py
[modify] https://crrev.com/c034a38e7c53335bf844b4912aa54560739c3a1d/lib/constants.py
[modify] https://crrev.com/c034a38e7c53335bf844b4912aa54560739c3a1d/lib/fake_cidb.py
[modify] https://crrev.com/c034a38e7c53335bf844b4912aa54560739c3a1d/cbuildbot/build_status.py

Project Member

Comment 9 by bugdroid1@chromium.org, Apr 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/431f8c3226a05f835a2ae29dba23b5387c56c816

commit 431f8c3226a05f835a2ae29dba23b5387c56c816
Author: Ningning Xia <nxia@chromium.org>
Date: Thu Apr 13 22:19:01 2017

Do not mark slaves as 'aborted' in the master CleanUpStage

After the master sends out the 'cancel' requests to Buildbucket in the
CleanUpStage, the slaves may pass or fail before they really get aborted.
Marking the slaves as aborted in the master CleanUpStage may cause
inconsistency in CIDB (see details in  crbug.com/707993  and
 crbug.com/708392 ). Instead of marking slaves as 'aborted' in master, the
master will insert ignored slaves in the buildMessageTable when it
destructs itself. crbug.com/707897 will update the final status for old
builds from inflight to aborted/expired.

BUG= chromium:707894 
TEST=unit_tests

Change-Id: I612727dbb99df1723abbb94b4828a50de8421b22
Reviewed-on: https://chromium-review.googlesource.com/476023
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/431f8c3226a05f835a2ae29dba23b5387c56c816/cbuildbot/stages/build_stages.py
[modify] https://crrev.com/431f8c3226a05f835a2ae29dba23b5387c56c816/cbuildbot/stages/build_stages_unittest.py

Project Member

Comment 10 by bugdroid1@chromium.org, Apr 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/23ce57b72ab9afd36fedfc18635648989c5a9260

commit 23ce57b72ab9afd36fedfc18635648989c5a9260
Author: Ningning Xia <nxia@chromium.org>
Date: Fri Apr 14 01:48:33 2017

Revert "When CQ-master destructs itself, it needs to record ignored slaves."

This reverts commit c034a38e7c53335bf844b4912aa54560739c3a1d.

Reason for revert: chromium:711554

Original change's description:
> When CQ-master destructs itself, it needs to record ignored slaves.
> 
> When self-destruction is triggered, for each uncompleted slave builds,
> the master build will insert a message with the slave build_id as the
> value into the buildMessageTable. The slave builds may pass, fail or
> abort after the master destructs itself.
> 
> BUG= chromium:707894 
> TEST=unit_tests
> 
> Change-Id: I75e4d946cd51600343ebda94fe58438a6837d159
> Reviewed-on: https://chromium-review.googlesource.com/471849
> Commit-Ready: Ningning Xia <nxia@chromium.org>
> Tested-by: Ningning Xia <nxia@chromium.org>
> Reviewed-by: Aviv Keshet <akeshet@chromium.org>
> 

TBR=dgarrett@chromium.org,akeshet@chromium.org,davidriley@chromium.org,nxia@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= chromium:707894 

Change-Id: Iee1c46f8ab205aa0e5a6fb8aa415f09888c1ce9a
Reviewed-on: https://chromium-review.googlesource.com/477440
Reviewed-by: Don Garrett <dgarrett@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/23ce57b72ab9afd36fedfc18635648989c5a9260/cbuildbot/build_status_unittest.py
[modify] https://crrev.com/23ce57b72ab9afd36fedfc18635648989c5a9260/lib/constants.py
[modify] https://crrev.com/23ce57b72ab9afd36fedfc18635648989c5a9260/lib/fake_cidb.py
[modify] https://crrev.com/23ce57b72ab9afd36fedfc18635648989c5a9260/cbuildbot/build_status.py

What's the status of getting this in?

Comment 12 by nxia@chromium.org, Apr 25 2017

The new CL is here:

https://chromium-review.googlesource.com/c/478414/

was deputy last week. I'll get it in this week.
Project Member

Comment 13 by bugdroid1@chromium.org, Apr 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/63de9457827b460d4565729e4143dcb788bbca0c

commit 63de9457827b460d4565729e4143dcb788bbca0c
Author: Ningning Xia <nxia@chromium.org>
Date: Tue Apr 25 23:32:52 2017

When CQ-master destructs itself, it needs to record ignored slaves.

When self-destruction is triggered, for each uncompleted slave builds,
the master build will insert a message with the slave build_id as the
value into the buildMessageTable. The slave builds may pass, fail or
abort after the master destructs itself.

BUG= chromium:707894 ;chromium:711554
TEST=unit_tests

Change-Id: I2083be86f8590a0ef9f4a5b4c4d29a472e09fbb3
Reviewed-on: https://chromium-review.googlesource.com/478414
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/63de9457827b460d4565729e4143dcb788bbca0c/cbuildbot/build_status_unittest.py
[modify] https://crrev.com/63de9457827b460d4565729e4143dcb788bbca0c/lib/constants.py
[modify] https://crrev.com/63de9457827b460d4565729e4143dcb788bbca0c/lib/fake_cidb.py
[modify] https://crrev.com/63de9457827b460d4565729e4143dcb788bbca0c/cbuildbot/build_status.py

Comment 14 by nxia@chromium.org, Apr 26 2017

The change was in and here's an example.
Please note the message_value is a str and not indexed.

mysql> select * from buildMessageTable where build_id=1476161;
+---------+----------+----------------+------------------+---------------+---------------------+-------+
| id      | build_id | message_type   | message_subtype  | message_value | timestamp           | board |
+---------+----------+----------------+------------------+---------------+---------------------+-------+
| 1086620 |  1476161 | ignored_reason | self_destruction | 1476180       | 2017-04-26 12:05:43 | NULL  |
| 1086621 |  1476161 | ignored_reason | self_destruction | 1476191       | 2017-04-26 12:05:43 | NULL  |
| 1086622 |  1476161 | ignored_reason | self_destruction | 1476182       | 2017-04-26 12:05:43 | NULL  |
| 1086623 |  1476161 | ignored_reason | self_destruction | 1476181       | 2017-04-26 12:05:43 | NULL  |
| 1086624 |  1476161 | ignored_reason | self_destruction | 1476195       | 2017-04-26 12:05:44 | NULL  |
| 1086625 |  1476161 | ignored_reason | self_destruction | 1476184       | 2017-04-26 12:05:44 | NULL  |
| 1086626 |  1476161 | ignored_reason | self_destruction | 1476186       | 2017-04-26 12:05:44 | NULL  |
| 1086627 |  1476161 | ignored_reason | self_destruction | 1476196       | 2017-04-26 12:05:44 | NULL  |

Does anyone know about creating a view where we merge buildTable with buildMessageTable.  I want a view for Viceroy/GoldenEye/SoM which has a correctified finish_time and status based on the values from this table.
If we had such a view, it could also show build status as timed out instead of in-flight, after timeouts have been reached.

Comment 17 by nxia@chromium.org, Apr 26 2017

Joining buildTable and buildMessageTable on master_build_id can give the information. A view also sounds not bad if this view will be used by many components outside CrOs infra. If we agree on the new view, we can file a separate bug.

But I think the view won't show timed out instead of in-flight, it's just a readonly table of joined tables. 
The view can calculate the value for any given column using arbitrarily complex expressions. That can be used to show timed out for the build status.

Pro:
  "Timed out" is self managing. No cleanup cron job, no lag in updating status.

Cons:
  The status can be different between the view and table access, giving inconsistent results depending on where you look.
  Could impact performance (we lose indexing on build status).

I think we should consider this a table where you want to find the status of some well defined set of builds.  I'm okay having a limited set of indexes to provide a consistent view of this information when we want to know canonically the status of a build.  
Project Member

Comment 20 by bugdroid1@chromium.org, May 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/fa841a7164780cadc6b489555cd20b26d87d39c8

commit fa841a7164780cadc6b489555cd20b26d87d39c8
Author: David Riley <davidriley@chromium.org>
Date: Wed May 03 03:24:17 2017

SoM: Mark self destructing builds.

Use data from CIDB buildMessageTable to indicate if builds were
self destructed and attempt to reduce the noise in SoM because of
them.

BUG= chromium:707894 
TEST=som_alerts_dispatcher

Change-Id: Iac97f65bf3c6e2df7d7a9bd33c3ead36b70f274d
Reviewed-on: https://chromium-review.googlesource.com/492147
Commit-Ready: David Riley <davidriley@chromium.org>
Tested-by: David Riley <davidriley@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/fa841a7164780cadc6b489555cd20b26d87d39c8/lib/constants.py
[modify] https://crrev.com/fa841a7164780cadc6b489555cd20b26d87d39c8/scripts/som_alerts_dispatcher.py

Project Member

Comment 21 by bugdroid1@chromium.org, May 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/6f67d8a3ed9b9efb2ea1e502911356137136ef7b

commit 6f67d8a3ed9b9efb2ea1e502911356137136ef7b
Author: Ningning Xia <nxia@chromium.org>
Date: Wed May 03 03:24:19 2017

Add info log for failed build stage.

Before recording a build stage as 'fail' in CIDB, print logging of the
stage result.

BUG= chromium:707894 
TEST=unit_tests

Change-Id: I818f60282dbf00d6d64607887aeec71b52d959d7
Reviewed-on: https://chromium-review.googlesource.com/494226
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: David Riley <davidriley@chromium.org>

[modify] https://crrev.com/6f67d8a3ed9b9efb2ea1e502911356137136ef7b/cbuildbot/stages/generic_stages.py

Comment 22 by nxia@chromium.org, May 12 2017

Status: Fixed (was: Untriaged)
Labels: VerifyIn-61

Comment 24 by dchan@chromium.org, Jan 22 2018

Status: Archived (was: Fixed)

Sign in to add a comment