Master builder should list ignored slaves in build_message table. |
|||||
Issue descriptionWhenever a CQ master builder self-destructs, it should explicitly list information about which slaves it is ignoring / aborting in the build_messages table in CIDB.
,
Apr 4 2017
,
Apr 7 2017
Plan to insert the value (build_id = 'slave_build_id', message_type = 'abort_reason', message_value='self_destruction'). any opinions?
,
Apr 7 2017
I think it would be useful to somehow distinguish between "early success" and "early failure" in the message value.
,
Apr 7 2017
Re#4 that information could be got by querying the build status of the master build.
,
Apr 7 2017
Not quite. It's possible for the master to abort slaves thinking it's passing, then fail during a cleanup/report stage.
,
Apr 12 2017
After discussion in the gerrit thread, we decided to insert the message in the format of ( build_id=master_build_id, message_type='abort_slave', message_subtype='self_destruction', mesage_value=slave_build_id). Re #6, this happens rarely, but if it's required, a single message indicating 'early success' or 'early failure' can be inserted in the master completion stage, as the message is related to the master-build, not the ignored-slave-builds.
,
Apr 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/c034a38e7c53335bf844b4912aa54560739c3a1d commit c034a38e7c53335bf844b4912aa54560739c3a1d Author: Ningning Xia <nxia@chromium.org> Date: Thu Apr 13 22:19:01 2017 When CQ-master destructs itself, it needs to record ignored slaves. When self-destruction is triggered, for each uncompleted slave builds, the master build will insert a message with the slave build_id as the value into the buildMessageTable. The slave builds may pass, fail or abort after the master destructs itself. BUG= chromium:707894 TEST=unit_tests Change-Id: I75e4d946cd51600343ebda94fe58438a6837d159 Reviewed-on: https://chromium-review.googlesource.com/471849 Commit-Ready: Ningning Xia <nxia@chromium.org> Tested-by: Ningning Xia <nxia@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> [modify] https://crrev.com/c034a38e7c53335bf844b4912aa54560739c3a1d/cbuildbot/build_status_unittest.py [modify] https://crrev.com/c034a38e7c53335bf844b4912aa54560739c3a1d/lib/constants.py [modify] https://crrev.com/c034a38e7c53335bf844b4912aa54560739c3a1d/lib/fake_cidb.py [modify] https://crrev.com/c034a38e7c53335bf844b4912aa54560739c3a1d/cbuildbot/build_status.py
,
Apr 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/431f8c3226a05f835a2ae29dba23b5387c56c816 commit 431f8c3226a05f835a2ae29dba23b5387c56c816 Author: Ningning Xia <nxia@chromium.org> Date: Thu Apr 13 22:19:01 2017 Do not mark slaves as 'aborted' in the master CleanUpStage After the master sends out the 'cancel' requests to Buildbucket in the CleanUpStage, the slaves may pass or fail before they really get aborted. Marking the slaves as aborted in the master CleanUpStage may cause inconsistency in CIDB (see details in crbug.com/707993 and crbug.com/708392 ). Instead of marking slaves as 'aborted' in master, the master will insert ignored slaves in the buildMessageTable when it destructs itself. crbug.com/707897 will update the final status for old builds from inflight to aborted/expired. BUG= chromium:707894 TEST=unit_tests Change-Id: I612727dbb99df1723abbb94b4828a50de8421b22 Reviewed-on: https://chromium-review.googlesource.com/476023 Commit-Ready: Ningning Xia <nxia@chromium.org> Tested-by: Ningning Xia <nxia@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> [modify] https://crrev.com/431f8c3226a05f835a2ae29dba23b5387c56c816/cbuildbot/stages/build_stages.py [modify] https://crrev.com/431f8c3226a05f835a2ae29dba23b5387c56c816/cbuildbot/stages/build_stages_unittest.py
,
Apr 14 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/23ce57b72ab9afd36fedfc18635648989c5a9260 commit 23ce57b72ab9afd36fedfc18635648989c5a9260 Author: Ningning Xia <nxia@chromium.org> Date: Fri Apr 14 01:48:33 2017 Revert "When CQ-master destructs itself, it needs to record ignored slaves." This reverts commit c034a38e7c53335bf844b4912aa54560739c3a1d. Reason for revert: chromium:711554 Original change's description: > When CQ-master destructs itself, it needs to record ignored slaves. > > When self-destruction is triggered, for each uncompleted slave builds, > the master build will insert a message with the slave build_id as the > value into the buildMessageTable. The slave builds may pass, fail or > abort after the master destructs itself. > > BUG= chromium:707894 > TEST=unit_tests > > Change-Id: I75e4d946cd51600343ebda94fe58438a6837d159 > Reviewed-on: https://chromium-review.googlesource.com/471849 > Commit-Ready: Ningning Xia <nxia@chromium.org> > Tested-by: Ningning Xia <nxia@chromium.org> > Reviewed-by: Aviv Keshet <akeshet@chromium.org> > TBR=dgarrett@chromium.org,akeshet@chromium.org,davidriley@chromium.org,nxia@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= chromium:707894 Change-Id: Iee1c46f8ab205aa0e5a6fb8aa415f09888c1ce9a Reviewed-on: https://chromium-review.googlesource.com/477440 Reviewed-by: Don Garrett <dgarrett@chromium.org> Tested-by: Ningning Xia <nxia@chromium.org> [modify] https://crrev.com/23ce57b72ab9afd36fedfc18635648989c5a9260/cbuildbot/build_status_unittest.py [modify] https://crrev.com/23ce57b72ab9afd36fedfc18635648989c5a9260/lib/constants.py [modify] https://crrev.com/23ce57b72ab9afd36fedfc18635648989c5a9260/lib/fake_cidb.py [modify] https://crrev.com/23ce57b72ab9afd36fedfc18635648989c5a9260/cbuildbot/build_status.py
,
Apr 25 2017
What's the status of getting this in?
,
Apr 25 2017
The new CL is here: https://chromium-review.googlesource.com/c/478414/ was deputy last week. I'll get it in this week.
,
Apr 25 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/63de9457827b460d4565729e4143dcb788bbca0c commit 63de9457827b460d4565729e4143dcb788bbca0c Author: Ningning Xia <nxia@chromium.org> Date: Tue Apr 25 23:32:52 2017 When CQ-master destructs itself, it needs to record ignored slaves. When self-destruction is triggered, for each uncompleted slave builds, the master build will insert a message with the slave build_id as the value into the buildMessageTable. The slave builds may pass, fail or abort after the master destructs itself. BUG= chromium:707894 ;chromium:711554 TEST=unit_tests Change-Id: I2083be86f8590a0ef9f4a5b4c4d29a472e09fbb3 Reviewed-on: https://chromium-review.googlesource.com/478414 Commit-Ready: Ningning Xia <nxia@chromium.org> Tested-by: Ningning Xia <nxia@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> [modify] https://crrev.com/63de9457827b460d4565729e4143dcb788bbca0c/cbuildbot/build_status_unittest.py [modify] https://crrev.com/63de9457827b460d4565729e4143dcb788bbca0c/lib/constants.py [modify] https://crrev.com/63de9457827b460d4565729e4143dcb788bbca0c/lib/fake_cidb.py [modify] https://crrev.com/63de9457827b460d4565729e4143dcb788bbca0c/cbuildbot/build_status.py
,
Apr 26 2017
The change was in and here's an example. Please note the message_value is a str and not indexed. mysql> select * from buildMessageTable where build_id=1476161; +---------+----------+----------------+------------------+---------------+---------------------+-------+ | id | build_id | message_type | message_subtype | message_value | timestamp | board | +---------+----------+----------------+------------------+---------------+---------------------+-------+ | 1086620 | 1476161 | ignored_reason | self_destruction | 1476180 | 2017-04-26 12:05:43 | NULL | | 1086621 | 1476161 | ignored_reason | self_destruction | 1476191 | 2017-04-26 12:05:43 | NULL | | 1086622 | 1476161 | ignored_reason | self_destruction | 1476182 | 2017-04-26 12:05:43 | NULL | | 1086623 | 1476161 | ignored_reason | self_destruction | 1476181 | 2017-04-26 12:05:43 | NULL | | 1086624 | 1476161 | ignored_reason | self_destruction | 1476195 | 2017-04-26 12:05:44 | NULL | | 1086625 | 1476161 | ignored_reason | self_destruction | 1476184 | 2017-04-26 12:05:44 | NULL | | 1086626 | 1476161 | ignored_reason | self_destruction | 1476186 | 2017-04-26 12:05:44 | NULL | | 1086627 | 1476161 | ignored_reason | self_destruction | 1476196 | 2017-04-26 12:05:44 | NULL |
,
Apr 26 2017
Does anyone know about creating a view where we merge buildTable with buildMessageTable. I want a view for Viceroy/GoldenEye/SoM which has a correctified finish_time and status based on the values from this table.
,
Apr 26 2017
If we had such a view, it could also show build status as timed out instead of in-flight, after timeouts have been reached.
,
Apr 26 2017
Joining buildTable and buildMessageTable on master_build_id can give the information. A view also sounds not bad if this view will be used by many components outside CrOs infra. If we agree on the new view, we can file a separate bug. But I think the view won't show timed out instead of in-flight, it's just a readonly table of joined tables.
,
Apr 26 2017
The view can calculate the value for any given column using arbitrarily complex expressions. That can be used to show timed out for the build status. Pro: "Timed out" is self managing. No cleanup cron job, no lag in updating status. Cons: The status can be different between the view and table access, giving inconsistent results depending on where you look. Could impact performance (we lose indexing on build status).
,
Apr 26 2017
I think we should consider this a table where you want to find the status of some well defined set of builds. I'm okay having a limited set of indexes to provide a consistent view of this information when we want to know canonically the status of a build.
,
May 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/fa841a7164780cadc6b489555cd20b26d87d39c8 commit fa841a7164780cadc6b489555cd20b26d87d39c8 Author: David Riley <davidriley@chromium.org> Date: Wed May 03 03:24:17 2017 SoM: Mark self destructing builds. Use data from CIDB buildMessageTable to indicate if builds were self destructed and attempt to reduce the noise in SoM because of them. BUG= chromium:707894 TEST=som_alerts_dispatcher Change-Id: Iac97f65bf3c6e2df7d7a9bd33c3ead36b70f274d Reviewed-on: https://chromium-review.googlesource.com/492147 Commit-Ready: David Riley <davidriley@chromium.org> Tested-by: David Riley <davidriley@chromium.org> Reviewed-by: Ningning Xia <nxia@chromium.org> [modify] https://crrev.com/fa841a7164780cadc6b489555cd20b26d87d39c8/lib/constants.py [modify] https://crrev.com/fa841a7164780cadc6b489555cd20b26d87d39c8/scripts/som_alerts_dispatcher.py
,
May 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/6f67d8a3ed9b9efb2ea1e502911356137136ef7b commit 6f67d8a3ed9b9efb2ea1e502911356137136ef7b Author: Ningning Xia <nxia@chromium.org> Date: Wed May 03 03:24:19 2017 Add info log for failed build stage. Before recording a build stage as 'fail' in CIDB, print logging of the stage result. BUG= chromium:707894 TEST=unit_tests Change-Id: I818f60282dbf00d6d64607887aeec71b52d959d7 Reviewed-on: https://chromium-review.googlesource.com/494226 Commit-Ready: Ningning Xia <nxia@chromium.org> Tested-by: Ningning Xia <nxia@chromium.org> Reviewed-by: David Riley <davidriley@chromium.org> [modify] https://crrev.com/6f67d8a3ed9b9efb2ea1e502911356137136ef7b/cbuildbot/stages/generic_stages.py
,
May 12 2017
,
Aug 1 2017
,
Jan 22 2018
|
|||||
►
Sign in to add a comment |
|||||
Comment 1 by davidri...@chromium.org
, Apr 4 2017