Scripts: add mitigations against bad revisions in order to save CPU cycles
Issue description

Since we use master branch, we may experience any types of bad revisions:
- broken build
- build hangs indefinitely
- etc

We need to have some mitigations against such things.

Also copied some discussion from the chat:

bot 0005 seems to be running a bad revision, failed to build the following targets:

    $ cat /chromium/src/logs/_fuzz_and_test_targets_build_fail.log
    browser_tests
    chrome_app_unittests
    interactive_ui_tests
    keyboard_unittests
    sync_integration_tests
    unit_tests

Abhishek Arya, 13 mins, thanks for heads up

Max Moroz, 12 mins, not sure if I just simply restart it, or maybe we should re-start the loop automatically if too many targets not built

    $ cat ../coverage-bot/scripts/_bot.log | egrep '^###' | egrep 'Start|End'
    ### Start /home/coverage-bot/scripts/code_coverage_loop.bash at Tue Jun 19 02:57:48 UTC 2018
    ### Start /home/coverage-bot/scripts/build_targets.bash at Tue Jun 19 02:59:28 UTC 2018
    mmoroz@code-coverage-linux-0001:~$ date
    Tue Jun 19 16:09:02 UTC 2018

it's already building stuff for 13 hours, crazy

I guess that's just another bad revision... because another bot that was restarted 3 hours later is doing great:

    mmoroz@code-coverage-linux-0002:~$ cat ../coverage-bot/scripts/_bot.log | egrep '^###' | egrep 'Start|End'
    ### Start /home/coverage-bot/scripts/code_coverage_loop.bash at Tue Jun 19 06:04:52 UTC 2018
    ### Start /home/coverage-bot/scripts/build_targets.bash at Tue Jun 19 06:05:42 UTC 2018
    ### End /home/coverage-bot/scripts/build_targets.bash at Tue Jun 19 10:19:57 UTC 2018
    ### Start /home/coverage-bot/scripts/run_test_targets.bash at Tue Jun 19 10:19:57 UTC 2018
    ### End /home/coverage-bot/scripts/run_test_targets.bash at Tue Jun 19 12:53:08 UTC 2018
    ### Start /home/coverage-bot/scripts/run_fuzz_targets.bash at Tue Jun 19 12:53:08 UTC 2018

0003 is also stuck on building:

    $ cat ../coverage-bot/scripts/_bot.log | egrep '^###' | egrep 'Start|End'
    ### Start /home/coverage-bot/scripts/code_coverage_loop.bash at Tue Jun 19 00:30:08 UTC 2018
    ### Start /home/coverage-bot/scripts/build_targets.bash at Tue Jun 19 00:32:09 UTC 2018
    mmoroz@code-coverage-linux-0003:~$ date
    Tue Jun 19 16:10:38 UTC 2018

and 0004 as well:

    mmoroz@code-coverage-linux-0004:~$ cat ../coverage-bot/scripts/_bot.log | egrep '^###' | egrep 'Start|End'
    ### Start /home/coverage-bot/scripts/code_coverage_loop.bash at Mon Jun 18 19:42:55 UTC 2018
    ### Start /home/coverage-bot/scripts/build_targets.bash at Mon Jun 18 19:44:05 UTC 2018
    mmoroz@code-coverage-linux-0004:~$ date
    Tue Jun 19 16:11:04 UTC 2018

fun

Abhishek Arya, 8 mins, so, test target are hanging with that 3hr timeout ? and multiple ones

Max Moroz, 8 mins, no, build is hanging on 3 bots
on another 1 bot some targets failed to build
and only one bot (the latest I've recreated) is doing good

Max Moroz, 7 mins (edited), I'll take a quick look at the revisions range log

Abhishek Arya, 6 mins, lets setup a meeting to brainstorm these recent breakages and any hard mitigations we can do?

Max Moroz, 5 mins, sure

Max Moroz, 5 mins, I think we just have to set up a timeout for build script, e.g. 6 hours or something. If it fails, we restart the loop
and if after the build there are too many targets failed to build, we restart the loop + log those cases into a separate log maybe, e.g. error.log
not much else we can do, as we use master branch

that was one of the breakages https://chromium.googlesource.com/chromium/src/+/812edd08bc908333c1c10205cbc5f52ef33c7dec
could be another one https://chromium.googlesource.com/chromium/src/+/156afe2aa39dd26b9a5ab769b586449c312ad361
it's from the revision range between the two bots that were re-created last: https://chromium.googlesource.com/chromium/src/+log/6c23b698ee4c48b7..e406b0dbadf2
earlier one may be affected by something else

I'll upload a CL
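
The mitigation proposed in the chat (a build timeout, plus a restart when too many targets fail to build, with both cases written to a separate log) could look roughly like the sketch below. This is not the CL that landed. The function names, the `BUILD_TIMEOUT`, `MAX_FAILED_TARGETS` and `ERROR_LOG` variables, and the threshold of 3 are all hypothetical. It assumes GNU coreutils `timeout`, which exits with status 124 when the time limit is hit, and a build-fail log that lists failed targets separated by whitespace.

```shell
#!/usr/bin/env bash
# Sketch only: every name and default below is an assumption, not the real bot script.

BUILD_TIMEOUT="${BUILD_TIMEOUT:-6h}"           # "6 hours or something" from the chat
MAX_FAILED_TARGETS="${MAX_FAILED_TARGETS:-3}"  # hypothetical "too many targets" threshold
ERROR_LOG="${ERROR_LOG:-error.log}"            # separate log suggested in the chat

# Append a timestamped message to the error log.
log_error() {
  echo "### $(date -u) $*" >> "$ERROR_LOG"
}

# Print the number of whitespace-separated target names in a build-fail log.
# A missing log counts as zero failures.
count_failed_targets() {
  local fail_log="$1"
  if [ ! -f "$fail_log" ]; then
    echo 0
    return
  fi
  echo $(( $(wc -w < "$fail_log") ))
}

# Usage: build_is_good FAIL_LOG BUILD_COMMAND [ARGS...]
# Runs the build command under a time limit. Returns non-zero if the build
# timed out or if more than MAX_FAILED_TARGETS targets are listed in FAIL_LOG.
# Other non-zero exit codes of the build command are ignored here; the
# failure log is what decides whether the revision is usable.
build_is_good() {
  local fail_log="$1"
  shift
  local rc=0
  timeout "$BUILD_TIMEOUT" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    log_error "build timed out after $BUILD_TIMEOUT: $*"
    return 1
  fi
  local failed
  failed="$(count_failed_targets "$fail_log")"
  if [ "$failed" -gt "$MAX_FAILED_TARGETS" ]; then
    log_error "$failed targets failed to build: $*"
    return 1
  fi
  return 0
}

# Possible use inside the main loop (left as a comment so the sketch has no side effects):
#   while true; do
#     build_is_good "$FAIL_LOG" ./build_targets.bash || continue  # restart on a bad revision
#     ./run_test_targets.bash
#     ./run_fuzz_targets.bash
#   done
```

Restarting the loop on failure presumably makes the bot pick up a newer revision, so a hung or badly broken build costs at most one timeout period instead of blocking the bot until someone notices.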
Jun 25 2018
Haven't seen any of these anymore, but the fix should save us if it ever happens again.
Jun 25 2018
Aug 30
Comment 1 by bugdroid1@chromium.org, Jun 19 2018