
Issue 854226


Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 759794




Scripts: add mitigations against bad revisions in order to save CPU cycles

Project Member Reported by mmoroz@chromium.org, Jun 19 2018

Issue description

Since we build from the master branch, we may run into all kinds of bad revisions:
- broken builds
- builds that hang indefinitely
- etc.

We need some mitigations against these.

Also copied some discussion from the chat:

bot 0005 seems to be running a bad revision; it failed to build the following targets:

$ cat /chromium/src/logs/_fuzz_and_test_targets_build_fail.log 
browser_tests
chrome_app_unittests
interactive_ui_tests
keyboard_unittests
sync_integration_tests
unit_tests
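The "restart the loop if too many targets fail to build" idea discussed in this thread could be driven by a simple count over this fail log. This is a minimal sketch, assuming the log lists one failed target per line as above; the function names and the MAX_FAILED_TARGETS threshold of 3 are hypothetical, not taken from the coverage-bot scripts.

```bash
#!/usr/bin/env bash
# Sketch: decide whether a build is too broken to continue the loop.
# Assumption: the fail log lists one failed target per line.
# MAX_FAILED_TARGETS is a hypothetical threshold, not a value from the bot scripts.
MAX_FAILED_TARGETS="${MAX_FAILED_TARGETS:-3}"

# Prints the number of non-empty lines (failed targets) in the given log.
# A missing log counts as zero failures.
count_failed_targets() {
  local log_file="$1"
  if [[ ! -f "$log_file" ]]; then
    echo 0
    return
  fi
  # grep -c prints 0 but exits 1 when nothing matches; ignore that status.
  grep -c . "$log_file" || true
}

# Succeeds (exit 0) when the failure count exceeds MAX_FAILED_TARGETS.
too_many_failures() {
  local failed
  failed="$(count_failed_targets "$1")"
  (( failed > MAX_FAILED_TARGETS ))
}
```

With the six targets listed for bot 0005, `too_many_failures /chromium/src/logs/_fuzz_and_test_targets_build_fail.log` would succeed under the assumed threshold, signalling that the loop should be restarted.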

Abhishek Arya, 13 mins:
thanks for the heads-up

Max Moroz, 12 mins:
not sure if I should simply restart it, or whether we should restart the loop automatically if too many targets fail to build
$ cat ../coverage-bot/scripts/_bot.log | egrep '^###' | egrep 'Start|End'
### Start /home/coverage-bot/scripts/code_coverage_loop.bash at Tue Jun 19 02:57:48 UTC 2018
### Start /home/coverage-bot/scripts/build_targets.bash at Tue Jun 19 02:59:28 UTC 2018
mmoroz@code-coverage-linux-0001:~$ date
Tue Jun 19 16:09:02 UTC 2018
it's been building stuff for 13 hours already, crazy
I guess that's just another bad revision... because another bot that was restarted 3 hours later is doing great:
mmoroz@code-coverage-linux-0002:~$ cat ../coverage-bot/scripts/_bot.log | egrep '^###' | egrep 'Start|End'
### Start /home/coverage-bot/scripts/code_coverage_loop.bash at Tue Jun 19 06:04:52 UTC 2018
### Start /home/coverage-bot/scripts/build_targets.bash at Tue Jun 19 06:05:42 UTC 2018
### End /home/coverage-bot/scripts/build_targets.bash at Tue Jun 19 10:19:57 UTC 2018
### Start /home/coverage-bot/scripts/run_test_targets.bash at Tue Jun 19 10:19:57 UTC 2018
### End /home/coverage-bot/scripts/run_test_targets.bash at Tue Jun 19 12:53:08 UTC 2018
### Start /home/coverage-bot/scripts/run_fuzz_targets.bash at Tue Jun 19 12:53:08 UTC 2018
0003 is also stuck building:
$ cat ../coverage-bot/scripts/_bot.log | egrep '^###' | egrep 'Start|End'
### Start /home/coverage-bot/scripts/code_coverage_loop.bash at Tue Jun 19 00:30:08 UTC 2018
### Start /home/coverage-bot/scripts/build_targets.bash at Tue Jun 19 00:32:09 UTC 2018
mmoroz@code-coverage-linux-0003:~$ date
Tue Jun 19 16:10:38 UTC 2018
and 0004 as well:
mmoroz@code-coverage-linux-0004:~$ cat ../coverage-bot/scripts/_bot.log | egrep '^###' | egrep 'Start|End'
### Start /home/coverage-bot/scripts/code_coverage_loop.bash at Mon Jun 18 19:42:55 UTC 2018
### Start /home/coverage-bot/scripts/build_targets.bash at Mon Jun 18 19:44:05 UTC 2018
mmoroz@code-coverage-linux-0004:~$ date
Tue Jun 19 16:11:04 UTC 2018
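The manual check above (grep the `###` Start/End markers, then compare against `date`) could be automated. A minimal sketch, assuming GNU `date` (for `-d`) and that the most recent `### Start`/`### End` line in _bot.log describes the stage in progress; `current_stage_age` is a hypothetical helper, not part of the coverage-bot scripts.

```bash
#!/usr/bin/env bash
# Sketch: report how long the current stage has been running, based on the
# "### Start <script> at <date>" / "### End <script> at <date>" lines in _bot.log.
# Assumes GNU date and that the last Start/End marker is the stage in progress.

# Prints "<script> <elapsed_seconds>" if the last marker is a Start line,
# and nothing if it is an End line. The optional second argument overrides
# "now" (epoch seconds), which makes the function testable.
current_stage_age() {
  local log_file="$1"
  local now="${2:-$(date +%s)}"
  local last
  last="$(grep -E '^### (Start|End) ' "$log_file" | tail -n 1)"
  [[ "$last" == "### Start "* ]] || return 0

  # Split "### Start <script> at <date>" on the first " at ".
  local rest="${last#"### Start "}"
  local script="${rest%% at *}"
  local started="${rest#* at }"

  local start_epoch
  start_epoch="$(date -d "$started" +%s)" || return 1
  echo "$script $(( now - start_epoch ))"
}
```

For bot 0001 at 16:09:02 UTC, this would report build_targets.bash as running for 47374 seconds (about 13 hours); a watchdog could compare that number against a per-stage limit.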


fun

Abhishek Arya, 8 mins:
so, test targets are hanging with that 3hr timeout?
and multiple ones

Max Moroz, 8 mins:
no, the build is hanging
on 3 bots
on another bot, some targets failed to build
and only one bot (the latest one I recreated) is doing well

Max Moroz, 7 mins:
I'll take a quick look at the revision range log

Abhishek Arya, 6 mins:
let's set up a meeting to brainstorm these recent breakages and any hard mitigations we can do?

Max Moroz, 5 mins:
sure

Max Moroz, 5 mins:
I think we just have to set up a timeout for the build script, e.g. 6 hours or so. If it fails, we restart the loop
and if, after the build, too many targets have failed to build, we restart the loop
+ maybe log those cases to a separate log, e.g. error.log
not much else we can do, as we use the master branch
that was one of the breakages https://chromium.googlesource.com/chromium/src/+/812edd08bc908333c1c10205cbc5f52ef33c7dec
could be another one https://chromium.googlesource.com/chromium/src/+/156afe2aa39dd26b9a5ab769b586449c312ad361
it's in the revision range between the two bots that were recreated last: https://chromium.googlesource.com/chromium/src/+log/6c23b698ee4c48b7..e406b0dbadf2

the earlier one may have been affected by something else
I'll upload a CL
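The timeout-plus-error.log proposal can be sketched with coreutils `timeout`, which exits with status 124 when the time limit is reached. This is a minimal sketch, not necessarily what the fix that landed does; `run_stage`, the ERROR_LOG default, and the 6h BUILD_TIMEOUT default (taken from the "6 hours or so" in the chat) are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: run a loop stage under a time limit and record failures in a
# separate error log so the outer loop can restart on a fresh revision.
# ERROR_LOG, BUILD_TIMEOUT and run_stage are hypothetical names.
ERROR_LOG="${ERROR_LOG:-error.log}"
BUILD_TIMEOUT="${BUILD_TIMEOUT:-6h}"

# run_stage <limit> <command...>
# Returns the command's exit status. On a timeout (status 124 from
# `timeout`) or any other non-zero status, appends a line to ERROR_LOG.
run_stage() {
  local limit="$1"
  shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if (( status == 124 )); then
    echo "$(date -u) TIMEOUT after $limit: $*" >> "$ERROR_LOG"
  elif (( status != 0 )); then
    echo "$(date -u) FAILED (exit $status): $*" >> "$ERROR_LOG"
  fi
  return "$status"
}
```

Inside the main loop, something like `run_stage "$BUILD_TIMEOUT" ./build_targets.bash || continue` would skip to the next iteration instead of waiting on a hung build for 13+ hours.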
 
Project Member

Comment 1 by bugdroid1@chromium.org, Jun 19 2018

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chrome/tools/code-coverage/+/99a7f9b79cca741fecf41c16d9fb9067e77a7604

commit 99a7f9b79cca741fecf41c16d9fb9067e77a7604
Author: Max Moroz <mmoroz@google.com>
Date: Tue Jun 19 17:27:24 2018

Comment 2 by mmoroz@chromium.org, Jun 25 2018

Haven't seen any of these since, but the fix should save us if it ever happens again.

Comment 3 by mmoroz@chromium.org, Jun 25 2018

Status: Fixed (was: Started)
Blocking: 759794
