
Issue 678465

Starred by 1 user

Issue metadata

Status: WontFix
Owner: ----
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




chromium.perf stopped scheduling builds for *some* builders

Project Member Reported by katthomas@google.com, Jan 5 2017

Issue description

At around 5pm PST today, martiniss noticed that some chromium.perf builders were not scheduling builds. About fifteen minutes later, a bunch of builds were scheduled at once.

There were builders with pending queues, and bots available but not utilized.

The builds that are running look as though they were all scheduled in a batch. This is what I see right now under Currently Building for Linux Builder:

85902 ETA: 17:34:37 [3 mins, 7 secs] [Running for 20 mins, 52 secs] compile
85903 ETA: 17:34:34 [3 mins, 3 secs] [Running for 20 mins, 52 secs] compile
85904 ETA: 17:34:38 [3 mins, 8 secs] [Running for 20 mins, 52 secs] compile
85905 ETA: 17:34:34 [3 mins, 3 secs] [Running for 20 mins, 52 secs] compile
85906 ETA: 17:31:50 [19 secs] [Running for 20 mins, 52 secs] compile
85907 ETA: 17:31:50 [19 secs] [Running for 20 mins, 52 secs] compile
85908 ETA: 17:31:46 [15 secs] [Running for 20 mins, 52 secs] compile
85910 ETA: 17:31:44 [13 secs] [Running for 20 mins, 52 secs] compile
85909 ETA: 17:31:45 [14 secs] [Running for 20 mins, 52 secs] compile
85911 ETA: 17:31:45 [14 secs] [Running for 20 mins, 52 secs] compile
85912 ETA: 17:31:44 [13 secs] [Running for 20 mins, 52 secs] compile
85913 ETA: 17:31:44 [13 secs] [Running for 20 mins, 52 secs] compile
85914 ETA: 17:31:44 [13 secs] [Running for 20 mins, 52 secs] compile
85915 ETA: 17:31:44 [14 secs] [Running for 20 mins, 52 secs] compile
85916 ETA: 17:31:42 [11 secs] [Running for 11 mins, 16 secs] compile 
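The tell is that all but the newest build report the same running time. As an illustration (this just re-processes the rows pasted above, it is not tooling we ran against the master), grouping the "Running for" durations makes the batch obvious:

```python
import re
from collections import Counter

# The build rows from the Currently Building page, pasted verbatim.
ROWS = """\
85902 ETA: 17:34:37 [3 mins, 7 secs] [Running for 20 mins, 52 secs] compile
85903 ETA: 17:34:34 [3 mins, 3 secs] [Running for 20 mins, 52 secs] compile
85904 ETA: 17:34:38 [3 mins, 8 secs] [Running for 20 mins, 52 secs] compile
85905 ETA: 17:34:34 [3 mins, 3 secs] [Running for 20 mins, 52 secs] compile
85906 ETA: 17:31:50 [19 secs] [Running for 20 mins, 52 secs] compile
85907 ETA: 17:31:50 [19 secs] [Running for 20 mins, 52 secs] compile
85908 ETA: 17:31:46 [15 secs] [Running for 20 mins, 52 secs] compile
85910 ETA: 17:31:44 [13 secs] [Running for 20 mins, 52 secs] compile
85909 ETA: 17:31:45 [14 secs] [Running for 20 mins, 52 secs] compile
85911 ETA: 17:31:45 [14 secs] [Running for 20 mins, 52 secs] compile
85912 ETA: 17:31:44 [13 secs] [Running for 20 mins, 52 secs] compile
85913 ETA: 17:31:44 [13 secs] [Running for 20 mins, 52 secs] compile
85914 ETA: 17:31:44 [13 secs] [Running for 20 mins, 52 secs] compile
85915 ETA: 17:31:44 [14 secs] [Running for 20 mins, 52 secs] compile
85916 ETA: 17:31:42 [11 secs] [Running for 11 mins, 16 secs] compile
"""

# Count how many builds share each "Running for" duration: 14 builds
# started at the same instant, only the newest one (85916) did not.
running = re.findall(r"Running for (\d+ mins, \d+ secs)", ROWS)
print(Counter(running))
```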

It looks like calls to get master.json are taking a long time, which might be a contributing factor. Looking at the viceroy graphs (https://viceroy.corp.google.com/chrome_infra/Machines/masters?duration=30d), nothing looks all that out of the ordinary. We need to do more investigation.
 
We're also seeing a bunch of these lines in the chromium.perf master logs:

2017-01-04 17:48:06-0800 [Broker,27404,192.168.110.33] Traceback (most recent call last):
2017-01-04 17:48:06-0800 [Broker,27404,192.168.110.33]   File "/usr/lib/python2.7/logging/handlers.py", line 78, in emit
2017-01-04 17:48:06-0800 [Broker,27404,192.168.110.33]     self.doRollover()
2017-01-04 17:48:06-0800 [Broker,27404,192.168.110.33]   File "/usr/lib/python2.7/logging/handlers.py", line 338, in doRollover
2017-01-04 17:48:06-0800 [Broker,27404,192.168.110.33]     os.rename(self.baseFilename, dfn)
2017-01-04 17:48:06-0800 [Broker,27404,192.168.110.33] OSError: [Errno 2] No such file or directory
2017-01-04 17:48:06-0800 [Broker,27404,192.168.110.33] Logged from file status_logger.py, line 244

Not sure if it's related to anything.

We looked at the master machines and didn't notice any CPU spikes on master1 that look correlated. Master7 does have a CPU spike, which is weird.

Also, pubsub has this log line: 
2017-01-04 17:36:09-0800 [-] PubSub: Last send session took total 14.0s, 0.0 load build, 10.8 master, 3.2 send. len_tcq 1. br 1170. bs 680

The "10.8 master" part means that the master.json took 10.8 seconds to generate. There are much bigger numbers in all of the twistd.log files.
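One way to quantify those bigger numbers would be to scan the twistd.log files for these PubSub summary lines and pull out the per-phase timings. A hedged sketch, assuming the line format matches the single sample above:

```python
import re

# The sample PubSub summary line from the master log.
LINE = ("2017-01-04 17:36:09-0800 [-] PubSub: Last send session took total "
        "14.0s, 0.0 load build, 10.8 master, 3.2 send. len_tcq 1. br 1170. bs 680")

# Extract the four timing phases; the field names here are just labels
# inferred from the log text, not documented fields.
PATTERN = re.compile(
    r"total (?P<total>[\d.]+)s, "
    r"(?P<load_build>[\d.]+) load build, "
    r"(?P<master>[\d.]+) master, "
    r"(?P<send>[\d.]+) send")

m = PATTERN.search(LINE)
timings = {k: float(v) for k, v in m.groupdict().items()}
print(timings["master"])  # 10.8
```

Running that over every line of the twistd.log files and sorting by the "master" field would show how often master.json generation dominates the send session.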
Summary: chromium.perf stopped scheduling builds for *some* builders (was: chromium.perf stop scheduling builds for *some* builders)
Cc: sergeybe...@chromium.org
Could that error be caused by sergeyberezin deleting all the logs in issue 668306?
Components: -Infra Infra>Client>Perf
Is this still a problem?
Status: WontFix (was: Available)
I think we ended up deciding this was just an artifact of an old master with too much to do. We can resolve for now and reopen if needed.
