New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 724358 link

Starred by 1 user

Issue metadata

Status: Duplicate
Owner: ----
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

swarming proxy has big occasional spikes of abort_suite jobs running

Project Member Reported by akes...@chromium.org, May 19 2017

Issue description

chromeos-test@chromeos-server22:/usr/local/autotest$ ps aux | grep abort_suite | wc -l
85
chromeos-test@chromeos-server22:/usr/local/autotest$ ps aux | grep run_suite | wc -l
40



 
This was true for a few minutes, but now has settled down.

That suggests to me that there was some huge source of abort_suite golo-proxied commands (probably release builders starting and trying to abort prior suites on that branch en masse?), and thus this sudden spike of abort_suite jobs can eat up a lot of golo proxy "slots".

We need metrics / tracking around how many slots are being used, and for what.
Summary: swarming proxy has big occasional spikes of abort_suite jobs running (was: swarming proxy consistently has >80 abort_suite jobs running)
Relatedly, we have sysmon metrics about the server that is acting as the golo proxy. The spikes during which we receive a slew of jobs are clearly visible, and seem to cause load spikes that last around 20 minutes.

https://viceroy.corp.google.com/chromeos/machines?hostname=chromeos-server22&duration=1d&refresh=-1

Comment 4 by aut...@google.com, May 23 2017

What are the next steps here?

Comment 5 by aut...@google.com, May 30 2017

Status: Unconfirmed (was: Untriaged)
Mergedinto: 726207
Status: Duplicate (was: Unconfirmed)
I'm working on improved metrics here, which will let us track this latency.

Sign in to add a comment