
Issue 749157

Starred by 1 user

Issue metadata

Status: Duplicate
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 0
Type: Bug




kill_slow_queries processes piling up on shard(s)

Project Member Reported by akes...@chromium.org, Jul 26 2017

Issue description

https://viceroy.corp.google.com/chromeos/machines?hostname=chromeos-server27&board=sentry&duration=8777686&mdb_role=chrome-infra&pool=managed%3Acts&refresh=-1&status=Running&topstreams=20#_VG_oYXH0N8z

chromeos-test@chromeos-server27:~$ ps aux | grep kill_slow_queries | wc
   1364   20581  203536


This is starting to push us into memory pressure on at least that shard, and possibly others.
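
The first column of the `wc` output above is the line count, and a `ps aux | grep` pipeline usually counts the `grep` process itself as well. A minimal sketch of a count that avoids that, assuming `pgrep` from procps is installed on the shard (the helper name `count_matching` is made up for illustration):

```shell
# Count processes whose full command line matches a pattern.
# pgrep does not report itself, so the search does not inflate the count.
# count_matching is a hypothetical helper name, not part of any existing tool.
count_matching() {
    # pgrep prints one PID per line; wc -l prints 0 when nothing matches.
    pgrep -f "$1" | wc -l | tr -d ' '
}

count_matching kill_slow_queries
```

Because `-f` matches against the whole command line, anything else that mentions the name (for example a `tail` on the log file) is counted too.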

 
Status: Assigned (was: Untriaged)
This issue is not confined to server27

chromeos-test@chromeos-server101:~$ ps aux | grep kill_slow_queries | wc
   2569   38839  384034
Cc: pho...@chromium.org
Owner: shuqianz@chromium.org
I checked /var/log/kill_slow_queries.log and found that, besides the kill_slow_queries upstart job, an old kill_slow_queries cron job is still running every second on the shards, which accounts for the 1000+ kill_slow_queries processes. I will clean up the old cron jobs on all shards and keep tracking this for the rest of the week.
chromeos-test@chromeos-server101:~$ sudo killall -r -9 kill_slow_queries

We still have lots of these piled up on other shards.

Comment 5 by pho...@chromium.org, Jul 26 2017

We used to run kill_slow_queries in cron, and now we loop infinitely thanks to https://chromium-review.googlesource.com/c/544336/. It makes sense that these processes are piling up.
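
The pile-up follows from that combination: every scheduled start adds one more process that never exits. One common way to keep a long-running script to a single instance is an atomic lock taken at startup. The sketch below only illustrates that idea and is not the fix that was applied here; the lock path and function names are assumptions.

```shell
# Hypothetical single-instance guard; the lock path is illustrative only.
LOCK_DIR="${LOCK_DIR:-/tmp/kill_slow_queries.lock}"

acquire_lock() {
    # mkdir is atomic: it succeeds for exactly one caller at a time.
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

if acquire_lock; then
    # Release the lock on normal exit; SIGKILL (kill -9) skips this trap.
    trap release_lock EXIT
    echo "lock acquired, the long-running loop would start here"
else
    echo "another instance appears to be running, not starting a second one"
fi
```

A directory lock is left behind if the holder is killed with `-9`, so it would need manual cleanup after a forced kill. Wrapping the command in `flock -n` from util-linux avoids that, because the kernel drops the lock when the process dies.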
I ran a script that cleaned up all the cron jobs and killed the kill_slow_queries processes on the shard, but it seems not all of the processes were killed. I will keep tracking this.
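
The cleanup script itself is not shown in this thread. As a rough sketch of what removing a stale cron entry can look like, assuming the job lives in a per-user crontab (the sample paths and the helper name `strip_cron_job` are made up for illustration):

```shell
# Drop every crontab line that mentions the given job name.
# strip_cron_job is a hypothetical helper; it only filters text on stdin.
strip_cron_job() {
    # grep exits 1 when every line is filtered out, so mask that status.
    grep -v -F "$1" || true
}

# Dry run against sample input (paths are illustrative, not from the shards).
printf '%s\n' \
    '* * * * * /usr/local/bin/kill_slow_queries' \
    '0 3 * * * /usr/local/bin/rotate_logs' | strip_cron_job kill_slow_queries
# prints: 0 3 * * * /usr/local/bin/rotate_logs

# To apply it to the current user's crontab (left commented out on purpose):
# crontab -l | strip_cron_job kill_slow_queries | crontab -
```

Saving the output of `crontab -l` to a file before applying the filter gives a way back if the pattern matches more than intended.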
Status: Fixed (was: Assigned)
I've checked all the shards and the database server. The number of kill_slow_queries processes is now under 8 on every server, and all of those processes were created by the upstart job. I will claim victory for this bug.

Comment 8 by ihf@chromium.org, Jul 28 2017

Cc: ihf@chromium.org
Status: Assigned (was: Fixed)
Can you please check chromeos-server100.mtv? According to the graph below it is still leaking. I can't ssh into it.

Comment 9 by ihf@chromium.org, Jul 28 2017

Checking the boards running on the shard: one possibility is that chromeos-server100.mtv is actually overloaded right now.

Comment 10 by ihf@google.com, Jul 28 2017

I power-cycled the server from the portal, as it had not responded for hours. It is back alive now.
Status: Fixed (was: Assigned)
There were only 5 kill_slow_queries processes running on chromeos-server100.mtv, so I think the leak is not caused by this bug.
Mergedinto: 755193
Status: Duplicate (was: Fixed)
This is a real bug. See deduped bug for details.
Fix is in-flight.
