New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 704433 link

Starred by 1 user

Issue metadata

Status: Archived
Owner: ----
Closed: Mar 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocking:
issue 697730



Sign in to add a comment

high afe rpc QPS from certain servers

Project Member Reported by akes...@chromium.org, Mar 23 2017

Issue description


chromeos-test@chromeos-server2:/var/log/apache2$ tail -f other_vhosts_access.log | grep 172.25.71.53
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:33 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:33 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"
chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:35 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7"

chromeos-test@chromeos-server2:/var/log/apache2$ nslookup 172.25.71.53
Server:         127.0.1.1
Address:        127.0.1.1#53

Non-authoritative answer:
53.71.25.172.in-addr.arpa       name = cros-autotest-shard2.cbf.corp.google.com.


That's a drone. Looks like it's emitting at least 10 QPS of rpcs at a steady page. Why? That many jobs making that many rpcs back to master?


Also hovering around 5-10 QPS: chromeos-server22.cbf.corp.google.com, which is a swarming proxy bot. I guess these are all the run_suite rpcs flying out of it?

Another 10ish QPS from chromeos-server6.mtv.corp.google.com, also a drone

20+ QPS from cros-autotest-shard4.hot.corp.google.com also a drone

That's just scratching the surface. Also, a lot of these are HTTP POST requests, which suggests they are more than simply polling for tko results (wouldn't that be a GET)?

So, what are these servers POSTing about all the time?
 

Comment 1 by dshi@chromium.org, Mar 23 2017

For the last 24 hours, 172.25.71.53 made 1319005 RPC to cautotest

For the 28 seconds between 3/22/2017, 10:59:31 PM and 3/22/2017, 10:59:59 PM, it made 482 calls, of which, 432 calls are get_jobs. From the call details, it seems to be suite job is polling to see if all child jobs are finished. We might consider adding some sleep in suite's wait loop.

You can do more filtering in metadb:
http://172.25.66.62:9800/_plugin/elastic-hammer/
{ 
"query": {"bool": {
   "must": [
   {"term": {"_type": "rpc"}},
   {"term": {"method": "get_jobs"}},
   {"term": {"remote_ip": "172.25.71.53"}},
   {"range": {"time_recorded": 
       {"gte": 1490248771 ,
        "lte": 1490248799 }}}
    ]}},
 "size": 100,
 "sort": [{"value": "desc"}]}

Project Member

Comment 2 by bugdroid1@chromium.org, Mar 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/31fc8f424e41daf841a0ec754e1b34a329337eff

commit 31fc8f424e41daf841a0ec754e1b34a329337eff
Author: Aviv Keshet <akeshet@chromium.org>
Date: Sat Mar 25 02:38:09 2017

[autotest] increase dynamic_suite polling interval to 30s, add fuzz

BUG= chromium:704433 
TEST=job_status_unittest

Change-Id: I5b2c5571b1d663fe51d870a19a0912fccf0d3bd3
Reviewed-on: https://chromium-review.googlesource.com/457790
Commit-Ready: Aviv Keshet <akeshet@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/31fc8f424e41daf841a0ec754e1b34a329337eff/server/cros/dynamic_suite/job_status.py
[modify] https://crrev.com/31fc8f424e41daf841a0ec754e1b34a329337eff/server/cros/dynamic_suite/job_status_unittest.py

Project Member

Comment 3 by bugdroid1@chromium.org, Mar 25 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/31fc8f424e41daf841a0ec754e1b34a329337eff

commit 31fc8f424e41daf841a0ec754e1b34a329337eff
Author: Aviv Keshet <akeshet@chromium.org>
Date: Sat Mar 25 02:38:09 2017

[autotest] increase dynamic_suite polling interval to 30s, add fuzz

BUG= chromium:704433 
TEST=job_status_unittest

Change-Id: I5b2c5571b1d663fe51d870a19a0912fccf0d3bd3
Reviewed-on: https://chromium-review.googlesource.com/457790
Commit-Ready: Aviv Keshet <akeshet@chromium.org>
Tested-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/31fc8f424e41daf841a0ec754e1b34a329337eff/server/cros/dynamic_suite/job_status.py
[modify] https://crrev.com/31fc8f424e41daf841a0ec754e1b34a329337eff/server/cros/dynamic_suite/job_status_unittest.py

Comment 4 by aut...@google.com, Mar 28 2017

Cc: akes...@chromium.org
Fixed now?
Blocking: 697730
Not certain. Added some moderate mitigation, but likley worth keeping an eye on.

Comment 6 by aut...@google.com, Apr 4 2017

Labels: -current-issue
Status: Unconfirmed (was: Untriaged)
Status: Archived (was: Unconfirmed)

Sign in to add a comment