high afe rpc QPS from certain servers |
||||
Issue descriptionchromeos-test@chromeos-server2:/var/log/apache2$ tail -f other_vhosts_access.log | grep 172.25.71.53 chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:33 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:33 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:34 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-server2.mtv.corp.google.com:80 172.25.71.53 - - [22/Mar/2017:22:59:35 -0700] "POST /afe/server/noauth/rpc/ HTTP/1.1" 200 354 "-" "Python-urllib/2.7" chromeos-test@chromeos-server2:/var/log/apache2$ nslookup 172.25.71.53 Server: 127.0.1.1 Address: 127.0.1.1#53 Non-authoritative answer: 53.71.25.172.in-addr.arpa name = cros-autotest-shard2.cbf.corp.google.com. That's a drone. Looks like it's emitting at least 10 QPS of rpcs at a steady page. Why? That many jobs making that many rpcs back to master? Also hovering around 5-10 QPS: chromeos-server22.cbf.corp.google.com, which is a swarming proxy bot. I guess these are all the run_suite rpcs flying out of it? Another 10ish QPS from chromeos-server6.mtv.corp.google.com, also a drone 20+ QPS from cros-autotest-shard4.hot.corp.google.com also a drone That's just scratching the surface. Also, a lot of these are HTTP POST requests, which suggests they are more than simply polling for tko results (wouldn't that be a GET)? So, what are these servers POSTing about all the time?
,
Mar 25 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/31fc8f424e41daf841a0ec754e1b34a329337eff commit 31fc8f424e41daf841a0ec754e1b34a329337eff Author: Aviv Keshet <akeshet@chromium.org> Date: Sat Mar 25 02:38:09 2017 [autotest] increase dynamic_suite polling interval to 30s, add fuzz BUG= chromium:704433 TEST=job_status_unittest Change-Id: I5b2c5571b1d663fe51d870a19a0912fccf0d3bd3 Reviewed-on: https://chromium-review.googlesource.com/457790 Commit-Ready: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> [modify] https://crrev.com/31fc8f424e41daf841a0ec754e1b34a329337eff/server/cros/dynamic_suite/job_status.py [modify] https://crrev.com/31fc8f424e41daf841a0ec754e1b34a329337eff/server/cros/dynamic_suite/job_status_unittest.py
,
Mar 25 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/31fc8f424e41daf841a0ec754e1b34a329337eff commit 31fc8f424e41daf841a0ec754e1b34a329337eff Author: Aviv Keshet <akeshet@chromium.org> Date: Sat Mar 25 02:38:09 2017 [autotest] increase dynamic_suite polling interval to 30s, add fuzz BUG= chromium:704433 TEST=job_status_unittest Change-Id: I5b2c5571b1d663fe51d870a19a0912fccf0d3bd3 Reviewed-on: https://chromium-review.googlesource.com/457790 Commit-Ready: Aviv Keshet <akeshet@chromium.org> Tested-by: Aviv Keshet <akeshet@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> [modify] https://crrev.com/31fc8f424e41daf841a0ec754e1b34a329337eff/server/cros/dynamic_suite/job_status.py [modify] https://crrev.com/31fc8f424e41daf841a0ec754e1b34a329337eff/server/cros/dynamic_suite/job_status_unittest.py
,
Mar 28 2017
Fixed now?
,
Mar 29 2017
,
Apr 4 2017
,
Mar 31 2018
|
||||
►
Sign in to add a comment |
||||
Comment 1 by dshi@chromium.org
, Mar 23 2017For the last 24 hours, 172.25.71.53 made 1319005 RPC to cautotest For the 28 seconds between 3/22/2017, 10:59:31 PM and 3/22/2017, 10:59:59 PM, it made 482 calls, of which, 432 calls are get_jobs. From the call details, it seems to be suite job is polling to see if all child jobs are finished. We might consider adding some sleep in suite's wait loop. You can do more filtering in metadb: http://172.25.66.62:9800/_plugin/elastic-hammer/ { "query": {"bool": { "must": [ {"term": {"_type": "rpc"}}, {"term": {"method": "get_jobs"}}, {"term": {"remote_ip": "172.25.71.53"}}, {"range": {"time_recorded": {"gte": 1490248771 , "lte": 1490248799 }}} ]}}, "size": 100, "sort": [{"value": "desc"}]}