Install DNS cache on shards to speed up ssh and other networking tools
Issue description

For context see issue 734887 and issue 726481.

We can speed up tests on the CQ by 20-30s (or 5-15%) simply by installing a DNS cache. This should free up to one DUT out of the pool:cq (usually around 7). This is not the final fix for ssh being slow, just a general quick improvement for the servers (and should be easy using puppet).

https://bugs.chromium.org/p/chromium/issues/detail?id=734887#c16

Time improvements:

                                       1284  1301  1305  delta  %
security_NetworkListeners              138s  142s  118s  20s    14%
cheets_KeyboardTest                    176s  165s  141s  24s    14%
cheets_CTS_N.CtsOpenGlPerf2TestCases   484s  494s  461s  23s    5%
cheets_StartAndroid.stress             681s  683s  650s  31s    5%

---
If you want to just follow chromeos-server98:

    apt-get install unbound

Change config /etc/unbound/unbound.conf to something like

    forward-zone:
        name: "."
        forward-addr: 8.8.8.8
        forward-addr: 8.8.4.4
        forward-addr: 208.67.222.222
        forward-addr: 208.67.220.220

To check it is working, run

    unbound-control stats

Also, instead of about 40ms without the cache, this should show 0ms or 1ms after repeated use:

    dig chromeos6-row2-rack23-host8.cros | grep time
    ;; Query time: 0 msec
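The dig check above can also be approximated from Python for anyone who wants to compare first and repeated lookups on a shard. This is only a rough sketch: the function name time_lookups and its resolve parameter are made up for illustration, and it times the system resolver path (socket.gethostbyname) rather than querying unbound directly, so the numbers will not match dig exactly.

```python
import socket
import time


def time_lookups(hostname, repeats=5, resolve=socket.gethostbyname):
    """Return the wall-clock time of each lookup in milliseconds.

    `resolve` is injectable (a hypothetical convenience, not part of any
    existing tool) so the timing loop can be exercised without a network.
    """
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            resolve(hostname)
        except OSError:
            # socket.gaierror is an OSError; a failed lookup is still timed.
            pass
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


# Example use on a host that can resolve the lab names:
#   for i, ms in enumerate(time_lookups("chromeos6-row2-rack23-host8.cros")):
#       print("lookup %d: %.1f ms" % (i, ms))
```

With a working cache, the first lookup would be expected to be the slow one and the later ones close to zero.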
Jun 30 2017
The last time I added a DNS cache service I broke a lot of things spectacularly. Shards are Goobuntu machines, so our Puppet contends with Goobuntu's Puppet, and Goobuntu's Puppet deploys its own DNS cache/resolver service (dnsmasq I believe). What I'm saying is, this isn't quite as straightforward as it may seem.
Jun 30 2017
Fair enough - do you recall the old issue?

We should keep monitoring chromeos-server98 then, as it has been using the cache for the last week:

    unbound-control stats
    thread0.num.queries=86602626
    thread0.num.cachehits=85057603
    thread0.num.cachemiss=1545023
    time.up=838964.203310
    time.elapsed=657699.519068
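For monitoring, the hit rate can be pulled out of that output directly. A minimal sketch, assuming the stats arrive as key=value tokens exactly as pasted above; the helper names parse_unbound_stats and cache_hit_rate are illustrative and not part of unbound.

```python
def parse_unbound_stats(text):
    """Parse whitespace-separated key=value tokens into a dict of floats."""
    stats = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # skip tokens without '=', e.g. the command line itself
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # ignore non-numeric values
    return stats


def cache_hit_rate(stats, thread="thread0"):
    """Return cache hits / total queries for one thread (0.0 if no queries)."""
    queries = stats.get(thread + ".num.queries", 0.0)
    hits = stats.get(thread + ".num.cachehits", 0.0)
    return hits / queries if queries else 0.0


# Figures copied from the comment above.
SAMPLE = """
unbound-control stats
thread0.num.queries=86602626
thread0.num.cachehits=85057603
thread0.num.cachemiss=1545023
time.up=838964.203310
time.elapsed=657699.519068
"""

rate = cache_hit_rate(parse_unbound_stats(SAMPLE))
print("cache hit rate: %.1f%%" % (rate * 100))
```

For the numbers posted here that works out to a hit rate of roughly 98%.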
Jul 13 2017
The issue is simply that our Puppet and Goobuntu Puppet would deploy different configuration on top of each other. If you have a machine successfully using it, we should be able to replicate the same setup through Puppet. What DNS cache and configuration are you using?
Jul 19 2017
The cache is called "unbound" and the config is above. I just checked and it still works fine on chromeos-server98.mtv.
Jul 19 2017
Upgrading to P1, rationale: significant performance impact on class 1 service (CQ). +deputies for the week. If you have spare cycles, can you pick up adding a puppet-owned conf file as described in OP?
Jul 19 2017
+jrbarnette (#6 should have included him)
Jul 19 2017
I can pick this up.
Aug 18 2017
Aug 19 2017
The change landed as
https://chrome-internal-review.googlesource.com/#/c/chromeos/chromeos-admin/+/416090/

Looks like the runtime of client test security_AccountsBaseline on reef-paladin went from 42-52s to 28-32s with this change. Good stuff.

--
Details:

Last slow run is
https://viceroy.corp.google.com/chromeos/build_details?build_config=reef-paladin&build_number=3337
https://viceroy.corp.google.com/chromeos/suite_details?build_id=1765328

First fast run is
https://viceroy.corp.google.com/chromeos/build_details?build_config=reef-paladin&build_number=3338
Start time Aug. 17, 2017, 9:05 p.m.
https://viceroy.corp.google.com/chromeos/suite_details?build_id=1765579

More samples:
security_ModuleLocking  48s vs. 35s
graphics_Gbm            52s vs. 40s
graphics_dEQP.bvt       54s vs. 38s

In other words, a 20-25% speedup of client tests.
Comment 1 by ayatane@chromium.org, Jun 23 2017
Owner: ayatane@chromium.org
Status: (was: Untriaged)