
Issue 734887

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




Why does reusing ssh master connection take so long?

Project Member Reported by ihf@chromium.org, Jun 20 2017

Issue description

We discussed a while ago in issue 726481 why reusing an established ssh connection (even with a master) takes 50...300ms, and whether there are ways to make it faster. Grant suggested stracing the calls.
 
Status: Started (was: Untriaged)
Cc: ihf@chromium.org
We spend 5ms opening and dynamically linking to .so files. Can we link SSH statically on the autotest servers (not the DUT)?

I've appended this to my ~/.ssh/config:
Host *
   ControlMaster auto
# ihf   ControlPath %C
# https://www.digitalocean.com/community/tutorials/ssh-essentials-working-with-ssh-servers-clients-and-keys
   ControlPath ~/.ssh/multiplex/%r@%h:%p
   ControlPersist 10

Relevant snippet of strace output (run inside the chroot):
    strace -tt -T -C -o ~/ssh-perf/ssh-v7.3-strace-arm-master-01.txt ssh -2 -p 22 -x root@100.107.2.127 cat /etc/lsb-release


13:33:16.941051 execve("/usr/bin/ssh", ["ssh", "-2", "-p", "22", "-x", "root@100.107.2.127", "cat", "/etc/lsb-release"], [/* 44 vars */]) = 0 <0.000444>
13:33:16.941804 brk(NULL)               = 0x5566c5878000 <0.000017>
13:33:16.941903 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0269000000 <0.000019>
13:33:16.941970 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000021>
13:33:16.942051 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000022>
13:33:16.942110 fstat(3, {st_mode=S_IFREG|0644, st_size=51608, ...}) = 0 <0.000072>
13:33:16.942222 mmap(NULL, 51608, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0268ff3000 <0.000017>
13:33:16.942274 close(3)                = 0 <0.000015>
13:33:16.942330 open("/usr/lib64/libcrypto.so.1.0.0", O_RDONLY|O_CLOEXEC) = 3 <0.000021>
13:33:16.942385 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0q\6\0\0\0\0\0"..., 832) = 832 <0.000017>
13:33:16.942436 fstat(3, {st_mode=S_IFREG|0555, st_size=2273568, ...}) = 0 <0.000015>
13:33:16.942494 mmap(NULL, 4383704, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f02689af000 <0.000019>
13:33:16.942543 mprotect(0x7f0268bb5000, 2097152, PROT_NONE) = 0 <0.000023>
13:33:16.942593 mmap(0x7f0268db5000, 151552, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x206000) = 0x7f0268db5000 <0.000021>
13:33:16.942651 mmap(0x7f0268dda000, 13272, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0268dda000 <0.000017>
13:33:16.942709 close(3)                = 0 <0.000014>
13:33:16.942769 open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3 <0.000021>
13:33:16.942820 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\16\0\0\0\0\0\0"..., 832) = 832 <0.000017>
13:33:16.942869 fstat(3, {st_mode=S_IFREG|0755, st_size=14200, ...}) = 0 <0.000014>
13:33:16.942914 mmap(NULL, 2109584, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f02687ab000 <0.000021>
13:33:16.942963 mprotect(0x7f02687ad000, 2097152, PROT_NONE) = 0 <0.000020>
13:33:16.943009 mmap(0x7f02689ad000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f02689ad000 <0.000017>
13:33:16.943068 close(3)                = 0 <0.000014>
13:33:16.943116 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0268ff2000 <0.000016>
13:33:16.943164 open("/lib64/libz.so.1", O_RDONLY|O_CLOEXEC) = 3 <0.000020>
13:33:16.943216 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 $\0\0\0\0\0\0"..., 832) = 832 <0.000015>
13:33:16.943262 fstat(3, {st_mode=S_IFREG|0755, st_size=96224, ...}) = 0 <0.000014>
13:33:16.943307 mmap(NULL, 2191376, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0268593000 <0.000019>
13:33:16.943354 mprotect(0x7f02685aa000, 2093056, PROT_NONE) = 0 <0.000019>
13:33:16.943400 mmap(0x7f02687a9000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x7f02687a9000 <0.000018>
13:33:16.943458 close(3)                = 0 <0.000014>
13:33:16.943511 open("/lib64/libresolv.so.2", O_RDONLY|O_CLOEXEC) = 3 <0.000019>
13:33:16.943560 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\2409\0\0\0\0\0\0"..., 832) = 832 <0.000018>
13:33:16.943609 fstat(3, {st_mode=S_IFREG|0755, st_size=88192, ...}) = 0 <0.000015>
13:33:16.943654 mmap(NULL, 2193736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f026837b000 <0.000018>
13:33:16.943700 mprotect(0x7f026838f000, 2097152, PROT_NONE) = 0 <0.000019>
13:33:16.943746 mmap(0x7f026858f000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14000) = 0x7f026858f000 <0.000018>
13:33:16.943799 mmap(0x7f0268591000, 6472, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0268591000 <0.000016>
13:33:16.943853 close(3)                = 0 <0.000017>
13:33:16.943914 open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3 <0.000024>
13:33:16.943967 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20a\0\0\0\0\0\0"..., 832) = 832 <0.000017>
13:33:16.944015 fstat(3, {st_mode=S_IFREG|0755, st_size=133472, ...}) = 0 <0.000015>
13:33:16.944075 mmap(NULL, 2208296, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f026815f000 <0.000024>
13:33:16.944129 mprotect(0x7f0268176000, 2093056, PROT_NONE) = 0 <0.000020>
13:33:16.944176 mmap(0x7f0268375000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x7f0268375000 <0.000018>
13:33:16.944230 mmap(0x7f0268377000, 12840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0268377000 <0.000017>
13:33:16.944284 close(3)                = 0 <0.000014>
13:33:16.944337 open("/usr/lib64/libgssapi_krb5.so.2", O_RDONLY|O_CLOEXEC) = 3 <0.000033>
13:33:16.944400 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \310\0\0\0\0\0\0"..., 832) = 832 <0.000017>
13:33:16.944447 fstat(3, {st_mode=S_IFREG|0755, st_size=303144, ...}) = 0 <0.000015>
13:33:16.944491 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0268ff1000 <0.000017>
13:33:16.944547 mmap(NULL, 2398976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0267f15000 <0.000018>
13:33:16.944593 mprotect(0x7f0267f5c000, 2097152, PROT_NONE) = 0 <0.000018>
13:33:16.944637 mmap(0x7f026815c000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x47000) = 0x7f026815c000 <0.000017>
13:33:16.944695 close(3)                = 0 <0.000014>
13:33:16.944747 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 <0.000019>
13:33:16.944796 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\10\2\0\0\0\0\0"..., 832) = 832 <0.000015>
13:33:16.944841 fstat(3, {st_mode=S_IFREG|0755, st_size=1726096, ...}) = 0 <0.000014>
13:33:16.944892 mmap(NULL, 3833048, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0267b6d000 <0.000017>
13:33:16.944937 mprotect(0x7f0267d0a000, 2097152, PROT_NONE) = 0 <0.000017>
13:33:16.944981 mmap(0x7f0267f0a000, 28672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19d000) = 0x7f0267f0a000 <0.000018>
13:33:16.945033 mmap(0x7f0267f11000, 15576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0267f11000 <0.000015>
13:33:16.945084 close(3)                = 0 <0.000013>
13:33:16.945151 open("/usr/lib64/libkrb5.so.3", O_RDONLY|O_CLOEXEC) = 3 <0.000020>
13:33:16.945201 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@L\2\0\0\0\0\0"..., 832) = 832 <0.000018>
13:33:16.945248 fstat(3, {st_mode=S_IFREG|0755, st_size=858456, ...}) = 0 <0.000014>
13:33:16.945293 mmap(NULL, 2954432, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f026789b000 <0.000018>
13:33:16.945339 mprotect(0x7f026795d000, 2093056, PROT_NONE) = 0 <0.000018>
13:33:16.945383 mmap(0x7f0267b5c000, 69632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xc1000) = 0x7f0267b5c000 <0.000017>
13:33:16.945444 close(3)                = 0 <0.000013>
13:33:16.945496 open("/usr/lib64/libk5crypto.so.3", O_RDONLY|O_CLOEXEC) = 3 <0.000022>
13:33:16.945546 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`G\0\0\0\0\0\0"..., 832) = 832 <0.000017>
13:33:16.945593 fstat(3, {st_mode=S_IFREG|0755, st_size=206760, ...}) = 0 <0.000015>
13:33:16.945638 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0268ff0000 <0.000016>
13:33:16.945686 mmap(NULL, 2305880, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0267668000 <0.000020>
13:33:16.945733 mprotect(0x7f0267697000, 2097152, PROT_NONE) = 0 <0.000016>
13:33:16.945776 mmap(0x7f0267897000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2f000) = 0x7f0267897000 <0.000017>
13:33:16.945833 close(3)                = 0 <0.000014>
13:33:16.945886 open("/lib64/libcom_err.so.2", O_RDONLY|O_CLOEXEC) = 3 <0.000019>
13:33:16.945935 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\25\0\0\0\0\0\0"..., 832) = 832 <0.000017>
13:33:16.945982 fstat(3, {st_mode=S_IFREG|0755, st_size=14120, ...}) = 0 <0.000014>
13:33:16.946028 mmap(NULL, 2109552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0267464000 <0.000020>
13:33:16.946074 mprotect(0x7f0267467000, 2093056, PROT_NONE) = 0 <0.000019>
13:33:16.946119 mmap(0x7f0267666000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f0267666000 <0.000020>
13:33:16.946206 close(3)                = 0 <0.000014>
13:33:16.946256 open("/usr/lib64/libkrb5support.so.0", O_RDONLY|O_CLOEXEC) = 3 <0.000021>
13:33:16.946308 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200/\0\0\0\0\0\0"..., 832) = 832 <0.000017>
13:33:16.946357 fstat(3, {st_mode=S_IFREG|0755, st_size=47192, ...}) = 0 <0.000016>
13:33:16.946404 mmap(NULL, 2142856, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0267258000 <0.000019>
13:33:16.946451 mprotect(0x7f0267263000, 2093056, PROT_NONE) = 0 <0.000018>
13:33:16.946496 mmap(0x7f0267462000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xa000) = 0x7f0267462000 <0.000016>
13:33:16.946552 close(3)                = 0 <0.000013>

Comment 3 by ihf@chromium.org, Jun 20 2017

Sure, we can change ssh to something custom.

How long does your repro ssh command take? In other words, is it 5ms out of 50ms with your current setup?
After getting rid of "Identity" lines that didn't exist in my chroot, I'm now getting:

13:48:35.361461 execve("/usr/bin/ssh", ["ssh", "-2", "-p", "22", "-x", "root@100.107.2.127", "cat", "/etc/lsb-release"], [/* 44 vars */]) = 0 <0.000426>
...
13:48:35.418857 +++ exited with 0 +++

57.4ms on this run... but as low as 40ms on others. So roughly in the same ballpark as the numbers you previously shared with me for "local" runs.

Hrm...seems to be slower with strace on my z840:

$ for i in $(seq 0 9) ; do  time ssh -2 -p 22 -x root@100.107.2.127 cat /etc/lsb-release ; done 2>&1 | fgrep real
real    0m0.041s
real    0m0.032s
real    0m0.021s
real    0m0.022s
real    0m0.018s
real    0m0.025s
real    0m0.022s
real    0m0.024s
real    0m0.023s
real    0m0.023s

Let me try to build/test a statically linked ssh client locally and see how much that helps.

If static linking shaves 5ms off the above numbers, that would still be significant for server operations where the typical time might be 100ms... but then it tells me I need to be measuring on one of the autotest servers directly, with one of its DUTs.

Ilja, can you collect/attach the strace output from the same command I posted in comment #2 (and the same options in your .ssh/config file)?

Comment 6 by ihf@chromium.org, Jun 20 2017

Looks like more time spent in linking. And it opens .ssh/config, which isn't the same file as used in the lab. Might be more complicated to parse?
chromeos-server98.txt
34.6 KB

Comment 7 by ihf@chromium.org, Jun 20 2017

Here is one that took 362ms *shudder*. The load on the server went up; I guess IO throughput went down?
chromeos-server98-362ms.txt
34.6 KB
The one that took 362ms has just about everything slower.

fgrep mmap chromeos-server98-362ms.txt | sed -e 's/.*<//' -e 's/>$//' | awk 'NF==1 { t+=$1}
END { print t }'

The sum of all mmap calls is 0.016151s.

That's 16ms just in mmap (mostly for shared libraries)! And that's not even counting the open() and read() of the .so files.
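For reference, here is the same one-liner run against a tiny synthetic log, just to show what it computes (the lines and durations below are made up for illustration; printf is used instead of print to avoid awk's exponent formatting):

```shell
# Build a 3-line fake strace -T log, then sum the <duration> of mmap calls.
cat > /tmp/strace-demo.txt <<'EOF'
13:33:16.942222 mmap(NULL, 51608, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0268ff3000 <0.000017>
13:33:16.942274 close(3) = 0 <0.000015>
13:33:16.942494 mmap(NULL, 4383704, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x7f02689af000 <0.000019>
EOF
fgrep mmap /tmp/strace-demo.txt | sed -e 's/.*<//' -e 's/>$//' \
  | awk 'NF==1 { t+=$1 } END { printf "%.6f\n", t }'
# -> 0.000036  (0.000017 + 0.000019; the close call is filtered out)
```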

I've attached two more logs. total time spent in mmap on those is less than 1ms:

Idle z840:   0.000896
"Busy" z840: 0.000501

[Uh..don't ask me why busy is faster than "idle" - I'll assume some caching was involved. I verified 40+ emerge_parallel things were running...but not much gcc or clang - so maybe only "busy" looking up cached objects. :/]


I see this as more evidence that statically linking SSH should help on a busy machine. In any case, this might be telling us more about how busy the autotest servers are than about any particular issue with the ssh binary.

Using the above "one-liner" to total the time for various syscalls:
Call         362ms.txt   -03.txt     -05.txt
poll         0.071142    0.036904    0.027350
fstat        0.022043    0.000442    0.000230
mmap         0.016151    0.000896    0.000501
close        0.010891    0.000485    0.000249
rt_sigaction 0.010513    0.000158    0.000102
open         0.010139    0.000631    0.000351
read         0.008288    0.000652    0.000353
mprotect     0.008101    0.000637    0.000368
futex        0.004051    none        none
statfs       0.003798    none        none
munmap       0.002511    0.000125    0.000064
access       0.002021    0.000027    0.000014

All remaining calls: 0.019265  0.000982  0.000485
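The table above can also be produced in one pass with an awk group-by, instead of re-running the one-liner once per syscall. A sketch, run here against a few synthetic strace -T lines rather than the actual logs:

```shell
# Group syscall durations by name across a whole strace -T log.
cat > /tmp/strace-totals.txt <<'EOF'
13:33:16.1 poll([{fd=3}], 1, 0) = 1 <0.000075>
13:33:16.2 mmap(NULL, 4096) = 0x7f <0.000019>
13:33:16.3 poll([{fd=3}], 1, 5000) = 1 <0.041237>
EOF
awk 'match($2, /^[a-z_0-9]+\(/) {
       call = substr($2, 1, RLENGTH - 1)     # syscall name before "("
       if (match($0, /<[0-9.]+>$/)) {        # trailing <duration>
         d = substr($0, RSTART + 1, RLENGTH - 2)
         t[call] += d
       }
     }
     END { for (c in t) printf "%s %.6f\n", c, t[c] }' /tmp/strace-totals.txt \
  | sort > /tmp/strace-totals.out
cat /tmp/strace-totals.out
# -> mmap 0.000019
#    poll 0.041312
```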

I should try to better understand what the poll call is waiting for to understand what the bottleneck is there.

We also spend 9ms making syscalls that return ENOENT, implying that we should specify enough parameters that ssh doesn't have to guess which files to open:
fgrep 'ENOENT' chromeos-server98-362ms.txt | sed -e 's/.*<//' -e 's/>$//' | awk 'NF==1 { t+=$1}
END { print t }'
0.009123

I don't know if that's reasonable or possible - but the equivalent time on z840 was < 27 microseconds ("us" unit!).
So this matters more on a busy autotest server than it does on my desktop.


My next step will be to look at the "gap" between syscalls - i.e. how long ssh waited for something to happen - starting with the longest idle periods between syscalls to see what was supposed to be happening.
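That gap analysis can be sketched as another awk pass over the strace -tt -T output: the gap before each syscall is its start timestamp minus the previous syscall's end (start + duration). Here run against synthetic lines (timestamps and durations are made up):

```shell
cat > /tmp/strace-gaps.txt <<'EOF'
13:33:16.100000 open("x") = 3 <0.000020>
13:33:16.100100 read(3) = 10 <0.000015>
13:33:16.150000 close(3) = 0 <0.000010>
EOF
awk '{ split($1, hms, ":")                          # HH:MM:SS.ffffff
       start = hms[1]*3600 + hms[2]*60 + hms[3]
       match($0, /<[0-9.]+>$/)
       dur = substr($0, RSTART + 1, RLENGTH - 2)
       if (NR > 1) printf "gap before line %d: %.6fs\n", NR, start - prev_end
       prev_end = start + dur }' /tmp/strace-gaps.txt > /tmp/strace-gaps.out
cat /tmp/strace-gaps.out
# -> gap before line 2: 0.000080s
#    gap before line 3: 0.049885s   (the kind of idle period worth chasing)
```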

ssh-v7.3-strace-arm-master-03.txt
24.7 KB
ssh-v7.3-strace-arm-master-05.txt
24.6 KB
Cc: marc...@chromium.org
I see two main slow things in these traces:

1) The time taken for dynamic linking, things like this which account for ~70ms in trace from #7:

14:52:23.114252 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) <0.000074>
14:52:23.114529 open("/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3 <0.000078>
14:52:23.114738 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0[\0\0\0\0\0\0"..., 832) = 832 <0.000067>
14:52:23.114976 fstat(3, {st_mode=S_IFREG|0644, st_size=134296, ...}) = 0 <0.000206>
14:52:23.115345 mmap(NULL, 2238192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fd8e9575000 <0.000089>
14:52:23.115574 mprotect(0x7fd8e9595000, 2093056, PROT_NONE) = 0 <0.000206>
14:52:23.115986 mmap(0x7fd8e9794000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1f000) = 0x7fd8e9794000 <0.000117>
14:52:23.116284 mmap(0x7fd8e9796000, 5872, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fd8e9796000 <0.000081>
14:52:23.116540 close(3)                = 0 <0.000061>

We could use prelink to fix this: add ssh to prelink.conf and benchmark again with that.

2) The time taken by DNS requests, things like this which account for ~50ms in trace from #7:

14:52:23.308981 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.16.255.1")}, 16) = 0 <0.000097>
14:52:23.309306 gettimeofday({1497995543, 309393}, NULL) = 0 <0.000081>
14:52:23.309572 poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}]) <0.000075>
14:52:23.309859 sendmmsg(3, {{{msg_name(0)=NULL, msg_iov(1)=[{"Y-\1\0\0\1\0\0\0\0\0\0\33chromeos6-row2-rack"..., 66}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE|MSG_CTRUNC|MSG_EOR}, 66}, {{msg_name(0)=NULL, msg_iov(1)=[{"\213\261\1\0\0\1\0\0\0\0\0\0\33chromeos6-row2-rack"..., 66}], msg_controllen=0, msg_flags=MSG_OOB|MSG_PEEK|MSG_DONTROUTE|MSG_TRUNC|MSG_SYN|MSG_CONFIRM|MSG_ERRQUEUE|MSG_MORE|MSG_WAITFORONE|MSG_CMSG_CLOEXEC|0x88920010}, 66}}, 2, MSG_NOSIGNAL) = 2 <0.000155>
14:52:23.310261 poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}]) <0.041237>
14:52:23.351761 ioctl(3, FIONREAD, [93]) = 0 <0.000087>
14:52:23.352563 recvfrom(3, "Y-\201\200\0\1\0\1\0\0\0\1\33chromeos6-row2-rack"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.16.255.1")}, [16]) = 93 <0.000098>
14:52:23.353010 gettimeofday({1497995543, 353102}, NULL) = 0 <0.000072>
14:52:23.353299 poll([{fd=3, events=POLLIN}], 1, 4956) = 1 ([{fd=3, revents=POLLIN}]) <0.000081>
14:52:23.353611 ioctl(3, FIONREAD, [165]) = 0 <0.000340>
14:52:23.354141 brk(0x7fd8ea206000)     = 0x7fd8ea206000 <0.000095>
14:52:23.354494 recvfrom(3, "\213\261\201\200\0\1\0\0\0\1\0\1\33chromeos6-row2-rack"..., 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("172.16.255.1")}, [16]) = 165 <0.000134>
14:52:23.354877 close(3)                = 0 <0.000140>

Here too there is a simple solution: we could run a local DNS cache on our servers. For example, unbound is a pretty fast DNS server with caching abilities.
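A hedged sketch of what a minimal caching unbound config could look like (written to a temp path here rather than /etc/unbound; the upstream resolver 172.16.255.1 is taken from the trace above, everything else is an assumption with cache sizes left at unbound's defaults):

```shell
cat > /tmp/unbound-demo.conf <<'EOF'
server:
    interface: 127.0.0.1
    access-control: 127.0.0.0/8 allow
    # cache sizes left at defaults; fine for a lab server
forward-zone:
    name: "."
    forward-addr: 172.16.255.1
EOF
grep -c 'forward-addr' /tmp/unbound-demo.conf    # sanity check the file was written
```

With that in place, /etc/resolv.conf would point at 127.0.0.1 so only cache misses hit the upstream resolver.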

Comment 10 by ihf@chromium.org, Jun 21 2017

After installing and configuring unbound the time for nslookup went from 76...156ms down to 27...40ms.

The server isn't loaded right now (load < 3), and hence the linking seems cheaper, but the ssh command still takes about 150ms. New strace attached.

time strace -tt -T -C -o chromeos-server98-unbound.txt ssh -2 -p 22 -x root@chromeos6-row2-rack23-host8.cros cat /etc/lsb-release
real	0m0.155s
user	0m0.025s
sys	0m0.115s

Notice the final poll before reading the result from the DUT is usually 30-40ms. This is time burned by the ssh server (sshd) on the DUT. (time cat /etc/lsb-release is about 4ms, and ping chromeos6-row2-rack23-host8.cros is about 1ms.)

I am really not convinced that using ssh is the right thing here (well, I never was), but I guess we are stuck with it?
chromeos-server98-unbound.txt
34.5 KB

Comment 11 by ihf@chromium.org, Jun 21 2017

unbound-control stats
thread0.num.queries=224647
thread0.num.cachehits=220975
thread0.num.cachemiss=3672
time.up=2174.736348

At ~30ms per avoided miss, that's about 2 hours of lookup time saved in a good half hour of uptime. (Spread across 4 cores?)
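A quick sanity check of that arithmetic, using the hit count from the stats above and ~30ms saved per hit (conservative, given the nslookup delta measured in comment 10):

```shell
# 220975 cache hits x ~30ms each, over time.up ~ 2175s of unbound uptime.
awk 'BEGIN { hits = 220975; ms = 30
             printf "saved %.0fs (%.1fh) in ~2175s of uptime\n",
                    hits*ms/1000, hits*ms/3600000 }'
# -> saved 6629s (1.8h) in ~2175s of uptime
```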

Comment 12 by ihf@chromium.org, Jun 21 2017

That said, I installed prelink and preload, and as far as I can tell they are not doing anything for ssh.

Comment 13 by ihf@chromium.org, Jun 21 2017

Cc: kenobi@chromium.org
I was able to build openssh statically inside the chroot with this line:
     USE="-pie -kerberos static" sudo emerge openssh

and this uses ~80 fewer syscalls:
file                              Percent   (bogus)    Syscalls  Errors
ssh-v7.3-strace-arm-master-05.txt:100.00    0.000010      256      3      total
ssh-v7.3-strace-arm-master-06.txt:100.00    0.000000      178      2      total

mmap time dropped by ~10%:  0.000501 vs 0.000443
mprotect dropped by more:   0.000368 vs 0.000257


Hrm...still have some open("*.so.*") calls. :/
10:43:56.458270 open("/dev/null", O_RDWR) = 3 <0.000030>
10:43:56.458478 open("/proc/17726/fd", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3 <0.000026>
10:43:56.458925 open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3 <0.000023>
10:43:56.459219 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000021>
10:43:56.459445 open("/lib64/libnss_compat.so.2", O_RDONLY|O_CLOEXEC) = 3 <0.000024>
10:43:56.459860 open("/lib64/libnsl.so.1", O_RDONLY|O_CLOEXEC) = 3 <0.000024>
10:43:56.460313 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 <0.000023>
10:43:56.460794 open("/lib64/ld-linux-x86-64.so.2", O_RDONLY|O_CLOEXEC) = 3 <0.000023>
10:43:56.461816 open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3 <0.000024>
10:43:56.462111 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000037>
10:43:56.462366 open("/lib64/libnss_nis.so.2", O_RDONLY|O_CLOEXEC) = 3 <0.000023>
10:43:56.462779 open("/lib64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 3 <0.000023>
10:43:56.463422 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3 <0.000022>
10:43:56.463867 open("/usr/lib64/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 <0.000024>
10:43:56.464449 open("/etc/ssl/openssl.cnf", O_RDONLY) = 3 <0.000033>
10:43:56.467647 open("/home/grundler/.ssh/config", O_RDONLY) = 3 <0.000030>
10:43:56.468073 open("/etc/ssh/ssh_config", O_RDONLY) = 3 <0.000023>
10:43:56.468432 open("/dev/urandom", O_RDONLY|O_NOCTTY|O_NONBLOCK) = 3 <0.000023>
10:43:56.468899 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3 <0.000021>
  0.00    0.000000           0        19           open

Even though ssh claims to be statically linked:
$ file /usr/bin/ssh
/usr/bin/ssh: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=b4daa883fec9d464b70da89eb7d6e50a2659c7f8, stripped

And I just realized SSH opens /etc/nsswitch.conf twice - maybe a bug in libnss?
IDK...just unpacked the source to take a look.
ssh-v7.3-strace-arm-master-06.txt
17.2 KB View Download

Comment 15 by ihf@chromium.org, Jun 21 2017

Cc: pwang@chromium.org
As a warning, most of the servers don't use the chroot ssh; they use the Goobuntu and Ubuntu (inside lxc) ssh. But the builders use the chroot ssh, I think. So this is complicated: we have to change at least 3 places (plus DUTs/moblab).

Considering the lower bound of 40ms, I think we still need to get rid of as many calls as possible by refactoring autotest. But I guess it is important to make ssh as fast and reliable as possible as well (two prongs).

Comment 16 by ihf@chromium.org, Jun 21 2017

Some numbers showing the impact of enabling the DNS cache on test run time. Minnie and Caroline are on chromeos-server98, which I modified.

before using unbound, minnie build 1284
https://viceroy.corp.google.com/chromeos/build_details?build_config=veyron_minnie-chrome-pfq&build_number=1284
https://viceroy.corp.google.com/chromeos/suite_details?build_id=1598496

before using unbound, minnie build 1301
https://viceroy.corp.google.com/chromeos/build_details?build_config=veyron_minnie-chrome-pfq&build_number=1301
https://viceroy.corp.google.com/chromeos/suite_details?build_id=1598496

after using unbound, minnie build 1305
https://viceroy.corp.google.com/chromeos/build_details?build_id=1609703
https://viceroy.corp.google.com/chromeos/suite_details?build_id=1609703


                                        1284   1301   1305   delta

security_NetworkListeners               138s   142s   118s     20s
cheets_KeyboardTest                     176s   165s   141s     24s
cheets_CTS_N.CtsOpenGlPerf2TestCases    484s   494s   461s     23s
cheets_StartAndroid.stress              681s   683s   650s     31s

The delta of 20-30s is conservative, since it subtracts from the faster of the two pre-DNS-caching runs. Removing ssh overhead entirely should make the delta 3-4 times the size of the DNS cache win alone.
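The delta column above can be recomputed mechanically as a throwaway check (the numbers are copied straight from the table: the two pre-unbound runs, then build 1305):

```shell
# delta = min(1284, 1301) - 1305, per test
printf '%s\n' '138 142 118' '176 165 141' '484 494 461' '681 683 650' \
  | awk '{ m = ($1 < $2) ? $1 : $2; print m - $3 "s" }'
# -> 20s, 24s, 23s, 31s (one per line), matching the table
```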

---

And the cache keeps running hard (resets itself once in a while).

unbound-control stats
thread0.num.queries=8152658
thread0.num.cachehits=8035614
thread0.num.cachemiss=117044
time.up=68416.226427
time.elapsed=60536.768595

The update about DNS caching is excellent! :)

More fun with USE.  TL;DR: dropping features via USE has some small (~10-15%) effect. More interesting is why the first call (of 10 in a row) is almost 2x more expensive. Some cache is throwing things out rather quickly. :/


USE="-sctp -selinux -pie -kerberos static" sudo emerge openssh
$ for i in $(seq 0 9) ; do  time ssh -2 -p 22 -x root@100.107.2.127 cat /etc/lsb-release ; done 2>&1 | fgrep real | sed 's/real //' | while read s ; do printf "$s " ; done
0m0.037s 0m0.035s 0m0.027s 0m0.020s 0m0.020s 0m0.019s 0m0.021s 0m0.019s 0m0.020s 0m0.019s

USE="-pam -kerberos -ldap -sctp -selinux -pie -kerberos static" sudo emerge openssh
$ for i in $(seq 0 9) ; do  time ssh -2 -p 22 -x root@100.107.2.127 cat /etc/lsb-release ; done 2>&1 | fgrep real | sed 's/real//' | while read s ; do printf "$s " ; done
0m0.043s 0m0.039s 0m0.033s 0m0.022s 0m0.023s 0m0.022s 0m0.024s 0m0.023s 0m0.023s 0m0.023s
0m0.041s 0m0.037s 0m0.027s 0m0.020s 0m0.023s 0m0.020s 0m0.019s 0m0.027s 0m0.025s 0m0.024s
0m0.028s 0m0.027s 0m0.029s 0m0.030s 0m0.028s 0m0.024s 0m0.024s 0m0.025s 0m0.023s 0m0.024s
0m0.039s 0m0.033s 0m0.035s 0m0.023s 0m0.026s 0m0.024s 0m0.023s 0m0.020s 0m0.023s 0m0.020s 


USE="-libedit -ldns -pam -kerberos -ldap -sctp -selinux -pie -kerberos static" sudo emerge openssh
$ for i in $(seq 0 9) ; do  time ssh -2 -p 22 -x root@100.107.2.127 cat /etc/lsb-release ; done 2>&1 | fgrep real | sed 's/real//' | while read s ; do printf "$s " ; done
0m0.067s 0m0.034s 0m0.039s 0m0.024s 0m0.020s 0m0.019s 0m0.018s 0m0.020s 0m0.020s 0m0.020s 
0m0.043s 0m0.044s 0m0.029s 0m0.022s 0m0.018s 0m0.030s 0m0.022s 0m0.022s 0m0.024s 0m0.025s
0m0.038s 0m0.028s 0m0.029s 0m0.030s 0m0.033s 0m0.026s 0m0.026s 0m0.029s 0m0.031s 0m0.024s
0m0.038s 0m0.027s 0m0.037s 0m0.031s 0m0.021s 0m0.020s 0m0.022s 0m0.017s 0m0.019s 0m0.023s
0m0.034s 0m0.024s 0m0.020s 0m0.021s 0m0.021s 0m0.022s 0m0.019s 0m0.019s 0m0.019s 0m0.019s

USE="-pie -kerberos static" sudo emerge openssh
 $ for i in $(seq 0 9) ; do  time ssh -2 -p 22 -x root@100.107.2.127 cat /etc/lsb-release ; done 2>&1 | fgrep real | sed 's/real//' | while read s ; do printf "$s " ; done
0m0.076s 0m0.034s 0m0.038s 0m0.027s 0m0.023s 0m0.025s 0m0.023s 0m0.025s 0m0.024s 0m0.026s
0m0.042s 0m0.041s 0m0.039s 0m0.029s 0m0.027s 0m0.027s 0m0.028s 0m0.028s 0m0.024s 0m0.022s
0m0.041s 0m0.048s 0m0.037s 0m0.022s 0m0.022s 0m0.023s 0m0.021s 0m0.030s 0m0.026s 0m0.027s
0m0.026s 0m0.063s 0m0.030s 0m0.020s 0m0.019s 0m0.034s 0m0.028s 0m0.028s 0m0.022s 0m0.021s


Comment 18 by ihf@chromium.org, Jun 22 2017

Grant asked for the effects of prelink and preload after running longer. Nothing much changed from yesterday. Attaching logs for completeness.
chromeos-server98-unbound-prelink-preload.txt
34.4 KB
chromeos-server98-unbound-prelink-preload-load23.txt
34.4 KB
Cc: llozano@chromium.org
The first log has 17 instances of:
   18:35:14.017854 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) <0.000106>

What's sad is this is a known issue:
   https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/558571

"This isn't a bug, read man ld.so.
...
/etc/ld.so.nohwcap When this file is present the dynamic linker will load the non-optimized version of a library, even if the CPU supports the optimized version."

I'm still baffled why anyone would check for the presence of a file instead of the presence of an environment variable. This is so wrong from a performance PoV that one may as well not use "optimized libraries" for most common tools. *sigh*

Fortunately, someone seems to have been "enlightened": this behavior was removed as of Debian 8, according to:
    https://unix.stackexchange.com/questions/283853/file-etc-ld-so-nohwcap-is-missing-in-debian-7

Luis, is this something the toolchain team could "easily" fix for the Android and ChromeOS binutils?
Cc: yunlian@chromium.org
interesting... 

Yunlian, can you please see if there is anything in glibc/ld.so that we can use to avoid this behavior?

I will try to fix this on glibc side.
yunlian, I've created http://crbug.com/736151 to track the "/etc/ld.so.nohwcap" usage. Thanks! :)
Labels: akeshet-pending-downgrade
ChromeOS Infra P1 Bugscrub.

P1 Bugs in this component should be important enough to get weekly status updates.

Is this already fixed?  -> Fixed
Is this no longer relevant? -> Archived or WontFix
Is this not a P1, based on go/chromeos-infra-bug-slo rubric? -> lower priority.
Is this a Feature Request rather than a bug? Type -> Feature
Is this missing important information or scope needed to decide how to proceed? -> Ask question on bug, possibly reassign.
Does this bug have the wrong owner? -> reassign.

Bugs that remain in this state next week will be downgraded to P2.
Labels: -akeshet-pending-downgrade Pri-2
ChromeOS Infra P1 Bugscrub.

Issue untouched in a week after previous message. Downgrading to P2.
Owner: ----
Status: Available (was: Started)
I've invested as much time as I can in this for now. I think we can close this as "fixed" (by the changes IHF made), since those at least addressed part of the issue. Further work will need to look at performance on the DUT (which is running sshd) to gain any further advantages.
Hi, this bug has not been updated recently. Please acknowledge the bug and provide status within two weeks (6/22/2018), or the bug will be archived. Thank you.
Labels: Enterprise-Triaged
