Failure running 'pwd' in vm.CrostiniStartEverything on octopus/grunt |
||
Issue descriptionThis is a new test failure that is showing up in 11578.0.0...however, it may have been introduced as early as 11559.0.0 because there was another problem causing failures between them. The failure is only occurring on grunt and octopus. The error message is: Failed to run pwd: exit status 1, Failure dumping container log: exit status 1 It *may* be from this CL: https://chromium-review.googlesource.com/c/chromiumos/platform2/+/1403857/ The termina version did not change across these versions, so it won't be on that end. @smbarber, do you think that CL could be the cause?
,
Jan 14
Confirmed on grunt that reverting kernel/v4.14 5ac122cffbbf5ba4228a4d670ddfdbbe22618bd5 makes Terminal usable again and vm.CrostiniStartEverything pass.
,
Jan 14
From some quick hacking on the __vsock_bind_stream() function in the kernel, it seems that port numbers greater than 2^16 are being truncated somewhere (changing the original code to 'static u32 port = 65536;' triggers the same apparent issue). I'm guessing that the port gets truncated to 16 bits somewhere along the way, and the random port number is quite likely to be greater than 65536. I also don't totally understand the rationale for the change - if I'm reading it correctly, the random starting port number is just chosen once per kernel boot (port is a static variable), then gets incremented once per bind(). This function gets called on the host kernel, at least in the way that Crostini uses it, so the randomization only happens once. (Maybe this is intended to randomize the port numbers inside the VM? I don't think we bind() in that case in Crostini.)
,
Jan 14
The original change is for fixing this case: vm guest crashed, it could leave some orphan sockets in host side, then if vm guest tried to make a connection to same host port, it will hit that orphan sockets and connection will fail. To make the start port randomization, then next time vm guest would likely choose a different port and then the connection can pass through. BTW, how this impact crostini? could you share more log so I can understand why this cause failure of crostini?
,
Jan 14
I found the bug - it's actually in Crostini-specific code. src/platform2/vm_tools/vsh/vshd.cc casts the port number to uint16_t, which causes the truncation as mentioned above. The kernel change just happens to tickle this because the random number generator will mostly pick ports outside of the 0..2^16 range, whereas previously the ports always started at 1024 and wouldn't normally exceed the 2^16 range that would cause problems.
,
Jan 15
dverkamp@ are you going to do a fix for that?
,
Jan 15
smbarber@ has a fix in the CQ: https://crrev.com/c/1410063
,
Jan 15
Also note that this will need a component update before the fix takes effect, since the change is in vshd, which is inside the Termina VM rootfs.
,
Jan 16
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/15cbcb25169cb1839e8810c2c8be579acb03deb3 commit 15cbcb25169cb1839e8810c2c8be579acb03deb3 Author: Stephen Barber <smbarber@chromium.org> Date: Wed Jan 16 03:50:14 2019 vm_tools: vsh: cast forwarding port to uint32_t vsock ports are uint32_t, not uint16_t. BUG=chromium:921700 TEST=vm.CrostiniStartEverything on octopus/grunt Change-Id: I5b7150f3801d102bcfaed8609a0faa0b20a2ec7d Reviewed-on: https://chromium-review.googlesource.com/1410063 Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com> Tested-by: Daniel Verkamp <dverkamp@chromium.org> Reviewed-by: Daniel Verkamp <dverkamp@chromium.org> [modify] https://crrev.com/15cbcb25169cb1839e8810c2c8be579acb03deb3/vm_tools/vsh/vshd.cc |
||
►
Sign in to add a comment |
||
Comment 1 by dverkamp@chromium.org
, Jan 14