New issue
Advanced search Search tips

Issue 918920 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

vm.CrostiniStartEverything fails in time sync test sometimes

Project Member Reported by jkardatzke@google.com, Jan 3

Issue description

This is an occasional failure that happens about once a day where the SyncTime subtest fails to retrieve the time from the container. The specific error is:

Failed to get time: bad seconds: "": strconv.ParseInt: parsing "": invalid syntax

Full logs are here: https://stainless.corp.google.com/browse/chromeos-autotest-results/273344741-chromeos-test/

And it can occur in either location where the getTime function is called.

https://cs.corp.google.com/chromeos_public/src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/vm/subtest/set_time.go?dr&g=0&l=29

I don't see anything obvious as to the cause of the failure...and it looks like the 'date' command is getting executed successfully, but it ends up returning an empty string for the result which fails parsing.
 
Strange, I also don't see anything obvious that would be the cause.

How do we collect stdout from commands we run? Is it possible that races with actually printing the output?

Does this happen at any particular time of day?
The testexec package is what we use to do this: https://cs.corp.google.com/chromeos_public/src/platform/tast-tests/src/chromiumos/tast/local/testexec/testexec.go

And for this case, we are executing vsh locally, which should be outputting stdin/stdout from the command execution in the container. This looks like the only place where we actually consume stdout/stderr from vsh execution in any of the tests...so it's very possible this is a bug in vsh.

And it happens at varying times of day, no correlation there.
Cc: nya@chromium.org
I haven't seen any similar problems reported about non-vsh testexec usage (and testexec is just a thin wrapper around the unlikely-to-have-a-race exec package in Go's standard library), so I think that a race in vsh is the most likely explanation.
Cc: mutexlox@chromium.org
Owner: smbar...@chromium.org
Sounds likely. We had a similar race with vsh in  issue 898973  where the output was fine but the status code indicated failure.

I can take a look.
Project Member

Comment 5 by bugdroid1@chromium.org, Jan 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/64d7e0818cfdddfcdc38a38b408033fc49be8fa9

commit 64d7e0818cfdddfcdc38a38b408033fc49be8fa9
Author: Stephen Barber <smbarber@chromium.org>
Date: Sat Jan 12 07:54:32 2019

vm_tools: vsh: eliminate race for output on child exit

The EXITED state requires that the host/vsh client exit immediately. Modify
VshForwarder to defer sending the EXITED status until all target process
output is consumed.

Also fix a race where the window size might be sent to vshd after the
target process has exited. The initial window size should be sent before
the pseudoterminal is created.

BUG=chromium:918920
TEST=vm.CrostiniStartEverything (with SyncTime in loop)

Change-Id: If74f156c40e152983ef0e5f8629f9a5159e686f0
Reviewed-on: https://chromium-review.googlesource.com/1403857
Commit-Ready: Stephen Barber <smbarber@chromium.org>
Tested-by: Stephen Barber <smbarber@chromium.org>
Reviewed-by: Chirantan Ekbote <chirantan@chromium.org>

[modify] https://crrev.com/64d7e0818cfdddfcdc38a38b408033fc49be8fa9/vm_tools/vsh/vsh_forwarder.cc
[modify] https://crrev.com/64d7e0818cfdddfcdc38a38b408033fc49be8fa9/vm_tools/vsh/vsh_forwarder.h
[modify] https://crrev.com/64d7e0818cfdddfcdc38a38b408033fc49be8fa9/vm_tools/vsh/vsh_client.cc
[modify] https://crrev.com/64d7e0818cfdddfcdc38a38b408033fc49be8fa9/vm_tools/vsh/vsh_client.h
[modify] https://crrev.com/64d7e0818cfdddfcdc38a38b408033fc49be8fa9/vm_tools/proto/vsh.proto

Sign in to add a comment