tast vm test times out on betty-release
Issue description
I think it needs more than 30m. It times out in this build:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924392383274643552
https://luci-logdog.appspot.com/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8924392383274643552/+/steps/TastVMTest__attempt_2_/0/stdout
but on the previous build (which failed for a different reason) it was really close to timing out:
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8924423047361954896
https://luci-logdog.appspot.com/logs/chromeos/buildbucket/cr-buildbucket.appspot.com/8924423047361954896/+/steps/TastVMTest__attempt_2_/0/stdout
Unless I am mistaken, it took about 29 minutes.
Jan 14
I am not really sure that the 30-minute timeout is too short. This build has been broken for a long time. My sampling shows exclusively VMTest failures, mostly due to timeouts of subtests. Even with those timeouts, I've seen times as low as 22 minutes, so it's not clear that we need a longer timeout here. It is possible that the error reporting could be better: for example, if one or more of the tests fail and the entire VMTest stage then times out, it may be better to report those failures rather than the enclosing timeout. Does the betty build have an owner?
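The reporting idea in this comment could look roughly like the Python sketch below. It is only an illustration: `summarize_stage` and the shape of the `test_results` mapping are hypothetical and do not come from chromite or Tast.

```python
def summarize_stage(test_results, stage_timed_out):
    """Build a summary that lists failed subtests ahead of the stage timeout.

    test_results is a hypothetical mapping of test name to one of
    "pass", "fail" or "timeout"; it is not a real chromite structure.
    """
    # Anything that did not pass counts as a failure worth reporting.
    failed = sorted(
        name for name, status in test_results.items() if status != "pass"
    )
    lines = []
    if failed:
        lines.append("Failed tests: " + ", ".join(failed))
    if stage_timed_out:
        lines.append("Stage timed out before all tests finished")
    return "\n".join(lines) or "All tests passed"
```

With this ordering, a subtest timeout shows up in the summary even when the whole stage is also killed by its own timeout.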
Jan 14
We run both informational and "important" (non-informational) tests on betty-release, and people are adding a lot of informational tests. Logging in and starting ARC also takes substantially more time on a VM than on real hardware. Adding support for "preconditions" to Tast (issue 892009) is the longer-term fix here. It carries some risk, though, so I'm going to be cautious in rolling it out. In the meantime, increasing the timeout sounds like the correct fix.
Jan 15
I've uploaded https://crrev.com/c/1409625 to increase the TastVMTest stage's timeout to an hour. I also worked on a small local change to pass the suite's timeout through to Tast via its -timeout flag. This ensures that the "tast run" process will abort testing early enough to reserve time to collect system information and report results. We already use this when running Tast tests in the lab. I need to double-check that TastVMTest will exit with a nonzero status code when it aborts testing prematurely so that these timeouts (and the missing tests) won't be swallowed, though. In the lab, the tast.py Autotest server test runs "tast list" first to get a list of expected tests so that it can ensure that it sees all of the expected results. I'd prefer to not duplicate all of that code in chromite for VM tests if we can help it.
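A minimal Python sketch of the two ideas in this comment: reserving part of the stage timeout for result collection, and comparing expected tests against reported ones so that an early abort is not swallowed. All function names and the 300-second reserve are assumptions made for illustration. None of this is the actual chromite or tast.py code, and it does not invoke Tast.

```python
def tast_timeout_seconds(stage_timeout_s, reserve_s=300):
    """Return the time to hand to the test runner, keeping reserve_s back.

    reserve_s (assumed value) is the slice kept for collecting system
    information and reporting results after testing stops.
    """
    if not 0 <= reserve_s < stage_timeout_s:
        raise ValueError("reserve must be smaller than the stage timeout")
    return stage_timeout_s - reserve_s


def missing_tests(expected, reported):
    """Return expected test names that never produced a result."""
    return sorted(set(expected) - set(reported))


def exit_code(expected, reported, failed):
    """Return nonzero if any test failed or any expected test is missing."""
    return 1 if failed or missing_tests(expected, reported) else 0
```

For a one-hour stage, `tast_timeout_seconds(3600)` leaves 3300 seconds for testing. If testing aborts early, the tests that never ran show up in `missing_tests`, and `exit_code` is nonzero even when no reported test failed.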
Jan 16
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/chromite/+/d87095cb183e4801fe9b333149ee394eb3077c3a

commit d87095cb183e4801fe9b333149ee394eb3077c3a
Author: Daniel Erat <derat@chromium.org>
Date: Wed Jan 16 03:50:12 2019

chromeos_config: Increase Tast VM timeout to an hour.

Increase TastVMTestConfig's default timeout from 30 minutes to an hour. As more informational tests are added, betty-release is bumping up against the current timeout, and this is more in line other VM stages' defaults (90 minutes for VMTestConfig, 60 minutes for GCETestConfig).

BUG=chromium:921641
TEST=none

Change-Id: I0a00c30498d3c18122fb6515c57fcea08b3188e6
Reviewed-on: https://chromium-review.googlesource.com/1409625
Commit-Ready: Dan Erat <derat@chromium.org>
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Dan Erat <derat@chromium.org>
Reviewed-by: Ilja H. Friedel <ihf@chromium.org>
Reviewed-by: Shuhei Takahashi <nya@chromium.org>

[modify] https://crrev.com/d87095cb183e4801fe9b333149ee394eb3077c3a/config/config_dump.json
[modify] https://crrev.com/d87095cb183e4801fe9b333149ee394eb3077c3a/lib/config_lib.py
Comment 1 by jclinton@chromium.org, Jan 14
Status: Available (was: Untriaged)