Stop on first failure in VMTests
Issue description
Filed for https://luci-milo.appspot.com/buildbot/chromeos/betty-arc64-paladin/828

45 minutes is too long for VMTest, and I think we're waiting around for the same failures over and over again. Perhaps we should fail fast in VMTests on the CQ (fail at the first error encountered)?

10/27 13:57:21.904 INFO | test_runner_utils:0199| autoserv| run process timeout (299.999954939) fired on: /usr/bin/ssh -a -x -F /dev/null -i /dev/null -o ControlPath=/tmp/_autotmp_FWlc45ssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 9228 127.0.0.1 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::execute_section|_execute_daemon|run] -> ssh_run(/usr/local/autotest/bin/autotestd_monitor /tmp/autoserv-tHCEyq 0 0)\";fi; /usr/local/autotest/bin/autotestd_monitor /tmp/autoserv-tHCEyq 0 0"
10/27 14:02:38.024 INFO | test_runner_utils:0199| autoserv| Running 'rsync -L --timeout=1800 --rsh='/usr/bin/ssh -a -x -F /dev/null -i /dev/null -o ControlPath=/tmp/_autotmp_3WhkZfssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 9228' -az --no-o --no-g "/build/betty-arc64/usr/local/build/autotest/packages/packages.checksum" "root@127.0.0.1:"/usr/local/autotest/packages.checksum""'
10/27 14:02:38.125 INFO | test_runner_utils:0199| autoserv| Running (ssh) 'echo B > /usr/local/autotest/tmp/_autotmp_3039WVharness-fifo/autoserv.fifo' from '_wait_for_commands|process_output|write|_process_line|run|run_very_slowly'
10/27 14:05:17.680 INFO | test_runner_utils:0199| autoserv| AUTOTEST_STATUS:: FAIL security_NetworkListeners security_NetworkListeners timestamp=1509131116 localtime=Oct 27 14:05:16 Android did not boot!
10/27 14:09:20.291 INFO | test_runner_utils:0199| autoserv| Running (ssh) 'echo B > /usr/local/autotest/tmp/_autotmp_0hb4e3harness-fifo/autoserv.fifo' from '_wait_for_commands|process_output|write|_process_line|run|run_very_slowly'
10/27 14:12:35.687 INFO | test_runner_utils:0199| autoserv| AUTOTEST_STATUS:: GOOD login_Cryptohome login_Cryptohome timestamp=1509131554
10/27 14:22:48.897 INFO | test_runner_utils:0199| autoserv| Running (ssh) 'echo B > /usr/local/autotest/tmp/_autotmp_i2C_5iharness-fifo/autoserv.fifo' from '_wait_for_commands|process_output|write|_process_line|run|run_very_slowly'
10/27 14:34:17.647 INFO | test_runner_utils:0199| autoserv| AUTOTEST_STATUS:: FAIL login_CryptohomeIncognito login_CryptohomeIncognito
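The fail-fast idea in the description can be illustrated with a minimal sketch that scans autoserv output for `AUTOTEST_STATUS::` lines and stops at the first `FAIL`. This is not how the CQ or autotest implements anything: the names `STATUS_RE`, `parse_status`, and `first_failure` are hypothetical, the regular expression is inferred only from the log lines quoted above, and the year (absent from the log timestamps) is assumed to be 2017 from the issue date.

```python
import re
from datetime import datetime

# Hypothetical pattern, inferred from the status lines quoted in this issue, e.g.
# "10/27 14:05:17.680 INFO | ...| autoserv| AUTOTEST_STATUS:: FAIL <test> ..."
STATUS_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"AUTOTEST_STATUS:: (?P<status>[A-Z]+) (?P<test>\S+)"
)


def parse_status(line, year=2017):
    """Return (timestamp, status, test_name) for a status line, else None.

    The log timestamps carry no year, so one is supplied by the caller
    (assumption: 2017, the year this issue was filed).
    """
    match = STATUS_RE.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(
        "%d/%s" % (year, match.group("ts")), "%Y/%m/%d %H:%M:%S.%f"
    )
    return timestamp, match.group("status"), match.group("test")


def first_failure(lines, year=2017):
    """Consume lines lazily and return the first FAIL status, or None.

    Iteration stops as soon as a FAIL is seen, so a caller streaming a
    live log could abort the remaining tests at that point.
    """
    for line in lines:
        parsed = parse_status(line, year)
        if parsed is not None and parsed[1] == "FAIL":
            return parsed
    return None
```

Fed the log above, `first_failure` would return at the `security_NetworkListeners` line (14:05:17) without reading the later `login_Cryptohome` and `login_CryptohomeIncognito` results, which is the point at which a fail-fast run would stop waiting.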
Oct 27 2017
Impact assessment: the betty and betty-arc64 paladins become the slowest builders when failures happen in VMTest and there are no failures on reef/reef-uni (they're currently experimental). For example:
https://viceroy.corp.google.com/chromeos/build_details?build_config=master-paladin&build_number=16733
https://viceroy.corp.google.com/chromeos/build_details?build_config=master-paladin&build_number=16732
I haven't seen this very frequently (yet), but the VMTest time is pushing the limits.
Aug 17
+Alex, who is working on redoing how VMTests are run.
Comment 1 by pprabhu@chromium.org, Oct 27 2017
Labels: -Type-Bug -Pri-3 Pri-2 Type-Feature
Owner: pprabhu@chromium.org
Summary: Stop on first failure in VMTests (was: betty vmtest failure mode is slow)