New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 595055 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Mar 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

Chrome OS perf (amd & x86 generic telemetry) failing in VMTest consistently

Project Member Reported by x...@chromium.org, Mar 15 2016

Issue description

See https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/x86-generic-telemetry?numbuilds=200 and https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/amd64-generic-telemetry?numbuilds=200

It seems that the VMTest has been failing consistently for more than a month.

dshi@, could you take a look at it and help find the right owner? Thanks!
 
Cc: achuith@chromium.org
Owner: x...@chromium.org
These should be fixed by a gardener actually. I was going to take a look at trying to get these green but haven't had the time.
They run this test: https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/client/site_tests/telemetry_UnitTests/telemetry_UnitTests.py
with all 3 control files (control.user, control.perf and control.guest).

Last I looked, running these tests would cause the VM to halt, and a gardener needs to look into why this is happening.

These bots run the telemetry unit tests, which are an indicator of the health of the cros performance tests. The cros performance tests are used by various teams for comparing performance of boards, build toolchains, graphics performance, etc.

We shouldn't get rid of these bots.

Comment 3 by x...@chromium.org, Mar 15 2016

I looked into the log file, and I think it resembles  Issue 528035 . achuith@, do you have any thought on the failure?

Selected sample log:
/tmp/cbuildbot4Ipbzy/test_harness/all/SimpleTestVerify/1_autotest_tests/results-1-telemetry_UnitTests_perf                            [  FAILED  ]
/tmp/cbuildbot4Ipbzy/test_harness/all/SimpleTestVerify/1_autotest_tests/results-1-telemetry_UnitTests_perf/telemetry_UnitTests.perf   [  FAILED  ]
/tmp/cbuildbot4Ipbzy/test_harness/all/SimpleTestVerify/1_autotest_tests/results-2-telemetry_UnitTests                                 [  FAILED  ]
/tmp/cbuildbot4Ipbzy/test_harness/all/SimpleTestVerify/1_autotest_tests/results-2-telemetry_UnitTests/telemetry_UnitTests.user        [  FAILED  ]
/tmp/cbuildbot4Ipbzy/test_harness/all/SimpleTestVerify/1_autotest_tests/results-3-telemetry_UnitTests_guest                           [  FAILED  ]
/tmp/cbuildbot4Ipbzy/test_harness/all/SimpleTestVerify/1_autotest_tests/results-3-telemetry_UnitTests_guest/telemetry_UnitTests.guest [  FAILED  ]
-------------------------------------------------------------------------------
Total PASS: 0/6 (0%)

Sample traceback (most recent call last):
Traceback (most recent call last):
File "/usr/local/autotest/common_lib/log.py", line 70, in decorated_func
  fn(*args, **dargs)
File "/usr/local/autotest/bin/base_sysinfo.py", line 374, in log_before_each_iteration
  board = utils.get_board_with_frequency_and_memory()
File "/usr/local/autotest/bin/site_utils.py", line 864, in get_board_with_frequency_and_memory
  frequency = int(round(get_cpu_max_frequency() * 1e-8)) * 0.1
File "/usr/local/autotest/bin/site_utils.py", line 788, in get_cpu_max_frequency
  paths = _get_cpufreq_paths('cpuinfo_max_freq')
File "/usr/local/autotest/bin/site_utils.py", line 925, in _get_cpufreq_paths
  paths = utils.run(cmd, verbose=False).stdout.splitlines()
File "/usr/local/autotest/common_lib/base_utils.py", line 893, in run
  "Command returned non-zero exit status")
* Command:
    ls /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
Exit status: 2
Duration: 0.000597953796387

stderr:
ls: cannot access /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq: No such file or directory

==========================
Saving memory snapshot to /b/cbuild/shared_external/chroot/tmp/cbuildbot4Ipbzy/test_harness/failed/SimpleTestVerify/1_autotest_tests/chromiumos_qemu_mem.bin.rJW_Fg...
    freezing VM...
    saving memory, piping through cat...
    flushing disk buffers...
    asking KVM to quit...
    done.

======================================================================
FAIL: SimpleTestVerify (crostestutils.au_test_harness.au_test.AUTest)
Test that only verifies the target image.
----------------------------------------------------------------------

----------------------------------------------------------------------
Ran 1 test in 1013.556s

FAILED (failures=1)
2016/03/15 09:15:56 - cros_build_lib.py - ERROR   : <class 'chromite.lib.parallel.BackgroundFailure'>: Test harness failed. See logs for details.
Traceback (most recent call last):
  File "/b/cbuild/shared_external/chromite/lib/parallel.py", line 440, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/cbuild/shared_external/src/platform/crostestutils/au_test_harness/cros_au_test_harness.py", line 77, in run
    raise parallel.BackgroundFailure(msg)
BackgroundFailure: Test harness failed. See logs for details.


https://chromium-review.googlesource.com/#/c/332638/1

This fixes the failure with not being able to run the test, but the VM still reboots when the test runs, and that will need to be investigated.

Comment 5 by x...@chromium.org, Mar 15 2016

Cc: davidjames@chromium.org
+davidjames@ could you help bring some thought on this issue?
Labels: -Pri-2 Pri-1
Owner: achuith@chromium.org
Status: Started (was: Assigned)
https://codereview.chromium.org/1805733002/

Comment 7 by x...@chromium.org, Mar 15 2016

Re #6: Thanks for taking over this issue!
Thanks for finding  http://crbug.com/528035  - that was key to solving this problem.
Project Member

Comment 9 by bugdroid1@chromium.org, Mar 16 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/26dc38548ec0632ea0a2eb8841291fd3d7d76ed6

commit 26dc38548ec0632ea0a2eb8841291fd3d7d76ed6
Author: Achuith Bhandarkar <achuith@chromium.org>
Date: Tue Mar 15 20:05:07 2016

Update telemetry path to catapult.

BUG= chromium:595055 
TEST=this is the test.

Change-Id: I611d262d400291126a2d30accc8c8de2c622a06c
Reviewed-on: https://chromium-review.googlesource.com/332638
Commit-Ready: Achuith Bhandarkar <achuith@chromium.org>
Tested-by: Achuith Bhandarkar <achuith@chromium.org>
Tested-by: Xiaoqian Dai <xdai@chromium.org>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>
Reviewed-by: Xiaoqian Dai <xdai@chromium.org>

[modify] https://crrev.com/26dc38548ec0632ea0a2eb8841291fd3d7d76ed6/client/site_tests/telemetry_UnitTests/telemetry_UnitTests.py

Comment 10 by x...@chromium.org, Mar 18 2016

So after the fix, the VMTest is still failing. But the error message changed. This is the first build with the fix in #9: https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/x86-generic-telemetry/builds/8822

Selected log:
-------------------------------------------------------------------------------------------------------------------------------------------------
/tmp/cbuildbotCDyLmz/test_harness/all/SimpleTestVerify/1_autotest_tests/results-1-telemetry_UnitTests_perf                            [  FAILED  ]
/tmp/cbuildbotCDyLmz/test_harness/all/SimpleTestVerify/1_autotest_tests/results-1-telemetry_UnitTests_perf                              FAIL: The unit tests of /usr/local/telemetry/src/tools/perf failed.
/tmp/cbuildbotCDyLmz/test_harness/all/SimpleTestVerify/1_autotest_tests/results-1-telemetry_UnitTests_perf/telemetry_UnitTests.perf   [  FAILED  ]
/tmp/cbuildbotCDyLmz/test_harness/all/SimpleTestVerify/1_autotest_tests/results-1-telemetry_UnitTests_perf/telemetry_UnitTests.perf     FAIL: The unit tests of /usr/local/telemetry/src/tools/perf failed.
/tmp/cbuildbotCDyLmz/test_harness/all/SimpleTestVerify/1_autotest_tests/results-2-telemetry_UnitTests                                 [  FAILED  ]
/tmp/cbuildbotCDyLmz/test_harness/all/SimpleTestVerify/1_autotest_tests/results-2-telemetry_UnitTests                                   FAIL: Autotest client terminated unexpectedly: DUT rebooted during the test run.
/tmp/cbuildbotCDyLmz/test_harness/all/SimpleTestVerify/1_autotest_tests/results-2-telemetry_UnitTests/telemetry_UnitTests.user        [  FAILED  ]
/tmp/cbuildbotCDyLmz/test_harness/all/SimpleTestVerify/1_autotest_tests/results-3-telemetry_UnitTests_guest                           [  PASSED  ]
/tmp/cbuildbotCDyLmz/test_harness/all/SimpleTestVerify/1_autotest_tests/results-3-telemetry_UnitTests_guest/telemetry_UnitTests.guest [  PASSED  ]
-------------------------------------------------------------------------------------------------------------------------------------------------
Total PASS: 2/6 (33%)
======================================================================
FAIL: SimpleTestVerify (crostestutils.au_test_harness.au_test.AUTest)
Test that only verifies the target image.
----------------------------------------------------------------------

FAILED (failures=1)
2016/03/15 23:04:40 - cros_build_lib.py - ERROR   : <class 'chromite.lib.parallel.BackgroundFailure'>: Test harness failed. See logs for details.
Traceback (most recent call last):
  File "/b/cbuild/shared_external/chromite/lib/parallel.py", line 440, in _Run
    self._task(*self._task_args, **self._task_kwargs)
  File "/b/cbuild/shared_external/src/platform/crostestutils/au_test_harness/cros_au_test_harness.py", line 77, in run
    raise parallel.BackgroundFailure(msg)
BackgroundFailure: Test harness failed. See logs for details.

VMTest Error Traceback (most recent call last):
  File "bin/ctest", line 293, in <module>
    main()
  File "bin/ctest", line 284, in main
    opts.suite)
  File "bin/ctest", line 210, in RunAUTestHarness
    error_code_ok=True, kill_timeout=10 * 60)
  File "/b/cbuild/shared_external/chromite/lib/cros_build_lib.py", line 589, in RunCommand
    (cmd_result.output, cmd_result.error) = proc.communicate(input)
  File "/usr/lib/python2.7/subprocess.py", line 751, in communicate
    self.wait()
  File "/usr/lib/python2.7/subprocess.py", line 1291, in wait
    pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
  File "/usr/lib/python2.7/subprocess.py", line 478, in _eintr_retry_call
    return func(*args)
  File "/b/cbuild/shared_external/chromite/lib/cros_build_lib.py", line 344, in _KillChildProcess
    raise TerminateRunCommandError('Received signal %i' % signum, cmd_result)
chromite.lib.cros_build_lib.TerminateRunCommandError: return code: 3840; command: /b/cbuild/shared_external/src/platform/crostestutils/au_test_harness/cros_au_test_harness.py '--base_image=/b/cbuild/shared_external/src/build/images/x86-generic/latest-cbuildbot/chromiumos_test_image.bin' '--target_image=/b/cbuild/shared_external/src/build/images/x86-generic/latest-cbuildbot/chromiumos_test_image.bin' '--board=x86-generic' '--type=vm' '--remote=0.0.0.0' --verbose '--jobs=16' '--ssh_private_key=/b/cbuild/shared_external/src/build/images/x86-generic/latest-cbuildbot/id_rsa' '--verify_suite_name=telemetry_unit' '--test_prefix=SimpleTestVerify' '--test_results_root=/b/cbuild/shared_external/chroot/tmp/cbuildbotSosBB7/test_harness' --no_graphics
Received signal 15
22:08:35: ERROR: 
!!!VMTests failed!!!

@@@STEP_WARNINGS@@@
22:08:39: WARNING: Traceback (most recent call last):
  File "/b/cbuild/shared_external/chromite/cbuildbot/stages/generic_stages.py", line 467, in Run
    self.PerformStage()
  File "/b/cbuild/shared_external/chromite/cbuildbot/stages/test_stages.py", line 215, in PerformStage
    self._RunTest(test_type, test_results_dir)
  File "/b/cbuild/shared_external/chromite/cbuildbot/stages/test_stages.py", line 204, in _RunTest
    ssh_private_key=ssh_private_key)
  File "/b/cbuild/shared_external/chromite/cbuildbot/commands.py", line 675, in RunTestSuite
    kill_timeout=10 * 60)
  File "/b/cbuild/shared_external/chromite/lib/cros_build_lib.py", line 589, in RunCommand
    (cmd_result.output, cmd_result.error) = proc.communicate(input)
  File "/usr/lib/python2.7/subprocess.py", line 751, in communicate
    self.wait()
  File "/usr/lib/python2.7/subprocess.py", line 1291, in wait
    pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
  File "/usr/lib/python2.7/subprocess.py", line 478, in _eintr_retry_call
    return func(*args)
  File "/b/cbuild/shared_external/chromite/lib/timeout_util.py", line 53, in kill_us
    raise TimeoutError(error_message % {'time': max_run_time})
TimeoutError: Timeout occurred- waited 3600 seconds.

Status: Fixed (was: Started)
I've opened a separate bug for the timeout issue:
https://bugs.chromium.org/p/chromium/issues/detail?id=596945

The x86-generic builder is now green, but it seems to always succeed on the retry:
https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/x86-generic-telemetry/builds/8868

I've landed a few CLs disabling failing tests, fixing the test harness to run the test, and the crash during the test. I'm going to close this bug. 596945 is assigned to the shadow gardener this week.
Labels: VerifyIn-51
Status: Verified (was: Fixed)
Bulk verified

Sign in to add a comment