Issue metadata
Sign in to add a comment
|
VMTest broken on betty-release in multiple ways. |
||||||||||||||||||||||
Issue descriptionI've noticed that the VMTest and UploadTestArtifacts stages have been failing consistently on betty-release since October 21 -- see e.g. http://uberchromegw/i/chromeos/builders/betty-release/builds/649. The first VMTest run fails with errors like this: 14:47:25: INFO: Running VM test cros_vm_test. 14:47:25: INFO: Setting up the VM for testing. ... 14:48:25: INFO: Test to flash the VM device with the latest image. 14:48:25: INFO: RunCommand: cros flash --no-wipe --no-reboot ssh://localhost:54915 latest [1;31m14:49:53: ERROR: Failed to flash the VM device. [0m ... [24/Oct/2017:14:49:51] UPDATE Running cros_generate_stateful_update_payload --image /mnt/host/source/devserver/static/betty/R64-10064.0.0/chromiumos_test_image.bin --output_dir /mnt/host/source/devserver/static/cache/dbd3ab9d5e86a476b10d1e550360c54c [24/Oct/2017:14:49:53] HTTP Traceback (most recent call last): File "/usr/lib64/python2.7/site-packages/cherrypy/_cprequest.py", line 656, in respond response.body = self.handler() File "/usr/lib64/python2.7/site-packages/cherrypy/lib/encoding.py", line 188, in __call__ self.body = self.oldhandler(*args, **kwargs) File "/usr/lib64/python2.7/site-packages/cherrypy/_cpdispatch.py", line 34, in __call__ return self.callable(*self.args, **self.kwargs) File "/usr/lib/devserver/devserver.py", line 1466, in xbuddy image_name=file_name) File "/usr/lib64/devserver/autoupdate.py", line 500, in GetUpdateForLabel rel_path = self.GenerateUpdateImageWithCache(static_image_path) File "/usr/lib64/devserver/autoupdate.py", line 433, in GenerateUpdateImageWithCache self.GenerateUpdateImage(image_path, cache_dir) File "/usr/lib64/devserver/autoupdate.py", line 399, in GenerateUpdateImage raise AutoupdateError('Failed to generate update in %s' % output_dir) AutoupdateError: Failed to generate update in /mnt/host/source/devserver/static/cache/dbd3ab9d5e86a476b10d1e550360c54c [24/Oct/2017:14:49:53] HTTP Request Headers: HOST: 192.168.120.129:52843 CONNECTION: close Remote-Addr: ::ffff:192.168.120.129 USER-AGENT: Python-urllib/2.7 ACCEPT-ENCODING: identity ::ffff:192.168.120.129 - - [24/Oct/2017:14:49:53] "GET /xbuddy/local/betty/R64-10064.0.0/test?for_update=true&return_dir=true HTTP/1.1" 500 1890 "" "Python-urllib/2.7" --- End output from /b/c/cbuild/repository/chroot/tmp/devserver_wrapperrlZxrt/dev_server.log --- [0m [1;31m14:49:53: ERROR: Device update failed. [0m [1;31m14:49:53: ERROR: cros flash failed before completing. [0m [1;31m14:49:53: ERROR: HTTP Error 500: Internal Server Error [0m ---- The second VMTest run fails differently: 15:32:26: INFO: Running VM test dev_mode_test. 15:32:26: INFO: RunCommand: /b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py --verbose betty /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/chromiumos_test_image.bin 2017/10/24 15:32:27 - devinstall_test.py - INFO : Setting up the image /b/c/cbuild/repository/src/build/images/betty/R64-10064.0.0/chromiumos_test_image.bin for vm testing. 2017/10/24 15:32:27 - devinstall_test.py - INFO : Making copy of the vm image /b/c/cbuild/repository/src/build/images/betty/R64-10064.0.0/chromiumos_qemu_image.bin to manipulate. 2017/10/24 15:32:42 - devinstall_test.py - DEBUG : Copy of vm image stored at /tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/chromiumos_qemu_image.bin. 2017/10/24 15:32:42 - devinstall_test.py - INFO : Wiping /usr/local/bin from the image. 2017/10/24 15:32:45 - devinstall_test.py - WARNING : Received error during cleanup Traceback (most recent call last): File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py", line 73, in Cleanup self.vm.Stop() AttributeError: 'NoneType' object has no attribute 'Stop' ... chromite.lib.cros_build_lib.RunCommandError: return code: 1; command: ./mount_gpt_image.sh '--from=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest' '--image=chromiumos_qemu_image.bin' '--rootfs_mountpt=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/m' '--stateful_mountpt=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/s' --safe mount: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so ERROR : mount failed: device=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/chromiumos_qemu_image.bin target=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/s format= ro/rw=rw options=offset=4456448000 ERROR : Tue Oct 24 15:32:45 PDT 2017 ERROR : PGID PPID PID ELAPSED TIME %CPU COMMAND ERROR : Arguments of 28045: ./mount_gpt_image.sh '--from=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest' '--image=chromiumos_qemu_image.bin' '--rootfs_mountpt=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/m' '--stateful_mountpt=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/s' '--safe' ERROR : Backtrace: (most recent call is last) ERROR : mount_gpt_image.sh:427:main(), called: mount_image ERROR : mount_gpt_image.sh:346:mount_image(), called: die 'Failed to mount all partitions in /tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/chromiumos_qemu_image.bin' ERROR : ERROR : Error was: ERROR : Failed to mount all partitions in /tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/chromiumos_qemu_image.bin ---- And UploadTestArtifacts fails like this (with a bunch more errors on the end): 4:04:01: INFO: RunCommand: cros_sdk -- /mnt/host/source/src/platform/dev/host/cros_generate_stateful_update_payload --image /mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_test_image.bin --output /tmp/generate_payloadsaWiCcK in /b/c/cbuild/repository/src/scripts INFO:cros_generate_stateful_update_payload:Generating stateful update file. mount: wrong fs type, bad option, bad superblock on /dev/loop3, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so. ERROR : mount failed: device=/mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_test_image.bin target=/tmp/tmpzX1SiDstateful format=ext4 ro/rw=ro options=offset=2311061504 Current loopback device status: /dev/loop0: [2052]:201721121 (/b/c/cbuild/repository/chroot/mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_base_image.bin), offset 2311061504, sizelimit 2147483648 /dev/loop1: [2052]:201721121 (/b/c/cbuild/repository/chroot/mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_base_image.bin), offset 163577856, sizelimit 2147483648 /dev/loop2: [2052]:201721113 (/mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_test_image.bin), offset 163577856 ERROR : Tue Oct 24 14:04:03 PDT 2017 ERROR : PGID PPID PID ELAPSED TIME %CPU COMMAND ERROR : Arguments of 262: '--from=/mnt/host/source/src/build/images/betty/R64-10064.0.0' '--image=chromiumos_test_image.bin' '--read_only' '--rootfs_mountpt=/tmp/tmpEYKrVLrootfs' '--stateful_mountpt=/tmp/tmpzX1SiDstateful' ERROR : Backtrace: (most recent call is last) ERROR : mount_gpt_image.sh:427:main(), called: mount_image ERROR : mount_gpt_image.sh:346:mount_image(), called: die 'Failed to mount all partitions in /mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_test_image.bin' ERROR : ERROR : Error was: ERROR : Failed to mount all partitions in /mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_test_image.bin ERROR:cros_generate_stateful_update_payload:Failed to create stateful update file ---- This appears to have started when I added TastVMTestStage to this builder (for issue 772053 ), and I see that TastVMTest is running in parallel with VMTest: VMTest (attempt 1): ************************************************************ ** Start Stage VMTest (attempt 1) - Tue, 24 Oct 2017 13:59:47 -0700 (PDT) ** ** Run autotests in a virtual machine. ************************************************************ 13:59:47: INFO: Running cidb query on pid 2893, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f769ff82890> Preconditions for the stage successfully met. Beginning to execute stage... TastVMTest: ************************************************************ ** Start Stage TastVMTest (attempt 1) - Tue, 24 Oct 2017 13:59:48 -0700 (PDT) ** ** Runs Tast integration tests in a virtual machine. ************************************************************ 13:59:48: INFO: Running cidb query on pid 2914, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f769ff82650> Preconditions for the stage successfully met. Beginning to execute stage... ---- Don mentioned over email that dual access to the VM image file might be a problem. I think I see how to make TastVMTestStage run serially instead of in parallel if that's needed (by modifying _RunBackgroundStagesForBoard in chromite/cbuildbot/builders/simple_builders.py), but it'd probably be better to figure out why we aren't able to run two stages that use VMs in parallel. The tests in VMTest look like they're passing; it's just the stuff afterwards that's broken.
,
Oct 28 2017
Is it expected that VMTest and TastVMTest can't run in parallel without VMTest failing?
,
Oct 28 2017
I expect it would be difficult to get them running reliably in parallel, because you'd have to fix all the flakiness bugs that happen when you launch two vms in parallel on same machine. Another option for implementing: you could add a giant lock around the two stages. That might be better long term because you could narrow the lock later if we get them running better in parallel.
,
Oct 28 2017
,
Oct 28 2017
Re #3, I was looking at this yesterday (on the deduped bug). I think this is a case of the devserver somehow getting confused due to the two tests rather than the parallelly running VMs causing problems. That said, while working on moblab vm, I did find issues with cros_start_vm launching two VMs at the same time, and I fixed some of them.
,
Oct 30 2017
For now, I'm just going to make these stages run serially as David suggested in #1.
,
Oct 30 2017
,
Oct 30 2017
The fix is waiting for a review at https://crrev.com/c/744326.
,
Oct 31 2017
,
Oct 31 2017
https://crrev.com/c/744326 is in (sorry, I messed up the bug number in its commit message). I can't verify that it fixed the issue since the last few betty-release builds failed to build chromeos-chrome without any meaningful error messages, as far as I can tell. :-(
,
Nov 2 2017
Sigh. Even without TastVMTest running in parallel, both VMTest attempts on bety-release still fail. See e.g. http://uberchromegw/i/chromeos/builders/betty-release/builds/674: VMTest (attempt 1): Total PASS: 111/111 (100%) 09:13:06 INFO | Finished running tests. Results can be found in /tmp/cbuildbotgpnZuU/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests or /tmp/test_that_latest INFO : QEMU binary: /b/c/cbuild/repository/chroot/usr/bin/qemu-system-x86_64 INFO : QEMU version: QEMU emulator version 2.6.0, Copyright (c) 2003-2008 Fabrice Bellard Persist requested. Use --ssh_port 9228 --ssh_private_key /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/id_rsa --kvm_pid /tmp/kvm.9228 to re-connect to it. INFO : QEMU binary: /b/c/cbuild/repository/chroot/usr/bin/qemu-system-x86_64 INFO : QEMU version: QEMU emulator version 2.6.0, Copyright (c) 2003-2008 Fabrice Bellard Stopping the KVM instance . ---------------------------------------------------------------------- Ran 1 test in 2416.349s OK 07:13:15: INFO: Running VM test dev_mode_test. ... 07:18:49: INFO: Running VM test cros_vm_test. 07:18:49: INFO: Setting up the VM for testing. 07:18:49: INFO: RunCommand: parted -m /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/chromiumos_qemu_image.bin unit MB print 07:19:48: INFO: VM started at port 45729 07:19:48: INFO: The VM has been successfully set up. Ready to run tests. 07:19:48: INFO: Running test for cros shell. 07:19:48: INFO: Test to use shell command to write a file to the VM device. 07:19:48: INFO: RunCommand: cros shell --no-known-hosts ssh://localhost:45729 -- 'echo "shell command test file" > /tmp/shell-test' 07:19:49: INFO: Test to use shell command to read a file on the VM device. 07:19:49: INFO: RunCommand: cros shell --no-known-hosts ssh://localhost:45729 -- 'cat /tmp/shell-test' 07:19:50: INFO: Test to use shell command to remove a file on the VM device. 07:19:50: INFO: RunCommand: cros shell --no-known-hosts ssh://localhost:45729 -- 'rm /tmp/shell-test' 07:19:51: INFO: Test for cros shell passed. 07:19:51: INFO: Running test for cros debug. 07:19:51: INFO: Test to start and debug a new process on the VM device. 07:19:51: INFO: RunCommand: cros debug --exe /bin/bash ssh://localhost:45729 07:19:59: INFO: Test to attach a running process on the VM device. [1;31m07:19:59: ERROR: Failed to find any running process to debug.[0m 07:19:59: INFO: Stopping the VM. 07:20:31: INFO: The VM has been stopped. [1;31m07:20:32: ERROR: !!!VMTests failed!!! ----- VMTest (attempt 2): Total PASS: 111/111 (100%) 09:58:38 INFO | Finished running tests. Results can be found in /tmp/cbuildbotaEupiv/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests or /tmp/test_that_latest INFO : QEMU binary: /b/c/cbuild/repository/chroot/usr/bin/qemu-system-x86_64 INFO : QEMU version: QEMU emulator version 2.6.0, Copyright (c) 2003-2008 Fabrice Bellard Persist requested. Use --ssh_port 9228 --ssh_private_key /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/id_rsa --kvm_pid /tmp/kvm.9228 to re-connect to it. INFO : QEMU binary: /b/c/cbuild/repository/chroot/usr/bin/qemu-system-x86_64 INFO : QEMU version: QEMU emulator version 2.6.0, Copyright (c) 2003-2008 Fabrice Bellard Stopping the KVM instance . ---------------------------------------------------------------------- Ran 1 test in 2252.596s OK 07:58:47: INFO: Running VM test dev_mode_test. ... 07:58:47: INFO: RunCommand: /b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py --verbose betty /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/chromiumos_test_image.bin 2017/11/02 07:58:48 - devinstall_test.py - INFO : Setting up the image /b/c/cbuild/repository/src/build/images/betty/R64-10091.0.0/chromiumos_test_image.bin for vm testing. 2017/11/02 07:58:48 - devinstall_test.py - INFO : Making copy of the vm image /b/c/cbuild/repository/src/build/images/betty/R64-10091.0.0/chromiumos_qemu_image.bin to manipulate. 2017/11/02 07:58:59 - devinstall_test.py - DEBUG : Copy of vm image stored at /tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/chromiumos_qemu_image.bin. 2017/11/02 07:58:59 - devinstall_test.py - INFO : Wiping /usr/local/bin from the image. 2017/11/02 07:59:07 - devinstall_test.py - ERROR : Received error during cleanup Traceback (most recent call last): File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py", line 74, in Cleanup osutils.RmDir(self.tmpdir, ignore_missing=True) File "/b/c/cbuild/repository/chromite/lib/osutils.py", line 365, in RmDir shutil.rmtree(path) File "/usr/lib/python2.7/shutil.py", line 247, in rmtree rmtree(fullname, ignore_errors, onerror) File "/usr/lib/python2.7/shutil.py", line 247, in rmtree rmtree(fullname, ignore_errors, onerror) File "/usr/lib/python2.7/shutil.py", line 252, in rmtree onerror(os.remove, fullname, sys.exc_info()) File "/usr/lib/python2.7/shutil.py", line 250, in rmtree os.remove(fullname) OSError: [Errno 30] Read-only file system: '/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/m/lib64/libbrlapi.so' Traceback (most recent call last): File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py", line 187, in <module> main() File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py", line 178, in main test.PrepareTest() File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py", line 105, in PrepareTest self._WipeDevInstall() File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py", line 86, in _WipeDevInstall safe=True) File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/../../crostestutils/lib/mount_helper.py", line 23, in MountImage cmd, print_cmd=False, cwd=constants.CROSUTILS_DIR, capture_output=True) File "/b/c/cbuild/repository/chromite/lib/cros_build_lib.py", line 654, in RunCommand raise RunCommandError(msg, cmd_result) chromite.lib.cros_build_lib.RunCommandError: return code: 1; command: ./mount_gpt_image.sh '--from=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest' '--image=chromiumos_qemu_image.bin' '--rootfs_mountpt=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/m' '--stateful_mountpt=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/s' --safe mount: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so ERROR : mount failed: device=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/chromiumos_qemu_image.bin target=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/s format= ro/rw=rw options=offset=4456448000 ERROR : Thu Nov 2 07:59:07 PDT 2017 ERROR : PGID PPID PID ELAPSED TIME %CPU COMMAND ERROR : Arguments of 25425: ./mount_gpt_image.sh '--from=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest' '--image=chromiumos_qemu_image.bin' '--rootfs_mountpt=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/m' '--stateful_mountpt=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/s' '--safe' ERROR : Backtrace: (most recent call is last) ERROR : mount_gpt_image.sh:427:main(), called: mount_image ERROR : mount_gpt_image.sh:346:mount_image(), called: die 'Failed to mount all partitions in /tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/chromiumos_qemu_image.bin' ERROR : ERROR : Error was: ERROR : Failed to mount all partitions in /tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/chromiumos_qemu_image.bin Current loopback device status: /dev/loop0: [0804]:82051281 (/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/chromiumos_qemu_image.bin), offset 2308964352 cmd=['./mount_gpt_image.sh', '--from=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest', '--image=chromiumos_qemu_image.bin', '--rootfs_mountpt=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/m', '--stateful_mountpt=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/s', '--safe'], cwd=/b/c/cbuild/repository/src/scripts [1;31m07:59:07: ERROR: !!!VMTests failed!!! Logs are uploaded in the corresponding vm_test_results_2. This can be found by clicking on the artifacts link in the "Report" Stage. Specifically look for the test_harness/failed for the failing tests. For more particulars, please refer to which test failed i.e. above see the individual test that failed -- or if an update failed, check the corresponding update directory. [0m 07:59:07: INFO: Uploading artifacts to Google Storage... 07:59:12: INFO: RunCommand: tar --sparse -I /b/c/cbuild/repository/chroot/usr/bin/pigz -cf /b/c/cbuild/repository/buildbot_archive/betty-release/R64-10091.0.0/vm_test_results_2.tgz . in /b/c/cbuild/repository/buildbot_archive/betty-release/R64-10091.0.0/vm_test_results_2 07:59:13: INFO: Waiting up to forever for breakpad symbols ... 07:59:13: INFO: Uploading artifacts to Google Storage... 07:59:13: INFO: RunCommand: /b/c/cbuild/repository/.cache/common/gsutil_4.27.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -r -e -- /b/c/cbuild/repository/buildbot_archive/betty-release/R64-10091.0.0/vm_test_results_2 gs://chromeos-image-archive/betty-release/R64-10091.0.0/vm_test_results_2 08:00:09: INFO: RunCommand: /b/c/cbuild/repository/.cache/common/gsutil_4.27.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/c/cbuild/repository/buildbot_archive/betty-release/R64-10091.0.0/UPLOADED gs://chromeos-image-archive/betty-release/R64-10091.0.0/UPLOADED 07:59:13: INFO: RunCommand: /b/c/cbuild/repository/.cache/common/gsutil_4.27.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/c/cbuild/repository/buildbot_archive/betty-release/R64-10091.0.0/vm_test_results_2.tgz gs://chromeos-image-archive/betty-release/R64-10091.0.0/vm_test_results_2.tgz 07:59:15: INFO: RunCommand: /b/c/cbuild/repository/.cache/common/gsutil_4.27.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/c/cbuild/repository/buildbot_archive/betty-release/R64-10091.0.0/UPLOADED gs://chromeos-image-archive/betty-release/R64-10091.0.0/UPLOADED 08:00:11: INFO: Parsing test report /b/c/cbuild/repository/chroot/tmp/cbuildbotaEupiv/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests/test_report.log [1;31m08:00:12: ERROR: Traceback (most recent call last): File "/b/c/cbuild/repository/chromite/cbuildbot/stages/generic_stages.py", line 701, in Run self.PerformStage() File "/b/c/cbuild/repository/chromite/cbuildbot/stages/vm_test_stages.py", line 237, in PerformStage self._RunTest(test_type, per_test_results_dir) File "/b/c/cbuild/repository/chromite/cbuildbot/stages/vm_test_stages.py", line 199, in _RunTest self._build_root, self._current_board, self.GetImageDirSymlink()) File "/b/c/cbuild/repository/chromite/cbuildbot/stages/vm_test_stages.py", line 472, in RunDevModeTest cros_build_lib.RunCommand(cmd) File "/b/c/cbuild/repository/chromite/lib/cros_build_lib.py", line 654, in RunCommand raise RunCommandError(msg, cmd_result) RunCommandError: return code: 1; command: /b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py --verbose betty /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/chromiumos_test_image.bin cmd=['/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py', '--verbose', u'betty', u'/b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/chromiumos_test_image.bin'] [0m
,
Nov 2 2017
I'm not 100% sure this is the same root cause, but I think we're still seeing the same problem today: https://luci-milo.appspot.com/buildbot/chromeos/betty-arc64-release/236 https://luci-milo.appspot.com/buildbot/chromeos/betty-release/674
,
Nov 2 2017
You should be able to debug this locally using cbuildbot --local One option we have as a quick fix is just to disable normal VMTests on betty & only run the tast tests. We don't need regular VM tests on each & every canary. But long-term it would be good to be able to run both on a machine :)
,
Nov 2 2017
"cros tryjob --local", but yeah.
,
Nov 2 2017
#13 is no longer an option because we've now earmarked betty to be the VMTest board. Unless I'm behind the times, we no longer run VMTests on anything but betty.
,
Nov 2 2017
From #11, there are two independent problems here. 'cros debug' is failing in the VM: 07:18:49: INFO: Running VM test cros_vm_test. 07:18:49: INFO: Setting up the VM for testing. .. 07:19:51: INFO: Test to start and debug a new process on the VM device. 07:19:51: INFO: RunCommand: cros debug --exe /bin/bash ssh://localhost:45729 07:19:59: INFO: Test to attach a running process on the VM device. [1;31m07:19:59: ERROR: Failed to find any running process to debug.[0m 07:19:59: INFO: Stopping the VM. 07:20:31: INFO: The VM has been stopped. ------------ And then, on retry, the source image is corrupted. I suspect that cros_vm_test is launching VM without copying the image anywhere, and then not stopping it cleanly, leading to a corrupt image.
,
Nov 2 2017
We have a short list of boards that run VMTests now, so it's more than betty, but not many more. https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/chromeos_config.py?type=cs&q=_vmtest_boards&sq=package:%5Echromeos_(internal%7Cpublic)$&l=753
,
Nov 2 2017
Actually, looking at the code briefly, I don't think cros_vm_test is failing to cleanup the VM, so I can't explain why that'd corrupt the image. Nor can I explain why 'cros debug' is failing to find update_engine process on the VM. So, next suspicion is on devmode test. Trybot without that test: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/release/builds/16922
,
Nov 7 2017
#18: That run failed in the same way. :-( ... 16:37:43: INFO: Running VM test dev_mode_test. 16:37:43: INFO: Skipped devmode test 16:37:43: INFO: Running VM test cros_vm_test. 16:37:43: INFO: Setting up the VM for testing. 16:37:43: INFO: RunCommand: parted -m /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/chromiumos_qemu_image.bin unit MB print 16:38:41: INFO: VM started at port 58406 16:38:41: INFO: The VM has been successfully set up. Ready to run tests. 16:38:41: INFO: Running test for cros shell. 16:38:41: INFO: Test to use shell command to write a file to the VM device. 16:38:41: INFO: RunCommand: cros shell --no-known-hosts ssh://localhost:58406 -- 'echo "shell command test file" > /tmp/shell-test' 16:38:43: INFO: Test to use shell command to read a file on the VM device. 16:38:43: INFO: RunCommand: cros shell --no-known-hosts ssh://localhost:58406 -- 'cat /tmp/shell-test' 16:38:43: INFO: Test to use shell command to remove a file on the VM device. 16:38:43: INFO: RunCommand: cros shell --no-known-hosts ssh://localhost:58406 -- 'rm /tmp/shell-test' 16:38:44: INFO: Test for cros shell passed. 16:38:44: INFO: Running test for cros debug. 16:38:44: INFO: Test to start and debug a new process on the VM device. 16:38:44: INFO: RunCommand: cros debug --exe /bin/bash ssh://localhost:58406 16:38:53: INFO: Test to attach a running process on the VM device. [1;31m16:38:53: ERROR: Failed to find any running process to debug.[0m 16:38:53: INFO: Stopping the VM. 16:39:26: INFO: The VM has been stopped. [1;31m16:39:26: ERROR: !!!VMTests failed!!! --- Somewhat-tangential question: since making VMTest and TastVMTest run serially, I've noticed that TastVMTest no longer runs when VMTest fails (which is always, right now :-/). I think that's expected: the self._RunStage call that runs VMTestStage is presumably raising an exception that makes us bail out of SimpleBuilder._RunVMTests before running TastVMTestStage. Is there a recommended way to handle this? For example, if I catch the VMTest exception when it's thrown, is there a way I can reraise it alongside an exception from TastVMTest (if it fails too) later?
,
Nov 7 2017
Yes it is possible to ignore/reraise/chain/combine exceptions in Python. In cbuildbot we use the CompoundException class for this. This is starting to make me wonder if the failures in VMTest are entirely unrelated to tast (i.e. even if you ripped out all the tast code, maybe VMTest would still fail because an unrelated regression was submitted). We may want to try reverting tast and see whether things work before re-introducing tast. That might be useful as a trybot test, at least.
,
Nov 7 2017
I don't think there's any question that the current failures (discussed in #11 and onward) are unrelated. VMTest is failing despite TastVMStage not running at all (due to the issue mentioned at the end of #19).
,
Nov 7 2017
Take a look at failures_lib.CompoundFailure. You can see an example of it in parallel.BackgroundFailure. If you collect the exceptions into a CompoundFailure, then raise it after both stages finish, that SHOULD work fine. Also, a CompoundFailure commonly contains a single exception.
,
Nov 7 2017
Re #19: Not the same way. Even though cros_vm_test is still failing, VMTest (attempt 2) no longer fails with mounting errors. So, there seem to be multiple problems here: [1] cros_vm_test 'cros debug' test is broken. [2] dev_mode_test is screwing up the source image (even when it passes) in a way that VMTestStage can no longer manipulate the image properly. These are either regressions or have always failed on betty (we recently switched all VMTests to betty, maybe this never worked?) betty-paladin VMTestStage passes because it runs neither cros_vm_test nor devmode_test.
,
Nov 7 2017
,
Nov 9 2017
I've found the reason for failure [1], it's an "interesting" chain of events. Support for cirrus has recently been removed (Issue 710629, Issue 733875 ), only virtio_gpu is supported now. cros_vm_test doesn't pass the board ID to cros_start_vm, so cros_start_vm picks cirrus by default instead of virtio_gpu for (now obsolete) legacy reasons. Since graphics don't work, Chrome fails to start => login-prompt-visible is never emitted => boot-complete is not emitted => .... => update_engine is not started => the 'cros debug' test doesn't find the process. I'll send a CL.
,
Nov 11 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/crosutils/+/c779127055235960caad8c3a430928b3da261e3f commit c779127055235960caad8c3a430928b3da261e3f Author: Nicolas Norvez <norvez@chromium.org> Date: Sat Nov 11 02:45:51 2017 cros_vm_lib: always use virtio_gpu cirrus support has been removed, virtio_gpu is the only supported option. This fixes cros_vm_test which is failing because the board is not specified by the chromite calling code, so qemu uses cirrus by default. This in turns breaks Chrome, so system services (e.g. update_engine) aren't started. BUG= chromium:779267 BUG= chromium:782052 BUG=chromium:710629 TEST=cros_vm_test on novato passes Change-Id: I87b1e03e8cc60169ad9bc47433c4b270233c60e8 Reviewed-on: https://chromium-review.googlesource.com/761416 Commit-Ready: Nicolas Norvez <norvez@chromium.org> Tested-by: Nicolas Norvez <norvez@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> [modify] https://crrev.com/c779127055235960caad8c3a430928b3da261e3f/lib/cros_vm_lib.sh
,
Nov 15 2017
I think that this is working now -- VMTest has been passing on betty-release since http://uberchromegw/i/chromeos/builders/betty-release/builds/701 at Nov 11 02:09:09.
,
Jul 23
|
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by davidjames@google.com
, Oct 28 2017So what you'd do here is make a function similar to _RunHWTests that runs both the VMTest stage and your TastVMTest stage. Remove VMTest and TastVMTest from stage_objs (that's what makes them run in parallel), and instead have it get executed here in parallel with the other stages, something like this: parallel.RunParallelSteps([ lambda: self._RunParallelStages(stage_objs + [archive_stage]), lambda: self._RunVMTests(builder_run, board), lambda: self._RunHWTests(builder_run, board), lambda: self._RunDebugSymbolStages(builder_run, board), ])