
Issue 779267

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug-Regression

Blocked on:
issue 782052
issue 782055




VMTest broken on betty-release in multiple ways.

Project Member Reported by derat@chromium.org, Oct 27 2017

Issue description

I've noticed that the VMTest and UploadTestArtifacts stages have been failing consistently on betty-release since October 21 -- see e.g. http://uberchromegw/i/chromeos/builders/betty-release/builds/649.

The first VMTest run fails with errors like this:

14:47:25: INFO: Running VM test cros_vm_test.
14:47:25: INFO: Setting up the VM for testing.
...
14:48:25: INFO: Test to flash the VM device with the latest image.
14:48:25: INFO: RunCommand: cros flash --no-wipe --no-reboot ssh://localhost:54915 latest
14:49:53: ERROR: Failed to flash the VM device.
...
[24/Oct/2017:14:49:51] UPDATE Running cros_generate_stateful_update_payload --image /mnt/host/source/devserver/static/betty/R64-10064.0.0/chromiumos_test_image.bin --output_dir /mnt/host/source/devserver/static/cache/dbd3ab9d5e86a476b10d1e550360c54c
[24/Oct/2017:14:49:53] HTTP Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/cherrypy/_cprequest.py", line 656, in respond
    response.body = self.handler()
  File "/usr/lib64/python2.7/site-packages/cherrypy/lib/encoding.py", line 188, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/cherrypy/_cpdispatch.py", line 34, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/lib/devserver/devserver.py", line 1466, in xbuddy
    image_name=file_name)
  File "/usr/lib64/devserver/autoupdate.py", line 500, in GetUpdateForLabel
    rel_path = self.GenerateUpdateImageWithCache(static_image_path)
  File "/usr/lib64/devserver/autoupdate.py", line 433, in GenerateUpdateImageWithCache
    self.GenerateUpdateImage(image_path, cache_dir)
  File "/usr/lib64/devserver/autoupdate.py", line 399, in GenerateUpdateImage
    raise AutoupdateError('Failed to generate update in %s' % output_dir)
AutoupdateError: Failed to generate update in /mnt/host/source/devserver/static/cache/dbd3ab9d5e86a476b10d1e550360c54c
[24/Oct/2017:14:49:53] HTTP 
Request Headers:
  HOST: 192.168.120.129:52843
  CONNECTION: close
  Remote-Addr: ::ffff:192.168.120.129
  USER-AGENT: Python-urllib/2.7
  ACCEPT-ENCODING: identity
::ffff:192.168.120.129 - - [24/Oct/2017:14:49:53] "GET /xbuddy/local/betty/R64-10064.0.0/test?for_update=true&return_dir=true HTTP/1.1" 500 1890 "" "Python-urllib/2.7"
--- End output from /b/c/cbuild/repository/chroot/tmp/devserver_wrapperrlZxrt/dev_server.log ---
14:49:53: ERROR: Device update failed.
14:49:53: ERROR: cros flash failed before completing.
14:49:53: ERROR: HTTP Error 500: Internal Server Error

----

The second VMTest run fails differently:

15:32:26: INFO: Running VM test dev_mode_test.
15:32:26: INFO: RunCommand: /b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py --verbose betty /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/chromiumos_test_image.bin
2017/10/24 15:32:27 - devinstall_test.py - INFO    : Setting up the image /b/c/cbuild/repository/src/build/images/betty/R64-10064.0.0/chromiumos_test_image.bin for vm testing.
2017/10/24 15:32:27 - devinstall_test.py - INFO    : Making copy of the vm image /b/c/cbuild/repository/src/build/images/betty/R64-10064.0.0/chromiumos_qemu_image.bin to manipulate.
2017/10/24 15:32:42 - devinstall_test.py - DEBUG   : Copy of vm image stored at /tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/chromiumos_qemu_image.bin.
2017/10/24 15:32:42 - devinstall_test.py - INFO    : Wiping /usr/local/bin from the image.
2017/10/24 15:32:45 - devinstall_test.py - WARNING : Received error during cleanup
Traceback (most recent call last):
  File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py", line 73, in Cleanup
    self.vm.Stop()
AttributeError: 'NoneType' object has no attribute 'Stop'
...
chromite.lib.cros_build_lib.RunCommandError: return code: 1; command: ./mount_gpt_image.sh '--from=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest' '--image=chromiumos_qemu_image.bin' '--rootfs_mountpt=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/m' '--stateful_mountpt=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/s' --safe
mount: wrong fs type, bad option, bad superblock on /dev/loop1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
ERROR   : mount failed: device=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/chromiumos_qemu_image.bin target=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/s format= ro/rw=rw options=offset=4456448000
ERROR   : Tue Oct 24 15:32:45 PDT 2017
ERROR   :  PGID  PPID   PID     ELAPSED     TIME %CPU COMMAND
ERROR   : Arguments of 28045: ./mount_gpt_image.sh '--from=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest' '--image=chromiumos_qemu_image.bin' '--rootfs_mountpt=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/m' '--stateful_mountpt=/tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/s' '--safe'
ERROR   : Backtrace:  (most recent call is last)
ERROR   :  mount_gpt_image.sh:427:main(), called: mount_image  
ERROR   :  mount_gpt_image.sh:346:mount_image(), called: die 'Failed to mount all partitions in /tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/chromiumos_qemu_image.bin' 
ERROR   : 
ERROR   : Error was:
ERROR   :   Failed to mount all partitions in /tmp/cbuildbot-tmpgbkJ9g/tmpgLKENCDevModeTest/chromiumos_qemu_image.bin

----

And UploadTestArtifacts fails like this (with a bunch more errors on the end):

14:04:01: INFO: RunCommand: cros_sdk -- /mnt/host/source/src/platform/dev/host/cros_generate_stateful_update_payload --image /mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_test_image.bin --output /tmp/generate_payloadsaWiCcK in /b/c/cbuild/repository/src/scripts
INFO:cros_generate_stateful_update_payload:Generating stateful update file.
mount: wrong fs type, bad option, bad superblock on /dev/loop3,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so.
ERROR   : mount failed: device=/mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_test_image.bin target=/tmp/tmpzX1SiDstateful format=ext4 ro/rw=ro options=offset=2311061504
Current loopback device status:
    /dev/loop0: [2052]:201721121 (/b/c/cbuild/repository/chroot/mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_base_image.bin), offset 2311061504, sizelimit 2147483648
    /dev/loop1: [2052]:201721121 (/b/c/cbuild/repository/chroot/mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_base_image.bin), offset 163577856, sizelimit 2147483648
    /dev/loop2: [2052]:201721113 (/mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_test_image.bin), offset 163577856
ERROR   : Tue Oct 24 14:04:03 PDT 2017
ERROR   :  PGID  PPID   PID     ELAPSED     TIME %CPU COMMAND
ERROR   : Arguments of 262:  '--from=/mnt/host/source/src/build/images/betty/R64-10064.0.0' '--image=chromiumos_test_image.bin' '--read_only' '--rootfs_mountpt=/tmp/tmpEYKrVLrootfs' '--stateful_mountpt=/tmp/tmpzX1SiDstateful'
ERROR   : Backtrace:  (most recent call is last)
ERROR   :  mount_gpt_image.sh:427:main(), called: mount_image  
ERROR   :  mount_gpt_image.sh:346:mount_image(), called: die 'Failed to mount all partitions in /mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_test_image.bin' 
ERROR   : 
ERROR   : Error was:
ERROR   :   Failed to mount all partitions in /mnt/host/source/src/build/images/betty/R64-10064.0.0/chromiumos_test_image.bin
ERROR:cros_generate_stateful_update_payload:Failed to create stateful update file

----

This appears to have started when I added TastVMTestStage to this builder (for issue 772053), and I see that TastVMTest is running in parallel with VMTest:

VMTest (attempt 1):

************************************************************
** Start Stage VMTest (attempt 1) - Tue, 24 Oct 2017 13:59:47 -0700 (PDT)
** 
** Run autotests in a virtual machine.
************************************************************
13:59:47: INFO: Running cidb query on pid 2893, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f769ff82890>
Preconditions for the stage successfully met. Beginning to execute stage...

TastVMTest:

************************************************************
** Start Stage TastVMTest (attempt 1) - Tue, 24 Oct 2017 13:59:48 -0700 (PDT)
** 
** Runs Tast integration tests in a virtual machine.
************************************************************
13:59:48: INFO: Running cidb query on pid 2914, repr(query) starts with <sqlalchemy.sql.expression.Update object at 0x7f769ff82650>
Preconditions for the stage successfully met. Beginning to execute stage...

----

Don mentioned over email that dual access to the VM image file might be a problem. I think I see how to make TastVMTestStage run serially instead of in parallel if that's needed (by modifying _RunBackgroundStagesForBoard in chromite/cbuildbot/builders/simple_builders.py), but it'd probably be better to figure out why we aren't able to run two stages that use VMs in parallel. The tests in VMTest look like they're passing; it's just the stuff afterwards that's broken.
 
So what you'd do here is make a function similar to _RunHWTests that runs both the VMTest stage and your TastVMTest stage.

Remove VMTest and TastVMTest from stage_objs (that's what makes them run in parallel),
and instead have it get executed here in parallel with the other stages, something like this:

    parallel.RunParallelSteps([
        lambda: self._RunParallelStages(stage_objs + [archive_stage]),
        lambda: self._RunVMTests(builder_run, board),
        lambda: self._RunHWTests(builder_run, board),
        lambda: self._RunDebugSymbolStages(builder_run, board),
    ])
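
A minimal sketch of what that helper might look like (the stage class names and the _RunStage signature below are assumptions based on this thread, not checked against simple_builders.py):

    def _RunVMTests(self, builder_run, board):
      """Sketch: run the VM-based test stages one after the other."""
      # Running them serially keeps the two stages from touching the VM
      # image at the same time.
      self._RunStage(vm_test_stages.VMTestStage, board,
                     builder_run=builder_run)
      self._RunStage(tast_test_stages.TastVMTestStage, board,
                     builder_run=builder_run)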

Comment 2 by derat@chromium.org, Oct 28 2017

Is it expected that VMTest and TastVMTest can't run in parallel without VMTest failing?
I expect it would be difficult to get them running reliably in parallel,
because you'd have to fix all the flakiness bugs that happen when you
launch two vms in parallel on same machine.

Another option for implementing: you could add a giant lock around the two
stages. That might be better long term because you could narrow the lock
later if we get them running better in parallel.
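
Purely illustrative: a coarse lock could be as simple as a shared file lock; the helper name and lock path below are made up for this sketch.

    import contextlib
    import fcntl

    @contextlib.contextmanager
    def vm_image_lock(path='/tmp/vm_image_stage.lock'):
      """Hypothetical coarse lock serializing stages that touch the VM image."""
      with open(path, 'w') as lock_file:
        # Block until whichever stage holds the lock releases it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
          yield
        finally:
          fcntl.flock(lock_file, fcntl.LOCK_UN)

    # Each stage would then wrap its VM work:
    #   with vm_image_lock():
    #     ... run VMTest or TastVMTest ...
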
Cc: ihf@chromium.org mruthven@chromium.org
Issue 777461 has been merged into this issue.
Re #3, I was looking at this yesterday (on the deduped bug). I think this is a case of the devserver somehow getting confused by the two tests rather than the VMs running in parallel causing problems.

That said, while working on moblab vm, I did find issues with cros_start_vm launching two VMs at the same time, and I fixed some of them.

Comment 6 by derat@chromium.org, Oct 30 2017

Owner: derat@chromium.org
Status: Started (was: Untriaged)
For now, I'm just going to make these stages run serially as David suggested in #1.
Cc: teravest@chromium.org justincarlson@chromium.org

Comment 8 by derat@chromium.org, Oct 30 2017

The fix is waiting for a review at https://crrev.com/c/744326.

Comment 9 by ihf@chromium.org, Oct 31 2017

Cc: pwang@chromium.org

Comment 10 by derat@chromium.org, Oct 31 2017

https://crrev.com/c/744326 is in (sorry, I messed up the bug number in its commit message).

I can't verify that it fixed the issue since the last few betty-release builds failed to build chromeos-chrome without any meaningful error messages, as far as I can tell. :-(
Sigh. Even without TastVMTest running in parallel, both VMTest attempts on betty-release still fail. See e.g. http://uberchromegw/i/chromeos/builders/betty-release/builds/674:

VMTest (attempt 1):

Total PASS: 111/111 (100%)
09:13:06 INFO | Finished running tests. Results can be found in /tmp/cbuildbotgpnZuU/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests or /tmp/test_that_latest
INFO    : QEMU binary: /b/c/cbuild/repository/chroot/usr/bin/qemu-system-x86_64
INFO    : QEMU version: QEMU emulator version 2.6.0, Copyright (c) 2003-2008 Fabrice Bellard
Persist requested.  Use --ssh_port 9228  --ssh_private_key /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/id_rsa  --kvm_pid /tmp/kvm.9228 to re-connect to it.
INFO    : QEMU binary: /b/c/cbuild/repository/chroot/usr/bin/qemu-system-x86_64
INFO    : QEMU version: QEMU emulator version 2.6.0, Copyright (c) 2003-2008 Fabrice Bellard
Stopping the KVM instance
.
----------------------------------------------------------------------
Ran 1 test in 2416.349s
OK
07:13:15: INFO: Running VM test dev_mode_test.
...
07:18:49: INFO: Running VM test cros_vm_test.
07:18:49: INFO: Setting up the VM for testing.
07:18:49: INFO: RunCommand: parted -m /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/chromiumos_qemu_image.bin unit MB print
07:19:48: INFO: VM started at port 45729
07:19:48: INFO: The VM has been successfully set up. Ready to run tests.
07:19:48: INFO: Running test for cros shell.
07:19:48: INFO: Test to use shell command to write a file to the VM device.
07:19:48: INFO: RunCommand: cros shell --no-known-hosts ssh://localhost:45729 -- 'echo "shell command test file" > /tmp/shell-test'
07:19:49: INFO: Test to use shell command to read a file on the VM device.
07:19:49: INFO: RunCommand: cros shell --no-known-hosts ssh://localhost:45729 -- 'cat /tmp/shell-test'
07:19:50: INFO: Test to use shell command to remove a file on the VM device.
07:19:50: INFO: RunCommand: cros shell --no-known-hosts ssh://localhost:45729 -- 'rm /tmp/shell-test'
07:19:51: INFO: Test for cros shell passed.
07:19:51: INFO: Running test for cros debug.
07:19:51: INFO: Test to start and debug a new process on the VM device.
07:19:51: INFO: RunCommand: cros debug --exe /bin/bash ssh://localhost:45729
07:19:59: INFO: Test to attach a running process on the VM device.
07:19:59: ERROR: Failed to find any running process to debug.
07:19:59: INFO: Stopping the VM.
07:20:31: INFO: The VM has been stopped.
07:20:32: ERROR: 
!!!VMTests failed!!!

-----

VMTest (attempt 2):

Total PASS: 111/111 (100%)
09:58:38 INFO | Finished running tests. Results can be found in /tmp/cbuildbotaEupiv/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests or /tmp/test_that_latest
INFO    : QEMU binary: /b/c/cbuild/repository/chroot/usr/bin/qemu-system-x86_64
INFO    : QEMU version: QEMU emulator version 2.6.0, Copyright (c) 2003-2008 Fabrice Bellard
Persist requested.  Use --ssh_port 9228  --ssh_private_key /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/id_rsa  --kvm_pid /tmp/kvm.9228 to re-connect to it.
INFO    : QEMU binary: /b/c/cbuild/repository/chroot/usr/bin/qemu-system-x86_64
INFO    : QEMU version: QEMU emulator version 2.6.0, Copyright (c) 2003-2008 Fabrice Bellard
Stopping the KVM instance
.
----------------------------------------------------------------------
Ran 1 test in 2252.596s
OK
07:58:47: INFO: Running VM test dev_mode_test.
...
07:58:47: INFO: RunCommand: /b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py --verbose betty /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/chromiumos_test_image.bin
2017/11/02 07:58:48 - devinstall_test.py - INFO    : Setting up the image /b/c/cbuild/repository/src/build/images/betty/R64-10091.0.0/chromiumos_test_image.bin for vm testing.
2017/11/02 07:58:48 - devinstall_test.py - INFO    : Making copy of the vm image /b/c/cbuild/repository/src/build/images/betty/R64-10091.0.0/chromiumos_qemu_image.bin to manipulate.
2017/11/02 07:58:59 - devinstall_test.py - DEBUG   : Copy of vm image stored at /tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/chromiumos_qemu_image.bin.
2017/11/02 07:58:59 - devinstall_test.py - INFO    : Wiping /usr/local/bin from the image.
2017/11/02 07:59:07 - devinstall_test.py - ERROR   : Received error during cleanup
Traceback (most recent call last):
  File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py", line 74, in Cleanup
    osutils.RmDir(self.tmpdir, ignore_missing=True)
  File "/b/c/cbuild/repository/chromite/lib/osutils.py", line 365, in RmDir
    shutil.rmtree(path)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 247, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.7/shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "/usr/lib/python2.7/shutil.py", line 250, in rmtree
    os.remove(fullname)
OSError: [Errno 30] Read-only file system: '/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/m/lib64/libbrlapi.so'
Traceback (most recent call last):
  File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py", line 187, in <module>
    main()
  File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py", line 178, in main
    test.PrepareTest()
  File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py", line 105, in PrepareTest
    self._WipeDevInstall()
  File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py", line 86, in _WipeDevInstall
    safe=True)
  File "/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/../../crostestutils/lib/mount_helper.py", line 23, in MountImage
    cmd, print_cmd=False, cwd=constants.CROSUTILS_DIR, capture_output=True)
  File "/b/c/cbuild/repository/chromite/lib/cros_build_lib.py", line 654, in RunCommand
    raise RunCommandError(msg, cmd_result)
chromite.lib.cros_build_lib.RunCommandError: return code: 1; command: ./mount_gpt_image.sh '--from=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest' '--image=chromiumos_qemu_image.bin' '--rootfs_mountpt=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/m' '--stateful_mountpt=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/s' --safe
mount: wrong fs type, bad option, bad superblock on /dev/loop1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
ERROR   : mount failed: device=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/chromiumos_qemu_image.bin target=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/s format= ro/rw=rw options=offset=4456448000
ERROR   : Thu Nov  2 07:59:07 PDT 2017
ERROR   :  PGID  PPID   PID     ELAPSED     TIME %CPU COMMAND
ERROR   : Arguments of 25425: ./mount_gpt_image.sh '--from=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest' '--image=chromiumos_qemu_image.bin' '--rootfs_mountpt=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/m' '--stateful_mountpt=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/s' '--safe'
ERROR   : Backtrace:  (most recent call is last)
ERROR   :  mount_gpt_image.sh:427:main(), called: mount_image  
ERROR   :  mount_gpt_image.sh:346:mount_image(), called: die 'Failed to mount all partitions in /tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/chromiumos_qemu_image.bin' 
ERROR   : 
ERROR   : Error was:
ERROR   :   Failed to mount all partitions in /tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/chromiumos_qemu_image.bin
Current loopback device status:
    /dev/loop0: [0804]:82051281 (/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/chromiumos_qemu_image.bin), offset 2308964352
cmd=['./mount_gpt_image.sh', '--from=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest', '--image=chromiumos_qemu_image.bin', '--rootfs_mountpt=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/m', '--stateful_mountpt=/tmp/cbuildbot-tmpG7DkI_/tmpZP1QR7DevModeTest/s', '--safe'], cwd=/b/c/cbuild/repository/src/scripts
07:59:07: ERROR: 
!!!VMTests failed!!!
Logs are uploaded in the corresponding vm_test_results_2. This can be found
by clicking on the artifacts link in the "Report" Stage. Specifically look
for the test_harness/failed for the failing tests. For more
particulars, please refer to which test failed i.e. above see the
individual test that failed -- or if an update failed, check the
corresponding update directory.

07:59:07: INFO: Uploading artifacts to Google Storage...
07:59:12: INFO: RunCommand: tar --sparse -I /b/c/cbuild/repository/chroot/usr/bin/pigz -cf /b/c/cbuild/repository/buildbot_archive/betty-release/R64-10091.0.0/vm_test_results_2.tgz . in /b/c/cbuild/repository/buildbot_archive/betty-release/R64-10091.0.0/vm_test_results_2
07:59:13: INFO: Waiting up to forever for breakpad symbols ...
07:59:13: INFO: Uploading artifacts to Google Storage...
07:59:13: INFO: RunCommand: /b/c/cbuild/repository/.cache/common/gsutil_4.27.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -r -e -- /b/c/cbuild/repository/buildbot_archive/betty-release/R64-10091.0.0/vm_test_results_2 gs://chromeos-image-archive/betty-release/R64-10091.0.0/vm_test_results_2
08:00:09: INFO: RunCommand: /b/c/cbuild/repository/.cache/common/gsutil_4.27.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/c/cbuild/repository/buildbot_archive/betty-release/R64-10091.0.0/UPLOADED gs://chromeos-image-archive/betty-release/R64-10091.0.0/UPLOADED
07:59:13: INFO: RunCommand: /b/c/cbuild/repository/.cache/common/gsutil_4.27.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/c/cbuild/repository/buildbot_archive/betty-release/R64-10091.0.0/vm_test_results_2.tgz gs://chromeos-image-archive/betty-release/R64-10091.0.0/vm_test_results_2.tgz
07:59:15: INFO: RunCommand: /b/c/cbuild/repository/.cache/common/gsutil_4.27.tar.gz/gsutil/gsutil -o 'Boto:num_retries=10' -m cp -v -- /b/c/cbuild/repository/buildbot_archive/betty-release/R64-10091.0.0/UPLOADED gs://chromeos-image-archive/betty-release/R64-10091.0.0/UPLOADED
08:00:11: INFO: Parsing test report /b/c/cbuild/repository/chroot/tmp/cbuildbotaEupiv/smoke_suite/test_harness/all/SimpleTestVerify/1_autotest_tests/test_report.log
08:00:12: ERROR: Traceback (most recent call last):
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/generic_stages.py", line 701, in Run
    self.PerformStage()
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/vm_test_stages.py", line 237, in PerformStage
    self._RunTest(test_type, per_test_results_dir)
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/vm_test_stages.py", line 199, in _RunTest
    self._build_root, self._current_board, self.GetImageDirSymlink())
  File "/b/c/cbuild/repository/chromite/cbuildbot/stages/vm_test_stages.py", line 472, in RunDevModeTest
    cros_build_lib.RunCommand(cmd)
  File "/b/c/cbuild/repository/chromite/lib/cros_build_lib.py", line 654, in RunCommand
    raise RunCommandError(msg, cmd_result)
RunCommandError: return code: 1; command: /b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py --verbose betty /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/chromiumos_test_image.bin
cmd=['/b/c/cbuild/repository/src/platform/crostestutils/devmode-test/devinstall_test.py', '--verbose', u'betty', u'/b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/chromiumos_test_image.bin']

I'm not 100% sure this is the same root cause, but I think we're still seeing the same problem today:

https://luci-milo.appspot.com/buildbot/chromeos/betty-arc64-release/236
https://luci-milo.appspot.com/buildbot/chromeos/betty-release/674


You should be able to debug this locally using cbuildbot --local

One option we have as a quick fix is just to disable normal VMTests on betty & only run the tast tests. We don't need regular VM tests on each & every canary. But long-term it would be good to be able to run both on a machine :)
"cros tryjob --local", but yeah.
#13 is no longer an option because we've now earmarked betty to be the VMTest board. Unless I'm behind the times, we no longer run VMTests on anything but betty.
From #11, there are two independent problems here. 

'cros debug' is failing in the VM:

07:18:49: INFO: Running VM test cros_vm_test.
07:18:49: INFO: Setting up the VM for testing.

..

07:19:51: INFO: Test to start and debug a new process on the VM device.
07:19:51: INFO: RunCommand: cros debug --exe /bin/bash ssh://localhost:45729
07:19:59: INFO: Test to attach a running process on the VM device.
07:19:59: ERROR: Failed to find any running process to debug.
07:19:59: INFO: Stopping the VM.
07:20:31: INFO: The VM has been stopped.

------------
And then, on retry, the source image is corrupted.
I suspect that cros_vm_test is launching the VM without copying the image anywhere, and then not stopping it cleanly, leading to a corrupt image.

We have a short list of boards that run VMTests now, so it's more than betty, but not many more.

https://cs.corp.google.com/chromeos_public/chromite/cbuildbot/chromeos_config.py?type=cs&q=_vmtest_boards&sq=package:%5Echromeos_(internal%7Cpublic)$&l=753
Actually, looking at the code briefly, I don't think cros_vm_test is failing to clean up the VM, so I can't explain why that'd corrupt the image.
Nor can I explain why 'cros debug' is failing to find the update_engine process on the VM.

So, the next suspect is the devmode test. Trybot without that test: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/release/builds/16922
#18: That run failed in the same way. :-(

...
16:37:43: INFO: Running VM test dev_mode_test.
16:37:43: INFO: Skipped devmode test
16:37:43: INFO: Running VM test cros_vm_test.
16:37:43: INFO: Setting up the VM for testing.
16:37:43: INFO: RunCommand: parted -m /b/c/cbuild/repository/src/build/images/betty/latest-cbuildbot/chromiumos_qemu_image.bin unit MB print
16:38:41: INFO: VM started at port 58406
16:38:41: INFO: The VM has been successfully set up. Ready to run tests.
16:38:41: INFO: Running test for cros shell.
16:38:41: INFO: Test to use shell command to write a file to the VM device.
16:38:41: INFO: RunCommand: cros shell --no-known-hosts ssh://localhost:58406 -- 'echo "shell command test file" > /tmp/shell-test'
16:38:43: INFO: Test to use shell command to read a file on the VM device.
16:38:43: INFO: RunCommand: cros shell --no-known-hosts ssh://localhost:58406 -- 'cat /tmp/shell-test'
16:38:43: INFO: Test to use shell command to remove a file on the VM device.
16:38:43: INFO: RunCommand: cros shell --no-known-hosts ssh://localhost:58406 -- 'rm /tmp/shell-test'
16:38:44: INFO: Test for cros shell passed.
16:38:44: INFO: Running test for cros debug.
16:38:44: INFO: Test to start and debug a new process on the VM device.
16:38:44: INFO: RunCommand: cros debug --exe /bin/bash ssh://localhost:58406
16:38:53: INFO: Test to attach a running process on the VM device.
16:38:53: ERROR: Failed to find any running process to debug.
16:38:53: INFO: Stopping the VM.
16:39:26: INFO: The VM has been stopped.
16:39:26: ERROR: 
!!!VMTests failed!!!

---

Somewhat-tangential question: since making VMTest and TastVMTest run serially, I've noticed that TastVMTest no longer runs when VMTest fails (which is always, right now :-/). I think that's expected: the self._RunStage call that runs VMTestStage is presumably raising an exception that makes us bail out of SimpleBuilder._RunVMTests before running TastVMTestStage.

Is there a recommended way to handle this? For example, if I catch the VMTest exception when it's thrown, is there a way I can reraise it alongside an exception from TastVMTest (if it fails too) later?
Yes, it is possible to ignore/reraise/chain/combine exceptions in Python. In cbuildbot we use the CompoundException class for this.

This is starting to make me wonder if the failures in VMTest are entirely unrelated to tast (i.e. even if you ripped out all the tast code, maybe VMTest would still fail because an unrelated regression was submitted). We may want to try reverting tast and see whether things work before re-introducing tast. That might be useful as a trybot test, at least.

I don't think there's any question that the current failures (discussed in #11 and onward) are unrelated. VMTest is failing despite TastVMStage not running at all (due to the issue mentioned at the end of #19).
Take a look at failures_lib.CompoundFailure. You can see an example of it
in parallel.BackgroundFailure.

If you collect the exceptions into a CompoundFailure, then raise it after
both stages finish, that SHOULD work fine. Also, a CompoundFailure commonly
contains a single exception.
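
A rough sketch of that collect-then-reraise pattern (the CreateExceptInfo/CompoundFailure usage below is an assumption; check failures_lib and parallel.BackgroundFailure for the real signatures):

    import traceback

    from chromite.lib import failures_lib

    def _RunVMTests(self, builder_run, board):
      """Sketch: run both VM stages, then raise any collected failures."""
      exc_infos = []
      for stage in (vm_test_stages.VMTestStage,
                    tast_test_stages.TastVMTestStage):
        try:
          self._RunStage(stage, board, builder_run=builder_run)
        except Exception as e:
          # Remember the failure but keep going so the other stage still runs.
          exc_infos += failures_lib.CreateExceptInfo(e, traceback.format_exc())
      if exc_infos:
        raise failures_lib.CompoundFailure('VM test stage(s) failed', exc_infos)
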
Re #19: Not the same way.
Even though cros_vm_test is still failing, VMTest (attempt 2) no longer fails with mounting errors.

So, there seem to be multiple problems here:
[1] cros_vm_test 'cros debug' test is broken.
[2] dev_mode_test is screwing up the source image (even when it passes) in a way that leaves VMTestStage unable to manipulate the image properly.

These are either regressions or have always failed on betty (we recently switched all VMTests to betty, maybe this never worked?)

betty-paladin VMTestStage passes because it runs neither cros_vm_test nor devmode_test.
Blockedon: 782052 782054 782055
Summary: VMTest broken on betty-release in multiple ways. (was: VMTest fails on betty-release (because TastVMTest runs in parallel?))
[3] Add to that your bug/feature request to continue VMTestStages on failure

Let's deal with these issues separately.
Owner: ----
Status: Available (was: Started)
I've found the reason for failure [1]; it's an "interesting" chain of events.

Support for cirrus has recently been removed (Issue 710629, Issue 733875); only virtio_gpu is supported now.

cros_vm_test doesn't pass the board ID to cros_start_vm, so cros_start_vm picks cirrus by default instead of virtio_gpu for (now obsolete) legacy reasons.

Since graphics don't work, Chrome fails to start => login-prompt-visible is never emitted => boot-complete is not emitted => .... => update_engine is not started => the 'cros debug' test doesn't find the process.

I'll send a CL.

Project Member Comment 27 by bugdroid1@chromium.org, Nov 11 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/crosutils/+/c779127055235960caad8c3a430928b3da261e3f

commit c779127055235960caad8c3a430928b3da261e3f
Author: Nicolas Norvez <norvez@chromium.org>
Date: Sat Nov 11 02:45:51 2017

cros_vm_lib: always use virtio_gpu

cirrus support has been removed, virtio_gpu is the only supported
option.
This fixes cros_vm_test which is failing because the board is not
specified by the chromite calling code, so qemu uses cirrus by default.
This in turns breaks Chrome, so system services (e.g. update_engine)
aren't started.

BUG=chromium:779267
BUG=chromium:782052
BUG=chromium:710629
TEST=cros_vm_test on novato passes

Change-Id: I87b1e03e8cc60169ad9bc47433c4b270233c60e8
Reviewed-on: https://chromium-review.googlesource.com/761416
Commit-Ready: Nicolas Norvez <norvez@chromium.org>
Tested-by: Nicolas Norvez <norvez@chromium.org>
Reviewed-by: Mike Frysinger <vapier@chromium.org>

[modify] https://crrev.com/c779127055235960caad8c3a430928b3da261e3f/lib/cros_vm_lib.sh

Comment 28 by derat@chromium.org, Nov 15 2017

Owner: norvez@chromium.org
Status: Verified (was: Available)
I think that this is working now -- VMTest has been passing on betty-release since http://uberchromegw/i/chromeos/builders/betty-release/builds/701 at Nov 11 02:09:09.
Blockedon: -782054

Sign in to add a comment