New issue
Advanced search Search tips
Starred by 2 users
Status: Fixed
Owner:
Closed: Jan 26
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment
VMTest SimpleTestUpdateAndVerify failed on chrome/android pfq
Project Member Reported by bmgordon@chromium.org, Jan 24 Back to list
The last three android-pfq builders for betty and betty-arc64 have failed VM tests.  It looks like the auto-update succeeds, but then the VM never reboots.

The previous build that passed didn't run VM tests, so it seems possible that a bad CL went through without getting caught.

https://luci-milo.appspot.com/buildbot/chromeos/betty-arc64-nyc-android-pfq/624

https://luci-milo.appspot.com/buildbot/chromeos/betty-nyc-android-pfq/1271
 
Cc: kinaba@chromium.org
Owner: khmel@chromium.org
Assigning to ARC constable for further triage.
Status: Started
chrome PFQ is also affected:
https://luci-milo.appspot.com/buildbot/chromeos/betty-chrome-pfq/953

Based on both PFQs:

last good: 10334.0.0-rc1
first bad: 10335.0.0-rc2


I built betty locally and it fails (test) in  10334.0.0

bin/ctest --board=betty --type=vm --no_graphics --verbose --target_image=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin --quick_update --ssh_private_key=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/id_rsa

2018/01/24 16:43:03 - ctest - INFO    : Could not find a latest image to use. Using target instead.
2018/01/24 16:43:03 - cros_build_lib.py - INFO    : RunCommand: ../platform/crostestutils/generate_test_payloads/cros_generate_test_payloads.py '--target=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin' '--base=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin' '--board=betty' '--jobs=1' --basic_suite in /mnt/host/source/src/scripts
2018/01/24 16:43:04 - cros_generate_test_payloads.py - INFO    : Generating updates required for this test suite in parallel.
2018/01/24 16:43:04 - cros_generate_test_payloads.py - INFO    : Generating an unsigned delta payload from /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_qemu_image.bin to /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_qemu_image.bin and not patching the kernel.
2018/01/24 16:43:52 - cros_generate_test_payloads.py - INFO    : Dumping /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/update.cache
2018/01/24 16:43:52 - cros_build_lib.py - INFO    : RunCommand: /mnt/host/source/src/platform/crostestutils/au_test_harness/cros_au_test_harness.py '--base_image=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin' '--target_image=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin' '--board=betty' '--type=vm' '--remote=0.0.0.0' --verbose '--jobs=1' '--ssh_private_key=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/id_rsa' '--test_prefix=SimpleTestUpdateAndVerify' --no_graphics in /mnt/host/source/src/scripts
2018/01/24 16:43:52 - cros_au_test_harness.py - INFO    : Loading update cache from /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/update.cache
2018/01/24 16:43:53 - dev_server_wrapper.py - DEBUG   : Retrieving http://127.0.0.1:8080/check_health
2018/01/24 16:43:53 - cros_build_lib.py - DEBUG   : RunCommand: sudo 'CROS_SUDO_KEEP_ALIVE=/dev/pts/10:unknown' -- start_devserver --pidfile /mnt/host/source/chroot/tmp/au_test_harnesslek8sG/devserver_wrapperLZbTvl --logfile /mnt/host/source/chroot/tmp/au_test_harnesslek8sG/dev_server.log '--port=8080' --critical_update in /mnt/host/source
2018/01/24 16:43:58 - dev_server_wrapper.py - DEBUG   : Retrieving http://127.0.0.1:8080/check_health
2018/01/24 16:44:19 - au_worker.py - INFO    : SimpleTestUpdateAndVerify: Copied shared disk image /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_qemu_image.bin to /tmp/chromiumos_qemu_disk.bin.sEGyTv.
2018/01/24 16:44:26 - au_worker.py - INFO    : SimpleTestUpdateAndVerify: Performing a delta update from /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_qemu_image.bin to /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_qemu_image.bin in a VM
2018/01/24 16:44:26 - au_worker.py - WARNING : None
E
======================================================================
ERROR: SimpleTestUpdateAndVerify (crostestutils.au_test_harness.au_test.AUTest)
Test that updates to itself.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/host/source/src/platform/crostestutils/au_test_harness/../../crostestutils/au_test_harness/au_test.py", line 221, in SimpleTestUpdateAndVerify
    self.worker.PerformUpdate(target_image_path, target_image_path)
  File "/mnt/host/source/src/platform/crostestutils/au_test_harness/../../crostestutils/au_test_harness/au_worker.py", line 153, in PerformUpdate
    key_to_use)
  File "/mnt/host/source/src/platform/crostestutils/au_test_harness/../../crostestutils/au_test_harness/vm_au_worker.py", line 115, in UpdateImage
    self.RunUpdateCmd(cmd, log_directory)
  File "/mnt/host/source/src/platform/crostestutils/au_test_harness/../../crostestutils/au_test_harness/au_worker.py", line 238, in RunUpdateCmd
    raise update_exception.UpdateException(result.returncode, 'Update failed')
UpdateException: Update failed

----------------------------------------------------------------------
Ran 1 test in 73.278s

FAILED (errors=1)
2018/01/24 16:45:11 - cros_build_lib.py - ERROR   : Test harness failed.
2018/01/24 16:45:11 - dev_server_wrapper.py - DEBUG   : Stopping devserver instance with pid 131212
2018/01/24 16:45:11 - cros_build_lib.py - DEBUG   : RunCommand: sudo 'CROS_SUDO_KEEP_ALIVE=/dev/pts/10:unknown' -- kill 131212
2018/01/24 16:45:12 - cros_build_lib.py - ERROR   : /mnt/host/source/src/platform/crostestutils/au_test_harness/cros_au_test_harness.py --base_image=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin --target_image=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin --board=betty --type=vm --remote=0.0.0.0 --verbose --jobs=1 --ssh_private_key=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/id_rsa --test_prefix=SimpleTestUpdateAndVerify --no_graphics exited with code 1: None



Could set my env to get this test executed.

Doing bisec now.


Issue 805464 has been merged into this issue.
that is bad. Test itself is flaky :(

I see passing and failing cases for 10330. Not sure how to find the problem.
Cc: achuith@chromium.org
With several retries I can pass test on versions that failed on bot.
I also tool a look:
https://cros-goldeneye.corp.google.com/chromeos/console/listBuild?boards=betty#/

There 	10337.0.0-10339.0.0 were failed due this problem
However 10340.0.0 is good.


Cc: steve...@chromium.org
Labels: Build-PFQ-Failures
I am also trying to remote bisect on amd64-generic-chromium-pfq builder: https://uberchromegw.corp.google.com/i/chromeos/waterfall?show=amd64-generic-chromium-pfq

I don't have experience on identifying failures on vmtest. +Steven for suggestion as well.


Cc: malaykeshav@chromium.org
amd64-generic-chromium-pfq
https://luci-milo.appspot.com/buildbot/chromeos/amd64-generic-chromium-pfq/11394
failed with the same symptoms.
Cc: akes...@chromium.org
The error message is "can not connect to device"

erforming standard stateful update.
INFO    : Rebooting 127.0.0.1...
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
      7s: can not connect to device
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
     12s: can not connect to device
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
     17s: can not connect to device
Connection to 127.0.0.1 closed by remote host.
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
     22s: can not connect to device
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument

+infra deputy, to check if this is infra problem
Cc: ihf@chromium.org norvez@chromium.org vapier@chromium.org khmel@chromium.org
Owner: warx@chromium.org
Add more people for insights of it, this is now a major blocker of chrome/android pfq
Summary: VMTest SimpleTestUpdateAndVerify failed on chrome/android pfq (was: betty android-pfq builders failing)
flags:WARN flag name ([no]ssh_port) already defined
INFO    : QEMU binary: /b/c/cbuild/repository/chroot/usr/bin/qemu-system-x86_64
INFO    : QEMU version: QEMU emulator version 2.6.0, Copyright (c) 2003-2008 Fabrice Bellard
Starting a KVM instance
INFO    : Set kvm_smp to min(8, 40).
INFO    : Launching: /b/c/cbuild/repository/chroot/usr/bin/qemu-system-x86_64 -enable-kvm -m 8G -smp 8 -vga virtio -pidfile /tmp/kvm.9227 -chardev pipe,id=control_pipe,path=/tmp/kvm.9227.monitor -serial file:/tmp/kvm.9227.serial -mon chardev=control_pipe -daemonize -net nic,model=virtio,vlan=9227 -display none -net user,hostfwd=tcp:127.0.0.1:9227-:22,vlan=9227 -drive file=/tmp/cbuildbot-tmpjw892l/chromiumos_qemu_disk.bin.YLJGh0,index=0,media=disk,cache=unsafe
WARNING: Image format was not specified for '/tmp/cbuildbot-tmpjw892l/chromiumos_qemu_disk.bin.YLJGh0' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
INFO    : KVM started with pid stored in /tmp/kvm.9227
INFO    : Serial output, if available, can be found here in /tmp/kvm.9227.serial

!!! You are using a deprecated script !!!

Please use 'cros flash' in the future. See 'cros flash -h' for the details.
More information available in the link below.
https://sites.google.com/a/chromium.org/dev/chromium-os/build/cros-flash

Iteration:  1 of 1
INFO    : Omaha URL:  --omaha_url=http://192.168.120.66:8080/update/cache/0aa244ba02b3a81827cb68a16d6b5ecd_0aa244ba02b3a81827cb68a16d6b5ecd
INFO    : Forcing update independent of the current version
INFO    : Waiting to initiate contact with the update_engine.
INFO    : Starting update using args --update --omaha_url=http://192.168.120.66:8080/update/cache/0aa244ba02b3a81827cb68a16d6b5ecd_0aa244ba02b3a81827cb68a16d6b5ecd
INFO    : Devserver handling ping.  Check dev_server.log for more info.
[0123/215127:INFO:update_engine_client.cc(486)] Forcing an update by setting app_version to ForcedUpdate.
[0123/215127:INFO:update_engine_client.cc(488)] Initiating update check and install.
[0123/215127:INFO:update_engine_client.cc(517)] Waiting for update to complete.
INFO    : State of updater has changed to: UPDATE_STATUS_UPDATE_AVAILABLE
INFO    : State of updater has changed to: UPDATE_STATUS_FINALIZING
......[0123/215218:INFO:update_engine_client.cc(239)] Update succeeded -- reboot needed.
INFO    : Autoupdate was successful.
INFO    : Starting stateful update using URL http://192.168.120.66:8080/static/cache/0aa244ba02b3a81827cb68a16d6b5ecd_0aa244ba02b3a81827cb68a16d6b5ecd
Downloading stateful payload from http://192.168.120.66:8080/static/cache/0aa244ba02b3a81827cb68a16d6b5ecd_0aa244ba02b3a81827cb68a16d6b5ecd/stateful.tgz
Downloading command returns code 0.
Successfully downloaded update.
Performing standard stateful update.
INFO    : Rebooting 127.0.0.1...
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
      7s: can not connect to device
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
     12s: can not connect to device
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
     17s: can not connect to device
Connection to 127.0.0.1 closed by remote host.
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
     22s: can not connect to device

-----------------------------
Attaching the full log here. I see this line "Connection to 127.0.0.1 closed by remote host" and followed with many ssh ping tries.
Cc: hoegsberg@chromium.org
can a recent change to remote_access be related 
https://chromium-review.googlesource.com/c/chromiumos/platform/crosutils/+/872461
go/crosland shows it is landed in 10336.0.0.

chrome-pfq starts failing on R66-10336.0.0-rc3. I was told android side starts failing on 10335...
yeah I agree, but the failure pattern looks quite related to me...

The failing connection is from this _check_if_rebooted() call
https://chromium.googlesource.com/chromiumos/platform/crosutils/+/master/remote_access.sh#198
which invokes $(ssh_connect_settings) with no argument, but the change
https://chromium-review.googlesource.com/c/chromiumos/platform/crosutils/+/872461
now looks requiring to pass -p or -P to $(ssh_connect_settings), and lack of which will make `ssh 9227` where it should have been `ssh -p 9227`
Status: Fixed
Failed builders turned green.
Sign in to add a comment