VMTest SimpleTestUpdateAndVerify failed on chrome/android pfq |
||||||||||
Issue descriptionThe last three android-pfq builders for betty and betty-arc64 have failed VM tests. It looks like the auto-update succeeds, but then the VM never reboots. The previous build that passed didn't run VM tests, so it seems possible that a bad CL went through without getting caught. https://luci-milo.appspot.com/buildbot/chromeos/betty-arc64-nyc-android-pfq/624 https://luci-milo.appspot.com/buildbot/chromeos/betty-nyc-android-pfq/1271
,
Jan 24 2018
,
Jan 24 2018
chrome PFQ is also affected: https://luci-milo.appspot.com/buildbot/chromeos/betty-chrome-pfq/953 Based on both PFQs: last good: 10334.0.0-rc1 first bad: 10335.0.0-rc2
,
Jan 25 2018
I built betty locally and it fails (test) in 10334.0.0
bin/ctest --board=betty --type=vm --no_graphics --verbose --target_image=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin --quick_update --ssh_private_key=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/id_rsa
2018/01/24 16:43:03 - ctest - INFO : Could not find a latest image to use. Using target instead.
2018/01/24 16:43:03 - cros_build_lib.py - INFO : RunCommand: ../platform/crostestutils/generate_test_payloads/cros_generate_test_payloads.py '--target=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin' '--base=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin' '--board=betty' '--jobs=1' --basic_suite in /mnt/host/source/src/scripts
2018/01/24 16:43:04 - cros_generate_test_payloads.py - INFO : Generating updates required for this test suite in parallel.
2018/01/24 16:43:04 - cros_generate_test_payloads.py - INFO : Generating an unsigned delta payload from /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_qemu_image.bin to /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_qemu_image.bin and not patching the kernel.
2018/01/24 16:43:52 - cros_generate_test_payloads.py - INFO : Dumping /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/update.cache
2018/01/24 16:43:52 - cros_build_lib.py - INFO : RunCommand: /mnt/host/source/src/platform/crostestutils/au_test_harness/cros_au_test_harness.py '--base_image=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin' '--target_image=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin' '--board=betty' '--type=vm' '--remote=0.0.0.0' --verbose '--jobs=1' '--ssh_private_key=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/id_rsa' '--test_prefix=SimpleTestUpdateAndVerify' --no_graphics in /mnt/host/source/src/scripts
2018/01/24 16:43:52 - cros_au_test_harness.py - INFO : Loading update cache from /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/update.cache
2018/01/24 16:43:53 - dev_server_wrapper.py - DEBUG : Retrieving http://127.0.0.1:8080/check_health
2018/01/24 16:43:53 - cros_build_lib.py - DEBUG : RunCommand: sudo 'CROS_SUDO_KEEP_ALIVE=/dev/pts/10:unknown' -- start_devserver --pidfile /mnt/host/source/chroot/tmp/au_test_harnesslek8sG/devserver_wrapperLZbTvl --logfile /mnt/host/source/chroot/tmp/au_test_harnesslek8sG/dev_server.log '--port=8080' --critical_update in /mnt/host/source
2018/01/24 16:43:58 - dev_server_wrapper.py - DEBUG : Retrieving http://127.0.0.1:8080/check_health
2018/01/24 16:44:19 - au_worker.py - INFO : SimpleTestUpdateAndVerify: Copied shared disk image /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_qemu_image.bin to /tmp/chromiumos_qemu_disk.bin.sEGyTv.
2018/01/24 16:44:26 - au_worker.py - INFO : SimpleTestUpdateAndVerify: Performing a delta update from /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_qemu_image.bin to /mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_qemu_image.bin in a VM
2018/01/24 16:44:26 - au_worker.py - WARNING : None
E
======================================================================
ERROR: SimpleTestUpdateAndVerify (crostestutils.au_test_harness.au_test.AUTest)
Test that updates to itself.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/mnt/host/source/src/platform/crostestutils/au_test_harness/../../crostestutils/au_test_harness/au_test.py", line 221, in SimpleTestUpdateAndVerify
self.worker.PerformUpdate(target_image_path, target_image_path)
File "/mnt/host/source/src/platform/crostestutils/au_test_harness/../../crostestutils/au_test_harness/au_worker.py", line 153, in PerformUpdate
key_to_use)
File "/mnt/host/source/src/platform/crostestutils/au_test_harness/../../crostestutils/au_test_harness/vm_au_worker.py", line 115, in UpdateImage
self.RunUpdateCmd(cmd, log_directory)
File "/mnt/host/source/src/platform/crostestutils/au_test_harness/../../crostestutils/au_test_harness/au_worker.py", line 238, in RunUpdateCmd
raise update_exception.UpdateException(result.returncode, 'Update failed')
UpdateException: Update failed
----------------------------------------------------------------------
Ran 1 test in 73.278s
FAILED (errors=1)
2018/01/24 16:45:11 - cros_build_lib.py - ERROR : Test harness failed.
2018/01/24 16:45:11 - dev_server_wrapper.py - DEBUG : Stopping devserver instance with pid 131212
2018/01/24 16:45:11 - cros_build_lib.py - DEBUG : RunCommand: sudo 'CROS_SUDO_KEEP_ALIVE=/dev/pts/10:unknown' -- kill 131212
2018/01/24 16:45:12 - cros_build_lib.py - ERROR : /mnt/host/source/src/platform/crostestutils/au_test_harness/cros_au_test_harness.py --base_image=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin --target_image=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/chromiumos_test_image.bin --board=betty --type=vm --remote=0.0.0.0 --verbose --jobs=1 --ssh_private_key=/mnt/host/source/src/build/images/betty/R66-10334.0.2018_01_24_1531-a1/id_rsa --test_prefix=SimpleTestUpdateAndVerify --no_graphics exited with code 1: None
,
Jan 25 2018
Could set my env to get this test executed. Doing bisec now.
,
Jan 25 2018
Issue 805464 has been merged into this issue.
,
Jan 25 2018
that is bad. Test itself is flaky :( I see passing and failing cases for 10330. Not sure how to find the problem.
,
Jan 25 2018
,
Jan 25 2018
With several retries I can pass test on versions that failed on bot. I also tool a look: https://cros-goldeneye.corp.google.com/chromeos/console/listBuild?boards=betty#/ There 10337.0.0-10339.0.0 were failed due this problem However 10340.0.0 is good.
,
Jan 25 2018
I am also trying to remote bisect on amd64-generic-chromium-pfq builder: https://uberchromegw.corp.google.com/i/chromeos/waterfall?show=amd64-generic-chromium-pfq I don't have experience on identifying failures on vmtest. +Steven for suggestion as well.
,
Jan 25 2018
,
Jan 25 2018
amd64-generic-chromium-pfq https://luci-milo.appspot.com/buildbot/chromeos/amd64-generic-chromium-pfq/11394 failed with the same symptoms.
,
Jan 26 2018
The error message is "can not connect to device"
erforming standard stateful update.
INFO : Rebooting 127.0.0.1...
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
7s: can not connect to device
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
12s: can not connect to device
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
17s: can not connect to device
Connection to 127.0.0.1 closed by remote host.
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
22s: can not connect to device
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
+infra deputy, to check if this is infra problem
,
Jan 26 2018
Add more people for insights of it, this is now a major blocker of chrome/android pfq
,
Jan 26 2018
flags:WARN flag name ([no]ssh_port) already defined
INFO : QEMU binary: /b/c/cbuild/repository/chroot/usr/bin/qemu-system-x86_64
INFO : QEMU version: QEMU emulator version 2.6.0, Copyright (c) 2003-2008 Fabrice Bellard
Starting a KVM instance
INFO : Set kvm_smp to min(8, 40).
INFO : Launching: /b/c/cbuild/repository/chroot/usr/bin/qemu-system-x86_64 -enable-kvm -m 8G -smp 8 -vga virtio -pidfile /tmp/kvm.9227 -chardev pipe,id=control_pipe,path=/tmp/kvm.9227.monitor -serial file:/tmp/kvm.9227.serial -mon chardev=control_pipe -daemonize -net nic,model=virtio,vlan=9227 -display none -net user,hostfwd=tcp:127.0.0.1:9227-:22,vlan=9227 -drive file=/tmp/cbuildbot-tmpjw892l/chromiumos_qemu_disk.bin.YLJGh0,index=0,media=disk,cache=unsafe
WARNING: Image format was not specified for '/tmp/cbuildbot-tmpjw892l/chromiumos_qemu_disk.bin.YLJGh0' and probing guessed raw.
Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
Specify the 'raw' format explicitly to remove the restrictions.
INFO : KVM started with pid stored in /tmp/kvm.9227
INFO : Serial output, if available, can be found here in /tmp/kvm.9227.serial
!!! You are using a deprecated script !!!
Please use 'cros flash' in the future. See 'cros flash -h' for the details.
More information available in the link below.
https://sites.google.com/a/chromium.org/dev/chromium-os/build/cros-flash
Iteration: 1 of 1
INFO : Omaha URL: --omaha_url=http://192.168.120.66:8080/update/cache/0aa244ba02b3a81827cb68a16d6b5ecd_0aa244ba02b3a81827cb68a16d6b5ecd
INFO : Forcing update independent of the current version
INFO : Waiting to initiate contact with the update_engine.
INFO : Starting update using args --update --omaha_url=http://192.168.120.66:8080/update/cache/0aa244ba02b3a81827cb68a16d6b5ecd_0aa244ba02b3a81827cb68a16d6b5ecd
INFO : Devserver handling ping. Check dev_server.log for more info.
[0123/215127:INFO:update_engine_client.cc(486)] Forcing an update by setting app_version to ForcedUpdate.
[0123/215127:INFO:update_engine_client.cc(488)] Initiating update check and install.
[0123/215127:INFO:update_engine_client.cc(517)] Waiting for update to complete.
INFO : State of updater has changed to: UPDATE_STATUS_UPDATE_AVAILABLE
INFO : State of updater has changed to: UPDATE_STATUS_FINALIZING
......[0123/215218:INFO:update_engine_client.cc(239)] Update succeeded -- reboot needed.
INFO : Autoupdate was successful.
INFO : Starting stateful update using URL http://192.168.120.66:8080/static/cache/0aa244ba02b3a81827cb68a16d6b5ecd_0aa244ba02b3a81827cb68a16d6b5ecd
Downloading stateful payload from http://192.168.120.66:8080/static/cache/0aa244ba02b3a81827cb68a16d6b5ecd_0aa244ba02b3a81827cb68a16d6b5ecd/stateful.tgz
Downloading command returns code 0.
Successfully downloaded update.
Performing standard stateful update.
INFO : Rebooting 127.0.0.1...
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
7s: can not connect to device
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
12s: can not connect to device
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
17s: can not connect to device
Connection to 127.0.0.1 closed by remote host.
ssh: connect to host 9227 port 9227: Invalid argument
ssh: connect to host 9227 port 9227: Invalid argument
22s: can not connect to device
-----------------------------
Attaching the full log here. I see this line "Connection to 127.0.0.1 closed by remote host" and followed with many ssh ping tries.
,
Jan 26 2018
can a recent change to remote_access be related https://chromium-review.googlesource.com/c/chromiumos/platform/crosutils/+/872461
,
Jan 26 2018
go/crosland shows it is landed in 10336.0.0. chrome-pfq starts failing on R66-10336.0.0-rc3. I was told android side starts failing on 10335...
,
Jan 26 2018
yes: https://luci-milo.appspot.com/buildbot/chromeos/betty-nyc-android-pfq/1269 10335.0.0-rc2
,
Jan 26 2018
yep, see this https://luci-milo.appspot.com/buildbot/chromeos/betty-arc64-nyc-android-pfq/622 R66-10335.0.0-rc2
,
Jan 26 2018
yeah I agree, but the failure pattern looks quite related to me... The failing connection is from this _check_if_rebooted() call https://chromium.googlesource.com/chromiumos/platform/crosutils/+/master/remote_access.sh#198 which invokes $(ssh_connect_settings) with no argument, but the change https://chromium-review.googlesource.com/c/chromiumos/platform/crosutils/+/872461 now looks requiring to pass -p or -P to $(ssh_connect_settings), and lack of which will make `ssh 9227` where it should have been `ssh -p 9227`
,
Jan 26 2018
In https://luci-milo.appspot.com/buildbot/chromeos/betty-arc64-nyc-android-pfq/622, the InitSDK step is fetching crosutils-0.0.1-r2084 (Completed dev-util/crosutils-0.0.1-r2084 (in 0m5.5s)) and r2084 is https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/6d011365f42bd4436c82e0510f71dcde5a2a7142/dev-util/crosutils/crosutils-0.0.1-r2084.ebuild CROS_WORKON_COMMIT="8caf9881ef86f9bbc3e5cc21477d17537d83b3b0" which is https://chromium.googlesource.com/chromiumos/platform/crosutils/+/8caf9881ef86f9bbc3e5cc21477d17537d83b3b0 = https://chromium-review.googlesource.com/c/chromiumos/platform/crosutils/+/872461
,
Jan 26 2018
Fixing CL https://chromium-review.googlesource.com/c/chromiumos/platform/crosutils/+/888238 along the line of #16 - #21
,
Jan 26 2018
Rather, the revert by warx@ is in CQ https://chromium-review.googlesource.com/c/chromiumos/platform/crosutils/+/888158
,
Jan 26 2018
Failed builders turned green. |
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by bmgordon@chromium.org
, Jan 24 2018Owner: khmel@chromium.org