provision_CheetsUpdate failing |
|||||||
Issue descriptionSome of the arc++ presubmit tests are timing out (they are scheduled by ATP) and it's not clear exactly why. There are ready duts in the arc-presubmit pool so I don't think it's a dut resource issue. Example test run: https://atp.googleplex.com/test_runs/19142238 -> http://cautotest/afe/#tab_id=view_job&object_id=197001126 The debug/autoserv.DEBUG indicates that the child jobs were scheduled but then nothing happened for an hour at which the job got killed. 05/03 13:17:40.361 INFO | server_job:0218| INFO ---- Start arc-unit-test timestamp=1525378660 localtime=May 03 13:17:40 05/03 13:17:40.361 DEBUG| suite:1166| Scheduling cheets_ArcInstrumentationTestServer.cheets_ServerAMTest 05/03 13:17:41.048 DEBUG| suite:1450| Adding job keyval for cheets_ArcInstrumentationTestServer.cheets_ServerAMTest=197001254-chromeos-test 05/03 13:17:41.048 DEBUG| suite:1166| Scheduling cheets_ArcNativeTestServer.cheets_NativeTests 05/03 13:17:41.712 DEBUG| suite:1450| Adding job keyval for cheets_ArcNativeTestServer.cheets_NativeTests=197001256-chromeos-test 05/03 13:17:41.712 DEBUG| suite:1166| Scheduling cheets_ArcInstrumentationTestServer.cheets_SystemUIUnitTest 05/03 13:17:42.379 DEBUG| suite:1450| Adding job keyval for cheets_ArcInstrumentationTestServer.cheets_SystemUIUnitTest=197001260-chromeos-test . . . 05/03 14:17:39.350 DEBUG| autoserv:0394| Received SIGTERM Some more examples: https://atp.googleplex.com/test_runs/19145913 -> http://cautotest/afe/#tab_id=view_job&object_id=197015774 https://atp.googleplex.com/test_runs/19063927 -> http://cautotest/afe/#tab_id=view_job&object_id=196730378 Might this be related to crbug.com/839028?
,
May 3 2018
,
May 4 2018
We've had a really bad week from a combination of: https://crbug.com/837714 https://crbug.com/839028 However, both appear to be resolved as of ~ 6 PM on 4/3.
,
May 4 2018
reassign to Ilja. These are still failing but maybe there's a bug in ToT causing in issue during provision. Latest failure: https://atp.googleplex.com/test_runs/19210035 -> http://cautotest/afe/#tab_id=view_job&object_id=197324214 I looked at one of the aborted jobs (link to shard): http://cros-full-0007.mtv.corp.google.com/afe/#tab_id=view_job&object_id=197324219 And I see the following failure in the status.log (http://cros-full-0007.mtv.corp.google.com/results/197324219-chromeos-test/chromeos6-row4-rack16-host14/status.log): FAIL provision_CheetsUpdate provision_CheetsUpdate timestamp=1525466873 localtime=May 04 13:47:53 Pushing Android test build failed due to: DEBUG:root:Extracting artifacts DEBUG:root:Calling: /usr/bin/simg2img /tmp/tmphoy1fe1j/system.img /tmp/tmphoy1fe1j/system.raw.img DEBUG:root:Calling: /usr/bin/sudo /usr/bin/unsquashfs -d /tmp/tmpsnxiolik -no-progress -f /tmp/tmphoy1fe1j/system.raw.img -no-xattrs sepolicy file_contexts plat_file_contexts nonplat_file_contexts system/build.prop Parallel unsquashfs: Using 32 processors 3 inodes (4 blocks) to write created 3 files created 2 directories created 0 symlinks created 0 devices created 0 fifos DEBUG:root:Calling: /usr/bin/sudo cp /tmp/tmpsnxiolik/sepolicy /tmp/tmphoy1fe1j/root/sepolicy DEBUG:root:Calling: /usr/bin/sudo chown chromeos-test /tmp/tmphoy1fe1j/root/sepolicy DEBUG:root:Calling: /usr/bin/sudo cp /tmp/tmpsnxiolik/file_contexts /tmp/tmphoy1fe1j/root/file_contexts DEBUG:root:Calling: /usr/bin/sudo chown chromeos-test /tmp/tmphoy1fe1j/root/file_contexts DEBUG:root:Skipping /tmp/tmpsnxiolik/plat_file_contexts since it is missing DEBUG:root:Skipping /tmp/tmpsnxiolik/nonplat_file_contexts since it is missing DEBUG:root:Calling: /usr/bin/sudo cp /tmp/tmpsnxiolik/system/build.prop /tmp/tmphoy1fe1j/build.prop DEBUG:root:Calling: /usr/bin/sudo chown chromeos-test /tmp/tmphoy1fe1j/build.prop DEBUG:root:Calling: /usr/bin/sudo rm -rf /tmp/tmpsnxiolik DEBUG:root:Reading build prop file: /tmp/tmphoy1fe1j/build.prop INFO:root:Deleting prop file: /tmp/tmphoy1fe1j/build.prop... DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'which rsync' /usr/local/bin/rsync DEBUG:root:New image's Android SDK version: 25 DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'grep ^CHROMEOS_ARC_ANDROID_SDK_VERSION= /etc/lsb-release' DEBUG:root:Target device's Android SDK version: 25 DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'uname -m' DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'grep ro.build.type= /opt/google/containers/android/rootfs/root/system/build.prop' DEBUG:root:Calling: scp 'root@chromeos6-row4-rack16-host14:/run/containers/android*/root/data/system/packages.xml' /tmp/tmpmnl4_c2g/packages.xml scp: /run/containers/android*/root/data/system/packages.xml: No such file or directory ERROR:root:Error while executing scp 'root@chromeos6-row4-rack16-host14:/run/containers/android*/root/data/system/packages.xml' /tmp/tmpmnl4_c2g/packages.xml ERROR:root:None DEBUG:root:Calling: /usr/bin/sudo rm -rf /tmp/tmpmnl4_c2g WARNING:root:There was an error getting the remote device APK key signature Command '('scp', 'root@chromeos6-row4-rack16-host14:/run/containers/android*/root/data/system/packages.xml', '/tmp/tmpmnl4_c2g/packages.xml')' returned non-zero exit status 1. Assuming APK key signature is 'unknown' DEBUG:root:device apk key: unknown; build variant: user -> user INFO:root:Converting images to raw images... DEBUG:root:Calling: /usr/bin/simg2img /tmp/tmphoy1fe1j/system.img /tmp/tmphoy1fe1j/system.raw.img DEBUG:root:Is the file system squashfs? True DEBUG:root:Calling: /usr/bin/sudo /usr/bin/unsquashfs -force -d /tmp/tmpcfh5iih4 /tmp/tmphoy1fe1j/system.raw.img Parallel unsquashfs: Using 32 processors 1307 inodes (9514 blocks) to write [=================- ] 2800/9514 29% [========================\ ] 3800/9514 39% [=============================================================-] 9514/9514 100% created 1139 files created 277 directories created 168 symlinks created 0 devices created 0 fifos DEBUG:root:Found file_contexts in image DEBUG:root:Calling: cp /tmp/tmpcfh5iih4/file_contexts /tmp/tmpw7kddch2/file_contexts DEBUG:root:Calling: /usr/bin/sudo /usr/local/autotest/server/site_tests/provision_CheetsUpdate/mksquashfs /tmp/tmpcfh5iih4 /tmp/tmphoy1fe1j/system.raw.img -no-progress -comp gzip -no-exports -noappend -mount-point / -product-out /tmp/tmphoy1fe1j -context-file /tmp/tmpw7kddch2/file_contexts -no-recovery -no-fragments -no-duplicates -b 131072 -t 0 Parallel mksquashfs: Using 32 processors Creating 4.0 filesystem on /tmp/tmphoy1fe1j/system.raw.img, block size 131072. Squashfs 4.0 filesystem, gzip compressed, data block size 131072 compressed data, compressed metadata, no fragments, compressed xattrs duplicates are not removed Filesystem size 475737.21 Kbytes (464.59 Mbytes) 43.72% of uncompressed filesystem size (1088142.22 Kbytes) Inode table size 43133 bytes (42.12 Kbytes) 36.46% of uncompressed inode table size (118315 bytes) Directory table size 17480 bytes (17.07 Kbytes) 45.67% of uncompressed directory table size (38273 bytes) Xattr table size 525 bytes (0.51 Kbytes) 19.31% of uncompressed xattr table size (2719 bytes) No duplicate files removed Number of inodes 1584 Number of files 1139 Number of symbolic links 168 Number of device nodes 0 Number of fifo nodes 0 Number of socket nodes 0 Number of directories 277 Number of ids (unique uids + gids) 7 Number of uids 3 unknown (655360) unknown (656360) root (0) Number of gids 7 unknown (655360) unknown (656360) unknown (657361) unknown (657360) unknown (656388) unknown (656363) root (0) DEBUG:root:Calling: /usr/bin/sudo rm -rf /tmp/tmpcfh5iih4 DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'sha1sum /etc/selinux/arc/policy/policy.30' DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'sha1sum /etc/selinux/arc/contexts/files/android_file_contexts' DEBUG:root:Skipping /tmp/tmphoy1fe1j/root/plat_file_contexts since it is missing DEBUG:root:Skipping /tmp/tmphoy1fe1j/root/nonplat_file_contexts since it is missing DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'echo $(( $(df --output=avail --local --block-size 1 / | tail -n1) + $(du --bytes /opt/google/containers/android/system.raw.img | awk '"'"'{print $1}'"'"') ))' INFO:root:Setting up ChromeOS device to image-writable... DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'stop arc-system-mount || true' arc-system-mount stop/waiting DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'mount -o remount,rw /' DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'if [ -L /opt/google/containers/android/system.raw.img ]; then rm /opt/google/containers/android/system.raw.img;fi' DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 '/bin/sed -i "s/^\(export ANDROID_DEBUGGABLE=\).*/\10/" /etc/init/arc-setup-env' DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 '/bin/sed -i "s/^\(export SHARE_FONTS=\).*/\10/" /etc/init/arc-setup-env' INFO:root:Syncing image files to ChromeOS... DEBUG:root:Calling: rsync --inplace -v --progress /tmp/tmphoy1fe1j/system.raw.img root@chromeos6-row4-rack16-host14:/opt/google/containers/android system.raw.img 44,128 0% 0.00kB/s 0:00:00 84,402,880 17% 80.45MB/s 0:00:04 175,544,368 36% 83.73MB/s 0:00:03 229,603,166 47% 73.02MB/s 0:00:03 332,680,992 68% 79.37MB/s 0:00:01 431,833,712 88% 82.88MB/s 0:00:00 487,157,760 100% 78.67MB/s 0:00:05 (xfr#1, to-chk=0/1) sent 121,940,377 bytes received 154,603 bytes 10,616,954.78 bytes/sec total size is 487,157,760 speedup is 3.99 INFO:root:Updating CHROMEOS_ARC_VERSION... DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 '/bin/sed -i "s/^\(CHROMEOS_ARC_VERSION=\).*/\1cheets_arm-img-P5911288/" /etc/lsb-release' INFO:root:*** SELinux policy is not updated. Restarting ui. *** DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'set -e mount -o remount,ro / start ui start arc-system-mount' start: Job is already running: ui ERROR:root:Error while executing ssh root@chromeos6-row4-rack16-host14 'set -e mount -o remount,ro / start ui start arc-system-mount' ERROR:root:None ERROR:root:Failed to cleanly restart ui, fall back to reboot Traceback (most recent call last): File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/push_to_device.py", line 643, in __exit__ 'start arc-system-mount', File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/push_to_device.py", line 95, in check_call lib.util.check_call(dryrun=self._dryrun, *command) File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/lib/util.py", line 46, in check_call return subprocess.check_call(subprocess_args, **kwargs) File "/usr/lib/python3.4/subprocess.py", line 561, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '('ssh', 'root@chromeos6-row4-rack16-host14', 'set -e\nmount -o remount,ro /\nstart ui\nstart arc-system-mount')' returned non-zero exit status 1 INFO:root:*** Reboot required. *** DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 reboot ERROR:root:Error while executing ssh root@chromeos6-row4-rack16-host14 reboot ERROR:root:None Traceback (most recent call last): File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/push_to_device.py", line 1461, in <module> main() File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/push_to_device.py", line 1457, in main _update_selinux_policy(remote_proxy, out) File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/push_to_device.py", line 654, in __exit__ self._remote_proxy.check_call('reboot') File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/push_to_device.py", line 95, in check_call lib.util.check_call(dryrun=self._dryrun, *command) File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/lib/util.py", line 46, in check_call return subprocess.check_call(subprocess_args, **kwargs) File "/usr/lib/python3.4/subprocess.py", line 561, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '('ssh', 'root@chromeos6-row4-rack16-host14', 'reboot')' returned non-zero exit status 255 END FAIL provision_CheetsUpdate provision_CheetsUpdate timestamp=1525466873 localtime=May 04 13:47:53
,
May 4 2018
Found this bug: https://b.corp.google.com/issues/77494303
,
May 4 2018
hmm... I'm a bit lost, the jobs get scheduled but they don't always run on the shard (cros-full-0007). Hoping Ilja/Don can figure out this mystery.
,
May 4 2018
The server has not been affected by overload https://viceroy.corp.google.com/chromeos/machines?hostname=cros-full-0007&duration=8d&mdb_role=chrome-infra&refresh=-1 The failure above is during installation of an image (provision) and then (maybe) merging Android on it. --- start arc-system-mount' start: Job is already running: ui ERROR:root:Error while executing ssh root@chromeos6-row4-rack16-host14 'set -e mount -o remount,ro / start ui start arc-system-mount' ERROR:root:None ERROR:root:Failed to cleanly restart ui, fall back to reboot --- So, basically find the image, install it locally, see why things are failing. It might be that the Chrome OS base image is bad. It might be there is an incompatibility between it and the new Android. It might be that the new Android blows things up. I am not really the most familiar with this particular test.
,
May 4 2018
Luis might be able to tell if something is related to bad images from the logs.
,
May 4 2018
I think phshah@ has been running into this locally. I asked him to provide more logs, let's see if that can help us pinpoint the root cause.
,
May 4 2018
Recently, I've been able to reproduce this consistently. I've added logs to b/77494303.
,
Jan 7
|
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by kevcheng@chromium.org
, May 3 2018