New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 839621 link

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

provision_CheetsUpdate failing

Project Member Reported by kevcheng@chromium.org, May 3 2018

Issue description

Some of the arc++ presubmit tests are timing out (they are scheduled by ATP) and it's not clear exactly why. There are ready duts in the arc-presubmit pool so I don't think it's a dut resource issue.

Example test run:
https://atp.googleplex.com/test_runs/19142238 -> http://cautotest/afe/#tab_id=view_job&object_id=197001126

The debug/autoserv.DEBUG indicates that the child jobs were scheduled but then nothing happened for an hour at which the job got killed.


05/03 13:17:40.361 INFO |        server_job:0218| INFO	----	Start arc-unit-test	timestamp=1525378660	localtime=May 03 13:17:40	
05/03 13:17:40.361 DEBUG|             suite:1166| Scheduling cheets_ArcInstrumentationTestServer.cheets_ServerAMTest
05/03 13:17:41.048 DEBUG|             suite:1450| Adding job keyval for cheets_ArcInstrumentationTestServer.cheets_ServerAMTest=197001254-chromeos-test
05/03 13:17:41.048 DEBUG|             suite:1166| Scheduling cheets_ArcNativeTestServer.cheets_NativeTests
05/03 13:17:41.712 DEBUG|             suite:1450| Adding job keyval for cheets_ArcNativeTestServer.cheets_NativeTests=197001256-chromeos-test
05/03 13:17:41.712 DEBUG|             suite:1166| Scheduling cheets_ArcInstrumentationTestServer.cheets_SystemUIUnitTest
05/03 13:17:42.379 DEBUG|             suite:1450| Adding job keyval for cheets_ArcInstrumentationTestServer.cheets_SystemUIUnitTest=197001260-chromeos-test
.
.
.
05/03 14:17:39.350 DEBUG|          autoserv:0394| Received SIGTERM

Some more examples:
https://atp.googleplex.com/test_runs/19145913 -> http://cautotest/afe/#tab_id=view_job&object_id=197015774
https://atp.googleplex.com/test_runs/19063927 -> http://cautotest/afe/#tab_id=view_job&object_id=196730378

Might this be related to crbug.com/839028?
 
Components: Infra>Client>ChromeOS>Test
Labels: -Pri-3 Pri-1
We've had a really bad week from a combination of:
  https://crbug.com/837714
  https://crbug.com/839028

However, both appear to be resolved as of ~ 6 PM on 4/3.
Cc: dgarr...@chromium.org
Owner: ihf@chromium.org
reassign to Ilja.

These are still failing but maybe there's a bug in ToT causing in issue during provision. 

Latest failure:
https://atp.googleplex.com/test_runs/19210035 -> http://cautotest/afe/#tab_id=view_job&object_id=197324214

I looked at one of the aborted jobs (link to shard): http://cros-full-0007.mtv.corp.google.com/afe/#tab_id=view_job&object_id=197324219

And I see the following failure in the status.log (http://cros-full-0007.mtv.corp.google.com/results/197324219-chromeos-test/chromeos6-row4-rack16-host14/status.log):
		FAIL	provision_CheetsUpdate	provision_CheetsUpdate	timestamp=1525466873	localtime=May 04 13:47:53	Pushing Android test build failed due to: DEBUG:root:Extracting artifacts
  DEBUG:root:Calling: /usr/bin/simg2img /tmp/tmphoy1fe1j/system.img /tmp/tmphoy1fe1j/system.raw.img
  DEBUG:root:Calling: /usr/bin/sudo /usr/bin/unsquashfs -d /tmp/tmpsnxiolik -no-progress -f /tmp/tmphoy1fe1j/system.raw.img -no-xattrs sepolicy file_contexts plat_file_contexts nonplat_file_contexts system/build.prop
  Parallel unsquashfs: Using 32 processors
  3 inodes (4 blocks) to write
  
  
  created 3 files
  created 2 directories
  created 0 symlinks
  created 0 devices
  created 0 fifos
  DEBUG:root:Calling: /usr/bin/sudo cp /tmp/tmpsnxiolik/sepolicy /tmp/tmphoy1fe1j/root/sepolicy
  DEBUG:root:Calling: /usr/bin/sudo chown chromeos-test /tmp/tmphoy1fe1j/root/sepolicy
  DEBUG:root:Calling: /usr/bin/sudo cp /tmp/tmpsnxiolik/file_contexts /tmp/tmphoy1fe1j/root/file_contexts
  DEBUG:root:Calling: /usr/bin/sudo chown chromeos-test /tmp/tmphoy1fe1j/root/file_contexts
  DEBUG:root:Skipping /tmp/tmpsnxiolik/plat_file_contexts since it is missing
  DEBUG:root:Skipping /tmp/tmpsnxiolik/nonplat_file_contexts since it is missing
  DEBUG:root:Calling: /usr/bin/sudo cp /tmp/tmpsnxiolik/system/build.prop /tmp/tmphoy1fe1j/build.prop
  DEBUG:root:Calling: /usr/bin/sudo chown chromeos-test /tmp/tmphoy1fe1j/build.prop
  DEBUG:root:Calling: /usr/bin/sudo rm -rf /tmp/tmpsnxiolik
  DEBUG:root:Reading build prop file: /tmp/tmphoy1fe1j/build.prop
  INFO:root:Deleting prop file: /tmp/tmphoy1fe1j/build.prop...
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'which rsync'
  /usr/local/bin/rsync
  DEBUG:root:New image's Android SDK version: 25
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'grep ^CHROMEOS_ARC_ANDROID_SDK_VERSION= /etc/lsb-release'
  DEBUG:root:Target device's Android SDK version: 25
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'uname -m'
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'grep ro.build.type= /opt/google/containers/android/rootfs/root/system/build.prop'
  DEBUG:root:Calling: scp 'root@chromeos6-row4-rack16-host14:/run/containers/android*/root/data/system/packages.xml' /tmp/tmpmnl4_c2g/packages.xml
  scp: /run/containers/android*/root/data/system/packages.xml: No such file or directory
  ERROR:root:Error while executing scp 'root@chromeos6-row4-rack16-host14:/run/containers/android*/root/data/system/packages.xml' /tmp/tmpmnl4_c2g/packages.xml
  ERROR:root:None
  DEBUG:root:Calling: /usr/bin/sudo rm -rf /tmp/tmpmnl4_c2g
  WARNING:root:There was an error getting the remote device APK key signature Command '('scp', 'root@chromeos6-row4-rack16-host14:/run/containers/android*/root/data/system/packages.xml', '/tmp/tmpmnl4_c2g/packages.xml')' returned non-zero exit status 1. Assuming APK key signature is 'unknown'
  DEBUG:root:device apk key: unknown; build variant: user -> user
  INFO:root:Converting images to raw images...
  DEBUG:root:Calling: /usr/bin/simg2img /tmp/tmphoy1fe1j/system.img /tmp/tmphoy1fe1j/system.raw.img
  DEBUG:root:Is the file system squashfs? True
  DEBUG:root:Calling: /usr/bin/sudo /usr/bin/unsquashfs -force -d /tmp/tmpcfh5iih4 /tmp/tmphoy1fe1j/system.raw.img
  Parallel unsquashfs: Using 32 processors
  1307 inodes (9514 blocks) to write
  
  
  [=================-                                            ] 2800/9514  29%
  [========================\                                     ] 3800/9514  39%
  [=============================================================-] 9514/9514 100%
  created 1139 files
  created 277 directories
  created 168 symlinks
  created 0 devices
  created 0 fifos
  DEBUG:root:Found file_contexts in image
  DEBUG:root:Calling: cp /tmp/tmpcfh5iih4/file_contexts /tmp/tmpw7kddch2/file_contexts
  DEBUG:root:Calling: /usr/bin/sudo /usr/local/autotest/server/site_tests/provision_CheetsUpdate/mksquashfs /tmp/tmpcfh5iih4 /tmp/tmphoy1fe1j/system.raw.img -no-progress -comp gzip -no-exports -noappend -mount-point / -product-out /tmp/tmphoy1fe1j -context-file /tmp/tmpw7kddch2/file_contexts -no-recovery -no-fragments -no-duplicates -b 131072 -t 0
  Parallel mksquashfs: Using 32 processors
  Creating 4.0 filesystem on /tmp/tmphoy1fe1j/system.raw.img, block size 131072.
  
  Squashfs 4.0 filesystem, gzip compressed, data block size 131072
  	compressed data, compressed metadata, no fragments, compressed xattrs
  	duplicates are not removed
  Filesystem size 475737.21 Kbytes (464.59 Mbytes)
  	43.72% of uncompressed filesystem size (1088142.22 Kbytes)
  Inode table size 43133 bytes (42.12 Kbytes)
  	36.46% of uncompressed inode table size (118315 bytes)
  Directory table size 17480 bytes (17.07 Kbytes)
  	45.67% of uncompressed directory table size (38273 bytes)
  Xattr table size 525 bytes (0.51 Kbytes)
  	19.31% of uncompressed xattr table size (2719 bytes)
  No duplicate files removed
  Number of inodes 1584
  Number of files 1139
  Number of symbolic links  168
  Number of device nodes 0
  Number of fifo nodes 0
  Number of socket nodes 0
  Number of directories 277
  Number of ids (unique uids + gids) 7
  Number of uids 3
  	unknown (655360)
  	unknown (656360)
  	root (0)
  Number of gids 7
  	unknown (655360)
  	unknown (656360)
  	unknown (657361)
  	unknown (657360)
  	unknown (656388)
  	unknown (656363)
  	root (0)
  DEBUG:root:Calling: /usr/bin/sudo rm -rf /tmp/tmpcfh5iih4
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'sha1sum /etc/selinux/arc/policy/policy.30'
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'sha1sum /etc/selinux/arc/contexts/files/android_file_contexts'
  DEBUG:root:Skipping /tmp/tmphoy1fe1j/root/plat_file_contexts since it is missing
  DEBUG:root:Skipping /tmp/tmphoy1fe1j/root/nonplat_file_contexts since it is missing
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'echo $((     $(df --output=avail --local --block-size 1 / | tail -n1) +     $(du --bytes /opt/google/containers/android/system.raw.img |       awk '"'"'{print $1}'"'"') ))'
  INFO:root:Setting up ChromeOS device to image-writable...
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'stop arc-system-mount || true'
  arc-system-mount stop/waiting
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'mount -o remount,rw /'
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'if [ -L /opt/google/containers/android/system.raw.img ]; then  rm /opt/google/containers/android/system.raw.img;fi'
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 '/bin/sed -i "s/^\(export ANDROID_DEBUGGABLE=\).*/\10/" /etc/init/arc-setup-env'
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 '/bin/sed -i "s/^\(export SHARE_FONTS=\).*/\10/" /etc/init/arc-setup-env'
  INFO:root:Syncing image files to ChromeOS...
  DEBUG:root:Calling: rsync --inplace -v --progress /tmp/tmphoy1fe1j/system.raw.img root@chromeos6-row4-rack16-host14:/opt/google/containers/android
  system.raw.img
  
           44,128   0%    0.00kB/s    0:00:00  
       84,402,880  17%   80.45MB/s    0:00:04  
      175,544,368  36%   83.73MB/s    0:00:03  
      229,603,166  47%   73.02MB/s    0:00:03  
      332,680,992  68%   79.37MB/s    0:00:01  
      431,833,712  88%   82.88MB/s    0:00:00  
      487,157,760 100%   78.67MB/s    0:00:05 (xfr#1, to-chk=0/1)
  
  sent 121,940,377 bytes  received 154,603 bytes  10,616,954.78 bytes/sec
  total size is 487,157,760  speedup is 3.99
  INFO:root:Updating CHROMEOS_ARC_VERSION...
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 '/bin/sed -i "s/^\(CHROMEOS_ARC_VERSION=\).*/\1cheets_arm-img-P5911288/" /etc/lsb-release'
  INFO:root:*** SELinux policy is not updated. Restarting ui. ***
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 'set -e
  mount -o remount,ro /
  start ui
  start arc-system-mount'
  start: Job is already running: ui
  ERROR:root:Error while executing ssh root@chromeos6-row4-rack16-host14 'set -e
  mount -o remount,ro /
  start ui
  start arc-system-mount'
  ERROR:root:None
  ERROR:root:Failed to cleanly restart ui, fall back to reboot
  Traceback (most recent call last):
    File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/push_to_device.py", line 643, in __exit__
      'start arc-system-mount',
    File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/push_to_device.py", line 95, in check_call
      lib.util.check_call(dryrun=self._dryrun, *command)
    File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/lib/util.py", line 46, in check_call
      return subprocess.check_call(subprocess_args, **kwargs)
    File "/usr/lib/python3.4/subprocess.py", line 561, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '('ssh', 'root@chromeos6-row4-rack16-host14', 'set -e\nmount -o remount,ro /\nstart ui\nstart arc-system-mount')' returned non-zero exit status 1
  INFO:root:*** Reboot required. ***
  DEBUG:root:Calling: ssh root@chromeos6-row4-rack16-host14 reboot
  ERROR:root:Error while executing ssh root@chromeos6-row4-rack16-host14 reboot
  ERROR:root:None
  Traceback (most recent call last):
    File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/push_to_device.py", line 1461, in <module>
      main()
    File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/push_to_device.py", line 1457, in main
      _update_selinux_policy(remote_proxy, out)
    File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/push_to_device.py", line 654, in __exit__
      self._remote_proxy.check_call('reboot')
    File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/push_to_device.py", line 95, in check_call
      lib.util.check_call(dryrun=self._dryrun, *command)
    File "/usr/local/autotest/server/site_tests/provision_CheetsUpdate/lib/util.py", line 46, in check_call
      return subprocess.check_call(subprocess_args, **kwargs)
    File "/usr/lib/python3.4/subprocess.py", line 561, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '('ssh', 'root@chromeos6-row4-rack16-host14', 'reboot')' returned non-zero exit status 255
	END FAIL	provision_CheetsUpdate	provision_CheetsUpdate	timestamp=1525466873	localtime=May 04 13:47:53	

hmm... I'm a bit lost, the jobs get scheduled but they don't always run on the shard (cros-full-0007). Hoping Ilja/Don can figure out this mystery.

Comment 7 by ihf@chromium.org, May 4 2018

Cc: rohi...@chromium.org
Components: Platform>ARC
Labels: OS-Chrome
Owner: ----
Status: Available (was: Assigned)
Summary: provision_CheetsUpdate failing (was: autotest arc++ suites timing out)
The server has not been affected by overload
https://viceroy.corp.google.com/chromeos/machines?hostname=cros-full-0007&duration=8d&mdb_role=chrome-infra&refresh=-1

The failure above is during installation of an image (provision) and then (maybe) merging Android on it.

---
  start arc-system-mount'
  start: Job is already running: ui
  ERROR:root:Error while executing ssh root@chromeos6-row4-rack16-host14 'set -e
  mount -o remount,ro /
  start ui
  start arc-system-mount'
  ERROR:root:None
  ERROR:root:Failed to cleanly restart ui, fall back to reboot
---

So, basically find the image, install it locally, see why things are failing. It might be that the Chrome OS base image is bad. It might be there is an incompatibility between it and the new Android. It might be that the new Android blows things up. I am not really the most familiar with this particular test.
Cc: lhchavez@chromium.org
Luis might be able to tell if something is related to bad images from the logs.
Cc: phshah@chromium.org
I think phshah@ has been running into this locally. I asked him to provide more logs, let's see if that can help us pinpoint the root cause.
Recently, I've been able to reproduce this consistently. I've added logs to b/77494303.
Cc: -lhchavez@chromium.org

Sign in to add a comment