
Issue 840975


Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



High provision failure rate in the lab

Project Member Reported by nxia@chromium.org, May 8 2018

Issue description

Provision failure rates have spiked since ~15:00 on 05/07.

http://shortn/_EJf6M2lYLj

quawks boards have high failure rates.

e.g. chromeos4-row10-rack10-host19 is stuck in a failing repair-verify loop.

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row10-rack10-host19/551953-repair/20180805123334/



	FAIL	----	verify.python	timestamp=1525808067	localtime=May 08 12:34:27	Python is missing; may be caused by powerwash
	GOOD	----	verify.cros	timestamp=1525808074	localtime=May 08 12:34:34	
	START	----	repair.au	timestamp=1525808074	localtime=May 08 12:34:34	
		FAIL	----	repair.au	timestamp=1525808310	localtime=May 08 12:38:30	CrOS auto-update failed for host chromeos4-row10-rack10-host19: 0) Could not copy /tmp/cros-update_chromeos4-row10-rack10-host19_9741/src to device., 1) Could not copy /tmp/cros-update_100.115.201.127_18954/src to device.
	END FAIL	----	repair.au	timestamp=1525808310	localtime=May 08 12:38:30	
	START	----	repair.powerwash	timestamp=1525808310	localtime=May 08 12:38:30	
		START	----	reboot	timestamp=1525808311	localtime=May 08 12:38:31	
			GOOD	----	reboot.start	timestamp=1525808311	localtime=May 08 12:38:31	
			GOOD	----	reboot.verify	timestamp=1525808373	localtime=May 08 12:39:33	
		END GOOD	----	reboot	kernel=4.4.130-14075-gca210cedb9ec	localtime=May 08 12:39:37	timestamp=1525808377	
		FAIL	----	repair.powerwash	timestamp=1525808613	localtime=May 08 12:43:33	CrOS auto-update failed for host chromeos4-row10-rack10-host19: 0) Could not copy /tmp/cros-update_chromeos4-row10-rack10-host19_31080/src to device., 1) Could not copy /tmp/cros-update_100.115.201.127_8631/src to device.
	END FAIL	----	repair.powerwash	timestamp=1525808613	localtime=May 08 12:43:33	
	START	----	repair.usb	timestamp=1525808613	localtime=May 08 12:43:33	
		FAIL	----	repair.usb	timestamp=1525808712	localtime=May 08 12:45:12	command execution error
  * Command: 
      /usr/bin/ssh -a -x  -o ControlPath=/tmp/_autotmp_HMh93pssh-master/socket
      -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
      -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o
      ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22
      chromeos4-row10-rack10-host19 "export LIBC_FATAL_STDERR_=1; if type
      \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\"
      \"server[stack::repair|servo_install|run] -> ssh_run(chromeos-install
      --yes)\";fi; chromeos-install --yes"
  Exit status: 1
  Duration: 0.602130174637
  
  stdout:
  cros-disks stop/waiting
  stderr:
  Error: Source can not be the destination device: /dev/mmcblk0
	END FAIL	----	repair.usb	timestamp=1525808712	localtime=May 08 12:45:12	
	GOOD	----	verify.hwid	timestamp=1525808715	localtime=May 08 12:45:15	
END FAIL	----	repair	timestamp=1525808715	localtime=May 08 12:45:15	







 

Comment 1 by nxia@chromium.org, May 8 2018

Description: Show this description

Comment 2 by nxia@chromium.org, May 8 2018

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row10-rack10-host19/551953-repair/20180805123334/autoupdate_logs/

Provision failed with the following errors:


2018/05/08 12:38:19.172 DEBUG|    cros_build_lib:0597| RunCommand: scp -P 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=60' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' -i /tmp/ssh-tmpkb8D6E/testing_rsa -r /tmp/cros-update_100.115.201.127_18954/src root@100.115.201.127:/mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.KkbfAJBuZ9/
Warning: Permanently added '100.115.201.127' (ED25519) to the list of known hosts.
Warning: Permanently added '100.115.201.127' (ED25519) to the list of known hosts.
/tmp/cros-update_100.115.201.127_18954/src/gs_cache/chromite: No such file or directory
2018/05/08 12:38:21.830 DEBUG|        retry_util:0204| <class 'chromite.lib.cros_build_lib.RunCommandError'>(return code: 1; command: scp -P 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=60' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' -i /tmp/ssh-tmpkb8D6E/testing_rsa -r /tmp/cros-update_100.115.201.127_18954/src root@100.115.201.127:/mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.KkbfAJBuZ9/
Could not copy /tmp/cros-update_100.115.201.127_18954/src to device.)
2018/05/08 12:38:21.831 DEBUG|    cros_build_lib:0597| RunCommand: ssh -p 22 '-oConnectionAttempts=4' '-oUserKnownHostsFile=/dev/null' '-oProtocol=2' '-oConnectTimeout=30' '-oServerAliveCountMax=3' '-oStrictHostKeyChecking=no' '-oServerAliveInterval=10' '-oNumberOfPasswordPrompts=0' '-oIdentitiesOnly=yes' -i /tmp/ssh-tmpkb8D6E/testing_rsa root@100.115.201.127 -- rm -rf /mnt/stateful_partition/unencrypted/preserve/cros-update/tmp.KkbfAJBuZ9
2018/05/08 12:38:22.072 DEBUG|       cros_update:0373| Error happens in CrOS auto-update: RunCommandError('Could not copy /tmp/cros-update_100.115.201.127_18954/src to device.', <chromite.lib.cros_build_lib.CommandResult object at 0x7fd71f9ace90>, None)
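
The scp error above ("gs_cache/chromite: No such file or directory") is consistent with a symlink in the staged directory whose target does not exist. As a rough diagnostic sketch only (not part of chromite or autotest; the function name is made up for illustration), something like the following could list such dangling symlinks under a staging directory before the copy is attempted:

```python
import os


def find_dangling_symlinks(root):
    """Return a sorted list of symlinks under root whose targets do not exist.

    Hypothetical helper for illustration; it is not an existing chromite API.
    """
    dangling = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A symlink can show up in either list depending on what it points
        # to, so check both directory and file entries.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows the link
            # and returns False when the target is missing.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)
```

Run against a directory such as /tmp/cros-update_<host>_<pid>/src, this would be expected to report gs_cache/chromite if that entry is indeed a broken link. That is an assumption drawn from the log, not something verified here.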




Comment 3 by nxia@chromium.org, May 8 2018

Summary: High provision failure rate in the lab (was: Bad duts in the lab.)

Comment 4 by nxia@chromium.org, May 8 2018

Description: Show this description

Comment 5 by nxia@chromium.org, May 8 2018

Owner: gu...@chromium.org
Yes, I see Python is missing:

localhost ~ # python
-bash: python: command not found

But the test image seems sane:

localhost ~ # cat /etc/lsb-release 
CHROMEOS_RELEASE_APPID={A772AA64-E906-A01E-1DFD-1856870D39EC}
CHROMEOS_BOARD_APPID={A772AA64-E906-A01E-1DFD-1856870D39EC}
CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
DEVICETYPE=CHROMEBOOK
CHROMEOS_ARC_VERSION=4764679
CHROMEOS_ARC_ANDROID_SDK_VERSION=25
GOOGLE_RELEASE=10656.0.0
CHROMEOS_DEVSERVER=
CHROMEOS_RELEASE_BUILDER_PATH=quawks-release/R68-10656.0.0
CHROMEOS_RELEASE_BUILD_NUMBER=10656
CHROMEOS_RELEASE_BRANCH_NUMBER=0
CHROMEOS_RELEASE_CHROME_MILESTONE=68
CHROMEOS_RELEASE_PATCH_NUMBER=0
CHROMEOS_RELEASE_TRACK=testimage-channel
CHROMEOS_RELEASE_DESCRIPTION=10656.0.0 (Official Build) dev-channel quawks test
CHROMEOS_RELEASE_BUILD_TYPE=Official Build
CHROMEOS_RELEASE_NAME=Chrome OS
CHROMEOS_RELEASE_BOARD=quawks
CHROMEOS_RELEASE_VERSION=10656.0.0
CHROMEOS_AUSERVER=https://tools.google.com/service/update2

Comment 7 by bugdroid1@chromium.org, May 8 2018

Comment 8 by nxia@chromium.org, May 8 2018

Cc: ayatane@chromium.org
Request an urgent push
Status: Fixed (was: Untriaged)

Comment 10 by bugdroid1@chromium.org, May 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/f3bbed4c0e9c0433fdbbfb78b37bc9ca118c3757

commit f3bbed4c0e9c0433fdbbfb78b37bc9ca118c3757
Author: Congbin Guo <guocb@google.com>
Date: Thu May 10 22:09:51 2018

auto_updater: filter more directories when copy devserver to DUT

In some cases of provision, we copy devserver package to DUT. We don't
want to copy some directories, e.g. venv and gs_cache, because both of
them has a symlink to a checking out of chromite.

PS. Also fixed some minor format/style issues.

BUG= chromium:840975 ,chromium:824580
TEST=Ran unit tests.

Change-Id: I9832c7240d990e744065648bf77e27dd6b50c09e
Reviewed-on: https://chromium-review.googlesource.com/1050876
Commit-Ready: Congbin Guo <guocb@chromium.org>
Tested-by: Congbin Guo <guocb@chromium.org>
Reviewed-by: Congbin Guo <guocb@chromium.org>

[modify] https://crrev.com/f3bbed4c0e9c0433fdbbfb78b37bc9ca118c3757/lib/auto_updater_unittest.py
[modify] https://crrev.com/f3bbed4c0e9c0433fdbbfb78b37bc9ca118c3757/lib/auto_updater.py
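
For readers unfamiliar with the approach described in the commit message, here is a minimal sketch of copying a package directory while skipping named subdirectories. This is not the code from auto_updater.py. The function name and the skip list are assumptions for illustration, with the directory names taken from the commit message (venv and gs_cache).

```python
import shutil

# Assumed skip list, based on the directories named in the commit message.
SKIPPED_DIRS = ("venv", "gs_cache")


def copy_tree_filtered(src, dst, skipped=SKIPPED_DIRS):
    """Copy src to dst, leaving out entries whose names match skipped.

    Hypothetical helper for illustration only. dst must not already exist.
    """
    shutil.copytree(
        src,
        dst,
        # Recreate symlinks as symlinks instead of following them.
        symlinks=True,
        # ignore_patterns matches entry names at every depth of the tree,
        # so a nested "venv" or "gs_cache" is skipped as well.
        ignore=shutil.ignore_patterns(*skipped),
    )
```

Filtering at staging time means a later recursive scp never sees the symlinks inside the skipped directories. Note that ignore_patterns matches files as well as directories, so a file named venv would also be left out.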


Comment 11 by bugdroid1@chromium.org, May 11 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b6297b2be6ab0dde891052b935537b2b29ab499a

commit b6297b2be6ab0dde891052b935537b2b29ab499a
Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Fri May 11 00:15:20 2018

Roll src/third_party/chromite/ 0aae1c1fc..3c311c157 (4 commits)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/0aae1c1fc30a..3c311c15709b

$ git log 0aae1c1fc..3c311c157 --date=short --no-merges --format='%ad %ae %s'
2018-05-09 manojgupta cbuildbot: Disable unit tests in fuzzer builds.
2018-04-20 bmgordon build_stages: Use snapshot to clean up chroot.
2018-05-04 yunlian generic_stages: temporarily ignore all target prebuilts.
2018-05-08 guocb auto_updater: filter more directories when copy devserver to DUT

Created with:
  roll-dep src/third_party/chromite
BUG= chromium:841512 , chromium:829665 ,chromium:730144, chromium:826418 , chromium:840975 ,chromium:824580


The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=chrome-os-gardeners@chromium.org

Change-Id: I0abebd960beb44b1560aca995781fed1b91f453f
Reviewed-on: https://chromium-review.googlesource.com/1054619
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#557735}
[modify] https://crrev.com/b6297b2be6ab0dde891052b935537b2b29ab499a/DEPS
