Issue 731274

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2017
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Provision failures aren't reported in autotest

Project Member Reported by dgarr...@chromium.org, Jun 8 2017

Issue description

In crbug.com/731253, bad CLs caused DUTs to roll back during provision, but this was surfaced to the sheriffs as an infrastructure error with an unhelpful error message.

https://luci-milo.appspot.com/buildbot/chromeos/cave-paladin/529

Shows:

[Test-Logs]: Suite job: ABORT
[Test-Logs]: provision: FAIL: The host has wrong cros-version label., completed successfully
[Test-Logs]: provision: FAIL: Unhandled AutoservError: No answer to ping from chromeos2-row8-rack7-host11
[Test-Logs]: provision: FAIL: Unhandled AutoservError: No answer to ping from chromeos2-row8-rack7-host18

Richard had opinions about how the errors should be handled, and I'm filing this to capture those opinions.
 
The issue is that provision reported _passing_:
	START	provision_AutoUpdate	provision_AutoUpdate	timestamp=1496948858	localtime=Jun 08 12:07:38	
		START	----	----	timestamp=1496948881	localtime=Jun 08 12:08:01	
			GOOD	----	sysinfo.before	timestamp=1496948881	localtime=Jun 08 12:08:01	
		END GOOD	----	----	timestamp=1496948881	localtime=Jun 08 12:08:01	
		GOOD	provision_AutoUpdate	provision_AutoUpdate	timestamp=1496951020	localtime=Jun 08 12:43:40	completed successfully
	END GOOD	provision_AutoUpdate	provision_AutoUpdate	timestamp=1496951020	localtime=Jun 08 12:43:40	

If you look in the logs at "autoupdate_logs/CrOS_update_*", you find this
message in every one of them:
    2017/06/08 12:43:04.487 DEBUG|       cros_update:0260| Error happens in CrOS auto-update: RootfsUpdateError('Build cave-paladin/R61-9631.0.0-rc1 failed to boot on 100.115.130.248; system rolled back to previous build',)

That's an unambiguous failure; it should have caused the final provision
status to be a failure, and the rollback error should have featured front
and center.  But the status.log shows provision was GOOD.  We were only
protected from unmitigated evil by verify(), which (correctly) identified
that the DUT wasn't running the right version.
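
To make the expected behavior concrete, here is a minimal sketch of the check described above: scan the text of a CrOS_update_* log for the error line and derive the provision status from it. This is not autotest code. The names find_update_errors and provision_status are hypothetical, and the regular expression is an assumption modeled only on the one log line quoted in this report.

```python
import re

# Assumption: error lines look like the one quoted in this report, i.e. they
# contain "Error happens in CrOS auto-update: <error repr>".
ERROR_RE = re.compile(r"Error happens in CrOS auto-update: (?P<error>.+)$")


def find_update_errors(log_text):
    """Return the error descriptions recorded in one auto-update log."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            errors.append(match.group("error").strip())
    return errors


def provision_status(log_text):
    """Hypothetical helper: FAIL if any update error was logged, else GOOD."""
    return "FAIL" if find_update_errors(log_text) else "GOOD"
```

With a check like this, the log line quoted above would yield FAIL, and the RootfsUpdateError text would be available to put in the status message instead of "completed successfully".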


Owner: ----
Status: Available (was: Untriaged)
Summary: Provision failures aren't reported in autotest (was: Rollback during provision is surfaced badly.)
Let's describe this in starker terms:  The devserver's provision
code plainly failed, but the failure wasn't propagated up the
call chain.

Also, although I care, I'm not unambiguously the owner for this.

Owner: xixuan@chromium.org
Status: Assigned (was: Available)
Actually, this looks a lot like the error complained of
in bug 717336.
Status: Started (was: Assigned)
Project Member

Comment 5 by bugdroid1@chromium.org, Jun 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/f19118f03d242e0bd7dc04bee0bd93c90273e197

commit f19118f03d242e0bd7dc04bee0bd93c90273e197
Author: xixuan <xixuan@chromium.org>
Date: Mon Jun 12 19:30:36 2017

autotest: Limit the scope of retryable error keyword for auto-update.

BUG= chromium:731274 
TEST=None

Change-Id: I34d2164d8dd5c9a04fc74fa56b779a22858f81ea
Reviewed-on: https://chromium-review.googlesource.com/528391
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>

[modify] https://crrev.com/f19118f03d242e0bd7dc04bee0bd93c90273e197/client/common_lib/cros/dev_server.py

Project Member

Comment 6 by bugdroid1@chromium.org, Jun 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/92f6f535eac719bd98bc5442fdd8d3b2883516bd

commit 92f6f535eac719bd98bc5442fdd8d3b2883516bd
Author: xixuan <xixuan@chromium.org>
Date: Tue Jun 13 22:12:39 2017

autotest: Decrease provision times from 3 to 2.

We detect whether to force provision with original version in CL:497026,
so no need to try provision one more time for this case, which was the
reason to increase provision times from 2 to 3 months ago.

BUG= chromium:731274 
TEST=Ran unittest.

Change-Id: Id7779a6d29af4efd7d56ab43527608294dc0822d
Reviewed-on: https://chromium-review.googlesource.com/528328
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/92f6f535eac719bd98bc5442fdd8d3b2883516bd/client/common_lib/cros/dev_server.py

Project Member

Comment 7 by bugdroid1@chromium.org, Jun 23 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/9f45534bd6403e362f5889cc9583b52b741b7b4d

commit 9f45534bd6403e362f5889cc9583b52b741b7b4d
Author: xixuan <xixuan@chromium.org>
Date: Fri Jun 23 17:19:30 2017

autotest: don't allow retry in retrying provision to raise error correctly.

This CL raise a retryableException in auto_update() for callers to
decide whether they want to retry provision.

BUG= chromium:731274 
TEST=Ran unittest.

Change-Id: I0a97aee70c97718708e7b8a103ac3e4e364d31a3
Reviewed-on: https://chromium-review.googlesource.com/528430
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/9f45534bd6403e362f5889cc9583b52b741b7b4d/server/hosts/cros_host.py
[modify] https://crrev.com/9f45534bd6403e362f5889cc9583b52b741b7b4d/client/common_lib/cros/dev_server.py
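
For readers following along, the pattern the commit message above describes (the update call raises a retryable exception and the caller decides whether to retry) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in dev_server.py or cros_host.py: RetryableProvisionError, ProvisionError, and provision_with_retries are hypothetical names, and the default of two attempts simply mirrors the "from 3 to 2" change mentioned in comment 6.

```python
class RetryableProvisionError(Exception):
    """Hypothetical stand-in for the retryable exception the CL describes."""


class ProvisionError(Exception):
    """Hypothetical final failure that is reported to the caller."""


def provision_with_retries(auto_update, max_attempts=2):
    """Call auto_update() until it succeeds or the attempts are used up.

    Only RetryableProvisionError triggers another attempt. Any other
    exception propagates immediately, and exhausting the attempts raises
    ProvisionError, so a failure is never reported as success.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return auto_update()
        except RetryableProvisionError as error:
            last_error = error
    raise ProvisionError(
        "provision failed after %d attempts: %s" % (max_attempts, last_error))
```

The point of the design is that the decision to retry sits with the caller, while the last underlying error (for example a rollback) still reaches the final failure message.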

Comment 8 by xixuan@chromium.org, Jun 26 2017

Status: Fixed (was: Started)
