Provision failures aren't reported in autotest
Issue description

In crbug.com/731253, bad CLs caused DUTs to roll back during provision, but this was surfaced to the sheriffs as an infrastructure error with an unhelpful error message.

https://luci-milo.appspot.com/buildbot/chromeos/cave-paladin/529 shows:

  [Test-Logs]: Suite job: ABORT
  [Test-Logs]: provision: FAIL: The host has wrong cros-version label., completed successfully
  [Test-Logs]: provision: FAIL: Unhandled AutoservError: No answer to ping from chromeos2-row8-rack7-host11
  [Test-Logs]: provision: FAIL: Unhandled AutoservError: No answer to ping from chromeos2-row8-rack7-host18

Richard had opinions about how the errors should be handled, and I'm filing this to capture those opinions.
Jun 8 2017
Let's describe this in starker terms: The devserver's provision code plainly failed, but the failure wasn't propagated up the call chain. Also, although I care, I'm not unambiguously the owner for this.
Jun 8 2017
Actually, this looks a lot like the error complained of in bug 717336.
Jun 8 2017
Jun 12 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/f19118f03d242e0bd7dc04bee0bd93c90273e197

commit f19118f03d242e0bd7dc04bee0bd93c90273e197
Author: xixuan <xixuan@chromium.org>
Date: Mon Jun 12 19:30:36 2017

autotest: Limit the scope of retryable error keyword for auto-update.

BUG= chromium:731274
TEST=None

Change-Id: I34d2164d8dd5c9a04fc74fa56b779a22858f81ea
Reviewed-on: https://chromium-review.googlesource.com/528391
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Dan Shi <dshi@google.com>

[modify] https://crrev.com/f19118f03d242e0bd7dc04bee0bd93c90273e197/client/common_lib/cros/dev_server.py
Jun 13 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/92f6f535eac719bd98bc5442fdd8d3b2883516bd

commit 92f6f535eac719bd98bc5442fdd8d3b2883516bd
Author: xixuan <xixuan@chromium.org>
Date: Tue Jun 13 22:12:39 2017

autotest: Decrease provision times from 3 to 2.

We detect whether to force provision with original version in CL:497026, so no need to try provision one more time for this case, which was the reason to increase provision times from 2 to 3 months ago.

BUG= chromium:731274
TEST=Ran unittest.

Change-Id: Id7779a6d29af4efd7d56ab43527608294dc0822d
Reviewed-on: https://chromium-review.googlesource.com/528328
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Allen Li <ayatane@chromium.org>

[modify] https://crrev.com/92f6f535eac719bd98bc5442fdd8d3b2883516bd/client/common_lib/cros/dev_server.py
Jun 23 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/9f45534bd6403e362f5889cc9583b52b741b7b4d

commit 9f45534bd6403e362f5889cc9583b52b741b7b4d
Author: xixuan <xixuan@chromium.org>
Date: Fri Jun 23 17:19:30 2017

autotest: don't allow retry in retrying provision to raise error correctly.

This CL raise a retryableException in auto_update() for callers to decide whether they want to retry provision.

BUG= chromium:731274
TEST=Ran unittest.

Change-Id: I0a97aee70c97718708e7b8a103ac3e4e364d31a3
Reviewed-on: https://chromium-review.googlesource.com/528430
Commit-Ready: Xixuan Wu <xixuan@chromium.org>
Tested-by: Xixuan Wu <xixuan@chromium.org>
Reviewed-by: Don Garrett <dgarrett@chromium.org>

[modify] https://crrev.com/9f45534bd6403e362f5889cc9583b52b741b7b4d/server/hosts/cros_host.py
[modify] https://crrev.com/9f45534bd6403e362f5889cc9583b52b741b7b4d/client/common_lib/cros/dev_server.py
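
For readers following the thread, the pattern that commit message describes (a single update attempt raises a retryable exception, and the caller decides whether to retry) might look roughly like the sketch below. This is not the code from the CL: RetryableProvisionError, ProvisionError, and provision_with_retries are hypothetical names invented for illustration, and the default of 2 attempts only echoes the "from 3 to 2" change mentioned earlier in the thread.

```python
class RetryableProvisionError(Exception):
    """Hypothetical: an update failure the caller may choose to retry."""


class ProvisionError(Exception):
    """Hypothetical: a terminal provision failure reported to the caller."""


def provision_with_retries(update_once, max_attempts=2):
    """Run update_once() until it succeeds or the attempts are used up.

    update_once is assumed to make exactly one update attempt and to raise
    RetryableProvisionError for failures worth retrying. Any other
    exception propagates to the caller immediately.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return update_once()
        except RetryableProvisionError as e:
            # Remember the failure so it is not lost if every attempt fails.
            last_error = e
    # Never fall through silently: the final failure must reach the caller.
    raise ProvisionError(
        'provision failed after %d attempts: %s' % (max_attempts, last_error))
```

The point of the design is that the retry policy lives in one place (the caller), and that exhausting the retries raises instead of returning normally, so the last failure cannot be reported as a pass.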
Jun 26 2017
Comment 1 by jrbarnette@chromium.org, Jun 8 2017

The issue is that provision reported _passing_:

  START provision_AutoUpdate provision_AutoUpdate timestamp=1496948858 localtime=Jun 08 12:07:38
  START ---- ---- timestamp=1496948881 localtime=Jun 08 12:08:01
  GOOD ---- sysinfo.before timestamp=1496948881 localtime=Jun 08 12:08:01
  END GOOD ---- ---- timestamp=1496948881 localtime=Jun 08 12:08:01
  GOOD provision_AutoUpdate provision_AutoUpdate timestamp=1496951020 localtime=Jun 08 12:43:40 completed successfully
  END GOOD provision_AutoUpdate provision_AutoUpdate timestamp=1496951020 localtime=Jun 08 12:43:40

If you look in the logs at "autoupdate_logs/CrOS_update_*", you find this message in every one of them:

  2017/06/08 12:43:04.487 DEBUG| cros_update:0260| Error happens in CrOS auto-update: RootfsUpdateError('Build cave-paladin/R61-9631.0.0-rc1 failed to boot on 100.115.130.248; system rolled back to previous build',)

That's an unambiguous failure; it should have caused the final provision status to be a failure, and the rollback error should have featured front and center. But the status.log shows provision was GOOD. We were only protected from unmitigated evil by verify(), which (correctly) identified that the DUT wasn't running the right version.
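
As an illustration of the mismatch described above, here is a rough sketch of a post-hoc cross-check that scans update-log text for the "Error happens in CrOS auto-update" message and overrides a GOOD status when one is found. It is not how autotest works and not a substitute for propagating the exception: find_update_errors and effective_provision_status are hypothetical helpers, and the regular expression assumes the log line format quoted in this comment.

```python
import re

# Assumes the message format quoted above from autoupdate_logs/CrOS_update_*.
_UPDATE_ERROR_RE = re.compile(
    r'Error happens in CrOS auto-update: (?P<error>.+)$')


def find_update_errors(log_text):
    """Return the error text from every auto-update error line in log_text."""
    errors = []
    for line in log_text.splitlines():
        match = _UPDATE_ERROR_RE.search(line)
        if match:
            errors.append(match.group('error').strip())
    return errors


def effective_provision_status(reported_status, log_text):
    """Hypothetical check: an update error in the log overrides GOOD.

    Returns a (status, reason) tuple; reason is None when no update error
    was found and the reported status is passed through unchanged.
    """
    errors = find_update_errors(log_text)
    if errors:
        return 'FAIL', errors[0]
    return reported_status, None
```

Fed the log line quoted above, this would yield a FAIL status with the RootfsUpdateError text as the reason, which is the outcome the status.log should have recorded in the first place.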