autoupdate_Rollback: command execution error: update engine in UPDATE_STATUS_IDLE |
||||||||||||||||||||||
Issue descriptionThis HWTest failure hit 2 release builders here: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8929800112675151472 See elm and caroline-arcnext. rikku also hit an issue with autoupdate_Rollback, with a slightly different error message: autoupdate_Rollback: FAIL: Failed to install device image using payload at http://100.115.219.139:8082/update/rikku-release/R72-11264.0.0 on chromeos4-row3-rack5-host14. : command execution error
,
Nov 15
,
Nov 15
,
Nov 30
The problem is spreading... "command execution error" and autoupdate_Rollback failure happened to many of the release builders in the last run: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928441158195883984
,
Nov 30
,
Nov 30
Looks like a devserver issue, could be related to issue 901939.
,
Nov 30
Assigning to the current deputy who may have more context.
,
Nov 30
Failures involve more than 1 devserver, e.g. chromeos4-devserver7, chromeos4-devserver6, chromeos4-devserver11.
,
Nov 30
,
Dec 1
#1 unrelated to devservers
START autoupdate_Rollback autoupdate_Rollback timestamp=1542293204 localtime=Nov 15 06:46:44
START ---- ---- timestamp=1542293224 localtime=Nov 15 06:47:04
GOOD ---- sysinfo.before timestamp=1542293224 localtime=Nov 15 06:47:04
END GOOD ---- ---- timestamp=1542293224 localtime=Nov 15 06:47:04
START ---- ---- timestamp=1542293233 localtime=Nov 15 06:47:13
GOOD ---- sysinfo.iteration.before timestamp=1542293234 localtime=Nov 15 06:47:14
END GOOD ---- ---- timestamp=1542293234 localtime=Nov 15 06:47:14
START ---- ---- timestamp=1542293424 localtime=Nov 15 06:50:24
GOOD ---- sysinfo.iteration.after timestamp=1542293424 localtime=Nov 15 06:50:24
END GOOD ---- ---- timestamp=1542293424 localtime=Nov 15 06:50:24
START ---- ---- timestamp=1542293433 localtime=Nov 15 06:50:33
GOOD ---- sysinfo.after timestamp=1542293434 localtime=Nov 15 06:50:34
END GOOD ---- ---- timestamp=1542293434 localtime=Nov 15 06:50:34
FAIL autoupdate_Rollback autoupdate_Rollback timestamp=1542293440 localtime=Nov 15 06:50:40 Failed to install device image using payload at http://100.115.219.134:8082/update/wolf-release/R72-11264.0.0 on chromeos4-row1-rack3-host9. : command execution error
* Command:
/usr/bin/ssh -a -x -o Protocol=2 -o StrictHostKeyChecking=no -o
UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o
ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4
-l root -p 22 chromeos4-row1-rack3-host9 "export LIBC_FATAL_STDERR_=1; if
type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\"
\"server[stack::_run|_base_update_handler_no_retry|run] ->
ssh_run(/usr/bin/update_engine_client --update
--omaha_url=http://100.115.219.134:8082/update/wolf-
release/R72-11264.0.0)\";fi; /usr/bin/update_engine_client --update
--omaha_url=http://100.115.219.134:8082/update/wolf-release/R72-11264.0.0"
Exit status: 1
Duration: 179.279510975
stderr:
[1115/064719:INFO:update_engine_client.cc(511)] Forcing an update by setting app_version to ForcedUpdate.
[1115/064719:INFO:update_engine_client.cc(513)] Initiating update check and install.
[1115/064719:INFO:update_engine_client.cc(542)] Waiting for update to complete.
[1115/065018:ERROR:update_engine_client.cc(233)] Update failed, current operation is UPDATE_STATUS_IDLE, last error code is ErrorCode::kError(1)
END FAIL autoupdate_Rollback autoupdate_Rollback timestamp=1542293440 localtime=Nov 15 06:50:40
,
Dec 1
Checking a few from #4 appears to be the same issue. I believe the problem is strictly within update engine.
,
Dec 3
The OP error is different from all the others, so making this bug about the rest instead. Looks like a lot of (more than half?) canary builders are red right now due to this. Adding more support to #10 above: Initial update succeeds (expected), then the test initiates a rollback (expected) but update_engine is found to be in an unexpected state. Sheriffs need to find an owner for update_engine to take a look at this, and revert any recent CLs that may be the root cause. https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928167903522923392 12/03 06:02:47.504 INFO |autoupdate_Rollbac:0065| Update verified, initiating rollback. ... 12/03 06:02:47.845 INFO | autoupdater:0688| Checking for rollback. 12/03 06:02:47.854 DEBUG| ssh_host:0310| Running (ssh) '/usr/bin/update_engine_client --can_rollback' from '_call_run_once|run_once|rollback_rootfs|_run|run|run_very_slowly' 12/03 06:02:48.189 ERROR| utils:0287| [stderr] [1203/060248.001784:INFO:update_engine_client.cc(414)] Rollback partition: AVAILABLE: /dev/mmcblk0p4 12/03 06:02:48.192 INFO | autoupdater:0699| Performing rollback. 12/03 06:02:48.201 DEBUG| ssh_host:0310| Running (ssh) '/usr/bin/update_engine_client --rollback --follow --nopowerwash' from '_call_run_once|run_once|rollback_rootfs|_run|run|run_very_slowly' 12/03 06:02:48.564 ERROR| utils:0287| [stderr] [1203/060248.376622:INFO:update_engine_client.cc(476)] Requesting rollback. 12/03 06:02:48.575 ERROR| utils:0287| [stderr] [1203/060248.388149:INFO:update_engine_client.cc(544)] Waiting for update to complete. 12/03 06:02:48.591 ERROR| utils:0287| [stderr] [1203/060248.403930:ERROR:update_engine_client.cc(233)] Update failed, current operation is UPDATE_STATUS_IDLE, last error code is ErrorCode::kSuccess(0) 12/03 06:02:48.595 DEBUG| test:0381| Test failed due to Rollback failed on chromeos4-row2-rack12-host19: command execution error * Command: /usr/bin/ssh -a -x -o Protocol=2 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -l root -p 22 chromeos4-row2-rack12-host19 "export LIBC_FATAL_STDERR_=1; if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\" \"server[stack::rollback_rootfs|_run|run] -> ssh_run(/usr/bin/update_engine_client --rollback --follow --nopowerwash)\";fi; /usr/bin/update_engine_client --rollback --follow --nopowerwash" Exit status: 1 Same errors here: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928167903522923392 https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928167854644322416 https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928167867389380416 I think this is P0 due to the widespread nature of the issue and also because breaking rollback in the field would be bad.
,
Dec 3
,
Dec 3
,
Dec 3
In https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928167903522923392 I can see [1203/060248.402211:INFO:update_attempter.cc(1053)] DLC successfully installed, no reboot needed. So either is_install_ is somehow being set or there is a problem with this CL: I think this is defnitely related to https://chromium-review.googlesource.com/c/aosp/platform/system/update_engine/+/1344916 Adding xiaochu@ and chowes@ to investigate this.
,
Dec 3
I think the problem is is_install_ has never been initialized in the update_attempter ctor or when the rollback is initiated. we should do this. Xiaochu@ can you send a fix for this?
,
Dec 3
,
Dec 3
I'm temporarily reverting https://chromium-review.googlesource.com/c/aosp/platform/system/update_engine/+/1344916 if no objection.
,
Dec 3
,
Dec 3
,
Dec 3
,
Dec 3
chowes@ will attempt a permanent fix while we're landing the revert. Both the revert (https://chromium-review.googlesource.com/c/aosp/platform/system/update_engine/+/1359108) and the permanent fix will be merged to M72.
,
Dec 4
Checked on eve canary and it looks like test is GREEN now: https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=eve-release&buildBranch=master I'm downgrading the priority but still need merge approval to cherry-pick to M72.
,
Dec 4
Your change meets the bar and is auto-approved for M72. Please go ahead and merge the CL to branch 3626 manually. Please contact milestone owner if you have questions. Owners: govind@(Android), kariahda@(iOS), djmm@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 4
Revert went into M72 as well:https://chromium-review.googlesource.com/c/aosp/platform/system/update_engine/+/1362060
,
Dec 4
The following revision refers to this bug: https://chromium.googlesource.com/aosp/platform/system/update_engine/+/978c1081373dba279c2c9d5b308e093c3e13707a commit 978c1081373dba279c2c9d5b308e093c3e13707a Author: Colin Howes <chowes@google.com> Date: Tue Dec 04 22:00:05 2018 update_attempter: Set is_install_ to false when doing a rollback. We weren't setting is_install_ when performing a rollback, meaning it was potentially left uninitialized or had a stale value from a previous install/update attempt. Explicitly set is_install_ in Rollback() and in the UpdateAttempter constructor. BUG= chromium:905704 TEST=Add a unittest. Change-Id: I19b646d689cab869a2aca0cd82def252c370184b Reviewed-on: https://chromium-review.googlesource.com/1359109 Commit-Ready: Colin Howes <chowes@google.com> Tested-by: Colin Howes <chowes@google.com> Reviewed-by: Amin Hassani <ahassani@chromium.org> [modify] https://crrev.com/978c1081373dba279c2c9d5b308e093c3e13707a/update_attempter_unittest.cc [modify] https://crrev.com/978c1081373dba279c2c9d5b308e093c3e13707a/update_attempter.h [modify] https://crrev.com/978c1081373dba279c2c9d5b308e093c3e13707a/update_attempter.cc
,
Dec 4
The following revision refers to this bug: https://chromium.googlesource.com/aosp/platform/system/update_engine/+/5687e39855627092f5e9033e16bcb0b7a91309e9 commit 5687e39855627092f5e9033e16bcb0b7a91309e9 Author: Colin Howes <chowes@google.com> Date: Tue Dec 04 22:04:20 2018 update_attempter: Set is_install_ to false when doing a rollback. We weren't setting is_install_ when performing a rollback, meaning it was potentially left uninitialized or had a stale value from a previous install/update attempt. Explicitly set is_install_ in Rollback() and in the UpdateAttempter constructor. BUG= chromium:905704 TEST=Add a unittest. Change-Id: I19b646d689cab869a2aca0cd82def252c370184b Reviewed-on: https://chromium-review.googlesource.com/1359109 Commit-Ready: Colin Howes <chowes@google.com> Tested-by: Colin Howes <chowes@google.com> Reviewed-by: Amin Hassani <ahassani@chromium.org> (cherry picked from commit 978c1081373dba279c2c9d5b308e093c3e13707a) [modify] https://crrev.com/5687e39855627092f5e9033e16bcb0b7a91309e9/update_attempter_unittest.cc [modify] https://crrev.com/5687e39855627092f5e9033e16bcb0b7a91309e9/update_attempter.h [modify] https://crrev.com/5687e39855627092f5e9033e16bcb0b7a91309e9/update_attempter.cc
,
Dec 4
fix is also cherry-picked to M72. https://chromium-review.googlesource.com/c/aosp/platform/system/update_engine/+/1362061 thanks all for triaging and fixing this!
,
Dec 5
Issue 910952 has been merged into this issue.
,
Dec 5
Issue 911957 has been merged into this issue.
,
Dec 5
Issue 911279 has been merged into this issue.
,
Dec 5
Issue 911816 has been merged into this issue.
,
Dec 10
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 10
Issue 911222 has been merged into this issue.
,
Dec 10
,
Jan 7
Issue 911803 has been merged into this issue. |
||||||||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||||||||
Comment 1 by zwisler@google.com
, Nov 15