New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 905704 link

Starred by 11 users

autoupdate_Rollback: command execution error: update engine in UPDATE_STATUS_IDLE

Project Member Reported by zwisler@google.com, Nov 15

Issue description

This HWTest failure hit 2 release builders here:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8929800112675151472

See elm and caroline-arcnext.

rikku also hit an issue with autoupdate_Rollback, with a slightly different error message:

autoupdate_Rollback: FAIL: Failed to install device image using payload at http://100.115.219.139:8082/update/rikku-release/R72-11264.0.0 on chromeos4-row3-rack5-host14. : command execution error
 
wolf-release also hit a similar issue:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8929798893425198384

RootFSUpdateError: Failed to install device image using payload at http://100.115.219.134:8082/update/wolf-release/R72-11264.0.0 on chromeos4-row1-rack3-host9. : command execution error


Looks the same as rikku.
Owner: zamorzaev@chromium.org
Components: Infra>Client>ChromeOS>Test
Status: Assigned (was: Untriaged)
The problem is spreading... "command execution error" and autoupdate_Rollback failure happened to many of the release builders in the last run:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928441158195883984

Labels: Hotlist-Deputy
Looks like a devserver issue, could be related to issue 901939.
Cc: zamorzaev@chromium.org
Owner: ayatane@chromium.org
Status: Untriaged (was: Assigned)
Assigning to the current deputy who may have more context.
Failures involve more than 1 devserver, e.g. chromeos4-devserver7, chromeos4-devserver6, chromeos4-devserver11.
Status: Assigned (was: Untriaged)
#1 unrelated to devservers

START	autoupdate_Rollback	autoupdate_Rollback	timestamp=1542293204	localtime=Nov 15 06:46:44	
	START	----	----	timestamp=1542293224	localtime=Nov 15 06:47:04	
		GOOD	----	sysinfo.before	timestamp=1542293224	localtime=Nov 15 06:47:04	
	END GOOD	----	----	timestamp=1542293224	localtime=Nov 15 06:47:04	
	START	----	----	timestamp=1542293233	localtime=Nov 15 06:47:13	
		GOOD	----	sysinfo.iteration.before	timestamp=1542293234	localtime=Nov 15 06:47:14	
	END GOOD	----	----	timestamp=1542293234	localtime=Nov 15 06:47:14	
	START	----	----	timestamp=1542293424	localtime=Nov 15 06:50:24	
		GOOD	----	sysinfo.iteration.after	timestamp=1542293424	localtime=Nov 15 06:50:24	
	END GOOD	----	----	timestamp=1542293424	localtime=Nov 15 06:50:24	
	START	----	----	timestamp=1542293433	localtime=Nov 15 06:50:33	
		GOOD	----	sysinfo.after	timestamp=1542293434	localtime=Nov 15 06:50:34	
	END GOOD	----	----	timestamp=1542293434	localtime=Nov 15 06:50:34	
	FAIL	autoupdate_Rollback	autoupdate_Rollback	timestamp=1542293440	localtime=Nov 15 06:50:40	Failed to install device image using payload at http://100.115.219.134:8082/update/wolf-release/R72-11264.0.0 on chromeos4-row1-rack3-host9. : command execution error
  * Command: 
      /usr/bin/ssh -a -x   -o Protocol=2 -o StrictHostKeyChecking=no -o
      UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o
      ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4
      -l root -p 22 chromeos4-row1-rack3-host9 "export LIBC_FATAL_STDERR_=1; if
      type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\"
      \"server[stack::_run|_base_update_handler_no_retry|run] ->
      ssh_run(/usr/bin/update_engine_client --update
      --omaha_url=http://100.115.219.134:8082/update/wolf-
      release/R72-11264.0.0)\";fi; /usr/bin/update_engine_client --update
      --omaha_url=http://100.115.219.134:8082/update/wolf-release/R72-11264.0.0"
  Exit status: 1
  Duration: 179.279510975
  
  stderr:
  [1115/064719:INFO:update_engine_client.cc(511)] Forcing an update by setting app_version to ForcedUpdate.
  [1115/064719:INFO:update_engine_client.cc(513)] Initiating update check and install.
  [1115/064719:INFO:update_engine_client.cc(542)] Waiting for update to complete.
  [1115/065018:ERROR:update_engine_client.cc(233)] Update failed, current operation is UPDATE_STATUS_IDLE, last error code is ErrorCode::kError(1)
END FAIL	autoupdate_Rollback	autoupdate_Rollback	timestamp=1542293440	localtime=Nov 15 06:50:40	
Checking a few from #4 appears to be the same issue.

I believe the problem is strictly within update engine.
Cc: -zamorzaev@chromium.org
Components: -Infra>Client>ChromeOS>Test Internals>Installer
Labels: -Pri-1 -Hotlist-Deputy Pri-0
Owner: nsanders@chromium.org
The OP error is different from all the others, so making this bug about the rest instead.

Looks like a lot of (more than half?) canary builders are red right now due to this. Adding more support to #10 above:

Initial update succeeds (expected), then the test initiates a rollback (expected) but update_engine is found to be in an unexpected state.

Sheriffs need to find an owner for update_engine to take a look at this, and revert any recent CLs that may be the root cause.

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928167903522923392

12/03 06:02:47.504 INFO |autoupdate_Rollbac:0065| Update verified, initiating rollback.
...

12/03 06:02:47.845 INFO |       autoupdater:0688| Checking for rollback.
12/03 06:02:47.854 DEBUG|          ssh_host:0310| Running (ssh) '/usr/bin/update_engine_client --can_rollback' from '_call_run_once|run_once|rollback_rootfs|_run|run|run_very_slowly'
12/03 06:02:48.189 ERROR|             utils:0287| [stderr] [1203/060248.001784:INFO:update_engine_client.cc(414)] Rollback partition: AVAILABLE: /dev/mmcblk0p4
12/03 06:02:48.192 INFO |       autoupdater:0699| Performing rollback.
12/03 06:02:48.201 DEBUG|          ssh_host:0310| Running (ssh) '/usr/bin/update_engine_client --rollback --follow --nopowerwash' from '_call_run_once|run_once|rollback_rootfs|_run|run|run_very_slowly'
12/03 06:02:48.564 ERROR|             utils:0287| [stderr] [1203/060248.376622:INFO:update_engine_client.cc(476)] Requesting rollback.
12/03 06:02:48.575 ERROR|             utils:0287| [stderr] [1203/060248.388149:INFO:update_engine_client.cc(544)] Waiting for update to complete.
12/03 06:02:48.591 ERROR|             utils:0287| [stderr] [1203/060248.403930:ERROR:update_engine_client.cc(233)] Update failed, current operation is UPDATE_STATUS_IDLE, last error code is ErrorCode::kSuccess(0)
12/03 06:02:48.595 DEBUG|              test:0381| Test failed due to Rollback failed on chromeos4-row2-rack12-host19: command execution error
* Command: 
    /usr/bin/ssh -a -x   -o Protocol=2 -o StrictHostKeyChecking=no -o
    UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o
    ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4
    -l root -p 22 chromeos4-row2-rack12-host19 "export LIBC_FATAL_STDERR_=1;
    if type \"logger\" > /dev/null 2>&1; then logger -tag \"autotest\"
    \"server[stack::rollback_rootfs|_run|run] ->
    ssh_run(/usr/bin/update_engine_client --rollback --follow
    --nopowerwash)\";fi; /usr/bin/update_engine_client --rollback --follow
    --nopowerwash"
Exit status: 1

Same errors here:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928167903522923392
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928167854644322416
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928167867389380416

I think this is P0 due to the widespread nature of the issue and also because breaking rollback in the field would be bad.
Cc: ahass...@chromium.org zentaro@chromium.org
Summary: autoupdate_Rollback: command execution error: update engine in UPDATE_STATUS_IDLE (was: autoupdate_Rollback: ABORT: Job aborted unexpectedly)
In
https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8928167903522923392

I can see
[1203/060248.402211:INFO:update_attempter.cc(1053)] DLC successfully installed, no reboot needed.

So either is_install_ is somehow being set or there is a problem with this CL:

I think this is defnitely related to https://chromium-review.googlesource.com/c/aosp/platform/system/update_engine/+/1344916

Adding xiaochu@ and chowes@ to investigate this.
I think the problem is is_install_ has never been initialized in the update_attempter ctor or when the rollback is initiated. we should do this. Xiaochu@ can you send a fix for this?
Cc: nsanders@chromium.org
Owner: xiaochu@chromium.org
Cc: chowes@google.com
Labels: OS-Chrome
Labels: Merge-Request-72
Owner: chowes@google.com
chowes@ will attempt a permanent fix while we're landing the revert. Both the revert (https://chromium-review.googlesource.com/c/aosp/platform/system/update_engine/+/1359108) and the permanent fix will be merged to M72.
Labels: -Pri-0 Pri-1
Checked on eve canary and it looks like test is GREEN now: https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=eve-release&buildBranch=master

I'm downgrading the priority but still need merge approval to cherry-pick to M72.
Project Member

Comment 24 by sheriffbot@chromium.org, Dec 4

Labels: -Merge-Request-72 Hotlist-Merge-Approved Merge-Approved-72
Your change meets the bar and is auto-approved for M72. Please go ahead and merge the CL to branch 3626 manually. Please contact milestone owner if you have questions.
Owners: govind@(Android), kariahda@(iOS), djmm@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Project Member

Comment 26 by bugdroid1@chromium.org, Dec 4

The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/update_engine/+/978c1081373dba279c2c9d5b308e093c3e13707a

commit 978c1081373dba279c2c9d5b308e093c3e13707a
Author: Colin Howes <chowes@google.com>
Date: Tue Dec 04 22:00:05 2018

update_attempter: Set is_install_ to false when doing a rollback.

We weren't setting is_install_ when performing a rollback, meaning it
was potentially left uninitialized or had a stale value from a previous
install/update attempt. Explicitly set is_install_ in Rollback() and in
the UpdateAttempter constructor.

BUG= chromium:905704 
TEST=Add a unittest.

Change-Id: I19b646d689cab869a2aca0cd82def252c370184b
Reviewed-on: https://chromium-review.googlesource.com/1359109
Commit-Ready: Colin Howes <chowes@google.com>
Tested-by: Colin Howes <chowes@google.com>
Reviewed-by: Amin Hassani <ahassani@chromium.org>

[modify] https://crrev.com/978c1081373dba279c2c9d5b308e093c3e13707a/update_attempter_unittest.cc
[modify] https://crrev.com/978c1081373dba279c2c9d5b308e093c3e13707a/update_attempter.h
[modify] https://crrev.com/978c1081373dba279c2c9d5b308e093c3e13707a/update_attempter.cc

Project Member

Comment 27 by bugdroid1@chromium.org, Dec 4

Labels: merge-merged-release-R72-11316.B
The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/update_engine/+/5687e39855627092f5e9033e16bcb0b7a91309e9

commit 5687e39855627092f5e9033e16bcb0b7a91309e9
Author: Colin Howes <chowes@google.com>
Date: Tue Dec 04 22:04:20 2018

update_attempter: Set is_install_ to false when doing a rollback.

We weren't setting is_install_ when performing a rollback, meaning it
was potentially left uninitialized or had a stale value from a previous
install/update attempt. Explicitly set is_install_ in Rollback() and in
the UpdateAttempter constructor.

BUG= chromium:905704 
TEST=Add a unittest.

Change-Id: I19b646d689cab869a2aca0cd82def252c370184b
Reviewed-on: https://chromium-review.googlesource.com/1359109
Commit-Ready: Colin Howes <chowes@google.com>
Tested-by: Colin Howes <chowes@google.com>
Reviewed-by: Amin Hassani <ahassani@chromium.org>
(cherry picked from commit 978c1081373dba279c2c9d5b308e093c3e13707a)

[modify] https://crrev.com/5687e39855627092f5e9033e16bcb0b7a91309e9/update_attempter_unittest.cc
[modify] https://crrev.com/5687e39855627092f5e9033e16bcb0b7a91309e9/update_attempter.h
[modify] https://crrev.com/5687e39855627092f5e9033e16bcb0b7a91309e9/update_attempter.cc

Status: Fixed (was: Assigned)
fix is also cherry-picked to M72.

https://chromium-review.googlesource.com/c/aosp/platform/system/update_engine/+/1362061

thanks all for triaging and fixing this!
Cc: ihf@chromium.org pprabhu@chromium.org jclinton@google.com dgarr...@chromium.org shapiroc@chromium.org jkardatzke@chromium.org nya@chromium.org ayatane@chromium.org derat@chromium.org akes...@chromium.org hidehiko@chromium.org
 Issue 910952  has been merged into this issue.
Issue 911957 has been merged into this issue.
Issue 911279 has been merged into this issue.
Cc: zamorzaev@chromium.org vsu...@google.com posciak@chromium.org
 Issue 911816  has been merged into this issue.
Project Member

Comment 33 by sheriffbot@chromium.org, Dec 10

Cc: xiaochu@chromium.org
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Issue 911222 has been merged into this issue.
Labels: -Hotlist-Merge-Approved -Merge-Approved-72
Cc: xiy...@chromium.org mkarkada@chromium.org dhadd...@chromium.org sdantul...@chromium.org mitsuji@chromium.org yoshiki@chromium.org
 Issue 911803  has been merged into this issue.

Sign in to add a comment