Issue 614444

Starred by 1 user

Issue metadata

Status: Duplicate
Merged: issue 547548
Owner:
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




update is dying after issuing ssh /usr/bin/update_engine_client command

Project Member Reported by kevcheng@chromium.org, May 24 2016

Issue description

from https://bugs.chromium.org/p/chromium/issues/detail?id=596262

There have been multiple failures during autoupdate_Rollback where the ssh command that initiates the update exits with status 255. The failure signature has been consistent across the roughly 4 failures I looked at.


https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/64413468-chromeos-test/chromeos4-row11-rack11-host9/

05/24 02:30:30.943 INFO |       autoupdater:0240| Updating image via: /usr/bin/update_engine_client --update --omaha_url=http://100.107.160.4:8082/update/cyan-release/R51-8172.44.0 2>&1
05/24 02:30:30.943 DEBUG|          ssh_host:0153| Running (ssh) '/usr/bin/update_engine_client --update --omaha_url=http://100.107.160.4:8082/update/cyan-release/R51-8172.44.0 2>&1'
05/24 02:30:32.171 DEBUG|        base_utils:0268| [stdout] [0524/023031:INFO:update_engine_client.cc(447)] Forcing an update by setting app_version to ForcedUpdate.
05/24 02:30:32.172 DEBUG|        base_utils:0268| [stdout] [0524/023031:INFO:update_engine_client.cc(449)] Initiating update check and install.
05/24 02:30:32.173 DEBUG|        base_utils:0268| [stdout] [0524/023031:INFO:update_engine_client.cc(478)] Waiting for update to complete.
05/24 03:07:54.405 ERROR|        base_utils:0268| [stderr] Write failed: Broken pipe
05/24 03:07:54.409 ERROR|       autoupdater:0181| command execution error
* Command: 
    /usr/bin/ssh -a -x     -o StrictHostKeyChecking=no -o
    UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o
    ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4
    -o Protocol=2 -l root -p 22 chromeos4-row11-rack11-host9 "export
    LIBC_FATAL_STDERR_=1; /usr/bin/update_engine_client --update
    --omaha_url=http://100.107.160.4:8082/update/cyan-release/R51-8172.44.0
    2>&1"
Exit status: 255
Duration: 2243.44563603

About 30-45 minutes pass between when the command is executed and the broken pipe. Should we have timed out instead, or is something else broken?
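
As a rough check on the timing question, the sketch below computes the gap between the two log timestamps above and compares it with the ssh keepalive window implied by ServerAliveInterval=900 and ServerAliveCountMax=3. The year 2016 is taken from the report date. Treating interval times count as the point where ssh would give up on an unresponsive server is the documented behaviour of these options, but whether it applies to this particular failure is an assumption.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S.%f"

# Timestamps copied from the autoserv log; year taken from the report date.
started = datetime.strptime("2016/05/24 02:30:30.943", FMT)
broken_pipe = datetime.strptime("2016/05/24 03:07:54.405", FMT)

elapsed_s = (broken_pipe - started).total_seconds()

# ssh options from the failing command line.
server_alive_interval_s = 900
server_alive_count_max = 3
keepalive_window_s = server_alive_interval_s * server_alive_count_max

print("elapsed: %.3f s (%.1f min)" % (elapsed_s, elapsed_s / 60))
print("keepalive window: %d s (%.0f min)" % (keepalive_window_s, keepalive_window_s / 60))
```

The elapsed time agrees with the reported Duration of about 2243 seconds, and it is shorter than the 2700 second keepalive window, so the broken pipe arrived before ssh would have given up on keepalives alone.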
 

Comment 1 by de...@chromium.org, May 24 2016

Owner: ----
It looks like the ssh connection to the DUT dropped, and then you get a 255 error code from the ssh command.

If you want to make the test more reliable on an unreliable network, you can change the command to return immediately and then issue other update_engine_client commands to wait for the update to finish. This code has been quite stable, and the flakiness of moving 100s of MB to update devices is not new.
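
A minimal sketch of that polling approach, not the autotest implementation. It assumes the update was already started with a command that returns immediately, that `update_engine_client --status` prints KEY=VALUE lines including CURRENT_OP, and that CURRENT_OP reads UPDATE_STATUS_UPDATED_NEED_REBOOT once the update has been applied. None of these details appear in this thread. `run` is a hypothetical callable that executes a command on the DUT over a fresh ssh connection and returns its stdout.

```python
import time


def parse_status(output):
    """Parse KEY=VALUE lines (assumed --status output format) into a dict."""
    status = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            status[key.strip()] = value.strip()
    return status


def wait_for_update(run, timeout_s=3600, poll_s=30, sleep=time.sleep):
    """Poll the DUT until the update reports completion or timeout_s elapses.

    run: hypothetical callable that executes a shell command on the DUT and
         returns its stdout as a string, raising OSError if ssh fails.
    Returns True if the update completed, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            output = run("/usr/bin/update_engine_client --status")
        except OSError:
            # A dropped connection costs one poll, not the whole update.
            output = ""
        current_op = parse_status(output).get("CURRENT_OP")
        if current_op == "UPDATE_STATUS_UPDATED_NEED_REBOOT":
            return True
        sleep(poll_s)
    return False
```

Because each poll uses its own short-lived connection, a single dropped ssh session no longer fails the test, and the explicit timeout_s gives a clear failure when the DUT stays unreachable.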
Cc: xixuan@chromium.org
Xixuan's devserver revamp will help with this, but I'm not sure if it will help with ssh commands timing out while the update is being applied.
The DUT itself seems to be down for good once the ssh connection cuts out, though, so could the update have crippled the DUT?

From https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/64413468-chromeos-test/chromeos4-row11-rack11-host9/debug/

in autoserv.DEBUG:
2:30:30 - The update is initiated.
3:07:54 - The ssh command dies.
And the dut is unreachable until the test ends 
3:20:01 - The test ends
Cc: de...@chromium.org
And only after a reset via servo does the dut come back.

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/hosts/chromeos4-row11-rack11-host9/55724641-repair/debug/

Upon the reset, it does boot up to the version it was updated to.

05/24 05:17:58.577 DEBUG|          ssh_host:0180| Running (ssh) 'cat "/etc/lsb-release"'
05/24 05:17:58.821 DEBUG|        base_utils:0269| [stdout] CHROMEOS_RELEASE_APPID={11130F0B-738A-C024-7A78-CF72D93B77AF}
05/24 05:17:58.821 DEBUG|        base_utils:0269| [stdout] CHROMEOS_BOARD_APPID={11130F0B-738A-C024-7A78-CF72D93B77AF}
05/24 05:17:58.821 DEBUG|        base_utils:0269| [stdout] CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
05/24 05:17:58.821 DEBUG|        base_utils:0269| [stdout] DEVICETYPE=CHROMEBOOK
05/24 05:17:58.821 DEBUG|        base_utils:0269| [stdout] CHROMEOS_RELEASE_BOARD=cyan
05/24 05:17:58.821 DEBUG|        base_utils:0269| [stdout] CHROMEOS_DEVSERVER=
05/24 05:17:58.821 DEBUG|        base_utils:0269| [stdout] GOOGLE_RELEASE=8172.44.0
05/24 05:17:58.822 DEBUG|        base_utils:0269| [stdout] CHROMEOS_RELEASE_BUILD_NUMBER=8172
05/24 05:17:58.822 DEBUG|        base_utils:0269| [stdout] CHROMEOS_RELEASE_BRANCH_NUMBER=44
05/24 05:17:58.822 DEBUG|        base_utils:0269| [stdout] CHROMEOS_RELEASE_CHROME_MILESTONE=51
05/24 05:17:58.822 DEBUG|        base_utils:0269| [stdout] CHROMEOS_RELEASE_PATCH_NUMBER=0
05/24 05:17:58.822 DEBUG|        base_utils:0269| [stdout] CHROMEOS_RELEASE_TRACK=testimage-channel
05/24 05:17:58.822 DEBUG|        base_utils:0269| [stdout] CHROMEOS_RELEASE_DESCRIPTION=8172.44.0 (Official Build) dev-channel cyan test
05/24 05:17:58.822 DEBUG|        base_utils:0269| [stdout] CHROMEOS_RELEASE_BUILD_TYPE=Official Build
05/24 05:17:58.822 DEBUG|        base_utils:0269| [stdout] CHROMEOS_RELEASE_NAME=Chrome OS
05/24 05:17:58.822 DEBUG|        base_utils:0269| [stdout] CHROMEOS_RELEASE_VERSION=8172.44.0
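
The version comparison above can be sketched as a small check. The helper names are hypothetical, and the code assumes the omaha_url ends in "R<milestone>-<version>" as it does in this log. It takes the raw contents of /etc/lsb-release, not the prefixed autoserv log lines.

```python
def parse_lsb_release(text):
    """Parse KEY=VALUE lines from /etc/lsb-release into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key] = value
    return fields


def booted_target_version(lsb_text, omaha_url):
    """Return True if the DUT's release version matches the update URL.

    Assumes the last URL path component looks like "R51-8172.44.0".
    """
    target = omaha_url.rsplit("/", 1)[-1].split("-", 1)[-1]
    return parse_lsb_release(lsb_text).get("CHROMEOS_RELEASE_VERSION") == target


lsb_text = (
    "CHROMEOS_RELEASE_BOARD=cyan\n"
    "CHROMEOS_RELEASE_DESCRIPTION=8172.44.0 (Official Build) dev-channel cyan test\n"
    "CHROMEOS_RELEASE_VERSION=8172.44.0\n"
)
omaha_url = "http://100.107.160.4:8082/update/cyan-release/R51-8172.44.0"
print(booted_target_version(lsb_text, omaha_url))
```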
Cc: -xixuan@chromium.org
Owner: xixuan@chromium.org
This looks similar to crbug.com/547548: the DUT lost its ssh connection during the update and needed servo to reboot it.
Should they be duped together?
Mergedinto: 547548
Status: Duplicate (was: Untriaged)
