
Issue 738027 link

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug-Regression




parrot-release has been broken for 3 weeks

Project Member Reported by sjg@google.com, Jun 29 2017

Issue description

The last success was on the 7th.

https://luci-milo.appspot.com/buildbot/chromeos/parrot-release/?limit=100

Seems to die in paygen test with:

Could not attach to process.  If your uid matches the uid of the target


parrot-release:3898 failed

Builders failed on: 
- parrot-release: 
  https://luci-milo.appspot.com/buildbot/chromeos/parrot-release/3898



 

Comment 1 by sjg@google.com, Jun 29 2017

Summary: parrot-release and lump-release have been broken for 3 weeks (was: parrot-release has been broken for 3 weeks)
Adding lumpy since it seems to be in the same boat.
Components: -Infra>Client>ChromeOS OS>Installer
Owner: sjg@chromium.org
Status: Assigned (was: Available)
Parrot AU and Paygen tests are failing with this error message:
    Failed to receive a download finished notification (download_finished)
    within 1200 seconds. This could be a problem with the updater or a
    connectivity issue. For more details, check the update_engine log
    (in sysinfo or on the DUT, also included in the test log).

Passing back to the sheriff to find someone who can debug an AU
test failure.

Summary: parrot-release has been broken for 3 weeks (was: parrot-release and lump-release have been broken for 3 weeks)
Looking at the lumpy failures, they're different, and will need
a different bug.

Comment 4 by sjg@google.com, Jul 6 2017

Blocking: 578270

Comment 5 by sjg@google.com, Jul 6 2017

Owner: ----
Status: Available (was: Assigned)
Blocking: -578270
Cc: sjg@chromium.org jrbarnette@chromium.org
Owner: grundler@chromium.org
Status: Assigned (was: Available)
The current failures appear to be caused by the update_engine test timing out (the limit is currently 1200 seconds) on the parrot-release builder, not on the parrot-paladin builder.

I will investigate whether we can check "/sys/block/sd?/queue/rotational" and, when the media is rotational, increase the timeout to 1800 seconds.

https://uberchromegw.corp.google.com/i/chromeos/builders/parrot-release/builds/3921

[Auto-Bug]: autoupdate_EndToEndTest.paygen_au_dev_delta: retry_count: 1, FAIL: Failed to receive a download finished notification (download_finished) within 1200 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log)., 221 reports
[Test-Logs]: autoupdate_EndToEndTest.paygen_au_dev_delta: retry_count: 1, FAIL: Failed to receive a download finished notification (download_finished) within 1200 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log).



The "provisioning" (full or "forced") update for parrot-release build 3921 took nearly 20 minutes:


07/06 06:46:18.339 ERROR|             utils:0280| [stderr] [0706/064618:INFO:update_engine_client.cc(471)] Forcing an update by setting app_version to ForcedUpdate.
07/06 06:46:18.340 ERROR|             utils:0280| [stderr] [0706/064618:INFO:update_engine_client.cc(473)] Initiating update check and install.
07/06 06:46:18.341 ERROR|             utils:0280| [stderr] [0706/064618:INFO:update_engine_client.cc(502)] Waiting for update to complete.
07/06 07:04:50.881 ERROR|             utils:0280| [stderr] [0706/070450:INFO:update_engine_client.cc(224)] Update succeeded -- reboot needed.


The actual update test then failed with a timeout:
07/06 07:08:40.844 INFO |       autoupdater:0254| Triggering update via: /usr/bin/update_engine_client --check_for_update --omaha_url=http://100.115.185.227:42475/update
...
07/06 07:08:41.074 INFO |autoupdate_EndToEn:0271| Expecting event_result=any version=9693.1.0 event_type=any previous_version=any, within 720 seconds
07/06 07:08:41.196 INFO |autoupdate_EndToEn:0371| Consumed new event: {u'event_result': '1', u'event_type': '54', u'previous_version': '0.0.0.0', u'track': 'stable-channel', u'timestamp': '2017-07-06 07:08:41', u'version': '9693.1.0', u'board': 'parrot'}
07/06 07:08:41.196 INFO |autoupdate_EndToEn:0298| Event received after 0.0 seconds
07/06 07:08:41.196 INFO |autoupdate_EndToEn:0271| Expecting event_result=1:success version=9693.1.0 event_type=13:download_started previous_version=any, within 240 seconds
07/06 07:08:42.444 INFO |autoupdate_EndToEn:0371| Consumed new event: {u'event_result': '1', u'event_type': '13', u'track': 'stable-channel', u'timestamp': '2017-07-06 07:08:42', u'version': '9693.1.0', u'board': 'parrot'}
07/06 07:08:42.445 INFO |autoupdate_EndToEn:0298| Event received after 1.1 seconds
07/06 07:08:42.445 INFO |autoupdate_EndToEn:0271| Expecting event_result=1:success version=9693.1.0 event_type=14:download_finished previous_version=any, within 1200 seconds
07/06 07:28:42.985 ERROR|autoupdate_EndToEn:0307| Timeout expired
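
To put numbers on the two log excerpts above, here is a minimal sketch that computes the elapsed times from the quoted timestamps. The helper name elapsed_seconds is hypothetical, and the year 2017 is taken from the event timestamps in the same log; nothing here is part of autotest itself.

```python
from datetime import datetime

# Autotest log lines carry "MM/DD HH:MM:SS.mmm"; the year is prepended here
# (an assumption based on the event timestamps, which say 2017).
LOG_TIME_FORMAT = "%Y %m/%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str, year: int = 2017) -> float:
    """Return the seconds between two autotest-style log timestamps."""
    t0 = datetime.strptime(f"{year} {start}", LOG_TIME_FORMAT)
    t1 = datetime.strptime(f"{year} {end}", LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


# Forced (provisioning) update: "Waiting for update to complete" until
# "Update succeeded -- reboot needed".
provision = elapsed_seconds("07/06 06:46:18.341", "07/06 07:04:50.881")

# Update test: download_finished expectation logged until "Timeout expired".
download_wait = elapsed_seconds("07/06 07:08:42.445", "07/06 07:28:42.985")

print(f"provisioning update:    {provision:.1f} s")
print(f"download_finished wait: {download_wait:.1f} s")
```

The provisioning update works out to roughly 1112.5 seconds (about 18.5 minutes), which leaves very little headroom under the 1200 second limit that the second wait ran into.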
Cc: ahass...@chromium.org
Status: Started (was: Assigned)
I've uploaded an UNTESTED change that I will need help testing on parrot-release:
    https://chromium-review.googlesource.com/562552

Can someone help test that when it's convenient for them?
Issue 740420 has been merged into this issue.

Comment 11 by oka@chromium.org, Jul 11 2017

Cc: oka@chromium.org
Cc: -cernekee@chromium.org

Comment 13 by sheriffbot@chromium.org, Jul 11 2017

Labels: Hotlist-Google
Owner: ahass...@chromium.org
Update: reassigning to ahassani since he is working on a heuristic to NOT use O_DSYNC when an update is forced - either by user or by "cros flash".

The main problem is parrot has a traditional HDD (aka "spinning rust") and the update time was already pretty slow. My change to use O_DSYNC nearly doubled the time and it's right around 20 minutes (1200 seconds) now.

I've uploaded an autotest change to detect "rotational" media and then double the timeout:
   https://chromium-review.googlesource.com/c/562552/

but since I am not able to test this, it's not going in. And it's going to slightly slow down the "normal" (SSD) case by sending two more SSH commands to the DUT. There is very likely a more efficient way of implementing this but I just wanted to post the code so autotest/infra team knows how to detect "rotational" media.
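
For reference, a minimal sketch of the rotational-media check described above. This is NOT the code in CL 562552: the function names and the sys_block parameter are hypothetical, and the sketch reads sysfs locally, whereas the autotest change would have to run the check on the DUT over SSH.

```python
import glob
import os

# The download_finished timeout seen in the failing test log.
BASE_TIMEOUT_SECONDS = 1200


def has_rotational_disk(sys_block: str = "/sys/block") -> bool:
    """Return True if any sd? device under sys_block reports rotational=1."""
    pattern = os.path.join(sys_block, "sd?", "queue", "rotational")
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                if f.read().strip() == "1":
                    return True
        except OSError:
            # Unreadable entry: skip it rather than fail the whole check.
            continue
    return False


def download_timeout(sys_block: str = "/sys/block") -> int:
    """Double the base timeout when a rotational disk is present."""
    if has_rotational_disk(sys_block):
        return 2 * BASE_TIMEOUT_SECONDS
    return BASE_TIMEOUT_SECONDS
```

The sys_block argument exists only so the check can be exercised against a fake directory tree; on a real device the default path applies.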
I'm testing a solution that I implemented; I will update this bug once it works.
Added this CL (https://chromium-review.googlesource.com/c/567360/) as a solution.

Comment 17 by bugdroid1@chromium.org, Jul 18 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/update_engine/+/7ecda265a87236e83cf820364947a1618872b6be

commit 7ecda265a87236e83cf820364947a1618872b6be
Author: Amin Hassani <ahassani@google.com>
Date: Tue Jul 18 07:32:49 2017

Open partitions with O_DSYNC flag only if the update is periodic.

Currently when updating we always open the target partition with flag O_DSYNC
(CL:562552), but this makes all infrastructure operations like 'cros flash',
provisioning, force update, paygen, etc much slower. This changes the update
engine to only add O_DSYNC flag if an update is triggered by periodic checks
(not interactively forced). This means if the user clicks on 'check for update'
it will be an interactive update and O_DSYNC will not be used. This change keeps
the AOSP partitions open without O_DSYNC flag. This CL uses non-interactive mode
for all unit tests but currently there are no integration test like provisioning
for triggering periodic updates.

Currently 'parrot' board canaries (only board with rotating HDD) is failing due
to timeouts related to slow updates. This CL potentially will clear that problem.

TEST=cros_workon_make --test, installed an image with/out the O_DSYNC flag and
measured the 'cros flash' time.
BUG= chromium:738027 

Change-Id: If45fcf5e798b9c9353e09021ad812c859d983a65
Reviewed-on: https://chromium-review.googlesource.com/567360
Commit-Ready: Amin Hassani <ahassani@chromium.org>
Tested-by: Amin Hassani <ahassani@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>

[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/delta_performer.h
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/delta_performer_integration_test.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/download_action.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/download_action.h
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/update_attempter_unittest.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/download_action_unittest.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/update_attempter_android.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_generator/generate_delta_main.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/delta_performer_unittest.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/delta_performer.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/update_attempter.cc
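
The policy in the commit message above can be illustrated with a short sketch. This is not the update_engine code (which is C++): the function name target_open_flags and the use of O_RDWR as the base flag are assumptions made for illustration only.

```python
import os


def target_open_flags(interactive: bool) -> int:
    """Pick open(2) flags for the target partition.

    Periodic (non-interactive) updates add O_DSYNC so each write reaches
    the device before returning. Interactive or forced updates skip it,
    trading that guarantee for speed.
    """
    flags = os.O_RDWR  # assumed base flag, not taken from update_engine
    if not interactive:
        flags |= os.O_DSYNC
    return flags
```

A caller would then open the target with something like os.open(path, target_open_flags(interactive=False)) for a periodic update, and pass interactive=True for 'cros flash', provisioning, or a user-initiated check.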

Labels: Merge-Request-60

Comment 19 by sheriffbot@chromium.org, Jul 25 2017

Labels: -Merge-Request-60 Hotlist-Merge-Review Merge-Review-60
This bug requires manual review: Request affecting a post-stable build
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), josafat@(ChromeOS), bustamante@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Please add appropriate OSs.
Labels: OS-Chrome
Labels: -Merge-Review-60 M-60 Merge-Approved-60

Comment 23 by bugdroid1@chromium.org, Jul 27 2017

Labels: merge-merged-release-R60-9592.B
The following revision refers to this bug:
  https://chromium.googlesource.com/aosp/platform/system/update_engine/+/760fd21451842b226aad11f67c28f16b17c4e0d8

commit 760fd21451842b226aad11f67c28f16b17c4e0d8
Author: Amin Hassani <ahassani@google.com>
Date: Tue Jul 25 17:58:27 2017

Open partitions with O_DSYNC flag only if the update is periodic.

Currently when updating we always open the target partition with flag O_DSYNC
(CL:562552), but this makes all infrastructure operations like 'cros flash',
provisioning, force update, paygen, etc much slower. This changes the update
engine to only add O_DSYNC flag if an update is triggered by periodic checks
(not interactively forced). This means if the user clicks on 'check for update'
it will be an interactive update and O_DSYNC will not be used. This change keeps
the AOSP partitions open without O_DSYNC flag. This CL uses non-interactive mode
for all unit tests but currently there are no integration test like provisioning
for triggering periodic updates.

Currently 'parrot' board canaries (only board with rotating HDD) is failing due
to timeouts related to slow updates. This CL potentially will clear that problem.

TEST=cros_workon_make --test, installed an image with/out the O_DSYNC flag and
measured the 'cros flash' time.
BUG= chromium:738027 

Reviewed-on: https://chromium-review.googlesource.com/567360
Commit-Ready: Amin Hassani <ahassani@chromium.org>
Tested-by: Amin Hassani <ahassani@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
(cherry picked from commit 7ecda265a87236e83cf820364947a1618872b6be)

Change-Id: If36a9d9f3100e5bb85ab0e0281458ab921078260

[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/delta_performer.h
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/delta_performer_integration_test.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/download_action.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/download_action.h
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/update_attempter_unittest.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/download_action_unittest.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/update_attempter_android.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_generator/generate_delta_main.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/delta_performer_unittest.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/delta_performer.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/update_attempter.cc

Status: Fixed (was: Started)

Comment 25 by sheriffbot@chromium.org, Jul 31 2017

Cc: josa...@chromium.org
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Merge-Approved-60
Labels: FixedByAURewrite
Status: Verified (was: Fixed)
