parrot-release has been broken for 3 weeks
Issue description
The last success was on the 7th.
https://luci-milo.appspot.com/buildbot/chromeos/parrot-release/?limit=100
Seems to die in paygen test with:
Could not attach to process. If your uid matches the uid of the target
parrot-release:3898 failed
Builders failed on:
- parrot-release: https://luci-milo.appspot.com/buildbot/chromeos/parrot-release/3898
,
Jun 30 2017
Parrot AU and Paygen tests are failing with this error message:
Failed to receive a download finished notification (download_finished)
within 1200 seconds. This could be a problem with the updater or a
connectivity issue. For more details, check the update_engine log
(in sysinfo or on the DUT, also included in the test log).
Passing back to the sheriff to find someone who can debug an AU
test failure.
,
Jun 30 2017
Looking at the lumpy failures, they're different, and will need a different bug.
,
Jul 6 2017
Current failures appear to be caused by the update_engine test timing out (at 1200 seconds now) on the parrot-release builder, not the parrot-paladin builder. I will investigate whether we can check "/sys/block/sd?/queue/rotational" and increase the timeout to 1800 seconds in that case.
https://uberchromegw.corp.google.com/i/chromeos/builders/parrot-release/builds/3921
[Auto-Bug]: autoupdate_EndToEndTest.paygen_au_dev_delta: retry_count: 1, FAIL: Failed to receive a download finished notification (download_finished) within 1200 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log)., 221 reports
[Test-Logs]: autoupdate_EndToEndTest.paygen_au_dev_delta: retry_count: 1, FAIL: Failed to receive a download finished notification (download_finished) within 1200 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log).
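For reference, a minimal sketch of the rotational-media check described above. This is not the code from any CL on this bug: the function names, the `sys_block_root` parameter, and reading sysfs locally (rather than over SSH on the DUT) are all assumptions made for illustration. Only the sysfs path pattern and the 1200/1800 second values come from this comment.

```python
import glob
import os


def is_rotational(sys_block_root="/sys/block"):
    """Return True if any sd? device reports queue/rotational == 1.

    sys_block_root is a hypothetical parameter so the check can be
    pointed at a fake directory tree in tests.
    """
    pattern = os.path.join(sys_block_root, "sd?", "queue", "rotational")
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                if f.read().strip() == "1":
                    return True
        except OSError:
            # Device disappeared or is unreadable; ignore it.
            continue
    return False


def download_timeout_seconds(sys_block_root="/sys/block",
                             default=1200, rotational=1800):
    """Pick the download_finished timeout based on the storage type."""
    return rotational if is_rotational(sys_block_root) else default
```

On a machine with only SSDs (or no sd? devices at all) this falls back to the default timeout.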
,
Jul 6 2017
The "provisioning" (full or "forced") update for parrot-release build 3921 took nearly 20 minutes:
07/06 06:46:18.339 ERROR| utils:0280| [stderr] [0706/064618:INFO:update_engine_client.cc(471)] Forcing an update by setting app_version to ForcedUpdate.
07/06 06:46:18.340 ERROR| utils:0280| [stderr] [0706/064618:INFO:update_engine_client.cc(473)] Initiating update check and install.
07/06 06:46:18.341 ERROR| utils:0280| [stderr] [0706/064618:INFO:update_engine_client.cc(502)] Waiting for update to complete.
07/06 07:04:50.881 ERROR| utils:0280| [stderr] [0706/070450:INFO:update_engine_client.cc(224)] Update succeeded -- reboot needed.
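The elapsed time can be checked directly from the first and last timestamps in the excerpt above. A small sketch: the helper name is made up, and the year 2017 is taken from the comment date because the log lines do not include one.

```python
from datetime import datetime

# Log timestamps look like "07/06 06:46:18.339"; the year is not logged,
# so 2017 (the date of this comment) is assumed.
FMT = "%Y %m/%d %H:%M:%S.%f"


def elapsed_seconds(start, end, year=2017):
    """Seconds between two autotest log timestamps from the same year."""
    t0 = datetime.strptime("%d %s" % (year, start), FMT)
    t1 = datetime.strptime("%d %s" % (year, end), FMT)
    return (t1 - t0).total_seconds()


provision = elapsed_seconds("07/06 06:46:18.339", "07/06 07:04:50.881")
print("provisioning update took %.1f s (%.1f min)" % (provision, provision / 60))
```

That works out to roughly 18.5 minutes for the forced update alone, which is already close to the 1200-second limit.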
The actual update test failed with timeout:
07/06 07:08:40.844 INFO | autoupdater:0254| Triggering update via: /usr/bin/update_engine_client --check_for_update --omaha_url=http://100.115.185.227:42475/update
...
07/06 07:08:41.074 INFO |autoupdate_EndToEn:0271| Expecting event_result=any version=9693.1.0 event_type=any previous_version=any, within 720 seconds
07/06 07:08:41.196 INFO |autoupdate_EndToEn:0371| Consumed new event: {u'event_result': '1', u'event_type': '54', u'previous_version': '0.0.0.0', u'track': 'stable-channel', u'timestamp': '2017-07-06 07:08:41', u'version': '9693.1.0', u'board': 'parrot'}
07/06 07:08:41.196 INFO |autoupdate_EndToEn:0298| Event received after 0.0 seconds
07/06 07:08:41.196 INFO |autoupdate_EndToEn:0271| Expecting event_result=1:success version=9693.1.0 event_type=13:download_started previous_version=any, within 240 seconds
07/06 07:08:42.444 INFO |autoupdate_EndToEn:0371| Consumed new event: {u'event_result': '1', u'event_type': '13', u'track': 'stable-channel', u'timestamp': '2017-07-06 07:08:42', u'version': '9693.1.0', u'board': 'parrot'}
07/06 07:08:42.445 INFO |autoupdate_EndToEn:0298| Event received after 1.1 seconds
07/06 07:08:42.445 INFO |autoupdate_EndToEn:0271| Expecting event_result=1:success version=9693.1.0 event_type=14:download_finished previous_version=any, within 1200 seconds
07/06 07:28:42.985 ERROR|autoupdate_EndToEn:0307| Timeout expired
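The log shows the test waiting for a sequence of events, each with its own deadline (720, 240, then 1200 seconds). A rough sketch of that pattern follows. It is not the autotest implementation: `wait_for_event`, `EventTimeout`, `STAGES`, and the `poll` callback are hypothetical names, and silently skipping non-matching events is a simplification.

```python
import time


class EventTimeout(Exception):
    """Raised when the expected event does not arrive in time."""


# (expected event_type, timeout in seconds) as read from the log above.
# None stands for "any" event type.
STAGES = [(None, 720), ("13", 240), ("14", 1200)]


def wait_for_event(poll, expected_type, timeout,
                   interval=0.01, clock=time.monotonic, sleep=time.sleep):
    """Call poll() until it returns a matching event or the deadline passes.

    poll is a hypothetical callable returning an event dict or None.
    """
    deadline = clock() + timeout
    while True:
        event = poll()
        if event is not None and expected_type in (None, event.get("event_type")):
            return event
        if clock() >= deadline:
            raise EventTimeout(
                "Timeout expired waiting for event_type=%s" % expected_type)
        sleep(interval)
```

In the failing run, the third stage (download_finished, event_type 14) is the one that hit its deadline.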
,
Jul 7 2017
I've uploaded UNTESTED code that I need help testing on parrot-release:
https://chromium-review.googlesource.com/562552
Can someone help test that when it's convenient for them?
,
Jul 10 2017
Issue 740420 has been merged into this issue.
,
Jul 11 2017
Update: reassigning to ahassani since he is working on a heuristic to NOT use O_DSYNC when an update is forced, either by the user or by "cros flash".
The main problem is that parrot has a traditional HDD (aka "spinning rust") and the update time was already pretty slow. My change to use O_DSYNC nearly doubled the time, and it's right around 20 minutes (1200 seconds) now.
I've uploaded an autotest change to detect "rotational" media and then double the timeout: https://chromium-review.googlesource.com/c/562552/ but since I am not able to test this, it's not going in. It would also slightly slow down the "normal" (SSD) case by sending two more SSH commands to the DUT. There is very likely a more efficient way of implementing this, but I just wanted to post the code so the autotest/infra team knows how to detect "rotational" media.
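To see why O_DSYNC matters on slow media: with that flag, each write() returns only after the data has reached the device, instead of returning once the data is in the page cache. A rough way to compare the two on a given filesystem is sketched below. The helper name, chunk count, and chunk size are arbitrary choices, and `os.O_DSYNC` is looked up defensively because it is not defined on every platform.

```python
import os
import tempfile
import time


def timed_write(path, use_dsync, chunks=64, chunk_size=4096):
    """Write chunks * chunk_size zero bytes and return the elapsed seconds."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if use_dsync:
        # Falls back to 0 (no effect) where O_DSYNC is unavailable.
        flags |= getattr(os, "O_DSYNC", 0)
    fd = os.open(path, flags, 0o600)
    block = b"\0" * chunk_size
    start = time.monotonic()
    try:
        for _ in range(chunks):
            os.write(fd, block)
    finally:
        os.close(fd)
    return time.monotonic() - start


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target.bin")
    print("buffered: %.4f s" % timed_write(target, use_dsync=False))
    print("O_DSYNC:  %.4f s" % timed_write(target, use_dsync=True))
```

The numbers depend entirely on the backing device. On tmpfs or a fast SSD the gap may be small, while a rotating disk is where the difference would be expected to show.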
,
Jul 11 2017
I'm testing a solution that I implemented; I will update once it works.
,
Jul 11 2017
Added this CL (https://chromium-review.googlesource.com/c/567360/) as a solution.
,
Jul 18 2017
The following revision refers to this bug:
https://chromium.googlesource.com/aosp/platform/system/update_engine/+/7ecda265a87236e83cf820364947a1618872b6be
commit 7ecda265a87236e83cf820364947a1618872b6be
Author: Amin Hassani <ahassani@google.com>
Date: Tue Jul 18 07:32:49 2017
Open partitions with O_DSYNC flag only if the update is periodic.
Currently when updating we always open the target partition with flag O_DSYNC (CL:562552), but this makes all infrastructure operations like 'cros flash', provisioning, force update, paygen, etc much slower. This changes the update engine to only add O_DSYNC flag if an update is triggered by periodic checks (not interactively forced). This means if the user clicks on 'check for update' it will be an interactive update and O_DSYNC will not be used.
This change keeps the AOSP partitions open without O_DSYNC flag.
This CL uses non-interactive mode for all unit tests but currently there are no integration test like provisioning for triggering periodic updates.
Currently 'parrot' board canaries (only board with rotating HDD) is failing due to timeouts related to slow updates. This CL potentially will clear that problem.
TEST=cros_workon_make --test, installed an image with/out the O_DSYCN flag and measured the 'cros flash' time.
BUG= chromium:738027
Change-Id: If45fcf5e798b9c9353e09021ad812c859d983a65
Reviewed-on: https://chromium-review.googlesource.com/567360
Commit-Ready: Amin Hassani <ahassani@chromium.org>
Tested-by: Amin Hassani <ahassani@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/delta_performer.h
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/delta_performer_integration_test.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/download_action.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/download_action.h
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/update_attempter_unittest.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/download_action_unittest.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/update_attempter_android.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_generator/generate_delta_main.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/delta_performer_unittest.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/payload_consumer/delta_performer.cc
[modify] https://crrev.com/7ecda265a87236e83cf820364947a1618872b6be/update_attempter.cc
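The rule described in the commit message boils down to a small decision on the open flags. The sketch below only illustrates that rule in Python. The real change is C++ inside update_engine, and the function name, the `interactive` parameter, and the base `O_RDWR` flag here are assumptions, not taken from the CL.

```python
import os


def target_open_flags(interactive):
    """Return open(2) flags for the target partition.

    Periodic (non-interactive) updates add O_DSYNC; interactive or
    forced updates ('cros flash', provisioning, paygen) do not.
    """
    flags = os.O_RDWR  # base access mode is an assumption for this sketch
    if not interactive:
        # O_DSYNC is not defined on every platform; 0 means "no extra flag".
        flags |= getattr(os, "O_DSYNC", 0)
    return flags
```

With this split, infrastructure-triggered updates keep the faster buffered writes, while background updates on user devices still get synchronous writes.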
,
Jul 25 2017
This bug requires manual review: Request affecting a post-stable build
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), cmasso@(iOS), josafat@(ChromeOS), bustamante@(Desktop)
For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jul 25 2017
Please add appropriate OSs.
,
Jul 27 2017
The following revision refers to this bug:
https://chromium.googlesource.com/aosp/platform/system/update_engine/+/760fd21451842b226aad11f67c28f16b17c4e0d8
commit 760fd21451842b226aad11f67c28f16b17c4e0d8
Author: Amin Hassani <ahassani@google.com>
Date: Tue Jul 25 17:58:27 2017
Open partitions with O_DSYNC flag only if the update is periodic.
Currently when updating we always open the target partition with flag O_DSYNC (CL:562552), but this makes all infrastructure operations like 'cros flash', provisioning, force update, paygen, etc much slower. This changes the update engine to only add O_DSYNC flag if an update is triggered by periodic checks (not interactively forced). This means if the user clicks on 'check for update' it will be an interactive update and O_DSYNC will not be used.
This change keeps the AOSP partitions open without O_DSYNC flag.
This CL uses non-interactive mode for all unit tests but currently there are no integration test like provisioning for triggering periodic updates.
Currently 'parrot' board canaries (only board with rotating HDD) is failing due to timeouts related to slow updates. This CL potentially will clear that problem.
TEST=cros_workon_make --test, installed an image with/out the O_DSYCN flag and measured the 'cros flash' time.
BUG= chromium:738027
Reviewed-on: https://chromium-review.googlesource.com/567360
Commit-Ready: Amin Hassani <ahassani@chromium.org>
Tested-by: Amin Hassani <ahassani@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
(cherry picked from commit 7ecda265a87236e83cf820364947a1618872b6be)
Change-Id: If36a9d9f3100e5bb85ab0e0281458ab921078260
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/delta_performer.h
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/delta_performer_integration_test.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/download_action.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/download_action.h
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/update_attempter_unittest.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/download_action_unittest.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/update_attempter_android.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_generator/generate_delta_main.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/delta_performer_unittest.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/payload_consumer/delta_performer.cc
[modify] https://crrev.com/760fd21451842b226aad11f67c28f16b17c4e0d8/update_attempter.cc
,
Jul 31 2017
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!
If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.
Thanks for your time! To disable nags, add the Disable-Nags label.
For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Comment 1 by sjg@google.com
, Jun 29 2017