[Canary] autoupdate_EndToEndTest failures of timeout on receiving finished notification |
|||||||
Issue descriptionThe following builder failed at autoupdate_EndToEndTest with timeout of finished notification which is different from crbug.com/730134 and crbug.com/730141 . Builder: reef-release Build: https://chromegw.corp.google.com/i/chromeos/builders/reef-release/builds/1187 Logs from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122207283-chromeos-test/chromeos2-row4-rack6-host16/: 06/08 07:55:05.182 INFO |autoupdate_EndToEn:0274| Expecting event_result=1:success version=9630.0.0 event_type=14:download_finished previous_version=any, within 600 seconds 06/08 08:05:05.631 ERROR|autoupdate_EndToEn:0310| Timeout expired 06/08 08:05:05.632 ERROR|autoupdate_EndToEn:0278| Failed expected event: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). Builder: chell-release Build: https://chromegw.corp.google.com/i/chromeos/builders/chell-release/builds/1177 Logs from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122207822-chromeos-test/chromeos4-row12-rack9-host11/: 06/08 08:04:38.699 INFO |autoupdate_EndToEn:0274| Expecting event_result=1:success version=9630.0.0 event_type=14:download_finished previous_version=any, within 600 seconds 06/08 08:14:39.678 ERROR|autoupdate_EndToEn:0310| Timeout expired 06/08 08:14:39.680 ERROR|autoupdate_EndToEn:0278| Failed expected event: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). Logs from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122208115-chromeos-test/chromeos4-row12-rack9-host3/: 06/08 08:52:40.413 INFO |autoupdate_EndToEn:0274| Expecting event_result=1:success version=9630.0.0 event_type=14:download_finished previous_version=any, within 600 seconds 06/08 09:02:40.454 ERROR|autoupdate_EndToEn:0310| Timeout expired 06/08 09:02:40.455 ERROR|autoupdate_EndToEn:0278| Failed expected event: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). Builder: pyro-release Build: https://chromegw.corp.google.com/i/chromeos/builders/pyro-release/builds/714 Logs from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122208673-chromeos-test/chromeos2-row4-rack4-host5/: 06/08 08:22:14.412 INFO |autoupdate_EndToEn:0274| Expecting event_result=1:success version=9630.0.0 event_type=14:download_finished previous_version=any, within 600 seconds 06/08 08:32:14.511 ERROR|autoupdate_EndToEn:0310| Timeout expired 06/08 08:32:14.513 ERROR|autoupdate_EndToEn:0278| Failed expected event: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). Builder: stumpy-release Build: https://chromegw.corp.google.com/i/chromeos/builders/stumpy-release/builds/5312 Logs from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122193972-chromeos-test/chromeos4-row2-rack8-host20/: 06/08 06:47:23.270 INFO |autoupdate_EndToEn:0274| Expecting event_result=1:success version=9630.0.0 event_type=14:download_finished previous_version=any, within 600 seconds 06/08 06:57:23.835 ERROR|autoupdate_EndToEn:0310| Timeout expired 06/08 06:57:23.836 ERROR|autoupdate_EndToEn:0278| Failed expected event: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). Builder: parrot-release Build: https://chromegw.corp.google.com/i/chromeos/builders/parrot-release/builds/3836 Logs from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122185628-chromeos-test/chromeos6-row2-rack4-host2/: 06/08 06:25:37.974 INFO |autoupdate_EndToEn:0274| Expecting event_result=1:success version=9630.0.0 event_type=14:download_finished previous_version=any, within 600 seconds 06/08 06:35:38.922 ERROR|autoupdate_EndToEn:0310| Timeout expired 06/08 06:35:38.923 ERROR|autoupdate_EndToEn:0278| Failed expected event: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). Logs from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122185653-chromeos-test/chromeos6-row2-rack3-host2/: 06/08 06:30:40.740 INFO |autoupdate_EndToEn:0274| Expecting event_result=1:success version=9630.0.0 event_type=14:download_finished previous_version=any, within 600 seconds 06/08 06:40:41.601 ERROR|autoupdate_EndToEn:0310| Timeout expired 06/08 06:40:41.601 ERROR|autoupdate_EndToEn:0278| Failed expected event: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). Builder: lumpy-release Build: https://chromegw.corp.google.com/i/chromeos/builders/lumpy-release/builds/5046 Logs from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122197192-chromeos-test/chromeos6-row2-rack7-host22/: 06/08 06:55:53.710 INFO |autoupdate_EndToEn:0274| Expecting event_result=1:success version=9630.0.0 event_type=14:download_finished previous_version=any, within 600 seconds 06/08 07:05:54.649 ERROR|autoupdate_EndToEn:0310| Timeout expired 06/08 07:05:54.649 ERROR|autoupdate_EndToEn:0278| Failed expected event: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). Logs from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122197194-chromeos-test/chromeos6-row2-rack8-host11/: 06/08 06:56:33.906 INFO |autoupdate_EndToEn:0274| Expecting event_result=1:success version=9630.0.0 event_type=14:download_finished previous_version=any, within 600 seconds 06/08 07:06:34.781 ERROR|autoupdate_EndToEn:0310| Timeout expired 06/08 07:06:34.782 ERROR|autoupdate_EndToEn:0278| Failed expected event: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). Builder: reks-release Build: https://chromegw.corp.google.com/i/chromeos/builders/reks-release/builds/1194 Logs from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122203330-chromeos-test/chromeos4-row11-rack7-host2/: 06/08 07:31:50.169 INFO |autoupdate_EndToEn:0274| Expecting event_result=1:success version=9630.0.0 event_type=14:download_finished previous_version=any, within 600 seconds 06/08 07:41:51.135 ERROR|autoupdate_EndToEn:0310| Timeout expired 06/08 07:41:51.136 ERROR|autoupdate_EndToEn:0278| Failed expected event: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log).
,
Jun 8 2017
,
Jun 8 2017
Builder: auron_paine-release Build: https://chromegw.corp.google.com/i/chromeos/builders/auron_paine-release/builds/1194 Logs: autoupdate_EndToEndTest.paygen_au_canary_delta FAIL: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). autoupdate_EndToEndTest.paygen_au_canary_delta retry_count: 1 Builder: auron_yuna-release Build: https://chromegw.corp.google.com/i/chromeos/builders/auron_yuna-release/builds/1194 Logs: autoupdate_EndToEndTest.paygen_au_canary_delta FAIL: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). autoupdate_EndToEndTest.paygen_au_canary_delta retry_count: 1 Builder: banjo-release Build: https://chromegw.corp.google.com/i/chromeos/builders/banjo-release/builds/1179 Logs: autoupdate_EndToEndTest.paygen_au_dev_delta FAIL: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). autoupdate_EndToEndTest.paygen_au_dev_delta retry_count: 1 Builder: candy-release Build: https://chromegw.corp.google.com/i/chromeos/builders/candy-release/builds/1194 Logs: autoupdate_EndToEndTest.paygen_au_canary_delta FAIL: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). autoupdate_EndToEndTest.paygen_au_canary_delta retry_count: 1 Builder: gandof-release Build: https://chromegw.corp.google.com/i/chromeos/builders/gandof-release/builds/1186 Logs: autoupdate_EndToEndTest.paygen_au_canary_delta [ FAILED ] autoupdate_EndToEndTest.paygen_au_canary_delta FAIL: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). autoupdate_EndToEndTest.paygen_au_canary_delta retry_count: 1 Builder: gnawty-release Build: https://chromegw.corp.google.com/i/chromeos/builders/gnawty-release/builds/1194 Logs: autoupdate_EndToEndTest.paygen_au_canary_delta [ FAILED ] autoupdate_EndToEndTest.paygen_au_canary_delta FAIL: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). autoupdate_EndToEndTest.paygen_au_canary_delta retry_count: 1 Builder: heli-release Bulid: https://chromegw.corp.google.com/i/chromeos/builders/heli-release/builds/1197 Logs: autoupdate_EndToEndTest.paygen_au_canary_delta [ FAILED ] autoupdate_EndToEndTest.paygen_au_canary_delta FAIL: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). autoupdate_EndToEndTest.paygen_au_canary_delta retry_count: 1 Builder: lulu-release Build: https://chromegw.corp.google.com/i/chromeos/builders/lulu-release/builds/1196 Logs: autoupdate_EndToEndTest.paygen_au_canary_delta [ FAILED ] autoupdate_EndToEndTest.paygen_au_canary_delta FAIL: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). autoupdate_EndToEndTest.paygen_au_canary_delta retry_count: 1 Builder: tidus-release Build: https://chromegw.corp.google.com/i/chromeos/builders/tidus-release/builds/1191 Logs: autoupdate_EndToEndTest.paygen_au_canary_delta FAIL: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). autoupdate_EndToEndTest.paygen_au_canary_delta retry_count: 1 Builder: veyron_minnie Build: https://chromegw.corp.google.com/i/chromeos/builders/veyron_minnie-release/builds/1198 Logs: autoupdate_EndToEndTest.paygen_au_dev_delta FAIL: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). autoupdate_EndToEndTest.paygen_au_dev_delta retry_count: 1 Builder: veyron_speedy-release Build: https://chromegw.corp.google.com/i/chromeos/builders/veyron_speedy-release/builds/1194 Logs: autoupdate_EndToEndTest.paygen_au_canary_delta [ FAILED ] autoupdate_EndToEndTest.paygen_au_canary_delta FAIL: Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log). autoupdate_EndToEndTest.paygen_au_canary_delta retry_count: 1
,
Jun 8 2017
There is a chance this is related to my change: https://chromium-review.googlesource.com/c/502369/ update_engine_client: add O_DSYNC flags However, my first thought is 600 seconds (10 minutes) might not be enough time depending on how busy the device is and what the latency is to download parts. Despite my change being a one-liner, it fundamentally changes the client behavior. We really needs this change for at least two customer facing issues. I'll talk with gwendal about this.
,
Jun 8 2017
I looked at the stumpy failure above and it looks like the test passed: https://chromegw.corp.google.com/i/chromeos/builders/stumpy-release/builds/5312/steps/PaygenTestDev/logs/stdio points at http://cautotest/tko/retrieve_logs.cgi?job=/results/122185183-chromeos-test/ which redirects me to: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122185183-chromeos-test/chromeos4-row2-rack8-host18/ status.log contents: START autoupdate_EndToEndTest.paygen_au_dev_full autoupdate_EndToEndTest.paygen_au_dev_full timestamp=1496926855 localtime=Jun 08 06:00:55 START ---- ---- timestamp=1496926872 localtime=Jun 08 06:01:12 GOOD ---- sysinfo.before timestamp=1496926872 localtime=Jun 08 06:01:12 END GOOD ---- ---- timestamp=1496926872 localtime=Jun 08 06:01:12 START ---- ---- timestamp=1496926882 localtime=Jun 08 06:01:22 GOOD ---- sysinfo.iteration.before timestamp=1496926882 localtime=Jun 08 06:01:22 END GOOD ---- ---- timestamp=1496926882 localtime=Jun 08 06:01:22 START ---- reboot timestamp=1496926928 localtime=Jun 08 06:02:08 GOOD ---- reboot.start timestamp=1496926928 localtime=Jun 08 06:02:08 GOOD ---- reboot.verify timestamp=1496926949 localtime=Jun 08 06:02:29 END GOOD ---- reboot kernel=3.8.11 localtime=Jun 08 06:02:29 timestamp=1496926949 START ---- reboot timestamp=1496927668 localtime=Jun 08 06:14:28 GOOD ---- reboot.start timestamp=1496927668 localtime=Jun 08 06:14:28 GOOD ---- reboot.verify timestamp=1496927802 localtime=Jun 08 06:16:42 END GOOD ---- reboot kernel=3.8.11 localtime=Jun 08 06:16:44 timestamp=1496927804 START ---- reboot timestamp=1496928041 localtime=Jun 08 06:20:41 GOOD ---- reboot.start timestamp=1496928041 localtime=Jun 08 06:20:41 GOOD ---- reboot.verify timestamp=1496928110 localtime=Jun 08 06:21:50 END GOOD ---- reboot kernel=3.8.11 localtime=Jun 08 06:21:51 timestamp=1496928111 START ---- ---- timestamp=1496928164 localtime=Jun 08 06:22:44 GOOD ---- sysinfo.iteration.after timestamp=1496928164 localtime=Jun 08 06:22:44 END GOOD ---- ---- timestamp=1496928164 localtime=Jun 08 06:22:44 START ---- ---- timestamp=1496928175 localtime=Jun 08 06:22:55 GOOD ---- sysinfo.after timestamp=1496928175 localtime=Jun 08 06:22:55 END GOOD ---- ---- timestamp=1496928175 localtime=Jun 08 06:22:55 GOOD autoupdate_EndToEndTest.paygen_au_dev_full autoupdate_EndToEndTest.paygen_au_dev_full timestamp=1496928214 localtime=Jun 08 06:23:34 completed successfully END GOOD autoupdate_EndToEndTest.paygen_au_dev_full autoupdate_EndToEndTest.paygen_au_dev_full timestamp=1496928214 localtime=Jun 08 06:23:34 I don't understand what was happening for ~12 minutes between the first and second reboots (from this log). That is the only time delta that exceeds 10 minutes (600 seconds). I am certain the O_DSYNC will slow down total throughput - but I have found no update engine (AU) perf requirement since it's running background. Happen to know where the 600 seconds timeout is specified? I'd like to extend that to 900 or 1200 seconds.
,
Jun 8 2017
I have evidence the timeout is too short: looks like stumpy needs at least 12 minutes. This is not a slow machine. I'm going to propose a CL to extend the timeout to 1200 seconds.
Some key bits from the autoserv.DEBUG log file:
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122185183-chromeos-test/chromeos4-row2-rack8-host18/debug/
06/08 06:00:55.811 INFO | server_job:0199| START autoupdate_EndToEndTest.paygen_au_dev_full autoupdate_EndToEndTest.paygen_au_dev_full timestamp=1496926855 localtime=Jun 08 06:00:55
...
06/08 06:01:08.925 DEBUG| autotest:0380| Autotest job starts on remote host: chromeos4-row2-rack8-host18
...
06/08 06:01:30.567 DEBUG|autoupdate_EndToEn:1800| Choose the least loaded devserver: http://100.115.219.135:8082
06/08 06:01:30.567 INFO |autoupdate_EndToEn:0997| Staging images onto autotest devserver (http://100.115.219.135:8082)
06/08 06:01:30.569 INFO | dev_server:1108| Staging artifacts on devserver http://100.115.219.135:8082: build=stable-channel/stumpy/8530.96.0/payloads, artifacts=None, files=['chromeos_8530.96.0_stumpy_stable-channel_full_test.bin-857dd3431a264c5554e54b810a58df7c'], archive_url=gs://chromeos-releases/stable-channel/stumpy/8530.96.0/payloads
...
06/08 06:01:35.994 INFO | dev_server:1126| Finished staging artifacts: build=stable-channel/stumpy/8530.96.0, artifacts=None, files=['stateful.tgz'], archive_url=gs://chromeos-releases/stable-channel/stumpy/8530.96.0
06/08 06:01:35.998 INFO |autoupdate_EndToEn:1028| Source full payload from gs://chromeos-releases/stable-channel/stumpy/8530.96.0/payloads/chromeos_8530.96.0_stumpy_stable-channel_full_test.bin-857dd3431a264c5554e54b810a58df7c staged at http://100.115.219.135:8082/static/stable-channel/stumpy/8530.96.0/payloads/chromeos_8530.96.0_stumpy_stable-channel_full_test.bin-857dd3431a264c5554e54b810a58df7c
...
06/08 06:02:07.119 INFO |autoupdate_EndToEn:1147| Installing a source image on the DUT
...
06/08 06:02:08.919 DEBUG| ssh_host:0286| Running (ssh) '( sleep 1; reboot & sleep 10; reboot -f ) </dev/null >/dev/null 2>&1 & echo -n $!'
...
06/08 06:02:29.320 DEBUG| abstract_ssh:0587| Host chromeos4-row2-rack8-host18 is now up
...
06/08 06:02:32.271 WARNI|autoupdate_EndToEn:0486| Waiting for devserver to start up.
...
06/08 06:02:33.965 INFO |autoupdate_EndToEn:0494| Devserver pid is 29320, serving on port 45779
06/08 06:02:33.966 WARNI|autoupdate_EndToEn:0510| Waiting for devserver to accept network requests.
...
06/08 06:02:35.487 DEBUG| ssh_host:0286| Running (ssh) '/usr/bin/update_engine_client --update --omaha_url=http://100.115.219.135:45779/update'
06/08 06:02:35.737 ERROR| utils:0298| [stderr] [0608/060234:INFO:update_engine_client.cc(471)] Forcing an update by setting app_version to ForcedUpdate.
06/08 06:02:35.739 ERROR| utils:0298| [stderr] [0608/060234:INFO:update_engine_client.cc(473)] Initiating update check and install.
06/08 06:02:35.740 ERROR| utils:0298| [stderr] [0608/060234:INFO:update_engine_client.cc(502)] Waiting for update to complete.
**** TEST TIMEOUT HERE? ****
06/08 06:14:12.348 ERROR| utils:0298| [stderr] [0608/061411:INFO:update_engine_client.cc(224)] Update succeeded -- reboot needed.
...
06/08 06:14:13.540 DEBUG| utils:0298| [stdout] Downloading stateful payload from http://100.115.219.135:8082/static/stable-channel/stumpy/8530.96.0/stateful.tgz
...
06/08 06:14:25.947 DEBUG| utils:0298| [stdout] Successfully downloaded update.
06/08 06:14:25.947 DEBUG| utils:0298| [stdout] Restoring state to factory_install with dev_image.
...
,
Jun 9 2017
Oops... I think I handed this bug to gwendal too quickly. "Failed to finish download" is not the same as the "python missing" bug he's been working on. I really don't understand what this failure means yet.
,
Jun 9 2017
I think the failure is related to my change but it's a problem with the test that extending the timeout will fix. I'll submit a CL shortly to extend the timeout...just need to find out where it is. :)
,
Jun 9 2017
,
Jun 9 2017
I need to find the source to this test it seems: https://wmatrix.googleplex.com/unfiltered?hide_missing=True&tests=autoupdate_EndToEndTest.paygen_au_dev_full cs.corp isn't finding it. :(
,
Jun 9 2017
dgarett stopped by and pointed me at the right file which specifies the 600 second timeout: chromite/lib/paygen/signer_payloads_client.py: _CleanSignerFilesByKeyset() I've submitted: https://chromium-review.googlesource.com/528629 paygen: increase EndToEnd timeout to 1200 seconds Data below supports the change to 1200 seconds (20 minutes). So far, 7 boards have failed autoupdate_EndToEndTest.paygen_au_dev_full since my change landed: https://wmatrix.googleplex.com/failures/unfiltered?tests=autoupdate_EndToEndTest.paygen_au_dev_full [Tip: search for "bin/update_engine_client" in the autoserv.DEBUG log] BOARD Storage UpdateTime[logs below] chell eMMC 10:28 [1] lumpy SATA 13:23 [2] parrot HDD 18:37 [3] pyro --- 12:01 [5] reef --- 11:27 [6] sand --- 11:29 [7] stumpy SATA 11:59 (provisioning and timeout after 600 sec on test update) On a successive test that passed: stumpy SATA 11:37 [4] (provisioning) IIRC, stumpy and lumpy use the same M.2 SanDisk part that caused grief before we even released those products. I was expecting more eMMC boards to fail with timeouts. But it's possible NEWER eMMC parts have lower write speeds due to MLC (older/faster) vs TLC (Newer/Lower $/bit cost). Gwendal might know for sure - just my first guess. There are two failure modes it seems and I'm only going address the "Failed to receive a download finished notification (download_finished) within 600 seconds" symptom. The other failure seems to be "Autotest client terminated unexpectedly: DUT is pingable, SSHable and did NOT restart un-expectedly. We probably lost connectivity during the test." and is very likely unrelated to my change. [1] CHELL UPDATE LOG SNIPPETS 06/08 08:33:11.441 DEBUG| utils:0203| Running 'ssh 100.115.219.136 'curl "http://100.115.219.136:33632/check_health?"'' 06/08 08:33:12.648 INFO | autoupdater:0302| Updating image via: /usr/bin/update_engine_client --update --omaha_url=http://100.115.219.136:33632/update 06/08 08:33:12.650 DEBUG| ssh_host:0286| Running (ssh) '/usr/bin/update_engine_client --update --omaha_url=http://100.115.219.136:33632/update' 06/08 08:33:13.105 ERROR| utils:0298| [stderr] [0608/083312:INFO:update_engine_client.cc(471)] Forcing an update by setting app_version to ForcedUpdate. 06/08 08:33:13.106 ERROR| utils:0298| [stderr] [0608/083312:INFO:update_engine_client.cc(473)] Initiating update check and install. 06/08 08:33:13.143 ERROR| utils:0298| [stderr] [0608/083312:INFO:update_engine_client.cc(502)] Waiting for update to complete. 06/08 08:44:41.644 ERROR| utils:0298| [stderr] [0608/084440:INFO:update_engine_client.cc(224)] Update succeeded -- reboot needed. 06/08 08:44:41.688 WARNI| autoupdater:0315| Unable to parse build name /update for metrics. Continuing anyway. 06/08 08:44:41.690 DEBUG| ssh_host:0286| Running (ssh) '/usr/bin/update_engine_client -status | grep CURRENT_OP' 06/08 08:44:42.102 ERROR| utils:0298| [stderr] [0608/084441:INFO:update_engine_client.cc(493)] Querying Update Engine status... 06/08 08:44:42.139 DEBUG| utils:0298| [stdout] CURRENT_OP=UPDATE_STATUS_UPDATED_NEED_REBOOT 06/08 08:44:42.143 INFO | autoupdater:0521| Updating stateful partition... ... 06/08 08:46:16.548 DEBUG| utils:0298| [stdout] Downloading command returns code 0. 06/08 08:46:16.548 DEBUG| utils:0298| [stdout] Successfully downloaded update. 06/08 08:46:16.548 DEBUG| utils:0298| [stdout] Restoring state to factory_install with dev_image. ... 06/08 08:48:03.506 DEBUG| ssh_host:0286| Running (ssh) '/usr/bin/update_engine_client --check_for_update --omaha_url=http://100.115.219.136:43002/update' ... 06/08 08:58:14.060 ERROR|autoupdate_EndToEn:0310| Timeout expired [2] LUMPY UPDATE LOG SNIPPETS 06/08 14:29:33.630 DEBUG| ssh_host:0286| Running (ssh) '/usr/bin/update_engine_client --update --omaha_url=http://100.115.185.227:40618/update' ... 06/08 14:42:56.454 DEBUG| ssh_host:0286| Running (ssh) '/usr/bin/update_engine_client -status | grep CURRENT_OP' 06/08 14:42:56.817 ERROR| utils:0298| [stderr] [0608/144255:INFO:update_engine_client.cc(493)] Querying Update Engine status... 06/08 14:42:56.819 DEBUG| utils:0298| [stdout] CURRENT_OP=UPDATE_STATUS_UPDATED_NEED_REBOOT 06/08 14:42:56.821 INFO | autoupdater:0521| Updating stateful partition... ... 06/08 14:42:57.950 DEBUG| utils:0298| [stdout] Downloading stateful payload from http://100.115.185.227:8082/static/dev-channel/lumpy/9631.0.0/stateful.tgz ... 06/08 14:45:21.275 DEBUG| ssh_host:0286| Running (ssh) '/usr/bin/update_engine_client --check_for_update --omaha_url=http://100.115.185.227:44601/update' ... 06/08 14:55:28.377 ERROR|autoupdate_EndToEn:0310| Timeout expired [3] PARROT UPDATE LOG SNIPPETS 06/08 06:09:01.141 DEBUG| ssh_host:0286| Running (ssh) '/usr/bin/update_engine_client --update --omaha_url=http://100.115.185.227:46773/update' ... 06/08 06:27:38.762 ERROR| utils:0298| [stderr] [0608/062737:INFO:update_engine_client.cc(224)] Update succeeded -- reboot needed. ... 06/08 06:31:43.930 DEBUG| ssh_host:0286| Running (ssh) '/usr/bin/update_engine_client --check_for_update --omaha_url=http://100.115.185.227:46303/update' ... 06/08 06:41:49.873 ERROR|autoupdate_EndToEn:0310| Timeout expired [4] STUMPY UPDATE LOG SNIPPETS 06/08 06:02:35.487 DEBUG| ssh_host:0286| Running (ssh) '/usr/bin/update_engine_client --update --omaha_url=http://100.115.219.135:45779/update' ... 06/08 06:14:12.348 ERROR| utils:0298| [stderr] [0608/061411:INFO:update_engine_client.cc(224)] Update succeeded -- reboot needed. ... 06/08 06:17:22.141 DEBUG| ssh_host:0286| Running (ssh) '/usr/bin/update_engine_client --check_for_update --omaha_url=http://100.115.219.135:43124/update' ... 06/08 06:17:29.046 INFO |autoupdate_EndToEn:0274| Expecting event_result=1:success version=8530.96.0 event_type=14:download_finished previous_version=any, within 600 seconds 06/08 06:19:29.238 INFO |autoupdate_EndToEn:0374| Consumed new event: {u'event_result': '1', u'event_type': '14', u'track': 'stable-channel', u'timestamp': '2017-06-08 06:19:29', u'version': '8530.96.0', u'board': 'stumpy'} 06/08 06:19:29.239 INFO |autoupdate_EndToEn:0301| Event received after 120.1 seconds ... 06/08 06:21:51.283 DEBUG| ssh_host:0286| Running (ssh) '/usr/bin/update_engine_client --check_for_update --omaha_url=http://100.115.219.135:43124/update' ... 06/08 06:21:51.520 INFO |autoupdate_EndToEn:0274| Expecting event_result=2:success_reboot version=9630.0.0 event_type=3:update_complete previous_version=8530.96.0 OR event_result=1:success version=9630.0.0 event_type=54:rebooted_after_update previous_version=8530.96.0, within 900 seconds ... 06/08 06:22:00.262 INFO |autoupdate_EndToEn:0374| Consumed new event: {u'event_result': '1', u'event_type': '54', u'previous_version': '8530.96.0', u'track': 'stable-channel', u'timestamp': '2017-06-08 06:21:59', u'version': '9630.0.0', u'board': 'stumpy'} 06/08 06:22:00.263 INFO |autoupdate_EndToEn:0301| Event received after 8.7 seconds [5] pyro: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122255948-chromeos-test/chromeos2-row4-rack4-host1/debug/ [6] reef: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122207426-chromeos-test/chromeos2-row4-rack5-host3/debug/ [7] sand: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/122206647-chromeos-test/chromeos4-row7-rack11-host5/debug/
,
Jun 9 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/178afc7c3cf5b5d74c944328b62209b679a61677 commit 178afc7c3cf5b5d74c944328b62209b679a61677 Author: Grant Grundler <grundler@google.com> Date: Fri Jun 09 03:16:02 2017 paygen: increase EndToEnd timeout to 1200 seconds https://chromium-review.googlesource.com/c/502369/ adds O_DSYNC to update_engine and this is required to fix OOM issues with "small mem" (ie gale; 512MB RAM) devices. One side effect is this slows down write throughput. However, we want slower throughput on consumer devices since we don't want to affect other activities (like video playback or hangouts) during an update. While this is "suboptimal" for lab provisioning, we can add a "lab only" parameter to NOT use O_DSYNC in Chromium test lab. BUG= chromium:731214 TEST=./signer_payloads_client_unittest OK (skipped=2) Note: 2 network test(s) skipped; use --network to run them. Change-Id: Iec15fbd647b6483f6f303a7f7d748814edf9f532 Reviewed-on: https://chromium-review.googlesource.com/528629 Reviewed-by: Grant Grundler <grundler@chromium.org> Reviewed-by: Don Garrett <dgarrett@chromium.org> Tested-by: Grant Grundler <grundler@chromium.org> Trybot-Ready: Grant Grundler <grundler@chromium.org> Commit-Queue: Grant Grundler <grundler@chromium.org> [modify] https://crrev.com/178afc7c3cf5b5d74c944328b62209b679a61677/lib/paygen/signer_payloads_client.py
,
Jun 9 2017
,
Jun 9 2017
Other tests are still failing with this symptom: https://uberchromegw.corp.google.com/i/chromeos/builders/chell-release https://uberchromegw.corp.google.com/i/chromeos/builders/candy-release I changed the wrong place. jrbarnette pointed out: _WAIT_FOR_DOWNLOAD_COMPLETED_SECONDS = 10 * 60 in server/site_tests/autoupdate_EndToEndTest/autoupdate_EndToEndTest.py since that is associated with the failure message: 1675 chain.add_event( 1676 ExpectedUpdateEvent( 1677 event_type=EVENT_TYPE_DOWNLOAD_FINISHED, 1678 event_result=EVENT_RESULT_SUCCESS, 1679 version=source_release, 1680 on_error=self._error_download_finished), 1681 self._WAIT_FOR_DOWNLOAD_COMPLETED_SECONDS, 1682 on_timeout=self._timeout_err( 1683 'a download finished notification', 1684 self._WAIT_FOR_DOWNLOAD_COMPLETED_SECONDS, 1685 event_type=EVENT_TYPE_DOWNLOAD_FINISHED)) [Personally, I would have NOT approved this code just for it's coding style.] I've uploaded another CL to address this and would appreciate any help to land this since quite a few release builders (canaries) are red now due to this. https://chromium-review.googlesource.com/530054
,
Jun 10 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/f3c5dcd63105ae2f98a9dbb4da1cdbeacd49e832 commit f3c5dcd63105ae2f98a9dbb4da1cdbeacd49e832 Author: Grant Grundler <grundler@google.com> Date: Sat Jun 10 04:44:04 2017 autotest: increase "download" timeout to 1200 sec Timeouts started showing up once this CL landed: https://chromium-review.googlesource.com/c/502369/ CL502369 adds O_DSYNC to update_engine and this is required to fix OOM issues with "small mem" (ie gale; 512MB RAM) devices. OOM reports doubled while any new release was pushed. CL502369 has two side effects because writes are no longer asynchronous: 1) slightly slower write throughput 2) timeout now covers ALL data reaching media. (1) is primarily due to not "pipelining" preparing (uncompressing) data to be written and having to wait for the current write() to complete. (2) isn't as certain but I expect "download completed" event to be issued when the last write() completes. Thus if write() is asynchronous, the storage device is "slow" (perhaps even busy with other activity), quite a bit of data could still be in the linux kernel buffer cache when the "download finished" event is sent back to autotest. However, we _want_ slower throughput on consumer devices in order to avoid competing with other activities (like video playback or hangouts) during an update. While this is "suboptimal" for lab provisioning. We will have to follow up with additional changes improve update performance again in the Chrome OS Test lab. BUG= chromium:731214 TEST=none (will try to run cbuildbot - no promises) Change-Id: Ia0f7446d6639f06423de00b0650f6b2ce30383b7 Reviewed-on: https://chromium-review.googlesource.com/530054 Commit-Ready: Grant Grundler <grundler@chromium.org> Tested-by: Grant Grundler <grundler@chromium.org> Reviewed-by: Don Garrett <dgarrett@chromium.org> Reviewed-by: Grant Grundler <grundler@chromium.org> [modify] https://crrev.com/f3c5dcd63105ae2f98a9dbb4da1cdbeacd49e832/server/site_tests/autoupdate_EndToEndTest/autoupdate_EndToEndTest.py
,
Jun 27 2017
I'm going to claim victory for now. My changes plus recent changes from ahassani (use BLKDISCARD instead of BLKSECDISCARD) should help resolve most of this issue. If there are specific platforms still intermittently failing this test, I think it would be good to open a new bug specific to those platforms.
,
Sep 19 2017
|
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by mcchou@chromium.org
, Jun 8 2017