Eve failed canary build due to AU test failures
Issue description
Eve canary build has failed because of Paygen AU tests failing on the latest canary. Looks like the DUT did not come back up after applying the AU.
Chrome OS: 9675.0.0
Chrome: 61.0.3136.5
https://uberchromegw.corp.google.com/i/chromeos/builders/eve-release/builds/661/steps/PaygenTestDev/logs/stdio
autoupdate_EndToEndTest_paygen_au_dev_full_9675.0.0: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124595354
autoupdate_EndToEndTest_paygen_au_dev_delta_9675.0.0: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124595356
autoupdate_EndToEndTest_paygen_au_dev_delta_9608.0.0: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124595358
autoupdate_EndToEndTest_paygen_au_dev_full_9608.0.0: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124595359
The suite job has another 1:29:30.198128 till timeout.
The suite job has another 0:59:23.631959 till timeout.
The suite job has another 0:29:20.136746 till timeout.
The suite job has another -1 day, 23:59:17.263361 till timeout.
06-22-2017 [08:56:47] Suite job is finished.
Suite timed out. Started on 06-22-2017 [05:48:49], timed out on 06-22-2017 [08:56:47]
06-22-2017 [08:56:47] Start collecting test results and dump them to json.
Suite job [ FAILED ]
Suite job ABORT:
autoupdate_EndToEndTest.paygen_au_dev_full [ FAILED ]
autoupdate_EndToEndTest.paygen_au_dev_full ABORT: Host did not return from reboot
autoupdate_EndToEndTest.paygen_au_dev_delta [ PASSED ]
-----------
host: chromeos2-row4-rack9-host10, status: Running, locked: False diagnosis: Working
labels: ['board:eve', 'bluetooth', 'lightsensor', 'accel:cros-ec', 'arc', 'hw_video_acc_enc_h264', 'os:cros', 'hw_jpeg_acc_dec', 'power:battery', 'ec:cros', 'hw_video_acc_vp8', 'hw_video_acc_h264', 'servo', 'hw_video_acc_vp9', 'cts_abi_x86', 'cts_abi_arm', 'storage:mmc', 'webcam', 'eve', 'internal_display', 'audio_loopback_dongle', 'pool:bvt', 'cros-version:eve-release/R61-9675.0.0']
Last 10 jobs within 3:18:00:
60860638 Reset started on: 2017-06-22 08:19:59 status PASS
124930899 eve-release/R61-9675.0.0/paygen_au_dev/autoupdate_EndToEndTest_paygen_au_dev_full_9675.0.0 started on: 2017-06-22 07:46:33 status Completed
60860511 Reset started on: 2017-06-22 07:43:04 status PASS
60860481 Cleanup started on: 2017-06-22 07:39:03 status PASS
60860477 Reset started on: 2017-06-22 07:34:32 status FAIL
60860376 Reset started on: 2017-06-22 07:04:20 status PASS
60860201 Provision started on: 2017-06-22 06:16:27 status PASS
Reason: Some test(s) failed.
06-22-2017 [08:57:09] Output below this line is for buildbot consumption:
@@@STEP_LINK@[Test-Logs]: Suite job: ABORT@http://cautotest/tko/retrieve_logs.cgi?job=/results/124595121-chromeos-test/@@@
@@@STEP_LINK@[Flake-Dashboard]: Suite job@https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=Suite job@@@
@@@STEP_LINK@[Test-Logs]: autoupdate_EndToEndTest.paygen_au_dev_full: ABORT: Host did not return from reboot@http://cautotest/tko/retrieve_logs.cgi?job=/results/124595354-chromeos-test/@@@
@@@STEP_LINK@[Flake-Dashboard]: autoupdate_EndToEndTest.paygen_au_dev_full@https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=autoupdate_EndToEndTest.paygen_au_dev_full@@@
Will return from run_suite with status: ERROR
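The odd "-1 day, 23:59:17.263361 till timeout" line in the output above matches how Python prints a negative timedelta, which suggests the deadline had already passed by about 43 seconds when that line was logged. Below is a minimal sketch, assuming the remaining time is a plain datetime.timedelta rendered with str(); the variable names are illustrative and are not taken from run_suite.

```python
from datetime import datetime, timedelta

# Assumption: the "till timeout" value is a datetime.timedelta printed with str().
# About -42.74 seconds normalizes to days=-1 plus a positive remainder.
remaining = timedelta(seconds=-42, microseconds=-736639)
remaining_text = str(remaining)
print(remaining_text)  # -1 day, 23:59:17.263361

# Wall-clock time between the "Started on" and "timed out on" stamps in the log.
fmt = "%m-%d-%Y [%H:%M:%S]"
started = datetime.strptime("06-22-2017 [05:48:49]", fmt)
timed_out = datetime.strptime("06-22-2017 [08:56:47]", fmt)
elapsed = timed_out - started
print(elapsed)  # 3:07:58
```

The suite therefore ran for slightly more than three hours before it was reported as timed out.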
,
Jun 22 2017
It looks like eve has been failing one of the paygen suites for the last 5 builds (657 - 661).
On the latest build 661, it failed both PaygenTestCanary and PaygenTestDev.
The PaygenTestCanary failure is weird though:
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124591938
It shows that one of the suites was aborted, but if you go into the logs of that test run, the test completes successfully.
The same goes for 660: the builds page shows PaygenTestCanary failed, but when you open the suite they all passed:
https://uberchromegw.corp.google.com/i/chromeos/builders/eve-release/builds/660
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124508901
For 661 PaygenTestDev it shows that a bunch of the runs were aborted:
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124595121
But the logs show the test completed successfully and then showed this afterwards:
06/22 08:55:39.834 ERROR| logging_manager:0626| tko parser: {'aborted_by': 'autotest_system', 'job_started': 1498145874, 'parent_job_id': 124595121, 'user': 'chromeos-test', 'aborted_on': 1498146864, 'builds': "{'cros-version': 'eve-release/R61-9675.0.0'}", 'job_finished': 1498146872, 'hostname': 'chromeos2-row4-rack9-host3', 'status_version': 1, 'label': 'eve-release/R61-9675.0.0/paygen_au_dev/autoupdate_EndToEndTest_paygen_au_dev_full_9675.0.0', 'drone': 'chromeos-server3.hot.corp.google.com', 'build': 'eve-release/R61-9675.0.0', 'suite': 'paygen_au_dev', 'retry_original_job_id': 124595354, 'experimental': 'False', 'job_queued': 1498144842}
06/22 08:55:39.837 ERROR| logging_manager:0626| tko parser: MACHINE NAME: chromeos2-row4-rack9-host3
06/22 08:55:39.837 ERROR| logging_manager:0626| tko parser: MACHINE GROUP: eve
06/22 08:55:39.838 ERROR| logging_manager:0626| tko parser: parsing partial test ---- SERVER_JOB
06/22 08:55:39.838 ERROR| logging_manager:0626| tko parser: parsing partial test autoupdate_EndToEndTest.paygen_au_dev_full autoupdate_EndToEndTest.paygen_au_dev_full
06/22 08:55:39.839 ERROR| logging_manager:0626| tko parser: RUNNING: RUNNING
06/22 08:55:39.839 ERROR| logging_manager:0626| Subdir: autoupdate_EndToEndTest.paygen_au_dev_full
06/22 08:55:39.840 ERROR| logging_manager:0626| Testname: autoupdate_EndToEndTest.paygen_au_dev_full
06/22 08:55:39.840 ERROR| logging_manager:0626|
06/22 08:55:39.840 ERROR| logging_manager:0626| tko parser: Unexpected indent: aborting log parse
06/22 08:55:39.841 ERROR| logging_manager:0626| tko parser: parsing test autoupdate_EndToEndTest.paygen_au_dev_full autoupdate_EndToEndTest.paygen_au_dev_full
06/22 08:55:39.841 ERROR| logging_manager:0626| tko parser: ADD: ABORT
06/22 08:55:39.842 ERROR| logging_manager:0626| Subdir: autoupdate_EndToEndTest.paygen_au_dev_full
06/22 08:55:39.842 ERROR| logging_manager:0626| Testname: autoupdate_EndToEndTest.paygen_au_dev_full
06/22 08:55:39.842 ERROR| logging_manager:0626| None
06/22 08:55:39.843 ERROR| logging_manager:0626| tko parser: parsing test ---- SERVER_JOB
One of the suites seemed to legitimately fail (Host did not return):
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124595358
The rootfs was updated, then it applied stateful and it didn't return from reboot. This could be a one-off though, as none of the other builds have this failure.
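The epoch timestamps in the tko parser record quoted above can be turned into durations to see how long the job ran before autotest_system aborted it. This is a small sketch using only the four values from that record; the variable names are illustrative.

```python
from datetime import datetime, timezone

# Epoch values copied from the tko parser record quoted above.
job = {
    "job_queued": 1498144842,
    "job_started": 1498145874,
    "aborted_on": 1498146864,
    "job_finished": 1498146872,
}

queue_wait = job["job_started"] - job["job_queued"]        # seconds spent queued
ran_before_abort = job["aborted_on"] - job["job_started"]  # seconds running before the abort
abort_to_finish = job["job_finished"] - job["aborted_on"]  # seconds from abort to finish

print(queue_wait, ran_before_abort, abort_to_finish)  # 1032 990 8

# Abort time in UTC; the log lines themselves appear to use a local timezone.
aborted_utc = datetime.fromtimestamp(job["aborted_on"], tz=timezone.utc)
print(aborted_utc.isoformat())  # 2017-06-22T15:54:24+00:00
```

So the retried job ran for about 16.5 minutes and finished 8 seconds after the abort was recorded. If the lab logs are in US Pacific time (an assumption), the abort at 15:54:24 UTC corresponds to 08:54:24 local, shortly before the 08:55:39 parser lines.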
,
Jun 22 2017
So to me the tests seem to be doing their job, but something is marking them as aborted after they are done.
,
Jun 23 2017
,
Jun 23 2017
,
Jun 23 2017
I think this might cause many missing bvt tests?
https://screenshot.googleplex.com/zrG7cdMyTFq.png
Here is the link to the corresponding build status:
https://uberchromegw.corp.google.com/i/chromeos/builders/eve-release/builds/662
eve-release:
The PaygenTestCanary stage failed: (15, 'Received signal 15; shutting down')
The PaygenBuildCanary stage failed: (15, 'Received signal 15; shutting down')
The Paygen stage failed: : No output from <_BackgroundTask(_BackgroundTask-7:6:7:3, started)> for 8610 seconds
,
Jun 23 2017
+todd, let me know if you want a separate bug for the build failures and missing tests.
,
Jun 23 2017
Yes: separate bugs for anything not related to PaygenTest*.
From a quick read of #2, this sounds infra related. I see a pass at build 650 and believe there's a bunch of canary/dev dogfooders getting updates, so the failure is specific to test-related AU.
Finally, any chance this is related to crbug.com/689105? Gwendal, can you have a look at this?
,
Jun 23 2017
This isn't related. 689105 is a problem with AU. This is some infra problem.
On the latest eve build: https://uberchromegw.corp.google.com/i/chromeos/builders/eve-release/builds/664
PaygenTestCanary => All suites were aborted without running
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124811281
PaygenTestDev
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124813526
The first two were aborted without running.
The third was aborted during staging payloads in the test.
The fourth was aborted when it was starting the update.
,
Jun 23 2017
Re#9: There was a swarming proxy glitch this morning: https://viceroy.corp.google.com/chromeos/swarming_proxy?hostname=chromeos-server11&duration=1d&heatmap=False&host_name=chromeos-server2&refresh=90&topstreams=5#_VG_r8L_5ReW Looks like the suite job was aborted mid-flight, so I'm going to go find a slightly older job to see what's up.
,
Jun 24 2017
FYI, ketakid mentioned that this CL https://chromium-review.googlesource.com/#/c/546882/ is likely to blame and has been reverted.
,
Jun 24 2017
In that case ... (We were chasing this bug the whole day after all :) )
,
Jun 25 2017
#11, that CL and issue are not even closely related to the failures I see reported here. Reopening this.
,
Jun 25 2017
Thanks tfiga@. gwendal@, can you please take a look at this issue? This is a priority since the Eve canary has been failing AU tests for the last few days and is causing all dogfooders a lot of pain.
,
Jun 26 2017
I believe this issue is affecting many other boards like caroline.
,
Jun 26 2017
For caroline, R61-9678.0.0 is good, but R61-9679.0.0 does not boot. I guess that's a similar issue, though I realize this issue was reported on R61-9675.0.0.
,
Jun 26 2017
9685.0.0 Eve build has passed canary & dev paygen tests. https://uberchromegw.corp.google.com/i/chromeos/builders/eve-release/builds/671 Can't reach crosland to see if there's a specific CL to thank but will monitor next build which should finish in ~2hrs https://crosland.corp.google.com/log/9684.0.0..9685.0.0
,
Jun 26 2017
9686.0.0 passed as well. Still can't reach crosland but closing as it does look like it got addressed.
3:33:56: INFO: RunCommand: /b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpddER9m/tmpb08CJP/temp_summary.json --raw-cmd --task-name eve-release/R61-9686.0.0-paygen_au_canary --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:paygen_au_canary' '--tags=build:eve-release/R61-9686.0.0' '--tags=task_name:eve-release/R61-9686.0.0-paygen_au_canary' '--tags=board:eve' -- /usr/local/autotest/site_utils/run_suite.py --build eve-release/R61-9686.0.0 --board eve --suite_name paygen_au_canary --pool bvt --file_bugs True --priority Build --timeout_mins 180 --retry True --suite_min_duts 2 -m 125178255
00:18:50: INFO: Refreshing due to a 401 (attempt 1/2)
00:18:50: INFO: Refreshing access_token
Autotest instance: cautotest
06-25-2017 [23:33:54] Created suite job: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=125178255
The suite job has another 2:29:43.520500 till timeout.
The suite job has another 1:59:37.113590 till timeout.
06-26-2017 [00:50:13] Suite job is finished.
06-26-2017 [00:50:13] Start collecting test results and dump them to json.
Suite job [ PASSED ]
autoupdate_EndToEndTest.paygen_au_canary_delta [ PASSED ]
autoupdate_EndToEndTest.paygen_au_canary_full [ PASSED ]
autoupdate_EndToEndTest.paygen_au_canary_delta [ PASSED ]
autoupdate_EndToEndTest.paygen_au_canary_full [ PASSED ]
Suite timings:
Downloads started at 2017-06-25 23:33:50
Payload downloads ended at 2017-06-25 23:33:52
Suite started at 2017-06-25 23:34:16
Artifact downloads ended (at latest) at 2017-06-25 23:34:39
Testing started at 2017-06-25 23:53:24
Testing ended at 2017-06-26 00:44:34
Links to test logs:
Suite job http://cautotest/tko/retrieve_logs.cgi?job=/results/125178255-chromeos-test/
autoupdate_EndToEndTest.paygen_au_canary_delta http://cautotest/tko/retrieve_logs.cgi?job=/results/125178278-chromeos-test/
autoupdate_EndToEndTest.paygen_au_canary_full http://cautotest/tko/retrieve_logs.cgi?job=/results/125178279-chromeos-test/
autoupdate_EndToEndTest.paygen_au_canary_delta http://cautotest/tko/retrieve_logs.cgi?job=/results/125178280-chromeos-test/
autoupdate_EndToEndTest.paygen_au_canary_full http://cautotest/tko/retrieve_logs.cgi?job=/results/125178281-chromeos-test/
,
Jun 26 2017
What was the fix for this bug, please?
,
Jun 26 2017
Recap of the last several eve builds:
passed: 671, 672, 674
failed: 673
No diffs in crosland: https://crosland.corp.google.com/log/9684.0.0..9685.0.0 (build 670 -> 671)
Others have also mentioned similar failures on cave, chell, caroline:
https://uberchromegw.corp.google.com/i/chromeos/builders/cave-release/builds/1251
https://uberchromegw.corp.google.com/i/chromeos/builders/chell-release/builds/1218
https://uberchromegw.corp.google.com/i/chromeos/builders/caroline-release/builds/791
Reopening since it appears we're still chasing this.
,
Jun 26 2017
Counter-example crbug.com/736847 - successful eve-release build. So the eve failure could have been something else. But many other boards still fail (they are marked as 'running' but really have timed out, I think).
,
Jun 26 2017
,
Jun 27 2017
,
Jun 27 2017
,
Jun 27 2017
673 fails in PaygenTestCanary
674 passes
676 fails in HWtest
Since we had a passing build and the two problems are different, I believe this bug is fixed.
link: https://uberchromegw.corp.google.com/i/chromeos/builders/eve-release/builds/673
,
Jul 25 2017
Looks like this is fixed. Please reopen if it still fails. Thanks!
Comment 1 by dchan@chromium.org
, Jun 22 2017