
Issue 736000

Starred by 3 users

Issue metadata

Status: Verified
Owner:
Closed: Jun 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 736534
issue 736807
issue 736991




Eve failed canary build due to AU test failures

Project Member Reported by keta...@chromium.org, Jun 22 2017

Issue description

The Eve canary build failed because the Paygen AU tests failed on the latest canary. It looks like the DUT did not come back up after applying the AU.

Chrome OS: 9675.0.0
Chrome: 61.0.3136.5


https://uberchromegw.corp.google.com/i/chromeos/builders/eve-release/builds/661/steps/PaygenTestDev/logs/stdio

  autoupdate_EndToEndTest_paygen_au_dev_full_9675.0.0: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124595354
  autoupdate_EndToEndTest_paygen_au_dev_delta_9675.0.0: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124595356
  autoupdate_EndToEndTest_paygen_au_dev_delta_9608.0.0: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124595358
  autoupdate_EndToEndTest_paygen_au_dev_full_9608.0.0: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124595359
  The suite job has another 1:29:30.198128 till timeout.
  The suite job has another 0:59:23.631959 till timeout.
  The suite job has another 0:29:20.136746 till timeout.
  The suite job has another -1 day, 23:59:17.263361 till timeout.
  06-22-2017 [08:56:47] Suite job is finished.
  Suite timed out. Started on 06-22-2017 [05:48:49], timed out on 06-22-2017 [08:56:47]
  06-22-2017 [08:56:47] Start collecting test results and dump them to json.
  Suite job                                     [ FAILED ]
  Suite job                                       ABORT: 
  autoupdate_EndToEndTest.paygen_au_dev_full    [ FAILED ]
  autoupdate_EndToEndTest.paygen_au_dev_full      ABORT: Host did not return from reboot
  autoupdate_EndToEndTest.paygen_au_dev_delta   [ PASSED ]
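
A note on the "-1 day, 23:59:17.263361 till timeout" line above: that is how Python's datetime.timedelta prints a small negative duration, so the suite was roughly 43 seconds past its deadline when that status line was written. A minimal sketch, assuming run_suite prints the remaining time as a timedelta (the variable names are illustrative and not taken from run_suite):

```python
from datetime import timedelta

# Hypothetical value: a deadline that passed 42.736639 seconds ago.
# timedelta normalizes a negative duration to "negative days plus a
# positive remainder", which is why the log shows "-1 day, 23:59:...".
remaining = timedelta(seconds=-43, microseconds=263361)

rendered = str(remaining)
overdue_seconds = -remaining.total_seconds()

print(rendered)         # -1 day, 23:59:17.263361
print(overdue_seconds)  # about 42.74 seconds past the deadline
```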


-----------

  host: chromeos2-row4-rack9-host10, status: Running, locked: False diagnosis: Working
  labels: ['board:eve', 'bluetooth', 'lightsensor', 'accel:cros-ec', 'arc', 'hw_video_acc_enc_h264', 'os:cros', 'hw_jpeg_acc_dec', 'power:battery', 'ec:cros', 'hw_video_acc_vp8', 'hw_video_acc_h264', 'servo', 'hw_video_acc_vp9', 'cts_abi_x86', 'cts_abi_arm', 'storage:mmc', 'webcam', 'eve', 'internal_display', 'audio_loopback_dongle', 'pool:bvt', 'cros-version:eve-release/R61-9675.0.0']
  Last 10 jobs within 3:18:00:
  60860638 Reset started on: 2017-06-22 08:19:59 status PASS
  124930899 eve-release/R61-9675.0.0/paygen_au_dev/autoupdate_EndToEndTest_paygen_au_dev_full_9675.0.0 started on: 2017-06-22 07:46:33 status Completed
  60860511 Reset started on: 2017-06-22 07:43:04 status PASS
  60860481 Cleanup started on: 2017-06-22 07:39:03 status PASS
  60860477 Reset started on: 2017-06-22 07:34:32 status FAIL
  60860376 Reset started on: 2017-06-22 07:04:20 status PASS
  60860201 Provision started on: 2017-06-22 06:16:27 status PASS
  
  Reason: Some test(s) failed.
  
   06-22-2017 [08:57:09] Output below this line is for buildbot consumption:
  @@@STEP_LINK@[Test-Logs]: Suite job: ABORT@http://cautotest/tko/retrieve_logs.cgi?job=/results/124595121-chromeos-test/@@@
  @@@STEP_LINK@[Flake-Dashboard]: Suite job@https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=Suite job@@@
  @@@STEP_LINK@[Test-Logs]: autoupdate_EndToEndTest.paygen_au_dev_full: ABORT: Host did not return from reboot@http://cautotest/tko/retrieve_logs.cgi?job=/results/124595354-chromeos-test/@@@
  @@@STEP_LINK@[Flake-Dashboard]: autoupdate_EndToEndTest.paygen_au_dev_full@https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=autoupdate_EndToEndTest.paygen_au_dev_full@@@
  Will return from run_suite with status: ERROR
 

Comment 1 by dchan@chromium.org, Jun 22 2017

Cc: dhadd...@chromium.org
It looks like eve has been failing one of the paygen suites for the last 5 builds (657 - 661)

On the latest build 661, it failed both PaygenTestCanary and PaygenTestDev

The PaygenTestCanary failure is weird though:
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124591938

It shows that one of the suites was aborted, but if you go into the logs of that test run, the test completes successfully.

The same goes for 660: the builds page shows PaygenTestCanary failed, but when you open the suite they all passed:
https://uberchromegw.corp.google.com/i/chromeos/builders/eve-release/builds/660
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124508901

For 661 PaygenTestDev it shows that a bunch of the runs were aborted:
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124595121

But the logs show the test completed successfully and then showed this afterwards:

06/22 08:55:39.834 ERROR|   logging_manager:0626| tko parser: {'aborted_by': 'autotest_system', 'job_started': 1498145874, 'parent_job_id': 124595121, 'user': 'chromeos-test', 'aborted_on': 1498146864, 'builds': "{'cros-version': 'eve-release/R61-9675.0.0'}", 'job_finished': 1498146872, 'hostname': 'chromeos2-row4-rack9-host3', 'status_version': 1, 'label': 'eve-release/R61-9675.0.0/paygen_au_dev/autoupdate_EndToEndTest_paygen_au_dev_full_9675.0.0', 'drone': 'chromeos-server3.hot.corp.google.com', 'build': 'eve-release/R61-9675.0.0', 'suite': 'paygen_au_dev', 'retry_original_job_id': 124595354, 'experimental': 'False', 'job_queued': 1498144842}
06/22 08:55:39.837 ERROR|   logging_manager:0626| tko parser: MACHINE NAME: chromeos2-row4-rack9-host3
06/22 08:55:39.837 ERROR|   logging_manager:0626| tko parser: MACHINE GROUP: eve
06/22 08:55:39.838 ERROR|   logging_manager:0626| tko parser: parsing partial test ---- SERVER_JOB
06/22 08:55:39.838 ERROR|   logging_manager:0626| tko parser: parsing partial test autoupdate_EndToEndTest.paygen_au_dev_full autoupdate_EndToEndTest.paygen_au_dev_full
06/22 08:55:39.839 ERROR|   logging_manager:0626| tko parser: RUNNING: RUNNING
06/22 08:55:39.839 ERROR|   logging_manager:0626| Subdir: autoupdate_EndToEndTest.paygen_au_dev_full
06/22 08:55:39.840 ERROR|   logging_manager:0626| Testname: autoupdate_EndToEndTest.paygen_au_dev_full
06/22 08:55:39.840 ERROR|   logging_manager:0626| 
06/22 08:55:39.840 ERROR|   logging_manager:0626| tko parser: Unexpected indent: aborting log parse
06/22 08:55:39.841 ERROR|   logging_manager:0626| tko parser: parsing test autoupdate_EndToEndTest.paygen_au_dev_full autoupdate_EndToEndTest.paygen_au_dev_full
06/22 08:55:39.841 ERROR|   logging_manager:0626| tko parser: ADD: ABORT
06/22 08:55:39.842 ERROR|   logging_manager:0626| Subdir: autoupdate_EndToEndTest.paygen_au_dev_full
06/22 08:55:39.842 ERROR|   logging_manager:0626| Testname: autoupdate_EndToEndTest.paygen_au_dev_full
06/22 08:55:39.842 ERROR|   logging_manager:0626| None
06/22 08:55:39.843 ERROR|   logging_manager:0626| tko parser: parsing test ---- SERVER_JOB
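
The epoch fields in the first tko parser line above can be turned into readable times and intervals. A minimal sketch, using only the four timestamp values copied from that line; the variable and function names are illustrative and this is not part of the tko parser:

```python
from datetime import datetime, timezone

# Values copied from the 'tko parser' dictionary in the log above.
job_keyvals = {
    "job_queued": 1498144842,
    "job_started": 1498145874,
    "aborted_on": 1498146864,
    "job_finished": 1498146872,
}


def to_utc(epoch_seconds):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Intervals in seconds between the recorded events.
queue_wait = job_keyvals["job_started"] - job_keyvals["job_queued"]
ran_before_abort = job_keyvals["aborted_on"] - job_keyvals["job_started"]
abort_to_finish = job_keyvals["job_finished"] - job_keyvals["aborted_on"]

for name in ("job_queued", "job_started", "aborted_on", "job_finished"):
    print(name, to_utc(job_keyvals[name]).isoformat())

print("queued -> started:", queue_wait, "s")
print("started -> aborted:", ran_before_abort, "s")
print("aborted -> finished:", abort_to_finish, "s")
```

With these values the abort is recorded 990 seconds after the job started and 8 seconds before job_finished.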


One of the suites seemed to legitimately fail (Host did not return)
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124595358

The rootfs was updated, then it applied stateful, and it didn't return from reboot.
This could be a one-off, though, as none of the other builds have this failure.
So to me the test seems to be doing its job, but something is marking the runs as aborted after they are done.

Comment 4 by dchan@google.com, Jun 23 2017

Labels: M-61

Comment 5 by dchan@google.com, Jun 23 2017

Labels: Proj-eve

Comment 6 by dchan@google.com, Jun 23 2017


I think this might be causing many missing bvt tests?
https://screenshot.googleplex.com/zrG7cdMyTFq.png

Here is the link to the corresponding build status: https://uberchromegw.corp.google.com/i/chromeos/builders/eve-release/builds/662

eve-release: The PaygenTestCanary stage failed: (15, 'Received signal 15; shutting down') The PaygenBuildCanary stage failed: (15, 'Received signal 15; shutting down') The Paygen stage failed: : No output from <_BackgroundTask(_BackgroundTask-7:6:7:3, started)> for 8610 seconds 

Comment 7 by dchan@google.com, Jun 23 2017

Cc: keta...@chromium.org kmshelton@chromium.org josa...@chromium.org
+todd, let me know if you want a separate bug for the build failures and missing tests.

Comment 8 by tbroch@chromium.org, Jun 23 2017

Cc: tbroch@chromium.org
Owner: gwendal@chromium.org
Status: Assigned (was: Untriaged)
Yes: separate bugs for anything not related to PaygenTest*

From a quick read of #2, this sounds infra related. I see a pass at build 650, and I believe there's a bunch of canary/dev dogfooders getting updates, so the failure is specific to test-related AU.

Finally, any chance this is related to crbug.com/689105?

Gwendal, can you have a look at this?
This isn't related. 689105 is a problem with AU. This is some infra problem.

On the latest eve build:
https://uberchromegw.corp.google.com/i/chromeos/builders/eve-release/builds/664

PaygenTestCanary => All suites were aborted without running 
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124811281

PaygenTestDev
http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=124813526

The first two were aborted without running.
The third was aborted while staging payloads in the test.
The fourth was aborted when it was starting the update.
Re#9: There was a swarming proxy glitch this morning:
https://viceroy.corp.google.com/chromeos/swarming_proxy?hostname=chromeos-server11&duration=1d&heatmap=False&host_name=chromeos-server2&refresh=90&topstreams=5#_VG_r8L_5ReW

Looks like the suite job was aborted mid-flight, so I'm going to go find a slightly older job to see what's up.
FYI, ketakid mentioned that this CL

https://chromium-review.googlesource.com/#/c/546882/

is likely to blame and has been reverted.
Mergedinto: 736462
Status: Duplicate (was: Assigned)
In that case ...

(We were chasing this bug the whole day after all :) )

Comment 13 by tfiga@chromium.org, Jun 25 2017

Status: Assigned (was: Duplicate)
#11, that CL and issue are not even closely related to the failures I see reported here. Reopening this.
Thanks tfiga@. gwendal@, can you please take a look at this issue? This is a priority since the Eve canary has been failing AU tests for the last few days and is causing all dogfooders a lot of pain.

Comment 15 by nya@chromium.org, Jun 26 2017

Cc: levarum@chromium.org nya@chromium.org
I believe this issue is affecting many other boards like caroline.

Comment 16 by nya@chromium.org, Jun 26 2017

For caroline, R61-9678.0.0 is good, but R61-9679.0.0 does not boot. I guess that's a similar issue, though I realized this issue was reported on R61-9675.0.0.

9685.0.0 Eve build has passed canary & dev paygen tests.

https://uberchromegw.corp.google.com/i/chromeos/builders/eve-release/builds/671

Can't reach crosland to see if there's a specific CL to thank, but I will monitor the next build, which should finish in ~2 hrs.


https://crosland.corp.google.com/log/9684.0.0..9685.0.0


Status: Fixed (was: Assigned)
9686.0.0 passed as well.  Still can't reach crosland but closing as it does look like it got addressed.

3:33:56: INFO: RunCommand: /b/c/cbuild/repository/chromite/third_party/swarming.client/swarming.py run --swarming chromeos-proxy.appspot.com --task-summary-json /tmp/cbuildbot-tmpddER9m/tmpb08CJP/temp_summary.json --raw-cmd --task-name eve-release/R61-9686.0.0-paygen_au_canary --dimension os Ubuntu-14.04 --dimension pool default --print-status-updates --timeout 14400 --io-timeout 14400 --hard-timeout 14400 --expiration 1200 '--tags=priority:Build' '--tags=suite:paygen_au_canary' '--tags=build:eve-release/R61-9686.0.0' '--tags=task_name:eve-release/R61-9686.0.0-paygen_au_canary' '--tags=board:eve' -- /usr/local/autotest/site_utils/run_suite.py --build eve-release/R61-9686.0.0 --board eve --suite_name paygen_au_canary --pool bvt --file_bugs True --priority Build --timeout_mins 180 --retry True --suite_min_duts 2 -m 125178255
00:18:50: INFO: Refreshing due to a 401 (attempt 1/2)
00:18:50: INFO: Refreshing access_token
Autotest instance: cautotest
06-25-2017 [23:33:54] Created suite job: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=125178255
The suite job has another 2:29:43.520500 till timeout.
The suite job has another 1:59:37.113590 till timeout.
06-26-2017 [00:50:13] Suite job is finished.
06-26-2017 [00:50:13] Start collecting test results and dump them to json.
Suite job                                        [ PASSED ]
autoupdate_EndToEndTest.paygen_au_canary_delta   [ PASSED ]
autoupdate_EndToEndTest.paygen_au_canary_full    [ PASSED ]
autoupdate_EndToEndTest.paygen_au_canary_delta   [ PASSED ]
autoupdate_EndToEndTest.paygen_au_canary_full    [ PASSED ]
Suite timings:
Downloads started at 2017-06-25 23:33:50
Payload downloads ended at 2017-06-25 23:33:52
Suite started at 2017-06-25 23:34:16
Artifact downloads ended (at latest) at 2017-06-25 23:34:39
Testing started at 2017-06-25 23:53:24
Testing ended at 2017-06-26 00:44:34
Links to test logs:
Suite job http://cautotest/tko/retrieve_logs.cgi?job=/results/125178255-chromeos-test/
autoupdate_EndToEndTest.paygen_au_canary_delta http://cautotest/tko/retrieve_logs.cgi?job=/results/125178278-chromeos-test/
autoupdate_EndToEndTest.paygen_au_canary_full http://cautotest/tko/retrieve_logs.cgi?job=/results/125178279-chromeos-test/
autoupdate_EndToEndTest.paygen_au_canary_delta http://cautotest/tko/retrieve_logs.cgi?job=/results/125178280-chromeos-test/
autoupdate_EndToEndTest.paygen_au_canary_full http://cautotest/tko/retrieve_logs.cgi?job=/results/125178281-chromeos-test/
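
The phase durations implied by the "Suite timings" lines above can be worked out directly. A minimal sketch, using four of the timestamps copied from that section; the dictionary keys and variable names are illustrative and not run_suite output fields:

```python
from datetime import datetime

# Timestamps copied from the "Suite timings" section above.
timings = {
    "downloads_started": "2017-06-25 23:33:50",
    "suite_started": "2017-06-25 23:34:16",
    "testing_started": "2017-06-25 23:53:24",
    "testing_ended": "2017-06-26 00:44:34",
}

parsed = {
    name: datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    for name, stamp in timings.items()
}

# Subtracting datetimes yields timedelta objects.
wait_for_testing = parsed["testing_started"] - parsed["suite_started"]
testing_time = parsed["testing_ended"] - parsed["testing_started"]
end_to_end = parsed["testing_ended"] - parsed["downloads_started"]

print("suite start -> testing start:", wait_for_testing)
print("testing:", testing_time)
print("downloads start -> testing end:", end_to_end)
```

For this run, testing took about 51 minutes, well inside the 180 minutes given by --timeout_mins in the run_suite.py command above.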

Comment 19 by sjg@google.com, Jun 26 2017

What was the fix for this bug, please?
Status: Assigned (was: Fixed)
Recap of the last several eve builds:

passed: 671, 672, 674
failed: 673

No diffs in crosland: https://crosland.corp.google.com/log/9684.0.0..9685.0.0 (build 670 -> 671)

Others also mentioned similar failures on cave, chell, and caroline:

https://uberchromegw.corp.google.com/i/chromeos/builders/cave-release/builds/1251
https://uberchromegw.corp.google.com/i/chromeos/builders/chell-release/builds/1218
https://uberchromegw.corp.google.com/i/chromeos/builders/caroline-release/builds/791

Reopening since it appears we're still chasing this.

Comment 21 by sjg@google.com, Jun 26 2017

Counter-example: crbug.com/736847, a successful eve-release build. So the eve failure could have been something else.

But many other boards still fail (they are marked as 'running' but really have timed out, I think).

Comment 22 by sjg@google.com, Jun 26 2017

Blockedon: 736807

Comment 23 by sjg@google.com, Jun 27 2017

Blockedon: 736991

Comment 24 by sjg@google.com, Jun 27 2017

Blockedon: 736534

Comment 25 by sjg@google.com, Jun 27 2017

Status: Fixed (was: Assigned)
673 fails in PaygenTestCanary
674 passes
676 fails in HWtest

Since we had a passing build and the two problems are different, I believe this bug is fixed.

link:
https://uberchromegw.corp.google.com/i/chromeos/builders/eve-release/builds/673

Status: Verified (was: Fixed)
Looks fixed. Please reopen if it still fails. Thanks!
