
Issue 762400

Starred by 2 users

Issue metadata

Status: Duplicate
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: ----

Blocking:
issue 762641
issue 763981




autoupdate_EndToEndTest.paygen_au_dev_full flaky on multiple canary builds

Project Member Reported by yamaguchi@chromium.org, Sep 6 2017

Issue description

Builders failed on: 
  https://luci-milo.appspot.com/buildbot/chromeos/veyron_mickey-release/1466
autoupdate_EndToEndTest.paygen_au_dev_full      FAIL: The update appears to have completed successfully but we found a problem while verifying the hostlog of events returned from the update. Some attributes reported for the initial update check event are not what we expected: ['version']. The expected version is (9887.0.0) but reported version was (9914.0.0). The source payload we installed was probably incorrect or corrupt. Check the full hostlog for this update in the devserver_hostlog_rootfs file in the autoupdate_logs directory.

  https://luci-milo.appspot.com/buildbot/chromeos/monroe-release/2086
autoupdate_EndToEndTest.paygen_au_dev_delta     FAIL: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row13-rack10-host4: RootfsUpdateError: Failed to perform rootfs update: RootfsUpdateError('Update failed with unexpected update status: UPDATE_STATUS_IDLE',)

  https://uberchromegw.corp.google.com/i/chromeos/builders/celes-release/builds/1465/steps/PaygenTestCanary/logs/stdio
15:28:46: ERROR: pre-kill notification (SIGXCPU); traceback:
15:29:17: INFO: Translating result (15, 'Received signal 15; shutting down') to fail.


This might be different from Issue 730141, as these devices are not in the list in that issue's description.
 
Cc: yueherngl@chromium.org shuqianz@chromium.org josephsih@chromium.org
Labels: Pri-1
This is ongoing; in fact, eve-release hasn't had a successful build in a week.
Cc: seobrien@chromium.org

Comment 3 by osh...@chromium.org, Sep 12 2017

Blocking: 762641
Owner: xixuan@chromium.org
xixuan@, can you take a look? It seems provisioning failed to deploy the correct version.

Comment 5 by xixuan@chromium.org, Sep 12 2017

Cc: xixuan@chromium.org
Owner: dhadd...@chromium.org
@david, could you check this paygen_au_dev_full failure? Something seems wrong with verifying the hostlog.
Blocking: 763981
Labels: -Pri-1 Pri-0
This is causing Simple Chrome compile failures: there is no recent LATEST file for anything > 9914.0.0, and a breaking GN configuration change landed after 9914.0.0.

Given the importance placed on eve, this is blocking a lot of devs.

Escalating priority on this.

There are a few different issues in this bug. Regarding this failure:

FAIL: The update appears to have completed successfully but we found a problem while verifying the hostlog of events returned from the update. Some attributes reported for the initial update check event are not what we expected: ['version']. The expected version is (9887.0.0) but reported version was (9914.0.0). The source payload we installed was probably incorrect or corrupt. Check the full hostlog for this update in the devserver_hostlog_rootfs file in the autoupdate_logs directory.

This can happen when we update the rootfs from X -> Y and applying stateful then fails.
We retry the update to Y, but since the rootfs was already updated successfully, the retry goes from Y -> Y. We return the hostlog from this second update, which is not what the test expects. I have updated the error message for this case in the refactor CL:

https://chromium-review.googlesource.com/#/c/chromiumos/third_party/autotest/+/654064/
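
To make the X -> Y then Y -> Y case easier to spot, here is a minimal sketch of the check described above. It is not the autotest code from the CL; the function name and labels are hypothetical, and it assumes 9914.0.0 was the target build in the failure quoted above.

```python
def classify_initial_check(expected_source, target, reported):
    """Label the version reported by the initial update check event.

    Hypothetical helper: all arguments are plain version strings
    such as "9887.0.0".
    """
    if reported == expected_source:
        # The DUT was on the source build when the update started.
        return "ok"
    if reported == target:
        # The rootfs was already on the target build, which is what a
        # retry after a failed stateful update (Y -> Y) would look like.
        return "already_on_target"
    # Some other build was installed; the source payload may be wrong.
    return "unexpected_source"


# Versions from the failure above (target build is an assumption).
print(classify_initial_check("9887.0.0", "9914.0.0", "9914.0.0"))
```

With a label like "already_on_target", the error message could point at the retry rather than at a corrupt source payload.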
For eve though, the most common failure reason for paygen_au_canary and paygen_au_dev lately is this one:

Unhandled DevServerException: CrOS auto-update failed for host chromeos2-row4-rack1-host10: RootfsUpdateError: After update and reboot, update-engine failed to call chromeos-setgoodkernel within 120 seconds

This can happen if the update failed, or if the test failed to apply the source image. Checking the logs for eve...
Cc: dhadd...@chromium.org
Owner: ahass...@chromium.org
And this failure reason too:
Unhandled DevServerException: CrOS auto-update failed for host <DUT HOSTNAME>: SSHConnectionError: ssh: connect to host <DUT HOSTNAME> port 22: Connection timed out

Both of these failures occur because we try to apply the source image in the autoupdate test and the device doesn't come back up after we apply stateful. It then times out either on ssh or on waiting for chromeos-setgoodkernel, whichever comes first.
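
For triaging builder logs, the failure strings quoted in this bug could be grouped roughly like this. This is only a sketch: the table, labels, and function name are made up for illustration and are not part of autotest or the devserver.

```python
# Hypothetical mapping from substrings of the quoted error messages
# to the causes discussed in this bug.
SIGNATURES = [
    ("failed to call chromeos-setgoodkernel", "dut_did_not_recover"),
    ("SSHConnectionError", "dut_did_not_recover"),
    ("verifying the hostlog", "hostlog_version_mismatch"),
]


def triage(message):
    """Return a coarse label for a failure message, or "unclassified"."""
    for needle, label in SIGNATURES:
        if needle in message:
            return label
    return "unclassified"


print(triage("SSHConnectionError: ssh: connect to host <DUT HOSTNAME> "
             "port 22: Connection timed out"))
```

Both the ssh timeout and the chromeos-setgoodkernel timeout map to the same label, since they share the same root cause here. Messages not explained in this bug, such as UPDATE_STATUS_IDLE, stay "unclassified".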

So it seems like the test is behaving correctly and there is a problem with the eve images and/or stateful lately. 

Not sure who to assign this to. ahassani@, do you know if there were any changes related to eve here?

Cc: jkwang@chromium.org
Eve has been flaky since Aug 11. Looks like there was a change to the eve touch firmware in the first failed build: https://crosland.corp.google.com/log/9831.0.0..9832.0.0

Could this be a culprit, jkwang@?

Comment 11 by jkwang@google.com, Sep 13 2017

I can't think of any reason the fw update would trigger this.
look at: crbug.com/761259
Mergedinto: 761259
Status: Duplicate (was: Available)
One failure is fixed by #7 and the other is fixed by #12, so closing this.
Labels: FixedByAURewrite
