AU tests time out when they need to powerwash after updating backwards
Issue description: https://wmatrix.googleplex.com/platform/paygen_au_canary?platforms=quawks This is only a problem when going back to the stepping stone build.
,
Jul 21 2017
Yup, that's what happened.
,
Jul 21 2017
It took over 10 minutes for the device to complete this process. We only give the reboot 8 minutes to complete: https://cs.corp.google.com/chromeos_public/chromite/lib/auto_updater.py?q=auto_updater.py&sq=package:%5Echromeos_(internal%7Cpublic)$&l=1317 This only happens on quawks devices. Does anybody know why this 8-minute number was chosen?
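To illustrate why an 8-minute limit fails here, below is a minimal sketch of a poll-until-deadline reboot wait. It is not the code in auto_updater.py: `wait_for_reboot`, `is_up`, and the injectable `clock`/`sleep` parameters are hypothetical names used only for this example.

```python
import time


def wait_for_reboot(is_up, timeout_s, poll_interval_s=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll is_up() until it returns True or timeout_s seconds elapse.

    Hypothetical helper for illustration only; the names and signature
    are assumptions, not the chromite API.
    Returns True if the device came back in time, False on timeout.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_up():
            return True
        sleep(poll_interval_s)
    return False
```

With a device that needs a little over 10 minutes to powerwash and come back, a call with `timeout_s=8 * 60` returns False even though the device would have recovered on its own a few minutes later.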
,
Jul 21 2017
+milestone owner
,
Jul 21 2017
Found one run on squawks too that failed for the same reason: https://wmatrix.googleplex.com/failures/paygen_au_stable?platforms=squawks&builds=R59-9460.74.0&releases=59
,
Jul 21 2017
,
Jul 21 2017
xixuan, Can you comment on #3?
,
Jul 24 2017
,
Jul 31 2017
,
Aug 2 2017
Ping xixuan :)
,
Aug 2 2017
Sorry, I forgot to reply to this. The timeout comes from the current reboot_timeout in autotest: https://cs.corp.google.com/chromeos_public/src/third_party/autotest/files/server/hosts/cros_host.py?q=cros_host.py&sq=package:%5Echromeos_(internal%7Cpublic)$&l=117 Based on the comments there, 8 minutes already seems long, and the previous value was 5 minutes. Now we want to set it to 10 minutes? Could we find out why the reboot takes so long? Once the timeout is extended, normal offline DUTs will wait much longer before timing out in provision, which increases the provision time cost.
,
Aug 3 2017
Yeah, 8 minutes is too long for a reboot IMO. The problem is when the device downgrades from a recent build to the stepping stone build (M53). The device needs to powerwash, and this takes a long time on certain boards. I can add an is_au_endtotest clause, but I wanted to check if there is a better way, since cros flash users would be hit by this too and would fail.
,
Aug 3 2017
Gwendal, do we care that this takes so long?
,
Sep 19 2017
Ping :)
,
Nov 30 2017
This has happened again on Ninja: https://cros-goldeneye.corp.google.com/chromeos/console/viewRelease?releaseName=M62-STABLE-CHROMEOS-4-NinjaOnly&disco=AAAABhw4cC8&ts=5a1f3f90&usp=comment_email_document Going back to the old 53 build, we powerwash the device, and it takes way too long to complete, so the test fails.
,
Nov 30 2017
,
May 10 2018
,
Dec 11
This is the reason for most of the AU failures on stable, and it causes multiple reruns and morning planner confusion. Since the lab now uses quick-provision, I think it is OK to increase the reboot timeout for the case where we update backwards. I'll do it.
,
Dec 12
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/d991b182666ed9a57bb42fa21b721847c1179daf

commit d991b182666ed9a57bb42fa21b721847c1179daf
Author: David Haddock <dhaddock@chromium.org>
Date: Wed Dec 12 21:55:53 2018

auto_updater: Increase reboot timeout after rootfs update for AU tests.

When we put old builds on devices as the source image in the AU tests it sometimes requires the device to powerwash. This powerwashing step is taking a long time on older devices and the test is timing out. This is the reason for 90% of AU test failures on stable channel. The timeout increase is only when we are doing an AU test and only for the post rootfs reboot.

BUG= chromium:746997
TEST=autoupdate_EndToEndTest on ninja from R53 -> R73

Change-Id: Ie1f472b680b2fbdb1fa5baa640556866d4f07d9e
Reviewed-on: https://chromium-review.googlesource.com/1372178
Commit-Ready: David Haddock <dhaddock@chromium.org>
Tested-by: David Haddock <dhaddock@chromium.org>
Reviewed-by: Amin Hassani <ahassani@chromium.org>

[modify] https://crrev.com/d991b182666ed9a57bb42fa21b721847c1179daf/lib/auto_updater.py
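The rule described in the commit message (a longer timeout only for AU tests, and only for the reboot after the rootfs update) could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual change to auto_updater.py: the function name, both flag names, and the 15-minute extended value are hypothetical. Only the 8-minute default comes from this thread.

```python
# 8 minutes is the reboot timeout discussed in this thread.
DEFAULT_REBOOT_TIMEOUT_S = 8 * 60
# Assumed placeholder value; the thread does not state the new timeout.
EXTENDED_REBOOT_TIMEOUT_S = 15 * 60


def pick_reboot_timeout(is_au_endtoendtest, is_post_rootfs_reboot):
    """Return the reboot timeout in seconds.

    Hypothetical helper: the extended timeout applies only when both
    conditions hold, so ordinary provisioning keeps the shorter wait
    and offline DUTs still fail fast.
    """
    if is_au_endtoendtest and is_post_rootfs_reboot:
        return EXTENDED_REBOOT_TIMEOUT_S
    return DEFAULT_REBOOT_TIMEOUT_S
```

Scoping the increase this way addresses the earlier concern in the thread that a global increase would make offline DUTs take longer to time out during provision.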
,
Dec 13
Issue 915028 has been merged into this issue.
,
Dec 13
,
Dec 13
This bug requires manual review: M72 has already been promoted to the beta branch. Please contact the milestone owner if you have questions. Owners: govind@(Android), kariahda@(iOS), djmm@(ChromeOS), abdulsyed@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 14
Merge approved for Chrome OS M71 and M72.
,
Dec 14
Actually this may just require a devserver push to take effect. I'll ask infra-discuss
,
Dec 14
Ping since this is blocking boards from getting tested and included in the M71 stable; thanks!
,
Dec 15
+aviv to update dev server.
,
Dec 17
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 18
Raising to a P0 since this is blocking boards from being pushed to M71 Chrome OS stable. Please advise on status and next steps. Thanks.
,
Dec 18
I believe the devservers were updated by xixuan@. If so, this should be owned by somebody on AU. -> dhaddock to confirm fix
,
Dec 18
Actually, push is apparently blocked by our staging lab outage (due to ganeti .hot outage). If it's a severe emergency we could consider an untested chromite push, but that carries risk of breaking the lab.
,
Dec 21
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 21
I did a full push on Dec 19, so the fix should already be deployed.
,
Jan 9
Comment 1 by dhadd...@chromium.org, Jul 20 2017