Mac FYI Experimental Retina Release (NVIDIA) on chromium.gpu.fyi infra failure.
Issue description
The failure started at build #797, with tests failing to run due to "not enough capacity" errors. Starting with build #802, some shards appear to get allocated, but the run still eventually fails with the same error and is not clearly marked as an infra failure. Here is an example of a failing swarming task: https://chromium-swarm.appspot.com/task?id=3e41bf0198940310&refresh=10&show_raw=1.
,
Jun 25 2018
Build #818 above has also failed, for the same shard #19. This is consistent across all failures so far.
,
Jun 26 2018
Hmm. Build 818 failed:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Experimental%20Retina%20Release%20%28NVIDIA%29/818
but build 819 passed with no significant (I assume) code changes:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Experimental%20Retina%20Release%20%28NVIDIA%29/819

The max pending time per shard is suspiciously close to 3 hours in both cases. Here's build 818's:
Max pending time: 2:52:56.604581 (shard #18)
and build 819's:
Max pending time: 2:59:17.252003 (shard #19)

Nodir, M-A, John, does the "execution_timeout_secs" timeout in src/infra/config/global/cr-buildbucket.cfg cover the maximum pending time as well as the potential per-shard execution time? Could it be the case that we've been adjusting the wrong timeout all along?
,
Jun 26 2018
No. The timeout you're hitting is here: https://codesearch.chromium.org/chromium/src/testing/buildbot/chromium.gpu.fyi.json?rcl=2dcc61258b623969b900fd38bca45c61d709a782&l=6949
,
Jun 26 2018
As for the max pending time: we may not be reporting it correctly in cases where a shard expired. Note that shard 19 on build 818 had a pending time of just over 3 hours: https://chromium-swarm.appspot.com/task?id=3e50831274358810
,
Jun 26 2018
#4: aha, thanks. Fix incoming.
,
Jun 26 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/a8c3e9c80c478d32a43f0bbe4423ffd43359152f

commit a8c3e9c80c478d32a43f0bbe4423ffd43359152f
Author: Kenneth Russell <kbr@chromium.org>
Date: Tue Jun 26 03:42:26 2018

Increase expiration time on Mac FYI Exp Release (NVIDIA).

webgl2_conformance_tests' shards are taking at least 3 hours to schedule, so increase the timeout to 6 hours.

Bug: 856268
Tbr: jbudorick@chromium.org
Change-Id: Idf610388955f4b6cb73b81f5075a810c13c2fda0
Reviewed-on: https://chromium-review.googlesource.com/1114356
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#570322}

[modify] https://crrev.com/a8c3e9c80c478d32a43f0bbe4423ffd43359152f/testing/buildbot/chromium.gpu.fyi.json
[modify] https://crrev.com/a8c3e9c80c478d32a43f0bbe4423ffd43359152f/testing/buildbot/waterfalls.pyl
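
For reference, the arithmetic behind the change can be checked with a short sketch. It takes the two "Max pending time" values reported earlier in this thread and compares them against a 3-hour and a 6-hour expiration. The 3-hour figure for the old limit is an assumption inferred from the discussion (the thread only says pending times were close to 3 hours and that the new limit is 6 hours), and the helper names `parse_pending` and `headroom` are hypothetical, not part of any Chromium tooling.

```python
from datetime import timedelta


def parse_pending(text: str) -> timedelta:
    """Parse an 'H:MM:SS.ffffff' pending-time string into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def headroom(pending_time: timedelta, expiration: timedelta) -> timedelta:
    """Time left before a task pending this long would expire."""
    return expiration - pending_time


# Assumption: the old expiration was 3 hours (inferred, not stated in the thread).
OLD_EXPIRATION = timedelta(hours=3)
# Stated in the commit message: the timeout is increased to 6 hours.
NEW_EXPIRATION = timedelta(hours=6)

# Max pending times quoted in this thread.
pending = {
    "build 818, shard #18": parse_pending("2:52:56.604581"),
    "build 819, shard #19": parse_pending("2:59:17.252003"),
}

for label, pending_time in pending.items():
    print(
        f"{label}: pending {pending_time}, "
        f"headroom at 3h {headroom(pending_time, OLD_EXPIRATION)}, "
        f"headroom at 6h {headroom(pending_time, NEW_EXPIRATION)}"
    )
```

Under that assumption, the shard in build 819 came within a minute of expiring, which is consistent with builds flipping between pass and fail without code changes.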
,
Jun 26 2018
I think this will be reliably fixed by the above change. Please reopen if not.
,
Jun 28 2018
Failures have started again on this bot over the last 3 builds. Here is an example: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20FYI%20Experimental%20Retina%20Release%20%28NVIDIA%29/833

All tests fail with the following error:

Missing results from the following shard(s): 0
This can happen in following cases:
* Test failed to start (missing *.dll/*.so dependency for example)
* Test crashed or hung
* Task expired because there are not enough bots available and are all used
* Swarming service experienced problems
Please examine logs to figure out what happened.
,
Jun 28 2018
Looks like the machine got upgraded to 10.13.5 and the test configs need to be retargeted.
,
Jun 28 2018
Sorry about breaking these bots while upgrading to 10.13.5, but that was fixed in Issue 857527 by ynovikov.
,
Jun 28 2018
Comment 1 by kbr@chromium.org, Jun 25 2018
Components: -Infra Internals>GPU>Testing Infra>Client>Chrome
Labels: GPU-NVidia OS-Mac
Owner: khushals...@chromium.org
Status: Assigned (was: Available)