mac_chromium_rel_ng long pending times
Issue description
Not sure what's going on, but there are big pending times.
,
Nov 26
Looked at a random log, saw this at the bottom. Looks suspicious:
Total duration: 36000.9s
Results from some shards are missing: 9
WARNING:root:collect_cmd had non-zero return code: 1
WARNING:root:Expected output.json file missing: set(['/b/s/w/ir/tmp/t/tmpdWo5xz/1/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/5/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/3/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/2/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/9/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/7/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/6/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/11/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/8/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/10/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/4/output.json'])
Found: []
Expected: ['/b/s/w/ir/tmp/t/tmpdWo5xz/1/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/10/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/11/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/2/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/3/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/4/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/5/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/6/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/7/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/8/output.json', '/b/s/w/ir/tmp/t/tmpdWo5xz/9/output.json']
WARNING:root:No shard json files found in task_output_dir: '/b/s/w/ir/tmp/t/tmpdWo5xz'
Found ['/b/s/w/ir/tmp/t/tmpdWo5xz/1', '/b/s/w/ir/tmp/t/tmpdWo5xz/10', '/b/s/w/ir/tmp/t/tmpdWo5xz/11', '/b/s/w/ir/tmp/t/tmpdWo5xz/2', '/b/s/w/ir/tmp/t/tmpdWo5xz/3', '/b/s/w/ir/tmp/t/tmpdWo5xz/4', '/b/s/w/ir/tmp/t/tmpdWo5xz/5', '/b/s/w/ir/tmp/t/tmpdWo5xz/6', '/b/s/w/ir/tmp/t/tmpdWo5xz/7', '/b/s/w/ir/tmp/t/tmpdWo5xz/8', '/b/s/w/ir/tmp/t/tmpdWo5xz/9', '/b/s/w/ir/tmp/t/tmpdWo5xz/summary.json']
Running ['/b/s/w/ir/cache/vpython/5b0713/bin/python', '/b/s/w/ir/cache/builder/src/third_party/blink/tools/merge_web_test_results.py', '--build-properties', '{"attempt_start_ts": 1543254523000000, "blamelist": ["xidachen@chromium.org"], "bot_id": "vm49-m1", "buildbucket": {"build": {"bucket": "luci.chromium.try", "created_by": "user:5071639625-1lppvbtck1morgivc6sq4dul7klu27sd@developer.gserviceaccount.com", "created_ts": 1543254539868255, "id": "8928774043562122880", "project": "chromium", "tags": ["builder:mac_chromium_rel_ng", "buildset:patch/gerrit/chromium-review.googlesource.com/1350331/1", "cq_experimental:false", "user_agent:cq"]}, "hostname": "cr-buildbucket.appspot.com"}, "buildername": "mac_chromium_rel_ng", "buildnumber": 193057, "category": "cq", "got_angle_revision": "15992bef28d84b59c1a815483519347896f185c8", "got_buildtools_revision": "04161ec8d7c781e4498c699254c69ba0dd959fde", "got_dawn_revision": "63997221d7d880d8d1783abe326b90cd95cd92d2", "got_nacl_revision": "f701a90597fc85979319447c0cd44c3b52201c78", "got_revision": "4671e20b1c076d3674a6d8bdc3a510b1c30578ba", "got_revision_cp": "refs/heads/master@{#610876}", "got_swarming_client_revision": "b6e9e23e4e79249bd4f95735205ffb7c3f9f0912", "got_v8_revision": "89124cf99ef9a852bdf0681599cfcc29193a4c79", "got_v8_revision_cp": "refs/heads/7.2.470@{#1}", "got_webrtc_revision": "f1c194decd51a63ba923349da96fcd9cb6dae35a", "got_webrtc_revision_cp": "refs/heads/master@{#25777}", "mastername": "tryserver.chromium.mac", "patch_gerrit_url": "https://chromium-review.googlesource.com", "patch_issue": 1350331, "patch_project": "chromium/src", "patch_ref": "refs/changes/31/1350331/1", "patch_repository_url": "https://chromium.googlesource.com/chromium/src", "patch_set": 1, "patch_storage": "gerrit", "path_config": "generic", "reason": "CQ", "recipe": "chromium_trybot", "repository": "https://chromium.googlesource.com/chromium/src", "revision": "HEAD"}', '--summary-json', '/b/s/w/ir/tmp/t/tmpdWo5xz/summary.json', 
'--task-output-dir', '/b/s/w/ir/tmp/t/tmpdWo5xz', u'--verbose', '-o', '/b/s/w/ir/tmp/t/tmpo7bHYu.json'] in None (env: None)
2018-11-26 11:51:09,006 - blinkpy.common.system.log_utils: [DEBUG] Debug logging enabled.
2018-11-26 11:51:09,006 - root: [INFO] Running with isolated arguments
Traceback (most recent call last):
File "/b/s/w/ir/cache/builder/src/third_party/blink/tools/merge_web_test_results.py", line 12, in <module>
main(sys.argv[1:])
File "/b/s/w/ir/cache/builder/src/third_party/blink/tools/blinkpy/web_tests/merge_results.py", line 775, in main
assert args.positional
AssertionError
Command ['/b/s/w/ir/cache/vpython/5b0713/bin/python', '/b/s/w/ir/cache/builder/src/third_party/blink/tools/merge_web_test_results.py', '--build-properties', '{"attempt_start_ts": 1543254523000000, "blamelist": ["xidachen@chromium.org"], "bot_id": "vm49-m1", "buildbucket": {"build": {"bucket": "luci.chromium.try", "created_by": "user:5071639625-1lppvbtck1morgivc6sq4dul7klu27sd@developer.gserviceaccount.com", "created_ts": 1543254539868255, "id": "8928774043562122880", "project": "chromium", "tags": ["builder:mac_chromium_rel_ng", "buildset:patch/gerrit/chromium-review.googlesource.com/1350331/1", "cq_experimental:false", "user_agent:cq"]}, "hostname": "cr-buildbucket.appspot.com"}, "buildername": "mac_chromium_rel_ng", "buildnumber": 193057, "category": "cq", "got_angle_revision": "15992bef28d84b59c1a815483519347896f185c8", "got_buildtools_revision": "04161ec8d7c781e4498c699254c69ba0dd959fde", "got_dawn_revision": "63997221d7d880d8d1783abe326b90cd95cd92d2", "got_nacl_revision": "f701a90597fc85979319447c0cd44c3b52201c78", "got_revision": "4671e20b1c076d3674a6d8bdc3a510b1c30578ba", "got_revision_cp": "refs/heads/master@{#610876}", "got_swarming_client_revision": "b6e9e23e4e79249bd4f95735205ffb7c3f9f0912", "got_v8_revision": "89124cf99ef9a852bdf0681599cfcc29193a4c79", "got_v8_revision_cp": "refs/heads/7.2.470@{#1}", "got_webrtc_revision": "f1c194decd51a63ba923349da96fcd9cb6dae35a", "got_webrtc_revision_cp": "refs/heads/master@{#25777}", "mastername": "tryserver.chromium.mac", "patch_gerrit_url": "https://chromium-review.googlesource.com", "patch_issue": 1350331, "patch_project": "chromium/src", "patch_ref": "refs/changes/31/1350331/1", "patch_repository_url": "https://chromium.googlesource.com/chromium/src", "patch_set": 1, "patch_storage": "gerrit", "path_config": "generic", "reason": "CQ", "recipe": "chromium_trybot", "repository": "https://chromium.googlesource.com/chromium/src", "revision": "HEAD"}', '--summary-json', '/b/s/w/ir/tmp/t/tmpdWo5xz/summary.json', 
'--task-output-dir', '/b/s/w/ir/tmp/t/tmpdWo5xz', u'--verbose', '-o', '/b/s/w/ir/tmp/t/tmpo7bHYu.json'] returned exit code 1
WARNING:root:merge_cmd had non-zero return code: 1
step returned non-zero exit code: 1
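The warnings in this log boil down to a set difference between the per-shard output.json files the collect step expected and the ones it actually found. A minimal sketch of that check, assuming the directory layout shown in the log (one numbered subdirectory per shard). The helper name `missing_shard_outputs` is hypothetical and this is not the real collect script:

```python
import os
import tempfile


def missing_shard_outputs(task_output_dir, shard_indices):
    """Return the expected per-shard output.json paths that do not exist.

    Hypothetical helper for illustration only; not the real collect script.
    """
    expected = [
        os.path.join(task_output_dir, str(i), "output.json")
        for i in shard_indices
    ]
    return sorted(p for p in expected if not os.path.isfile(p))


if __name__ == "__main__":
    # Recreate the situation in the log: the shard directories exist,
    # but none of them contains an output.json.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(1, 12):
            os.mkdir(os.path.join(tmp, str(i)))
        missing = missing_shard_outputs(tmp, range(1, 12))
        print("%d of 11 expected output.json files are missing" % len(missing))
```

An empty "Found" list next to existing shard directories, as in the log, suggests the shards never wrote results at all, which is consistent with tasks that timed out or expired rather than with a problem in the merge script itself.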
,
Nov 26
OK, that seems wrong, but it looks like a false failure. Looks like layout test tasks are taking forever. I think it's because some shards are trying to run every test. For example, look at build https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/193077. In that build, a few shards time out and some succeed. The successful ones (a random example is https://chromium-swarm.appspot.com/user/task/416a1117f7eba410) run about 6,000 tests each (the sample runs 6552). Another task in that build, https://chromium-swarm.appspot.com/task?id=416a113187be3310&refresh=10&show_raw=1, tries to run basically every test. This is in its log: 10:53:28.671 27362 Found 89747 tests; running 78981, skipping 10766. Not sure why, but that seems very likely to cause these issues.
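A quick sanity check of the numbers quoted from the two task logs, as a rough sketch. The function name `work_ratio` is made up for illustration, and the comparison assumes per-test cost is roughly uniform, which the logs do not confirm:

```python
def work_ratio(tests_in_bad_task, tests_in_good_task):
    """How many times more tests the bad task runs than a healthy shard."""
    return tests_in_bad_task / tests_in_good_task


# Numbers taken from the log line and the sample shard quoted in this comment.
FOUND, RUNNING, SKIPPING = 89747, 78981, 10766
HEALTHY_SHARD_TESTS = 6552

if __name__ == "__main__":
    # The log line is internally consistent: running + skipping == found.
    print("consistent:", RUNNING + SKIPPING == FOUND)
    # The bad task attempts roughly twelve healthy shards' worth of tests.
    print("ratio: %.2f" % work_ratio(RUNNING, HEALTHY_SHARD_TESTS))
```

If a healthy shard finishes in a reasonable time, a task doing roughly twelve times the work would plausibly hit the task timeout, which matches the timed-out shards seen in that build.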
,
Nov 26
cc-ing some blink people. I glanced through the git log of https://cs.chromium.org/chromium/src/third_party/blink/tools/blinkpy/?q=run_webkit_tests.py&sq=package:chromium&dr, but didn't see anything suspicious.
,
Nov 26
Found the bug. https://cs.chromium.org/chromium/infra/luci/client/swarming.py?type=cs&q=GTEST_TOTAL_SHARDS+file:%5Einfra/luci/+package:%5Echromium$&g=0&l=245 Looks like gtest sharding environment variables aren't being set for all task slices.
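To illustrate why missing sharding variables make a task run everything: googletest-style runners read GTEST_TOTAL_SHARDS and GTEST_SHARD_INDEX from the environment, and a runner that sees neither has no way to know it is one shard of many. The sketch below is a simplified model; the round-robin split and the function name `select_tests_for_shard` are assumptions for illustration, not the actual layout test runner's algorithm:

```python
def select_tests_for_shard(tests, env):
    """Pick the subset of tests this shard should run.

    Simplified model: when the sharding variables are absent, fall back
    to a single shard, which means running every test.
    """
    total = int(env.get("GTEST_TOTAL_SHARDS", "1"))
    index = int(env.get("GTEST_SHARD_INDEX", "0"))
    return [t for i, t in enumerate(tests) if i % total == index]


if __name__ == "__main__":
    tests = ["test_%d" % i for i in range(24)]

    sharded_env = {"GTEST_TOTAL_SHARDS": "12", "GTEST_SHARD_INDEX": "3"}
    print(len(select_tests_for_shard(tests, sharded_env)))  # a 1/12 slice

    # Variables missing: the same task silently runs the whole suite.
    print(len(select_tests_for_shard(tests, {})))
```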
,
Nov 26
https://crrev.com/c/1351503 is a revert which should have fixed this. I'll monitor the bot to make sure future test runs are good.
,
Nov 26
Why would that CL affect the env vars?
,
Nov 26
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/d07efe857282aaa2ccd332eb974308be66c8393a
commit d07efe857282aaa2ccd332eb974308be66c8393a
Author: Brad Hall <bradhall@google.com>
Date: Mon Nov 26 22:15:38 2018
Make sure to setup_googletest on all slices
If we don't do this then GTEST_SHARD_* env variables won't be set
Bug: 871453, 908551
Change-Id: Id11b140294906cadd2c9ca7c39e0aab2fda0c0af
Reviewed-on: https://chromium-review.googlesource.com/c/1351358
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Brad Hall <bradhall@google.com>
[modify] https://crrev.com/d07efe857282aaa2ccd332eb974308be66c8393a/client/swarming.py
,
Nov 26
> Why would that CL affect the env vars?
That CL enables the task slice code in swarming.py which mistakenly only sets the env vars for task slice 0.
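The bug pattern described here can be sketched as follows. This is a hypothetical model, not the code in swarming.py: task slices are represented as plain dicts with an "env" key, and `setup_googletest` is a stand-in that borrows the name from the fix's commit message but whose signature is assumed:

```python
import copy


def setup_googletest(env, shards, index):
    """Return a copy of env with the googletest sharding variables set.

    Hypothetical stand-in; the real function in swarming.py may differ.
    """
    env = dict(env)
    if shards > 1:
        env["GTEST_SHARD_INDEX"] = str(index)
        env["GTEST_TOTAL_SHARDS"] = str(shards)
    return env


def apply_to_first_slice_only(slices, shards, index):
    """Buggy pattern: only task slice 0 receives the sharding variables."""
    slices = copy.deepcopy(slices)
    slices[0]["env"] = setup_googletest(slices[0]["env"], shards, index)
    return slices


def apply_to_all_slices(slices, shards, index):
    """Fixed pattern: every task slice receives the sharding variables."""
    slices = copy.deepcopy(slices)
    for task_slice in slices:
        task_slice["env"] = setup_googletest(task_slice["env"], shards, index)
    return slices


if __name__ == "__main__":
    slices = [{"env": {}}, {"env": {}}]  # primary slice plus one fallback
    buggy = apply_to_first_slice_only(slices, shards=12, index=3)
    fixed = apply_to_all_slices(slices, shards=12, index=3)
    print("buggy fallback env:", buggy[1]["env"])
    print("fixed fallback env:", fixed[1]["env"])
```

Under this model, a task that falls through to a later slice in the buggy version starts with no sharding variables and therefore behaves like an unsharded run.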
,
Nov 26
Ah, got it. Thanks!
,
Nov 27
Just in case it's helpful, seeing the same issue here: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/193583
,
Nov 27
And now the tests are somewhat running, but the bot still looks like it's extra sad: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/193583
,
Nov 27
Hmm, not sure why, but webkit_layout_tests sometimes becomes very slow (taking more than 50 minutes), even after the revert that fixed the shard configuration. https://chromium-swarm.appspot.com/tasklist?c=name&c=state&c=created_ts&c=duration&c=pending_time&c=pool&c=bot&et=1543302240000&f=name-tag%3Awebkit_layout_tests&f=pool%3AChrome&f=master-tag%3Atryserver.chromium.mac&f=buildername-tag%3Amac_chromium_rel_ng&l=1000&n=true&s=created_ts%3Adesc&st=1543215840000
,
Nov 27
,
Nov 27
Ah, if some tests in the webkit_layout_tests step fail, the failed tests are run again in the 'without patch' step. But in the 'without patch' step, the failed tests are run many times (--gtest_repeat=10) to detect flakiness. That causes the long webkit_layout_tests running time in the 'without patch' step. If swarming capacity is insufficient, timeouts happen easily, and then all of the tests are run in the 'without patch' step, which consumes even more capacity. erikchen@, can we stop running tests multiple times when the failure apparently comes from an infra failure?
,
Nov 27
Issue 908729 has been merged into this issue.
,
Nov 27
> erikchen@, can we stop to run test multiple times when failure is apparently come from infra failure?
Agreed. I thought we already did that. https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/steps.py?type=cs&q=_test_options_for_running&sq=package:chromium&g=0&l=121
Can you link to a build where you're seeing this behavior? When I look at long-running builds:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/194045
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/194080
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/194083
It looks like the initial set of webkit layout tests are timing out due to insufficient capacity. The shards that do run are completing in ~15 minutes. This looks like an insufficient capacity issue due to the backlog from the problems earlier, but maybe I'm missing something?
,
Nov 27
agree w/ #17, this still appears to be digging out of the backlog.
,
Nov 27
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/5831aa2859f46c29109f6ec3591270e25c6d6527
commit 5831aa2859f46c29109f6ec3591270e25c6d6527
Author: John Budorick <jbudorick@chromium.org>
Date: Tue Nov 27 14:53:41 2018
Temporarily remove webkit_layout_tests from mac_chromium_rel_ng.
Tbr: sergeyberezin@chromium.org,bradhall@chromium.org
No-Try: true
Bug: 908551
Change-Id: I5f003793bc94930fa685fff74ae1217e208b0dc6
Reviewed-on: https://chromium-review.googlesource.com/c/1351936
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#611103}
[modify] https://crrev.com/5831aa2859f46c29109f6ec3591270e25c6d6527/testing/buildbot/chromium.mac.json
[modify] https://crrev.com/5831aa2859f46c29109f6ec3591270e25c6d6527/testing/buildbot/test_suite_exceptions.pyl
,
Nov 27
Going to help it dig out of the backlog a bit:
- removing layout tests temporarily
- cancelling pending layout test tasks from current jobs
,
Nov 27
,
Nov 27
#17, I think we don't want to repeat 10 times in the 'without patch' step when some shards hit insufficient capacity and no shard has a test failure, e.g. https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/194018
,
Nov 27
#22: Can you clarify why you think that we're trying to repeat 10 times in the link you sent? I don't see any indication that we're trying to do so. As linked in c#17, we shouldn't be trying to do the 10X repeat. Although, if we are timing out due to insufficient capacity, there's really no point in running 'without patch' steps altogether. This seems like a small optimization that will make us fail more gracefully when there's insufficient capacity. jbudorick, wdyt?
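The optimization floated here (skip 'without patch' when the only problems are capacity timeouts) might look roughly like the sketch below. Everything in it is hypothetical: the `ShardResult` type, the state strings, and `should_run_without_patch` are illustrative names and do not reflect the actual recipe code:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ShardResult:
    # State names are assumptions chosen for illustration.
    state: str  # e.g. "COMPLETED", "EXPIRED", "TIMED_OUT"
    failed_tests: Tuple[str, ...] = ()


def should_run_without_patch(shards):
    """Retry without the patch only if some shard reported real test failures.

    Shards that expired or timed out give no signal about the patch, so a
    build whose only problems are of that kind skips the retry entirely.
    """
    return any(s.state == "COMPLETED" and s.failed_tests for s in shards)


if __name__ == "__main__":
    capacity_only = [ShardResult("EXPIRED"), ShardResult("COMPLETED")]
    real_failure = [
        ShardResult("EXPIRED"),
        ShardResult("COMPLETED", ("fast/dom/example.html",)),
    ]
    print(should_run_without_patch(capacity_only))
    print(should_run_without_patch(real_failure))
```

The design choice is to treat infra-level shard failures as "no information" rather than "all tests failed", so a capacity shortage does not trigger additional load on the same constrained pool.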
,
Nov 27
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/503000c4b8e0745a6c4b2ce19857e125fab77efa
commit 503000c4b8e0745a6c4b2ce19857e125fab77efa
Author: John Budorick <jbudorick@chromium.org>
Date: Tue Nov 27 16:51:02 2018
Temporarily remove webkit_layout_tests from mac_chromium_rel_ng, part 2.
crrev.com/c/1351936 removed the suite from Mac10.12 Tests. mac_chromium_rel_ng mirrors Mac10.13 Tests despite running the layout tests on 10.12.6. X(
Tbr: sergeyberezin@chromium.org,bradhall@chromium.org
No-Try: true
Bug: 908551
Change-Id: Iad0c9bea9be0302d38cdf642951a6b9bcb731469
Reviewed-on: https://chromium-review.googlesource.com/c/1351745
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#611148}
[modify] https://crrev.com/503000c4b8e0745a6c4b2ce19857e125fab77efa/testing/buildbot/chromium.mac.json
[modify] https://crrev.com/503000c4b8e0745a6c4b2ce19857e125fab77efa/testing/buildbot/test_suite_exceptions.pyl
,
Nov 27
Issue 908847 has been merged into this issue.
,
Nov 27
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/af56597b2e83e94d3f1cfcc34f8a4b8d5f648e8a
commit af56597b2e83e94d3f1cfcc34f8a4b8d5f648e8a
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Tue Nov 27 18:13:42 2018
Roll src/tools/swarming_client/ b6e9e23e4..157bec8a2 (4 commits)
https://chromium.googlesource.com/infra/luci/client-py.git/+log/b6e9e23e4e79..157bec8a25cc
$ git log b6e9e23e4..157bec8a2 --date=short --no-merges --format='%ad %ae %s'
2018-11-26 bradhall Make sure to setup_googletest on all slices
2018-11-26 maruel [client] Add warning about variable flags
2018-11-21 maruel [client] Stop leaking dir 'cache' when running isolateserver_test.py
2018-11-20 maruel [client] internal refactoring adding ServerRef
Created with:
roll-dep src/tools/swarming_client
R=bradhall@google.com
Bug: 871453, 908551
Change-Id: Ia0d51cc0584d6df90b71722fbbc9a17bbbceb563
Reviewed-on: https://chromium-review.googlesource.com/c/1351743
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Cr-Commit-Position: refs/heads/master@{#611194}
[modify] https://crrev.com/af56597b2e83e94d3f1cfcc34f8a4b8d5f648e8a/DEPS
,
Nov 27
jbudorick@ is trooper today.
,
Nov 27
nope, sergeyberezin is primary and bradhall is secondary.
,
Nov 28
Seems the issue is resolved. But we want to bring webkit_layout_tests back again.
,
Nov 28
#29: yes, we do. We were discussing keeping it off of mac_chromium_rel_ng until after tomorrow's branch, though.
,
Nov 28
I missed this bug since it wasn't in the trooper queue... Merging to another public bug I filed for this outage - will track the remaining progress there.
Comment 1 by martiniss@chromium.org, Nov 26