Mac webkit layout tests appeared stuck while running tests for a CL
Issue description
My CL at https://crrev.com/c/1316830 was CQ+2 at 11:59a. The Mac Rel build started at 12:00p and is still running (at 1:23p):
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_chromium_rel_ng/184483
shows webkit_layout_tests running, 25 mins elapsed.

stdout: https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8929943545456525328/+/steps/webkit_layout_tests_on_Intel_GPU_on_Mac__with_patch__on_Mac-10.12.6/0/stdout
======
...
13:00:13.979 59056 editing/undo/undo-smart-delete-word.html
13:00:13.979 59056 external/wpt/css/css-writing-modes/sizing-orthog-vlr-in-htb-020.xht
13:00:13.979 59056 external/wpt/css/css-writing-modes/sizing-orthog-vrl-in-htb-020.xht
13:00:13.979 59056 fast/events/frame-detached-in-mousedown.html
13:00:13.979 59056 fast/forms/select/menulist-appearance-rtl.html
13:00:13.979 59056 fast/text/drawBidiText.html
13:00:13.979 59056 virtual/layout_ng/fast/block/float/overhanging-tall-block.html
13:00:13.979 59056 virtual/layout_ng/fast/inline/inline-offsetLeft-continuation.html
13:00:13.979 59056 virtual/new-remote-playback-pipeline/media/controls/buttons-after-reset.html
13:00:13.979 59056 virtual/outofblink-cors-ns/http/tests/security/contentSecurityPolicy/object-src-does-not-affect-child.html
13:00:13.983 59056
13:00:13.983 59056 Testing completed. Exit status: 0
+------------------------------------------------------------------------+
| End of shard 8                                                         |
| Pending: 532.9s Duration: 896.3s Bot: build560-m4 Exit: 0              |
+------------------------------------------------------------------------+
Waiting for results from the following shards: 2, 5, 7, 9
======

From what I can see, 4 shards are stuck and the bot hasn't done anything for 23+ mins. What caused this bot to be stuck?
Nov 16
Nov 16
All of the long shards (9, 7, 5, 2) have ~30 minute overheads. This is a semi-known issue (KI), tracked in bug 899991. I'm trying out a fix which could help. I don't know how feasible it would be to have it print a periodic message every 5 minutes or so while collecting tasks. Swarming people would know more here.
Nov 16
+Jon as this is the kind of thing impacting the CI's runtime (thus reducing overall fleet throughput) that we'll have to monitor more closely and fix. Airborne today so can't take a look now.
Nov 19
Re. adjusting output so the job doesn't look stuck: it looks like task updates are supposed to be emitted every 15 min. [1]

Hypothesis: the first "Waiting for results" came in at 1:15. The second would have come in at 1:30, but was interrupted by Shard 9's finished output coming in at 1:28.

I've put together a CL to increase the frequency and also output the update time, which should help us see better if/when this happens in the future. [2]

[1] https://cs.chromium.org/chromium/infra/luci/client/swarming.py?l=654&rcl=473a850bc451ce86db312d3c209a30ccffee832b
[2] https://chromium-review.googlesource.com/c/infra/luci/luci-py/+/1340660
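To make the hypothesis concrete, here is a minimal simulation sketch. It is not the code in swarming.py: `simulate_collect`, `clock_label`, the assumption that any output (a finished shard or a status line) resets the update timer, and the finish times for shards 2, 5 and 7 are all hypothetical. Only the 15-minute interval, the 1:15 update and Shard 9 finishing at 1:28 come from this thread. Time is counted in minutes after 13:00.

```python
START_MINUTES = 13 * 60  # Timeline starts at 13:00, when shard 8 finished.


def clock_label(minute):
    """Formats a minute offset from START_MINUTES as HH:MM."""
    hours, minutes = divmod(START_MINUTES + minute, 60)
    return '%02d:%02d' % (hours, minutes)


def simulate_collect(finish_times, status_interval, end):
    """Replays a collect loop minute by minute and returns (minute, line) pairs.

    finish_times maps a shard index to the minute at which it finishes.
    Assumption: the status timer restarts whenever anything is printed,
    so a shard finishing postpones the next "Waiting for results" line.
    """
    events = []
    pending = set(finish_times)
    last_output = 0
    for minute in range(1, end + 1):
        finished = sorted(s for s in pending if finish_times[s] == minute)
        if finished:
            for shard in finished:
                pending.discard(shard)
                events.append(
                    (minute, '%s Shard %d finished' % (clock_label(minute), shard)))
            last_output = minute
        elif pending and minute - last_output >= status_interval:
            shards = ', '.join(str(s) for s in sorted(pending))
            events.append((minute, '%s Waiting for results from the following '
                                   'shards: %s' % (clock_label(minute), shards)))
            last_output = minute
    return events


# Shard 9 finishing at minute 28 is from the thread; the rest are made up.
finish_times = {2: 40, 5: 41, 7: 42, 9: 28}
for _, line in simulate_collect(finish_times, status_interval=15, end=30):
    print(line)
```

Under these assumptions a 15-minute interval prints a single update at 13:15 before Shard 9 finishes at 13:28, which matches the long silence seen in the log. Passing `status_interval=5` prints an update every five minutes up to 13:25, and the timestamp prefix makes any remaining gap easy to measure.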
Nov 19
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-py.git/+/fa7445abcb3a53e2bddb60a2656582c5b34fcd8e

commit fa7445abcb3a53e2bddb60a2656582c5b34fcd8e
Author: Jao-ke Chin-Lee <jchinlee@chromium.org>
Date: Mon Nov 19 17:04:41 2018

[client] Print time with tasks update message.

Also increase frequency.

BUG= 905012
Change-Id: Id8349367695baaf204ce7c2489be44f32dd8fffb
Reviewed-on: https://chromium-review.googlesource.com/c/1340660
Commit-Queue: Jao-ke Chin-Lee <jchinlee@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/fa7445abcb3a53e2bddb60a2656582c5b34fcd8e/client/swarming.py
Nov 20
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/6472f2d72226f5f0776ad5301840b98012c5bb53

commit 6472f2d72226f5f0776ad5301840b98012c5bb53
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Tue Nov 20 23:16:48 2018

Roll src/tools/swarming_client/ 7f463e66e..b6e9e23e4 (4 commits)

https://chromium.googlesource.com/infra/luci/client-py.git/+log/7f463e66e1c4..b6e9e23e4e79

$ git log 7f463e66e..b6e9e23e4 --date=short --no-merges --format='%ad %ae %s'
2018-11-20 maruel [client]: fix undefined class reference
2018-11-19 vadimsh [proto] Fix google/rpc/*_pb2.py, it has wrong proto paths in it.
2018-11-19 maruel protobuf: upgrade to 3.6.1 from 3.5.1
2018-11-19 jchinlee [client] Print time with tasks update message. Also increase frequency.

Created with:
roll-dep src/tools/swarming_client

R=jchinlee@chromium.org
Bug: 905012
Change-Id: I5e397d787e27223d94dd85b35d3c2826dc93febe
Reviewed-on: https://chromium-review.googlesource.com/c/1343666
Reviewed-by: Jao-ke Chin-Lee <jchinlee@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
Cr-Commit-Position: refs/heads/master@{#609845}

[modify] https://crrev.com/6472f2d72226f5f0776ad5301840b98012c5bb53/DEPS
Nov 23
This is essentially the same issue as issue 899991. This is about *download* overhead, not about upload. Download overhead is influenced by how smart the archiver is, which is issue 854610. That said, as I noted in issue 899991, I suspect these VMs will gain a lot of performance by being redeployed.
Comment 1 by c...@chromium.org, Nov 13
Summary: Mac webkit layout tests appeared stuck while running tests for a CL (was: Mac webkit layout tests are stuck while CQ+2 on my CL)