linux_layout_tests_layout_ng time out on webkit_layout_tests
Issue description

All tries since #446 fail, saying:

shard #0 isolated out
shard #0 timed out, took too much time to complete
shard #1 isolated out
shard #1 timed out, took too much time to complete
shard #2 isolated out
shard #2 timed out, took too much time to complete
shard #3 isolated out
shard #3 timed out, took too much time to complete
shard #4 isolated out
shard #4 timed out, took too much time to complete
shard #5 isolated out
shard #5 timed out, took too much time to complete

Example: https://luci-milo.appspot.com/buildbot/tryserver.chromium.linux/linux_layout_tests_layout_ng/446

The same slave can run linux_layout_tests_slimming_paint_v2 successfully, so there must be something specific to linux_layout_tests_layout_ng, but I can't figure out what. I tried reverting a recent change that may affect all tests, but the result didn't change. https://codereview.chromium.org/2943933002

qyearsley@, could you advise on what we can investigate?
,
Jun 19 2017
The problem appears to be that the layout tests take longer than an hour to run. The two solutions to the problem are:
* Increase the number of shards.
* Increase the timeout given to swarming.
,
Jun 19 2017
Ah, that's very helpful -- do you have any thoughts about which way is preferable? (Both seem OK to me...) Both of these parameters would be controlled in src/testing/buildbot/chromium.fyi.json, right?
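To make the two knobs concrete, here is a minimal Python sketch of adjusting a shard count and a timeout on one test entry. The structure and key names ("isolated_scripts", "swarming", "shards", "hard_timeout") are assumptions about the file's schema rather than a copy of the real chromium.fyi.json, and the helper `set_swarming_params` is hypothetical. The numbers come from elsewhere in this thread (6 shards and a 15-minute timeout before, 15 shards and 30 minutes after).

```python
import json

# Hypothetical excerpt of one builder entry. The key names are assumptions
# about the schema, not copied from the real chromium.fyi.json.
BUILDER_ENTRY = {
    "isolated_scripts": [
        {
            "name": "webkit_layout_tests",
            # 6 shards and 900 s (15 minutes) mirror the values seen in this thread.
            "swarming": {"shards": 6, "hard_timeout": 900},
        }
    ]
}


def set_swarming_params(builder, test_name, shards=None, hard_timeout=None):
    """Update shard count and/or timeout (seconds) for one test, in place.

    Returns True if the test was found, False otherwise.
    """
    for test in builder.get("isolated_scripts", []):
        if test.get("name") != test_name:
            continue
        swarming = test.setdefault("swarming", {})
        if shards is not None:
            swarming["shards"] = shards
        if hard_timeout is not None:
            swarming["hard_timeout"] = hard_timeout
        return True
    return False


updated = set_swarming_params(
    BUILDER_ENTRY, "webkit_layout_tests", shards=15, hard_timeout=30 * 60)
print(json.dumps(BUILDER_ENTRY, indent=2, sort_keys=True))
```

Changing both values in one place keeps the two options from the previous comment easy to compare: more shards shortens each shard, while a larger timeout only tolerates slow shards.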
,
Jun 19 2017
I'll do both and see what happens. https://codereview.chromium.org/2951633002
,
Jun 19 2017
As one of the LayoutNG developers, I would prefer more shards so that this is faster :)
,
Jun 19 2017
But did you say that each shard takes more than an hour? That's surprising, because the entire test run did not use to take 5 hours...
,
Jun 19 2017
It looks like the timeout was actually set to 15 minutes.
,
Jun 20 2017
Thank you for the CL. It looks like the timeout is gone, but it still fails. https://luci-milo.appspot.com/buildbot/tryserver.chromium.linux/linux_layout_tests_layout_ng/484 I can't tell why it is failing: does one unexpected timeout/crash cause a total failure in swarming, or is it failing for other reasons? I can't find a merged full_results.json, so I'll run the tests locally to get one and update expectations to see if that makes the bot happy.
,
Jun 20 2017
So, it seems that the actual error is the following:
--------------------------
webkitpy.layout_tests.merge_results: [DEBUG] Creating merged /b/rr/tmpI2hBbU/w/layout-test-results/fast/lists/ol-reversed-nested-list-pretty-diff.html from ['/tmp/tmpNeJpQi/9/layout-test-results/fast/lists/ol-reversed-nested-list-pretty-diff.html']
webkitpy.layout_tests.merge_results: [DEBUG] Creating merged /b/rr/tmpI2hBbU/w/layout-test-results/fast/lists/ol-reversed-simple-actual.txt from ['/tmp/tmpNeJpQi/12/layout-test-results/fast/lists/ol-reversed-simple-actual.txt', '/tmp/tmpNeJpQi/8/layout-test-results/fast/lists/ol-reversed-simple-actual.txt']
webkitpy.layout_tests.merge_results: [DEBUG] Creating merged /b/rr/tmpI2hBbU/w/layout-test-results/fast/lists/ol-reversed-simple-diff.txt from ['/tmp/tmpNeJpQi/12/layout-test-results/fast/lists/ol-reversed-simple-diff.txt', '/tmp/tmpNeJpQi/8/layout-test-results/fast/lists/ol-reversed-simple-diff.txt']
Traceback (most recent call last):
  File "/b/c/b/linux_layout_tests_layout_ng/src/third_party/WebKit/Tools/Scripts/merge-layout-test-results", line 209, in <module>
    main(sys.argv[1:])
  File "/b/c/b/linux_layout_tests_layout_ng/src/third_party/WebKit/Tools/Scripts/merge-layout-test-results", line 191, in main
    merger.merge(args.output_directory, args.input_directories)
  File "/b/c/b/linux_layout_tests_layout_ng/src/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/merge_results.py", line 498, in merge
    merge_func(out_path, to_merge)
  File "/b/c/b/linux_layout_tests_layout_ng/src/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/merge_results.py", line 291, in __call__
    to_merge)
webkitpy.layout_tests.merge_results.MergeFailure: Failure merging /b/rr/tmpI2hBbU/w/layout-test-results/fast/lists/ol-reversed-simple-diff.txt: File contents don't match:
/tmp/tmpNeJpQi/8/layout-test-results/fast/lists/ol-reversed-simple-diff.txt
Trying to merge ['/tmp/tmpNeJpQi/12/layout-test-results/fast/lists/ol-reversed-simple-diff.txt', '/tmp/tmpNeJpQi/8/layout-test-results/fast/lists/ol-reversed-simple-diff.txt'].
WARNING:root:merge_cmd had non-zero return code: 1
step returned non-zero exit code: 2
--------------------------
Basically, fast/lists/ol-reversed-simple-diff.txt was produced by both shard 12 and shard 8, which broke the merge of the results. This means that this test is somehow listed to run twice.
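The failure mode in the traceback above (the same relative result path written by two shards with different contents) can be reproduced in a small Python sketch. This is not the logic of merge_results.py; `find_conflicts` is a hypothetical helper, and the temporary directories only mimic the shard 8 / shard 12 layout from the log.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def find_conflicts(shard_dirs):
    """Map each relative path to the shard dirs that wrote it with differing contents."""
    seen = defaultdict(dict)  # relative path -> {content digest: [shard dirs]}
    for shard_dir in shard_dirs:
        for root, _dirs, files in os.walk(shard_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, shard_dir)
                with open(full, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                seen[rel].setdefault(digest, []).append(shard_dir)
    # A path is a conflict only if more than one distinct content was seen for it.
    return {
        rel: sorted(d for dirs in by_digest.values() for d in dirs)
        for rel, by_digest in seen.items()
        if len(by_digest) > 1
    }


# Illustrative setup: two fake shard output directories with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    shards = []
    for shard, text in (("8", "diff from one run\n"), ("12", "diff from another run\n")):
        results = os.path.join(tmp, shard, "fast", "lists")
        os.makedirs(results)
        with open(os.path.join(results, "ol-reversed-simple-diff.txt"), "w") as f:
            f.write(text)
        shards.append(os.path.join(tmp, shard))
    conflicts = find_conflicts(shards)
    conflict_names = sorted(conflicts)

print(conflict_names)
```

Identical copies of a file in several shards are not reported here; only paths with differing contents are, which matches the "File contents don't match" message in the log.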
,
Jun 20 2017
Thank you tansell@ for the analysis. "fast/lists/ol-reversed-simple" looks very interesting: there are html and xhtml versions.

fast/lists/ol-reversed-simple-expected.txt
fast/lists/ol-reversed-simple.html
fast/lists/ol-reversed-simple.xhtml

and enable-blink-features=LayoutNG lists both:

crbug.com/591099 fast/lists/ol-reversed-simple.html [ Crash Failure ]
crbug.com/591099 fast/lists/ol-reversed-simple.xhtml [ Crash Failure ]

Both files have been there since 2012, so probably having both expectations confuses swarming, because the test result file names conflict?
,
Jun 20 2017
The way the layout test runner works, I think it is currently randomly overwriting one of the outputs. This is going to make the test super flaky and hard to debug! In many cases the merge script doesn't let this behaviour happen. I think we want to rename these tests to something like:

fast/lists/ol-reversed-simple-html.html
fast/lists/ol-reversed-simple-xhtml.xhtml

Then they will end up with two separate output files.
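A quick way to look for other tests with the same problem is to group test files by their path without the extension. The Python sketch below assumes that result file names are derived from that stem, which is what the conflict above suggests; `find_stem_collisions` and the extension set are hypothetical, and `ol-reversed-nested-list.html` is included only as an illustrative non-colliding entry.

```python
import os
from collections import defaultdict

# Illustrative set of test extensions; not the runner's real list.
TEST_EXTENSIONS = {".html", ".xhtml", ".svg"}


def find_stem_collisions(test_paths):
    """Return {stem: [paths]} for test files that differ only by extension."""
    by_stem = defaultdict(list)
    for path in test_paths:
        stem, ext = os.path.splitext(path)
        if ext in TEST_EXTENSIONS:  # skips -expected.txt and other non-test files
            by_stem[stem].append(path)
    return {stem: sorted(paths) for stem, paths in by_stem.items() if len(paths) > 1}


collisions = find_stem_collisions([
    "fast/lists/ol-reversed-simple.html",
    "fast/lists/ol-reversed-simple.xhtml",
    "fast/lists/ol-reversed-simple-expected.txt",
    "fast/lists/ol-reversed-nested-list.html",
])
print(collisions)

# With the proposed rename, the stems differ and nothing collides.
after_rename = find_stem_collisions([
    "fast/lists/ol-reversed-simple-html.html",
    "fast/lists/ol-reversed-simple-xhtml.xhtml",
])
print(after_rename)
```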
,
Jun 20 2017
The layout_ng bots are baaack! Thank you Quinten and Tim!!
,
Jun 21 2017
jeffcarp / qyearsley - Do you have a bug for the layout tests allowing two test files with the same name? I know Jeff was running into this problem with the WPT import process.
,
Jun 21 2017
BTW, something else weird is going on here. Each shard only took 10 minutes to run, yet the webkit_layout_tests step took 32 mins 22 secs to complete. None of the shards were delayed in the pending state either.
,
Jun 21 2017
I've created two bugs to get the logging I need to see where the time is going in this step:
* https://bugs.chromium.org/p/chromium/issues/detail?id=735297 - Turn on timestamp printing for the log output of the merge-layout-test-results script
* https://bugs.chromium.org/p/chromium/issues/detail?id=735300 - Turn on timestamp printing for the log output of the swarming collection and merging script
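For reference, timestamp printing with Python's standard logging module can look like the sketch below. It is not the actual change made to either script. The logger name `merge_sketch` and the message are made up, and the format string only imitates the `name: [LEVEL] message` shape of the log lines quoted earlier in this thread, with `%(asctime)s` added in front.

```python
import io
import logging

# Write to an in-memory stream so the formatted line can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
# %(asctime)s prefixes each record with the time it was emitted.
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s: [%(levelname)s] %(message)s"))

log = logging.getLogger("merge_sketch")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.addHandler(handler)
log.propagate = False  # keep the record out of the root logger's handlers

log.debug("Creating merged %s", "layout-test-results/example.txt")
line = stream.getvalue()
print(line, end="")
```

With a timestamp on every line, the gap between the last shard finishing and the merge step ending can be read directly off the log.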
,
Jun 22 2017
So with 15 shards, the longest running shard took ~12 minutes. I'll set the timeout to 30 minutes.
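As a rough sanity check on these numbers, the Python arithmetic below uses only figures from this thread: 15 shards, a longest shard of about 12 minutes, the original 6 shards, and the 15-minute timeout found earlier. It assumes work splits evenly across shards and treats 15 × 12 as an upper bound on total work, so it is an estimate, not a measurement.

```python
import math

shards_now = 15
longest_shard_min = 12

# Upper bound on total work: every shard taking as long as the slowest one.
upper_bound_work_min = shards_now * longest_shard_min


def per_shard_minutes(total_work_min, shards):
    """Time per shard if the work were split perfectly evenly."""
    return total_work_min / shards


def min_shards(total_work_min, timeout_min):
    """Fewest shards that keep an evenly split shard within the timeout."""
    return math.ceil(total_work_min / timeout_min)


old_estimate_min = per_shard_minutes(upper_bound_work_min, 6)
shards_for_15_min = min_shards(upper_bound_work_min, 15)
headroom = 30 / longest_shard_min

print(upper_bound_work_min)  # 180
print(old_estimate_min)      # 30.0
print(shards_for_15_min)     # 12
print(headroom)              # 2.5
```

Under those assumptions, 6 shards could have needed up to about 30 minutes each, well over a 15-minute timeout, and a 30-minute timeout leaves 2.5 times the longest observed shard time as headroom.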
,
Jun 23 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build/+/270e8881098cc84fbacc014e4bc0387796beae47

commit 270e8881098cc84fbacc014e4bc0387796beae47
Author: Tim 'mithro' Ansell <tansell@chromium.org>
Date: Fri Jun 23 15:21:10 2017

swarming: Adding time to log messages in collect_task.py

This allows you to see how long collect verse merging is taking. It is needed to figure out what is going on with https://crbug.com/734467

BUG=524758, 735300, 734467
R=qyearsley@chromium.org,jeffcarp@chromium.org,mcgreevy@chromium.org,dpranke@chromium.org,jbudorick@chromium.org
Change-Id: Ib8a440e65ea14e5eb79ff05f212846e6211642f8
Reviewed-on: https://chromium-review.googlesource.com/544752
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Tim 'mithro' Ansell <tansell@chromium.org>

[modify] https://crrev.com/270e8881098cc84fbacc014e4bc0387796beae47/scripts/slave/recipe_modules/swarming/resources/collect_task.py
,
Jul 26 2017
Update: Tim increased the timeouts and shard count in https://codereview.chromium.org/2951633002, and now we're not seeing timeouts. I will now adjust the timeouts back down again (from 100 hours) to something like 15 minutes, since the shards typically take less than 10 minutes.
,
Jul 26 2017
This was fixed by Tim by increasing the shard count and timeout; I've got a follow-up CL to decrease the timeout again: https://chromium-review.googlesource.com/c/586921/
,
Jul 28 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/df9f0bb2057a939400aa752e95f54849b31f260b

commit df9f0bb2057a939400aa752e95f54849b31f260b
Author: Quinten Yearsley <qyearsley@google.com>
Date: Fri Jul 28 00:25:20 2017

Lower the hard timeout of the layout test flag try bots

This is a follow-up to https://codereview.chromium.org/2951633002, which raised the timeouts for https://crbug.com/734467.

Bug: 734467
Change-Id: Ibdc85d2c6e660c152aa19770dded5a55ef481a03
Reviewed-on: https://chromium-review.googlesource.com/586921
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Tim 'mithro' Ansell <tansell@chromium.org>
Commit-Queue: Quinten Yearsley <qyearsley@chromium.org>
Cr-Commit-Position: refs/heads/master@{#490157}

[modify] https://crrev.com/df9f0bb2057a939400aa752e95f54849b31f260b/testing/buildbot/chromium.fyi.json
Comment 1 by qyears...@chromium.org, Jun 19 2017
Status: Assigned (was: Untriaged)