
Issue 734467

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug




linux_layout_tests_layout_ng time out on webkit_layout_tests

Project Member Reported by kojii@chromium.org, Jun 19 2017

Issue description

All tries since #446 fail saying:

shard #0 isolated out
shard #0 timed out, took too much time to complete
shard #1 isolated out
shard #1 timed out, took too much time to complete
shard #2 isolated out
shard #2 timed out, took too much time to complete
shard #3 isolated out
shard #3 timed out, took too much time to complete
shard #4 isolated out
shard #4 timed out, took too much time to complete
shard #5 isolated out
shard #5 timed out, took too much time to complete

Example:
https://luci-milo.appspot.com/buildbot/tryserver.chromium.linux/linux_layout_tests_layout_ng/446

The same slave can run linux_layout_tests_slimming_paint_v2 successfully, so there must be something specific to linux_layout_tests_layout_ng, but I can't figure out what.

I tried reverting a recent change that may affect all tests, but the result didn't change.
https://codereview.chromium.org/2943933002

qyearsley@, could you advise on what we can investigate?
 
Cc: tansell@chromium.org
Status: Assigned (was: Untriaged)
The main recent change is the switch to swarming:

  https://codereview.chromium.org/2927703002 and
  https://chromium-review.googlesource.com/c/532499

The config that affects linux_layout_tests_slimming_paint_v2 was changed at the same time as the config that affects linux_layout_tests_layout_ng, so it seems like swarming *should* work here.

I think the next step is to try and look through the logs to see what's relevant.
The problem appears to be that the layout tests take longer than an hour to run.

The two solutions to the problem are:

 * Increase the number of shards.
 * Increase the timeout given to swarming.
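A rough way to compare the two options above is to look at how the shard count and the per-shard timeout interact. The sketch below assumes the tests split evenly across shards, which real runs will not do exactly, and the 180-minute total is a hypothetical figure rather than a measurement from this bot.

```python
import math


def per_shard_minutes(total_minutes, shards):
    """Ideal wall-clock time per shard if the tests split evenly."""
    return total_minutes / shards


def min_shards(total_minutes, timeout_minutes):
    """Smallest shard count that keeps an evenly split shard within the timeout."""
    return math.ceil(total_minutes / timeout_minutes)


# Hypothetical total test time; not taken from the bot logs.
total = 180
print(per_shard_minutes(total, 6))   # 30.0 minutes per shard with 6 shards
print(min_shards(total, 15))         # 12 shards needed for a 15 minute timeout
```

Raising the timeout alone makes the run pass but keeps it slow; raising the shard count shortens each shard, so the timeout can stay low.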


Ah, that's very helpful -- do you have any thoughts about which way is preferable? (Both seem OK to me...)

Both of these parameters would be controlled in src/testing/buildbot/chromium.fyi.json, right?
I'll do both and see what happens.

https://codereview.chromium.org/2951633002
As one of the LayoutNG developers, I would prefer more shards so that this is faster :)
But did you say that each shard takes more than an hour? That's surprising, because the entire test run did not use to take 5 hours...
It looks like the timeout was actually set to 15 minutes.

Comment 8 by kojii@chromium.org, Jun 20 2017

Thank you for the CL, it looks like timeout is gone, but it still fails.
https://luci-milo.appspot.com/buildbot/tryserver.chromium.linux/linux_layout_tests_layout_ng/484

I can't tell why it is failing. Does a single unexpected timeout/crash cause the whole run to fail in swarming, or is it failing for other reasons?

I can't find merged full_results.json, so I'll run local tests to get one and update expectations to see if it makes the bot happy.
So, it seems that the actual error is the following:
--------------------------
webkitpy.layout_tests.merge_results: [DEBUG] Creating merged /b/rr/tmpI2hBbU/w/layout-test-results/fast/lists/ol-reversed-nested-list-pretty-diff.html from ['/tmp/tmpNeJpQi/9/layout-test-results/fast/lists/ol-reversed-nested-list-pretty-diff.html']
webkitpy.layout_tests.merge_results: [DEBUG] Creating merged /b/rr/tmpI2hBbU/w/layout-test-results/fast/lists/ol-reversed-simple-actual.txt from ['/tmp/tmpNeJpQi/12/layout-test-results/fast/lists/ol-reversed-simple-actual.txt', '/tmp/tmpNeJpQi/8/layout-test-results/fast/lists/ol-reversed-simple-actual.txt']
webkitpy.layout_tests.merge_results: [DEBUG] Creating merged /b/rr/tmpI2hBbU/w/layout-test-results/fast/lists/ol-reversed-simple-diff.txt from ['/tmp/tmpNeJpQi/12/layout-test-results/fast/lists/ol-reversed-simple-diff.txt', '/tmp/tmpNeJpQi/8/layout-test-results/fast/lists/ol-reversed-simple-diff.txt']
Traceback (most recent call last):
  File "/b/c/b/linux_layout_tests_layout_ng/src/third_party/WebKit/Tools/Scripts/merge-layout-test-results", line 209, in <module>
    main(sys.argv[1:])
  File "/b/c/b/linux_layout_tests_layout_ng/src/third_party/WebKit/Tools/Scripts/merge-layout-test-results", line 191, in main
    merger.merge(args.output_directory, args.input_directories)
  File "/b/c/b/linux_layout_tests_layout_ng/src/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/merge_results.py", line 498, in merge
    merge_func(out_path, to_merge)
  File "/b/c/b/linux_layout_tests_layout_ng/src/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/merge_results.py", line 291, in __call__
    to_merge)
webkitpy.layout_tests.merge_results.MergeFailure: Failure merging /b/rr/tmpI2hBbU/w/layout-test-results/fast/lists/ol-reversed-simple-diff.txt:  File contents don't match:
/tmp/tmpNeJpQi/8/layout-test-results/fast/lists/ol-reversed-simple-diff.txt
Trying to merge ['/tmp/tmpNeJpQi/12/layout-test-results/fast/lists/ol-reversed-simple-diff.txt', '/tmp/tmpNeJpQi/8/layout-test-results/fast/lists/ol-reversed-simple-diff.txt'].
WARNING:root:merge_cmd had non-zero return code: 1
step returned non-zero exit code: 2
--------------------------

Basically, the output file fast/lists/ol-reversed-simple-diff.txt was produced on both shard 12 and shard 8 with different contents, which broke the merge. This means that this test is somehow listed to run twice.
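The failure mode in the log can be illustrated with a minimal sketch. This is not the code in webkitpy/layout_tests/merge_results.py; only the exception name is borrowed from the traceback, and the function name, argument shape, and sample contents are hypothetical. The assumed rule is that a file produced by several shards may be merged only when every copy is identical.

```python
class MergeFailure(Exception):
    """Raised when shards disagree about the contents of one output file."""


def merge_identical(out_name, shard_contents):
    """Return the agreed content of out_name, or raise if the shards differ.

    shard_contents maps a shard label to the bytes that shard wrote.
    """
    if not shard_contents:
        raise ValueError("no shard produced %s" % out_name)
    distinct = set(shard_contents.values())
    if len(distinct) > 1:
        raise MergeFailure(
            "File contents don't match for %s (shards %s)"
            % (out_name, ", ".join(sorted(shard_contents))))
    return distinct.pop()


# Identical copies merge cleanly.
merged = merge_identical("a-actual.txt", {"8": b"same", "12": b"same"})

# Two different tests writing the same output name trigger the failure.
try:
    merge_identical("ol-reversed-simple-diff.txt",
                    {"8": b"diff from one test", "12": b"diff from another"})
    failed = False
except MergeFailure as error:
    failed = True
    print(error)
```

Under this rule a conflict is reported instead of one shard's output silently replacing the other's.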

Comment 10 by kojii@chromium.org, Jun 20 2017

Thank you tansell@ for the analysis.

"fast/lists/ol-reversed-simple" looks very interesting, there are html and xhtml versions.

fast/lists/ol-reversed-simple-expected.txt
fast/lists/ol-reversed-simple.html
fast/lists/ol-reversed-simple.xhtml

and the expectations for enable-blink-features=LayoutNG list both:

crbug.com/591099 fast/lists/ol-reversed-simple.html [ Crash Failure ]
crbug.com/591099 fast/lists/ol-reversed-simple.xhtml [ Crash Failure ]

Both files have been there since 2012, so perhaps having expectations for both confuses swarming because the test result file names conflict?
The way the layout tests runner works, I think it is currently randomly overwriting one of the outputs. This is going to make the test super flaky/hard to debug!

In many cases the merge script doesn't let this behaviour happen.

I think we want to rename these tests to something like:
 fast/lists/ol-reversed-simple-html.html
 fast/lists/ol-reversed-simple-xhtml.xhtml

Then they will end up with two separate output files. 
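A small sketch of how such collisions could be detected up front. It assumes that result files are named from the test path with the extension stripped, which matches names like ol-reversed-simple-actual.txt in the log above but is not confirmed against the runner's code; the function name is hypothetical.

```python
import os
from collections import defaultdict


def find_output_collisions(test_paths):
    """Group tests whose paths differ only by extension.

    Returns a dict mapping the shared stem to the sorted list of colliding tests.
    """
    groups = defaultdict(list)
    for path in test_paths:
        stem, _extension = os.path.splitext(path)
        groups[stem].append(path)
    return {stem: sorted(paths)
            for stem, paths in groups.items() if len(paths) > 1}


before = [
    "fast/lists/ol-reversed-simple.html",
    "fast/lists/ol-reversed-simple.xhtml",
]
after = [
    "fast/lists/ol-reversed-simple-html.html",
    "fast/lists/ol-reversed-simple-xhtml.xhtml",
]
print(find_output_collisions(before))  # one colliding stem
print(find_output_collisions(after))   # no collisions after the rename
```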

Comment 12 by kojii@chromium.org, Jun 20 2017

The layout_ng bots are baaack! Thank you Quinten and Tim!!
Cc: jeffcarp@chromium.org
jeffcarp / qyearsley - Do you have a bug for the layout tests allowing two test files with the same name? I know Jeff was running into this problem with the WPT import process.
Cc: dpranke@chromium.org
BTW Something else weird is going on here.

Each shard only took 10 minutes to run, yet the webkit_layout_tests step took 32 mins 22 secs to complete. None of the shards were delayed in the pending state either.
I've created two bugs to get the logging I need to see where the time is going in this step:

 * https://bugs.chromium.org/p/chromium/issues/detail?id=735297 - Turn on timestamp printing for the log output of the merge-layout-test-results script

 * https://bugs.chromium.org/p/chromium/issues/detail?id=735300 - Turn on timestamp printing for the log output of the swarming collection and merging script
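For reference, timestamped log output of the kind requested in the two bugs above can be had from Python's standard logging module. This is a generic sketch, not the change that landed in either script; the function name and logger name are hypothetical, and the format string only imitates the style of the merge_results lines quoted earlier.

```python
import io
import logging


def configure_timestamped_logging(stream, level=logging.DEBUG):
    """Attach a handler to the root logger that prefixes each record with a timestamp."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)s: [%(levelname)s] %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
    return handler


# Capture the output in memory so the format is easy to inspect.
buffer = io.StringIO()
configure_timestamped_logging(buffer)
log = logging.getLogger("merge_demo")
log.debug("collect finished")
log.debug("merge finished")
lines = buffer.getvalue().splitlines()
print("\n".join(lines))
```

With a timestamp on every line, the gap between the last collect message and the first merge message shows which phase accounts for the extra time.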
So with 15 shards, the longest running shard took ~12 minutes.

I'll set the timeout to 30 minutes.
Project Member

Comment 17 by bugdroid1@chromium.org, Jun 23 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/270e8881098cc84fbacc014e4bc0387796beae47

commit 270e8881098cc84fbacc014e4bc0387796beae47
Author: Tim 'mithro' Ansell <tansell@chromium.org>
Date: Fri Jun 23 15:21:10 2017

swarming: Adding time to log messages in collect_task.py

This allows you to see how long collect verse merging is taking. It is
needed to figure out what is going on with  https://crbug.com/734467 

BUG= 524758 , 735300 , 734467 
R=qyearsley@chromium.org,jeffcarp@chromium.org,mcgreevy@chromium.org,dpranke@chromium.org,jbudorick@chromium.org

Change-Id: Ib8a440e65ea14e5eb79ff05f212846e6211642f8
Reviewed-on: https://chromium-review.googlesource.com/544752
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: Tim 'mithro' Ansell <tansell@chromium.org>

[modify] https://crrev.com/270e8881098cc84fbacc014e4bc0387796beae47/scripts/slave/recipe_modules/swarming/resources/collect_task.py

Update: Tim increased the timeouts and shard count in https://codereview.chromium.org/2951633002, and now we're not seeing timeouts.

I will now adjust the timeouts back down again (from 100 hours) to something like 15 minutes, since the shards tend to typically take less than 10 minutes.
Cc: -tansell@chromium.org qyears...@chromium.org
Owner: tansell@chromium.org
Status: Fixed (was: Assigned)
This was fixed by Tim by increasing the shard count and timeout; I've got a follow-up CL to decrease the timeout again: https://chromium-review.googlesource.com/c/586921/
Project Member

Comment 20 by bugdroid1@chromium.org, Jul 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/df9f0bb2057a939400aa752e95f54849b31f260b

commit df9f0bb2057a939400aa752e95f54849b31f260b
Author: Quinten Yearsley <qyearsley@google.com>
Date: Fri Jul 28 00:25:20 2017

Lower the hard timeout of the layout test flag try bots

This is a follow-up to https://codereview.chromium.org/2951633002,
which raised the timeouts for  https://crbug.com/734467 .

Bug:  734467 
Change-Id: Ibdc85d2c6e660c152aa19770dded5a55ef481a03
Reviewed-on: https://chromium-review.googlesource.com/586921
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Tim 'mithro' Ansell <tansell@chromium.org>
Commit-Queue: Quinten Yearsley <qyearsley@chromium.org>
Cr-Commit-Position: refs/heads/master@{#490157}
[modify] https://crrev.com/df9f0bb2057a939400aa752e95f54849b31f260b/testing/buildbot/chromium.fyi.json
