
Issue 777526

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Single-test timeouts in wpt require the whole file's expectation to be timeout.

Project Member Reported by foolip@chromium.org, Oct 23 2017

Issue description

https://chromium-review.googlesource.com/c/chromium/src/+/729149 is working around a problem with how testharness.js timeouts are handled in Chromium.

The problem is here:
https://cs.chromium.org/chromium/src/third_party/WebKit/LayoutTests/resources/testharnessreport.js?l=50

We could presumably arrange for the testharness.js timeout to always be half the timeout of the whole file, so that only harness errors or a test using explicit_timeout could cause the whole file to time out.
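As a rough sketch of that arrangement (the 6000 ms value is an assumed per-file runner timeout for illustration, not the real run-webkit-tests number):

```javascript
// Hypothetical sketch of the proposal: derive the testharness.js timeout
// from the runner's per-file timeout, so the harness always fires first
// and can report per-test TIMEOUT results before the runner kills
// content_shell. The constants here are assumptions.
function harnessTimeoutFor(runnerTimeoutMs) {
  return runnerTimeoutMs / 2;
}

const runnerTimeoutMs = 6000; // assumed per-file runner timeout
console.log(harnessTimeoutFor(runnerTimeoutMs)); // 3000
```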
 

Comment 1 by foolip@chromium.org, Oct 23 2017

Cc: qyears...@chromium.org
robertma@, WDYT, does it seem doable to let test timeouts show up as individual test timeouts in -expected.txt? I think the main challenge is with the two kinds of timeouts racing and causing flakiness, but it might still be the right tradeoff.

Comment 2 by hbos@chromium.org, Oct 23 2017

Cc: foolip@chromium.org
 Issue 776040  has been merged into this issue.

Comment 3 by hbos@chromium.org, Oct 23 2017

The merged bug's description:

"I want to add WPT tests that make sure events fire or promises resolve, e.g.  https://crbug.com/774303 .

Tests with assert failures get an -expected.txt with FAIL, but tests that time out, such as when an event that should fire never fires, cause content_shell to time out. It gets killed and no output is produced for any test in the same test file.

This is bad: if anyone adds tests that we don't pass and that cause a timeout, we lose test coverage for the entire file.

Should events and promises be wrapped in a t.step_timeout() or something similar or do we need to run asynchronous tests differently?"
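The expectCalled workaround the merged bug refers to can be approximated in plain JavaScript as follows (a standalone sketch of the pattern; the helper name and signature are assumptions based on the CL's description, not actual testharness.js API):

```javascript
// Sketch of the expectCalled pattern: fail fast with a descriptive error
// if a callback is never invoked, instead of letting the whole file run
// into the runner timeout. Hypothetical helper, not testharness.js code.
function expectCalled(name, timeoutMs, onTimeout) {
  let called = false;
  const timer = setTimeout(() => {
    if (!called) {
      onTimeout(new Error(name + ' was not called within ' + timeoutMs + ' ms'));
    }
  }, timeoutMs);
  return function (...args) {
    called = true;
    clearTimeout(timer);
    return args;
  };
}

// In a test, the wrapper would go around the event handler under test,
// roughly: pc.onremovetrack = t.step_func(
//     expectCalled('onremovetrack', 1000, e => assert_unreached(e.message)));
```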

Comment 4 by hbos@chromium.org, Oct 23 2017

> We could presumably arrange for the testharness.js timeout to always be half the timeout of the whole
> file, so that only harness errors or a test doing explicit_timeout could cause the whole file to
> timeout.

This would work, but it would take a long time to execute tests. And unless the tests run in parallel, it would take just two timed-out test cases for the whole test file to time out. Typically multiple tests would use the same event or promise that is timing out.

In my CL, the "expectCalled" helper had a timeout of 1 s, which might be flaky, but if the timeout is larger than, say, 5 or 10 seconds, running multiple tests would be a pain.

Comment 5 by hbos@chromium.org, Oct 23 2017

The majority of cases: On an average run, an event or promise would fire/resolve in a matter of milliseconds or not at all.

In other cases, like requesting media perhaps, this would be a larger operation.

Marking something as "expected to be resolved quickly" versus "expected to resolve in a bit" would be good for the overall performance of LayoutTests. Bots running LayoutTests are already the CQ's bottleneck.
Status: Available (was: Untriaged)
This perhaps is blink-specific. As I said in the CL:

> Upstream WPT does support timeout of subtests without failing the whole file (http://web-platform-tests.org/writing-tests/testharness-api.html says TIMEOUT is a legitimate result for a test), which is implemented in testharness.js. However, in Blink we override the behavior to only use a single timeout for each file at the test runner level.
>
> The `expectCalled` workaround might hence only be beneficial in Blink. As for the upstream, the workaround likely only adds flakes with no benefits.

The aforementioned override can be found here: https://cs.chromium.org/chromium/src/third_party/WebKit/LayoutTests/resources/testharnessreport.js?l=56&rcl=7c532114fb03f053be230d0abbc9d61d207e4a31
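The effect of that override can be illustrated with a minimal mock of the harness timeout logic (a simplified sketch only; the real testharness.js internals differ):

```javascript
// Simplified mock of the override's effect (not real testharness.js
// code): with explicit_timeout set, the harness never arms its own
// file-level timer, so the runner's timeout is the only one in play.
function scheduleHarnessTimeout(settings, timeoutMs, onTimeout) {
  if (settings.explicit_timeout) {
    return null; // Blink's override: no internal timer; the runner decides.
  }
  return setTimeout(onTimeout, timeoutMs);
}

const timer = scheduleHarnessTimeout({ explicit_timeout: true }, 6000, () => {});
console.log(timer); // null: the harness will never time out on its own
```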

Quinten, you recently refactored the file. Do you have any insights on this timeout matter?
My main point to add is that in the past, before we added that override ('explicit_timeout': true in the testharnessreport setup call), we sometimes had race conditions between wpt timeout and run-webkit-tests timeout, which led to flakiness.
Oops, just noticed that foolip@ had found the setup override in the original report. Somehow missed it when jumping from the merged bug.

Quinten, do you know if we've tried playing with the timeout values before? Like setting the harness timeout to be half of the runner timeout as foolip suggested? If not, it sounds like a good idea to me. Forcing explicit_timeout might be an overkill and cause the loss of some useful test coverage.
I believe we haven't. That's a possible option.

Although, if you have a couple of tests that take a long time in one file, I think you could still run into flakiness (sometimes hitting the run-webkit-tests timeout, and sometimes finishing but with one or two TIMEOUT results).

Possibly related discussion:
 - Is timeout ever really a result that we want to get? I would assume that anywhere where we hit a timeout means something is broken somehow.
 - If we have a single test in a file that times out and everything else passes, in most cases maybe the file can be broken up?

Comment 11 by hbos@chromium.org, Oct 25 2017

I have a strong opinion that we should support single-test TIMEOUTs.
Splitting up tests into multiple files based on implementation status is bad for tests running in multiple browsers, and failing one test should not mean losing test coverage for all the other tests in that file. These types of problems would encourage people to write non-Web Platform Tests, and that would hurt the web platform. And WPT tests should be written as part of spec work to encourage coverage and conformance.

More specifically...

> Is timeout ever really a result that we want to get? I would assume that anywhere where we hit a timeout means something is broken somehow.

Yes, because WPT are tests shared by multiple browser vendors, meaning:

1. Anyone can add test coverage for a feature. Adding test coverage should not be blocked on all other implementations passing the test, or else people will write their tests as non-WPT, which hurts the web platform.

2. For the same reason, we can't design the test to expect to fail with a TODO saying fix the test when the feature is supported (like we could if the test was not to run on other browsers). Whether a test should pass or fail needs to be in an -expected.txt file because that is the only piece of the test that is per-browser.

3. There are benefits to writing WPT independently of any browser's implementation status, because the specs are living documents, updated before (and after) implementations are. If we can encourage a culture where people write WPT test coverage for any change they make to the spec, we would have far fewer differences between implementations and the spec, and between different browsers. Imagine that.

> If we have a single test in a file that times out and everything else passes, in most cases maybe the file can be broken up?

Usually multiple tests in the same file test something similar, and if something times out it is likely that multiple tests will time out, but that does not mean all of them will.

Tests should be broken up into files based on what areas of the specs they cover and on testing similar/related behavior. Splitting a file up based on any given browser's current implementation status is an ugly hack.

I don't want to split up tests relating to addTrack() APIs based on what Chrome, Firefox or other browsers do or do not pass. That's a detail a WPT designer shouldn't have to be concerned with.

Comment 12 by hbos@chromium.org, Oct 25 2017

Labels: -Pri-3 Pri-2
:)
Summary: Single-test timeouts in wpt require the whole file's expectation to be timeout. (was: Single-test timeouts in wpt requires who file's expectation to be timeout.)

Comment 14 by hbos@chromium.org, Oct 26 2017

According to http://web-platform-tests.org/writing-tests/testharness-api.html, expectCalled() type of behavior is discouraged:

"In general the use of timeouts in tests is discouraged because this is an observed source of instability in real tests when run on CI infrastructure. In particular if a test should fail when something doesn’t happen, it is good practice to simply let the test run to the full timeout rather than trying to guess an appropriate shorter timeout to use."

As mentioned in https://crbug.com/777526#c5, the time it takes to run LayoutTests is significant and a bottleneck for the CQ bots. While flaky tests must be avoided, I think it is debatable whether to ban timeouts such as "expectCalled()". With a few seconds' margin of error, and reliance on t.step_timeout, which can scale up the timeout for slow configurations, I think flakiness can be avoided. A several-second timeout for something expected to resolve in < 0.25 seconds would be better than a large full-file timeout.
We could use a large default timeout if this is a concern, like 5-10 seconds.

Besides performance, this has the benefit of indicating which promise or event in the test timed out, instead of a generic "test timed out" failure.

expectCalled() should be easy to turn into a Test.prototype method akin to t.step_func/t.step_timeout; since you're already expected to wrap these callbacks with t.step_func, we might as well wrap them in something like "t.step_expect_func".
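A hypothetical t.step_expect_func along these lines might look as follows (a sketch only: the method is the proposal's, not an existing testharness.js API, and the minimal Test class here is a stand-in for the real testharness.js Test object):

```javascript
// Minimal stand-in for testharness.js's Test object, just enough to
// demonstrate the proposed t.step_expect_func (hypothetical API).
class Test {
  constructor(name) { this.name = name; this.status = 'PASS'; }
  step_func(fn) { return (...args) => fn.apply(this, args); }
  fail(msg) { this.status = 'FAIL: ' + msg; }
  // Like step_func, but fails the test if the wrapped callback is not
  // invoked within timeoutMs.
  step_expect_func(fn, timeoutMs) {
    let called = false;
    const timer = setTimeout(() => {
      if (!called) this.fail('callback not called within ' + timeoutMs + ' ms');
    }, timeoutMs);
    return this.step_func((...args) => {
      called = true;
      clearTimeout(timer);
      return fn(...args);
    });
  }
}
```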
> expectCalled() should be easy to turn into a Test.prototype method akin to t.step_func/t.step_timeout; since you're already expected to wrap these callbacks with t.step_func, we might as well wrap them in something like "t.step_expect_func".

Correct me if I'm wrong. If the test runner supports individual test timeout (like the upstream wptrunner), would expectCalled still be needed? I thought this was merely a workaround to prevent whole-file timeout.

Comment 16 by hbos@chromium.org, Oct 26 2017

If individual test timeouts are supported, we would not need expectCalled(), no.
I think it would still have benefits over letting the test time out (reduced CQ time, helpful failure messages), but it would not be a necessity.

Comment 17 by hbos@chromium.org, Nov 10 2017

Bump!

So we have two issues:
1. This issue: Support single test timeout in -expected.txt file.
2. Spin-off proposal: Support "t.step_expect_func" à la #14.

Can we have 1) assigned to an owner?
I support doing 2), but that is debatable; we can file a separate issue or skip it.

Comment 18 by bugdroid1@chromium.org, Nov 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e765c18bf7127ef44119f1aa593f9b0a0da188cb

commit e765c18bf7127ef44119f1aa593f9b0a0da188cb
Author: Henrik Boström <hbos@chromium.org>
Date: Fri Nov 10 11:36:15 2017

More WPT tests for setRemoteDescription for removing tracks.

Added to RTCPeerConnection-setRemoteDescription-tracks.https.html:
- stream.onremovetrack should fire and the track removed from stream.

Added RTCPeerConnection-setRemoteDescription-remove-tracks.https.html:
- track.onmute should fire and the track be muted.

This is placed in a separate file for now due to being unable to
"-expected.txt"-expect a TIMEOUT in Chromium, see discussion in
https://crbug.com/777526. Left a TODO to merge the files when Chromium
supports this.

Bug:  774303 , 777526
Change-Id: I1ee56e99d7abbea25d11dccfef54d8fb89b78039
Reviewed-on: https://chromium-review.googlesource.com/729149
Commit-Queue: Henrik Boström <hbos@chromium.org>
Reviewed-by: Robert Ma <robertma@chromium.org>
Reviewed-by: Taylor Brandstetter <deadbeef@chromium.org>
Cr-Commit-Position: refs/heads/master@{#515527}
[modify] https://crrev.com/e765c18bf7127ef44119f1aa593f9b0a0da188cb/third_party/WebKit/LayoutTests/TestExpectations
[add] https://crrev.com/e765c18bf7127ef44119f1aa593f9b0a0da188cb/third_party/WebKit/LayoutTests/external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-remove-tracks.https.html
[modify] https://crrev.com/e765c18bf7127ef44119f1aa593f9b0a0da188cb/third_party/WebKit/LayoutTests/external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https-expected.txt
[modify] https://crrev.com/e765c18bf7127ef44119f1aa593f9b0a0da188cb/third_party/WebKit/LayoutTests/external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html


Comment 19 by bugdroid1@chromium.org, Nov 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5a0ef3f1c3ce29ce9d1ae2d755e3c04fff08ea1b

commit 5a0ef3f1c3ce29ce9d1ae2d755e3c04fff08ea1b
Author: Henrik Boström <hbos@chromium.org>
Date: Fri Nov 10 17:44:17 2017

When removing a remote track, mute it and fire onmute.

Spec: https://rawgit.com/w3c/webrtc-pc/master/webrtc.html#process-remote-track-removal
This fixes  https://crbug.com/773523 .
Updated the -expected.txt file of the test covering this.

Bug:  773523 , 777526
Change-Id: Icc6e6d45201bec8cfa91ad68f4114d7b3ca66028
Reviewed-on: https://chromium-review.googlesource.com/729523
Commit-Queue: Henrik Boström <hbos@chromium.org>
Reviewed-by: Guido Urdaneta <guidou@chromium.org>
Reviewed-by: Taylor Brandstetter <deadbeef@chromium.org>
Cr-Commit-Position: refs/heads/master@{#515589}
[modify] https://crrev.com/5a0ef3f1c3ce29ce9d1ae2d755e3c04fff08ea1b/third_party/WebKit/LayoutTests/TestExpectations
[delete] https://crrev.com/a87fc3285677f9c3903542925e1fc7f2a1dbacaf/third_party/WebKit/LayoutTests/external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-remove-tracks.https.html
[modify] https://crrev.com/5a0ef3f1c3ce29ce9d1ae2d755e3c04fff08ea1b/third_party/WebKit/LayoutTests/external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https-expected.txt
[modify] https://crrev.com/5a0ef3f1c3ce29ce9d1ae2d755e3c04fff08ea1b/third_party/WebKit/LayoutTests/external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html
[modify] https://crrev.com/5a0ef3f1c3ce29ce9d1ae2d755e3c04fff08ea1b/third_party/WebKit/Source/modules/peerconnection/RTCPeerConnection.cpp

Comment 20 by hbos@chromium.org, Nov 13 2017

FYI, the above CLs were affected by this issue but do not address it.
robertma@, is this something you think we should try to fix in Q1, and if so would you be the right owner?
I would say it's at most a P2, as it's almost always been like this. (In comparison, I'd say the Android WPT issue should be of higher priority.) I did plan to take a look some time this quarter (probably near the end, and the fix may land next year). If the time frame is reasonable, I can still be the owner. Otherwise, I would not have enough time.
Owner: robertma@chromium.org
Status: Assigned (was: Available)
Getting to this this quarter is more than I would have guessed possible, so I'll assign it now. It's fine if it turns out that there are 11 more important P2 issues to go through first, as long as this is still in our plans to fix eventually, not a P3 that we might never get to.

Comment 24 by hbos@chromium.org, Nov 22 2017

Thanks.
robertma@, do you still think we should do this, and in Q1? (It came up in triage.)
Not likely, given the number of surprise tasks that came up this quarter.
hbos@, has this continued to be a problem for you, and how have you worked around it? How sad would you be if we don't figure out a fix for this?

Comment 28 by hbos@chromium.org, Jan 25 2018

I've worked around this by splitting tests up into multiple files, which I've only had to do a couple of times. So it turns out it's not a big problem; more of a nice-to-have.
Labels: -Pri-2 Pri-3
OK, I'm going to downgrade this to a P3 to reflect the reality that it's not imminent. Will bump priority again if we see this happening for other features too.
