New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 829401 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

Layout Test external/wpt/webrtc/RTCPeerConnection-track-stats.https.html is flaky

Project Member Reported by smcgruer@chromium.org, Apr 5 2018

Issue description

The following layout test is flaky on Win7 Tests (dbg)(1).

external/wpt/webrtc/RTCPeerConnection-track-stats.https.html

https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=webkit_layout_tests&tests=external%2Fwpt%2Fwebrtc%2FRTCPeerConnection-track-stats.https.html

It appears to have been flaky on and off for some time. Assigning hbos@ based on recent changes to the file.

If flake continues, I will disable the test.
 

Comment 1 by hbos@chromium.org, Apr 6 2018

Status: Started (was: Assigned)
Here's the flaky failure:

--- e:\b\swarm_slave\w\ioia3wmb\layout-test-results\external/wpt/webrtc/RTCPeerConnection-track-stats.https-expected.txt
+++ e:\b\swarm_slave\w\ioia3wmb\layout-test-results\external/wpt/webrtc/RTCPeerConnection-track-stats.https-actual.txt
@@ -13,8 +13,8 @@
 FAIL replaceTrack(): original track attachment stats present after replacing assert_true: Has stats for original track expected true got false
 PASS RTCRtpSender.getStats() contains only outbound-rtp and related stats
 PASS RTCRtpReceiver.getStats() contains only inbound-rtp and related stats
-PASS RTCPeerConnection.getStats(sendingTrack) is the same as RTCRtpSender.getStats()
-PASS RTCPeerConnection.getStats(receivingTrack) is the same as RTCRtpReceiver.getStats()
+FAIL RTCPeerConnection.getStats(sendingTrack) is the same as RTCRtpSender.getStats() assert_object_equals: property "timestamp" expected object "[object Object]" got object "[object Object]"
+FAIL RTCPeerConnection.getStats(receivingTrack) is the same as RTCRtpReceiver.getStats() assert_object_equals: property "timestamp" expected object "[object Object]" got object "[object Object]"
 PASS RTCPeerConnection.getStats(track) throws InvalidAccessError when there are zero senders or receivers for the track
 PASS RTCPeerConnection.getStats(track) throws InvalidAccessError when there are multiple senders for the track
 Harness: the test ran to completion.

I see what's going on :)
In rare cases the stats cache is invalidated (slow run) and assert_object_equals fails.

I will update the tests to only compare stats IDs, not assuming the contents exactly match since timestamp could have changed and counters could have gone up.

Comment 3 by hbos@chromium.org, Apr 6 2018

The flake is very rare btw so no need to disable the test in the meantime.
Project Member

Comment 4 by bugdroid1@chromium.org, Apr 6 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/15d774ed7528de0846c95efb2273b5e5b8744ab9

commit 15d774ed7528de0846c95efb2273b5e5b8744ab9
Author: Henrik Boström <hbos@chromium.org>
Date: Fri Apr 06 09:14:10 2018

Fix RTCPeerConnection-track-stats.https.html flake.

The assert_object_equals would fail due to timestamps being different in
rare runs where the getStats() cache was invalidated between the two
calls.

external/wpt/webrtc should make no assumptions about any cache. Tests
are updated only to compare stats IDs, not individual metrics.

Bug:  829401 
Change-Id: I6fcc781a1827af69f47a5c4416dd5e6bae0c98b3
Reviewed-on: https://chromium-review.googlesource.com/999477
Reviewed-by: Harald Alvestrand <hta@chromium.org>
Commit-Queue: Henrik Boström <hbos@chromium.org>
Cr-Commit-Position: refs/heads/master@{#548721}
[modify] https://crrev.com/15d774ed7528de0846c95efb2273b5e5b8744ab9/third_party/WebKit/LayoutTests/external/wpt/webrtc/RTCPeerConnection-track-stats.https.html

Comment 5 by hbos@chromium.org, Apr 6 2018

Labels: -Sheriff-Chromium
Status: Fixed (was: Started)
Labels: Sheriff-Chromium
Status: Assigned (was: Fixed)
Based on https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20Tests%20(dbg)(1)/68313 and https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Win7%20Tests%20%28dbg%29%281%29/68314 , which both happened after 15d774ed, this is still timing out on Win7.

Please take a look hbos :)
Labels: -Sheriff-Chromium
Cc: -smcgruer@chromium.org
Project Member

Comment 9 by bugdroid1@chromium.org, Apr 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/492e71000ed76bf3423957e15475198fcd0cfab9

commit 492e71000ed76bf3423957e15475198fcd0cfab9
Author: Vasilii Sukhanov <vasilii@chromium.org>
Date: Mon Apr 09 14:20:16 2018

Mark RTCPeerConnection-track-stats.https.html as timing out on Win.

TBR=hbos@chromium.org

Bug:  829401 
Change-Id: I112346b87535c6f78303da8e7b7ece905f4d6b0f
Reviewed-on: https://chromium-review.googlesource.com/1001900
Reviewed-by: Vasilii Sukhanov <vasilii@chromium.org>
Commit-Queue: Vasilii Sukhanov <vasilii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#549169}
[modify] https://crrev.com/492e71000ed76bf3423957e15475198fcd0cfab9/third_party/WebKit/LayoutTests/TestExpectations

Comment 10 by hbos@chromium.org, Apr 9 2018

It timing out is a different, unknown, issue than the flake I fixed (where it would fail due to finishing the test with a different result than -expected.txt).

The logs give no hint about what is timing out.

Comment 12 by hbos@chromium.org, Apr 10 2018

Cc: hta@chromium.org deadbeef@chromium.org
deadbeef@, hta@: I don't know why this would time out, I really don't think the getStats() with selector API is timing out since it is just filtering existing results in a well-tested manor, but I still suspect the getStats() with selector tests are the ones that cause it to time out since these were recently added.

https://cs.chromium.org/chromium/src/third_party/WebKit/LayoutTests/external/wpt/webrtc/RTCPeerConnection-track-stats.https.html?l=425

Other than using the new APIs, the difference between these tests and other tests is the use of:
    await doSignalingHandshake(caller, callee);
    await onIceConnectionStateCompleted(caller);

Could you think of any reason why this might hang? Is it possible for pc.iceConnectionState to never be completed? Maybe it goes from 'connected' back to 'checking' or something?

  // Returns a promise that is resolved when pc.iceConnectionState changes to
  // 'completed'. This is when transport stats can be expected to have its
  // selectedCandidatePairId defined.
  async function onIceConnectionStateCompleted(pc) {
    let resolver = new Resolver();
    pc.oniceconnectionstatechange = e => {
      if (pc.iceConnectionState == 'completed')
        resolver.resolve();
    };
    return resolver.promise;
  }

Comment 13 by hbos@chromium.org, Apr 10 2018

I could speculatively change this to
  if (pc.iceConnectionState == 'connected' ||
      pc.iceConnectionState == 'completed')
and see if the flake goes away.

Comment 14 by hbos@chromium.org, Apr 10 2018

Status: Started (was: Assigned)
Project Member

Comment 15 by bugdroid1@chromium.org, Apr 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/55f4beb9157e45fa1ffe3ebfa7176b6cafbc33b6

commit 55f4beb9157e45fa1ffe3ebfa7176b6cafbc33b6
Author: Vasilii Sukhanov <vasilii@chromium.org>
Date: Tue Apr 10 11:27:21 2018

Mark RTCPeerConnection-track-stats.https.html as timing out on all the platforms

TBR=hbos@chromium.org

Bug:  829401 
Change-Id: I98b7d3149599e876d673a84718215521ddf6cfac
Reviewed-on: https://chromium-review.googlesource.com/1004995
Reviewed-by: Vasilii Sukhanov <vasilii@chromium.org>
Commit-Queue: Vasilii Sukhanov <vasilii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#549485}
[modify] https://crrev.com/55f4beb9157e45fa1ffe3ebfa7176b6cafbc33b6/third_party/WebKit/LayoutTests/TestExpectations

Comment 16 by hbos@chromium.org, Apr 10 2018

That's not it. It's very difficult to reproduce, I haven't been able to in content_shell but I have sometimes been able to with run-webkit-tests --iterations=50. It either finishes in ~1 second or it times out after ~18s.

Comment 17 by hta@chromium.org, Apr 10 2018

there used to be a different bug where the PC would never get from connected to completed state. Perhaps it still happens some part of the time?

Idea: console.log the state transitions - see if there's a pattern before going into timeout.

Comment 18 by hbos@chromium.org, Apr 10 2018

I don't know how to console.log because as far as I know that only shows up in content_shell, and I can only repro this with the run-webkit-tests script that eats up all the output. I've tried to trial-and-error only run a subset of the tests in the file by commenting out the other tests but I haven't pinpointed what is timing out. Now the script is misbehaving, I'll try rebooting

Comment 19 by hbos@chromium.org, Apr 10 2018

  if (pc.iceConnectionState == 'connected' ||
      pc.iceConnectionState == 'completed')

Did not help.

Comment 20 by hbos@chromium.org, Apr 10 2018

The fact that it only reproduces on multiple iterations (run-webkit-tests) but not when repeating just as many iterations a content_shell run, I wonder if this is tied to resources consumed by peer connections that are not garbage collected between runs.
I'd advise against relying on the "completed" state, generally. It's sometimes gated by connectivity checks timing out, and we have pretty generous timeouts. I wouldn't think this would happen with loopback connections but you never know.

Something I did notice in the log:

05:56:56.400 7852   [8852:9416:0406/055558.194:WARNING:peerconnection.cc(5360)] audio is not ready to use the remote candidate because the local or remote description is not set.

I'm not sure if this is actually an issue, since the candidate should still be used after the local description is set... But you could comment out this condition and see if it's reproducible: https://cs.chromium.org/chromium/src/third_party/webrtc/pc/jseptransport2.cc?l=256&rcl=4da18e89bdee78df4478b66cdd0e6f6a38d61b4d

Maybe there's a race condition, and "Promise.all" allows a candidate to sneak in between "localPc.setLocalDescription(offer)" and "remotePc.setRemoteDescription(offer)"? Doesn't seem like that should be possible though.

Comment 22 by hta@chromium.org, Apr 11 2018

Promise.all deliberately doesn't promise anything about sequence of execution, so it's entirely reasonable that sometimes localPc.setLocalDescription(offer) will be executed (and possibly cause candidates to be emitted) before remotePc.setRemoteDescription(offer) is completed.

As deadbeef says, this should be OK in principle, early candidates should be saved, but I don't know that we have tests that verify that connections get set up correctly if all the candidates are early.


Comment 23 by hbos@chromium.org, Apr 12 2018

In this case it's not that it is rejected but that it is timing out. I wish I knew how to get logs while running with the script, or that I could repro when running it in content_shell directly...

While Promise.all does not guarantee order, the functions that are producing the promises are called in a specific order, which makes this non-racy in Chrome which queues operations on the webrtc signaling thread.

WPT should not make any assumptions about what is executed in what order though. But whether localPc or remotePc is executed first does not matter - it does wait for promises to resolve before creating the answer.

Comment 24 by hbos@chromium.org, Apr 12 2018

onIceConnectionStateCompleted is called last which wires up and listens for oniceconnectionstatechange...
https://cs.chromium.org/chromium/src/third_party/WebKit/LayoutTests/external/wpt/webrtc/RTCPeerConnection-track-stats.https.html?sq=package:chromium&dr&l=575

Maybe the race is that if we're "lucky" the state has ALREADY reached 'completed' and as such there are no more events that will fire? I'll try...
Project Member

Comment 25 by bugdroid1@chromium.org, Apr 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/84f1aaa8e300f56b37d84827443954696f12f4b7

commit 84f1aaa8e300f56b37d84827443954696f12f4b7
Author: Henrik Boström <hbos@chromium.org>
Date: Fri Apr 13 10:48:23 2018

Fix race in track-stats.https.html test.

This resolves the onIceConnectionStateComplete probmise when
iceConnectionState reaches either 'connected' or 'completed' and fixes
the race if it had already reached these states before the function was
called.

This fix is speculative. I am no longer able to repro the TIMEOUT
locally with or without the fix.

Bug:  829401 
Change-Id: I1dec90250d640d93498c59a932ab5e84a3b96f15
Reviewed-on: https://chromium-review.googlesource.com/1012029
Reviewed-by: Harald Alvestrand <hta@chromium.org>
Commit-Queue: Henrik Boström <hbos@chromium.org>
Cr-Commit-Position: refs/heads/master@{#550580}
[modify] https://crrev.com/84f1aaa8e300f56b37d84827443954696f12f4b7/third_party/WebKit/LayoutTests/TestExpectations
[modify] https://crrev.com/84f1aaa8e300f56b37d84827443954696f12f4b7/third_party/WebKit/LayoutTests/external/wpt/webrtc/RTCPeerConnection-track-stats.https.html

Comment 26 by hbos@chromium.org, Apr 13 2018

Status: Fixed (was: Started)
This is not verified, but marking as Fixed. If it keeps flaking I'm sure this or another bug will be reopened.
hbos@ - thank you so much for your attempt to get this flake squashed forever! It is much appreciated :).
Project Member

Comment 28 by bugdroid1@chromium.org, Apr 17 2018

Labels: merge-merged-testbranch
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/84f1aaa8e300f56b37d84827443954696f12f4b7

commit 84f1aaa8e300f56b37d84827443954696f12f4b7
Author: Henrik Boström <hbos@chromium.org>
Date: Fri Apr 13 10:48:23 2018

Fix race in track-stats.https.html test.

This resolves the onIceConnectionStateComplete probmise when
iceConnectionState reaches either 'connected' or 'completed' and fixes
the race if it had already reached these states before the function was
called.

This fix is speculative. I am no longer able to repro the TIMEOUT
locally with or without the fix.

Bug:  829401 
Change-Id: I1dec90250d640d93498c59a932ab5e84a3b96f15
Reviewed-on: https://chromium-review.googlesource.com/1012029
Reviewed-by: Harald Alvestrand <hta@chromium.org>
Commit-Queue: Henrik Boström <hbos@chromium.org>
Cr-Commit-Position: refs/heads/master@{#550580}
[modify] https://crrev.com/84f1aaa8e300f56b37d84827443954696f12f4b7/third_party/WebKit/LayoutTests/TestExpectations
[modify] https://crrev.com/84f1aaa8e300f56b37d84827443954696f12f4b7/third_party/WebKit/LayoutTests/external/wpt/webrtc/RTCPeerConnection-track-stats.https.html

[bulk-edit: disregard if N/A] Can the owner please set milestone to this bug if applicable?

Comment 30 by hbos@chromium.org, Jun 7 2018

Labels: M-67

Sign in to add a comment