New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 777777 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: ----

Blocked on:
issue 787461

Blocking:
issue 760107
issue 777999



Sign in to add a comment

external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html is flaky on webkit_layout_tests failing on chromium.mac/Mac10.11 Tests

Project Member Reported by battre@chromium.org, Oct 24 2017

Issue description

webkit_layout_tests failing on chromium.mac/Mac10.11 Tests

Builders failed on: 
- Mac10.11 Tests: 
  https://build.chromium.org/p/chromium.mac/builders/Mac10.11%20Tests


external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html is flaky

This is observed only on Mac.

Example diff:

--- /b/s/w/ioe8a9ll/layout-test-results/external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https-expected.txt
+++ /b/s/w/ioe8a9ll/layout-test-results/external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https-actual.txt
@@ -3,7 +3,7 @@
 PASS addTrack with a track and a stream makes ontrack fire with a track and a stream.
 PASS addTrack with two tracks and one stream makes ontrack fire twice with the tracks and shared stream.
 FAIL addTrack with a track and two streams makes ontrack fire with a track and two streams. Failed to execute 'addTrack' on 'RTCPeerConnection': Adding a track to multiple streams is not supported.
-FAIL ontrack fires before setRemoteDescription resolves. assert_equals: expected "ontrack;setRemoteDescription;" but got "setRemoteDescription;ontrack;"
+PASS ontrack fires before setRemoteDescription resolves.
 PASS ontrack's receiver matches getReceivers().
 FAIL removeTrack does not remove the receiver. assert_array_equals: Expected the set of receivers to remain the same. lengths differ, expected 1 got 0
 Harness: the test ran to completion.
 

Comment 1 by guidou@chromium.org, Oct 24 2017

Cc: guidou@chromium.org
Owner: hta@chromium.org

Comment 2 by hta@chromium.org, Oct 24 2017

Cc: hbos@chromium.org
Henrik, is your reading that the spec is authoritative on this sequencing?

Likely to be racy; racy is bad.

Comment 3 by battre@chromium.org, Oct 24 2017

Labels: -Sheriff-Chromium
Taking this out of the sheriff queue.
Project Member

Comment 4 by bugdroid1@chromium.org, Oct 24 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0aaff0a1f03ba2e5250e16010e6092acc3fd9525

commit 0aaff0a1f03ba2e5250e16010e6092acc3fd9525
Author: Dominic Battre <battre@chromium.org>
Date: Tue Oct 24 10:39:40 2017

Disable flaky RTCPeerConnection-setRemoteDescription-tracks.https.html on Mac

Test is flaky on Mac, see bug.

TBR=guidou@chromium.org, yutak@chromium.org

Bug:  777777 
Change-Id: I3b7ebeca52fc48407913002276c5006d65f2f3fe
Reviewed-on: https://chromium-review.googlesource.com/735344
Reviewed-by: Dominic Battré <battre@chromium.org>
Reviewed-by: Yuta Kitamura <yutak@chromium.org>
Reviewed-by: Guido Urdaneta <guidou@chromium.org>
Commit-Queue: Dominic Battré <battre@chromium.org>
Cr-Commit-Position: refs/heads/master@{#511077}
[modify] https://crrev.com/0aaff0a1f03ba2e5250e16010e6092acc3fd9525/third_party/WebKit/LayoutTests/TestExpectations

Comment 5 by hbos@chromium.org, Oct 24 2017

Cc: foolip@chromium.org hta@chromium.org
Owner: hbos@chromium.org
Status: Started (was: Assigned)
setRemoteDescription is called on the main thread, posts to the webrtc signaling thread which does SRD in the lower layers and fires callbacks (tracks added, promise resolving). The callbacks jump back to the main thread and completes the operation.

Despite the thread jumps and multiple callbacks one would expect the order to be determinate. On the webrtc thread each callback is invoked in order. The "jumping back to the main thread" from these callbacks happens in order with Posts, so the tasks are placed in order on the main thread.

There are risks of races here if the main thread is doing things in parallel to the webrtc thread, but in this test we call SRD and await promises resolving and events firing. There should be no interference.

If "fire ontrack" and "resolve promise" are synchronous operations on the main thread I don't see how this could be flaky. It might be more complicated than that though, for example when a promise is resolved the then() will not be invoked until later, and I don't know if the order of resolved promises is determinate? When we do "invoke ontrack firing", is this placed in a queue or similar or does this happen immediately?

Need to debug/look closer at code...

Comment 6 by hbos@chromium.org, Oct 24 2017

The spec is authoritative on the ordering:
https://rawgit.com/w3c/webrtc-pc/master/webrtc.html#set-the-rtcsessiondescription

4.4.1.6 @ starting from step 8, you're supposed to...
 - Let trackEvents be an empty set.
 - Process the tracks being added and removed, including adding events to trackEvents.
 - For each event in trackEvents, fire ontrack.
 - Resolve the setRemoteDescription promise.

Comment 7 by hbos@chromium.org, Oct 24 2017

Here's a breakdown of what's going on during SRD:

Starting on main thread, blink setRemoteDescription -> content SetRemoteDescription,
which does a a Post of webrtc's SetRemoteDescription to webrtc signaling thread
https://cs.chromium.org/chromium/src/content/renderer/media/rtc_peer_connection_handler.cc?q=rtc_peer_connection_handler.cc&sq=package:chromium&dr&l=1482
I had previously thought that this was a synchronous invoke, but it's a post, so the main and webrtc threads are running in parallel here.
And we do...

- webrtc::PeerConnection::SetRemoteDescription (on webrtc signaling thread)

  - "UpdateRemoteStreamsList -> OnRemoteTrackSeen -> CreateAudio/VideoReceiver" calls OnAddTrack
    https://cs.chromium.org/chromium/src/third_party/webrtc/pc/peerconnection.cc?sq=package:chromium&dr&l=1520

    - Post to main thread to add track "OnAddTrack -> OnAddTrackImpl -> OnAddRemoteTrack -> DidAddRemoteTrack"
      https://cs.chromium.org/chromium/src/content/renderer/media/rtc_peer_connection_handler.cc?sq=package:chromium&dr&l=1036

      - DidAddRemoteTrack call to blink
        https://cs.chromium.org/chromium/src/content/renderer/media/rtc_peer_connection_handler.cc?sq=package:chromium&dr&l=2064

        - ScheduleDispatchEvent "ontrack"
          https://cs.chromium.org/chromium/src/third_party/WebKit/Source/modules/peerconnection/RTCPeerConnection.cpp?sq=package:chromium&dr&l=1536

  - Post to signaling thread to resolve promise
    https://cs.chromium.org/chromium/src/third_party/webrtc/pc/peerconnection.cc?gsn=SetSessionDescriptionObserver&l=1152

    - Post to the main thread to resolve promise
      https://cs.chromium.org/chromium/src/content/renderer/media/rtc_peer_connection_handler.cc?sq=package:chromium&dr&l=472

      - Promise.resolve()
      https://cs.chromium.org/chromium/src/third_party/WebKit/Source/modules/peerconnection/RTCVoidRequestPromiseImpl.cpp?sq=package:chromium&dr&l=32

ScheduleDispatchEvent of "ontrack" is an asynchronous action, "ontrack" will execute on the main thread later.
Resolving the promise is also an asynchronous action, "then()" will execute on the main thread later.

Comment 8 by hbos@chromium.org, Oct 24 2017

Cc: deadbeef@chromium.org
If I understand this correctly, these order of events are GUARANTEED:
1. ScheduleDispatchEvent "ontrack"
2. srdPromise.resolve()

So normally both are fired and execute in that order. But MAYBE if 1. and 2. happens quickly, by the time we reach 2. "ontrack" is still in a task queue, with no event loop exectution in-between 1. and 2. When we have both a task and a promise pending AT THE SAME TIME, when we begin to execute the next cycle in the event loop - will it execute promises' then() BEFORE it executes queued tasks?

foolip@ do you know the guaranteed ordering of tasks and promises? Or who to ask?

Comment 9 by hbos@chromium.org, Oct 24 2017

I've confirmed that resolved promises execute before events scheduled for dispatching:
https://chromium-review.googlesource.com/c/chromium/src/+/736025

Despite both being asynchronous, they are not put on the same queue so to speak, and the promises' then() happens before the execution of the event's callback even though we queued the event before we resolved the promise.

This must be what the problem is.

Comment 10 by hbos@chromium.org, Oct 24 2017

A proper fix is to add trackEvents to a trackEvents queue like the spec says and do the firing before resolving the promise.

The more obvious fix to fire the event immediately instead of scheduling it to fire would cause problems where all effects of SRD have not taken effect before the track fires. I.e., a stream with two tracks would have the first track event fire before the second track was added to the stream, yielding wrong results if the stream were to be inspected in the ontrack event.

We must have a similar race with regards to the order of MediaStream.onaddtrack and promise resolving due to ScheduleDispatchEvent here:
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/modules/mediastream/MediaStream.cpp?sq=package:chromium&dr=CSs&l=420
This is not explicitly tested anywhere though.

Comment 11 by hbos@chromium.org, Oct 24 2017

(Or rather if two tracks were added to an existing stream, details details, point being: there would be observable effects of doing it that way, we need a trackEvents queue.)

Comment 12 by hbos@chromium.org, Oct 24 2017

Any promise resolving in an independent promise chain is potentially racey with "the effects of SRD" because of how the multiple callbacks of webrtc are handed over to the main thread in multiple steps, meaning things can happen in-between callbacks, and the effects of different callbacks can occur with user-defined functions running in-between (with the ability to expect RTCPeerConnections and friends).

What we probably should do, regardless of this problem, is to handle all effects of something in a single callback, a single jump to the main thread, so that it becomes an "all or nothing" operation.

This could be achieved by queueing all the callbacks until the promise is ready to resolve, and then fire all of them in a single "promise resolved" step on the main thread.

Comment 13 by hbos@chromium.org, Oct 24 2017

Good ol' spaghetti land.

Comment 14 by hbos@chromium.org, Oct 24 2017

guidou@ to the rescue: Check if microtasks execute before promises resolves and queue a microtask instead of ScheduleDispatchEvent's reliance on RaskRunnerTimer.

I'll look into that.

Comment 15 by hbos@chromium.org, Oct 24 2017

Confirmed: Microtasks execute before resolved promises' then().

Comment 16 by hbos@chromium.org, Oct 24 2017

I filed  https://crbug.com/777999  to keep track of firing events in WebRTC (RTCPeerConnection, MediaStream, MediaStreamTrack) in general, separately from this particular flake.

Comment 17 by hbos@chromium.org, Oct 24 2017

I tried to fix this problem using Microtasks in https://chromium-review.googlesource.com/c/chromium/src/+/736038 but ended up breaking other tests. I think Microtasks fire "too soon".

Comment 18 by hbos@chromium.org, Oct 24 2017

With that setback, I'm leaning towards "throttling" all webrtc callbacks until the promise is ready to resolve and then in a single Post to the main thread go ahead and handle all callbacks synchronously.

Then there is no risk of tasks executing in-between callbacks, making this non-racey and we can either always pass or always fail the disabled test.

We can then decide upon the proper queue type (microtask, task queue, custom event queue, etc) to ensure the tasks are fired before the promise's then() executes ( https://crbug.com/777999 ).

Comment 19 by hbos@chromium.org, Oct 24 2017

Blocking: 777999
> With that setback, I'm leaning towards "throttling" all webrtc callbacks until the promise is ready to resolve and then in a single Post to the main thread go ahead and handle all callbacks synchronously.

That seems like a sensible approach to me. If everything is handled in one place at the Blink level, things won't break if ordering is changed at the lower levels.

Comment 21 by hbos@chromium.org, Oct 27 2017

@deadbeef: PeerConnection::SetRemoteDescription pretends to be asynchronous by posting to the current thread (signaling thread -> signaling thread) for the "OnSuccess" callback,
https://cs.chromium.org/chromium/src/third_party/webrtc/pc/peerconnection.cc?q=peerconnection.cc&sq=package:chromium&dr&l=1126

This makes it impossible for the webrtc::PeerConnectionObserver (RTCPeerConnectionHandler::Observer) to tell which callbacks belong to which SetRemoteDescription call if SRD is called multiple times without waiting for the callback to resolve, e.g.

On signaling thread
pc.SetRemoteDescription()  // causing callback1() and Post(promiseResolve1)
pc.SetRemoteDescription()  // causing callback2() and Post(promiseResolve2)

When promiseResolve1 does "jump to main thread to invoke all SRD callbacks" its list of callbacks will involve both callback1 and callback2.

The PeerConnectionObserver is per-PC, not per-call. Possible solutions:
1. Have a per-call observer. Breaking change.
2. Add a function to observer that is involved without doing Post to inform the PC that all callbacks have been invoked.
3. Remove the Post, stop pretending SRD is async. Unclear how this would affect upstream projects.
4. Introduce some SRD ID passed to all the callbacks. Breaking change.
5. When Chrome jumps to webrtc signaling thread to invoke SRD immediately afterwards inform the observer that SRD has been called. This takes advantage of knowing that SRD is not asynchronous. That may be bad.

Wrt 2: It's a bit ugly to have two OnSuccess methods, but it makes sense that PeerConnectionObserver has an OnSuccess (or "OnSetRemoteDescriptionSuccess") since it is listening to everything else that has to do with SRD, why not when it completes. The only OnSuccess we have is tied to a different type of observer, SetSessionDescriptionObserver.

Thoughts?

Comment 22 by hbos@chromium.org, Oct 27 2017

(Multiple SRD calls in a row is triggered if JavaScript is doing that, e.g. https://cs.chromium.org/chromium/src/chrome/test/data/webrtc/peerconnection_rtp.js?q=peerconnection_rtp.js&sq=package:chromium&dr&l=284)

Comment 23 by hbos@chromium.org, Oct 27 2017

Alternatively we could try to block the main thread from SRD being called on main thread to SRD being resolved on the signaling thread (after the Post). That might be fine, but I'm afraid of deadlocks, I've seen synchronous invoke from webrtc land to main thread land e.g. external encoder/decoder factories.

Comment 24 by hbos@chromium.org, Oct 27 2017

Doing 3) fixed the crash I was facing in my WIP CL, 2) might be safer for other projects and we could specify that the new observer function of 2) must be invoked between callbacks of one SRD and the callbacks of another SRD, ensuring any PC implementation upholds this contract, regardless of if its sync or async.

Comment 25 by hbos@chromium.org, Oct 27 2017

Blockedon: 779250

Comment 26 by hbos@chromium.org, Oct 30 2017

Blockedon: webrtc:8473

Comment 27 by hbos@chromium.org, Nov 5 2017

Blockedon: -779250

Comment 29 by hbos@chromium.org, Nov 7 2017

I expect this flake to affect all platforms.

Comment 30 by hbos@chromium.org, Nov 13 2017

FYI this comment was referenced from  https://crbug.com/774303#c14  with regards to test coverage.

Comment 31 by hbos@chromium.org, Nov 15 2017

Labels: -Pri-1 Pri-2

Comment 32 by hbos@chromium.org, Nov 21 2017

Blockedon: 787461
Project Member

Comment 33 by bugdroid1@chromium.org, Nov 27 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/27d21eb0af100f86a2dfca5bccecf60e2d71f70e

commit 27d21eb0af100f86a2dfca5bccecf60e2d71f70e
Author: Henrik Boström <hbos@chromium.org>
Date: Mon Nov 27 11:27:56 2017

Unify SetRemoteDescription track and SRD resolve events in one callback.

Previously, RTCPeerConnectionHandler::Observer listened on tracks added
and removed events and SetSessionDescriptionRequest listened on the
SetRemoteDescription being resolved/rejected on different callbacks.
Each callback was handled separately, jumping from the webrtc signaling
thread to the main thread once per callback, causing race conditions
and making it possible for events to occur in-between callbacks, before
the results of SRD were fully processed.

Now, WebRtcSetRemoteDescriptionObserverInterface is implemented,
unifying the handling of these events, removing the race conditions.
Also, the new observer is invoked immediately when SRD is done on the
webrtc signaling thread, so there is no longer a race between ontrack
firing and the SRD promise resolving.

For more information, including design docs, see
 https://crbug.com/webrtc/8473 .

Bug:  webrtc:8473 ,  chromium:777777 
Change-Id: Ide13b38179b0eb59da83c92502b273219a100fcc
Reviewed-on: https://chromium-review.googlesource.com/753598
Commit-Queue: Henrik Boström <hbos@chromium.org>
Reviewed-by: Guido Urdaneta <guidou@chromium.org>
Cr-Commit-Position: refs/heads/master@{#519276}
[modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/mock_peer_connection_impl.cc
[modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/mock_peer_connection_impl.h
[modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/rtc_peer_connection_handler.cc
[modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/rtc_peer_connection_handler.h
[modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/rtc_peer_connection_handler_unittest.cc
[modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/webrtc/rtc_rtp_receiver.cc
[modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/webrtc/rtc_rtp_receiver.h
[modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/third_party/WebKit/LayoutTests/TestExpectations

Comment 34 by hbos@chromium.org, Nov 27 2017

Blockedon: -webrtc:8473
Status: Verified (was: Started)
The last remaining bits of  https://crbug.com/webrtc/8473  does not block this.
With the above CL, the test is no longer flaky.

Future work include unifying stream/track events ( https://crbug.com/webrtc/8473 ) and synchronously firing/resolving (https://crbug.com/788558).

The flaky test file should no longer be flaky.

Comment 35 by hbos@chromium.org, Nov 27 2017

Blocking: 760107

Sign in to add a comment