external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html is flaky on webkit_layout_tests failing on chromium.mac/Mac10.11 Tests |
|||||||||||||
Issue descriptionwebkit_layout_tests failing on chromium.mac/Mac10.11 Tests Builders failed on: - Mac10.11 Tests: https://build.chromium.org/p/chromium.mac/builders/Mac10.11%20Tests external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https.html is flaky This is observed only on Mac. Example diff: --- /b/s/w/ioe8a9ll/layout-test-results/external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https-expected.txt +++ /b/s/w/ioe8a9ll/layout-test-results/external/wpt/webrtc/RTCPeerConnection-setRemoteDescription-tracks.https-actual.txt @@ -3,7 +3,7 @@ PASS addTrack with a track and a stream makes ontrack fire with a track and a stream. PASS addTrack with two tracks and one stream makes ontrack fire twice with the tracks and shared stream. FAIL addTrack with a track and two streams makes ontrack fire with a track and two streams. Failed to execute 'addTrack' on 'RTCPeerConnection': Adding a track to multiple streams is not supported. -FAIL ontrack fires before setRemoteDescription resolves. assert_equals: expected "ontrack;setRemoteDescription;" but got "setRemoteDescription;ontrack;" +PASS ontrack fires before setRemoteDescription resolves. PASS ontrack's receiver matches getReceivers(). FAIL removeTrack does not remove the receiver. assert_array_equals: Expected the set of receivers to remain the same. lengths differ, expected 1 got 0 Harness: the test ran to completion.
,
Oct 24 2017
Henrik, is your reading that the spec is authoritative on this sequencing? Likely to be racy; racy is bad.
,
Oct 24 2017
Taking this out of the sheriff queue.
,
Oct 24 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0aaff0a1f03ba2e5250e16010e6092acc3fd9525 commit 0aaff0a1f03ba2e5250e16010e6092acc3fd9525 Author: Dominic Battre <battre@chromium.org> Date: Tue Oct 24 10:39:40 2017 Disable flaky RTCPeerConnection-setRemoteDescription-tracks.https.html on Mac Test is flaky on Mac, see bug. TBR=guidou@chromium.org, yutak@chromium.org Bug: 777777 Change-Id: I3b7ebeca52fc48407913002276c5006d65f2f3fe Reviewed-on: https://chromium-review.googlesource.com/735344 Reviewed-by: Dominic Battré <battre@chromium.org> Reviewed-by: Yuta Kitamura <yutak@chromium.org> Reviewed-by: Guido Urdaneta <guidou@chromium.org> Commit-Queue: Dominic Battré <battre@chromium.org> Cr-Commit-Position: refs/heads/master@{#511077} [modify] https://crrev.com/0aaff0a1f03ba2e5250e16010e6092acc3fd9525/third_party/WebKit/LayoutTests/TestExpectations
,
Oct 24 2017
setRemoteDescription is called on the main thread, posts to the webrtc signaling thread which does SRD in the lower layers and fires callbacks (tracks added, promise resolving). The callbacks jump back to the main thread and completes the operation. Despite the thread jumps and multiple callbacks one would expect the order to be determinate. On the webrtc thread each callback is invoked in order. The "jumping back to the main thread" from these callbacks happens in order with Posts, so the tasks are placed in order on the main thread. There are risks of races here if the main thread is doing things in parallel to the webrtc thread, but in this test we call SRD and await promises resolving and events firing. There should be no interference. If "fire ontrack" and "resolve promise" are synchronous operations on the main thread I don't see how this could be flaky. It might be more complicated than that though, for example when a promise is resolved the then() will not be invoked until later, and I don't know if the order of resolved promises is determinate? When we do "invoke ontrack firing", is this placed in a queue or similar or does this happen immediately? Need to debug/look closer at code...
,
Oct 24 2017
The spec is authoritative on the ordering: https://rawgit.com/w3c/webrtc-pc/master/webrtc.html#set-the-rtcsessiondescription 4.4.1.6 @ starting from step 8, you're supposed to... - Let trackEvents be an empty set. - Process the tracks being added and removed, including adding events to trackEvents. - For each event in trackEvents, fire ontrack. - Resolve the setRemoteDescription promise.
,
Oct 24 2017
Here's a breakdown of what's going on during SRD: Starting on main thread, blink setRemoteDescription -> content SetRemoteDescription, which does a a Post of webrtc's SetRemoteDescription to webrtc signaling thread https://cs.chromium.org/chromium/src/content/renderer/media/rtc_peer_connection_handler.cc?q=rtc_peer_connection_handler.cc&sq=package:chromium&dr&l=1482 I had previously thought that this was a synchronous invoke, but it's a post, so the main and webrtc threads are running in parallel here. And we do... - webrtc::PeerConnection::SetRemoteDescription (on webrtc signaling thread) - "UpdateRemoteStreamsList -> OnRemoteTrackSeen -> CreateAudio/VideoReceiver" calls OnAddTrack https://cs.chromium.org/chromium/src/third_party/webrtc/pc/peerconnection.cc?sq=package:chromium&dr&l=1520 - Post to main thread to add track "OnAddTrack -> OnAddTrackImpl -> OnAddRemoteTrack -> DidAddRemoteTrack" https://cs.chromium.org/chromium/src/content/renderer/media/rtc_peer_connection_handler.cc?sq=package:chromium&dr&l=1036 - DidAddRemoteTrack call to blink https://cs.chromium.org/chromium/src/content/renderer/media/rtc_peer_connection_handler.cc?sq=package:chromium&dr&l=2064 - ScheduleDispatchEvent "ontrack" https://cs.chromium.org/chromium/src/third_party/WebKit/Source/modules/peerconnection/RTCPeerConnection.cpp?sq=package:chromium&dr&l=1536 - Post to signaling thread to resolve promise https://cs.chromium.org/chromium/src/third_party/webrtc/pc/peerconnection.cc?gsn=SetSessionDescriptionObserver&l=1152 - Post to the main thread to resolve promise https://cs.chromium.org/chromium/src/content/renderer/media/rtc_peer_connection_handler.cc?sq=package:chromium&dr&l=472 - Promise.resolve() https://cs.chromium.org/chromium/src/third_party/WebKit/Source/modules/peerconnection/RTCVoidRequestPromiseImpl.cpp?sq=package:chromium&dr&l=32 ScheduleDispatchEvent of "ontrack" is an asynchronous action, "ontrack" will execute on the main thread later. Resolving the promise is also an asynchronous action, "then()" will execute on the main thread later.
,
Oct 24 2017
If I understand this correctly, these order of events are GUARANTEED: 1. ScheduleDispatchEvent "ontrack" 2. srdPromise.resolve() So normally both are fired and execute in that order. But MAYBE if 1. and 2. happens quickly, by the time we reach 2. "ontrack" is still in a task queue, with no event loop exectution in-between 1. and 2. When we have both a task and a promise pending AT THE SAME TIME, when we begin to execute the next cycle in the event loop - will it execute promises' then() BEFORE it executes queued tasks? foolip@ do you know the guaranteed ordering of tasks and promises? Or who to ask?
,
Oct 24 2017
I've confirmed that resolved promises execute before events scheduled for dispatching: https://chromium-review.googlesource.com/c/chromium/src/+/736025 Despite both being asynchronous, they are not put on the same queue so to speak, and the promises' then() happens before the execution of the event's callback even though we queued the event before we resolved the promise. This must be what the problem is.
,
Oct 24 2017
A proper fix is to add trackEvents to a trackEvents queue like the spec says and do the firing before resolving the promise. The more obvious fix to fire the event immediately instead of scheduling it to fire would cause problems where all effects of SRD have not taken effect before the track fires. I.e., a stream with two tracks would have the first track event fire before the second track was added to the stream, yielding wrong results if the stream were to be inspected in the ontrack event. We must have a similar race with regards to the order of MediaStream.onaddtrack and promise resolving due to ScheduleDispatchEvent here: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/modules/mediastream/MediaStream.cpp?sq=package:chromium&dr=CSs&l=420 This is not explicitly tested anywhere though.
,
Oct 24 2017
(Or rather if two tracks were added to an existing stream, details details, point being: there would be observable effects of doing it that way, we need a trackEvents queue.)
,
Oct 24 2017
Any promise resolving in an independent promise chain is potentially racey with "the effects of SRD" because of how the multiple callbacks of webrtc are handed over to the main thread in multiple steps, meaning things can happen in-between callbacks, and the effects of different callbacks can occur with user-defined functions running in-between (with the ability to expect RTCPeerConnections and friends). What we probably should do, regardless of this problem, is to handle all effects of something in a single callback, a single jump to the main thread, so that it becomes an "all or nothing" operation. This could be achieved by queueing all the callbacks until the promise is ready to resolve, and then fire all of them in a single "promise resolved" step on the main thread.
,
Oct 24 2017
Good ol' spaghetti land.
,
Oct 24 2017
guidou@ to the rescue: Check if microtasks execute before promises resolves and queue a microtask instead of ScheduleDispatchEvent's reliance on RaskRunnerTimer. I'll look into that.
,
Oct 24 2017
Confirmed: Microtasks execute before resolved promises' then().
,
Oct 24 2017
I filed https://crbug.com/777999 to keep track of firing events in WebRTC (RTCPeerConnection, MediaStream, MediaStreamTrack) in general, separately from this particular flake.
,
Oct 24 2017
I tried to fix this problem using Microtasks in https://chromium-review.googlesource.com/c/chromium/src/+/736038 but ended up breaking other tests. I think Microtasks fire "too soon".
,
Oct 24 2017
With that setback, I'm leaning towards "throttling" all webrtc callbacks until the promise is ready to resolve and then in a single Post to the main thread go ahead and handle all callbacks synchronously. Then there is no risk of tasks executing in-between callbacks, making this non-racey and we can either always pass or always fail the disabled test. We can then decide upon the proper queue type (microtask, task queue, custom event queue, etc) to ensure the tasks are fired before the promise's then() executes ( https://crbug.com/777999 ).
,
Oct 24 2017
,
Oct 24 2017
> With that setback, I'm leaning towards "throttling" all webrtc callbacks until the promise is ready to resolve and then in a single Post to the main thread go ahead and handle all callbacks synchronously. That seems like a sensible approach to me. If everything is handled in one place at the Blink level, things won't break if ordering is changed at the lower levels.
,
Oct 27 2017
@deadbeef: PeerConnection::SetRemoteDescription pretends to be asynchronous by posting to the current thread (signaling thread -> signaling thread) for the "OnSuccess" callback, https://cs.chromium.org/chromium/src/third_party/webrtc/pc/peerconnection.cc?q=peerconnection.cc&sq=package:chromium&dr&l=1126 This makes it impossible for the webrtc::PeerConnectionObserver (RTCPeerConnectionHandler::Observer) to tell which callbacks belong to which SetRemoteDescription call if SRD is called multiple times without waiting for the callback to resolve, e.g. On signaling thread pc.SetRemoteDescription() // causing callback1() and Post(promiseResolve1) pc.SetRemoteDescription() // causing callback2() and Post(promiseResolve2) When promiseResolve1 does "jump to main thread to invoke all SRD callbacks" its list of callbacks will involve both callback1 and callback2. The PeerConnectionObserver is per-PC, not per-call. Possible solutions: 1. Have a per-call observer. Breaking change. 2. Add a function to observer that is involved without doing Post to inform the PC that all callbacks have been invoked. 3. Remove the Post, stop pretending SRD is async. Unclear how this would affect upstream projects. 4. Introduce some SRD ID passed to all the callbacks. Breaking change. 5. When Chrome jumps to webrtc signaling thread to invoke SRD immediately afterwards inform the observer that SRD has been called. This takes advantage of knowing that SRD is not asynchronous. That may be bad. Wrt 2: It's a bit ugly to have two OnSuccess methods, but it makes sense that PeerConnectionObserver has an OnSuccess (or "OnSetRemoteDescriptionSuccess") since it is listening to everything else that has to do with SRD, why not when it completes. The only OnSuccess we have is tied to a different type of observer, SetSessionDescriptionObserver. Thoughts?
,
Oct 27 2017
(Multiple SRD calls in a row is triggered if JavaScript is doing that, e.g. https://cs.chromium.org/chromium/src/chrome/test/data/webrtc/peerconnection_rtp.js?q=peerconnection_rtp.js&sq=package:chromium&dr&l=284)
,
Oct 27 2017
Alternatively we could try to block the main thread from SRD being called on main thread to SRD being resolved on the signaling thread (after the Post). That might be fine, but I'm afraid of deadlocks, I've seen synchronous invoke from webrtc land to main thread land e.g. external encoder/decoder factories.
,
Oct 27 2017
Doing 3) fixed the crash I was facing in my WIP CL, 2) might be safer for other projects and we could specify that the new observer function of 2) must be invoked between callbacks of one SRD and the callbacks of another SRD, ensuring any PC implementation upholds this contract, regardless of if its sync or async.
,
Oct 27 2017
,
Oct 30 2017
,
Nov 5 2017
,
Nov 6 2017
Bug also observed on Linux. https://isolateserver.appspot.com/browse?namespace=default-gzip&hash=bb48e65f9206a7aca4460ffeb9db4419573c618d
,
Nov 7 2017
I expect this flake to affect all platforms.
,
Nov 13 2017
FYI this comment was referenced from https://crbug.com/774303#c14 with regards to test coverage.
,
Nov 15 2017
,
Nov 21 2017
,
Nov 27 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/27d21eb0af100f86a2dfca5bccecf60e2d71f70e commit 27d21eb0af100f86a2dfca5bccecf60e2d71f70e Author: Henrik Boström <hbos@chromium.org> Date: Mon Nov 27 11:27:56 2017 Unify SetRemoteDescription track and SRD resolve events in one callback. Previously, RTCPeerConnectionHandler::Observer listened on tracks added and removed events and SetSessionDescriptionRequest listened on the SetRemoteDescription being resolved/rejected on different callbacks. Each callback was handled separately, jumping from the webrtc signaling thread to the main thread once per callback, causing race conditions and making it possible for events to occur in-between callbacks, before the results of SRD were fully processed. Now, WebRtcSetRemoteDescriptionObserverInterface is implemented, unifying the handling of these events, removing the race conditions. Also, the new observer is invoked immediately when SRD is done on the webrtc signaling thread, so there is no longer a race between ontrack firing and the SRD promise resolving. For more information, including design docs, see https://crbug.com/webrtc/8473 . Bug: webrtc:8473 , chromium:777777 Change-Id: Ide13b38179b0eb59da83c92502b273219a100fcc Reviewed-on: https://chromium-review.googlesource.com/753598 Commit-Queue: Henrik Boström <hbos@chromium.org> Reviewed-by: Guido Urdaneta <guidou@chromium.org> Cr-Commit-Position: refs/heads/master@{#519276} [modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/mock_peer_connection_impl.cc [modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/mock_peer_connection_impl.h [modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/rtc_peer_connection_handler.cc [modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/rtc_peer_connection_handler.h [modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/rtc_peer_connection_handler_unittest.cc [modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/webrtc/rtc_rtp_receiver.cc [modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/content/renderer/media/webrtc/rtc_rtp_receiver.h [modify] https://crrev.com/27d21eb0af100f86a2dfca5bccecf60e2d71f70e/third_party/WebKit/LayoutTests/TestExpectations
,
Nov 27 2017
The last remaining bits of https://crbug.com/webrtc/8473 does not block this. With the above CL, the test is no longer flaky. Future work include unifying stream/track events ( https://crbug.com/webrtc/8473 ) and synchronously firing/resolving (https://crbug.com/788558). The flaky test file should no longer be flaky.
,
Nov 27 2017
|
|||||||||||||
►
Sign in to add a comment |
|||||||||||||
Comment 1 by guidou@chromium.org
, Oct 24 2017Owner: hta@chromium.org