RTCRtpSender/Receiver, WebRtcMediaStreamAdapterMap and AdapterRef have broken threading semantics. |
|||||||||||||||||||||
Issue descriptionThis test crashed on the Fuchsia/x64 FYI bot in https://build.chromium.org/p/chromium.fyi/builders/Fuchsia/builds/15446: [00369.438] 01206.01268> <== fatal exception: process content_unittests__exec[229603] thread MockPCFactory WebRtc Signaling [230493] [00369.438] 01206.01268> <== fatal page fault, PC at 0x8322c14c04 [00369.438] 01206.01268> CS: 0 RIP: 0x8322c14c04 EFL: 0x10202 CR2: 0x10 [00369.438] 01206.01268> RAX: 0 RBX: 0x7214f8604230 RCX: 0x2feda5114e0739af RDX: 0x2523e01d3cb0 [00369.439] 01206.01268> RSI: 0x5bfe2f37d000 RDI: 0x2f RBP: 0x2e5399286bf8 RSP: 0x2e5399286a90 [00369.439] 01206.01268> R8: 0xffffffffffffffff R9: 0x5c0af294dd80 R10: 0x1000 R11: 0x800000000 [00369.440] 01206.01268> R12: 0x5c877f45c370 R13: 0x2e5399286d40 R14: 0x5bfe2f3770f0 R15: 0x2e5399286ac8 [00369.440] 01206.01268> errc: 0x4 [00369.440] 01206.01268> bottom of user stack: [00369.441] 01206.01268> 0x00002e5399286a90: 00000000 00000000 99286ac8 00002e53 |.........j(.S...| [00369.455] 01206.01268> 0x00002e5399286aa0: 2f3770f0 00005bfe 22c14e73 00000083 |.p7/.[..sN."....| [00369.455] 01206.01268> 0x00002e5399286ab0: 2f3770c0 00005bfe 23caf306 00000083 |.p7/.[.....#....| [00369.455] 01206.01268> 0x00002e5399286ac0: 7f532ad0 00005c87 f8604230 00007214 |.*S..\..0B`..r..| [00369.455] 01206.01268> 0x00002e5399286ad0: 4e0739af 2feda511 2f3770c0 00005bfe |.9.N.../.p7/.[..| [00369.460] 01206.01268> 0x00002e5399286ae0: 7f536f00 00005c87 7f45c360 00005c87 |.oS..\..`.E..\..| [00369.460] 01206.01268> 0x00002e5399286af0: 99286bf8 00002e53 23cb13b1 00000083 |.k(.S......#....| [00369.460] 01206.01268> 0x00002e5399286b00: 7f532b60 00005c87 00000005 2feda511 |`+S..\........./| [00369.461] 01206.01268> 0x00002e5399286b10: 4e0739af 2feda511 7f536f00 00005c87 |.9.N.../.oS..\..| [00369.524] 01206.01268> 0x00002e5399286b20: 23a7418c 00005c34 2f377000 00005bfe |.A.#4\...p7/.[..| [00369.524] 01206.01268> 0x00002e5399286b30: 00000000 00000000 23c97be5 00000083 |.........{.#....| [00369.525] 01206.01268> 0x00002e5399286b40: 243a3b10 00000083 2f377000 00005bfe |.;:$.....p7/.[..| [00369.525] 01206.01268> 0x00002e5399286b50: 23a7418c 00005c34 1dbf49e8 00000083 |.A.#4\...I......| [00369.525] 01206.01268> 0x00002e5399286b60: 23ff1d50 00000083 23c97c89 00000083 |P..#.....|.#....| [00369.526] 01206.01268> 0x00002e5399286b70: 7f4f34c0 00005c87 208891ef 00000083 |.4O..\..... ....| [00369.526] 01206.01268> 0x00002e5399286b80: 99286d40 00002e53 21f11002 00000083 |@m(.S......!....| [00369.530] 01206.01268> arch: x86_64 [00369.772] 01206.01268> dso: id=615ca57b6e6f2fbe540e425d5343f9761999b360 base=0x7cc933f6d000 name=libasync-default.so [00369.773] 01206.01268> dso: id=a544e7483e97161e58387de2a4570a936e907aae base=0x77747f066000 name=libfdio.so [00369.773] 01206.01268> dso: id=5efedc7801edd06c base=0x76807dddc000 name=libunwind.so.1 [00369.784] 01206.01268> dso: id=8eb4f1a404049d1d7aea82f90090697f327c8bf8 base=0x6befa5558000 name=liblaunchpad.so [00369.796] 01206.01268> dso: id=c401e71426b591bdbf3504aa2b5839aeb48326a1 base=0x640a1288b000 name=<vDSO> [00369.796] 01206.01268> dso: id=f15b94abfcbdd5d69fe3e872968ed2cee5162d07 base=0x62c0f1908000 name=libc.so [00369.796] 01206.01268> dso: id=a2eac7bf15ca918d base=0x5ceb45352000 name=libfxl.so [00369.796] 01206.01268> dso: id=401a597024c762aa base=0x5a40f64f5000 name=libosmesa.so [00369.796] 01206.01268> dso: id=b61656ed713ac11d04067c3dc1c4fb0e52eb9b86 base=0x4e84456e9000 name=libsyslog.so [00369.796] 01206.01268> dso: id=595637609bf55a50 base=0x31121eb84000 name=libc++.so.2 [00369.797] 01206.01268> dso: id=6841a18eeeebb2b7 base=0x2ba94946c000 name=libc++abi.so.1 [00369.797] 01206.01268> dso: id=d97edb5ce3c0772b3d0c746d419339f4188838f2 base=0x1b814f79a000 name=libtrace-engine.so [00369.811] 01206.01268> dso: id=347d69f7bac1291b base=0x1b6cecc9a000 name=libfxl_logging.so [00369.811] 01206.01268> dso: id=a332afe8df5fffff base=0x6612ce58000 name=libmedia_client.so [00369.824] 01206.01268> dso: id=97a0dc303188df85 base=0xc5f8608000 name=libfsl.so [00369.825] 01206.01268> dso: id=7f9e3e76ab9cfab6 base=0x831db24000 name=app:content_unittests__exec #01: blink::PersistentBase<blink::MediaStreamSource, (blink::WeaknessPersistentConfiguration)0, (blink::CrossThreadnessPersistentConfiguration)0>::Assign(blink::MediaStreamSource*) at ??:? #02: blink::WebMediaStreamTrack::Source() const at ??:? #03: content::WebRtcMediaStreamTrackAdapter::Dispose() at ??:? #04: content::WebRtcMediaStreamTrackAdapterMap::AdapterRef::~AdapterRef() at ??:? #05: content::RTCRtpSender::RTCRtpSenderInternal::~RTCRtpSenderInternal() at ??:? #06: content::RTCRtpSender::RTCRtpSenderInternal::~RTCRtpSenderInternal() at ??:? #07: base::internal::BindState<void (media::AudioOutputController::*)(std::__1::unique_ptr<media::AudioBus, std::__1::default_delete<media::AudioBus> >, base::TimeTicks), scoped_refptr<media::AudioOutputController>, std::__1::unique_ptr<media::AudioBus, std::__1::default_delete<media::AudioBus> >, base::TimeTicks>::Destroy(base::internal::BindStateBase const*) at ??:? #08: base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) at ??:? #09: base::MessageLoop::RunTask(base::PendingTask*) at ??:? #10: base::MessageLoop::DoWork() at ??:? #11: base::MessagePumpDefault::Run(base::MessagePump::Delegate*) at ??:? #12: base::RunLoop::Run() at ??:? #13: base::Thread::ThreadMain() at ??:? #14: base::(anonymous namespace)::ThreadFunc(void*) at ??:? #15: pc 0x62c0f1918e36 sp 0x2e5399286fe0 (libc.so,0x10e36) #16: pc 0x62c0f198e599 sp 0x2e5399286ff0 (libc.so,0x86599) #17: pc 0 sp 0x2e5399287000 For graphics-related bugs, please copy/paste the contents of the about:gpu page at the end of this report. Note that the Fuchsia OS and emulation environment have unusual thread-scheduling, so this may be a latent cross-platform race-condition.
,
Mar 30 2018
,
Mar 30 2018
,
Apr 3 2018
Probably the same cause as https://crbug.com/812296 . All of these RTCRtpSenderTest/RTCRtpReceiverTest are flaky, I think it has to do with races where a thread is being shut down before it some other thread is finished, so that thread does PostTask and it doesn't actually post the task and the arguments are not moved to the other thread but destroyed on the current, wrong thread. I added a WebRtcMediaStreamTrackAdapterMap stress test too that was flaking, probably same cause: https://chromium-review.googlesource.com/c/chromium/src/+/983553 There's this assumption about if we just run all the posted tasks before we quit we should be good but maybe one of the posted tasks posts another task. The annoying thing about these flakes is I've never been able to reproduce them locally and I don't understand how they can occur given my runlooping. All of the code is covered by integrationtests though so it's not high priority to fix this.
,
Apr 3 2018
We should consider adding a callback for all of these things. But I would prefer to have a repro case to test against before I change this stuff around, the WebRtcMediaStream[Track]AdapterMaps have caused many problems before.
,
Apr 10 2018
RTCRtpSenderTest.GetStats flaked in https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia/15840 Looking at the crash dump, I notice that there is an "AdapterRef" on the stack, which appears to be some sort of ref-counting hack. Looking at the entries further up the stack, I see that that is owned by the RTCRtpSenderInternal (which is RefCountedThreadSafe) and that that is being released by the media::AudioOutputManager (also RefCountedThreadSafe). It looks like we have a thread-affine class (blink::WebMediaStreamTrack::Source()?) being invoked from the wrong thread as the result of multiple layers of multi-threaded ref-counting. This seems extremely unlikely to be a test-specific issue.
,
Apr 10 2018
Looking at the try flake for RTCRtpReceiverTest.GetStats in https://ci.chromium.org/buildbot/tryserver.chromium.android/linux_android_rel_ng/517991 I see this crash: 261.765s run_tests_on_device(03ab182d003b9ebf) [ RUN ] RTCRtpReceiverTest.GetStats I 261.765s run_tests_on_device(03ab182d003b9ebf) [FATAL:webrtc_media_stream_track_adapter_map.cc(23)] Check failed: map_->main_thread_->BelongsToCurrentThread(). ... which appears to confirm that we're crashing due to broken threading.
,
Apr 10 2018
The adapters are a bit horrible, they own stuff that is created on one thread but have to be destroyed on another thread. So when ref counted was added instead of using scoped_refptr which is very implicit we made a separate class, AdapterRef, which makes the ownership of references more explicit. For instance std::move() is used in PostTasks which explicitly passes ownership to avoid accidentally retain a reference on the thread that is calling PostTask to transfer ownership. Aaaannyway, AdapterRef has to be destroyed on the main thread, that ensures that any last reference to this horrible adapter triggers the destruction on the correct thread. Generally speaking AdapterRefs should thus only be owned by objects who live on the main thread. If there is something on the signaling thread keeping a sender (and its AdapterRefs) alive on the signaling thread that is a bug. Need to investigate if this is a threading error on the part of the tests or if it is real. One of the reasons is I've brushed this off as "testing-only problem" is I've seen these problems before and they have only occurred in these unittests while we have integrationttests and production code that never hits this. And it's a race that I've never been able to repro locally, even after thousands of test runs. Agreed this should be investigated further if time allows.
,
Apr 10 2018
Re #8: Even if it is a test-only issue, it makes these tests useless across a swathe of platforms. The good news is that the flake repros fairly often under Fuschsia for the GetStats tests, so there's a good chance we can repro it with just a large enough --gtest_repeats. The timing differences that make things flake more often under Fuchsia are also reasonably well understood (essentially unexpectedly long delays at unusual points during execution; these flakes can usually be made 100% failures by addition of a Sleep() somewhere specific :).
,
Apr 10 2018
Re the thread-affine teardown: RefCountedThreadSafe<> supports deletion traits which you can provide to have it teardown on a particular thread, rather than on whichever thread happens to hold the final reference. So long as you aren't tearing down the target thread before the last reference to the object, that should work as you intend. Is the idea simply that AdapterRef is effectively a non-copyable scoped_refptr<>, in effect, so that you have to explicitly be dispensed a new reference, and otherwise they can only be passed via std::move()?
,
Apr 10 2018
Re #8: (/me hits Send too soon..) I think the problem here is that you have this special AdapterRef thing that is intended to ensure correct threading by being move-only, but that doesn't work because the object holding the AdapterRef is potentially released on the wrong thread, since it is RefCountedThreadSafe.
,
Apr 10 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9ea3e3cf0148a0685ebf8f7af4979e31735d94e5 commit 9ea3e3cf0148a0685ebf8f7af4979e31735d94e5 Author: Wez <wez@chromium.org> Date: Tue Apr 10 22:29:58 2018 Disable or filter some more tests which often flake under Fuchsia. - Disable ProcessUtilTest.LaunchWithHandleTransfer under Debug builds, since launching the child process is currently too time consuming. - Modifies that test to distinguish the child process failing to send data from the timeout case. - Disable SecurityTest.NewOverflow, which fails under Fuchsia/Debug builds due to the SDK exporting the wrong new[]. - Disable RTCRtpSenderTest.GetStats, which is part of a suite of tests which appear to be inherently flaky, but fail most often under Fuchsia. - Filter DataPipeTest.SendConsumerAndCloseProducer Bug: 793412 , 828229 , 827450 , 831024 Change-Id: Icb655ca91881c3ff7059f333f0e22d9bf2d92468 Reviewed-on: https://chromium-review.googlesource.com/1003996 Reviewed-by: Daniel Cheng <dcheng@chromium.org> Reviewed-by: Yuri Wiitala <miu@chromium.org> Reviewed-by: Scott Graham <scottmg@chromium.org> Commit-Queue: Wez <wez@chromium.org> Cr-Commit-Position: refs/heads/master@{#549663} [modify] https://crrev.com/9ea3e3cf0148a0685ebf8f7af4979e31735d94e5/base/process/process_util_unittest.cc [modify] https://crrev.com/9ea3e3cf0148a0685ebf8f7af4979e31735d94e5/base/security_unittest.cc [modify] https://crrev.com/9ea3e3cf0148a0685ebf8f7af4979e31735d94e5/content/renderer/media/webrtc/rtc_rtp_sender_unittest.cc [modify] https://crrev.com/9ea3e3cf0148a0685ebf8f7af4979e31735d94e5/testing/buildbot/filters/fuchsia.mojo_unittests.filter
,
Apr 11 2018
It is possible to repro the GetStats test failures trivially by adding:
--- a/content/renderer/media/webrtc/rtc_rtp_sender.cc
+++ b/content/renderer/media/webrtc/rtc_rtp_sender.cc
@@ -187,6 +187,7 @@ class RTCRtpSender::RTCRtpSenderInternal
native_peer_connection_->GetStats(webrtc_sender_,
RTCStatsCollectorCallbackImpl::Create(
main_thread_, std::move(callback)));
+ base::PlatformThread::Sleep(base::TimeDelta::FromSeconds(2));
}
This causes the GetStatsOnSignalingThread call not to return until after the main thread has dropped its last reference to the RTCRtpSenderInternal, causing the scoped_refptr<RTCRtpSenderInternal> that was Bind()ed to |this| to be the final reference. The class is then released on the signaling thread, implicitly resulting in the AdapterRef being released there as well.
Presumably similar problems apply to the other test flakes.
Reviewing the various classes involved in the call stack in more detail, it seems that:
1. Most of them are used on multiple threads, rather than being split e.g. into separate objects for the main and signaling thread parts.
2. Most of them are ref-counted objects; it's not immediately clear whether that is necessary, or just a work-around for the objects existing across multiple threads.
3. Most of these objects have the expectation of being torn-down on the main thread, but do nothing to guarantee that.
4. Many of these threading expectations are not DCHECKed.
,
Apr 11 2018
Adding some content/renderer/media OWNERS for visibility.
,
Apr 11 2018
,
Apr 11 2018
,
Apr 11 2018
,
Apr 11 2018
,
Apr 11 2018
,
Apr 12 2018
#13: Thank you for that snippet. This is a real problem. The async operations are a case where a reference is given to the webrtc signaling thread and thus there are no guarantees that the sender is destroyed on the main thread. Any operation that jumps to the signaling thread does jump back to the main thread per design, but GetStats is an example where, mistakenly, nothing is holding on to a sender reference during the operation to protect it until we are back on the main thread, meaning the signaling thread's reference may very well be the last. This can easily be addressed by adding a reference that is posted back and forth until the operation is complete on the main thread, and an easy fix I can implement. But this mix of reference counting of objects used on multiple threads but expected to be destroyed on one thread is a headache. We may want to redesign these classes or have the destructor of the ref counted object post to the main thread to do its destruction logic, that way you don't have to care about which thread held the last reference.
,
Apr 12 2018
Even if BlahOnSignalingThread does PostTask to main thread with a reference to |this| it is racy, since that task could be executed before the end of BlahOnSignalingThread. This code needs to be updated such that either threading assumptions are checked or cleanup is safe regardless of which thread held the last reference.
,
Apr 12 2018
Re #20: If an object runs on a particular Sequence, and async work can/should safely be "dropped" if that object is destroyed, then ref-counting isn't required; you can instead PostTaskAndReply(...) and Bind() a WeakPtr to the object into the reply Callback. If the object is gone by the time the reply arrives, it will be silently skipped. Re #21: RefCountedThreadSafe<> is parameterized with Traits which include allowing a customized Destruct(const T*) behaviour. We have several ref-counted classes which implement that to DeleteSoon() the object on its "home" TaskRunner when the last reference is dropped, regardless of where that happens. You listed a set of constraints that should be properly implemented, and checked, in https://bugs.chromium.org/p/chromium/issues/detail?id=812296#c14. We should probably implement those to address the immediate issues, but we should also clean up this code to remove any unnecessary ref-counting, split up multi-threaded classes into single-threaded sub-components, etc. e.g. Using RefCountedThreadSafe::Traits and WeakPtr I expect we can get rid of the need for AdapterRef. I'm happy to help clean this up, but I'm unfamiliar with this code, so I'm unsure of which ref-counting is required and which was added just to to cope with multi-threading.
,
Apr 13 2018
Let me get back to this on Monday or Tuesday and I can do immediate fixes and get back here to talk about design improvements. I think PostTaskAndReply and DeleteSoon are good options. One of the reasons for having AdapterRef though is we have blink-layer objects (blink::MediaStream, blink::MediaStreamTrack) and webrtc-layer objects (webrtc::MediaStreamInterface, webrtc::MediaStreamTrackInterface) that are supposed to map 1:1, but by design does not have the same life-time. You can create a track and never use it in combination with WebRTC (e.g. capture camera and show it in a <video> tag). If you do want to attach a track to an RTCPeerConnection though we need to create content and webrtc glue to identify this track in objects that third_party/webrtc understands. These classes are reference counted and whenever an API is called (blink->webrtc) or a callback is invoke (webrtc->blink) when we see one of these objects we need to "convert" it to the other layer's representation of that object. This is why we have WebRtcMediaStream[Track]AdapterMap. This is where "GetOrCreate" comes to play, we only create an adapter if we haven't already created an adapter for that object. And reference counting comes to play with the idea that we want to free these resources if they're not used, but there may be multiple usages. For example, you might have two receivers referencing the same stream. If one of the receivers are removed from the peer connection, you only clean up the stream adapter if there are no other receivers that reference that stream. The same goes for senders referencing and tracks and streams. Ideally we'd get rid of the content layer altogether, but in the meantime, a content layer sender is needed to as the glue between blink::RTCRtpSender and webrtc::RtpSenderInterface. Because both the blink and the content layer needs to reference senders, this is reference counted. The split between content::RTCRtpSender not being ref counted and RTCRtpSenderInternal being ref counted was a work-around for not being able to use ref counted interfaces across blink and content using a pattern I've seen elsewhere (WebFoo and Foo, e.g. multiple WebMediaStreamTracks can be copied by value all representing the same MediaStreamTrack). If a blink interface implemented in content can be reference counted the internal class could be merged into content::RTCRtpSender but it does not change the overall design.
,
Apr 13 2018
With DeleteSoon type of logic adapters could use "normal" ref counting and then it doesn't matter where they are destroyed, the cleanup will occur on the right thread. This would be a good thing. Testing needs to allow threads to finish up any DeleteSoon logic is executed before the thread shuts down.
,
Apr 13 2018
Also note that when the last AdapterRef is destroyed, the entry in the adapter map of that adapter is removed, which involves a mutex.
,
Apr 16 2018
Re #23: IIUC the reason for the ref-counting is that the objects on the Blink side may be multiply-referenced, and therefore the corresponding webrtc-side adapters must support multiple references to a single adapter. Are we guaranteed that GetOrCreate() will always be called from the main thread? I don't understand your point re content::RTCRtpSender. Are you saying that the blink interface is single-owner, but multiple of those objects may refer to the same underlying RTCRtpSender, hence the need for ref-counting? The issue I see with the ref-counting is that you're both using it to allow multiple RTCRtpSenders to co-own a single RTCRtpSenderInternal, _and_ when posting tasks to be executed on the signaling thread. You could split the *Internal would be split into two parts; a ref-counted main-thread part, and a single-owner signaling thread part. However, just fixing the destruct traits to teardown on the main thread, and being explicit about which members are safe to access where is probably sufficient. Re #25: If we fix things such that the adaptors are always inserted/fetched/removed from the AdapterMap from the main thread then we can have the adapter remove itself on destruction, without the need for a mutex, surely?
,
Apr 17 2018
The NextAction date has arrived: 2018-04-17
,
Apr 17 2018
Adapters fall into two categories, "local" adapters which are ones created locally e.g. by opening the camera, and "remote" adapters which are created by webrtc by the remote endpoint (i.e. another peer's local track and stream being sent over the network, received as a remote track and stream). For local adapters, GetOrCreate() is always invoked on the main thread (this is guaranteed by calling places and is DCHECKed). They are meant to always be destroyed on the main thread due to destroying resources held by the adapter (this is DCHECKed but in race conditions this DCHECK may actually be hit as per this bug). For remote adapters, GetOrCreate() is always invoked on the webrtc signaling thread (this is guaranteed by calling places and is DCHECKed). They too are always meant to be destroyed on the main thread due to resources held by the adapter (this is DCHECKed but in race conditions this DCHECK may actually be hit as per this bug). These threading restrictions were imposed by the underlying code that the track adapters are essentially just wrapping around, refactoring and abstracting away. The functionality provided by the maps is reference counting and "getting if already exist, otherwise create a new one", but apparently DCHECK is not sufficient to ensure references are handled correctly. In both cases, it is possible to Get() an adapter ref for an existing adapter on any thread, but then it will not be created if it does not exist. It is debatable whether this API is needed. Both local and remote adapters share the same map and lock, though it would be entirely possible to split local and remote adapters into separate maps with separate locks (or possibly getting rid of the lock if we don't need to access the map on multiple threads after the split). Currently mutex is needed because... - GetOrCreate() happens on multiple threads causing insertions into a shared map (local: main thread, remote: webrtc signaling thread). - Get() happens on multiple threads and needs to sync up with any previously created adapters on the other thread.
,
Apr 17 2018
Fixing destruction traits sounds good to me. It would also be good to be able to have some sort of callback for testing to ensure the tests don't tear down the threads before all destruction logic has fully executed.
,
Apr 17 2018
,
Apr 17 2018
The use of locks in the map have been a concern for deadlocks in the past, but those deadlock issues have been fixed. The adapter map unittests are not flaky, and use of std::unique_ptr for AdapterRef make it relatively safe to jump between threads. E.g. if you're on the signaling thread and you do PostTask, you have to std::move() the adapter ref so there is no concern if the posted task is executed before the current function finishes. Passing around adapter refs is rather deterministic. As for immediate concerns, enabling the failing tests and merging a fix for M67... The concern is when an adapter ref is held by an object whose destruction logic is not deterministic. This is a problem with RTCRtpSenderInternal and RTCRtpReceiverInternal since these use normal ref counting that can be destroyed on any thread, and when it performs operations with PostTask you don't know if that task or the current task will be the last to drop its reference. This class has nothing that guarantees the threading assumption that it would be destroyed on the main thread. For the immediate fix I will add destructor traits to RTCRtpSenderInternal and RTCRtpReceiverInternal.
,
Apr 17 2018
RTCRtpSenderInternal destructor traits CL: https://chromium-review.googlesource.com/c/chromium/src/+/1015002 RTCRtpReceiverInternal destructor traits CL: https://chromium-review.googlesource.com/c/chromium/src/+/1013492
,
Apr 17 2018
,
Apr 17 2018
Re#31: Actually there is a similar PostTask problem with ~WebRtcMediaStreamTrackAdapter DCHECKing that we are on the main thread even though the remote audio track case has Dispose logic for PostTasking to signaling thread and back, but this DCHECK should perhaps be removed just like the ~WebRtcMediaStreamTrackAdapterMap DCHECK.
,
Apr 17 2018
I'll remove the invalid thread checks for the map and adapter while adding a stress test in this CL: https://chromium-review.googlesource.com/c/chromium/src/+/983553/ Rest assure the adapter's Dispose() function still has a thread check and the map is thread safe and already DCHECK it is empty upon destruction so this should be safe.
,
Apr 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/6bfff67c0102ca4d037a2eec2b75034d099ef08a commit 6bfff67c0102ca4d037a2eec2b75034d099ef08a Author: Henrik Boström <hbos@chromium.org> Date: Tue Apr 17 19:00:30 2018 Track adapter map stress test added and invalid thread checks removed. WebRtcMediaStreamTrackAdapterMapStressTest added due to https://crbug.com/813574. This continuously creates and destroys local and remote track adapters concurrently in an attempt to cause a deadlock if possible (hopefully not). This also removes some invalid thread checks at the destructor of ~WebRtcMediaStreamTrackAdapter and ~WebRtcMediaStreamAdapterMap. The first of which made the stress test flaky ( https://crbug.com/826365 ). The adapter's DCHECK was invalid because the remote audio track's Dispose (which properly DCHECKs that it is invoked on the main thread) can jump to the signaling thread and back with a reference to the adapter. It is possible for the signaling thread's task to be the last reference to the adapter. Similarly, WebRtcMediaStreamAdapterMap is reference counted and tasks referencing on different threads could make which thread references it last racy. We don't care which thread was the last reference as long as it is empty (which it is if and only if all adapters have been destroyed on the appropriate threads). Bug: 827450 Change-Id: I5f819948fc2148e810f343e027fa8011b03d2335 Reviewed-on: https://chromium-review.googlesource.com/983553 Reviewed-by: Guido Urdaneta <guidou@chromium.org> Commit-Queue: Henrik Boström <hbos@chromium.org> Cr-Commit-Position: refs/heads/master@{#551413} [modify] https://crrev.com/6bfff67c0102ca4d037a2eec2b75034d099ef08a/content/renderer/media/webrtc/webrtc_media_stream_adapter_map.cc [modify] https://crrev.com/6bfff67c0102ca4d037a2eec2b75034d099ef08a/content/renderer/media/webrtc/webrtc_media_stream_track_adapter.cc [modify] https://crrev.com/6bfff67c0102ca4d037a2eec2b75034d099ef08a/content/renderer/media/webrtc/webrtc_media_stream_track_adapter_map_unittest.cc
,
Apr 25 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a43fb027d85525a5f1038df421ca5643f5778fca commit a43fb027d85525a5f1038df421ca5643f5778fca Author: Henrik Boström <hbos@chromium.org> Date: Wed Apr 25 14:53:24 2018 RTCRtpSenderInternal destructor traits ensuring delete on main thread. The internal class owns AdapterRefs which have to be destructed on the main thread. As such, the last reference to the internal class had to be freed on the main thread. In flaky cases, when tasks were posted to the signaling thread, an operation performed and a task posted back to the main thread - the main thread would finish before the signaling thread task and the last reference to the internal class would be released on the signaling thread, causing DCHECK crashes at ~AdapterRef for being on the wrong thread. With destructor traits, if we are not already on the main thread we post to the main thread and delete RTCRtpSenderInternal there. The TearDown of RTCRtpSenderTest is updated to ensure that any pending tasks get a chance to execute, in case the signaling thread was not finished yet or else the destructor posted to the main thread does not get a chance to execute and the test would flakily leak. Before this CL: Flake's symptoms could be reproduced by adding a thread sleep at RTCRtpSenderInternal::GetStatsOnSignalingThread. After this CL: Unable to repro flake. The same applies for the other flaking tests based on replaceTrack(). Bug: 827450 Change-Id: Ib594a53042e441e591ccc2c87ae8012bcb4ec75e Reviewed-on: https://chromium-review.googlesource.com/1015002 Commit-Queue: Henrik Boström <hbos@chromium.org> Reviewed-by: Wez <wez@chromium.org> Reviewed-by: Guido Urdaneta <guidou@chromium.org> Cr-Commit-Position: refs/heads/master@{#553550} [modify] https://crrev.com/a43fb027d85525a5f1038df421ca5643f5778fca/content/renderer/media/webrtc/rtc_rtp_sender.cc [modify] https://crrev.com/a43fb027d85525a5f1038df421ca5643f5778fca/content/renderer/media/webrtc/rtc_rtp_sender_unittest.cc
,
Apr 25 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0ec0e885a2fecd965eba2e084f4e55fe002dae81 commit 0ec0e885a2fecd965eba2e084f4e55fe002dae81 Author: Henrik Boström <hbos@chromium.org> Date: Wed Apr 25 18:41:39 2018 RTCRtpReceiverInternal destructor traits ensuring delete on main thread. The internal class owns AdapterRefs which have to be destructed on the main thread. As such, the last reference to the internal class had to be freed on the main thread. In flaky cases, when tasks were posted to the signaling thread, an operation performed and a task posted back to the main thread - the main thread would finish before the signaling thread task and the last reference to the internal class would be released on the signaling thread, causing DCHECK crashes at ~AdapterRef for being on the wrong thread. With destructor traits, if we are not already on the main thread we post to the main thread and delete RTCRtpReceiverInternal there. The TearDown of RTCRtpReceiverTest is updated to ensure that any pending tasks get a chance to execute, in case the signaling thread was not finished yet or else the destructor posted to the main thread does not get a chance to execute and the test would flakily leak. Before this CL: Flake's symptoms could be reproduced by adding a thread sleep at RTCRtpReceiverInternal::GetStatsOnSignalingThread. After this CL: Unable to repro flake. Bug: 827450 Change-Id: Icdfdd3e22c3e86308ee7911e67eff43590fde581 Reviewed-on: https://chromium-review.googlesource.com/1013492 Commit-Queue: Henrik Boström <hbos@chromium.org> Reviewed-by: Wez <wez@chromium.org> Reviewed-by: Guido Urdaneta <guidou@chromium.org> Cr-Commit-Position: refs/heads/master@{#553675} [modify] https://crrev.com/0ec0e885a2fecd965eba2e084f4e55fe002dae81/content/renderer/media/webrtc/rtc_rtp_receiver.cc [modify] https://crrev.com/0ec0e885a2fecd965eba2e084f4e55fe002dae81/content/renderer/media/webrtc/rtc_rtp_receiver.h [modify] https://crrev.com/0ec0e885a2fecd965eba2e084f4e55fe002dae81/content/renderer/media/webrtc/rtc_rtp_receiver_unittest.cc [modify] https://crrev.com/0ec0e885a2fecd965eba2e084f4e55fe002dae81/content/renderer/media/webrtc/rtc_rtp_sender.cc [modify] https://crrev.com/0ec0e885a2fecd965eba2e084f4e55fe002dae81/content/renderer/media/webrtc/rtc_rtp_sender.h
,
Apr 30 2018
a43fb027d85525a5f1038df421ca5643f5778fca and 0ec0e885a2fecd965eba2e084f4e55fe002dae81 have baked for 4-5 days now, requesting them be merged into M67 to ensure destructor is not executed on wrong thread if unlucky with GC (not very likely, but can happen). E.g. let sender = pc.addTrack(track); sender.getStats().then(doSomething); ... pc.close() or pc.removeTrack(sender); pc = sender = null; // Race: last reference of content layer sender could be the task executing // getStats() on the signaling thread if GC happens of blink layer sender // while operation in flight.
,
Apr 30 2018
TL;DR: There are some reference counted objects (RTCRtpSenderInternal, RTCRtpReceiverInternal) which own object that have to be destroyed on the main thread, but there were nothing insuring this happened in racy situations. The CLs requested to be merged are low risk, adding destructor traits to these objects that simply jump to the main thread if not already there.
,
Apr 30 2018
This bug requires manual review: Less than 25 days to go before AppStore submit on M67 Please contact the milestone owner if you have questions. Owners: cmasso@(Android), cmasso@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Apr 30 2018
Approving merge to M67 branch 3396 for CLs listed at #39 (a43fb027d85525a5f1038df421ca5643f5778fca and 0ec0e885a2fecd965eba2e084f4e55fe002dae81 ) based on comments #39 and #40. Please merge ASAP so we can pick them up for this week M67 beta release.
,
Apr 30 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e1efd9340f94ef022bee476b1b2e5cad18578240 commit e1efd9340f94ef022bee476b1b2e5cad18578240 Author: Henrik Boström <hbos@chromium.org> Date: Mon Apr 30 18:47:56 2018 RTCRtpSenderInternal destructor traits ensuring delete on main thread. The internal class owns AdapterRefs which have to be destructed on the main thread. As such, the last reference to the internal class had to be freed on the main thread. In flaky cases, when tasks were posted to the signaling thread, an operation performed and a task posted back to the main thread - the main thread would finish before the signaling thread task and the last reference to the internal class would be released on the signaling thread, causing DCHECK crashes at ~AdapterRef for being on the wrong thread. With destructor traits, if we are not already on the main thread we post to the main thread and delete RTCRtpSenderInternal there. The TearDown of RTCRtpSenderTest is updated to ensure that any pending tasks get a chance to execute, in case the signaling thread was not finished yet or else the destructor posted to the main thread does not get a chance to execute and the test would flakily leak. Before this CL: Flake's symptoms could be reproduced by adding a thread sleep at RTCRtpSenderInternal::GetStatsOnSignalingThread. After this CL: Unable to repro flake. The same applies for the other flaking tests based on replaceTrack(). TBR=hbos@chromium.org (cherry picked from commit a43fb027d85525a5f1038df421ca5643f5778fca) Bug: 827450 Change-Id: Ib594a53042e441e591ccc2c87ae8012bcb4ec75e Reviewed-on: https://chromium-review.googlesource.com/1015002 Commit-Queue: Henrik Boström <hbos@chromium.org> Reviewed-by: Wez <wez@chromium.org> Reviewed-by: Guido Urdaneta <guidou@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#553550} Reviewed-on: https://chromium-review.googlesource.com/1035483 Reviewed-by: Henrik Boström <hbos@chromium.org> Cr-Commit-Position: refs/branch-heads/3396@{#390} Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428} [modify] https://crrev.com/e1efd9340f94ef022bee476b1b2e5cad18578240/content/renderer/media/webrtc/rtc_rtp_sender.cc [modify] https://crrev.com/e1efd9340f94ef022bee476b1b2e5cad18578240/content/renderer/media/webrtc/rtc_rtp_sender_unittest.cc
,
Apr 30 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8bc0a55c623e97d8c2e4722836d22101938a1117 commit 8bc0a55c623e97d8c2e4722836d22101938a1117 Author: Henrik Boström <hbos@chromium.org> Date: Mon Apr 30 19:48:13 2018 RTCRtpReceiverInternal destructor traits ensuring delete on main thread. The internal class owns AdapterRefs which have to be destructed on the main thread. As such, the last reference to the internal class had to be freed on the main thread. In flaky cases, when tasks were posted to the signaling thread, an operation performed and a task posted back to the main thread - the main thread would finish before the signaling thread task and the last reference to the internal class would be released on the signaling thread, causing DCHECK crashes at ~AdapterRef for being on the wrong thread. With destructor traits, if we are not already on the main thread we post to the main thread and delete RTCRtpReceiverInternal there. The TearDown of RTCRtpReceiverTest is updated to ensure that any pending tasks get a chance to execute, in case the signaling thread was not finished yet or else the destructor posted to the main thread does not get a chance to execute and the test would flakily leak. Before this CL: Flake's symptoms could be reproduced by adding a thread sleep at RTCRtpReceiverInternal::GetStatsOnSignalingThread. After this CL: Unable to repro flake. TBR=hbos@chromium.org (cherry picked from commit 0ec0e885a2fecd965eba2e084f4e55fe002dae81) Bug: 827450 Change-Id: Icdfdd3e22c3e86308ee7911e67eff43590fde581 Reviewed-on: https://chromium-review.googlesource.com/1013492 Commit-Queue: Henrik Boström <hbos@chromium.org> Reviewed-by: Wez <wez@chromium.org> Reviewed-by: Guido Urdaneta <guidou@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#553675} Reviewed-on: https://chromium-review.googlesource.com/1035882 Reviewed-by: Henrik Boström <hbos@chromium.org> Cr-Commit-Position: refs/branch-heads/3396@{#395} Cr-Branched-From: 9ef2aa869bc7bc0c089e255d698cca6e47d6b038-refs/heads/master@{#550428} [modify] https://crrev.com/8bc0a55c623e97d8c2e4722836d22101938a1117/content/renderer/media/webrtc/rtc_rtp_receiver.cc [modify] https://crrev.com/8bc0a55c623e97d8c2e4722836d22101938a1117/content/renderer/media/webrtc/rtc_rtp_receiver.h [modify] https://crrev.com/8bc0a55c623e97d8c2e4722836d22101938a1117/content/renderer/media/webrtc/rtc_rtp_receiver_unittest.cc [modify] https://crrev.com/8bc0a55c623e97d8c2e4722836d22101938a1117/content/renderer/media/webrtc/rtc_rtp_sender.cc [modify] https://crrev.com/8bc0a55c623e97d8c2e4722836d22101938a1117/content/renderer/media/webrtc/rtc_rtp_sender.h
,
Apr 30 2018
The adapter map design is worth revisiting, and likely will have to when we're removing the content/renderer/media/webrtc/ layer (https://docs.google.com/document/d/14fsyfCRV_w2VcHDSnAXnhkB4DE83P-u_ewp6bg-HblU/edit?usp=sharing). Another time, another bug. This one is fixed and verified.
,
May 2 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e3e32d3dc594a93c2445033e7740cef1ceb9aa96 commit e3e32d3dc594a93c2445033e7740cef1ceb9aa96 Author: Wez <wez@chromium.org> Date: Wed May 02 11:47:02 2018 Unfilter RTCRtpSenderTest.ReplaceTrack* tests under Fuchsia. These tests were filtered due to threading issues in the production code leading to crash flakes. Bug: 813795 , 827450 Change-Id: I1c5f69a3d26e23eec5e25c391bfe20422252c360 Reviewed-on: https://chromium-review.googlesource.com/1039068 Reviewed-by: Henrik Boström <hbos@chromium.org> Commit-Queue: Wez <wez@chromium.org> Cr-Commit-Position: refs/heads/master@{#555353} [modify] https://crrev.com/e3e32d3dc594a93c2445033e7740cef1ceb9aa96/testing/buildbot/filters/fuchsia.content_unittests.filter |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by bugdroid1@chromium.org
, Mar 30 2018