AudioWorklet Glitches: GC
Reported by steve...@gmail.com, Apr 24 2018
Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3405.0 Safari/537.36

Steps to reproduce the problem:
1. Run the example code (available online to test at https://kunstmusik.github.io/audioworklet-gc, source attached here or available at https://github.com/kunstmusik/audioworklet-gc).
2. The web page starts with a simple sine-generator AudioWorklet running.
3. Select the checkbox to enable the GC pressure code, which allocates randomly sized Float32Arrays on the main thread using setInterval(). These are read from another setInterval() function to ensure that the code does not get JIT'ed out.

What is the expected behavior?
Audio produced by the AudioWorklet should not glitch.

What went wrong?
I wrote this test not knowing Chrome's exact GC implementation, and whether GC pauses on the main thread cause a stop-the-world for Worklet threads (i.e., force the tab's other threads to reach a safe point). Enabling the checkbox in the test HTML consistently creates glitchy audio. What I do not know is whether it is due to high CPU or whether GC is indeed causing pauses.

Some other notes:
* Changing the size of memory allocated for the Float32Array seems to correlate with the rate of glitches heard.
* On a separate website, https://kunstmusik.github.io/csound-p5js, I can load the program, open DevTools, go to Memory, and trigger GC with the trash-can icon. If I click multiple times, I can regularly get the audio to pause. The CPU is not high in this case, so this is what made me initially ask whether the pauses might be GC-related.

Did this work before? No
Does this work in other browsers? Yes

Chrome version: 68.0.3405.0
Channel: canary
OS Version: 10.0
Flash Version:

I do not know if the results heard are caused by GC, thread priority, or something else, but hopefully this test file gives some good data to help figure out ways to iron out any perf issues with AudioWorklet.
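For reference, the GC-pressure portion of the test roughly amounts to the following sketch, written here from the description above; the names, sizes, and interval timings are illustrative, not copied from the actual source:

```ts
// Hypothetical sketch of the "GC pressure" loop described above: one interval
// allocates randomly sized Float32Arrays on the main thread, and a second
// interval reads them so the allocations cannot be optimized away.
let scratch: Float32Array | null = null;
let checksum = 0;

const allocTimer = setInterval(() => {
  // Randomly sized allocation; the size correlates with the glitch rate.
  const length = 1 + Math.floor(Math.random() * 1_000_000);
  scratch = new Float32Array(length);
  scratch[0] = Math.random();
}, 10);

const readTimer = setInterval(() => {
  // Touch the buffer so the work cannot be dropped entirely.
  if (scratch) {
    checksum += scratch[0] + scratch[scratch.length - 1];
  }
}, 50);

// Stop the pressure when the checkbox is cleared (illustrative helper).
function stopPressure(): void {
  clearInterval(allocTimer);
  clearInterval(readTimer);
  console.log('GC pressure stopped, checksum =', checksum);
}
```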
Apr 24 2018
A few observations from tracing:

1. Around 70 s, the main thread starts to struggle. Somehow "TimerBase::run" stalls the main renderer thread.
2. As a result, the main thread GC kicks in frequently.
3. There is a clear correlation between the main thread GC and the wall duration of the AudioDestination function. (See the attached image.)
4. The actual audio processing task in the AudioWorklet is very simple and well under the performance capacity.
5. macOS has the same problem, so this is not platform-dependent.

Perhaps the main thread GC holds a mutex and the AudioDestination waits until the lock is released? I thought the main thread GC does not affect the performance of the WorkletGlobalScope. This repro case says otherwise. haraken@ nhiroki@ WDYT?
Apr 24 2018
If the problem has to do with AudioDestination, wouldn't it show up in other Web Audio cases (i.e., with native nodes)?
Apr 25 2018
> Perhaps the main thread GC holds a mutex and the AudioDestination waits until the lock is released? I thought the main thread GC does not affect the performance of the WorkletGlobalScope.

This is a tricky issue. The main thread and the worker thread can run mostly concurrently, but not completely. They need to grab a lock in the following two cases:

a) When the thread runs a GC.
b) When the thread accesses CrossThreadPersistent.

a) means that two threads cannot run GCs at the same time. b) means that one thread cannot access CrossThreadPersistent while another thread is running a GC. Do you know which of the issues you're hitting here?
Apr 25 2018
I think it is b) because there should be no GC in the AudioWorklet thread.
Apr 25 2018
However, the AudioWorklet thread is attached to Oilpan. It is possible to trigger a GC there, right? If the cause is b), do you know which CrossThreadPersistent is causing the problem?
Apr 25 2018
All I can say is that, in that example, the user code in the worklet should not trigger a GC. The GC pressure is coming from the main thread. I don't know if there are any other possible triggers of GC in the worklet thread (besides user code). As for b), I actually don't understand how that works, so I wouldn't be able to tell.
Apr 25 2018
I think the right fix would be to release ProcessHeap::CrossThreadPersistentMutex() after a GC finishes iterating the cross-thread-persistent region. Currently we are holding ProcessHeap::CrossThreadPersistentMutex() until the final atomic phase finishes, which I think is too much.

keishi, mlippautz: Do you have any concern about releasing ProcessHeap::CrossThreadPersistentMutex() that way?
Apr 25 2018
On a high level, we should only need to hold the lock while iterating them. For strong persistents that's already true on ToT. Unfortunately, we can have weak cross-thread persistents for which we register a slot that is checked after marking. The race I see is:

1. Iterate roots and register a cross-thread persistent for weak processing.
2. Give up the lock.
3. Another thread destroys the cross-thread persistent and returns the underlying node.
4. Weak processing later on accesses already-released memory.

For incremental and concurrent marking, steps 2–4 can take arbitrarily long and are even interleaved with regular application behavior. Even today this is already a race, though. A proper solution would probably involve hooking into the destruction of weak cross-thread persistents in a way that notifies the GC about the node destruction.
Apr 25 2018
Thanks for the details. I agree. I think we need to hold the lock during the following periods:

a) While iterating strong cross-thread persistents.
b) From when we start iterating weak cross-thread persistents to when we finish running weak processing.

b) is already done atomically in MarkEpilogue, so today I don't think we have a race. Could we adjust the lock scope as described above? Then I think we can significantly decrease the time during which the lock is held, which will help fix the glitch in this bug.
Apr 25 2018
Yes agree. Didn't see that the weak and strong phases were already split. Will upload a CL.
Apr 25 2018
tl;dr: avoid micro benchmarks where possible.

Longer version: While Oilpan's handling of cross-thread persistents here is far from ideal and the proposed CL in [1] makes it slightly better, the fundamental cause here is different and is not solved by [1].

The repro allocates large JS typed arrays, which accumulate up to 100M of V8-external memory. This memory triggers an incremental V8 GC, which is immediately able to finalize as there is no other live memory to find. V8 follows up with an Oilpan GC. Since this is a micro benchmark, the scenario just repeats.

The "problem" here is that we only allocate one large array that is not on V8's heap and thus does not make use of any heap-growing strategies. There is a longer-term plan to include external memory in V8's internal growing strategy, which could immediately solve this by avoiding garbage collections. Other ways of creating garbage (e.g., many small typed arrays, regular JS objects, or JS arrays) would result in on-heap storage and not kick off GC in a tight loop. Ultimately, it will always depend on how aggressive the GC is on such large allocations exercised in tight loops.

[1] https://chromium-review.googlesource.com/c/chromium/src/+/1026675
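As an illustration of the distinction drawn above (a hedged sketch, not the test's code; sizes and intervals are arbitrary): a single large typed array keeps its backing store outside the V8 heap, whereas many small regular objects or arrays stay on-heap and go through V8's normal heap-growing heuristics.

```ts
// Pattern that mirrors the repro: a large Float32Array whose backing store
// lives outside the V8 heap, allocated over and over in a tight loop.
setInterval(() => {
  const external = new Float32Array(25_000_000); // ~100 MB backing store (illustrative)
  external[0] = 1;
}, 100);

// Contrast from the comment above: many small, regular JS arrays/objects are
// stored on the V8 heap, feed its normal heap-growing strategy, and do not
// kick off GCs in the same tight loop.
setInterval(() => {
  const onHeap: number[][] = [];
  for (let i = 0; i < 10_000; i++) {
    onHeap.push([Math.random(), Math.random()]);
  }
}, 100);
```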
Apr 25 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/41715deed2f740033a9796c34e46209afeab23e0

commit 41715deed2f740033a9796c34e46209afeab23e0
Author: Michael Lippautz <mlippautz@chromium.org>
Date: Wed Apr 25 18:29:38 2018

[oilpan] Narrow lock scopes for cross-thread persistents

We only need to lock those while iterating persistents and while processing weak persistents.

Bug: chromium:836306
Change-Id: I788d349672c9d9550e3889c13ec4b79b5e5c6ee9
Reviewed-on: https://chromium-review.googlesource.com/1026675
Commit-Queue: Michael Lippautz <mlippautz@chromium.org>
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Cr-Commit-Position: refs/heads/master@{#553669}

[modify] https://crrev.com/41715deed2f740033a9796c34e46209afeab23e0/third_party/blink/renderer/platform/heap/process_heap.h
[modify] https://crrev.com/41715deed2f740033a9796c34e46209afeab23e0/third_party/blink/renderer/platform/heap/thread_state.cc
Apr 25 2018
Just wanted to note that the repro tries to demonstrate something that is being observed in an intermittent way in practice. This seriously affects the viability of audio worklets for real applications. The problem is shown in an exaggerated way, but that does not mean that the main issue, breakups in the audio stream, does not happen in normal operation. For instance, it has been reported that manually triggering GC has a similar effect.
Apr 25 2018
It looks like mlippautz@ already started working on it, so I am marking it so.
Apr 26 2018
There's nothing short-term we can do here anymore. The GC scheduling problem is on our agenda but will take some time.

> The problem is shown in an exaggerated way, but that does not mean that the main issue, breakups in the audio stream, does not happen in normal operation.

Not saying that it doesn't happen in real apps! It's really useful for us to have these isolated test cases that highlight shortcomings.

> For instance, it has been reported that manually triggering GC has a similar effect.

You should not do that, though. The GCs in Chrome are designed to get out of your way (except in corner cases like the repro). Manually calling GC might screw with its heuristics and ultimately cause more problems than it solves.
Apr 26 2018
> You should not do that, though. The GCs in Chrome are designed to get out of your way (except in corner cases like the repro). Manually calling GC might screw with its heuristics and ultimately cause more problems than it solves.

Just to clarify: we don't do that. I am commenting on how we first came to suspect the GC, which was by triggering GC in DevTools. From that we constructed the programmatic example.

More generally, I know that this is an impossible target, but realtime-safe audio code should ideally run under lock-free conditions. That seems to be the case for native nodes, where we don't see the problem reported here.
Apr 26 2018
Tried checking the issue on Chrome version 68.0.3409.0 on Windows 10 and Mac 10.13.1 with the steps below:

1. Launched Chrome.
2. Navigated to https://kunstmusik.github.io/audioworklet-gc.
3. Enabled the "Enable GC Pressure" checkbox.

We were able to hear glitches in the audio after enabling the checkbox. Attaching a screencast of the same.

@Michael Lippautz: Could you please have a look at the screencast provided, let us know if we missed anything, and help us verify the fix? Thanks!
Apr 26 2018
The glitches are not gone but should be reduced. The change has not rolled into Chromium yet, though. See https://chromiumdash.appspot.com/commit/41715deed2f740033a9796c34e46209afeab23e0
Apr 26 2018
mlippautz@ Thanks so much for working on this! I'll measure the improvement from your CL soon.
Apr 26 2018
victor.lazzarini@ Let me clarify a few things for you:

> If the problem has to do with AudioDestination, wouldn't it show up in other Web Audio cases (i.e., with native nodes)?

The native node's memory allocation always happens on the main thread, and the allocation happens only once in its lifetime.

> I think it is b) because there should be no GC in the AudioWorklet thread.

This is not true. The WorkletGlobalScope always creates a set of arrays to deliver the samples to developers, and this needs to happen every render quantum. However, GC in the worklet scope is generally negligible unless you're using tons of worklet processors. The issue here is not about GC in the worklet scope; it is rather about contention on the memory allocator that happens to be shared by multiple threads.

> This seriously affects the viability of audio worklets for real applications.

Why would a real application create tons of Float32Arrays in a loop? A production-grade application should manage its memory allocation accordingly. By design, the web platform hides GC from developers for security purposes.

> More generally, I know that this is an impossible target, but realtime-safe audio code should ideally run under lock-free conditions. That seems to be the case for native nodes, where we don't see the problem reported here.

Native nodes are not "lock-free". The spec doesn't say the implementation should be "lock-free" (and it would be wrong for the spec to mandate that). We are carefully managing locks and the memory allocation so developers don't have to. To expose the guts of WebAudio, the AudioWorklet needs to deliver JS arrays to the user, and that can't be free. It then becomes the developer's responsibility to manage them.

Once again, the native node's memory allocation has nothing to do with the AudioWorklet. Some of those allocations live outside of GC management (Oilpan), so you don't even see them. That shouldn't be part of the discussion here.
Apr 26 2018
The point I made is that in normal operation we are having interruptions, which this repro just makes clearer. That is why we have a serious problem. I am not sure what you mean by “quantum”, but if it means that allocations happen at every block (call to process()), then we have a big problem. I would have thought the allocation would happen at setup time and then the buffers would just be reused.
Apr 26 2018
> I am not sure what you mean by “quantum”,

https://webaudio.github.io/web-audio-api/#render-quantum

> but if it means that allocations happen at every block (call to process()), then we have a big problem. I would have thought the allocation would happen at setup time and then the buffers would just be reused.

The buffers need to be allocated every quantum, because the user can do all kinds of things, such as stashing the buffer object away. The buffer can even be passed to another thread/scope. Thanks to our effort to optimize the performance of the AudioWorkletGlobalScope, the buffer allocation overhead has proven to be negligible in the AudioWorklet experiments. If you find such a problem, please file a new issue, because it is not relevant to this issue.
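To make the implication concrete for worklet authors: since process() hands over freshly allocated arrays each render quantum, a processor that wants to keep samples around should copy them into storage it preallocated itself, rather than retaining the arrays it was handed. A hypothetical sketch, not the test's sine generator (the ambient declarations exist only because TypeScript's default DOM lib does not describe the worklet global scope):

```ts
// Ambient declarations: these globals exist in the AudioWorkletGlobalScope at runtime.
declare class AudioWorkletProcessor {
  readonly port: MessagePort;
}
declare function registerProcessor(
  name: string,
  processorCtor: new () => AudioWorkletProcessor
): void;

// Hypothetical processor: copies incoming samples into a ring buffer that was
// preallocated at construction time, instead of stashing the Float32Arrays the
// engine hands to process() each render quantum.
class RecordingProcessor extends AudioWorkletProcessor {
  private ring = new Float32Array(48000); // one second at 48 kHz, allocated once
  private writeIndex = 0;

  process(inputs: Float32Array[][], _outputs: Float32Array[][]): boolean {
    const channel = inputs[0]?.[0];
    if (channel) {
      for (let i = 0; i < channel.length; i++) {
        this.ring[this.writeIndex] = channel[i]; // copy, don't retain `channel`
        this.writeIndex = (this.writeIndex + 1) % this.ring.length;
      }
    }
    return true; // keep the processor alive
  }
}

registerProcessor('recording-processor', RecordingProcessor);
```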
Apr 26 2018
Independent of WebAudio changes, the problem highlighted in the repro is that we schedule GC in a loop. This is because of the way typed arrays are handled in V8. We have ideas on how to fix this, which would result in less GC pressure in the provided example. The change is to have some (well-known) external memory (such as typed arrays) accounted for as part of V8. This will not happen immediately, though. It will also not change how buffers in WebAudio are allocated in general.
Apr 26 2018
Some thoughts:

1. To clarify, it's likely no one will ever, and definitely should not ever, write code that allocates memory in this way. The test doesn't propose anyone should. That's not the point of the test.
2. The test was written for a very specific purpose: to trigger GC from the main thread frequently and see whether it is possible in any way for GC on the main thread to interfere with the AWP thread. This may be common knowledge internally to Chromium devs, but it is not for third parties (at least among people I spoke to). It seems from this issue that the main thread and the AWP thread do interact via locks, and there are indeed conditions where the former can block the latter.
3. The test reliably reproduces breakups in audio.
4. This may or may not help with the intermittent audio crackles, but at least the test reproduces a problem reliably and gives more data to consider for performance diagnosis.
5. It would be good to know what the maximum pause time could be with the lock in question (ideally before and after the change in comment 15); see the worked calculation below. We have 128 frames / 48000 Hz of time (~2.7 ms) to compute each block. If the pause time is at most 1 ms, then we know we only have a safe window of about 1.7 ms to process and deliver the block without fear of a pause pushing us over the delivery deadline. (The calculations above of course change with double or triple buffering. I usually think about these kinds of things when developing JVM-based audio systems and have found it helpful.)
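A small worked version of the budget arithmetic from point 5 (the 1 ms pause is the hypothetical figure used above, not a measured value for the lock in question):

```ts
// Render-quantum time budget for a single block, per point 5 above.
const blockSize = 128;      // frames per render quantum
const sampleRate = 48000;   // Hz
const budgetMs = (blockSize / sampleRate) * 1000; // ≈ 2.67 ms (the "~2.7 ms" above)

// Hypothetical worst-case GC/lock pause (example value from the comment, not a measurement).
const assumedMaxPauseMs = 1.0;
const safeProcessingMs = budgetMs - assumedMaxPauseMs; // ≈ 1.67 ms (the "~1.7 ms" above)

console.log(
  `budget ${budgetMs.toFixed(2)} ms, ` +
  `safe processing window ${safeProcessingMs.toFixed(2)} ms ` +
  `assuming a ${assumedMaxPauseMs} ms pause`
);
```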
Apr 27 2018
We have a plan that will hopefully fix the issue. tl;dr: We will account for backing stores (like typed arrays) directly in V8 and add them to our heap-growing strategy. This should reduce the number of GCs significantly and reduce the lock contention.
Apr 27 2018
V8 will work on the scheduling. Maybe there's other work to be done on the audio side, where we could avoid having cross-thread persistents.
Apr 27 2018
Thanks to mlippautz@'s patch, the number of glitches was greatly reduced. The team is fully aware of the issue and also has a plan for it. Is this going to be the tracking bug for that work, or is there an existing bug for it?
Apr 27 2018
Re #27, stevenyi@: Thanks to you and victor.lazzarini@, we were able to find, diagnose, and mitigate the issue.

With that said, this should not be a "make it or break it" problem for the AudioWorklet. I have seen other large-scale music applications being ported without hitting the issue. Developers have full control over the memory allocation (for TypedArrays) on the main thread, so the workaround is possible and should be explored more.

On point 5 in your comment: the web platform's task scheduler does not have a concept of a "timing budget" because it is not designed to be a real-time system, so trying to calculate the budget as you described might not work well in the end. This is definitely a new question for the web platform, and we should push the concept further via the spec discussion (that's the better venue for such matters). Please file a new spec issue if you're interested: https://github.com/WebAudio/web-audio-api/issues
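As an illustration of the kind of main-thread workaround being suggested here, a hypothetical pooling sketch (not code from the test page or from any production application): preallocate a fixed set of Float32Arrays once and reuse them instead of allocating new ones in a loop.

```ts
// Hypothetical fixed-size pool of Float32Arrays: buffers are preallocated once
// and reused, so steady-state operation creates no new typed-array garbage.
class Float32Pool {
  private free: Float32Array[] = [];

  constructor(count: number, private length: number) {
    for (let i = 0; i < count; i++) {
      this.free.push(new Float32Array(length));
    }
  }

  acquire(): Float32Array | undefined {
    return this.free.pop(); // undefined if the pool is exhausted
  }

  release(buffer: Float32Array): void {
    if (buffer.length === this.length) {
      buffer.fill(0);
      this.free.push(buffer);
    }
  }
}

// Usage: take a pooled buffer, fill it, and return it instead of allocating per tick.
const pool = new Float32Pool(8, 128);
const block = pool.acquire();
if (block) {
  block[0] = Math.random();
  pool.release(block);
}
```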
May 23 2018
Assigning this to myself. I will follow up when we have progress.
Nov 9
I am sorry I have not been following this up, but today's notification prompted me to ask whether there have been any changes in relation to this. We've continued to improve our code and have released further updates. We're keeping faith that AW is the best show in town for WASM-based systems.