PartitionAlloc should consider using a non-spin lock |
|||||||||
Issue descriptionChrome Version: M60 OS: Windows (but probably all) crbug.com/754213 was a priority inversion triggered when four threads simultaneously tried to acquire the PartitionAlloc lock. Presumably a fifth thread actually held the lock but, because it was a dual-core system, it never got any CPU time and therefore could not release the lock. This led to indefinite spinning (the ETW trace captured nine seconds). Spin locks handle contention poorly and can easily end up spinning for an entire OS scheduler quantum before the thread that owns the lock gets a chance to release it. And, as demonstrated by the bug mentioned above, spin locks cannot be shared between threads of different priorities because this can lead to a priority inversion that will never be resolved. It would be worth experimenting with base::subtle::ReadWriteLock. It *should* be as fast as a spin lock but without the downsides. If the performance on benchmarks is comparable then it would significantly avoid risk to switch to it from the spin lock.
,
Aug 25 2017
It was pointed out that base::Lock should be tried instead of base::subtle::ReadWriteLock - see this discussion: https://groups.google.com/a/chromium.org/forum/?utm_medium=email&utm_source=footer#!msg/chromium-dev/cPZKCTpVw4U/jPwdaFdzCQAJ
,
Aug 25 2017
,
Aug 30 2017
,
Jan 12 2018
,
Jan 12 2018
,
Jan 12 2018
The spin lock is causing quite a bit of issues for audio code, see issue 710245, issue 754213 , and issue 770312 so far. If the lock cannot be changed, we must hunt down every direct and indirect use of PartitionAlloc from audio threads (and add a DCHECK in PartitionAlloc to verify). This could be quite a bit of work, so we'd rather have a simpler solution if possible.
,
Jan 12 2018
What about using the solution implemented in crrev.com/c/852441? The solution in that case was to spin in a busy loop for a maximum of 1 ms and then fall back to Sleep(1), thus ensuring that infinite spinning is avoided. There are performance consequences from Sleep(1) (the actual sleep time may be higher due to timer interrupt granularity) but Sleep(1) will only be executed when it is highly likely that the alternative was an infinite busy loop. Therefore when the Sleep(1) happens it will almost certainly be an improvement.
,
Jan 15 2018
I think that sounds fine, if it's acceptable to the PartitionAlloc maintainers. How about falling back to an even longer sleep if sleep(1) fails repeatedly?
,
Jan 15 2018
Sounds like an interesting experiment (do we have a good repro?) although I suspect that Sleep() will not offer the appropriate accuracy. Fortunately, for Win8+ we have a better solution: WaitOnAddress/WakeByAddressSingle (https://msdn.microsoft.com/en-us/library/windows/desktop/hh706898(v=vs.85).aspx) It may also be worth taking a step back to see if we don't have another, deeper root issue here (it seems that we're doing dynamic memory allocation through a shared allocator in real-time threads, right?)
,
Jan 16 2018
The sleep solution seems a bit hacky to me, but if it can be made small and simple in SpinLock, I could see trying it out. But, yeah, allocating memory dynamically is not a real-time thing to do. Nobody wants to hear that bad news, but, there it is. :)
,
Jan 16 2018
We do many not really real-time things on the audio threads, but it mostly works out due to the relatively generous deadlines we have :). Ideally, we'd figure out a way eliminate all the locks and allocations and task postings, but this would be quite an undertaking. If a fallback to sleep(1) or WaitOnAddress (and futex?) doesn't add overhead in the typical case, it seems like the simplest fix for the (quite bad) issue 770312 .
,
Jan 16 2018
The most often case of memory allocation on a real-time audio thread is posting a task to a non-real-time thread. And it's where we mostly hit priority inversion. It's quite tricky to work around memory allocations for these cases.
,
Jan 16 2018
> How about falling back to an even longer sleep if sleep(1) fails repeatedly? I don't know for sure about Linux but on Windows that is not necessary. A Sleep(1) loop means that the thread is relinquishing the CPU about 99.9+% of the time. That is, it means that the thread is running less than 0.1% of the time. If that isn't enough then it is because some other thread is consuming too much CPU time and increasing the length of the sleep won't help.
,
Jan 22 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/da57669f62bed877ec39dd5847470592109cb405 commit da57669f62bed877ec39dd5847470592109cb405 Author: Max Morin <maxmorin@chromium.org> Date: Mon Jan 22 11:05:16 2018 Add sleep fallback to base::SpinLock. This avoids a situation where a thread of lesser priority is holding the lock while a bunch of higher priority threads hog all CPU spinning, preventing the lesser priority thread from releasing the lock. Bug: 758695 , 770312 Change-Id: Ibf36f16023069e32f3f55716a2b63e47fff0d9f7 Reviewed-on: https://chromium-review.googlesource.com/873647 Commit-Queue: Max Morin <maxmorin@chromium.org> Reviewed-by: Kentaro Hara <haraken@chromium.org> Reviewed-by: Chris Palmer <palmer@chromium.org> Cr-Commit-Position: refs/heads/master@{#530835} [modify] https://crrev.com/da57669f62bed877ec39dd5847470592109cb405/base/allocator/partition_allocator/spin_lock.cc
,
Feb 6 2018
I think this can be closed now?
,
Feb 21 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ccfaba8897ac1020d4ad0fe20b35b46dd0e8216a commit ccfaba8897ac1020d4ad0fe20b35b46dd0e8216a Author: Max Morin <maxmorin@chromium.org> Date: Wed Feb 21 10:25:11 2018 [M65] Add sleep fallback to base::SpinLock. This avoids a situation where a thread of lesser priority is holding the lock while a bunch of higher priority threads hog all CPU spinning, preventing the lesser priority thread from releasing the lock. Bug: 758695 , 770312 Change-Id: Ibf36f16023069e32f3f55716a2b63e47fff0d9f7 Reviewed-on: https://chromium-review.googlesource.com/873647 Commit-Queue: Max Morin <maxmorin@chromium.org> Reviewed-by: Kentaro Hara <haraken@chromium.org> Reviewed-by: Chris Palmer <palmer@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#530835}(cherry picked from commit da57669f62bed877ec39dd5847470592109cb405) Reviewed-on: https://chromium-review.googlesource.com/928481 Reviewed-by: Max Morin <maxmorin@chromium.org> Cr-Commit-Position: refs/branch-heads/3325@{#529} Cr-Branched-From: bc084a8b5afa3744a74927344e304c02ae54189f-refs/heads/master@{#530369} [modify] https://crrev.com/ccfaba8897ac1020d4ad0fe20b35b46dd0e8216a/base/allocator/partition_allocator/spin_lock.cc |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by hongchan@chromium.org
, Aug 25 2017