New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 758695 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac , Fuchsia
Pri: 1
Type: Bug

Blocking:
issue 734490
issue 770312



Sign in to add a comment

PartitionAlloc should consider using a non-spin lock

Project Member Reported by brucedaw...@chromium.org, Aug 24 2017

Issue description

Chrome Version: M60
OS: Windows (but probably all)

 crbug.com/754213  was a priority inversion triggered when four threads simultaneously tried to acquire the PartitionAlloc lock. Presumably a fifth thread actually held the lock but, because it was a dual-core system, it never got any CPU time and therefore could not release the lock. This led to indefinite spinning (the ETW trace captured nine seconds).

Spin locks handle contention poorly and can easily end up spinning for an entire OS scheduler quantum before the thread that owns the lock gets a chance to release it. And, as demonstrated by the bug mentioned above, spin locks cannot be shared between threads of different priorities because this can lead to a priority inversion that will never be resolved.

It would be worth experimenting with base::subtle::ReadWriteLock. It *should* be as fast as a spin lock but without the downsides. If the performance on benchmarks is comparable then it would significantly avoid risk to switch to it from the spin lock.

 
Cc: hongchan@chromium.org
It was pointed out that base::Lock should be tried instead of base::subtle::ReadWriteLock - see this discussion:

https://groups.google.com/a/chromium.org/forum/?utm_medium=email&utm_source=footer#!msg/chromium-dev/cPZKCTpVw4U/jPwdaFdzCQAJ

Cc: robliao@chromium.org
Cc: mosescu@chromium.org
Blocking: 770312
Blocking: 734490
Labels: -Pri-3 OS-Android OS-Chrome OS-Fuchsia OS-Linux OS-Mac OS-Windows Pri-1
The spin lock is causing quite a bit of issues for audio code, see issue 710245,  issue 754213 , and  issue 770312  so far. If the lock cannot be changed, we must hunt down every direct and indirect use of PartitionAlloc from audio threads (and add a DCHECK in PartitionAlloc to verify). This could be quite a bit of work, so we'd rather have a simpler solution if possible.
What about using the solution implemented in crrev.com/c/852441? The solution in that case was to spin in a busy loop for a maximum of 1 ms and then fall back to Sleep(1), thus ensuring that infinite spinning is avoided.

There are performance consequences from Sleep(1) (the actual sleep time may be higher due to timer interrupt granularity) but Sleep(1) will only be executed when it is highly likely that the alternative was an infinite busy loop. Therefore when the Sleep(1) happens it will almost certainly be an improvement.

Cc: maxmorin@chromium.org
I think that sounds fine, if it's acceptable to the PartitionAlloc maintainers. How about falling back to an even longer sleep if sleep(1) fails repeatedly?
Sounds like an interesting experiment (do we have a good repro?) although I suspect that Sleep() will not offer the appropriate accuracy.

Fortunately, for Win8+ we have a better solution: WaitOnAddress/WakeByAddressSingle (https://msdn.microsoft.com/en-us/library/windows/desktop/hh706898(v=vs.85).aspx)

It may also be worth taking a step back to see if we don't have another, deeper root issue here (it seems that we're doing dynamic memory allocation through a shared allocator in real-time threads, right?)
The sleep solution seems a bit hacky to me, but if it can be made small and simple in SpinLock, I could see trying it out.

But, yeah, allocating memory dynamically is not a real-time thing to do. Nobody wants to hear that bad news, but, there it is. :)
We do many not really real-time things on the audio threads, but it mostly works out due to the relatively generous deadlines we have :). Ideally, we'd figure out a way eliminate all the locks and allocations and task postings, but this would be quite an undertaking. If a fallback to sleep(1) or WaitOnAddress (and futex?) doesn't add overhead in the typical case, it seems like the simplest fix for the (quite bad)  issue 770312 .

Comment 13 by olka@chromium.org, Jan 16 2018

The most often case of memory allocation on a real-time audio thread is posting a task to a non-real-time thread. And it's where we mostly hit priority inversion.
It's quite tricky to work around memory allocations for these cases.
> How about falling back to an even longer sleep if sleep(1) fails repeatedly?

I don't know for sure about Linux but on Windows that is not necessary. A Sleep(1) loop means that the thread is relinquishing the CPU about 99.9+% of the time. That is, it means that the thread is running less than 0.1% of the time. If that isn't enough then it is because some other thread is consuming too much CPU time and increasing the length of the sleep won't help.

Project Member

Comment 15 by bugdroid1@chromium.org, Jan 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/da57669f62bed877ec39dd5847470592109cb405

commit da57669f62bed877ec39dd5847470592109cb405
Author: Max Morin <maxmorin@chromium.org>
Date: Mon Jan 22 11:05:16 2018

Add sleep fallback to base::SpinLock.

This avoids a situation where a thread of lesser priority is holding the
lock while a bunch of higher priority threads hog all CPU spinning,
preventing the lesser priority thread from releasing the lock.

Bug:  758695 ,  770312 
Change-Id: Ibf36f16023069e32f3f55716a2b63e47fff0d9f7
Reviewed-on: https://chromium-review.googlesource.com/873647
Commit-Queue: Max Morin <maxmorin@chromium.org>
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Reviewed-by: Chris Palmer <palmer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#530835}
[modify] https://crrev.com/da57669f62bed877ec39dd5847470592109cb405/base/allocator/partition_allocator/spin_lock.cc

Cc: -maxmorin@chromium.org palmer@chromium.org
Owner: maxmorin@chromium.org
Status: Fixed (was: Assigned)
I think this can be closed now?
Project Member

Comment 17 by bugdroid1@chromium.org, Feb 21 2018

Labels: merge-merged-3325
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ccfaba8897ac1020d4ad0fe20b35b46dd0e8216a

commit ccfaba8897ac1020d4ad0fe20b35b46dd0e8216a
Author: Max Morin <maxmorin@chromium.org>
Date: Wed Feb 21 10:25:11 2018

[M65] Add sleep fallback to base::SpinLock.

This avoids a situation where a thread of lesser priority is holding the
lock while a bunch of higher priority threads hog all CPU spinning,
preventing the lesser priority thread from releasing the lock.

Bug:  758695 ,  770312 
Change-Id: Ibf36f16023069e32f3f55716a2b63e47fff0d9f7
Reviewed-on: https://chromium-review.googlesource.com/873647
Commit-Queue: Max Morin <maxmorin@chromium.org>
Reviewed-by: Kentaro Hara <haraken@chromium.org>
Reviewed-by: Chris Palmer <palmer@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#530835}(cherry picked from commit da57669f62bed877ec39dd5847470592109cb405)
Reviewed-on: https://chromium-review.googlesource.com/928481
Reviewed-by: Max Morin <maxmorin@chromium.org>
Cr-Commit-Position: refs/branch-heads/3325@{#529}
Cr-Branched-From: bc084a8b5afa3744a74927344e304c02ae54189f-refs/heads/master@{#530369}
[modify] https://crrev.com/ccfaba8897ac1020d4ad0fe20b35b46dd0e8216a/base/allocator/partition_allocator/spin_lock.cc

Sign in to add a comment