New issue
Advanced search Search tips

Issue 888973 link

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac
Pri: 3
Type: Bug

Blocked on:
issue 896542



Sign in to add a comment

Make ObserverList faster

Project Member Reported by tapted@chromium.org, Sep 25

Issue description

Chrome Version       : 70.0.3538.22

ObserverList iteration was identified as a performance concern as part of  https://crbug.com/859155 .

https://chromium-review.googlesource.com/c/chromium/src/+/1242568 makes observer list iteration 10-20 times faster by removing the WeakPtr.

Sample performance, for the given ObserverList size in Release. Values
are nanoseconds to notify a single observer. Old -> New. (Old values
improve with ObserverList size due to the malloc() cost being amortized
over the list size).

ObserverList::Unchecked
0:   22.3 -> 1.9
1:   79.6 -> 4.3
2:   61.6 -> 4.0
4:   50.5 -> 4.0
8:   41.0 -> 4.1
16:  33.6 -> 4.0
32:  30.3 -> 4.2
64:  28.8 -> 3.9
128: 28.0 -> 3.9

ObserverList<CheckedObserver>
0:   17.9 ->  1.9
1:   89.3 -> 14.2
2:   73.1 -> 16.6
4:   62.8 -> 20.1
8:   55.4 -> 20.1
16:  51.8 -> 20.5
32:  49.0 -> 20.4
64:  46.6 -> 20.5
128: 45.6 -> 20.9

doc with some data:

https://docs.google.com/document/d/1CYt73XrGG_dE-dUrgyRR5rQPVDGuJpozkwVT_iaIs3I/edit#heading=h.p6i0h4cucpzb
 
ObserverList performance (1).png
18.5 KB View Download
While I'm in favor of removing a poor use case for WeakPtrs, it'd be nice to use the UMA sampling profiler to show the expected "light speed" potential gain (i.e. how much room there is for improvement in this space in practice).

From a quick look it looks like all of the time is spent inside code handling individual notifications rather than in "self" : https://uma.googleplex.com/p/chrome/callstacks?sid=ef3376f8d4d95dfd22cc5b9e5ef1ca97

Comment 2 Deleted

I think it will be hard for this to surface in the FlameGraph due to inlining.

When I looked at the assembly while profiling the perftest for '::Unchecked' in https://chromium-review.googlesource.com/c/chromium/src/+/1242568 it was *completely* inlined, except for the virtual call to `Observe` (which took up about a third of samples). All the other activity was in a single function (i.e. ObserverListPerfTest_NotifyPerformance_TestBody() ).

If the `Observe()` in the perf test didn't increment a volatile int, a smart compiler (with virtual call inlining) might even just elide everything, for having no side-effects.

CheckedObserver version was a similar story. Everything was inlined - it was ObserverListPerfTest_NotifyPerformance_TestBody that got the blame for calls to WeakReference::Flag::IsValid(), which accounted for about 80% of samples.  (IsValid() clobbers the CPU pipeline and cache since it accesses an atomic int and it gets called 3 times each Notify: ObserverList::Compact calls EraseIf unconditionally, and `Iter::EnsureValidIndex` is invoked both when the Iter is constructed and when it is incremented, each time touching `IsMarkedForRemoval` -- [there are probably ways to optimize this..]).

So I expect it's currently the methods that are iterating over the ObserverList who are getting the blame for calls to malloc() and WeakReference::Flag::IsValid(), and not ObserverList itself. Hence, I don't think they will show up in the flame graph.

Also perhaps worth noting that I think the perftest is being quite generous to the old behaviour :). The test is single-threaded and malloc() has a pretty simple job to do since it is running in a tight loop and can just reuse the last thing `freed` each time. The 'real' improvement will be greater than what the perftest can measure.
(edit: destroyed->incremented :).
Re #3: The observation regarding overhead of IsValid() is worrying. WeakPtr's aren't thread-safe, and the use of an atomic was made to as part of supporting optimization of callers (specifically TaskScheduler) by letting them shortcut work if they spot that the WeakPtr is definitely invalid already.

Given the semantics of MaybeValid() (that it can return true even if the WeakPtr is actually invalid) perhaps we should revisit whether an atomic is suitable there?
Re #5: Although the pipeline clobbering is due to the branches, not the atomic guarantees, so presumably for other use-cases it's less of an issue?
#c6 I think it's a combination. Some pipelining can still occur across branches with speculative execution. But atomic instructions are really expensive, and disable a bunch of optimizations - speculative execution is unlikely to impact performance around atomics.

Changing the AtomicFlag to a bool better than halves the overheads. E.g. 16ns becomes 9ns in the perf test (i.e. a 13ns overhead becomes a 6ns overhead). --> https://chromium-review.googlesource.com/c/chromium/src/+/1256483

Tuning to reduce calls to IsValid() could probably reduce the gap further.
Project Member

Comment 8 by bugdroid1@chromium.org, Oct 4

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8f733b9ec8e990d3dc40da6c357340e631234256

commit 8f733b9ec8e990d3dc40da6c357340e631234256
Author: Trent Apted <tapted@chromium.org>
Date: Thu Oct 04 00:54:45 2018

Make ActivityLog observe ScriptExecutor via a base::RepeatingCallback

There once was a ScriptExecutionObserver::Delegate, but it disappeared
in some past refactoring. The lifetime around ScriptExecutionObserver
is quite complex, and relies on base::ObserverList being a
SupportsWeakPtr, which we want to stop doing for https://crbug.com/888973.

There is now exactly one ScriptExecutionObserver implementation
(ActivityLog), and the observer list only ever has exactly 1 or zero
observers in it; added and removed in concert with a TabHelper.
WebViewGuest also has an ObserverList<ScriptExecutionObserver>, but
it never added any observers to it.

We can instead make ActivityLog responsible for the lifetime of its
callbacks from ScriptExecutor. It already dispenses WeakPtrs to itself.

Bug: 888973
Change-Id: Ic773bbcbabea70627b48ef8e0e29ff4e16d1e1f5
Reviewed-on: https://chromium-review.googlesource.com/c/1256398
Commit-Queue: Trent Apted <tapted@chromium.org>
Reviewed-by: Devlin <rdevlin.cronin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#596454}
[modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/chrome/browser/extensions/activity_log/activity_log.cc
[modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/chrome/browser/extensions/activity_log/activity_log.h
[modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/chrome/browser/extensions/activity_log/activity_log_unittest.cc
[modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/chrome/browser/extensions/tab_helper.cc
[modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/chrome/browser/extensions/tab_helper.h
[modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/extensions/browser/BUILD.gn
[modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/extensions/browser/guest_view/web_view/web_view_guest.cc
[modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/extensions/browser/guest_view/web_view/web_view_guest.h
[delete] https://crrev.com/2df4ab37771f945cd766caa9efa4c18e619dc023/extensions/browser/script_execution_observer.h
[modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/extensions/browser/script_executor.cc
[modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/extensions/browser/script_executor.h

Project Member

Comment 9 by bugdroid1@chromium.org, Oct 4

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f59cdf0d304cd451e2507a13dd12da6f8fccaffb

commit f59cdf0d304cd451e2507a13dd12da6f8fccaffb
Author: Trent Apted <tapted@chromium.org>
Date: Thu Oct 04 03:42:22 2018

Make ObserverList iteration 10~20 times faster.

ObserverList iterators take a WeakRef to the ObserverList to detect
the list being destroyed mid-iteration. ObserverList iteration doesn't
need the thread-safety guarantees from base::WeakPtr, and there's rarely
more than one iterator, or a couple on the stack.

ObserverList iteration was identified as a performance concern as part
of  https://crbug.com/859155 .

A base::LinkedList can cater for detecting a destroy-while-iterating.
This avoids the malloc() performed when the WeakRef is first dereferenced
each iteration, which currently dominates iteration cost. We also
reduce reference indirection, and reduce pipelining damage of atomics.

Sample performance, for the given ObserverList size in Release. Values
are nanoseconds to notify a single observer. Old -> New. (Old values
improve with ObserverList size due to the malloc() cost being amortized
over the list size).

ObserverList::Unchecked
0:   22.3 -> 1.0
1:   79.6 -> 3.4
2:   61.6 -> 3.2
4:   50.5 -> 3.1
8:   41.0 -> 3.2
16:  33.6 -> 3.2
32:  30.3 -> 3.3
64:  28.8 -> 3.2
128: 28.0 -> 3.2

ObserverList<CheckedObserver>
0:   17.9 ->  0.8
1:   89.3 -> 12.9
2:   73.1 -> 14.8
4:   62.8 -> 16.0
8:   55.4 -> 18.0
16:  51.8 -> 18.3
32:  49.0 -> 18.3
64:  46.6 -> 18.3
128: 45.6 -> 18.2

Bug: 888973
Change-Id: I5edeb754e8c62d64efbbdecf617d5b33b0933688
Reviewed-on: https://chromium-review.googlesource.com/c/1242568
Commit-Queue: Trent Apted <tapted@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Reviewed-by: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/heads/master@{#596492}
[modify] https://crrev.com/f59cdf0d304cd451e2507a13dd12da6f8fccaffb/base/BUILD.gn
[modify] https://crrev.com/f59cdf0d304cd451e2507a13dd12da6f8fccaffb/base/observer_list.h
[modify] https://crrev.com/f59cdf0d304cd451e2507a13dd12da6f8fccaffb/base/observer_list_internal.h
[add] https://crrev.com/f59cdf0d304cd451e2507a13dd12da6f8fccaffb/base/observer_list_perftest.cc
[modify] https://crrev.com/f59cdf0d304cd451e2507a13dd12da6f8fccaffb/base/observer_list_unittest.cc

Cc: ossu@chromium.org
I seem to hit a DCHECK in the audio device monitor code on Windows after the CL in #9; stacktrace:

FATAL	3288	11980	18:38:20-220		0	Check failed: (list_->iteration_sequence_checker_).CalledOnValidSequence(). 
Backtrace:
	base::debug::StackTrace::StackTrace [0x00007FF9C77B98E4+36] (C:\src\chromium\src\base\debug\stack_trace_win.cc:289)
	logging::LogMessage::~LogMessage [0x00007FF9C77D6F42+98] (C:\src\chromium\src\base\logging.cc:591)
	base::internal::WeakLinkNode<base::ObserverList<net::MDnsListenerImpl,0,1,base::internal::UncheckedObserverAdapter> >::get [0x00007FF9C68A68B9+109] (C:\src\chromium\src\base\observer_list_internal.h:134)
	base::ObserverList<net::MDnsListenerImpl,0,1,base::internal::UncheckedObserverAdapter>::Iter::Iter [0x00007FF9C68A6633+93] (C:\src\chromium\src\base\observer_list.h:134)
	media::AudioManagerBase::NotifyAllOutputDeviceChangeListeners [0x00007FF9C6A344C0+320] (C:\src\chromium\src\media\audio\audio_manager_base.cc:480)
	base::OnceCallback<void ()>::Run [0x00007FF9C68B91F4+52] (C:\src\chromium\src\base\callback.h:100)
	base::internal::FunctorTraits<base::RepeatingCallback<void ()>,void>::Invoke<base::RepeatingCallback<void ()>> [0x00007FF9C6A4D7DB+101] (C:\src\chromium\src\base\bind_internal.h:592)
	base::debug::TaskAnnotator::RunTask [0x00007FF9C805BD3C+364] (C:\src\chromium\src\base\debug\task_annotator.cc:99)
	base::MessageLoop::RunTask [0x00007FF9C77E281F+287] (C:\src\chromium\src\base\message_loop\message_loop.cc:436)
	base::MessageLoop::DoWork [0x00007FF9C77E2BD5+389] (C:\src\chromium\src\base\message_loop\message_loop.cc:517)
	base::MessagePumpDefault::Run [0x00007FF9C805F3D1+177] (C:\src\chromium\src\base\message_loop\message_pump_default.cc:37)
	base::MessageLoop::Run [0x00007FF9C77E248B+139] (C:\src\chromium\src\base\message_loop\message_loop.cc:388)
	base::RunLoop::Run [0x00007FF9C780A429+249] (C:\src\chromium\src\base\run_loop.cc:105)
	base::Thread::Run [0x00007FF9C7828DB8+200] (C:\src\chromium\src\base\threading\thread.cc:262)
	base::Thread::ThreadMain [0x00007FF9C78290D1+753] (C:\src\chromium\src\base\threading\thread.cc:360)
	base::`anonymous namespace'::ThreadFunc [0x00007FF9C7826F3C+204] (C:\src\chromium\src\base\threading\platform_thread_win.cc:103)
	BaseThreadInitThunk [0x00007FFA0ECD1FE4+20]
	RtlUserThreadStart [0x00007FFA104ACB31+33]

As far as I can tell, NotifyAllOutputDeviceChangeListeners() is always running on the same thread. I also built and ran the CL just before and did not get a crash.

To replicate it, I have an audio playback stream open and change default devices in the system settings. This causes the AudioManager to call the AudioOuputController that is the only registered observer. It, in turn, unregisters as an observer, stops the underlying stream, creates a new stream and re-registers as an observer again, all within the same callback.

Is this behavior supported by the ObserverList? AFAICT, it should be (albeit perhaps unnecessary). I'm guessing the unregister / reregister is causing an issue. I don't think it's actually crossing over into another thread - there are plenty of thread checkers in place in AudioOutputController. 

FYI, I've filed issue 895501 for DCHECK crashes in WeakLinkNode<ObserverList>, will copy comment #10 to there as well.
Blockedon: 896542

Sign in to add a comment