Make ObserverList faster |
|||
Issue descriptionChrome Version : 70.0.3538.22 ObserverList iteration was identified as a performance concern as part of https://crbug.com/859155 . https://chromium-review.googlesource.com/c/chromium/src/+/1242568 makes observer list iteration 10-20 times faster by removing the WeakPtr. Sample performance, for the given ObserverList size in Release. Values are nanoseconds to notify a single observer. Old -> New. (Old values improve with ObserverList size due to the malloc() cost being amortized over the list size). ObserverList::Unchecked 0: 22.3 -> 1.9 1: 79.6 -> 4.3 2: 61.6 -> 4.0 4: 50.5 -> 4.0 8: 41.0 -> 4.1 16: 33.6 -> 4.0 32: 30.3 -> 4.2 64: 28.8 -> 3.9 128: 28.0 -> 3.9 ObserverList<CheckedObserver> 0: 17.9 -> 1.9 1: 89.3 -> 14.2 2: 73.1 -> 16.6 4: 62.8 -> 20.1 8: 55.4 -> 20.1 16: 51.8 -> 20.5 32: 49.0 -> 20.4 64: 46.6 -> 20.5 128: 45.6 -> 20.9 doc with some data: https://docs.google.com/document/d/1CYt73XrGG_dE-dUrgyRR5rQPVDGuJpozkwVT_iaIs3I/edit#heading=h.p6i0h4cucpzb
,
Sep 28
I think it will be hard for this to surface in the FlameGraph due to inlining. When I looked at the assembly while profiling the perftest for '::Unchecked' in https://chromium-review.googlesource.com/c/chromium/src/+/1242568 it was *completely* inlined, except for the virtual call to `Observe` (which took up about a third of samples). All the other activity was in a single function (i.e. ObserverListPerfTest_NotifyPerformance_TestBody() ). If the `Observe()` in the perf test didn't increment a volatile int, a smart compiler (with virtual call inlining) might even just elide everything, for having no side-effects. CheckedObserver version was a similar story. Everything was inlined - it was ObserverListPerfTest_NotifyPerformance_TestBody that got the blame for calls to WeakReference::Flag::IsValid(), which accounted for about 80% of samples. (IsValid() clobbers the CPU pipeline and cache since it accesses an atomic int and it gets called 3 times each Notify: ObserverList::Compact calls EraseIf unconditionally, and `Iter::EnsureValidIndex` is invoked both when the Iter is constructed and when it is incremented, each time touching `IsMarkedForRemoval` -- [there are probably ways to optimize this..]). So I expect it's currently the methods that are iterating over the ObserverList who are getting the blame for calls to malloc() and WeakReference::Flag::IsValid(), and not ObserverList itself. Hence, I don't think they will show up in the flame graph. Also perhaps worth noting that I think the perftest is being quite generous to the old behaviour :). The test is single-threaded and malloc() has a pretty simple job to do since it is running in a tight loop and can just reuse the last thing `freed` each time. The 'real' improvement will be greater than what the perftest can measure.
,
Sep 28
(edit: destroyed->incremented :).
,
Sep 28
Re #3: The observation regarding overhead of IsValid() is worrying. WeakPtr's aren't thread-safe, and the use of an atomic was made to as part of supporting optimization of callers (specifically TaskScheduler) by letting them shortcut work if they spot that the WeakPtr is definitely invalid already. Given the semantics of MaybeValid() (that it can return true even if the WeakPtr is actually invalid) perhaps we should revisit whether an atomic is suitable there?
,
Sep 28
Re #5: Although the pipeline clobbering is due to the branches, not the atomic guarantees, so presumably for other use-cases it's less of an issue?
,
Oct 2
#c6 I think it's a combination. Some pipelining can still occur across branches with speculative execution. But atomic instructions are really expensive, and disable a bunch of optimizations - speculative execution is unlikely to impact performance around atomics. Changing the AtomicFlag to a bool better than halves the overheads. E.g. 16ns becomes 9ns in the perf test (i.e. a 13ns overhead becomes a 6ns overhead). --> https://chromium-review.googlesource.com/c/chromium/src/+/1256483 Tuning to reduce calls to IsValid() could probably reduce the gap further.
,
Oct 4
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8f733b9ec8e990d3dc40da6c357340e631234256 commit 8f733b9ec8e990d3dc40da6c357340e631234256 Author: Trent Apted <tapted@chromium.org> Date: Thu Oct 04 00:54:45 2018 Make ActivityLog observe ScriptExecutor via a base::RepeatingCallback There once was a ScriptExecutionObserver::Delegate, but it disappeared in some past refactoring. The lifetime around ScriptExecutionObserver is quite complex, and relies on base::ObserverList being a SupportsWeakPtr, which we want to stop doing for https://crbug.com/888973. There is now exactly one ScriptExecutionObserver implementation (ActivityLog), and the observer list only ever has exactly 1 or zero observers in it; added and removed in concert with a TabHelper. WebViewGuest also has an ObserverList<ScriptExecutionObserver>, but it never added any observers to it. We can instead make ActivityLog responsible for the lifetime of its callbacks from ScriptExecutor. It already dispenses WeakPtrs to itself. Bug: 888973 Change-Id: Ic773bbcbabea70627b48ef8e0e29ff4e16d1e1f5 Reviewed-on: https://chromium-review.googlesource.com/c/1256398 Commit-Queue: Trent Apted <tapted@chromium.org> Reviewed-by: Devlin <rdevlin.cronin@chromium.org> Cr-Commit-Position: refs/heads/master@{#596454} [modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/chrome/browser/extensions/activity_log/activity_log.cc [modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/chrome/browser/extensions/activity_log/activity_log.h [modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/chrome/browser/extensions/activity_log/activity_log_unittest.cc [modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/chrome/browser/extensions/tab_helper.cc [modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/chrome/browser/extensions/tab_helper.h [modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/extensions/browser/BUILD.gn [modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/extensions/browser/guest_view/web_view/web_view_guest.cc [modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/extensions/browser/guest_view/web_view/web_view_guest.h [delete] https://crrev.com/2df4ab37771f945cd766caa9efa4c18e619dc023/extensions/browser/script_execution_observer.h [modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/extensions/browser/script_executor.cc [modify] https://crrev.com/8f733b9ec8e990d3dc40da6c357340e631234256/extensions/browser/script_executor.h
,
Oct 4
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f59cdf0d304cd451e2507a13dd12da6f8fccaffb commit f59cdf0d304cd451e2507a13dd12da6f8fccaffb Author: Trent Apted <tapted@chromium.org> Date: Thu Oct 04 03:42:22 2018 Make ObserverList iteration 10~20 times faster. ObserverList iterators take a WeakRef to the ObserverList to detect the list being destroyed mid-iteration. ObserverList iteration doesn't need the thread-safety guarantees from base::WeakPtr, and there's rarely more than one iterator, or a couple on the stack. ObserverList iteration was identified as a performance concern as part of https://crbug.com/859155 . A base::LinkedList can cater for detecting a destroy-while-iterating. This avoids the malloc() performed when the WeakRef is first dereferenced each iteration, which currently dominates iteration cost. We also reduce reference indirection, and reduce pipelining damage of atomics. Sample performance, for the given ObserverList size in Release. Values are nanoseconds to notify a single observer. Old -> New. (Old values improve with ObserverList size due to the malloc() cost being amortized over the list size). ObserverList::Unchecked 0: 22.3 -> 1.0 1: 79.6 -> 3.4 2: 61.6 -> 3.2 4: 50.5 -> 3.1 8: 41.0 -> 3.2 16: 33.6 -> 3.2 32: 30.3 -> 3.3 64: 28.8 -> 3.2 128: 28.0 -> 3.2 ObserverList<CheckedObserver> 0: 17.9 -> 0.8 1: 89.3 -> 12.9 2: 73.1 -> 14.8 4: 62.8 -> 16.0 8: 55.4 -> 18.0 16: 51.8 -> 18.3 32: 49.0 -> 18.3 64: 46.6 -> 18.3 128: 45.6 -> 18.2 Bug: 888973 Change-Id: I5edeb754e8c62d64efbbdecf617d5b33b0933688 Reviewed-on: https://chromium-review.googlesource.com/c/1242568 Commit-Queue: Trent Apted <tapted@chromium.org> Reviewed-by: Wez <wez@chromium.org> Reviewed-by: Gabriel Charette <gab@chromium.org> Cr-Commit-Position: refs/heads/master@{#596492} [modify] https://crrev.com/f59cdf0d304cd451e2507a13dd12da6f8fccaffb/base/BUILD.gn [modify] https://crrev.com/f59cdf0d304cd451e2507a13dd12da6f8fccaffb/base/observer_list.h [modify] https://crrev.com/f59cdf0d304cd451e2507a13dd12da6f8fccaffb/base/observer_list_internal.h [add] https://crrev.com/f59cdf0d304cd451e2507a13dd12da6f8fccaffb/base/observer_list_perftest.cc [modify] https://crrev.com/f59cdf0d304cd451e2507a13dd12da6f8fccaffb/base/observer_list_unittest.cc
,
Oct 15
I seem to hit a DCHECK in the audio device monitor code on Windows after the CL in #9; stacktrace: FATAL 3288 11980 18:38:20-220 0 Check failed: (list_->iteration_sequence_checker_).CalledOnValidSequence(). Backtrace: base::debug::StackTrace::StackTrace [0x00007FF9C77B98E4+36] (C:\src\chromium\src\base\debug\stack_trace_win.cc:289) logging::LogMessage::~LogMessage [0x00007FF9C77D6F42+98] (C:\src\chromium\src\base\logging.cc:591) base::internal::WeakLinkNode<base::ObserverList<net::MDnsListenerImpl,0,1,base::internal::UncheckedObserverAdapter> >::get [0x00007FF9C68A68B9+109] (C:\src\chromium\src\base\observer_list_internal.h:134) base::ObserverList<net::MDnsListenerImpl,0,1,base::internal::UncheckedObserverAdapter>::Iter::Iter [0x00007FF9C68A6633+93] (C:\src\chromium\src\base\observer_list.h:134) media::AudioManagerBase::NotifyAllOutputDeviceChangeListeners [0x00007FF9C6A344C0+320] (C:\src\chromium\src\media\audio\audio_manager_base.cc:480) base::OnceCallback<void ()>::Run [0x00007FF9C68B91F4+52] (C:\src\chromium\src\base\callback.h:100) base::internal::FunctorTraits<base::RepeatingCallback<void ()>,void>::Invoke<base::RepeatingCallback<void ()>> [0x00007FF9C6A4D7DB+101] (C:\src\chromium\src\base\bind_internal.h:592) base::debug::TaskAnnotator::RunTask [0x00007FF9C805BD3C+364] (C:\src\chromium\src\base\debug\task_annotator.cc:99) base::MessageLoop::RunTask [0x00007FF9C77E281F+287] (C:\src\chromium\src\base\message_loop\message_loop.cc:436) base::MessageLoop::DoWork [0x00007FF9C77E2BD5+389] (C:\src\chromium\src\base\message_loop\message_loop.cc:517) base::MessagePumpDefault::Run [0x00007FF9C805F3D1+177] (C:\src\chromium\src\base\message_loop\message_pump_default.cc:37) base::MessageLoop::Run [0x00007FF9C77E248B+139] (C:\src\chromium\src\base\message_loop\message_loop.cc:388) base::RunLoop::Run [0x00007FF9C780A429+249] (C:\src\chromium\src\base\run_loop.cc:105) base::Thread::Run [0x00007FF9C7828DB8+200] (C:\src\chromium\src\base\threading\thread.cc:262) base::Thread::ThreadMain [0x00007FF9C78290D1+753] (C:\src\chromium\src\base\threading\thread.cc:360) base::`anonymous namespace'::ThreadFunc [0x00007FF9C7826F3C+204] (C:\src\chromium\src\base\threading\platform_thread_win.cc:103) BaseThreadInitThunk [0x00007FFA0ECD1FE4+20] RtlUserThreadStart [0x00007FFA104ACB31+33] As far as I can tell, NotifyAllOutputDeviceChangeListeners() is always running on the same thread. I also built and ran the CL just before and did not get a crash. To replicate it, I have an audio playback stream open and change default devices in the system settings. This causes the AudioManager to call the AudioOuputController that is the only registered observer. It, in turn, unregisters as an observer, stops the underlying stream, creates a new stream and re-registers as an observer again, all within the same callback. Is this behavior supported by the ObserverList? AFAICT, it should be (albeit perhaps unnecessary). I'm guessing the unregister / reregister is causing an issue. I don't think it's actually crossing over into another thread - there are plenty of thread checkers in place in AudioOutputController.
,
Oct 15
FYI, I've filed issue 895501 for DCHECK crashes in WeakLinkNode<ObserverList>, will copy comment #10 to there as well.
,
Oct 18
|
|||
►
Sign in to add a comment |
|||
Comment 1 by gab@chromium.org
, Sep 27