New issue
Advanced search Search tips

Issue 621786 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

WTF::ThreadSpecific should add a fast path to look up main thread's storage

Project Member Reported by haraken@chromium.org, Jun 21 2016

Issue description

Oilpan has a clever hack [1] to look up main thread's storage without looking up actual thread-local storage. This was one of key optimizations to make Oilpan faster. We should have the same fast-path for WTF::ThreadSpecific.

[1] https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/heap/ThreadState.h?q=threadstate.h&sq=package:chromium&dr=CSs&l=206

 

Comment 1 by tkent@chromium.org, Jun 23 2016

Components: -Blink>WTF Blink>Internals>WTF
Renaming Blink>WTF to Blink>Internals>WTF.

Owner: yutak@chromium.org
Status: Assigned (was: Untriaged)
yutak@ You were looking into this?
Owner: slangley@chromium.org
I'll take a look at this if you haven't already started?
Sounds great!

I guess yutak@ has some experimental CL for this...? (but he is not working on it now.)

Cc: csharrison@chromium.org
slangley@, are you looking into this? We are thinking it might be blocking  issue 442246  for perf reasons.
Cc: slangley@chromium.org
Owner: csharrison@chromium.org
slangley: If it is ok, I will take ownership.
Sure - sorry for the delayed response.
Project Member

Comment 8 by bugdroid1@chromium.org, Jan 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/098f72afae1133c762650b2c841981be9b4afe6e

commit 098f72afae1133c762650b2c841981be9b4afe6e
Author: csharrison <csharrison@chromium.org>
Date: Thu Jan 19 01:12:33 2017

Fast path for ThreadSpecific for main thread on TLS-slow platforms

This code is mainly a refactor of utility code out of platform/heap and
into wtf/StackUtil. ThreadSpecific then copies some tricks done by
ThreadState.

BUG= 621786 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_layout_tests_slimming_paint_v2

Review-Url: https://codereview.chromium.org/2623273007
Cr-Commit-Position: refs/heads/master@{#444581}

[modify] https://crrev.com/098f72afae1133c762650b2c841981be9b4afe6e/third_party/WebKit/Source/platform/heap/StackFrameDepth.cpp
[modify] https://crrev.com/098f72afae1133c762650b2c841981be9b4afe6e/third_party/WebKit/Source/platform/heap/StackFrameDepth.h
[modify] https://crrev.com/098f72afae1133c762650b2c841981be9b4afe6e/third_party/WebKit/Source/platform/heap/ThreadState.cpp
[modify] https://crrev.com/098f72afae1133c762650b2c841981be9b4afe6e/third_party/WebKit/Source/platform/heap/ThreadState.h
[modify] https://crrev.com/098f72afae1133c762650b2c841981be9b4afe6e/third_party/WebKit/Source/wtf/BUILD.gn
[add] https://crrev.com/098f72afae1133c762650b2c841981be9b4afe6e/third_party/WebKit/Source/wtf/StackUtil.cpp
[add] https://crrev.com/098f72afae1133c762650b2c841981be9b4afe6e/third_party/WebKit/Source/wtf/StackUtil.h
[modify] https://crrev.com/098f72afae1133c762650b2c841981be9b4afe6e/third_party/WebKit/Source/wtf/ThreadSpecific.h
[modify] https://crrev.com/098f72afae1133c762650b2c841981be9b4afe6e/third_party/WebKit/Source/wtf/ThreadingPthreads.cpp
[modify] https://crrev.com/098f72afae1133c762650b2c841981be9b4afe6e/third_party/WebKit/Source/wtf/ThreadingWin.cpp
[modify] https://crrev.com/098f72afae1133c762650b2c841981be9b4afe6e/third_party/WebKit/Source/wtf/WTFThreadData.cpp
[modify] https://crrev.com/098f72afae1133c762650b2c841981be9b4afe6e/third_party/WebKit/Source/wtf/WTFThreadData.h

I spoke with haraken@ about potentially simplifying ThreadSpecific to avoid the lazy allocation on access, and it seemed like we could just add an initialization path upon thread creation and just allocate all ThreadSpecifics there. This would entail having a static global "registry" of all the ThreadSpecifics, such that when a new thread is spawned, we would iterate through them, allocating each one.

However, it seems impossible to ensure that every thread has a copy of each ThreadSpecific, given that the ThreadSpecifics are initialized at different times. For instance, Thread A could initialize thread specific X *after* Thread B has already initialized. How will Thread B initialize x if it also needs a value later?
I don't think we'd want to do that since it means eagerly allocating everything which bloats the size of threads. I do think it might be nice to eagerly allocate the WTFThreadData whenever we start a thread though.
Yeah I was wondering about the size cost of that idea. I'll look into just special casing WTFThreadData this way.
WIP patch for #10:
https://codereview.chromium.org/2646003003
Yes, I was thinking we eagerly allocate only WTFThreadData and Oilpan's ThreadState (maybe you can move ThreadState into WTFThreadData though).

haraken: would you look at the arguments in the linked patch in #12? We think the approach is not viable due to undefined destructor ordering.
Project Member

Comment 15 by bugdroid1@chromium.org, Feb 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/395b9f9ab8004e4a0a4eb67d80f930a57964be9c

commit 395b9f9ab8004e4a0a4eb67d80f930a57964be9c
Author: csharrison <csharrison@chromium.org>
Date: Fri Feb 03 03:21:13 2017

Avoid checking for WTFThreadData::staticData in wtfThreadData()

This patch initialized the underlying ThreadSpecific<WTFThreadData>*
in a separate code path during initialization, to avoid a branch in
wtfThreadData().

BUG= 621786 

Review-Url: https://codereview.chromium.org/2646003003
Cr-Commit-Position: refs/heads/master@{#447897}

[modify] https://crrev.com/395b9f9ab8004e4a0a4eb67d80f930a57964be9c/third_party/WebKit/Source/wtf/ThreadingPthreads.cpp
[modify] https://crrev.com/395b9f9ab8004e4a0a4eb67d80f930a57964be9c/third_party/WebKit/Source/wtf/ThreadingWin.cpp
[modify] https://crrev.com/395b9f9ab8004e4a0a4eb67d80f930a57964be9c/third_party/WebKit/Source/wtf/WTFThreadData.cpp
[modify] https://crrev.com/395b9f9ab8004e4a0a4eb67d80f930a57964be9c/third_party/WebKit/Source/wtf/WTFThreadData.h

Status: Fixed (was: Assigned)
For TLS slow platforms, and for ThreadSpecifics that are initialized during threading initialization (e.g. ThreadState/WTFThreadData), we can remove a single branch (and call of internal::mayNotBeMainThread ) from the "fast path" of main thread access by initializing. We can't speed up the slow path.

We can do something like templatizing on another bool (willInitializeDuringMainThreadInitialization... or something).

Do people think this is worth the complexity? I'm not convinced mayNotBeMainThread is a bottleneck.
Agreed. It wouldn't be worth the complexity.

Sign in to add a comment