On Mac: SetCurrentThreadPriority(default_priority) causes degradation on threads that should already be default_priorty |
||||||||||||
Issue descriptionThe addition of a single call to them on startup caused a ~10% regression... ( issue 595043 ). This core API needs to be fast and it needs to be investigated why a single call to it causes a drastic performance regression.
,
Apr 11 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a66ec9b588056bc350fd6d91fc6ef2256b5bbe81 commit a66ec9b588056bc350fd6d91fc6ef2256b5bbe81 Author: erikchen <erikchen@chromium.org> Date: Mon Apr 11 21:27:12 2016 Update comment in startup_metric_utils. BUG=601270 Review URL: https://codereview.chromium.org/1878573006 Cr-Commit-Position: refs/heads/master@{#386477} [modify] https://crrev.com/a66ec9b588056bc350fd6d91fc6ef2256b5bbe81/components/startup_metric_utils/browser/startup_metric_utils.cc
,
Apr 11 2016
I wasn't planning on looking into this in the near future, so if you want to raise the priority, you should also find another owner.
,
Apr 12 2016
Erik, you landed this API and know the Mac team better than I do, can you please find another owner? As-is this API is unviable IMO.
,
Apr 12 2016
Do you think using that using a base::ThreadLocalPointer to store the current priority would be more efficient than [[NSThread currentThread] threadDictionary]?
,
Apr 12 2016
mark@, can you take a look at this and try to figure out how difficult it would be?
,
Apr 12 2016
It seems unlikely that calling NSThread causes a 10%+ regression. It's more likely that calling SetPriorityNormal() actually changes the priority of the current thread [even when it shouldn't], which causes a priority inversion. I bet that the metrics recorded from Telemetry are pulled from this thread, so lowering the priority of that thread will artificially induce a regression in metrics.
,
Apr 12 2016
If the theory in #7 is correct, it might be worthwhile to consider if this should set a new baseline for the metric.
,
Apr 12 2016
I can reproduce the regression locally, so I'm running a manual bisect to determine exactly which line is responsible for the regression.
,
Apr 12 2016
Changing the thread priority on the main thread causes a regression. [I've tried SetPriorityNormal() and [[NSThread currentThread] setPriority:1]. I'm guessing it starts with a very high priority level, and we're accidentally lowering it.
,
Apr 12 2016
Interesting! I guess that's because GetCurrentThreadPriority() returns ThreadPriority::NORMAL on Mac until Chrome explicitly sets a priority (and then it returns the latest value set through Chrome), is there no way on Mac to get the actual thread priority at any given time?
,
Apr 12 2016
To the best of my knowledge, there are 3 APIs for getting/setting thread priority. https://developer.apple.com/library/mac/documentation/Darwin/Conceptual/KernelProgramming/scheduler/scheduler.html#//apple_ref/doc/uid/TP30000905-CH211-TPXREF109 The NSThread interface claims that the thread priority for the main thread is 0.5. Setting the thread priority to 0.5 causes a performance regression. pthread_getschedparam claims that the main thread has: policy: 1 sched_priority: 31 Passing these values back to pthread_setschedparam causes a performance regression. The Mach interface (http://opensource.apple.com//source/xnu/xnu-1456.1.26/osfmk/mach/thread_policy.h?txt) claims that the main thread has identical scheduling to THREAD_STANDARD_POLICY, and does NOT have a THREAD_PRECEDENCE_POLICY or THREAD_TIME_CONSTRAINT_POLICY. Setting the scheduling to THREAD_STANDARD_POLICY causes a performance regression. This suggests two possibilities: 1) There is something "magical" about scheduling of the main thread that is not captured in the public APIs, and using the public APIs breaks this magic. 2) Changing the scheduling of the main thread (even back to its current setting) somehow tickles the scheduler. To distinguish between these, I tried setting the priority of the main thread as early as possible (chrome_main.cc). Doing so also caused a performance regression. So far as I can tell, we can /never/ change the priority of the main thread without causing a performance regression. Anyone else have insight into this? mark?
,
Apr 13 2016
I don’t know how NSThread priorities map to underlying scheduling details. That’s unfortunate, it seems like something someone should know (and document). Because a single double (heh) doesn’t cleanly map to the wider variety of options exposed by the kernel interface, it’s not surprising that “get” returns what they call the “typical” value of 0.5 because NSThread doesn’t actually know anything about the scheduling details or how to make sense of them at that point, but a “set” of the same value runs through NSThread’s algorithm and imposes some other policy at the kernel level. Have you done a before/after thread_policy_get() to see what’s up? I will point out that the link to xnu source in comment 12 is for xnu-1456.1.26, which is Mac OS X 10.6’s kernel. Not that this file has changed in a significant way since then (it’s only seen the addition of THREAD_BACKGROUND_POLICY), but if you were looking around in that directory to see how the code worked, you were looking at something that we don’t run on any longer.
,
Apr 13 2016
To clarify: I generally look at 10.9.5 source, and then I google for a link that I can throw in this thread. I tested all 3 APIs live in Chrome [NSThread, pthread, Mach], and all showed the thread as having "default" priority, but setting the priority to default degraded performance. Beneath the hood, NSThread priority uses thread_policy_set(0x3, ...)
,
Apr 18 2016
mark: Ping?
,
Apr 19 2016
I don’t know. Am I interpreting correctly that thread_policy_get() came back at you with *get_default = true when you had specified false during the call? What else did it have to say?
,
Apr 19 2016
(and for what policies?)
,
Apr 19 2016
"""
thread_extended_policy_data_t policy;
boolean_t get_default = false;
mach_msg_type_number_t count = THREAD_EXTENDED_POLICY_COUNT;
kern_return_t result = thread_policy_get(
mach_thread_id, THREAD_EXTENDED_POLICY,
reinterpret_cast<thread_policy_t>(&policy), &count, &get_default);
CHECK(result == KERN_SUCCESS);
LOG(ERROR) << "extended get default: " << (int)get_default;
LOG(ERROR) << "extended timeshare: " << (int)policy.timeshare;
}
{
thread_precedence_policy_data_t policy;
boolean_t get_default = false;
mach_msg_type_number_t count = THREAD_PRECEDENCE_POLICY_COUNT;
kern_return_t result = thread_policy_get(
mach_thread_id, THREAD_PRECEDENCE_POLICY,
reinterpret_cast<thread_policy_t>(&policy), &count, &get_default);
CHECK(result == KERN_SUCCESS);
LOG(ERROR) << "precedence get default: " << (int)get_default;
LOG(ERROR) << "precendence priority: " << policy.importance;
}
{
thread_time_constraint_policy_data_t policy;
boolean_t get_default = false;
mach_msg_type_number_t count = THREAD_TIME_CONSTRAINT_POLICY_COUNT;
kern_return_t result = thread_policy_get(
mach_thread_id, THREAD_TIME_CONSTRAINT_POLICY,
reinterpret_cast<thread_policy_t>(&policy), &count, &get_default);
CHECK(result == KERN_SUCCESS);
LOG(ERROR) << "time constrain get default: " << (int)get_default;
}
"""
Returns the same values both before and after the thread policy is set to THREAD_STANDARD_POLICY:
"""
[0419/141346:ERROR:platform_thread_mac.mm(182)] extended get default: 0
[0419/141346:ERROR:platform_thread_mac.mm(183)] extended timeshare: 1
[0419/141346:ERROR:platform_thread_mac.mm(193)] precedence get default: 0
[0419/141346:ERROR:platform_thread_mac.mm(194)] precendence priority: 0
[0419/141346:ERROR:platform_thread_mac.mm(204)] time constrain get default: 1
"""
,
Apr 19 2016
Seems really weird to me, then. But the scheduler internals are pretty convoluted. I’ll try to make another pass at xnu to see why this might be, but it does appear to be unexpected and weird. There are some aspects of the scheduler internals that make it seem like maybe successive calls to thread_policy_set would result in certain things accumulating rather than being set outright. That could potentially explain the behavior we’re seeing, but I wasn’t able to pull on that thread long or hard enough to be certain.
,
Apr 21 2016
-> mark for looking into XNU. Set back to Untriaged (or re-assign) depending on your findings.
,
Apr 21 2016
Just spent another half hour looking, didn’t find anything good. I couldn’t find a way to make successive calls do anything evil. The best thing I have now is that what xnu considers the default isn’t actually applied as what we would think of as the default. Some of these priority bits get inherited from a parent task. I wonder if we’d see different behavior if we looked for a perf regression with and without calling thread_policy_set() from a process descended from an ssh session as opposed to a UI launch or Terminal (which was itself a UI launch).
,
May 6 2016
[mac triage] to erik since you seem to have dug at this the most. (Should we just document "SetCurrentThreadPriority to say it shouldn't be called for the Main thread on Mac?).
,
May 6 2016
I think the browser scheduler team wants a platform-independent thread priority API. I was thinking we could no-op inside of SetThreadPriority, and lie in GetThreadPriority for the main thread.
,
Jun 1 2016
Moving this nonessential bug to the next milestone. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Jul 14 2016
This issue has been moved once and is lower than Pri-1. Removing the milestone. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Aug 4 2016
,
Aug 5 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/23d10a36f2e815edd610098a65e5890f279d2183 commit 23d10a36f2e815edd610098a65e5890f279d2183 Author: erikchen <erikchen@chromium.org> Date: Fri Aug 05 21:16:13 2016 Reland #1: "base: Implement GetCurrentThreadPriority." The Chrome thread priority is saved in the thread dictionary during SetCurrentThreadPriority(). This seemed a better approach than using thread_policy_get(), since there are 4 Chrome priorities which won't cleanly map to thread flavors. BUG=601270 Review-Url: https://codereview.chromium.org/2215513002 Cr-Commit-Position: refs/heads/master@{#410163} [modify] https://crrev.com/23d10a36f2e815edd610098a65e5890f279d2183/base/threading/platform_thread_mac.mm [modify] https://crrev.com/23d10a36f2e815edd610098a65e5890f279d2183/base/threading/platform_thread_unittest.cc [modify] https://crrev.com/23d10a36f2e815edd610098a65e5890f279d2183/components/startup_metric_utils/browser/startup_metric_utils.cc
,
Aug 8 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/23c669b5b4cf228dab6da2cc47293b4650a7b923 commit 23c669b5b4cf228dab6da2cc47293b4650a7b923 Author: erikchen <erikchen@chromium.org> Date: Mon Aug 08 17:52:44 2016 Revert of Reland #1: "base: Implement GetCurrentThreadPriority." (patchset #2 id:20001 of https://codereview.chromium.org/2215513002/ ) Reason for revert: Still causes startup regressions: https://bugs.chromium.org/p/chromium/issues/detail?id=635464#c4 Original issue's description: > Reland #1: "base: Implement GetCurrentThreadPriority." > > The Chrome thread priority is saved in the thread dictionary during > SetCurrentThreadPriority(). This seemed a better approach than using > thread_policy_get(), since there are 4 Chrome priorities which won't cleanly map > to thread flavors. > > BUG=601270 > > Committed: https://crrev.com/23d10a36f2e815edd610098a65e5890f279d2183 > Cr-Commit-Position: refs/heads/master@{#410163} TBR=mark@chromium.org,gab@chromium.org # Not skipping CQ checks because original CL landed more than 1 days ago. BUG=601270 Review-Url: https://codereview.chromium.org/2221083002 Cr-Commit-Position: refs/heads/master@{#410390} [modify] https://crrev.com/23c669b5b4cf228dab6da2cc47293b4650a7b923/base/threading/platform_thread_mac.mm [modify] https://crrev.com/23c669b5b4cf228dab6da2cc47293b4650a7b923/base/threading/platform_thread_unittest.cc [modify] https://crrev.com/23c669b5b4cf228dab6da2cc47293b4650a7b923/components/startup_metric_utils/browser/startup_metric_utils.cc
,
Nov 6
Could it be that Mac OS applies higher priority to the main thread for a graphical application? I'm looking into a performance issue [1] where some code has been moved from running on the main thread in the browser process to the main thread of a utility process. Windows and Linux are just fine, but Mac has issues, and a theory is that the utility process main thread has lower priority somehow. It looks like there was a regression when switching the run loop from an NSApplication loop to an NSRunLoop [2] (not being a mac person, I don't know what this means :) ), so maybe the choice here affects the scheduling somehow. Does this make any kind of sense? [1] https://crbug.com/866442 [2] https://bugs.chromium.org/p/chromium/issues/detail?id=866442#c25
,
Nov 6
> Does this make any kind of sense? Yup, and your theory may be right. No one has dug deeply. The weird thing is that all public APIs available in user-space appear to suggest that the priority is not special, and configurable. However, IIRC, attempting to configure the priority [even if it's just to set to its current level], will cause a permanent performance reduction. This suggests that maybe there's some special, Cocoa-specific handling of the main thread.
,
Nov 7
Oh, I found [1] which looks highly interesting. If you squint a bit, it seems like Apple want to replace the thread priorities with the Quality of Service concept. Maybe setting the thread priority "deactivates" QoS scheduling and forces the system to consider the thread priority instead, which would cause behavior differences? [1] https://developer.apple.com/library/archive/documentation/Performance/Conceptual/power_efficiency_guidelines_osx/PrioritizeWorkAtTheTaskLevel.html
,
Nov 7
Hmm, so how about bumping ThreadPriority::DISPLAY up a bit from 0.5 in https://cs.chromium.org/chromium/src/base/threading/platform_thread_mac.mm?type=cs&sq=package:chromium&g=0&l=166 ?
,
Nov 7
c#32: It might be worth digging in to see how QoS is implemented. NSOperation is a fairly high-level abstraction which I thought was built on top of lower-level primitives [GCD, NSThread priority, etc.] c#33: Clarify? We don't have any problems setting priority of non-main threads. Attempting to set the priority of the main thread to anything [including 0.5 or 1] appears to degrade performance. See c#14 for the low-level APIs I tested.
,
Nov 7
Ah, I see. Never mind then, if higher values doesn't work I'm out of ideas :). |
||||||||||||
►
Sign in to add a comment |
||||||||||||
Comment 1 by gab@chromium.org
, Apr 11 2016