[TaskScheduler] Thread creation/detach churn [Being *the* idle thread should count as being active]
Issue description:

https://uma.googleplex.com/histograms?endDate=20180527&dayCount=1&histograms=TaskScheduler.DetachDuration.Browser.BackgroundPool%2CTaskScheduler.NumTasksBeforeDetach.Browser.BackgroundPool&fixupData=true&showMax=true&filters=platform%2Ceq%2CW%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

We see that TaskScheduler.NumTasksBeforeDetach.Browser.BackgroundPool is 0 99.9% of the time (i.e. newly created threads barely ever run a task). Furthermore, TaskScheduler.DetachDuration.Browser.BackgroundPool is under 1 second 89% of the time (i.e. we tend to recreate threads after a very short timeout). Other pools exhibit this problem too, but BackgroundPool is the worst.

fdoray@ and I think this is because TaskScheduler has a policy of always keeping one idle thread, but being *the* idle thread doesn't count as being "active". As such, if the pool regularly has a single task, the first thread is used and the pool creates a second thread to replace it as *the* idle thread. That second thread is never used if the pool rarely has more than one task; it is later recycled and then created again for the same reason. If a timer happens to line up with the detach duration (currently 30 seconds for BackgroundPool), the replacement idle thread will tend to be created right after the previous one is detached.

The fix is to consider the idle thread on top of the stack as "active".
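To make the intended policy concrete, here is a minimal, self-contained C++ sketch of the behavior after the fix; the class and member names are illustrative and are not the actual SchedulerWorkerPoolImpl code:

#include <chrono>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Worker {
  Clock::time_point last_used_time;
};

class WorkerPool {
 public:
  // Reclaim policy: the worker on top of the idle stack counts as "active",
  // so it is never reclaimed and never needs a replacement spare.
  bool CanCleanup(const Worker* worker, Clock::time_point now) const {
    if (!idle_stack_.empty() && idle_stack_.back() == worker)
      return false;  // The designated spare stays alive.
    return now - worker->last_used_time >= reclaim_time_;
  }

  // A new idle worker is only needed when there is no spare at all, not
  // whenever the current spare starts running a task.
  bool NeedsNewIdleWorker() const { return idle_stack_.empty(); }

 private:
  std::vector<const Worker*> idle_stack_;  // back() is the top of the stack.
  Clock::duration reclaim_time_ = std::chrono::seconds(30);
};

The key point is that the spare on top of the idle stack is never reclaimed and never triggers the creation of a replacement, so a pool that only ever runs one task at a time keeps a single extra thread instead of churning through them.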
Jun 6 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4e4da122f4deb30a9b90713f2fc8f819057417d1

commit 4e4da122f4deb30a9b90713f2fc8f819057417d1
Author: Gabriel Charette <gab@chromium.org>
Date: Wed Jun 06 19:17:10 2018

[TaskScheduler] Use TaskSchedulerWorkerPoolImplTestBase for common code of StandbyPolicy tests

In preparation for further such tests for crbug.com/847501.

Add support for custom reclaim time to TaskSchedulerWorkerPoolImplTestBase and remove its requirement for workers being idle before JoinForTesting() (it was overkill: task_tracker_.FlushForTesting() and worker_pool_->JoinForTesting() already wait for the workers to return from running their last task).

R=fdoray@chromium.org

Bug: 847501
Change-Id: I4437163427edf4778647fe1fe9e14f9332c466db
Reviewed-on: https://chromium-review.googlesource.com/1085383
Reviewed-by: François Doray <fdoray@chromium.org>
Commit-Queue: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/heads/master@{#564995}

[modify] https://crrev.com/4e4da122f4deb30a9b90713f2fc8f819057417d1/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc
Jun 7 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/903fd04e262f060fecf5ed75549076681e63049d

commit 903fd04e262f060fecf5ed75549076681e63049d
Author: Gabriel Charette <gab@chromium.org>
Date: Thu Jun 07 16:24:08 2018

[TaskScheduler] Consider the idle thread on top of the idle stack as active.

Fixing crbug.com/847501 and adding two regression tests for it.

Added SchedulerWorker::GetLastUsedTime() to support this fix and re-used it to suppress the need for the delegate's |is_on_idle_worker_stack_|.

Tweaked WaitForWorkersCleanedUpForTesting() to allow being called multiple times.

R=fdoray@chromium.org

Bug: 847501
Change-Id: I83f814358f679bdbee49778963ff9d05cd240adc
Reviewed-on: https://chromium-review.googlesource.com/1086978
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#565288}

[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker.cc
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker.h
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker_pool_impl.cc
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker_pool_impl.h
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker_stack.cc
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker_stack.h
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker_stack_unittest.cc
Jun 15 2018
Hmmm interesting, that doesn't seem to have changed anything in the problematic metrics...

05/27: https://uma.googleplex.com/histograms?endDate=20180527&dayCount=1&histograms=TaskScheduler.DetachDuration.Browser.BackgroundPool%2CTaskScheduler.NumTasksBeforeDetach.Browser.BackgroundPool%2CTaskScheduler.TaskLatencyMicroseconds.Browser.BackgroundTaskPriority&fixupData=true&showMax=true&filters=platform%2Ceq%2CW%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

vs

06/12: https://uma.googleplex.com/histograms?endDate=20180612&dayCount=1&histograms=TaskScheduler.DetachDuration.Browser.BackgroundPool%2CTaskScheduler.NumTasksBeforeDetach.Browser.BackgroundPool%2CTaskScheduler.TaskLatencyMicroseconds.Browser.BackgroundTaskPriority&fixupData=true&showMax=true&filters=platform%2Ceq%2CW%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
Jun 15 2018
The expected impact of your CL is that the "idle" thread lives longer before being reclaimed. An expected side-effect is that the total number of thread creations/destructions goes down. Unfortunately, we don't have histograms that directly measure the lifetime of threads or the number of thread creations per minute of Chrome usage. What we have is:

- DetachDuration: Let's say that your CL reduces the number of thread detaches. There will be fewer samples in the DetachDuration histograms, but it's unclear how the duration of the detaches that do happen will be affected.
- NumTasksBeforeDetach: Even if the idle thread has a longer lifetime, if we rarely have multiple background tasks posted concurrently, the idle thread will not run tasks and this metric won't be affected.

Shall we record these histograms every minute for each pool?
1) Number of threads created
2) Maximum number of threads that existed at the same time
3) Maximum number of tasks running at the same time

Our goal could be to minimize 1) and to have matching distributions for 2) and 3). DetachDuration and NumTasksBeforeDetach don't seem to me like things that we want to optimize directly.
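For reference, recording those three values once a minute per pool could look roughly like the sketch below. The histogram names and the caller that tracks/resets the counters are made up for illustration; base::UmaHistogramCounts100 and base::StrCat are the real helpers:

#include <string>

#include "base/metrics/histogram_functions.h"
#include "base/strings/strcat.h"

// Illustrative only: these histogram names do not exist; the counters would
// have to be tracked by the worker pool and reset every minute.
void ReportPerMinutePoolMetrics(const std::string& pool_name,
                                int num_threads_created,
                                int max_threads_alive,
                                int max_tasks_running) {
  base::UmaHistogramCounts100(
      base::StrCat({"TaskScheduler.NumThreadsCreatedPerMinute.", pool_name}),
      num_threads_created);
  base::UmaHistogramCounts100(
      base::StrCat({"TaskScheduler.MaxThreadsAlivePerMinute.", pool_name}),
      max_threads_alive);
  base::UmaHistogramCounts100(
      base::StrCat({"TaskScheduler.MaxTasksRunningPerMinute.", pool_name}),
      max_tasks_running);
}

Called from a periodic reporter (e.g. the service thread), this would make it possible to check that 1) stays low and that the distributions of 2) and 3) match.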
Jun 18 2018
I expected the total count of detach-related metrics to go down (since we'd detach less often); that seems to be the case, but not in sync with my CL, so it's unclear what caused it (could just be usage noise on Canary). I also expected fewer hits in the milliseconds range of the DetachDuration metrics (since noticing very fast recreation of threads is what had us figure out this issue in the first place), but that didn't seem to be impacted by this change.

I agree we should record high-level task scheduler metrics on a timer (most likely via the existing service thread metrics heartbeat; though if we want total-threads-created since the last heartbeat, once a minute sounds better than once per 59 minutes, so we might as well just reduce the latency of the existing heartbeat; more reports won't hurt).
Jul 23
Looking at the graphs again: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=258d3d35cc48983a3314c646721676ec

Looks like it had regressed around May 12 (68.0.3428.0), recovered for 68.0.3433, and regressed for good in 68.0.3434.0. There are no apparent task scheduler changes in these ranges (r559705 is the only one and I doubt it could have caused this). I suspect some CL merely triggered a recurring 30-second delayed background task, resulting in the issue described in the OP of this bug. r565288 should have made this better, but interestingly things started to get better just before it landed (perhaps a specifically poorly timed 30-second delay was removed...). Since r565288, though, it has improved past pre-68.0.3428.0 levels, so it's clearly a win: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=4b3928b0d5b9bc0025ad6a99dd65acd5

I think we can do even better, though, by adding dampening factors (with exponential decay) on both thread creation and detach; daily stats still show that we often recreate a thread right after recycling (detaching) it: https://uma.googleplex.com/histograms?endDate=latest&dayCount=1&histograms=TaskScheduler.DetachDuration.Browser.BackgroundBlockingPool%2CTaskScheduler.DetachDuration.Browser.BackgroundPool%2CTaskScheduler.DetachDuration.Browser.ForegroundBlockingPool%2CTaskScheduler.DetachDuration.Browser.ForegroundPool%2CTaskScheduler.NumTasksBeforeDetach.Browser.BackgroundPool&fixupData=true&showMax=true&filters=platform%2Ceq%2CW%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

Dynamic detach durations would also let us get rid of the static param (|SchedulerWorkerPoolParams::suggested_reclaim_time_|) and avoid the risk of lining it up with a regularly used timer (e.g. 30 seconds), which we saw creates ill-conditioned behavior (currently poorly mitigated by having SchedulerWorkerPoolImpl::SchedulerWorkerDelegateImpl::GetSleepTimeout() return |suggested_reclaim_time_ * 1.1|...). A rough sketch of one possible dampening approach follows this comment.

I think this will be a good one to get etiennep@ going on task scheduler internals and metrics =D

(target M69 for the initial mitigation (r565288 -- sadly I don't think we can merge this to M68); target M70 for the dynamic dampening solution)
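For illustration, one possible shape of that dampening is a reclaim timeout that grows when a reclaimed thread has to be recreated shortly afterwards and decays back toward a baseline otherwise. The class, constants, and growth/decay factors below are made up and would need tuning; this is not an existing TaskScheduler API:

#include <algorithm>
#include <chrono>

// Illustrative sketch only.
class DynamicReclaimTime {
 public:
  using Duration = std::chrono::duration<double>;  // Seconds.

  // Called whenever a new worker has to be created, with the time elapsed
  // since the pool last reclaimed (detached) a worker.
  void OnWorkerCreated(Duration time_since_last_detach) {
    if (time_since_last_detach < current_) {
      // Churn detected: the thread we just reclaimed is effectively being
      // recreated. Back off exponentially.
      current_ = std::min(current_ * kGrowthFactor, kMax);
    } else {
      // No churn: decay back toward the baseline.
      current_ = baseline_ + (current_ - baseline_) * kDecayFactor;
    }
  }

  Duration reclaim_time() const { return current_; }

 private:
  static constexpr double kGrowthFactor = 2.0;
  static constexpr double kDecayFactor = 0.5;
  static constexpr Duration kMax{8 * 60.0};  // Cap at 8 minutes.
  const Duration baseline_{30.0};            // Current static default (30s).
  Duration current_ = baseline_;
};

A dynamic value like this would also naturally drift away from any fixed application timer it happens to collide with, which is the ill-conditioned case described above.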
Jul 24
Actually, looking at it some more, I think we might want to merge r565288 to M68. I think it's a safe merge, and something in M68 (it could be any code posting a task with a delay matching the task scheduler's configured "detach duration") made the issue worse before r565288 mitigated it. The dip in these metrics in M68 is *very* substantial: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=39adaefc94823246079bfbecce3f22ef

That dip means that we're causing a lot of thread churn (creating/destroying threads every ~30 seconds), and I'm a lot more worried about this going to Stable without r565288 than with it.

Proving its worth some more, r565288 resulted in a 10% power improvement on some perf benchmarks: https://chromeperf.appspot.com/report?sid=d99ad2294f639cf4e0d0bbfa0e9aee1cac5a0352ef9f35d266f375f52466a608&start_rev=1527740490&end_rev=1528862688

Sorry for the very late merge request; I only realized the magnitude of the problem yesterday when digging into the results of this CL (it had remained on my stack of things to do), and some more today when I noticed that a 10% power improvement had been attributed to this CL a few days ago (also on my stack of things to look at).

(will take ownership back while we resolve the M68-M69 story)
Jul 24
This bug requires manual review: request affecting a post-stable build.
Please contact the milestone owner if you have questions.
Owners: cmasso@ (Android), kariahda@ (iOS), bhthompson@ (ChromeOS), abdulsyed@ (Desktop)
For more details visit https://www.chromium.org/issue-tracking/autotriage
- Your friendly Sheriffbot
Jul 24
Please mark all OSes this is impacting.
Jul 24
What are the implications of going to stable with this bug? We have our release candidate and are about to roll out to stable right now.
Jul 24
This is all OSes. I highlighted my reasoning in #8 (lots of thread churn if we go to Stable with this bug). Is that a big deal? I'm not sure... but I'm definitely more comfortable with the merge than without it (I'd initially decided against a merge request, see comment #7, but noticing the 10% power improvement brought by r565288, which implies it recovers a 10% regression, tipped the scale this morning).
Jul 24
We are rolling out M68 stable shortly, so we can consider this for a future respin. I won't block today's release. How safe is this merge overall? Seems like this has been in Canary/Dev for a while. Is it changes in #1,2,3 that need to be merged?
Jul 24
No, just #3; yes, it should be a safe merge.
Jul 26
I am approving the merge so we can get this into M68 already in case of a respin. Android is planning to have one next Monday.
Jul 26
Thanks, I've tried to mitigate this via Finch somewhat through https://critique.corp.google.com/#review/206207607
Let's see how the metric looks on Monday and re-assess the need for a merge.
Jul 27
This needs to be merged today if it is to be included in the respin, though. Hopefully there is no need to do so.
Jul 27
Is there any additional merge needed for M69? If yes, pls request. Thank you.
Jul 27
@#17: The timeline still only has data for Wed the 25th, so I can't tell yet: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=d7a144392039b8e5e4a977a8e741873a

@#18: No merge required for M69.
Jul 27
Ok, actually, thinking through this some more, https://critique.corp.google.com/#review/206207607 isn't going to cut it. Sadly, something in M68 triggered this issue in TaskScheduler and I think we must merge the fix (r565288) to Stable. Once again, apologies for the late merge; we hadn't realized something had made this issue much worse in M68 only.

The Stable graphs have already started plummeting and will follow the Beta graphs all the way down (maximum resource churn) if we don't merge: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=58af158f88c57bcad2365188f9de89dc

I will initiate the merge. Thanks
Jul 27
Ah... git drover is failing me on Windows and somehow I cannot do a manual merge like I always have (I'm getting fatal git errors about setting branch-heads/3440 as my upstream). I've done this a hundred times before; some migration must have put my checkouts in a bad state (same issue on my laptop) and I don't have time to fix it right now. I have to head out now and will not have access to corp resources again until Sunday evening, sorry.

If someone else wants to merge, I can review the diff from my phone (ping me). There's a trivial merge conflict in scheduler_worker_pool_impl.cc (need to keep the version from HEAD but remove the is_on_idle_worker_stack member). Thanks
Jul 30
Can you please merge this before noon (PDT) tomorrow?
Jul 30
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1

commit 8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1
Author: Gabriel Charette <gab@chromium.org>
Date: Mon Jul 30 15:46:23 2018

[Merge M68] [TaskScheduler] Consider the idle thread on top of the idle stack as active.

Fixing crbug.com/847501 and adding two regression tests for it.

Added SchedulerWorker::GetLastUsedTime() to support this fix and re-used it to suppress the need for the delegate's |is_on_idle_worker_stack_|.

Tweaked WaitForWorkersCleanedUpForTesting() to allow being called multiple times.

R=fdoray@chromium.org
TBR=gab@chromium.org

(cherry picked from commit 903fd04e262f060fecf5ed75549076681e63049d)

Bug: 847501
Change-Id: I83f814358f679bdbee49778963ff9d05cd240adc
Reviewed-on: https://chromium-review.googlesource.com/1086978
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#565288}
Reviewed-on: https://chromium-review.googlesource.com/1155002
Reviewed-by: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/branch-heads/3440@{#771}
Cr-Branched-From: 010ddcfda246975d194964ccf20038ebbdec6084-refs/heads/master@{#561733}

[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker.cc
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker.h
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker_pool_impl.cc
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker_pool_impl.h
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker_stack.cc
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker_stack.h
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker_stack_unittest.cc
Jul 30
Ah, I tested on 3440 and this breaks the compile of base/task_scheduler/scheduler_worker_pool_impl_unittest.cc on the branch; do we care? I tried to fix it, but I'm getting errors again (on my Linux machine as well) when trying to set branch-heads/3440 as an upstream (and therefore can't upload)...

We can just revert the changes to that test file on M68 (I confirmed manually that everything else works, and I don't think we run unit tests on the Stable branch anyway). Can someone with a working 3440 branch do this please? Thanks!
Jul 30
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/3e77c211ad44ed4759a936e93c195a542a9df471

commit 3e77c211ad44ed4759a936e93c195a542a9df471
Author: Gabriel Charette <gab@chromium.org>
Date: Mon Jul 30 17:42:42 2018

[M68] Fix base_unittests compile on branch

Reverting the scheduler_worker_pool_impl_unittest.cc changes on the branch, as they were based on a refactor which isn't on the branch and thus break the compile. Having these unittests on the branch isn't necessary. I've confirmed locally that all other TaskScheduler base_unittests still pass with this change.

TBR=fdoray@chromium.org

Bug: 847501
Change-Id: Idefed0181cdc0da226503e2fcaef92a71cc0178c
Reviewed-on: https://chromium-review.googlesource.com/1155106
Reviewed-by: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/branch-heads/3440@{#773}
Cr-Branched-From: 010ddcfda246975d194964ccf20038ebbdec6084-refs/heads/master@{#561733}

[modify] https://crrev.com/3e77c211ad44ed4759a936e93c195a542a9df471/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc
Aug 6
The metric has started to recover on Beta/Stable.

Beta: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=d7a144392039b8e5e4a977a8e741873a
Stable: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=25f1c6eef63d5ca6c59ac20fd163767d

Back to etiennep@ for the longer-term fix detailed in comment #7 of this bug (dampening thread creation and thread recycling, as part of familiarizing himself with TaskScheduler).
Aug 13
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/730a9398cf76849f7deefc0366aa32823b2bab2d

commit 730a9398cf76849f7deefc0366aa32823b2bab2d
Author: Etienne Pierre-doray <etiennep@chromium.org>
Date: Mon Aug 13 22:39:31 2018

[TaskScheduler]: Add histogram for the number of workers in a WorkerPool.

This CL adds the TaskScheduler.NumWorkers histogram to measure the number of threads created in each WorkerPool, recorded every hour.

Bug: 847501
Change-Id: I5e69f60320231c6ab60d62adb986eecacb15778e
Reviewed-on: https://chromium-review.googlesource.com/1164242
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Reviewed-by: Robert Kaplow (slow) <rkaplow@chromium.org>
Reviewed-by: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#582724}

[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/scheduler_worker_pool_impl.cc
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/scheduler_worker_pool_impl.h
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/scheduler_worker_pool_impl_unittest.cc
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/service_thread.cc
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/service_thread.h
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/service_thread_unittest.cc
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/task_scheduler_impl.cc
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/task_scheduler_impl.h
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/tools/metrics/histograms/histograms.xml
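Roughly, that reporting amounts to the sketch below; the reporter class and the worker-count callback are illustrative, not the CL's actual code (only the histogram name and the hourly cadence come from the CL description):

#include <functional>
#include <utility>

#include "base/location.h"
#include "base/metrics/histogram_macros.h"
#include "base/time/time.h"
#include "base/timer/timer.h"

// Illustrative sketch: reports a pool's worker count once an hour.
class HourlyNumWorkersReporter {
 public:
  explicit HourlyNumWorkersReporter(std::function<int()> get_num_workers)
      : get_num_workers_(std::move(get_num_workers)) {}

  void Start() {
    timer_.Start(FROM_HERE, base::TimeDelta::FromHours(1), this,
                 &HourlyNumWorkersReporter::Report);
  }

 private:
  void Report() {
    // A real implementation would use a pool-specific histogram suffix
    // (e.g. ".Browser.ForegroundPool").
    UMA_HISTOGRAM_COUNTS_100("TaskScheduler.NumWorkers", get_num_workers_());
  }

  std::function<int()> get_num_workers_;
  base::RepeatingTimer timer_;
};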
Oct 1
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/8f9f2d7cc80e1cc8eb8c2292b83a4c24c404a09f

commit 8f9f2d7cc80e1cc8eb8c2292b83a4c24c404a09f
Author: Etienne Pierre-Doray <etiennep@chromium.org>
Date: Mon Oct 01 19:56:15 2018

Add BrowserScheduler to fieldtrial testing config.

Only DetachTimeFg16x is added because it is the most different variation from control, while other variations test values in between.

Bug: 847501
Change-Id: I12da837873f86345e36cfb9577a7a803015baa40
Reviewed-on: https://chromium-review.googlesource.com/1249805
Reviewed-by: Robert Kaplow (sloooow) <rkaplow@chromium.org>
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Cr-Commit-Position: refs/heads/master@{#595520}

[modify] https://crrev.com/8f9f2d7cc80e1cc8eb8c2292b83a4c24c404a09f/testing/variations/fieldtrial_testing_config.json
Oct 5
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/a048280752fe6327990e38c2c957a1a38edf593f

commit a048280752fe6327990e38c2c957a1a38edf593f
Author: Etienne Pierre-doray <etiennep@chromium.org>
Date: Fri Oct 05 15:47:59 2018

Add android/ios to BrowserScheduler fieldtrial testing config.

Only DetachTimeFg16x is added because it is the most different variation from control, while other variations test values in between.

Bug: 847501
Change-Id: Ifbf70af688249d48772779229e2af56a12e1b274
Reviewed-on: https://chromium-review.googlesource.com/c/1263042
Reviewed-by: Robert Kaplow (sloooow) <rkaplow@chromium.org>
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Cr-Commit-Position: refs/heads/master@{#597143}

[modify] https://crrev.com/a048280752fe6327990e38c2c957a1a38edf593f/testing/variations/fieldtrial_testing_config.json
Nov 20
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/edc4f91ef3bf95977ce43d0116b1600c6c5c4af7

commit edc4f91ef3bf95977ce43d0116b1600c6c5c4af7
Author: Etienne Pierre-Doray <etiennep@chromium.org>
Date: Tue Nov 20 23:30:29 2018

[TaskScheduler]: Spawn workers from GetWork().

In an effort to avoid creating threads from the main thread, this CL avoids the creation of a new idle thread after waking up the current idle thread from the main thread. This is done by moving a call to MaintainAtLeastOneIdleWorkerLockRequired() from WakeUpOneWorkerLockRequired() to GetWork(). Note that a call to MaintainAtLeastOneIdleWorkerLockRequired() is still done before waking up a worker in WakeUpOneWorkerLockRequired(), in case no idle thread was created yet.

Bug: 847501
Change-Id: Ib3903c117a781a72f9d6a6aaeceee6554e5b9803
Reviewed-on: https://chromium-review.googlesource.com/c/1338382
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Reviewed-by: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#609848}

[modify] https://crrev.com/edc4f91ef3bf95977ce43d0116b1600c6c5c4af7/base/task/task_scheduler/scheduler_worker_pool_impl.cc
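In effect the change moves one call site. A heavily simplified, self-contained model of the before/after is below; the real code is SchedulerWorkerPoolImpl with locking and real task queues, while this toy only tracks counts:

#include <cstddef>

// Illustrative sketch only.
class Pool {
 public:
  // Runs on the thread that posts a task (often the main thread).
  void WakeUpOneWorker() {
    // Still ensure at least one idle worker exists, in case none was
    // created yet (e.g. the very first task posted to this pool).
    MaintainAtLeastOneIdleWorker();
    // Wake the spare: it leaves the idle stack to run the task.
    --idle_workers_;
    // Before the CL, MaintainAtLeastOneIdleWorker() was called again here,
    // which meant the replacement spare thread was created from the
    // posting (main) thread.
  }

  // Runs on the worker thread that was just woken up.
  void GetWork() {
    // After the CL, the woken worker replenishes the idle stack itself,
    // so the replacement thread is created off the main thread.
    MaintainAtLeastOneIdleWorker();
  }

  size_t idle_workers() const { return idle_workers_; }
  size_t total_workers() const { return total_workers_; }

 private:
  void MaintainAtLeastOneIdleWorker() {
    if (idle_workers_ == 0)
      CreateWorker();
  }
  void CreateWorker() { ++total_workers_; ++idle_workers_; }

  size_t idle_workers_ = 0;
  size_t total_workers_ = 0;
};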
Nov 27
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/1e1792c4de47de3d42b680aed04f864bd5771ce6

commit 1e1792c4de47de3d42b680aed04f864bd5771ce6
Author: Etienne Pierre-Doray <etiennep@chromium.org>
Date: Tue Nov 27 16:49:32 2018

[TaskScheduler]: Remove unnecessary ScopedAllowWait in MessagePump.

SyncWaiter::sw needs to be declared as only used while idle in WaitableEvent to avoid being considered blocking.

Bug: 847501
Change-Id: I9a70aa28e74dbaae5f4b1b8aa66bb27ed0b5ee28
Reviewed-on: https://chromium-review.googlesource.com/c/1349730
Reviewed-by: François Doray <fdoray@chromium.org>
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Cr-Commit-Position: refs/heads/master@{#611146}

[modify] https://crrev.com/1e1792c4de47de3d42b680aed04f864bd5771ce6/base/message_loop/message_pump_default.cc
[modify] https://crrev.com/1e1792c4de47de3d42b680aed04f864bd5771ce6/base/synchronization/waitable_event_posix.cc
Nov 30
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/3f40244cd1ae911072fdccc827fdf446e1a66005

commit 3f40244cd1ae911072fdccc827fdf446e1a66005
Author: Etienne Pierre-Doray <etiennep@chromium.org>
Date: Fri Nov 30 18:09:48 2018

[TaskScheduler]: Create no detach below initial capacity feature

Under this experiment, scheduler workers are only detached if the pool is above its initial capacity (threads that are created to replace blocked threads).

Two options were considered:

Option A: Detach only when over initial capacity.

Option B: Detach only when over current capacity (which includes currently blocked threads in the capacity). This might better handle the following case: at any given time, there is at least 1 blocked thread, and on top of that, some periodic work uses all workers every 30s or so. The current capacity would account for the blocked thread and avoid detaching it periodically.

Option A was picked because it is more conservative. Initial capacity is smaller than or equal to current capacity, so detaching is closer to the current behavior. We want to avoid having too many threads that aren't used.

Bug: 847501
Change-Id: I0b116db54095767768b158d92f5f146249720b45
Reviewed-on: https://chromium-review.googlesource.com/c/1348863
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#612698}

[modify] https://crrev.com/3f40244cd1ae911072fdccc827fdf446e1a66005/base/task/task_features.cc
[modify] https://crrev.com/3f40244cd1ae911072fdccc827fdf446e1a66005/base/task/task_features.h
[modify] https://crrev.com/3f40244cd1ae911072fdccc827fdf446e1a66005/base/task/task_scheduler/scheduler_worker_pool_impl.cc
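A simplified model of the Option A rule is below. The real change is gated behind a base::Feature checked in scheduler_worker_pool_impl.cc; the struct and function names here are illustrative:

#include <cstddef>

// Illustrative sketch of the experiment's reclaim rule.
struct PoolState {
  size_t initial_capacity;  // Worker count the pool was configured with.
  size_t current_workers;   // Workers currently alive (idle or running).
};

bool MayReclaimIdleWorker(const PoolState& pool, bool feature_enabled) {
  if (!feature_enabled)
    return true;  // Pre-experiment behavior: reclaim after the timeout.
  // Option A: only workers above the *initial* capacity (those created to
  // replace blocked workers) may be reclaimed, so the baseline set of
  // threads is never churned.
  return pool.current_workers > pool.initial_capacity;
}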
Dec 3
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/0c82881c88cb9098581e3ad45034c7b1ca26f304

commit 0c82881c88cb9098581e3ad45034c7b1ca26f304
Author: Robert Flack <flackr@chromium.org>
Date: Mon Dec 03 20:41:41 2018

Revert "[TaskScheduler]: Create no detach below initial capacity feature"

This reverts commit 3f40244cd1ae911072fdccc827fdf446e1a66005.

Reason for revert: Caused a data race running components_unittests with ScopedFeatureList. Sample failure: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20TSan%20Tests/30398

WARNING: ThreadSanitizer: data race (pid=9805)
Write of size 8 at 0x55b63bfba2b0 by main thread:
  #0 base::FeatureList::ClearInstanceForTesting() base/feature_list.cc:284:27 (components_unittests+0x85bec63)
  #1 base::test::ScopedFeatureList::~ScopedFeatureList() base/test/scoped_feature_list.cc:98:3 (components_unittests+0x9a65d52)
  #2 content::UnitTestTestSuite::~UnitTestTestSuite() content/public/test/unittest_test_suite.cc:61:1 (components_unittests+0xb03cf21)
  #3 operator() buildtools/third_party/libc++/trunk/include/memory:2325:5 (components_unittests+0x8256779)
  #4 reset buildtools/third_party/libc++/trunk/include/memory:2638 (components_unittests+0x8256779)
  #5 ~unique_ptr buildtools/third_party/libc++/trunk/include/memory:2592 (components_unittests+0x8256779)
  #6 ~__tuple_leaf buildtools/third_party/libc++/trunk/include/tuple:171 (components_unittests+0x8256779)
  #7 ~tuple buildtools/third_party/libc++/trunk/include/tuple:470 (components_unittests+0x8256779)
  #8 ~BindState base/bind_internal.h:871 (components_unittests+0x8256779)
  #9 base::internal::BindState<int (content::UnitTestTestSuite::*)(), std::__1::unique_ptr<content::UnitTestTestSuite, std::__1::default_delete<content::UnitTestTestSuite> > >::Destroy(base::internal::BindStateBase const*) base/bind_internal.h:874 (components_unittests+0x8256779)
  #10 Destruct base/callback_internal.cc:29:3 (components_unittests+0x85b80c7)
  #11 Release base/memory/ref_counted.h:403 (components_unittests+0x85b80c7)
  #12 Release base/memory/scoped_refptr.h:284 (components_unittests+0x85b80c7)
  #13 ~scoped_refptr base/memory/scoped_refptr.h:208 (components_unittests+0x85b80c7)
  #14 base::internal::CallbackBase::~CallbackBase() base/callback_internal.cc:84 (components_unittests+0x85b80c7)
  #15 base::LaunchUnitTests(int, char**, base::OnceCallback<int ()>) base/test/launcher/unit_test_launcher.cc:575:3 (components_unittests+0x9a71272)
  #16 main components/test/run_all_unittests.cc:8:10 (components_unittests+0x4e49ad5)

Previous read of size 8 at 0x55b63bfba2b0 by thread T12 (mutexes: write M19294):
  #0 base::FeatureList::IsEnabled(base::Feature const&) base/feature_list.cc:200:8 (components_unittests+0x85be82d)
  #1 CanCleanupLockRequired base/task/task_scheduler/scheduler_worker_pool_impl.cc:672:12 (components_unittests+0x866a5e8)
  #2 base::internal::SchedulerWorkerPoolImpl::SchedulerWorkerDelegateImpl::GetWork(base::internal::SchedulerWorker*) base/task/task_scheduler/scheduler_worker_pool_impl.cc:538 (components_unittests+0x866a5e8)
  #3 base::internal::SchedulerWorker::RunWorker() base/task/task_scheduler/scheduler_worker.cc:324:51 (components_unittests+0x866f81d)
  #4 base::internal::SchedulerWorker::RunPooledWorker() base/task/task_scheduler/scheduler_worker.cc:229:3 (components_unittests+0x866f481)
  #5 base::internal::SchedulerWorker::ThreadMain() base/task/task_scheduler/scheduler_worker.cc:208:7 (components_unittests+0x866f2f1)
  #6 base::(anonymous namespace)::ThreadFunc(void*) base/threading/platform_thread_posix.cc:81:13 (components_unittests+0x86daf04)

Location is global 'base::(anonymous namespace)::g_feature_list_instance' of size 8 at 0x55b63bfba2b0 (components_unittests+0x00000ebe52b0)

Original change's description:
> [TaskScheduler]: Create no detach below initial capacity feature
>
> Under this experiment, scheduler workers are only detached if the pool is
> above its initial capacity (threads that are created to replace blocked threads).
>
> 2 options were considered:
> Option A: Detach only when over initial capacity.
>
> Option B: Detach only when over current capacity (includes currently blocked threads in capacity).
> This might better handle the following case: At any given time, there is at least 1 blocked thread.
> On top of that, some periodic work uses all workers every 30s or so. The current capacity will
> account for the blocked thread and avoid detaching it periodically.
>
> Option A was picked because it is more conservative. Initial capacity is smaller or
> equal to current capacity, so detaching is closer to current behavior. We want to avoid having
> too many threads that aren't used.
>
> Bug: 847501
> Change-Id: I0b116db54095767768b158d92f5f146249720b45
> Reviewed-on: https://chromium-review.googlesource.com/c/1348863
> Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
> Reviewed-by: François Doray <fdoray@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#612698}

TBR=fdoray@chromium.org,etiennep@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: 847501
Change-Id: I951f5c5701e2d296b2c4edef37648105c4911cf9
Reviewed-on: https://chromium-review.googlesource.com/c/1359127
Reviewed-by: Robert Flack <flackr@chromium.org>
Commit-Queue: Robert Flack <flackr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#613229}

[modify] https://crrev.com/0c82881c88cb9098581e3ad45034c7b1ca26f304/base/task/task_features.cc
[modify] https://crrev.com/0c82881c88cb9098581e3ad45034c7b1ca26f304/base/task/task_features.h
[modify] https://crrev.com/0c82881c88cb9098581e3ad45034c7b1ca26f304/base/task/task_scheduler/scheduler_worker_pool_impl.cc
Dec 12
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/a6fb79e9459a484d08b6d1837128b161686ac538

commit a6fb79e9459a484d08b6d1837128b161686ac538
Author: Etienne Pierre-Doray <etiennep@chromium.org>
Date: Wed Dec 12 22:48:04 2018

Use singleton feature list in content::UnitTestTestSuite.

Commit 3f40244cd1ae911072fdccc827fdf446e1a66005 had to be reverted due to a data race between ScopedFeatureList's destructor and a feature in the scheduler. Sample failure: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20TSan%20Tests/30398

There is no need for the feature list to be restored with ScopedFeatureList in the test launcher. Since the scheduler used for the test launcher is leaked, it makes sense to have a singleton feature list instead of a ScopedFeatureList.

Bug: 847501, 846380
Change-Id: I738ebebe5aa04871427399bcf012c072d7f0212c
Reviewed-on: https://chromium-review.googlesource.com/c/1361441
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616090}

[modify] https://crrev.com/a6fb79e9459a484d08b6d1837128b161686ac538/content/public/test/unittest_test_suite.cc
[modify] https://crrev.com/a6fb79e9459a484d08b6d1837128b161686ac538/content/public/test/unittest_test_suite.h
Dec 13
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/f9ef21866f48b676b0d91607cc8066e74e450dd0

commit f9ef21866f48b676b0d91607cc8066e74e450dd0
Author: Etienne Pierre-Doray <etiennep@chromium.org>
Date: Thu Dec 13 14:45:42 2018

Reland "[TaskScheduler]: Create no detach below initial capacity feature"

This is a reland of 3f40244cd1ae911072fdccc827fdf446e1a66005.
Fix is here: https://chromium-review.googlesource.com/c/chromium/src/+/1361441

Original change's description:
> [TaskScheduler]: Create no detach below initial capacity feature
>
> Under this experiment, scheduler workers are only detached if the pool is
> above its initial capacity (threads that are created to replace blocked threads).
>
> 2 options were considered:
> Option A: Detach only when over initial capacity.
>
> Option B: Detach only when over current capacity (includes currently blocked threads in capacity).
> This might better handle the following case: At any given time, there is at least 1 blocked thread.
> On top of that, some periodic work uses all workers every 30s or so. The current capacity will
> account for the blocked thread and avoid detaching it periodically.
>
> Option A was picked because it is more conservative. Initial capacity is smaller or
> equal to current capacity, so detaching is closer to current behavior. We want to avoid having
> too many threads that aren't used.
>
> Bug: 847501
> Change-Id: I0b116db54095767768b158d92f5f146249720b45
> Reviewed-on: https://chromium-review.googlesource.com/c/1348863
> Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
> Reviewed-by: François Doray <fdoray@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#612698}

Bug: 847501
Change-Id: I68a592151f965b8a297ece7cc192eeaf78a72cc8
Reviewed-on: https://chromium-review.googlesource.com/c/1374521
Commit-Queue: François Doray <fdoray@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616309}

[modify] https://crrev.com/f9ef21866f48b676b0d91607cc8066e74e450dd0/base/task/task_features.cc
[modify] https://crrev.com/f9ef21866f48b676b0d91607cc8066e74e450dd0/base/task/task_features.h
[modify] https://crrev.com/f9ef21866f48b676b0d91607cc8066e74e450dd0/base/task/task_scheduler/scheduler_worker_pool_impl.cc
Dec 18
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/6bec0ce9e2ff1c6433bae8203d12b839e8004688

commit 6bec0ce9e2ff1c6433bae8203d12b839e8004688
Author: Etienne Pierre-doray <etiennep@chromium.org>
Date: Tue Dec 18 15:53:42 2018

[TaskScheduler]: Add histogram for the number of active workers in a WorkerPool.

This CL adds the TaskScheduler.NumActiveWorkers histogram to measure the number of scheduled tasks in each WorkerPool, recorded every hour.

Bug: 847501
Change-Id: I2a7ea9ebbd17e871d0cc1f06d1edf739d4d82334
Reviewed-on: https://chromium-review.googlesource.com/c/1380965
Reviewed-by: Gabriel Charette <gab@chromium.org>
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Cr-Commit-Position: refs/heads/master@{#617504}

[modify] https://crrev.com/6bec0ce9e2ff1c6433bae8203d12b839e8004688/base/task/task_scheduler/scheduler_worker_pool_impl.cc
[modify] https://crrev.com/6bec0ce9e2ff1c6433bae8203d12b839e8004688/base/task/task_scheduler/scheduler_worker_pool_impl.h
[modify] https://crrev.com/6bec0ce9e2ff1c6433bae8203d12b839e8004688/base/task/task_scheduler/scheduler_worker_pool_impl_unittest.cc
[modify] https://crrev.com/6bec0ce9e2ff1c6433bae8203d12b839e8004688/base/task/task_scheduler/task_scheduler_impl.cc