
Issue 847501

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux, Android, Windows, iOS, Chrome, Mac, Fuchsia
Pri: 1
Type: Bug




[TaskScheduler] Thread creation/detach churn [Being *the* idle thread should count as being active]

Project Member Reported by gab@chromium.org, May 29 2018

Issue description

@ https://uma.googleplex.com/histograms?endDate=20180527&dayCount=1&histograms=TaskScheduler.DetachDuration.Browser.BackgroundPool%2CTaskScheduler.NumTasksBeforeDetach.Browser.BackgroundPool&fixupData=true&showMax=true&filters=platform%2Ceq%2CW%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

We see that TaskScheduler.NumTasksBeforeDetach.Browser.BackgroundPool is 0 99.9% of the time (i.e. newly created threads barely ever run a task).

Furthermore, TaskScheduler.DetachDuration.Browser.BackgroundPool is under 1 second 89% of the time (i.e. we tend to recreate threads after a very short timeout).

Other pools exhibit this problem too but BackgroundPool is the worst.

fdoray@ and I think this is because TaskScheduler has a policy of always keeping one idle thread, but being *the* idle thread doesn't count as being "active". As such, if the pool regularly has a single task, the first thread is used and the pool creates a second thread to replace it as *the* idle thread. That second thread is never used if the pool rarely has more than one task at a time, so it is later recycled, only to be created again for the same reason.

If a timer happens to line up with the detach duration (30 seconds for BackgroundPool at the moment), the replacement idle thread will tend to be created right after the previous one detaches.

The fix is to consider the idle thread on top of the stack as "active".
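
To make the mechanism concrete, here is a loose C++ sketch of the reclaim decision before and after the fix. All names here are illustrative (the real logic lives in SchedulerWorkerPoolImpl); this is a model of the policy, not the actual change:

  // Loose model of the reclaim decision; |idle_seconds| and
  // |is_top_of_idle_stack| are illustrative inputs, not actual fields.

  const double kReclaimTimeoutSeconds = 30;  // BackgroundPool's detach duration

  // Old policy: any worker idle past the timeout is detached, including
  // *the* standby worker, which the pool then promptly re-creates: churn.
  bool CanCleanupOld(double idle_seconds) {
    return idle_seconds >= kReclaimTimeoutSeconds;
  }

  // Fixed policy (r565288): the worker on top of the idle stack counts as
  // "active" and is never reclaimed, so the standby slot stops churning.
  bool CanCleanupFixed(double idle_seconds, bool is_top_of_idle_stack) {
    return !is_top_of_idle_stack && idle_seconds >= kReclaimTimeoutSeconds;
  }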
 
Project Member

Comment 1 by bugdroid1@chromium.org, Jun 4 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/efb6a36b7ca76b2b4d98e198b375242f29173492

commit efb6a36b7ca76b2b4d98e198b375242f29173492
Author: Gabriel Charette <gab@chromium.org>
Date: Mon Jun 04 18:51:20 2018

[TaskScheduler] Cleanup SchedulerWorkerPoolImpl tests WaitableEvents some more.

Take advantage of the new WaitableEvent defaults as a precursor to
refactoring TaskSchedulerWorkerPoolStandbyPolicyTests.

R=fdoray@chromium.org

Bug: 847501
Change-Id: Ic31a55fcaed0b2dd2cf596bab96351463257e85e
Reviewed-on: https://chromium-review.googlesource.com/1085378
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#564166}
[modify] https://crrev.com/efb6a36b7ca76b2b4d98e198b375242f29173492/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc

Project Member

Comment 2 by bugdroid1@chromium.org, Jun 6 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4e4da122f4deb30a9b90713f2fc8f819057417d1

commit 4e4da122f4deb30a9b90713f2fc8f819057417d1
Author: Gabriel Charette <gab@chromium.org>
Date: Wed Jun 06 19:17:10 2018

[TaskScheduler] Use TaskSchedulerWorkerPoolImplTestBase for common code of StandbyPolicy tests

In preparation for further such tests for crbug.com/847501

Add support for custom reclaim time to
TaskSchedulerWorkerPoolImplTestBase and remove its requirement for
workers being idle before JoinForTesting() (it was overkill:
task_tracker_.FlushForTesting() and worker_pool_->JoinForTesting()
already wait for the workers to return from running their last task).

R=fdoray@chromium.org

Bug: 847501
Change-Id: I4437163427edf4778647fe1fe9e14f9332c466db
Reviewed-on: https://chromium-review.googlesource.com/1085383
Reviewed-by: François Doray <fdoray@chromium.org>
Commit-Queue: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/heads/master@{#564995}
[modify] https://crrev.com/4e4da122f4deb30a9b90713f2fc8f819057417d1/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc

Project Member

Comment 3 by bugdroid1@chromium.org, Jun 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/903fd04e262f060fecf5ed75549076681e63049d

commit 903fd04e262f060fecf5ed75549076681e63049d
Author: Gabriel Charette <gab@chromium.org>
Date: Thu Jun 07 16:24:08 2018

[TaskScheduler] Consider the idle thread on top of the idle stack as active.

Fixing crbug.com/847501 and adding two regression tests for it.

Added SchedulerWorker::GetLastUsedTime() to support this fix and re-used
it to suppress the need for the delegate's |is_on_idle_worker_stack_|.

Tweaked WaitForWorkersCleanedUpForTesting() to allow being called
multiple times.

R=fdoray@chromium.org

Bug: 847501
Change-Id: I83f814358f679bdbee49778963ff9d05cd240adc
Reviewed-on: https://chromium-review.googlesource.com/1086978
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#565288}
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker.cc
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker.h
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker_pool_impl.cc
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker_pool_impl.h
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker_stack.cc
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker_stack.h
[modify] https://crrev.com/903fd04e262f060fecf5ed75549076681e63049d/base/task_scheduler/scheduler_worker_stack_unittest.cc

Comment 5 by fdoray@chromium.org, Jun 15 2018

The expected impact of your CL is that the "idle" thread lives longer before being reclaimed. An expected side effect is that the total number of thread creations/destructions goes down. Unfortunately, we don't have histograms that directly measure the lifetime of threads or the number of thread creations per minute of Chrome usage.

What we have is:

- DetachDuration: Let's say your CL reduces the number of thread detaches. There will be fewer samples in the DetachDuration histograms, but it's unclear how the duration of the detaches that do happen will be affected.

- NumTasksBeforeDetach: Even if the idle thread has a longer lifetime, if we rarely have multiple background tasks posted concurrently, the idle thread will not run tasks and this metric won't be affected.

Shall we record these histograms every minute for each pool?
1) Number of threads created
2) Maximum number of threads that existed at the same time
3) Maximum number of tasks running at the same time

It could be our goal to minimize 1) and have matching distributions for 2) and 3).

DetachDuration and NumTasksBeforeDetach don't seem to me like things that we want to optimize directly.
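
A rough sketch of what such a per-minute heartbeat could look like. The reporter class and the Take*() getters are hypothetical placeholders for counters the pool would have to track; only base::UmaHistogramCounts100, base::RepeatingTimer, and base::BindRepeating are real base APIs:

  #include "base/bind.h"
  #include "base/location.h"
  #include "base/metrics/histogram_functions.h"
  #include "base/timer/timer.h"

  // Hypothetical per-pool reporter for the three proposed metrics.
  class PoolHeartbeatReporter {
   public:
    void Start() {
      timer_.Start(FROM_HERE, base::TimeDelta::FromMinutes(1),
                   base::BindRepeating(&PoolHeartbeatReporter::Report,
                                       base::Unretained(this)));
    }

   private:
    void Report() {
      base::UmaHistogramCounts100("TaskScheduler.NumThreadsCreated.Minutely",
                                  TakeNumThreadsCreated());       // 1)
      base::UmaHistogramCounts100("TaskScheduler.PeakNumThreads.Minutely",
                                  TakePeakNumThreads());          // 2)
      base::UmaHistogramCounts100("TaskScheduler.PeakNumRunningTasks.Minutely",
                                  TakePeakNumRunningTasks());     // 3)
    }

    // Placeholder getters: read and reset counters the pool would maintain.
    int TakeNumThreadsCreated() { return 0; }
    int TakePeakNumThreads() { return 0; }
    int TakePeakNumRunningTasks() { return 0; }

    base::RepeatingTimer timer_;
  };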

Comment 6 by gab@chromium.org, Jun 18 2018

I expected the total count of detach-related metrics to go down (since we'd detach less often). This seems to be the case, but not in sync with my CL, so it's unclear what caused it (could just be usage noise on Canary).

I also expected fewer hits in the milliseconds range of the DetachDuration metrics (since noticing very fast recreation of threads is what made us figure out this issue in the first place), but that didn't seem to be impacted by this change.

I agree we should record high-level task scheduler metrics on a timer, most likely via the existing service thread metrics heartbeat. If we want total-threads-created since the last heartbeat, once a minute sounds better than once per 59 minutes; we might as well just reduce the interval of the existing heartbeat, as more reports won't hurt.
Cc: gab@chromium.org
Labels: Target-70 RegressedIn-68 Target-69 FoundIn-69
Owner: etiennep@chromium.org
Looking at the graphs again: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=258d3d35cc48983a3314c646721676ec

Looks like it had regressed around May 12 (68.0.3428.0), recovered for 68.0.3433, and regressed for good in 68.0.3434.0. There are no apparent task scheduler changes in these ranges (r559705 is the only one, and I doubt it could have caused this). I suspect some CL merely triggered a recurring 30-second delayed background task, resulting in the issue described in the OP of this bug.

r565288 should have made this better, but interestingly things started to get better just before it landed (perhaps a specific, poorly timed 30-second delay was removed...). Since r565288, though, it has improved past pre-68.0.3428.0 levels, so it's clearly a win: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=4b3928b0d5b9bc0025ad6a99dd65acd5

I think we can do even better, though, by adding dampening factors (with exponential decay) on both thread creation and detach; daily stats still show that we often recreate a thread right after recycling (detaching) it: https://uma.googleplex.com/histograms?endDate=latest&dayCount=1&histograms=TaskScheduler.DetachDuration.Browser.BackgroundBlockingPool%2CTaskScheduler.DetachDuration.Browser.BackgroundPool%2CTaskScheduler.DetachDuration.Browser.ForegroundBlockingPool%2CTaskScheduler.DetachDuration.Browser.ForegroundPool%2CTaskScheduler.NumTasksBeforeDetach.Browser.BackgroundPool&fixupData=true&showMax=true&filters=platform%2Ceq%2CW%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

Dynamic detach durations would also let us get rid of the static param (|SchedulerWorkerPoolParams::suggested_reclaim_time_|) and avoid the risk of lining it up with a regularly used timer (e.g. 30 seconds), which we saw creates ill-conditioned behavior (currently poorly mitigated by having SchedulerWorkerPoolImpl::SchedulerWorkerDelegateImpl::GetSleepTimeout() return |suggested_reclaim_time_ * 1.1|...).
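
A sketch of what such a dampened, dynamic reclaim time could look like. The class, event hooks, and tuning constants are entirely hypothetical (only base::TimeDelta is a real type); this is one way to realize the exponential-decay idea, not a committed design:

  #include <algorithm>

  #include "base/time/time.h"

  // Hypothetical dynamic reclaim policy: lengthen the detach timeout when a
  // worker is re-created shortly after detaching (the churn observed above),
  // and decay back toward the base timeout otherwise. This would replace the
  // static |SchedulerWorkerPoolParams::suggested_reclaim_time_|.
  class DynamicReclaimPolicy {
   public:
    base::TimeDelta GetSleepTimeout() const { return current_; }

    // A worker was re-created soon after being detached: the timeout is
    // resonating with some recurring delay (e.g. a 30s timer); back off.
    void OnPrematureRecreate() { current_ = std::min(current_ * 2, max_); }

    // A worker was reclaimed and not quickly replaced: decay the timeout.
    void OnCleanReclaim() { current_ = std::max(current_ / 2, base_); }

   private:
    const base::TimeDelta base_ = base::TimeDelta::FromSeconds(30);
    const base::TimeDelta max_ = base::TimeDelta::FromMinutes(8);
    base::TimeDelta current_ = base_;
  };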


I think this will be a good one to get etiennep@ going on task scheduler internals and metrics =D


(Target M69 for the initial mitigation (r565288; sadly I don't think we can merge this to M68); target M70 for the dynamic dampening solution.)
Cc: etiennep@chromium.org
Labels: Merge-Request-68
Owner: gab@chromium.org
Actually, looking at it some more, I think we might want to merge r565288 to M68. I think it's a safe merge, and something in M68 made the issue worse (could be any code posting a task with a delay matching the task scheduler's configured "detach duration") before r565288 mitigated it.

The dip in these metrics in M68 is *very* substantial : https://uma.googleplex.com/p/chrome/timeline_v2/?sid=39adaefc94823246079bfbecce3f22ef

That dip means that we're causing a lot of thread churn (creating/destroying threads every ~30 seconds), and I'm a lot more worried about shipping Stable without r565288 than with it.

Further proving its worth, r565288 resulted in a 10% power improvement on some perf benchmarks:
https://chromeperf.appspot.com/report?sid=d99ad2294f639cf4e0d0bbfa0e9aee1cac5a0352ef9f35d266f375f52466a608&start_rev=1527740490&end_rev=1528862688

Sorry for the very late merge request. I only realized the magnitude of the problem yesterday when digging into the results of this CL (it had remained on my stack of things to do), and some more today when I noticed that a 10% power improvement had been attributed to this CL a few days ago (also on my stack of things to look at).

(will take ownership back while we resolve the M68-M69 story)
Project Member

Comment 9 by sheriffbot@chromium.org, Jul 24

Labels: -Merge-Request-68 Hotlist-Merge-Review Merge-Review-68
This bug requires manual review: Request affecting a post-stable build
Please contact the milestone owner if you have questions.
Owners: cmasso@(Android), kariahda@(iOS), bhthompson@(ChromeOS), abdulsyed@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Please mark all OSes this is impacting.
Labels: OS-Android
What are the implications of going to stable with this bug? We have our release candidate and are about to roll out to stable right now.
Labels: OS-Chrome OS-Fuchsia OS-iOS OS-Linux OS-Mac OS-Windows
This is all OSes.

I highlighted my reasoning in #8 (lots of thread churn if we go to Stable with this bug). Is that a big deal? I'm not sure... but I'm definitely more comfortable with the merge than without (I'd initially decided against a merge request, see comment #7, but noticing the 10% power improvement brought by r565288, which implies it recovers a 10% regression, tipped the scale this morning).
We are rolling out M68 stable shortly, so we can consider this for a future respin. I won't block today's release.  

How safe is this merge overall? It seems like this has been in Canary/Dev for a while. Are the changes in #1, #2, and #3 what needs to be merged?
No, just #3, and yes, it should be a safe merge.
Labels: -Hotlist-Merge-Review -Merge-Review-68 Merge-Approved-68
I am approving the merge so we can get this into M68 now, in case of a respin. Android is planning to have one next Monday.
Thanks, I've tried to mitigate this via Finch somewhat through https://critique.corp.google.com/#review/206207607

Let's see how the metric looks on Monday and re-assess the need for a merge.
This needs to be merged today if it is to be included in the respin, though. Hopefully there is no need to do so.
Is there any additional merge needed for M69? If yes, pls request. Thank you.
@#17: The timeline still only has data for Wed the 25th; can't tell yet: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=d7a144392039b8e5e4a977a8e741873a

@#18: no merge required for M69
OK, actually, thinking this through some more, https://critique.corp.google.com/#review/206207607 isn't going to cut it.

Sadly, something in M68 triggered this issue in TaskScheduler and I think we must merge the fix (r565288) to Stable. Once again, apologies for the late merge request; we hadn't realized that something had made this issue much worse in M68 specifically.

The Stable graphs have already started plummeting and will follow the Beta graphs all the way down (maximum resource churn) if we don't merge: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=58af158f88c57bcad2365188f9de89dc

I will initiate the merge. Thanks
Ah... git drover is failing me on Windows and somehow I cannot do a manual merge like I always have (I'm getting fatal git errors about setting branch-heads/3440 as my upstream...). I've done this a hundred times before... some migration must have put my checkouts in a bad state (same issue on my laptop); I don't have time to fix it right now.

I have to head out now and will not have access to corp resources again until Sunday evening... sorry.

If someone else wants to merge, I can review the diff from my phone (ping me). There's a trivial merge conflict in scheduler_worker_pool_impl.cc (you need to keep the version from HEAD but remove the is_on_idle_worker_stack member).

Thanks
Can you please merge this before noon (PDT) tomorrow?
Project Member

Comment 23 by bugdroid1@chromium.org, Jul 30

Labels: -merge-approved-68 merge-merged-3440
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1

commit 8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1
Author: Gabriel Charette <gab@chromium.org>
Date: Mon Jul 30 15:46:23 2018

[Merge M68] [TaskScheduler] Consider the idle thread on top of the idle stack as active.

Fixing crbug.com/847501 and adding two regression tests for it.

Added SchedulerWorker::GetLastUsedTime() to support this fix and re-used
it to suppress the need for the delegate's |is_on_idle_worker_stack_|.

Tweaked WaitForWorkersCleanedUpForTesting() to allow being called
multiple times.

R=fdoray@chromium.org
TBR=gab@chromium.org

(cherry picked from commit 903fd04e262f060fecf5ed75549076681e63049d)

Bug: 847501
Change-Id: I83f814358f679bdbee49778963ff9d05cd240adc
Reviewed-on: https://chromium-review.googlesource.com/1086978
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#565288}
Reviewed-on: https://chromium-review.googlesource.com/1155002
Reviewed-by: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/branch-heads/3440@{#771}
Cr-Branched-From: 010ddcfda246975d194964ccf20038ebbdec6084-refs/heads/master@{#561733}
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker.cc
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker.h
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker_pool_impl.cc
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker_pool_impl.h
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker_stack.cc
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker_stack.h
[modify] https://crrev.com/8a42615d99c82c25f65fa1dc5d4ffea3b9fe9aa1/base/task_scheduler/scheduler_worker_stack_unittest.cc

Ah, I tested on 3440 and this breaks the compile of base/task_scheduler/scheduler_worker_pool_impl_unittest.cc on the branch; do we care? I tried to fix it but am again getting errors (on my Linux machine as well) when trying to set branch-heads/3440 as an upstream (and therefore can't upload)...

We can just revert the changes to that test file on M68 (I confirmed manually that everything else works, and I don't think we run unit tests on the Stable branch anyway?).

Can someone with a working 3440 branch do this please?

Thanks!
Project Member

Comment 25 by bugdroid1@chromium.org, Jul 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3e77c211ad44ed4759a936e93c195a542a9df471

commit 3e77c211ad44ed4759a936e93c195a542a9df471
Author: Gabriel Charette <gab@chromium.org>
Date: Mon Jul 30 17:42:42 2018

[M68] Fix base_unittests compile on branch

Reverting scheduler_worker_pool_impl_unittest.cc changes on branch as
they were based on a refactor which isn't on branch and thus break the
compile.
Having these unittests on branch isn't necessary. I've confirmed locally
that all other TaskScheduler base_unittests still pass with this change.

TBR=fdoray@chromium.org

Bug: 847501
Change-Id: Idefed0181cdc0da226503e2fcaef92a71cc0178c
Reviewed-on: https://chromium-review.googlesource.com/1155106
Reviewed-by: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/branch-heads/3440@{#773}
Cr-Branched-From: 010ddcfda246975d194964ccf20038ebbdec6084-refs/heads/master@{#561733}
[modify] https://crrev.com/3e77c211ad44ed4759a936e93c195a542a9df471/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc

Owner: etiennep@chromium.org
The metric has started to recover on Beta/Stable.

Beta : https://uma.googleplex.com/p/chrome/timeline_v2/?sid=d7a144392039b8e5e4a977a8e741873a
Stable : https://uma.googleplex.com/p/chrome/timeline_v2/?sid=25f1c6eef63d5ca6c59ac20fd163767d

Back to etiennep@ for the longer-term fix detailed in comment #7 in this bug (dampening thread creation and thread recycling, as part of familiarizing himself with TaskScheduler).
Project Member

Comment 27 by bugdroid1@chromium.org, Aug 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/730a9398cf76849f7deefc0366aa32823b2bab2d

commit 730a9398cf76849f7deefc0366aa32823b2bab2d
Author: Etienne Pierre-doray <etiennep@chromium.org>
Date: Mon Aug 13 22:39:31 2018

[TaskScheduler]: Add histogram for number of workers in WorkerPool.

This CL adds a TaskScheduler.NumWorkers histogram to measure the number
of threads created in each WorkerPool every hour.

Bug: 847501
Change-Id: I5e69f60320231c6ab60d62adb986eecacb15778e
Reviewed-on: https://chromium-review.googlesource.com/1164242
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Reviewed-by: Robert Kaplow (slow) <rkaplow@chromium.org>
Reviewed-by: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#582724}
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/scheduler_worker_pool_impl.cc
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/scheduler_worker_pool_impl.h
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/scheduler_worker_pool_impl_unittest.cc
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/service_thread.cc
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/service_thread.h
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/service_thread_unittest.cc
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/task_scheduler_impl.cc
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/base/task/task_scheduler/task_scheduler_impl.h
[modify] https://crrev.com/730a9398cf76849f7deefc0366aa32823b2bab2d/tools/metrics/histograms/histograms.xml

Project Member

Comment 28 by bugdroid1@chromium.org, Oct 1

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8f9f2d7cc80e1cc8eb8c2292b83a4c24c404a09f

commit 8f9f2d7cc80e1cc8eb8c2292b83a4c24c404a09f
Author: Etienne Pierre-Doray <etiennep@chromium.org>
Date: Mon Oct 01 19:56:15 2018

Add BrowserScheduler to fieldtrial testing config.

Only DetachTimeFg16x is added because it is the most
different variation from control, while other variations
test values in between.


Bug: 847501
Change-Id: I12da837873f86345e36cfb9577a7a803015baa40
Reviewed-on: https://chromium-review.googlesource.com/1249805
Reviewed-by: Robert Kaplow (sloooow) <rkaplow@chromium.org>
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Cr-Commit-Position: refs/heads/master@{#595520}
[modify] https://crrev.com/8f9f2d7cc80e1cc8eb8c2292b83a4c24c404a09f/testing/variations/fieldtrial_testing_config.json

Project Member

Comment 29 by bugdroid1@chromium.org, Oct 5

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a048280752fe6327990e38c2c957a1a38edf593f

commit a048280752fe6327990e38c2c957a1a38edf593f
Author: Etienne Pierre-doray <etiennep@chromium.org>
Date: Fri Oct 05 15:47:59 2018

Add android/ios to BrowserScheduler fieldtrial testing config.

Only DetachTimeFg16x is added because it is the most
different variation from control, while other variations
test values in between.

Bug: 847501
Change-Id: Ifbf70af688249d48772779229e2af56a12e1b274
Reviewed-on: https://chromium-review.googlesource.com/c/1263042
Reviewed-by: Robert Kaplow (sloooow) <rkaplow@chromium.org>
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Cr-Commit-Position: refs/heads/master@{#597143}
[modify] https://crrev.com/a048280752fe6327990e38c2c957a1a38edf593f/testing/variations/fieldtrial_testing_config.json

Project Member

Comment 30 by bugdroid1@chromium.org, Nov 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/edc4f91ef3bf95977ce43d0116b1600c6c5c4af7

commit edc4f91ef3bf95977ce43d0116b1600c6c5c4af7
Author: Etienne Pierre-Doray <etiennep@chromium.org>
Date: Tue Nov 20 23:30:29 2018

[TaskScheduler]: Spawn workers from GetWork().

In an effort to avoid creating threads from the main thread,
this CL avoids the creation of a new idle thread after waking up
the current idle thread from the main thread.
This is done by moving a call to MaintainAtLeastOneIdleWorkerLockRequired()
from WakeUpOneWorkerLockRequired() to GetWork().
Note that a call to MaintainAtLeastOneIdleWorkerLockRequired() is still
done before waking up a worker in WakeUpOneWorkerLockRequired(), in case no
idle thread was created yet.

Bug: 847501
Change-Id: Ib3903c117a781a72f9d6a6aaeceee6554e5b9803
Reviewed-on: https://chromium-review.googlesource.com/c/1338382
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Reviewed-by: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#609848}
[modify] https://crrev.com/edc4f91ef3bf95977ce43d0116b1600c6c5c4af7/base/task/task_scheduler/scheduler_worker_pool_impl.cc
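
A simplified model of this restructuring may help: the hypothetical Pool type below stands in for SchedulerWorkerPoolImpl; only the call-site move it illustrates comes from the commit message above.

  // Simplified model; the real logic lives in SchedulerWorkerPoolImpl.
  class Pool {
   public:
    // Before this CL: the main thread both woke a worker and created the
    // replacement idle worker, paying the thread-creation cost itself.
    void WakeUpOneWorkerBefore() {
      MaintainAtLeastOneIdleWorker();  // old call site: main thread
      WakeTopIdleWorker();
    }

    // After this CL: the main thread only creates a worker if none exists
    // yet, then wakes one...
    void WakeUpOneWorkerAfter() {
      if (idle_workers_ == 0)
        MaintainAtLeastOneIdleWorker();  // kept for the no-idle-worker case
      WakeTopIdleWorker();
    }

    // ...and the woken worker replaces itself from its own thread when it
    // asks for work.
    void GetWork() {
      MaintainAtLeastOneIdleWorker();  // new call site: worker thread
      // ... dequeue and return a task ...
    }

   private:
    void MaintainAtLeastOneIdleWorker() {
      if (idle_workers_ == 0)
        ++idle_workers_;  // stands in for spawning a new thread
    }
    void WakeTopIdleWorker() {
      if (idle_workers_ > 0)
        --idle_workers_;
    }
    int idle_workers_ = 0;
  };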

Project Member

Comment 31 by bugdroid1@chromium.org, Nov 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1e1792c4de47de3d42b680aed04f864bd5771ce6

commit 1e1792c4de47de3d42b680aed04f864bd5771ce6
Author: Etienne Pierre-Doray <etiennep@chromium.org>
Date: Tue Nov 27 16:49:32 2018

[TaskScheduler]: Remove unnecessary ScopedAllowWait in MessagePump.

SyncWaiter::sw only needs to be declared as used-while-idle in
WaitableEvent to avoid being considered blocking.

Bug: 847501
Change-Id: I9a70aa28e74dbaae5f4b1b8aa66bb27ed0b5ee28
Reviewed-on: https://chromium-review.googlesource.com/c/1349730
Reviewed-by: François Doray <fdoray@chromium.org>
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Cr-Commit-Position: refs/heads/master@{#611146}
[modify] https://crrev.com/1e1792c4de47de3d42b680aed04f864bd5771ce6/base/message_loop/message_pump_default.cc
[modify] https://crrev.com/1e1792c4de47de3d42b680aed04f864bd5771ce6/base/synchronization/waitable_event_posix.cc

Project Member

Comment 32 by bugdroid1@chromium.org, Nov 30

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3f40244cd1ae911072fdccc827fdf446e1a66005

commit 3f40244cd1ae911072fdccc827fdf446e1a66005
Author: Etienne Pierre-Doray <etiennep@chromium.org>
Date: Fri Nov 30 18:09:48 2018

[TaskScheduler]: Create no detach below initial capacity feature

Under this experiment, scheduler workers are only detached if the pool is
above its initial capacity (threads that are created to replace blocked threads).

2 options were considered:
Option A: Detach only when over initial capacity.

Option B: Detach only when over current capacity (includes currently blocked threads in capacity).
This might better handle the following case: At any given time, there is at least 1 blocked thread.
On top of that, some periodic work uses all worker every 30s or so. The current capacity will
encompass for the blocked thread and avoid detaching it periodically.

Option A was picked because it is more conservative. Initial capacity is smaller or
equal to current capacity, so detaching is closer to current behavior. We want to avoid having
too many threads that aren't used.

Bug: 847501
Change-Id: I0b116db54095767768b158d92f5f146249720b45
Reviewed-on: https://chromium-review.googlesource.com/c/1348863
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#612698}
[modify] https://crrev.com/3f40244cd1ae911072fdccc827fdf446e1a66005/base/task/task_features.cc
[modify] https://crrev.com/3f40244cd1ae911072fdccc827fdf446e1a66005/base/task/task_features.h
[modify] https://crrev.com/3f40244cd1ae911072fdccc827fdf446e1a66005/base/task/task_scheduler/scheduler_worker_pool_impl.cc
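
A minimal sketch of the Option A condition described above (illustrative names and parameters, not the actual CanCleanupLockRequired() signature):

  #include <cstddef>

  // Option A: an idle worker may only detach while the pool is above its
  // initial capacity, i.e. only threads created to replace blocked ones
  // are ever reclaimed.
  bool CanCleanupWorker(size_t num_workers_alive, size_t initial_capacity,
                        bool no_detach_below_initial_capacity_enabled) {
    if (!no_detach_below_initial_capacity_enabled)
      return true;  // pre-experiment behavior: any idle worker may detach
    return num_workers_alive > initial_capacity;
  }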

Project Member

Comment 33 by bugdroid1@chromium.org, Dec 3

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0c82881c88cb9098581e3ad45034c7b1ca26f304

commit 0c82881c88cb9098581e3ad45034c7b1ca26f304
Author: Robert Flack <flackr@chromium.org>
Date: Mon Dec 03 20:41:41 2018

Revert "[TaskScheduler]: Create no detach below initial capacity feature"

This reverts commit 3f40244cd1ae911072fdccc827fdf446e1a66005.

Reason for revert: Caused a data race running components_unittests with ScopedFeatureList.

Sample failure: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20TSan%20Tests/30398

WARNING: ThreadSanitizer: data race (pid=9805)
  Write of size 8 at 0x55b63bfba2b0 by main thread:
    #0 base::FeatureList::ClearInstanceForTesting() base/feature_list.cc:284:27 (components_unittests+0x85bec63)
    #1 base::test::ScopedFeatureList::~ScopedFeatureList() base/test/scoped_feature_list.cc:98:3 (components_unittests+0x9a65d52)
    #2 content::UnitTestTestSuite::~UnitTestTestSuite() content/public/test/unittest_test_suite.cc:61:1 (components_unittests+0xb03cf21)
    #3 operator() buildtools/third_party/libc++/trunk/include/memory:2325:5 (components_unittests+0x8256779)
    #4 reset buildtools/third_party/libc++/trunk/include/memory:2638 (components_unittests+0x8256779)
    #5 ~unique_ptr buildtools/third_party/libc++/trunk/include/memory:2592 (components_unittests+0x8256779)
    #6 ~__tuple_leaf buildtools/third_party/libc++/trunk/include/tuple:171 (components_unittests+0x8256779)
    #7 ~tuple buildtools/third_party/libc++/trunk/include/tuple:470 (components_unittests+0x8256779)
    #8 ~BindState base/bind_internal.h:871 (components_unittests+0x8256779)
    #9 base::internal::BindState<int (content::UnitTestTestSuite::*)(), std::__1::unique_ptr<content::UnitTestTestSuite, std::__1::default_delete<content::UnitTestTestSuite> > >::Destroy(base::internal::BindStateBase const*) base/bind_internal.h:874 (components_unittests+0x8256779)
    #10 Destruct base/callback_internal.cc:29:3 (components_unittests+0x85b80c7)
    #11 Release base/memory/ref_counted.h:403 (components_unittests+0x85b80c7)
    #12 Release base/memory/scoped_refptr.h:284 (components_unittests+0x85b80c7)
    #13 ~scoped_refptr base/memory/scoped_refptr.h:208 (components_unittests+0x85b80c7)
    #14 base::internal::CallbackBase::~CallbackBase() base/callback_internal.cc:84 (components_unittests+0x85b80c7)
    #15 base::LaunchUnitTests(int, char**, base::OnceCallback<int ()>) base/test/launcher/unit_test_launcher.cc:575:3 (components_unittests+0x9a71272)
    #16 main components/test/run_all_unittests.cc:8:10 (components_unittests+0x4e49ad5)

  Previous read of size 8 at 0x55b63bfba2b0 by thread T12 (mutexes: write M19294):
    #0 base::FeatureList::IsEnabled(base::Feature const&) base/feature_list.cc:200:8 (components_unittests+0x85be82d)
    #1 CanCleanupLockRequired base/task/task_scheduler/scheduler_worker_pool_impl.cc:672:12 (components_unittests+0x866a5e8)
    #2 base::internal::SchedulerWorkerPoolImpl::SchedulerWorkerDelegateImpl::GetWork(base::internal::SchedulerWorker*) base/task/task_scheduler/scheduler_worker_pool_impl.cc:538 (components_unittests+0x866a5e8)
    #3 base::internal::SchedulerWorker::RunWorker() base/task/task_scheduler/scheduler_worker.cc:324:51 (components_unittests+0x866f81d)
    #4 base::internal::SchedulerWorker::RunPooledWorker() base/task/task_scheduler/scheduler_worker.cc:229:3 (components_unittests+0x866f481)
    #5 base::internal::SchedulerWorker::ThreadMain() base/task/task_scheduler/scheduler_worker.cc:208:7 (components_unittests+0x866f2f1)
    #6 base::(anonymous namespace)::ThreadFunc(void*) base/threading/platform_thread_posix.cc:81:13 (components_unittests+0x86daf04)

  Location is global 'base::(anonymous namespace)::g_feature_list_instance' of size 8 at 0x55b63bfba2b0 (components_unittests+0x00000ebe52b0)

Original change's description:
> [TaskScheduler]: Create no detach below initial capacity feature
> 
> Under this experiment, scheduler workers are only detached if the pool is
> above its initial capacity (threads that are created to replace blocked threads).
> 
> 2 options were considered:
> Option A: Detach only when over initial capacity.
> 
> Option B: Detach only when over current capacity (includes currently blocked threads in capacity).
> This might better handle the following case: At any given time, there is at least 1 blocked thread.
> On top of that, some periodic work uses all worker every 30s or so. The current capacity will
> encompass for the blocked thread and avoid detaching it periodically.
> 
> Option A was picked because it is more conservative. Initial capacity is smaller or
> equal to current capacity, so detaching is closer to current behavior. We want to avoid having
> too many threads that aren't used.
> 
> Bug: 847501
> Change-Id: I0b116db54095767768b158d92f5f146249720b45
> Reviewed-on: https://chromium-review.googlesource.com/c/1348863
> Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
> Reviewed-by: François Doray <fdoray@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#612698}

TBR=fdoray@chromium.org,etiennep@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: 847501
Change-Id: I951f5c5701e2d296b2c4edef37648105c4911cf9
Reviewed-on: https://chromium-review.googlesource.com/c/1359127
Reviewed-by: Robert Flack <flackr@chromium.org>
Commit-Queue: Robert Flack <flackr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#613229}
[modify] https://crrev.com/0c82881c88cb9098581e3ad45034c7b1ca26f304/base/task/task_features.cc
[modify] https://crrev.com/0c82881c88cb9098581e3ad45034c7b1ca26f304/base/task/task_features.h
[modify] https://crrev.com/0c82881c88cb9098581e3ad45034c7b1ca26f304/base/task/task_scheduler/scheduler_worker_pool_impl.cc

Project Member

Comment 34 by bugdroid1@chromium.org, Dec 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a6fb79e9459a484d08b6d1837128b161686ac538

commit a6fb79e9459a484d08b6d1837128b161686ac538
Author: Etienne Pierre-Doray <etiennep@chromium.org>
Date: Wed Dec 12 22:48:04 2018

Use singleton feature list in content::UnitTestTestSuite.

Commit 3f40244cd1ae911072fdccc827fdf446e1a66005 had to be reverted due to a
data race between ScopedFeatureList's destructor and a feature in the scheduler.
Sample failure: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20TSan%20Tests/30398
There is no need for the feature list to be restored with ScopedFeatureList in
the test launcher. Since the scheduler used for test launcher is leaked,
it makes sense to have a singleton feature list instead of a ScopedFeatureList.

Bug: 847501, 846380
Change-Id: I738ebebe5aa04871427399bcf012c072d7f0212c
Reviewed-on: https://chromium-review.googlesource.com/c/1361441
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Reviewed-by: Scott Violet <sky@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616090}
[modify] https://crrev.com/a6fb79e9459a484d08b6d1837128b161686ac538/content/public/test/unittest_test_suite.cc
[modify] https://crrev.com/a6fb79e9459a484d08b6d1837128b161686ac538/content/public/test/unittest_test_suite.h

Project Member

Comment 35 by bugdroid1@chromium.org, Dec 13

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f9ef21866f48b676b0d91607cc8066e74e450dd0

commit f9ef21866f48b676b0d91607cc8066e74e450dd0
Author: Etienne Pierre-Doray <etiennep@chromium.org>
Date: Thu Dec 13 14:45:42 2018

Reland "[TaskScheduler]: Create no detach below initial capacity feature"

This is a reland of 3f40244cd1ae911072fdccc827fdf446e1a66005
Fix is here: https://chromium-review.googlesource.com/c/chromium/src/+/1361441

Original change's description:
> [TaskScheduler]: Create no detach below initial capacity feature
>
> Under this experiment, scheduler workers are only detached if the pool is
> above its initial capacity (threads that are created to replace blocked threads).
>
> 2 options were considered:
> Option A: Detach only when over initial capacity.
>
> Option B: Detach only when over current capacity (includes currently blocked threads in capacity).
> This might better handle the following case: At any given time, there is at least 1 blocked thread.
> On top of that, some periodic work uses all worker every 30s or so. The current capacity will
> encompass for the blocked thread and avoid detaching it periodically.
>
> Option A was picked because it is more conservative. Initial capacity is smaller or
> equal to current capacity, so detaching is closer to current behavior. We want to avoid having
> too many threads that aren't used.
>
> Bug: 847501
> Change-Id: I0b116db54095767768b158d92f5f146249720b45
> Reviewed-on: https://chromium-review.googlesource.com/c/1348863
> Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
> Reviewed-by: François Doray <fdoray@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#612698}

Bug: 847501
Change-Id: I68a592151f965b8a297ece7cc192eeaf78a72cc8
Reviewed-on: https://chromium-review.googlesource.com/c/1374521
Commit-Queue: François Doray <fdoray@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#616309}
[modify] https://crrev.com/f9ef21866f48b676b0d91607cc8066e74e450dd0/base/task/task_features.cc
[modify] https://crrev.com/f9ef21866f48b676b0d91607cc8066e74e450dd0/base/task/task_features.h
[modify] https://crrev.com/f9ef21866f48b676b0d91607cc8066e74e450dd0/base/task/task_scheduler/scheduler_worker_pool_impl.cc

Project Member

Comment 36 by bugdroid1@chromium.org, Dec 18

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6bec0ce9e2ff1c6433bae8203d12b839e8004688

commit 6bec0ce9e2ff1c6433bae8203d12b839e8004688
Author: Etienne Pierre-doray <etiennep@chromium.org>
Date: Tue Dec 18 15:53:42 2018

[TaskScheduler]: Add histogram for number of active workers in WorkerPool.

This CL adds a TaskScheduler.NumActiveWorkers histogram to measure the number of scheduled tasks in each WorkerPool every hour.

Bug: 847501
Change-Id: I2a7ea9ebbd17e871d0cc1f06d1edf739d4d82334
Reviewed-on: https://chromium-review.googlesource.com/c/1380965
Reviewed-by: Gabriel Charette <gab@chromium.org>
Commit-Queue: Etienne Pierre-Doray <etiennep@chromium.org>
Cr-Commit-Position: refs/heads/master@{#617504}
[modify] https://crrev.com/6bec0ce9e2ff1c6433bae8203d12b839e8004688/base/task/task_scheduler/scheduler_worker_pool_impl.cc
[modify] https://crrev.com/6bec0ce9e2ff1c6433bae8203d12b839e8004688/base/task/task_scheduler/scheduler_worker_pool_impl.h
[modify] https://crrev.com/6bec0ce9e2ff1c6433bae8203d12b839e8004688/base/task/task_scheduler/scheduler_worker_pool_impl_unittest.cc
[modify] https://crrev.com/6bec0ce9e2ff1c6433bae8203d12b839e8004688/base/task/task_scheduler/task_scheduler_impl.cc
