
Issue 653394 link

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 730693

Blocking:
issue 716187
issue 855121




Investigate using Callback::IsCancelled() in MessageLoop / TaskScheduler

Project Member Reported by dcheng@chromium.org, Oct 6 2016

Issue description

I'm going to measure how many tasks are actually cancelled in the current MessageLoop / WorkerPool implementations. If we see a fair number of cancellations, we may want to implement the same optimization Blink's task scheduler uses.
 
Alex, did you have some numbers for this already?
You're probably thinking of this, which was for the renderer: https://docs.google.com/document/d/1IummzxGNDkgBdPOZFxyPPFhPIGLqrB5N8UpX65vOEjk/edit

I don't have any intuition about how frequent task cancellation is for the browser; however, I would not be surprised if optimizations proved worthwhile.
Measuring quantity should be pretty straightforward: I'm just going to instrument TaskAnnotator::RunTask and see what the raw numbers look like.

Measuring time saved will require an actual perf test.

The good news: MessagePump already has a perf test. Yay!
The bad news: the perf test hasn't actually been run in ages, as it crashes somewhere in the middle =)

Comment 4 by gab@chromium.org, Oct 6 2016

The TaskScheduler has no workload currently by default so you won't see much there :-).

We have a work item for ourselves to look into this once we do have a large workload.
I threw together a rough patch [1] to measure cancellations and did some light usage (email / code review) in the built Chrome binary.

The current numbers look something like this:

Histogram: TaskRunner.DispatchedTaskIsCancelled recorded 2801777 samples, mean = 0.0 (flags = 0x1)
0  ------------------------------------------------------------------------O (2735971 = 97.7%)
1  --O                                                                       (65806 = 2.3%) {97.7%}
2  O                                                                         (0 = 0.0%) {100.0%}


Histogram: TaskRunner.ExecutingDelayedTaskIsCancelled recorded 437362 samples, mean = 0.1 (flags = 0x41)
0  ------------------------------------------------------------------------O (384761 = 88.0%)
1  ----------O                                                               (52601 = 12.0%) {88.0%}
2  O                                                                         (0 = 0.0%) {100.0%}


Histogram: TaskRunner.ScheduledTaskIsCancelled recorded 260288 samples, mean = 0.1 (flags = 0x41)
0  ------------------------------------------------------------------------O (229681 = 88.2%)
1  ----------O                                                               (30607 = 11.8%) {88.2%}
2  O                                                                         (0 = 0.0%) {100.0%}


Histogram: TaskRunner.WokeUpTooSoonTaskIsCancelled recorded 3142498 samples, mean = 0.1 (flags = 0x41)
0  ------------------------------------------------------------------------O (2811652 = 89.5%)
1  --------O                                                                 (330846 = 10.5%) {89.5%}
2  O                                                                         (0 = 0.0%) {100.0%}

Several things to note:
- The message loop mostly dispatches immediate tasks:
  we dispatched ~2.8M immediate tasks and only ~430K delayed tasks.
- Cancelled task % has been trending down slowly over time.
- Unclear why we end up in WokeUpTooSoon so much: I suppose there are
  things other than 

In addition, the cost of dispatching / scheduling delayed tasks is unclear. It turns out we have a MessagePump perf test, which still builds. However, it seems no one has run it in a long time: the perf test aborts before the run is complete.


[1] https://codereview.chromium.org/2398263002/
Interesting. The ratio of cancelled tasks seems high enough that we might want to prune them -- especially for the extra wakeups. WDYT?

Comment 7 by gab@chromium.org, Oct 7 2016

Cc: fdoray@chromium.org
Re. WokeUpTooSoon (on Windows): from this doc (https://docs.google.com/document/d/1dYaoZTcdzFQCf7hg3azMontkjzF_zzcKEtxPb7VET6c/edit), it appears we might be under-sleeping by a tiny amount fairly often on Windows (fdoray repro'ed this in the lab for the task scheduler too, and we are exploring solutions):

"when message pump waits for a delayed work it is very likely to get out of sleep before it is time to run the earliest delayed task"

Comment 8 by gab@chromium.org, Oct 7 2016

Actually it does look like an attempt to address this was made: https://cs.chromium.org/chromium/src/base/message_loop/message_pump_default.cc?rcl=1475824766&l=60

but under-sleeping is still a possibility (ref. WaitableEvent::TimedWait() in waitable_event_win.cc).
Yeah, I think it might be worth trying. I'll see what the numbers look like after leaving Chrome idle overnight when I get in today.

I actually prepared a patch for this [1] before I wrote the metrics, but it appears I don't know how to use the perf trybots.

One other thing I'd like to try out is measuring the cost of running cancelled tasks, but I need to fix the broken perftest first.

[1] https://codereview.chromium.org/2394833004/

Comment 10 by gab@chromium.org, Nov 7 2016

Components: Internals>TaskScheduler
Blocking: 716187
Ping, any update? 
Issue 716187 is an example where cancellation functionality would be handy. In the network stack, we post a delayed task for every socket connect. These socket connects finish quickly, but the delayed tasks are left over in the IO thread's task queue for a long time; the more network requests we make, the longer our delayed task queue grows. It would be nice to have a general solution for problems like this, as other parts of the codebase use similar patterns (https://cs.chromium.org/search/?q=%26%5Ba-z:%5D*timeout+package:%5Echromium$&type=cs)

Comment 13 by bugdroid1@chromium.org, May 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6a32dcf0c4968f022b2f603912ce2804606636e1

commit 6a32dcf0c4968f022b2f603912ce2804606636e1
Author: Daniel Cheng <dcheng@chromium.org>
Date: Wed May 31 21:32:10 2017

Use task cancellation status in base::MessageLoop.

MessageLoop now sweeps cancelled tasks from the front of the work queue
before dispatching or scheduling work. Previously, MessageLoop did not
use the task cancellation status. It often scheduled wakeups for
cancelled tasks or dispatched cancelled tasks, relying on base::Callback
to turn it into a no-op.

This was problematic for several reasons:
- Scheduling wakeups based on cancelled tasks means the wakeup is
  spurious.
- MessageLoop only processes one non-delayed and one delayed task per
  tick, so having many cancelled tasks results in spinning the message
  loop for a while to drain them.

Bug: 653394
Change-Id: I7287d5d430c1791535f7274c56c40f1e3f8669a5
Reviewed-on: https://chromium-review.googlesource.com/505274
Commit-Queue: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: danakj <danakj@chromium.org>
Reviewed-by: Mike Wittman <wittman@chromium.org>
Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#476033}
[modify] https://crrev.com/6a32dcf0c4968f022b2f603912ce2804606636e1/base/message_loop/message_loop.cc
[modify] https://crrev.com/6a32dcf0c4968f022b2f603912ce2804606636e1/base/message_loop/message_loop.h
[modify] https://crrev.com/6a32dcf0c4968f022b2f603912ce2804606636e1/chrome/browser/metrics/thread_watcher.cc
[modify] https://crrev.com/6a32dcf0c4968f022b2f603912ce2804606636e1/chrome/browser/metrics/thread_watcher.h
[modify] https://crrev.com/6a32dcf0c4968f022b2f603912ce2804606636e1/chrome/browser/metrics/thread_watcher_unittest.cc

Comment 14 by gab@chromium.org, Jun 7 2017

Blockedon: 730693

Comment 15 by gab@chromium.org, Mar 13 2018

Cc: dcheng@chromium.org
Owner: gab@chromium.org
Hey Daniel, I'm going to assume you're not actively working on this.

I'll pick this up as part of addressing issue 786597 and issue 716187, which may drive this further. Thanks for the prior work here!
Now that the thread-safe WeakPtr::MaybeValid() and Callback::MaybeCancelled() have landed (or almost: https://chromium-review.googlesource.com/c/chromium/src/+/1144208), I've put more thought into how we can do optimistic cancellation of TaskScheduler delayed tasks.

The tricky part is that task_scheduler::DelayedTaskManager posts its delayed tasks to the service thread's MessageLoop (task_scheduler::ServiceThread), but MessageLoop doesn't know that it's not safe to delete these tasks there if they are cancelled.

So while MessageLoop (IncomingTaskQueue::DelayedQueue::HasTasks()) can start checking MaybeCancelled() instead of IsCancelled(), the tasks themselves need to ensure deletion on the proper sequence.

To do this, I think DelayedTaskManager::AddDelayedTaskNow() has to post a relay class that will do the right thing:

i.e.

void DelayedTaskManager::AddDelayedTaskNow(
    Task task,
    TimeDelta delay,
    PostTaskNowCallback post_task_now_callback) {
  ...

  // Posts |task| when destroyed. This ensures that |task| is handed to the
  // right sequence whether the underlying MessageLoop intended to run it or
  // delete it (e.g. if cancelled -- in that case it will no-op once it gets
  // to its sequence and be properly deleted there, as its BindState expects).
  struct DelayedTaskRelay {
    ~DelayedTaskRelay() {
      std::move(post_task_now_callback).Run(std::move(task));
    }

    PostTaskNowCallback post_task_now_callback;
    Task task;
  };

  service_thread_task_runner_->PostDelayedTask(
      FROM_HERE,
      BindOnce(DoNothing<DelayedTaskRelay>(),
               DelayedTaskRelay{std::move(post_task_now_callback),
                                std::move(task)}),
      delay);
  ...
}

Also: the DelayedTaskRelay will need custom CallbackCancellationTraits in order to forward the cancellation state of |task|.
Rather than incur the overhead of the relay wrapper in DelayedTaskManager, could we collect the cancelled Tasks we find when checking MaybeCancelled() into e.g. a vector<> that can be posted to the target Sequence to delete them?
The thing is, it's the ServiceThread's MessageLoop that holds the delayed tasks, but MessageLoop doesn't know where it's supposed to run/delete tasks (it assumes its own bound thread -- and the ServiceThread gives it an indirection for tasks to bounce through when run; never cancelled at the moment, since the indirection loses the IsCancelled() bit).

To do what you have in mind we'd have to have DelayedTaskManager keep the delayed tasks and only send the next shortest delay to MessageLoop (not a bad idea and in fact that's what I want to do with base::Timer : https://docs.google.com/document/d/11gcfpL-orj8SWY8bUBa1_ZGe8kPLUcjtGf2_WD7nBKw/edit)
Re #18: Yes, the idea would basically be to batch up cancelled tasks for deletion. I believe the only hard constraint is that each cancelled task must be deleted before any later task runs. So as long as we keep the batch in order, and release the tasks in it sometime before the next non-cancelled task that follows them runs, we otherwise have flexibility in precisely when/how that happens.
Deleting now vs later is not the tricky/costly part though. The tricky part is for the cancellation to be picked up by MessageLoop and somehow forwarded to where it should happen.
Blocking: 855121
Re #20: How does the service-thread know where to _run_ the delayed tasks, but not know where to delete them?  Are you already wrapping delayed tasks with the sequence TaskRunner, so that when they're Run() they get PostTask'd to the sequence, or something?
Yes, see DelayedTaskManager::PostTaskNowCallback (these are what's queued on the service thread's MessageLoop)
