New issue
Advanced search Search tips

Issue 850450 link

Starred by 6 users

Issue metadata

Status: Fixed
Owner:
Closed: Jul 20
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug

Blocking:
issue 786597



Sign in to add a comment

MessagePump can cause a busy-waiting loop when the timeout is larger than 100,000,000 seconds

Project Member Reported by horo@chromium.org, Jun 7 2018

Issue description

FreeBSD had a bug that is “timeout limited to 100,000,000 seconds".
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=18909

This bug was fixed in FreeBSD 11.1/i386 which was released July 2017. But Darwin XNU (the kernel of macOS) still has the bug.
https://github.com/apple/darwin-xnu/blob/0a798f6738bc1db01281fc08ae024145e84df927/bsd/kern/kern_time.c#L605

So if the timeout is larger than 100,000,000 seconds, kevent() in kq_dispatch() returns EINVAL.
https://chromium.googlesource.com/chromium/src/+/9867e6aab3d874d0b9e61e1da0920a375cb5a4c2/base/third_party/libevent/kqueue.c#212
And kq_dispatch() returns -1.
https://chromium.googlesource.com/chromium/src/+/9867e6aab3d874d0b9e61e1da0920a375cb5a4c2/base/third_party/libevent/kqueue.c#218
And event_base_loop() returns -1.
https://chromium.googlesource.com/chromium/src/+/9867e6aab3d874d0b9e61e1da0920a375cb5a4c2/base/third_party/libevent/event.c#512

But MessagePumpLibevent::Run() doesn’t check the return value of event_base_loop().
https://chromium.googlesource.com/chromium/src/+/9867e6aab3d874d0b9e61e1da0920a375cb5a4c2/base/message_loop/message_pump_libevent.cc#247

So the loop is almost infinitely executed until delay becomes less than 100,000,000 seconds.


This is the root cause the 100% CPU usage issue of Synchronous XMLHttpRequest ( issue 848210 ) in M67.
This issue doesn't happen after 68.0.3435.0 when HeartbeatLatencyMicroseconds metric was introduced, because tasks to record latency histograms are periodically posted to the task queue.
https://chromium-review.googlesource.com/c/chromium/src/+/1059459

But we should fix the root cause.
 

Comment 1 by horo@chromium.org, Jun 7 2018

Owner: gab@chromium.org
Status: Assigned (was: Untriaged)
gab@
You are actively making changes in base/message_loop.
https://chromium.googlesource.com/chromium/src/+log/master/base/message_loop
Are you kind enough to fix this issue?
I'm not familiar with base/message_loop...

Comment 2 by horo@chromium.org, Jun 7 2018

Labels: Postmortem-Followup

Comment 3 by gab@chromium.org, Jun 7 2018

Status: Started (was: Assigned)
Oh wow, that's an interesting one =O! Looks like it's been a problem in the past too because we already have :

  DLOG_IF(WARNING, delay > kTaskDelayWarningThreshold)
      << "Requesting super-long task delay period of " << delay.InSeconds()
      << " seconds from here: " << from_here.ToString();

in IncomingTaskQueue::AddToIncomingQueue().

kTaskDelayWarningThreshold = 14 days

On Windows the maximum delay for ::SetTimer() (used by MessagePumpWin) is INT_MAX milliseconds == 24 days : https://msdn.microsoft.com/en-us/library/windows/desktop/ms644906(v=vs.85).aspx

On POSIX it seems the spec dictates that "the maximum timeout interval supported will be at least 31 days" : https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=18909#c1

As such I suggest we turn that DLOG into a CHECK (14 days is already plenty). I'll put up a CL for this.

Thanks for the report!
Gab

Comment 4 by gab@chromium.org, Jun 8 2018

Blocking: 786597
Cc: fdoray@chromium.org
Uh oh! There are more of these..! My CL to ban delays > 14 days crashes some tests : https://chromium-review.googlesource.com/c/chromium/src/+/1090993

Some of this could actually explain issue 786597 (which I had so far always attributed to many medium-term delays being quickly cancelled).

Proposal:

I'll try to fix eggregious timeouts and add a safety check against them : https://chromium-review.googlesource.com/c/chromium/src/+/1090993

In parallel I'll also add metrics to figure out what the typical posted delays are with the plan of tweaking IncomingTaskQueue's DelayedTaskQueue to avoid having super long tasks in a priority queue (increasing complexity of the data structure for short-term tasks). Separating long delays into a simpler vector-based queue which keeps track of the min-timeout and is sorted only once when that's reached.

As for the relation with this bug, we could decide not to even make MessagePump aware of delays in the long queue (e.g. MessagePump will for sure wake-up for other reasons more than once an hour -- I'll add a metric for the actual sleep duration as well to confirm this).

Comment 5 by gab@chromium.org, Jun 9 2018

Labels: -Pri-3 FoundIn-67 TargetedFor-69 Pri-2
Project Member

Comment 6 by bugdroid1@chromium.org, Jun 9 2018

Project Member

Comment 7 by bugdroid1@chromium.org, Jun 30 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f93ca115caf2976f5a799b1782aaddfd6544102e

commit f93ca115caf2976f5a799b1782aaddfd6544102e
Author: Gabriel Charette <gab@chromium.org>
Date: Sat Jun 30 05:35:30 2018

[MessageLoop] Add MessageLoop.ScheduledSleep.* metrics.

These metric help evaluate whether scheduled sleeps (especially long
ones) tend to be the cause of the next wakeup or not. This will
influence the upcoming redesign of delayed tasks management.

The time histogram was extended to 14 days since that's the maximum
possible cap we will enforce @  crbug.com/850450#c3 . With 50 buckets
that still yields 7 buckets under 16ms so we get proper resolution
in the fast range too.

This will also, as a side-effect of not doing delayed tasks logic
when ShouldQuitWhenIdle(), result in disabling hi-resolution timers
when the loop is QuitWhenIdle(). Not that this will matter much for power
per this mostly being a test-only operation but always nice to reduce
time spent under hi-res timers.

Bug:  850450 , 786597,  854237 
Change-Id: Ic552a512f534fc04162a3d6b53c6043621bb3fcf
Reviewed-on: https://chromium-review.googlesource.com/1094301
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Reviewed-by: Robert Kaplow <rkaplow@chromium.org>
Cr-Commit-Position: refs/heads/master@{#571763}
[modify] https://crrev.com/f93ca115caf2976f5a799b1782aaddfd6544102e/base/message_loop/message_loop.cc
[modify] https://crrev.com/f93ca115caf2976f5a799b1782aaddfd6544102e/base/message_loop/message_loop.h
[modify] https://crrev.com/f93ca115caf2976f5a799b1782aaddfd6544102e/tools/metrics/histograms/histograms.xml

Project Member

Comment 8 by bugdroid1@chromium.org, Jul 19

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/81626592441ad8517fb6acc8c85e4df4544222e4

commit 81626592441ad8517fb6acc8c85e4df4544222e4
Author: Gabriel Charette <gab@chromium.org>
Date: Thu Jul 19 23:22:46 2018

[MessageLoop] Remove heavy diagnosis metrics and leave others only on UI thread

Addresses  crbug.com/860801  in time for the M69 branch while keeping the
diagnosis signals which will be used to assess future improvements
(MessageLoop.DelayedTaskQueueForUI.PendingTasksCountOnIdle that is).

Removed MessageLoop.DelayedTaskQueue.PostedDelay as we've
learnt everything we needed from this one, that is :
We now know that while 99% of tasks are posted with a delay under 30s,
95% of users post at least one task in every portion of the 0ms-3hours
range at least once per day:
https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.DelayedTaskQueue.PostedDelay&fixupData=true&uniqueUsers=true&showMax=true&filters=platform%2Ceq%2CW%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

Removed MessageLoop.ScheduledSleep.* metrics as we've also
learnt what we wanted from it, that is :
The maximum sleep a MessageLoop ever completes is 1 hour (and
even then 99% of completed sleeps are under 1 second), all
longer delays are reliably always interrupted.
Win : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
Mac : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
Android : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CA%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

Added a test for MessageLoop.DelayedTaskQueueForUI.PendingTasksCountOnIdle
but it appears to be broken on MacOSX (prior to this CL). Added a note
to investigate on the metric but it shouldn't block this CL which
doesn't make it worse.

R=kylechar@chromium.org, rkaplow@chromium.org

Bug:  860801 ,  850450 , 786597
Change-Id: I7a693886ab4e1b9c8e9fbe8a64d906a95019a609
Reviewed-on: https://chromium-review.googlesource.com/1142589
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: kylechar <kylechar@chromium.org>
Reviewed-by: Robert Kaplow (OOO until 0730) <rkaplow@chromium.org>
Cr-Commit-Position: refs/heads/master@{#576686}
[modify] https://crrev.com/81626592441ad8517fb6acc8c85e4df4544222e4/base/message_loop/incoming_task_queue.cc
[modify] https://crrev.com/81626592441ad8517fb6acc8c85e4df4544222e4/base/message_loop/message_loop.cc
[modify] https://crrev.com/81626592441ad8517fb6acc8c85e4df4544222e4/base/message_loop/message_loop.h
[modify] https://crrev.com/81626592441ad8517fb6acc8c85e4df4544222e4/base/message_loop/message_loop_unittest.cc
[modify] https://crrev.com/81626592441ad8517fb6acc8c85e4df4544222e4/tools/metrics/histograms/histograms.xml

Project Member

Comment 9 by bugdroid1@chromium.org, Jul 20

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5bd9b574444dc977a7b5c4b121df9d6f3bd957c7

commit 5bd9b574444dc977a7b5c4b121df9d6f3bd957c7
Author: Gabriel Charette <gab@chromium.org>
Date: Fri Jul 20 14:31:33 2018

[MessageLoop] Do not schedule sleeps > 1 day

Some platforms don't like timeouts > 100,000,000 seconds.

Note: I tried just banning long delays but it didn't work
(https://bugs.chromium.org/p/chromium/issues/detail?id=850450#c4).
And while we could control delays posted by our codebase (and we should
still try to address that), we can't control those set from Javascript.

R=kylechar@chromium.org

Bug:  850450 
Change-Id: Iaf1b983b7af1954412518ed049f6cd0a3f78ba2a
Reviewed-on: https://chromium-review.googlesource.com/1142602
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: kylechar <kylechar@chromium.org>
Cr-Commit-Position: refs/heads/master@{#576857}
[modify] https://crrev.com/5bd9b574444dc977a7b5c4b121df9d6f3bd957c7/base/message_loop/message_loop.cc
[modify] https://crrev.com/5bd9b574444dc977a7b5c4b121df9d6f3bd957c7/base/message_loop/message_loop.h

Status: Fixed (was: Started)
We never schedule sleeps > 1 day = 86,400 seconds anymore.

(would still be nice to fix very long delays being posted but I'll track that separately)
Project Member

Comment 11 by bugdroid1@chromium.org, Jul 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c512937e6b6f6ec01384fbb210401aac0c7f1467

commit c512937e6b6f6ec01384fbb210401aac0c7f1467
Author: Wez <wez@chromium.org>
Date: Mon Jul 23 09:56:44 2018

Revert "[MessageLoop] Remove heavy diagnosis metrics and leave others only on UI thread"

This reverts commit 81626592441ad8517fb6acc8c85e4df4544222e4.

Reason for revert: MetricsOnlyFromUILoops test appears to be flaky (see  https://crbug.com/866299 ).

Original change's description:
> [MessageLoop] Remove heavy diagnosis metrics and leave others only on UI thread
>
> Addresses  crbug.com/860801  in time for the M69 branch while keeping the
> diagnosis signals which will be used to assess future improvements
> (MessageLoop.DelayedTaskQueueForUI.PendingTasksCountOnIdle that is).
>
> Removed MessageLoop.DelayedTaskQueue.PostedDelay as we've
> learnt everything we needed from this one, that is :
> We now know that while 99% of tasks are posted with a delay under 30s,
> 95% of users post at least one task in every portion of the 0ms-3hours
> range at least once per day:
> https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.DelayedTaskQueue.PostedDelay&fixupData=true&uniqueUsers=true&showMax=true&filters=platform%2Ceq%2CW%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
>
> Removed MessageLoop.ScheduledSleep.* metrics as we've also
> learnt what we wanted from it, that is :
> The maximum sleep a MessageLoop ever completes is 1 hour (and
> even then 99% of completed sleeps are under 1 second), all
> longer delays are reliably always interrupted.
> Win : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
> Mac : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
> Android : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CA%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
>
> Added a test for MessageLoop.DelayedTaskQueueForUI.PendingTasksCountOnIdle
> but it appears to be broken on MacOSX (prior to this CL). Added a note
> to investigate on the metric but it shouldn't block this CL which
> doesn't make it worse.
>
> R=​kylechar@chromium.org, rkaplow@chromium.org
>
> Bug:  860801 ,  850450 , 786597
> Change-Id: I7a693886ab4e1b9c8e9fbe8a64d906a95019a609
> Reviewed-on: https://chromium-review.googlesource.com/1142589
> Commit-Queue: Gabriel Charette <gab@chromium.org>
> Reviewed-by: kylechar <kylechar@chromium.org>
> Reviewed-by: Robert Kaplow (OOO until 0730) <rkaplow@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#576686}

TBR=gab@chromium.org,rkaplow@chromium.org,erikchen@chromium.org,kylechar@chromium.org

Bug:  860801 ,  850450 , 786597,  866299 
Change-Id: I848621acf9ddbec69d72649d16a459910ffa937e
Reviewed-on: https://chromium-review.googlesource.com/1146300
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#577133}
[modify] https://crrev.com/c512937e6b6f6ec01384fbb210401aac0c7f1467/base/message_loop/incoming_task_queue.cc
[modify] https://crrev.com/c512937e6b6f6ec01384fbb210401aac0c7f1467/base/message_loop/message_loop.cc
[modify] https://crrev.com/c512937e6b6f6ec01384fbb210401aac0c7f1467/base/message_loop/message_loop.h
[modify] https://crrev.com/c512937e6b6f6ec01384fbb210401aac0c7f1467/base/message_loop/message_loop_unittest.cc
[modify] https://crrev.com/c512937e6b6f6ec01384fbb210401aac0c7f1467/tools/metrics/histograms/histograms.xml

Project Member

Comment 12 by bugdroid1@chromium.org, Jul 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5c7b464ca5393510823ad2c2a0262a4eb1460c05

commit 5c7b464ca5393510823ad2c2a0262a4eb1460c05
Author: Gabriel Charette <gab@chromium.org>
Date: Mon Jul 23 20:05:18 2018

Reland "[MessageLoop] Remove heavy diagnosis metrics and leave others only on UI thread"

This reverts commit c512937e6b6f6ec01384fbb210401aac0c7f1467.

Reason for revert: change is correct, fixed test flakiness

Original change's description:
> Revert "[MessageLoop] Remove heavy diagnosis metrics and leave others only on UI thread"
> 
> This reverts commit 81626592441ad8517fb6acc8c85e4df4544222e4.
> 
> Reason for revert: MetricsOnlyFromUILoops test appears to be flaky (see  https://crbug.com/866299 ).
> 
> Original change's description:
> > [MessageLoop] Remove heavy diagnosis metrics and leave others only on UI thread
> >
> > Addresses  crbug.com/860801  in time for the M69 branch while keeping the
> > diagnosis signals which will be used to assess future improvements
> > (MessageLoop.DelayedTaskQueueForUI.PendingTasksCountOnIdle that is).
> >
> > Removed MessageLoop.DelayedTaskQueue.PostedDelay as we've
> > learnt everything we needed from this one, that is :
> > We now know that while 99% of tasks are posted with a delay under 30s,
> > 95% of users post at least one task in every portion of the 0ms-3hours
> > range at least once per day:
> > https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.DelayedTaskQueue.PostedDelay&fixupData=true&uniqueUsers=true&showMax=true&filters=platform%2Ceq%2CW%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
> >
> > Removed MessageLoop.ScheduledSleep.* metrics as we've also
> > learnt what we wanted from it, that is :
> > The maximum sleep a MessageLoop ever completes is 1 hour (and
> > even then 99% of completed sleeps are under 1 second), all
> > longer delays are reliably always interrupted.
> > Win : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
> > Mac : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
> > Android : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CA%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
> >
> > Added a test for MessageLoop.DelayedTaskQueueForUI.PendingTasksCountOnIdle
> > but it appears to be broken on MacOSX (prior to this CL). Added a note
> > to investigate on the metric but it shouldn't block this CL which
> > doesn't make it worse.
> >
> > R=​kylechar@chromium.org, rkaplow@chromium.org
> >
> > Bug:  860801 ,  850450 , 786597
> > Change-Id: I7a693886ab4e1b9c8e9fbe8a64d906a95019a609
> > Reviewed-on: https://chromium-review.googlesource.com/1142589
> > Commit-Queue: Gabriel Charette <gab@chromium.org>
> > Reviewed-by: kylechar <kylechar@chromium.org>
> > Reviewed-by: Robert Kaplow (OOO until 0730) <rkaplow@chromium.org>
> > Cr-Commit-Position: refs/heads/master@{#576686}
> 
> TBR=gab@chromium.org,rkaplow@chromium.org,erikchen@chromium.org,kylechar@chromium.org
> 
> Bug:  860801 ,  850450 , 786597,  866299 
> Change-Id: I848621acf9ddbec69d72649d16a459910ffa937e
> Reviewed-on: https://chromium-review.googlesource.com/1146300
> Commit-Queue: Wez <wez@chromium.org>
> Reviewed-by: Wez <wez@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#577133}

TBR=wez@chromium.org,gab@chromium.org,rkaplow@chromium.org,erikchen@chromium.org,kylechar@chromium.org

Change-Id: Idb53da9b8b40b407ece6cf6c81b95bcb7072d76c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  860801 ,  850450 , 786597,  866299 
Reviewed-on: https://chromium-review.googlesource.com/1147360
Reviewed-by: Gabriel Charette <gab@chromium.org>
Commit-Queue: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/heads/master@{#577228}
[modify] https://crrev.com/5c7b464ca5393510823ad2c2a0262a4eb1460c05/base/message_loop/incoming_task_queue.cc
[modify] https://crrev.com/5c7b464ca5393510823ad2c2a0262a4eb1460c05/base/message_loop/message_loop.cc
[modify] https://crrev.com/5c7b464ca5393510823ad2c2a0262a4eb1460c05/base/message_loop/message_loop.h
[modify] https://crrev.com/5c7b464ca5393510823ad2c2a0262a4eb1460c05/base/message_loop/message_loop_unittest.cc
[modify] https://crrev.com/5c7b464ca5393510823ad2c2a0262a4eb1460c05/tools/metrics/histograms/histograms.xml

Project Member

Comment 13 by bugdroid1@chromium.org, Jul 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/771b50f56cd66df43f6fa26bc113147432748255

commit 771b50f56cd66df43f6fa26bc113147432748255
Author: Gabriel Charette <gab@chromium.org>
Date: Mon Jul 23 20:15:56 2018

Revert "Reland "[MessageLoop] Remove heavy diagnosis metrics and leave others only on UI thread""

This reverts commit 5c7b464ca5393510823ad2c2a0262a4eb1460c05.

Reason for revert: oops, didn't mean to auto-CQ PS1.

Original change's description:
> Reland "[MessageLoop] Remove heavy diagnosis metrics and leave others only on UI thread"
> 
> This reverts commit c512937e6b6f6ec01384fbb210401aac0c7f1467.
> 
> Reason for revert: change is correct, fixed test flakiness
> 
> Original change's description:
> > Revert "[MessageLoop] Remove heavy diagnosis metrics and leave others only on UI thread"
> > 
> > This reverts commit 81626592441ad8517fb6acc8c85e4df4544222e4.
> > 
> > Reason for revert: MetricsOnlyFromUILoops test appears to be flaky (see  https://crbug.com/866299 ).
> > 
> > Original change's description:
> > > [MessageLoop] Remove heavy diagnosis metrics and leave others only on UI thread
> > >
> > > Addresses  crbug.com/860801  in time for the M69 branch while keeping the
> > > diagnosis signals which will be used to assess future improvements
> > > (MessageLoop.DelayedTaskQueueForUI.PendingTasksCountOnIdle that is).
> > >
> > > Removed MessageLoop.DelayedTaskQueue.PostedDelay as we've
> > > learnt everything we needed from this one, that is :
> > > We now know that while 99% of tasks are posted with a delay under 30s,
> > > 95% of users post at least one task in every portion of the 0ms-3hours
> > > range at least once per day:
> > > https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.DelayedTaskQueue.PostedDelay&fixupData=true&uniqueUsers=true&showMax=true&filters=platform%2Ceq%2CW%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
> > >
> > > Removed MessageLoop.ScheduledSleep.* metrics as we've also
> > > learnt what we wanted from it, that is :
> > > The maximum sleep a MessageLoop ever completes is 1 hour (and
> > > even then 99% of completed sleeps are under 1 second), all
> > > longer delays are reliably always interrupted.
> > > Win : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
> > > Mac : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
> > > Android : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CA%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
> > >
> > > Added a test for MessageLoop.DelayedTaskQueueForUI.PendingTasksCountOnIdle
> > > but it appears to be broken on MacOSX (prior to this CL). Added a note
> > > to investigate on the metric but it shouldn't block this CL which
> > > doesn't make it worse.
> > >
> > > R=​kylechar@chromium.org, rkaplow@chromium.org
> > >
> > > Bug:  860801 ,  850450 , 786597
> > > Change-Id: I7a693886ab4e1b9c8e9fbe8a64d906a95019a609
> > > Reviewed-on: https://chromium-review.googlesource.com/1142589
> > > Commit-Queue: Gabriel Charette <gab@chromium.org>
> > > Reviewed-by: kylechar <kylechar@chromium.org>
> > > Reviewed-by: Robert Kaplow (OOO until 0730) <rkaplow@chromium.org>
> > > Cr-Commit-Position: refs/heads/master@{#576686}
> > 
> > TBR=gab@chromium.org,rkaplow@chromium.org,erikchen@chromium.org,kylechar@chromium.org
> > 
> > Bug:  860801 ,  850450 , 786597,  866299 
> > Change-Id: I848621acf9ddbec69d72649d16a459910ffa937e
> > Reviewed-on: https://chromium-review.googlesource.com/1146300
> > Commit-Queue: Wez <wez@chromium.org>
> > Reviewed-by: Wez <wez@chromium.org>
> > Cr-Commit-Position: refs/heads/master@{#577133}
> 
> TBR=wez@chromium.org,gab@chromium.org,rkaplow@chromium.org,erikchen@chromium.org,kylechar@chromium.org
> 
> Change-Id: Idb53da9b8b40b407ece6cf6c81b95bcb7072d76c
> No-Presubmit: true
> No-Tree-Checks: true
> No-Try: true
> Bug:  860801 ,  850450 , 786597,  866299 
> Reviewed-on: https://chromium-review.googlesource.com/1147360
> Reviewed-by: Gabriel Charette <gab@chromium.org>
> Commit-Queue: Gabriel Charette <gab@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#577228}

TBR=wez@chromium.org,gab@chromium.org,rkaplow@chromium.org,erikchen@chromium.org,kylechar@chromium.org

Change-Id: I29dec465bd2a29c3aa16940664ba5e1d3d005c0e
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  860801 ,  850450 , 786597,  866299 
Reviewed-on: https://chromium-review.googlesource.com/1147380
Reviewed-by: Gabriel Charette <gab@chromium.org>
Commit-Queue: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/heads/master@{#577234}
[modify] https://crrev.com/771b50f56cd66df43f6fa26bc113147432748255/base/message_loop/incoming_task_queue.cc
[modify] https://crrev.com/771b50f56cd66df43f6fa26bc113147432748255/base/message_loop/message_loop.cc
[modify] https://crrev.com/771b50f56cd66df43f6fa26bc113147432748255/base/message_loop/message_loop.h
[modify] https://crrev.com/771b50f56cd66df43f6fa26bc113147432748255/base/message_loop/message_loop_unittest.cc
[modify] https://crrev.com/771b50f56cd66df43f6fa26bc113147432748255/tools/metrics/histograms/histograms.xml

Labels: Merge-Request-69
Requesting M69 merge for r576857. It's a trivial change to address a concern in a recent stable post-mortem (the bug is no longer active but the mitigation prevents a similar headache in the future should conditions align poorly again).
Project Member

Comment 15 by bugdroid1@chromium.org, Jul 24

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5a4495a48fb32b1faa38b4a0a21b47eddce44ef5

commit 5a4495a48fb32b1faa38b4a0a21b47eddce44ef5
Author: Gabriel Charette <gab@chromium.org>
Date: Tue Jul 24 15:34:01 2018

Reland "[MessageLoop] Remove heavy diagnosis metrics and leave others only on UI thread"

Original CL: https://chromium-review.googlesource.com/c/chromium/src/+/1142589
Reverted for test flakiness on Fuchsia.
Implementation is fine (this is a diagnostic metric -- this change
definitely doesn't make it worse). The test was only added to ensure
proper logging on UI thread only. Made the test more robust in this
reland (see diff vs PS1). If it flakes again, please disable it on flaky
platforms instead of reverting.


Original change's description:

Addresses  crbug.com/860801  in time for the M69 branch while keeping the
diagnosis signals which will be used to assess future improvements
(MessageLoop.DelayedTaskQueueForUI.PendingTasksCountOnIdle that is).

Removed MessageLoop.DelayedTaskQueue.PostedDelay as we've
learnt everything we needed from this one, that is :
We now know that while 99% of tasks are posted with a delay under 30s,
95% of users post at least one task in every portion of the 0ms-3hours
range at least once per day:
https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.DelayedTaskQueue.PostedDelay&fixupData=true&uniqueUsers=true&showMax=true&filters=platform%2Ceq%2CW%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

Removed MessageLoop.ScheduledSleep.* metrics as we've also
learnt what we wanted from it, that is :
The maximum sleep a MessageLoop ever completes is 1 hour (and
even then 99% of completed sleeps are under 1 second), all
longer delays are reliably always interrupted.
Win : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
Mac : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CM%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial
Android : https://uma.googleplex.com/histograms?endDate=20180717&dayCount=1&histograms=MessageLoop.ScheduledSleep.Completed%2CMessageLoop.ScheduledSleep.Interrupted&fixupData=true&showMax=true&filters=platform%2Ceq%2CA%2Cchannel%2Ceq%2C1%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

Added a test for MessageLoop.DelayedTaskQueueForUI.PendingTasksCountOnIdle
but it appears to be broken on MacOSX (prior to this CL). Added a note
to investigate on the metric but it shouldn't block this CL which
doesn't make it worse.

TBR=wez@chromium.org

Bug:  860801 ,  850450 , 786597,  866299 
Change-Id: Ic0fb2b58780cd595e47b36f97a9ae440bb988225
Reviewed-on: https://chromium-review.googlesource.com/1147400
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Reviewed-by: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/heads/master@{#577553}
[modify] https://crrev.com/5a4495a48fb32b1faa38b4a0a21b47eddce44ef5/base/message_loop/incoming_task_queue.cc
[modify] https://crrev.com/5a4495a48fb32b1faa38b4a0a21b47eddce44ef5/base/message_loop/message_loop.cc
[modify] https://crrev.com/5a4495a48fb32b1faa38b4a0a21b47eddce44ef5/base/message_loop/message_loop.h
[modify] https://crrev.com/5a4495a48fb32b1faa38b4a0a21b47eddce44ef5/base/message_loop/message_loop_unittest.cc
[modify] https://crrev.com/5a4495a48fb32b1faa38b4a0a21b47eddce44ef5/tools/metrics/histograms/histograms.xml

Project Member

Comment 16 by sheriffbot@chromium.org, Jul 25

Labels: -Merge-Request-69 Merge-Review-69 Hotlist-Merge-Review
This bug requires manual review: Reverts referenced in bugdroid comments after merge request.
Please contact the milestone owner if you have questions.
Owners: amineer@(Android), kariahda@(iOS), cindyb@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
The merge request is only for r576857 (recent reverts/relands are not relevant)
Labels: -Merge-Review-69 Merge-Approved-69
Approving merge to M69 branch 3497 for r576857 based on comment #14 and #17. Please merge ASAP. Thank you.
Pls merge your change to M69 branch 3497 latest by 3:00 PM PT today, Wednesday (07/25/18). Thank you.
Project Member

Comment 20 by bugdroid1@chromium.org, Jul 25

Labels: -merge-approved-69 merge-merged-3497
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/73635a15b16d8622a94edf5f4f9fd5dbb4bd84e8

commit 73635a15b16d8622a94edf5f4f9fd5dbb4bd84e8
Author: Gabriel Charette <gab@chromium.org>
Date: Wed Jul 25 19:09:18 2018

[Merge M69] [MessageLoop] Do not schedule sleeps > 1 day

Some platforms don't like timeouts > 100,000,000 seconds.

Note: I tried just banning long delays but it didn't work
(https://bugs.chromium.org/p/chromium/issues/detail?id=850450#c4).
And while we could control delays posted by our codebase (and we should
still try to address that), we can't control those set from Javascript.

TBR=​kylechar@chromium.org

Bug:  850450 
Change-Id: Iaf1b983b7af1954412518ed049f6cd0a3f78ba2a
Reviewed-on: https://chromium-review.googlesource.com/1142602
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: kylechar <kylechar@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#576857}(cherry picked from commit 5bd9b574444dc977a7b5c4b121df9d6f3bd957c7)
Reviewed-on: https://chromium-review.googlesource.com/1150581
Reviewed-by: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/branch-heads/3497@{#84}
Cr-Branched-From: 271eaf50594eb818c9295dc78d364aea18c82ea8-refs/heads/master@{#576753}
[modify] https://crrev.com/73635a15b16d8622a94edf5f4f9fd5dbb4bd84e8/base/message_loop/message_loop.cc
[modify] https://crrev.com/73635a15b16d8622a94edf5f4f9fd5dbb4bd84e8/base/message_loop/message_loop.h

Project Member

Comment 21 by bugdroid1@chromium.org, Nov 2

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/35f882550d19a231b21b2d281f78c4cee01cf433

commit 35f882550d19a231b21b2d281f78c4cee01cf433
Author: Alex Clarke <alexclarke@chromium.org>
Date: Fri Nov 02 17:59:54 2018

ThreadControllerWithMessagePumpImpl cap delays at one day

It's important to cap delays at one day or bad things can happen
on certain platforms.

Also removes a redundant call to ScheduleDelayedWork.

Bug: 863341,  850450 
Change-Id: Idfb95a5e5d800255caff74d0718db600b9348335
Reviewed-on: https://chromium-review.googlesource.com/c/1314489
Reviewed-by: Alexander Timin <altimin@chromium.org>
Commit-Queue: Alex Clarke <alexclarke@chromium.org>
Cr-Commit-Position: refs/heads/master@{#604980}
[modify] https://crrev.com/35f882550d19a231b21b2d281f78c4cee01cf433/base/task/sequence_manager/thread_controller_with_message_pump_impl.cc
[modify] https://crrev.com/35f882550d19a231b21b2d281f78c4cee01cf433/base/task/sequence_manager/thread_controller_with_message_pump_impl_unittest.cc

Sign in to add a comment