
Issue 792310


Issue metadata

Status: Duplicate
Merged: issue 768116
Owner:
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac , Fuchsia
Pri: 2
Type: Bug




SchedulerWorkerPoolImpl tests may fail on heavily-loaded systems

Project Member Reported by w...@chromium.org, Dec 6 2017

Issue description

SchedulerWorkerPoolImpl tests post work to the SchedulerWorkerPool and then use ExpectWorkerCapacityAfterDelay() to wait briefly for the pool capacity to increase in response to the tasks that are running but blocked.

The ExpectWorkerCapacityAfterDelay() implementation assumes that the capacity will have increased to its expected value within four wait cycles. That assumption does not hold if the host system is under heavy load, since the workers or the service thread may be starved of the CPU cycles they need to increase the pool size.

We see this fail relatively often on the Fuchsia bots, which run the OS under QEMU, and in some cases without KVM hypervisor acceleration.
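
To make the failure mode concrete, here is a minimal sketch of a bounded-poll check of this shape. It is not the Chromium helper: ReachesCapacityWithinCycles and its injected get_capacity / wait_one_cycle callbacks are hypothetical names used only for illustration. The point is that the outcome depends on how many wait cycles elapse before the pool grows, not on whether it eventually grows.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a bounded wait; NOT the real
// ExpectWorkerCapacityAfterDelay(). Polls `get_capacity` and calls
// `wait_one_cycle` at most `max_cycles` times.
bool ReachesCapacityWithinCycles(const std::function<int()>& get_capacity,
                                 const std::function<void()>& wait_one_cycle,
                                 int expected_capacity,
                                 int max_cycles) {
  assert(get_capacity && wait_one_cycle);
  for (int cycle = 0; cycle < max_cycles; ++cycle) {
    if (get_capacity() == expected_capacity)
      return true;
    wait_one_cycle();
  }
  // One last look after the final wait. A pool that would have grown on a
  // later cycle is still reported as a failure here.
  return get_capacity() == expected_capacity;
}
```

With max_cycles set to 4, a pool that needs a fifth or sixth cycle to grow on a starved host fails the check even though nothing is wrong with the pool itself.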
 

Comment 1 by w...@chromium.org, Dec 6 2017

Labels: OS-Android OS-Chrome OS-Linux OS-Mac OS-Windows

Comment 2 by w...@chromium.org, Dec 6 2017

Status: Started (was: Assigned)

Comment 3 by w...@chromium.org, Dec 6 2017

Example output:

[ RUN      ] TaskSchedulerWorkerPoolBlockingTest.WorkersIdleWhenOverCapacity/MAY_BLOCK
../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:983: Failure
      Expected: worker_pool_->GetWorkerCapacityForTesting()
      Which is: 7
To be equal to: expected_worker_capacity
      Which is: 8
../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:1124: Failure
      Expected: worker_pool_->GetWorkerCapacityForTesting()
      Which is: 7
To be equal to: 2 * kNumWorkersInWorkerPool
      Which is: 8
[  FAILED  ] TaskSchedulerWorkerPoolBlockingTest.WorkersIdleWhenOverCapacity/MAY_BLOCK, where GetParam() = 12-byte object <00-00 00-00 00-00 00-00 00-00 00-00> (1006 ms)

Comment 4 by bugdroid1@chromium.org, Dec 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4bac235cb5696108cc2377780aef3337dbeefaaa

commit 4bac235cb5696108cc2377780aef3337dbeefaaa
Author: Wez <wez@chromium.org>
Date: Wed Dec 06 17:16:22 2017

Allow SchedulerWorkerPoolImplTests to wait longer for capacity increase.

Previously the tests would wait up to four times in the
ExpectWorkerCapacityAfterDelay() helper function, which leads to them
flaking sometimes under systems with heavy load, or systems running
under slow emulation, such as QEMU.

We now allow the test to wait indefinitely, provided that the capacity
of the worker pool is stable, or increases.

Bug:  792310 
Change-Id: Ida8aa3abbb2d290771f74a7aae4ba7fe5c2176a0
Reviewed-on: https://chromium-review.googlesource.com/809710
Reviewed-by: François Doray <fdoray@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#522117}
[modify] https://crrev.com/4bac235cb5696108cc2377780aef3337dbeefaaa/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc
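
The policy in this commit message (keep waiting as long as the capacity is stable or increasing) could look roughly like the sketch below. This is a guess at the shape of the logic, not the code that landed: WaitWhileCapacityNonDecreasing, WaitResult and the injected callbacks are all hypothetical.

```cpp
#include <cassert>
#include <functional>

enum class WaitResult { kReached, kDecreased };

// Hypothetical sketch: waits with no cycle limit, giving up only if the
// observed capacity drops between two polls.
WaitResult WaitWhileCapacityNonDecreasing(
    const std::function<int()>& get_capacity,
    const std::function<void()>& wait_one_cycle,
    int expected_capacity) {
  assert(get_capacity && wait_one_cycle);
  int last_capacity = get_capacity();
  while (last_capacity != expected_capacity) {
    wait_one_cycle();
    const int current_capacity = get_capacity();
    if (current_capacity < last_capacity)
      return WaitResult::kDecreased;
    last_capacity = current_capacity;
  }
  return WaitResult::kReached;
}
```

Note that a loop of this shape never returns if the capacity settles at any value other than the expected one, so a stuck test is only caught by an outer timeout.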

Comment 5 by w...@chromium.org, Dec 6 2017

Cc: -fdoray@chromium.org
Owner: fdoray@chromium.org
Status: Assigned (was: Started)
fdoray: It's a shame to have removed the timeout, since we now rely on the TestLauncher timeout to catch issues there. Assigning to you to decide whether to follow up with a fix that reintroduces one.
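
One possible way to reintroduce a timeout without going back to a fixed cycle count is to poll against a wall-clock deadline. The sketch below is only an illustration of that idea: WaitForCapacityUntil, the Clock alias and the injected now callback are hypothetical, and the clock is injected so the logic can be exercised without real sleeps.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

using Clock = std::chrono::steady_clock;

// Hypothetical sketch: polls until the expected capacity is observed or the
// deadline (first `now()` reading plus `timeout`) has passed.
bool WaitForCapacityUntil(const std::function<int()>& get_capacity,
                          const std::function<void()>& wait_one_cycle,
                          const std::function<Clock::time_point()>& now,
                          int expected_capacity,
                          Clock::duration timeout) {
  assert(get_capacity && wait_one_cycle && now);
  const Clock::time_point deadline = now() + timeout;
  while (get_capacity() != expected_capacity) {
    if (now() >= deadline)
      return false;  // Timed out before reaching the expected capacity.
    wait_one_cycle();
  }
  return true;
}
```

A deadline generous enough for emulated bots would still fail faster and with a clearer message than a TestLauncher-level timeout, though choosing that value is a judgment call this sketch does not settle.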

Comment 6 by w...@chromium.org, Dec 7 2017

fdoray: It looks like the problem has actually been that we are _never_ reaching the expected capacity; Fuchsia flaked again after this change landed:

https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia%20ARM64/2712

This time the test just timed out entirely.

Is there any debugging output that would help you diagnose this?

Comment 7 by w...@chromium.org, Dec 8 2017

We just had this failure, which relates to a worker capacity miscalculation:

https://luci-milo.appspot.com/buildbot/chromium.fyi/Fuchsia/12002

[ RUN      ] TaskSchedulerWorkerPoolBlockingTest.MaximumWorkersTest
../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:1530: Failure
Expected equality of these values:
  worker_pool_->GetWorkerCapacityForTesting()
    Which is: 15
  kNumWorkersInWorkerPool + kNumExtraTasks
    Which is: 14
[  FAILED  ] TaskSchedulerWorkerPoolBlockingTest.MaximumWorkersTest (7726 ms)

Comment 8 by bugdroid1@chromium.org, Feb 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e31677326baffe050afd6750c6c1f79b33eaa375

commit e31677326baffe050afd6750c6c1f79b33eaa375
Author: Scott Graham <scottmg@chromium.org>
Date: Wed Feb 07 21:54:16 2018

fuchsia: Disable TaskSchedulerWorkerPoolBlockingTest.MaximumWorkersTest

Most recently
https://build.chromium.org/p/chromium.fyi/builders/Fuchsia%20%28dbg%29/builds/15991

TBR: wez@chromium.org
Bug:  768436 ,  792310 
Change-Id: I836a3e918bf67a0780d781bea1b64fa750f75ef3
Reviewed-on: https://chromium-review.googlesource.com/907531
Commit-Queue: Scott Graham <scottmg@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
Cr-Commit-Position: refs/heads/master@{#535157}
[modify] https://crrev.com/e31677326baffe050afd6750c6c1f79b33eaa375/testing/buildbot/filters/fuchsia.base_unittests.filter

Comment 9 by w...@chromium.org, Feb 9 2018

Mergedinto: 768116
Status: Duplicate (was: Assigned)

Comment 10 by bugdroid1@chromium.org, Feb 22 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/92a8cd0fa676e90318fca3b78871477dc0448e61

commit 92a8cd0fa676e90318fca3b78871477dc0448e61
Author: Wez <wez@chromium.org>
Date: Thu Feb 22 17:07:42 2018

Filter all TaskSchedulerWorkerPoolBlockingTests which wait on capacity.

Several of these tests wait until the worker pool's capacity reaches an
expected value, which appears not to be a deterministic operation, and
fails often on the Fuchsia bots.

Also adds a filter for a flaking Mojo system unit-test.

Bug:  768436 ,  792310 ,  814596 
Change-Id: If453b3cda30747c995871fbeae6cb23830f02a88
Reviewed-on: https://chromium-review.googlesource.com/930476
Reviewed-by: Scott Graham <scottmg@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#538463}
[modify] https://crrev.com/92a8cd0fa676e90318fca3b78871477dc0448e61/testing/buildbot/filters/fuchsia.base_unittests.filter
[modify] https://crrev.com/92a8cd0fa676e90318fca3b78871477dc0448e61/testing/buildbot/filters/fuchsia.mojo_system_unittests.filter


Comment 11 by bugdroid1@chromium.org, May 2 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4ce50aa89014267e7589941571976204c6928991

commit 4ce50aa89014267e7589941571976204c6928991
Author: Francois Doray <fdoray@chromium.org>
Date: Wed May 02 15:21:47 2018

TaskScheduler: Enable TaskSchedulerWorkerPoolBlockingTest.* on fuchsia.

Flakyness was fixed by https://chromium-review.googlesource.com/1033533

Bug:  768436 ,  792310 
Change-Id: I8720a1ae503d096419bcc368c3629bcaa6679e74
Reviewed-on: https://chromium-review.googlesource.com/1039767
Reviewed-by: François Doray <fdoray@chromium.org>
Reviewed-by: Gabriel Charette <gab@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Commit-Queue: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#555393}
[modify] https://crrev.com/4ce50aa89014267e7589941571976204c6928991/testing/buildbot/filters/fuchsia.base_unittests.filter
