TaskSchedulerWorkerPoolHistogramTest.NumTasksBetweenWaitsWithIdlePeriodAndCleanup flaked under Fuchsia/x64 |
||||
Issue descriptionTaskSchedulerWorkerPoolHistogramTest.NumTasksBetweenWaitsWithIdlePeriodAndCleanup failed under Fuchsia/x64 with: [00057.842] 03925.03952> [ RUN ] TaskSchedulerWorkerPoolHistogramTest.NumTasksBetweenWaitsWithIdlePeriodAndCleanup [00057.842] 03925.03952> ../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:663: Failure [00057.844] 03925.03952> Expected equality of these values: [00057.845] 03925.03952> 0 [00057.845] 03925.03952> histogram->SnapshotSamples()->GetCount(0) [00057.845] 03925.03952> Which is: 1 [00057.845] 03925.03952> [ FAILED ] TaskSchedulerWorkerPoolHistogramTest.NumTasksBetweenWaitsWithIdlePeriodAndCleanup (1740 ms) [00057.847] 03925.03952> [1615/2838] TaskSchedulerWorkerPoolHistogramTest.NumTasksBetweenWaitsWithIdlePeriodAndCleanup (1740 ms) It looks like this test is inherently flaky, since it relies on a a PlatformThread::Sleep() at https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc?l=638 - IIUC the timeouts that the test is aiming to wait for are scheduled on a the SchedulerWorker service thread, so there is no guarantee that that ran in a timely manner to _start_ the timeouts, in general.
,
Feb 26 2018
I fixed this test's dependency on the sleep timing as part of https://chromium-review.googlesource.com/931524 (it still sleeps but only to test that the metric is logged properly when recovering from idle/cleanup -- doesn't matter whether cleanup actually kicked in, the sleep only accentuates the likelihood that it did and hence the relevance of this test).
,
Feb 26 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a8863d2fc9bc0ede0e9f35306ac85cc0195f3e6c commit a8863d2fc9bc0ede0e9f35306ac85cc0195f3e6c Author: Gabriel Charette <gab@chromium.org> Date: Fri Feb 23 10:25:03 2018 [TaskScheduler] Do not log 0 to TaskScheduler.NumTasksBetweenWaits histogram on detach. Doing so accesses |outer_| which was unsafe after invoking Cleanup(). While this will be made safe with https://chromium-review.googlesource.com/c/chromium/src/+/931547 we had noticed this metric being wrong in the past but had postponed cleaning up to avoid perturbing metrics as part of a bigger change. Now is the time to drop this report. I don't think we need to rename the metric because it's an internal diagnosis metric for our team. R=fdoray@chromium.org, robliao@chromium.org Bug: 756898 , 810464 Change-Id: Ica379d6f3c21aebabeb0574847ba0bfcda346df8 Reviewed-on: https://chromium-review.googlesource.com/931524 Commit-Queue: Gabriel Charette <gab@chromium.org> Reviewed-by: François Doray <fdoray@chromium.org> Cr-Commit-Position: refs/heads/master@{#538751} [modify] https://crrev.com/a8863d2fc9bc0ede0e9f35306ac85cc0195f3e6c/base/task_scheduler/scheduler_worker_pool_impl.cc [modify] https://crrev.com/a8863d2fc9bc0ede0e9f35306ac85cc0195f3e6c/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc
,
Feb 26 2018
Wait, actually my bad. r538751 is the CL that broke this..! The sleep isn't at fault here, the problem is that I wrongly assumed we would never get a drop in the zero bucket while it's still possible (if a worker is woken up but all the work is processed by other workers before it invokes GetWork() -- then it will do "0 tasks between waits"). Will fix.
,
Feb 26 2018
,
Feb 26 2018
Looking at TaskSchedulerWorkerPoolHistogramTest.NumTasksBetweenWaitsWithIdlePeriodAndCleanup in detail I fail to see how it got a hit in bucket "0" with r538751. Either way, cleaned up the test, hopefully this helps : https://chromium-review.googlesource.com/#/c/chromium/src/+/936626
,
Feb 28 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ee000a69dbe1884d15f87ce6565036825158fba8 commit ee000a69dbe1884d15f87ce6565036825158fba8 Author: Gabriel Charette <gab@chromium.org> Date: Wed Feb 28 11:33:00 2018 Cleanup TaskSchedulerWorkerPoolHistogramTest.NumTasks*Cleanup tests. TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup still depends on cleanup timing, I'll try to remove those dependencies in a follow-up. R=fdoray@chromium.org, robliao@chromium.org Bug: 816170 Change-Id: I2cca331d18ee8f3ca384493bd3584c68240c445e Reviewed-on: https://chromium-review.googlesource.com/936626 Commit-Queue: Gabriel Charette <gab@chromium.org> Reviewed-by: François Doray <fdoray@chromium.org> Cr-Commit-Position: refs/heads/master@{#539789} [modify] https://crrev.com/ee000a69dbe1884d15f87ce6565036825158fba8/base/task_scheduler/scheduler_worker_pool_impl.cc [modify] https://crrev.com/ee000a69dbe1884d15f87ce6565036825158fba8/base/task_scheduler/scheduler_worker_pool_impl.h [modify] https://crrev.com/ee000a69dbe1884d15f87ce6565036825158fba8/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc
,
Mar 1 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/274f2e7f66d77271128910677f72b802a92e6c78 commit 274f2e7f66d77271128910677f72b802a92e6c78 Author: Gabriel Charette <gab@chromium.org> Date: Thu Mar 01 15:03:00 2018 [SchedulerWorkerPoolImpl] Make sure workers do not cleanup during WaitForWorkersIdle*(). While auditing the code I noticed that this was a problem: if a worker cleans up while waiting for n workers to be idle, the idle stack will shrink and this condition may never be reached. This prevents this completely for WaitForAllWorkersIdleForTesting() and greatly reduces the likelihood for WaitForWorkersIdleForTesting(n). WaitForAllWorkersIdleForTesting() is safe because "n" is obtained under the lock. Whereas an external call to WaitForWorkersIdleForTesting(n) could wait on a worker which was already detached. But in practice if the reclaim timeout is long enough and the main thread invokes this immediately after unblocking a test task and waiting for it to resolve, the race would require the main thread to be preempted for the full reclaim timeout for no reason which seems highly unlikely. Audited current usage and added a comment as such. R=fdoray@chromium.org, robliao@chromium.org Bug: 816170 , 735701 Change-Id: Ie7b4fd81c04f53cc9597ebc02a25b39d4f1d658f Reviewed-on: https://chromium-review.googlesource.com/937361 Commit-Queue: Gabriel Charette <gab@chromium.org> Reviewed-by: François Doray <fdoray@chromium.org> Reviewed-by: Robert Liao <robliao@chromium.org> Cr-Commit-Position: refs/heads/master@{#540154} [modify] https://crrev.com/274f2e7f66d77271128910677f72b802a92e6c78/base/task_scheduler/scheduler_worker_pool_impl.cc [modify] https://crrev.com/274f2e7f66d77271128910677f72b802a92e6c78/base/task_scheduler/scheduler_worker_pool_impl.h
,
Apr 27 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9746a243c4b7a2e79f60fd4d164e8da833dbeeee commit 9746a243c4b7a2e79f60fd4d164e8da833dbeeee Author: Gabriel Charette <gab@chromium.org> Date: Fri Apr 27 14:31:27 2018 [Fuchsia+TaskScheduler] Re-enable tests that were flaky because of rogue SchedulerWorkers. Rogue SchedulerWorkers were fixed once and for all in crbug.com/827615 R=wez@chromium.org Bug: 816170 , 827615 Change-Id: I697124ed7cd5282bce8e504427e1daa6adc08df2 Reviewed-on: https://chromium-review.googlesource.com/1031210 Commit-Queue: Wez <wez@chromium.org> Reviewed-by: Wez <wez@chromium.org> Cr-Commit-Position: refs/heads/master@{#554386} [modify] https://crrev.com/9746a243c4b7a2e79f60fd4d164e8da833dbeeee/testing/buildbot/filters/fuchsia.base_unittests.filter
,
Apr 27 2018
I think this was fixed by r552160 (i.e. that this issue was also caused by a rogue SchedulerWorker). Please reopen if not, thanks |
||||
►
Sign in to add a comment |
||||
Comment 1 by bugdroid1@chromium.org
, Feb 24 2018