Issue 827615

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 2
Type: Bug




TaskSchedulerWorkerPool* test crash flake in TaskTracker::RunAndPopNextTask on Fuchsia/ARM64 FYI

Reported by w...@chromium.org, Mar 30 2018

Issue description

In build https://build.chromium.org/p/chromium.fyi/builders/Fuchsia%20ARM64/builds/8490 we saw a crash:

[00101.512] 01202.01260> <== fatal exception: process base_unittests__exec[36403] thread TaskSchedulerSingleThreadForegr[37095]
[00101.513] 01202.01260> <== fatal page fault, PC at 0xd7cd30c8a128
[00101.514] 01202.01260>  x0       0xaf8c1282d40 x1      0x5dac41534248 x2                 0x4 x3                 0x4
[00101.514] 01202.01260>  x4      0x5dac41534238 x5      0x5d11643c1c40 x6              0xffff x7             0x10000
[00101.515] 01202.01260>  x8                   0 x9                   0 x10     0x5dac414fcb38 x11     0x5dac414fcb38
[00101.515] 01202.01260>  x12                  0 x13                  0 x14                  0 x15                  0
[00101.515] 01202.01260>  x16     0x87ab118ccef8 x17     0x87ab1180a118 x18                  0 x19      0xaf8c1282dd8
[00101.515] 01202.01260>  x20     0x5dac414f15e0 x21     0x5dac413a3398 x22      0xaf8c1282f18 x23      0xaf8c1282f20
[00101.516] 01202.01260>  x24                0x1 x25     0x2279a819a4b8 x26                0x1 x27                0x1
[00101.516] 01202.01260>  x28                  0 x29      0xaf8c1282f30 lr      0xd7cd30c8a114 sp       0xaf8c1282d30
[00101.516] 01202.01260>  pc      0xd7cd30c8a128 psr         0x80000000
[00101.516] 01202.01260>  far                  0 esr         0x92000004
[00101.517] 01202.01260> bottom of user stack:
[00101.530] 01202.01260> 0x00000af8c1282d30: c1282e80 00000af8 a819a4b8 00002279 |..(.........y"..|
[00101.531] 01202.01260> 0x00000af8c1282d40: 00000000 00000000 301a8566 0000d7cd |........f..0....|
[00101.531] 01202.01260> 0x00000af8c1282d50: 301c7fc6 0000d7cd 00000258 00000000 |...0....X.......|
[00101.531] 01202.01260> 0x00000af8c1282d60: 309e8198 0000d7cd 00000000 00000000 |...0............|
[00101.531] 01202.01260> 0x00000af8c1282d70: 00000000 00000000 00000000 00000000 |................|
[00101.531] 01202.01260> 0x00000af8c1282d80: 00000000 00000000 00000000 00000000 |................|
[00101.531] 01202.01260> 0x00000af8c1282d90: 00000009 00000000 00000000 414fcb00 |..............OA|
[00101.531] 01202.01260> 0x00000af8c1282da0: 00000001 413eca00 00000001 30c80000 |......>A.......0|
[00101.532] 01202.01260> 0x00000af8c1282db0: 00000000 00000000 060ac6c7 00000000 |................|
[00101.532] 01202.01260> 0x00000af8c1282dc0: 00000000 00000000 4150ca80 00005dac |..........PA.]..|
[00101.532] 01202.01260> 0x00000af8c1282dd0: 00000001 00000000 00000000 00000000 |................|
[00101.532] 01202.01260> 0x00000af8c1282de0: 301a8566 0000d7cd 301c7fc6 0000d7cd |f..0.......0....|
[00101.532] 01202.01260> 0x00000af8c1282df0: 00000258 00000000 309e8198 0000d7cd |X..........0....|
[00101.532] 01202.01260> 0x00000af8c1282e00: 00000000 00000000 00000000 00000000 |................|
[00101.532] 01202.01260> 0x00000af8c1282e10: 00000000 00000000 00000000 00000000 |................|
[00101.532] 01202.01260> 0x00000af8c1282e20: 00000000 00000000 00000009 00000000 |................|
[00101.532] 01202.01260> arch: aarch64
[00101.584] 01202.01260> dso: id=5eff3274d3b9702e18d23f2f2a02fc1ecd9f0b59 base=0xf7f3dc640000 name=<vDSO>
[00101.584] 01202.01260> dso: id=d18143639f9678d1e4ff24c1fd0233611383a16b base=0xe6041bc22000 name=libfdio.so
[00101.585] 01202.01260> dso: id=433fea2f733cb51e base=0xd7cd301a4000 name=app:base_unittests__exec
[00101.585] 01202.01260> dso: id=c80ab2d1637c180fb30afce7e0dd67fb67976a34 base=0x8b75d6b31000 name=liblaunchpad.so
[00101.585] 01202.01260> dso: id=1523e8ed5d2b2bab51aeb130dc527f2fb817c3c3 base=0x87ab117f9000 name=libc.so
#01: base::internal::TaskTracker::RunAndPopNextTask(scoped_refptr<base::internal::Sequence>, base::internal::CanScheduleSequenceObserver*) at ??:?
#02: base::internal::SchedulerWorker::Thread::ThreadMain() at ??:?
#03: base::(anonymous namespace)::ThreadFunc(void*) at ??:?
#04: pc 0x87ab1180a00c sp 0xaf8c1282fc0 (libc.so,0x1100c)
#05: pc 0x87ab1187c0e0 sp 0xaf8c1282fe0 (libc.so,0x830e0)

Since this is not the main test thread, it is not immediately obvious which test actually crashed. The three listed by the TestLauncher as having crashed were:

TaskSchedulerWorkerPoolImplPostTaskBeforeStartTest.PostManyTasks
TaskSchedulerWorkerPoolStandbyPolicyTest.InitOne
TaskSchedulerWorkerPoolHistogramTest.NumTasksBetweenWaits

This is the same stack as issue 813278 and issue 813651.

It looks like the last test to run successfully before the crash was TaskSchedulerWorkerPoolImplPostTaskBeforeStartTest.PostTasksBeforeStart.
 

Comment 1 by w...@chromium.org, Apr 10 2018

Regularly seeing this manifest as a crash in TaskSchedulerWorkerPoolHistogramTest.NumTasksBetweenWaits, as in https://build.chromium.org/deprecated/chromium.fyi/builders/Fuchsia%20ARM64/builds/8520. The |far| value of 0 indicates that this is a null dereference.

Comment 2 by gab@chromium.org, Apr 11 2018

Owner: gab@chromium.org
Status: Started (was: Untriaged)
I just hit this flake locally on Windows, with a somewhat better stack, when running all of base_unittests (couldn't repro after).

But good news! The stack was better locally and I spent a half hour with fdoray@ digging into TaskScheduler and we think we've identified the issue!

I'm working on a core fix to annihilate these once and for all!

Comment 3 by w...@chromium.org, Apr 11 2018

Excellent. Look forward to learning details. :)

Comment 4 by w...@chromium.org, Apr 11 2018

Might your working theory also explain issue 816575?

Comment 5 by gab@chromium.org, Apr 12 2018

Unfortunately not, our theory just explains the dangling workers.

Fuchsia+QEMU being slow at thread creation/tear down is another story IMO.

Comment 6 by gab@chromium.org, Apr 16 2018

Here's what's going on: we had assumed https://chromium-review.googlesource.com/931547 was only required for SchedulerWorkers in SchedulerWorkerPoolImpl because their delegates hold a raw pointer back to the pool. It turns out that SchedulerWorkers themselves also hold a raw pointer back to the TaskTracker. We knew that too, and naively thought the fix to SchedulerWorkerPoolImpl was sufficient, but looking closer: the SchedulerWorker::Cleanup() call in SchedulerSingleThreadTaskRunnerManager::UnregisterSchedulerWorker() can result in the same bug (a worker is dropped from accounting while still alive, allowing an upcoming JoinForTesting() call to return before that worker exits -- and that worker can then use-after-free its TaskTracker*).

fdoray and I discussed multiple solutions and settled on WeakRefFactory as a robust solution that will prevent this by design, once and for all.

A WeakRefFactory effectively results in a refcounting scheme which doesn't prevent destruction (i.e. there is still a single owner). Instead, it blocks destruction until all WeakRefs have been recalled (users are of course responsible for ensuring they only destroy such a type when all of its WeakRefs are dead or on their immediate way out -- otherwise destruction will hang, and finding the culprit will be easy :)).
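
For concreteness, here is a minimal sketch of the shape we have in mind. This is hypothetical and not the landed code (see the CLs in review below for the real implementation); std::mutex/std::condition_variable stand in for base's synchronization primitives, and everything except the WeakRef/WeakRefFactory names is invented:

  #include <condition_variable>
  #include <mutex>

  template <typename T>
  class WeakRefFactory;

  // Copyable handle to T. Holding one keeps ~WeakRefFactory() -- and hence,
  // by convention, the destruction of T -- blocked.
  template <typename T>
  class WeakRef {
   public:
    WeakRef(T* ptr, WeakRefFactory<T>* factory) : ptr_(ptr), factory_(factory) {
      factory_->AddRef();
    }
    WeakRef(const WeakRef& other) : WeakRef(other.ptr_, other.factory_) {}
    WeakRef& operator=(const WeakRef&) = delete;
    ~WeakRef() { factory_->ReleaseRef(); }

    T* operator->() const { return ptr_; }

   private:
    T* ptr_;
    WeakRefFactory<T>* factory_;
  };

  // Owned alongside the tracked object; vends WeakRefs and blocks its own
  // destruction until every outstanding WeakRef has been released.
  template <typename T>
  class WeakRefFactory {
   public:
    explicit WeakRefFactory(T* ptr) : ptr_(ptr) {}

    ~WeakRefFactory() {
      // A missing release shows up as a hang here (easy to diagnose) instead
      // of a use-after-free in whichever test happens to run next.
      std::unique_lock<std::mutex> lock(lock_);
      all_refs_released_.wait(lock, [this] { return count_ == 0; });
    }

    WeakRef<T> GetWeakRef() { return WeakRef<T>(ptr_, this); }

   private:
    friend class WeakRef<T>;

    void AddRef() {
      std::lock_guard<std::mutex> lock(lock_);
      ++count_;
    }
    void ReleaseRef() {
      std::lock_guard<std::mutex> lock(lock_);
      if (--count_ == 0)
        all_refs_released_.notify_all();
    }

    T* ptr_;
    std::mutex lock_;
    std::condition_variable all_refs_released_;
    int count_ = 0;
  };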

CLs in review:
 - https://chromium-review.googlesource.com/c/chromium/src/+/1013250 (WeakRefFactory)
 - https://chromium-review.googlesource.com/c/chromium/src/+/1013597 (fixes this issue by using WeakRefFactory in TaskTracker)

Comment 7 by w...@chromium.org, Apr 16 2018

Re #6: WeakRefFactory seems overkill for the Cleanup() worker-accounting bug; basically replacing clear lifetimes with ref-counting.

If there really isn't a way to provide a well-defined lifetime for Worker, then rather than WeakRefFactory I'd suggest we provide a Java-style thread-safe WeakPtr. Basically, the object itself becomes ref-counted (though we can make the references non-copyable if we want, to preserve single-owner-ish semantics), and a weak pointer to the object must be converted to a strong reference at the point of use; the ref-count is checked (so the ref-count must live in a separate Flag-like object) and incremented if non-zero, and that reference is returned to the caller.
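
Purely as a hypothetical sketch of that shape (all names invented; standard-library types stand in for base equivalents), with the upgrade step being the important part:

  #include <memory>
  #include <mutex>
  #include <utility>

  // Separate Flag-like object holding the ref-count; it can outlive the
  // tracked object and be consulted safely from any thread.
  class RefCountFlag {
   public:
    // Takes a reference only if the object is still alive; returns whether a
    // reference was actually taken.
    bool TryAddRef() {
      std::lock_guard<std::mutex> lock(lock_);
      if (count_ == 0)
        return false;
      ++count_;
      return true;
    }

    // Returns true if this call dropped the last reference.
    bool Release() {
      std::lock_guard<std::mutex> lock(lock_);
      return --count_ == 0;
    }

   private:
    std::mutex lock_;
    int count_ = 1;  // The owner holds the initial reference.
  };

  // Strong reference handed back by Lock(); whoever drops the last reference
  // deletes the object. The owner's handle would be this same type.
  template <typename T>
  class StrongRef {
   public:
    StrongRef(T* ptr, std::shared_ptr<RefCountFlag> flag)
        : ptr_(ptr), flag_(std::move(flag)) {}
    StrongRef(StrongRef&& other) noexcept
        : ptr_(other.ptr_), flag_(std::move(other.flag_)) {
      other.ptr_ = nullptr;
    }
    StrongRef(const StrongRef&) = delete;
    StrongRef& operator=(const StrongRef&) = delete;
    ~StrongRef() {
      if (ptr_ && flag_->Release())
        delete ptr_;
    }

    explicit operator bool() const { return ptr_ != nullptr; }
    T* operator->() const { return ptr_; }

   private:
    T* ptr_;
    std::shared_ptr<RefCountFlag> flag_;
  };

  // Weak handle: must be upgraded at the point of use; yields a null
  // StrongRef if the object has already been destroyed.
  template <typename T>
  class ThreadSafeWeakPtr {
   public:
    ThreadSafeWeakPtr(T* ptr, std::shared_ptr<RefCountFlag> flag)
        : ptr_(ptr), flag_(std::move(flag)) {}

    StrongRef<T> Lock() const {
      if (flag_->TryAddRef())
        return StrongRef<T>(ptr_, flag_);
      return StrongRef<T>(nullptr, flag_);
    }

   private:
    T* ptr_;
    std::shared_ptr<RefCountFlag> flag_;
  };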

Comment 8 by ajwong@chromium.org, Apr 16 2018

Drive-by here... I'm not fully up to speed, but introducing yet-another-weak-something concept is a heavyweight hammer, as it adds a lot of conceptual complexity.

Naively, #6 makes it sound like you have an ordering issue in the lifecycle API for SchedulerWorker that we're mitigating by adding refcounting. Is there a way to avoid that?

Refcounts are scarily viral in how much they make readability down the line harder...

Comment 9 by gab@chromium.org, Apr 18 2018

Please see comment on weak_ref.h @ https://chromium-review.googlesource.com/c/chromium/src/+/1013250

Re. Thread-safe WeakPtr. We had discussed this a while back when trying to figure out the memory model for these back pointers but figured the transactional model was overkill because these are never destroyed in production.

I don't think the WeakRef solution is overkill, because the alternative is an AtomicRefCount anyway (see the current SchedulerWorkerPoolImpl::live_workers_count). This just generalizes the refcount to all workers and makes the memory model foolproof. live_workers_count is currently exactly that, but manually managed, and duplicating that logic felt wrong. Putting an explicit WeakRefFactory on the relevant types makes everything nice and explicit.

And it's not refcounting per se (i.e. it doesn't introduce refcounting nightmares) because there's still a single owner. The only failure mode is a hang on destruction but that could only happen in task_scheduler tests themselves. Regular tests that use ScopedTaskEnvironment have the right ordering automatically. (And any other solution has the hang issue IMO)

And it's all in task_scheduler::internal::, so it's not a new confusing refcounting type for everyone.

There are other solutions, but this one documents the memory model clearly (instead of relying on implicit assumptions about destruction order, the reader now knows it has to be true by design), something we've been wanting for a while, since the initial write of the task scheduler.

Comment 10 by ajwong@chromium.org, Apr 2018

I guess my feedback is that I did read the weak_ref.h comment before writing here, and I didn't understand it fully.

The "viral" comment in my head isn't about escaping base (though things have been known to migrate out of base::internal) but about the fact that in the future, I have yet-another-memory-management-method to consider if reading this code.  You are right that you make it explicit in the type system if the reader understands the implications of WeakRef, but there are two futures:
  (a) WeakRef is used a lot so everyone understands it. That's probably bad cause this ownership system doesn't seem to really be needed widely?
  (b) WeakRef isn't used much at all. So anyone reading it comes up from scratch.

In (b), you don't necessarily get a readability win along with the type-safety win.

Comment 11 by ajwong@chromium.org, Apr 2018

Cc: ajwong@chromium.org
FYI, I'm out until Tuesday and don't want to hold this up too much.

So two more thoughts.

First, what about CHECK-failing to indicate an ordering issue instead of doing a blocking join? That's less "friendly", but it would require users to actually understand their shutdown semantics rather than papering over the ordering issue.

Second, I wonder if part of what feels weird here is having two semantics mingled into one class. WeakRef is both a smartish pointer and a synchronization primitive (Block Until No Refs). What if you split the two?

If you took both suggestions (check fail instead of blocking, and not making a smart pointer), you could imagine a class closer to SequenceChecker.
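
Hypothetically, something along these lines (invented names; assert() standing in for CHECK):

  #include <atomic>
  #include <cassert>

  // "Checker, not smart pointer" shape: users register/unregister explicitly,
  // and the tracked object's teardown CHECK-fails (assert() here) if anything
  // is still registered, instead of blocking until nothing is.
  class OutstandingUsersChecker {
   public:
    void AddUser() { users_.fetch_add(1, std::memory_order_relaxed); }
    void RemoveUser() { users_.fetch_sub(1, std::memory_order_release); }

    // Invoked from the tracked object's destructor: crash loudly on bad
    // shutdown ordering rather than papering over it with a join.
    ~OutstandingUsersChecker() {
      assert(users_.load(std::memory_order_acquire) == 0 &&
             "a user outlived the object it points at");
    }

   private:
    std::atomic<int> users_{0};
  };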

Comment 12 by gab@chromium.org, Apr 19 2018

We decided to rename it to "TrackedRef", as there isn't a "weak" portion to the semantics (the pointer is never nulled out from under someone holding one).

The synchronization primitive is exactly the point.

This is our preferred solution because it uses C++ RAII semantics to manage reclaiming workers, so they don't need a contract to notify their owner when they exit their main function. Otherwise workers would need to be explicitly counted on entry and be sure to notify on exit, and explicit Add/Release semantics are uglier than implicit RAII semantics, we think.
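
To make the contrast concrete, a hypothetical side-by-side (all names invented; TrackedTaskTrackerRef stands in for the real TrackedRef<TaskTracker>, and the explicit-notification API does not exist):

  #include <cstdio>

  struct TaskTracker {
    // Hypothetical explicit-notification API (the rejected alternative).
    void OnWorkerMainEntered() { std::puts("worker counted"); }
    void OnWorkerMainExited() { std::puts("worker released"); }
  };

  void RunTasksUntilShutdown(TaskTracker*) { std::puts("running tasks"); }

  // Explicit scheme: both calls must be kept in sync on every exit path.
  void WorkerMainExplicit(TaskTracker* task_tracker) {
    task_tracker->OnWorkerMainEntered();
    RunTasksUntilShutdown(task_tracker);
    task_tracker->OnWorkerMainExited();  // Easy to miss on an early return.
  }

  // RAII scheme: a TrackedRef-style handle (see the sketch in comment 6) is
  // held by value for the duration of the worker's main; it is released
  // automatically on every exit path, and TaskTracker's owner blocks until
  // all handles are gone.
  struct TrackedTaskTrackerRef {
    TaskTracker* tracker;
    ~TrackedTaskTrackerRef() { std::puts("ref released automatically"); }
  };

  void WorkerMainRaii(TrackedTaskTrackerRef task_tracker) {
    RunTasksUntilShutdown(task_tracker.tracker);
  }  // |task_tracker| destroyed here, whichever path main took.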

Comment 13 by gab@chromium.org, Apr 19 2018

Sorry I failed to reply to your specific points:

> First, what about CHECK failing to indicate an ordering issue instead of doing a blocking join?  That's less "friendly" but would require users to actually understand their shutdown semantics rather than papering over the ordering issue?

That's not possible, because we know there is a race (detached workers can still be on their way out, and we have no way to control where they are on the stack; they may be "forever" descheduled right before accessing their raw |task_tracker_| pointer).

The solution is either to wait for them to release an implicit back ref, or to have them notify back when they exit their main. Explicitly managing entry/exit is uglier than having an automated RAII type, we think.

> Second, I wonder if part of what feels weird here is having 2 semantics mingled in to one class. WeakRef is both a smartish-pointer as well as a synchronization primitive (Block Until No Refs). What if you split the two?

TrackedRef is strictly a synchronization primitive, in fact. There's still a single owner. Hopefully renaming it to "TrackedRef" makes this clearer.

This is in CQ at the moment and should land shortly. Hopefully we agree RAII is nicer than manual entry/exit notifications. Otherwise we can iterate after commit.

Thanks,
Gab

Comment 14 by bugdroid1@chromium.org, Apr 19 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c88dae7bb920637f4e3a5800b4093d85e392fd31

commit c88dae7bb920637f4e3a5800b4093d85e392fd31
Author: Gabriel Charette <gab@chromium.org>
Date: Thu Apr 19 19:02:16 2018

[TaskScheduler] Introduce TrackedRef and replace SchedulerWorkerPoolImpl::live_workers_count with it

See tracked_ref.h for details.

This CL is intended to mostly be a no-op. The important bit is introducing TrackedRefs
so they can be used in https://chromium-review.googlesource.com/c/chromium/src/+/1013597
to fix  crbug.com/827615  once and for all

Also clean up |join_for_testing_returned_| which only needs to be a bool in DCHECK_IS_ON() builds.

R=fdoray@chromium.org

Bug:  827615 
Change-Id: Ia7fe988260dbc91164f71816d19a2ac5556271b5
Reviewed-on: https://chromium-review.googlesource.com/1013250
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#552104}
[modify] https://crrev.com/c88dae7bb920637f4e3a5800b4093d85e392fd31/base/BUILD.gn
[modify] https://crrev.com/c88dae7bb920637f4e3a5800b4093d85e392fd31/base/task_scheduler/scheduler_worker_pool_impl.cc
[modify] https://crrev.com/c88dae7bb920637f4e3a5800b4093d85e392fd31/base/task_scheduler/scheduler_worker_pool_impl.h
[add] https://crrev.com/c88dae7bb920637f4e3a5800b4093d85e392fd31/base/task_scheduler/tracked_ref.h
[add] https://crrev.com/c88dae7bb920637f4e3a5800b4093d85e392fd31/base/task_scheduler/tracked_ref_unittest.cc

Comment 15 by w...@chromium.org, Apr 19 2018

Re #13: Only just had a chance to get back to this, sadly only after it landed.

I think the main issue with this template class is that it's a mix of generic and specific:
- It is really only for this one use-case, yet it's templatized.
- Generally all it's doing is associating an action with the last tracked reference being dropped, yet it's hard-wired to Signal a WaitableEvent().

My two suggestions would be:
- Construct this from scoped_refptr<> with a customized Release() that Signals() when there is only one reference remaining.
- If you want it to be a generic template class then split that from the specific WaitableEvent stuff by having the Factory constructed with a base::Closure() to call when no other references remain.
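
As a hypothetical sketch of that second option (invented names; std::function standing in for base::OnceClosure), the factory would only track the count and run a caller-supplied callback when it hits zero:

  #include <functional>
  #include <mutex>
  #include <utility>

  class RefTracker {
   public:
    explicit RefTracker(std::function<void()> on_all_released)
        : on_all_released_(std::move(on_all_released)) {}

    void AddRef() {
      std::lock_guard<std::mutex> lock(lock_);
      ++count_;
    }

    void ReleaseRef() {
      std::function<void()> callback;
      {
        std::lock_guard<std::mutex> lock(lock_);
        if (--count_ == 0)
          callback = std::move(on_all_released_);
      }
      if (callback)
        callback();  // e.g. Signal() a WaitableEvent, or anything else.
    }

   private:
    std::mutex lock_;
    int count_ = 0;
    std::function<void()> on_all_released_;
  };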

Finally, the implementation as landed has no checks to protect against GetTrackedRef() being called _after_ |live_tracked_refs_| has already dropped to zero and caused |ready_to_destroy_| to be signalled, in which case the Wait() in ~TrackedRefFactory() would return immediately, even though references remain.

Comment 16 by bugdroid1@chromium.org, Apr 19 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/881ffae1819eff75cf0329aac9882ba1fe6d3194

commit 881ffae1819eff75cf0329aac9882ba1fe6d3194
Author: Gabriel Charette <gab@chromium.org>
Date: Thu Apr 19 21:21:04 2018

[TaskScheduler] Make TaskTracker be tracked by a TrackedRefFactory.

Managing TaskTracker through TrackedRefs instead of raw pointers is the
final bandaid for issues like  crbug.com/827615 .

It will now be impossible for SchedulerWorkers to outlive their
TaskTracker. A test may hang if its shutdown logic is flawed (out of
order) but the blamed test will always be the one with the poor logic
(instead of a random test in whose context an errand SchedulerWorker makes
a use-after-free).

In fact, a few tests had to be fixed to land this change (while they
weren't necessarily flawed per se per being properly joined, the logic
is easier to grasp with TrackedRefs).

R=fdoray@chromium.org

Bug:  827615 
Change-Id: I5bff8c55b4edb9fef8c5232bddf36da2afd33078
Reviewed-on: https://chromium-review.googlesource.com/1013597
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#552160}
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/platform_native_worker_pool_win.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/platform_native_worker_pool_win.h
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_single_thread_task_runner_manager.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_single_thread_task_runner_manager.h
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_single_thread_task_runner_manager_unittest.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_worker.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_worker.h
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_worker_pool.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_worker_pool.h
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_worker_pool_impl.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_worker_pool_impl.h
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_worker_pool_unittest.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_worker_stack_unittest.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/scheduler_worker_unittest.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/task_scheduler_impl.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/task_tracker.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/task_tracker.h
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/task_scheduler/task_tracker_unittest.cc
[modify] https://crrev.com/881ffae1819eff75cf0329aac9882ba1fe6d3194/base/test/scoped_task_environment.cc

Comment 17 by gab@chromium.org, Apr 19 2018

Re. #15:

> Only just had a chance to get back to this, sadly only after it landed.

Apologies; this solves a critical bug resulting in flaky tests, and landing a solution sooner rather than later was desired. All of the task_scheduler OWNERS agree that TrackedRef is the preferred solution to fix this once and for all, and given that it's in our internal namespace I didn't think we should block on other reviewers.

> I think the main issue with this template class is that it's a mix of generic and specific:
> - It is really only for this one use-case, yet it's templatized.

There are two classes using it, and the template just makes it easier to reason about anyway, IMO (the concept isn't specific to task_scheduler per se).

> - Generally all it's doing is associating an action with the last tracked reference being dropped, yet it's hard-wired to Signal a WaitableEvent().

True (but see below)

> 
> My two suggestions would be:
> - Construct this from scoped_refptr<> with a customized Release() that Signals() when there is only one reference remaining.

It could be tracked with a separate ref-count, but tracking it from the object that represents T* itself keeps this super simple, as it doesn't require holding another member.

> - If you want it to be a generic template class then split that from the specific WaitableEvent stuff by having the Factory constructed with a base::Closure() to call when no other references remain.

That'd be a better generic solution, I agree, but as you said, it's a specialized solution for the scheduler. Splitting this out would only result in both classes that use a TrackedRefFactory also needing a WaitableEvent member and binding things the same way. I don't see a reason to go more generic while it's not used as such.

> 
> Finally, the implementation as landed has no checks to protect against GetTrackedRef() being called _after_ |live_tracked_refs_| has already dropped to zero and caused |ready_to_destroy_| to be signalled, in which case the Wait() in ~TrackedRefFactory() would return immediately, even though references remain.

Good point, done @ https://chromium-review.googlesource.com/#/c/chromium/src/+/1020372 

Thanks for your dedicated input

Comment 18 by bugdroid1@chromium.org, Apr 20 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b65e8848f2cd5575bc38fe68259f6e002baf5eb1

commit b65e8848f2cd5575bc38fe68259f6e002baf5eb1
Author: Gabriel Charette <gab@chromium.org>
Date: Fri Apr 20 02:46:10 2018

[TaskScheduler] DCHECK no TrackedRef obtained after count has already reached zero.

Follow-up to  https://crbug.com/827615#c17 

R=fdoray@chromium.org, wez@chromium.org

Bug:  827615 
Change-Id: I873b3eadd6be973b4e0d7b962d7cd720a663198e
Reviewed-on: https://chromium-review.googlesource.com/1020372
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Cr-Commit-Position: refs/heads/master@{#552241}
[modify] https://crrev.com/b65e8848f2cd5575bc38fe68259f6e002baf5eb1/base/task_scheduler/tracked_ref.h

Comment 19 by gab@chromium.org, Apr 23 2018

Status: Fixed (was: Started)
Pretty sure we shouldn't be seeing rogue SchedulerWorkers anymore. Woohoo!

Comment 20 by w...@chromium.org, Apr 26 2018

gab: Can you advise as to which of the currently filtered TaskScheduler tests in fuchsia.base_unittests.filter can now be re-enabled?

Comment 21 by gab@chromium.org, Apr 26 2018

I looked at [1] earlier this week and determined that none seemed to still be disabled for this crash.

[1] https://cs.chromium.org/chromium/src/testing/buildbot/filters/fuchsia.base_unittests.filter?type=cs&q=TaskScheduler+file:.*filter&sq=package:chromium

A) PostDelayedTask is a timing issue because Fuchsia+QEMU doesn't have an SLA of 250ms on delays (we're not willing to drop that test as that's already stretching any reasonable SLA for scheduling delayed tasks)

B) TaskSchedulerWorkerPoolBlockingTest.* : we haven't yet figured out the reason for this but I don't think it could be caused by rogue workers (maybe... could try to re-enable if you want)

C) TaskSchedulerWorkerPoolHistogramTest.NumTasks*Cleanup : ah maybe this one could have been caused by a rogue worker actually (which would have added a count to a bucket unexpectedly after the buckets were reset to begin the new test)

D) TaskSchedulerWorkerPoolCheckTlsReuse.CheckCleanupWorkers : ah, in my survey earlier this week I thought this one was associated with Fuchsia's failure to handle *many* threads, but that's not it.

I think we can try to re-enable C+D, I'll send a CL, thanks!

Comment 22 by bugdroid1@chromium.org, Apr 27 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9746a243c4b7a2e79f60fd4d164e8da833dbeeee

commit 9746a243c4b7a2e79f60fd4d164e8da833dbeeee
Author: Gabriel Charette <gab@chromium.org>
Date: Fri Apr 27 14:31:27 2018

[Fuchsia+TaskScheduler] Re-enable tests that were flaky because of rogue SchedulerWorkers.

Rogue SchedulerWorkers were fixed once and for all in  crbug.com/827615 

R=wez@chromium.org

Bug:  816170 ,  827615 
Change-Id: I697124ed7cd5282bce8e504427e1daa6adc08df2
Reviewed-on: https://chromium-review.googlesource.com/1031210
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#554386}
[modify] https://crrev.com/9746a243c4b7a2e79f60fd4d164e8da833dbeeee/testing/buildbot/filters/fuchsia.base_unittests.filter

Comment 23 by gab@chromium.org, May 3 2018

Cc: fdoray@chromium.org scottmg@chromium.org
Issue 803900 has been merged into this issue.
