Hanging test on Fuchsia: OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart |
||
Issue descriptionHey scheduler-ers, I'm putzing around at porting base to Fuchsia and my base_unittests runs are often hanging in OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart. (more complete log below) The debugging environment is anemic (to say the least) at the moment, so I don't have a lot of information about what's going on. Sometimes they don't hang, but mostly they do. e.g. with this command line out/fuch/bin/run_base_unittests --gtest_filter=OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart* at least one of the instances will generally hang. So! I guess something is racing, but I'm not sure what. I tried changing the .Wait at the end of the test to a .TimedWait() with a long timeout, so that it doesn't hang indefinitely, as well as adding a LOG(ERROR) in VerifyTimeAndTaskEnvironmentAndSignalEvent(). Then either everything either works (and there's a LOG) or there's no LOG and the wait times out. So it seems like the task is never getting run at all in the failing case. I know this is all a bit vague and it could be kind of anything, but just wondering if that rings any bells, or if you have any guesses as to where things might be going wrong. ... [00036.422] 02489.02523> [2502/2522] MessageLoopForIOOnOtherThread/FileDescriptorWatcherTest.DeleteControllerAfterFileDescriptorReadable/0 (200 ms) [00051.451] 02489.02523> Still waiting for the following processes to finish: [00051.451] 02489.02523> /system/base_unittests --gtest_flagfile=/tmp/.org.chromium.Chromium.hLemeL/.org.chromium.Chromium.mbBOjn --single-process-tests --test-launcher-output=/tmp/.org.chromium.Chromium.PIpDjk/test_results.xml --test-launcher-retry [00051.451] 02489.02523> -limit=0 [00051.451] 02489.02523> /system/base_unittests --gtest_flagfile=/tmp/.org.chromium.Chromium.hLemeL/.org.chromium.Chromium.DDoPfo --single-process-tests --test-launcher-output=/tmp/.org.chromium.Chromium.doOMAH/test_results.xml --test-launcher-retry [00051.451] 02489.02523> -limit=0 [00066.487] 02489.02523> Still waiting for the following processes to finish: [00066.487] 02489.02523> /system/base_unittests --gtest_flagfile=/tmp/.org.chromium.Chromium.hLemeL/.org.chromium.Chromium.mbBOjn --single-process-tests --test-launcher-output=/tmp/.org.chromium.Chromium.PIpDjk/test_results.xml --test-launcher-retry [00066.487] 02489.02523> -limit=0 [00066.487] 02489.02523> /system/base_unittests --gtest_flagfile=/tmp/.org.chromium.Chromium.hLemeL/.org.chromium.Chromium.DDoPfo --single-process-tests --test-launcher-output=/tmp/.org.chromium.Chromium.doOMAH/test_results.xml --test-launcher-retry [00066.487] 02489.02523> -limit=0 magenta$ magenta$ cat /tmp/.org.chromium.Chromium.hLemeL/.org.chromium.Chromium.mbBOjn --gtest_filter=OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsNoDelayBeforeStart/17:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/0:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/1:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/2:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/3:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/4:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/5:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/6:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/7:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/8tsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/1:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/2:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/3:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/4:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/5:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/6:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/7:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/8raitsWithDelayBeforeStart/3:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/4:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/5:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/6:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/7:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/8edulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/6:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/7:OneTraitsExecutionModePaimagenta$ magenta$ magenta$ cat /tmp/.org.chromium.Chromium.hLemeL/.org.chromium.Chromium.DDoPfo --gtest_filter=OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/9:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/10:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/11:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/12:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/13:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/14:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/15:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/16:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/17:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostTaskViaTaskRunnerBeforeStart/0neTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/11:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/12:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/13:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/14:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/15:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/16:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/17:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostTaskViaTaskRunnerBeforeStart/0askWithTraitsWithDelayBeforeStart/13:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/14:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/15:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/16:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/17:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostTaskViaTaskRunnerBeforeStart/0air/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/16:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart/17:OneTraitsExecutionModePair/TaskSchedulerImplTest.PostTaskViaTaskRunnerBeforeStart/0magenta$ [00081.495] 02489.02523> Still waiting for the following processes to finish: ...
,
Jun 7 2017
[Apologies if this is crazy talk, I hadn't ever looked at the scheduler code until tonight.] My working theory is that there's a race in TaskSchedulerImpl::Start(). The service thread is started, then delayed_task_manager_.Start(service_thread_.task_runner()); but that happens before the worker_pools_ have been Start()d. So if the service thread keeps running, posts its delayed tasks via DelayedTaskManager::AddDelayedTaskNow(), then executes them, all before the worker pools back on the other thread get to Start(), then once those delayed tasks try to run, they get to SchedulerWorker::WakeUp(). The SchedulerWorker has no thread_ (because it's LAZY) so it calls CreateThread(), but SchedulerWorker hasn't even started yet, so CreateThread doesn't do anything, so then the remainder of WakeUp() doesn't do anything becuase no thread was created. Plausible?
,
Jun 7 2017
PostDelayedTaskWithTraitsWithDelayBeforeStart does this:
Constructor:
1. Create a TaskSchedulerImpl.
Body:
2. Create a WaitableEvent.
3. Post a delayed task that verifies the time, the environment and signals an event.
3.1. Since the DelayedTaskManager isn't started, the task is added to tasks_added_before_start_. The delayed run time is computed as TimeTicks::Now() + delay.
4. Wait until the task's delay expires.
5. Start the TaskScheduler.
5.1. Start the service thread.
5.2. Start the DelayedTaskManager.
5.2.1. The DelayedTaskManager sends the task added in tasks_added_before_start_ at step 3.1 to the service thread. The delay is computed as (delayed run time computed at 3.1) - TimeTicks::Now(). Since the delay is expired (see 4), it is likely that the task will immediately be forwarded to a worker pool.
5.3. Start the worker pools.
5.3.1. Create threads in a sleeping state.
5.4.1. For each wake up received before the worker pool was started, wake up a thread.
6. Wait until the event is signaled.
Because of 5.4.1, the fact that "SchedulerWorker hasn't even started yet" when the service threads forwards tasks to worker pools shouldn't be a problem (unless we have a bug in our code :) ).
PostDelayedTaskWithTraitsWithDelayBeforeStart uses TimeTicks::Now() while most of our other tests don't. Is this API working on Fuchsia?
It's hard to say what the problem is without being able to try things myself. I'll try to make a Fuchsia build and I'll come back to you.
,
Jun 7 2017
Thanks for the detailed explanation! I'll dig in some more as it's likely just something that's not implemented (or not implemented properly).
,
Jun 7 2017
Sorry, I'm probably still confused -- in "5.2.1", it looks like AddDelayedTaskNow() posts a delayed task to the service thread's task runner, right? If that posts with a delay of 0, couldn't it run right away, and so get to SchedulerWorker::CreateThread() before 5.3?
,
Jun 7 2017
(Hmm, yup, I'm confused. I don't know if that's possible or not, but experimentally that isn't happening anyway. :)
,
Jun 7 2017
OK, slightly modified theory: With the delayed task in the queue, 5.3 starts. It creates all the SchedulerWorkers but hasn't yet gottent to Start()ing them (but they're in idle_workers_stack_ already) Then the service manager thread runs. Because there's a worker in the stack already, WakeUp() pops a worker, calls WakeUp() and fails because it's not yet started. With https://chromium-review.googlesource.com/c/527674/ patched in on Windows out\Debug\base_unittests --gtest_filter=OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart* --test-launcher-retry-limit=0 fails fairly reliably for me locally.
,
Jun 8 2017
Proposed fix https://chromium-review.googlesource.com/528069.
,
Jun 9 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/35d16cb44ae5afd6eb07f240c2a1e1b8cdf7bd0e commit 35d16cb44ae5afd6eb07f240c2a1e1b8cdf7bd0e Author: Scott Graham <scottmg@chromium.org> Date: Fri Jun 09 18:32:19 2017 Fix startup race in task_scheduler for lazy workers If a task was scheduled before the scheduler was started, then after the workers were created in SchedulerWorkerPoolImpl::Start(), but before the workers were Start()d, the DelayedTaskManager could cause an initial task to race to SchedulerWorkerPoolImpl::WakeUpOneWorker(), pop one of the idle workers (as yet unstarted) and try to WakeUp() that worker. Because the worker's not yet started, WakeUp() does nothing so that task never executes. It feels a little sketchy that WakeUp() doesn't DCHECK(started_) but that fails in other tests if I add that. TEST: OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelayBeforeStart Bug: 730340 Change-Id: Ie4b168652b84c0d961f2391632276f4167c747e6 Reviewed-on: https://chromium-review.googlesource.com/528069 Commit-Queue: Scott Graham <scottmg@chromium.org> Reviewed-by: Francois Doray <fdoray@chromium.org> Cr-Commit-Position: refs/heads/master@{#478344} [modify] https://crrev.com/35d16cb44ae5afd6eb07f240c2a1e1b8cdf7bd0e/base/task_scheduler/scheduler_worker.cc [modify] https://crrev.com/35d16cb44ae5afd6eb07f240c2a1e1b8cdf7bd0e/base/task_scheduler/scheduler_worker.h [modify] https://crrev.com/35d16cb44ae5afd6eb07f240c2a1e1b8cdf7bd0e/base/task_scheduler/scheduler_worker_pool_impl.cc [modify] https://crrev.com/35d16cb44ae5afd6eb07f240c2a1e1b8cdf7bd0e/base/task_scheduler/scheduler_worker_unittest.cc
,
Jun 9 2017
,
Jun 22 2017
Hmm, maybe this wasn't a full fix. I saw a crash in a similar test in bug 735701 , but it's been harder to repro.
,
Jun 22 2017
Bug 735740 too. :/ |
||
►
Sign in to add a comment |
||
Comment 1 by scottmg@chromium.org
, Jun 7 2017