Various TaskSchedulerImplTest sometimes crash/hang on Fuchsia
Issue description
While soaking to find a crash in another test, OneTraitsExecutionModePair/TaskSchedulerImplTest.PostTaskViaTaskRunnerBeforeStart crashed. Not easily reproducible; no useful output in the crash log.
,
Jun 22 2017
Bug 735740 too.
,
Jun 23 2017
Issue 735740 has been merged into this issue.
,
Jun 23 2017
A couple of recent flakes.
One release run: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F5440%2F%2B%2Frecipes%2Fsteps%2Fbase_unittests%2F0%2Fstdout
One debug run: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia__dbg_%2F1076%2F%2B%2Frecipes%2Fsteps%2Fbase_unittests%2F0%2Fstdout
Which specific tests fail is inconsistent, but the overall (flaky) result is consistent.
,
Jun 23 2017
Aha! I caught a crash with a stack. It makes me ... nervous.
[00013.302] 01123.01174> <== fatal exception: process [157426] thread [158207]
[00013.302] 01123.01174> <== fatal page fault, PC at 0x2776db98f7a9
[00013.302] 01123.01174> CS: 0 RIP: 0x2776db98f7a9 EFL: 0x10297 CR2: 0
[00013.302] 01123.01174> RAX: 0xfffffffe RBX: 0x74e676630e18 RCX: 0 RDX: 0x9
[00013.302] 01123.01174> RSI: 0 RDI: 0x74e676630e18 RBP: 0x204b0d449320 RSP: 0x74c24b6dcb70
[00013.302] 01123.01174> R8: 0x10 R9: 0x204b0d44 R10: 0x2c74f1304500 R11: 0x800000000
[00013.302] 01123.01174> R12: 0 R13: 0x74c24b6dce50 R14: 0x204b0d449140 R15: 0x1
[00013.302] 01123.01174> errc: 0x6
[00013.302] 01123.01174> bottom of user stack:
[00013.302] 01123.01174> 0x000074c24b6dcb70: 1e2e29a0 000044ee db97b083 00000000 |.)...D..........|
[00013.302] 01123.01174> 0x000074c24b6dcb80: 00000000 00004ff6 db97aff1 00002776 |.....O......v'..|
[00013.302] 01123.01174> 0x000074c24b6dcb90: 00000000 00000000 db97aff1 00002776 |............v'..|
[00013.302] 01123.01174> 0x000074c24b6dcba0: ffffffff 7fffffff c21fdaa0 00006f46 |............Fo..|
[00013.302] 01123.01174> 0x000074c24b6dcbb0: 1c072920 000038e6 236bf1a8 000027c3 | )...8....k#.'..|
[00013.302] 01123.01174> 0x000074c24b6dcbc0: 1c06cd60 000038e6 98820a80 000063c5 |`....8.......c..|
[00013.302] 01123.01174> 0x000074c24b6dcbd0: e4874020 00007d03 98820a88 000063c5 | @...}.......c..|
[00013.302] 01123.01174> 0x000074c24b6dcbe0: 4b6dce50 000074c2 98820a80 000063c5 |P.mK.t.......c..|
[00013.303] 01123.01174> 0x000074c24b6dcbf0: 0d449320 0000204b 98820a88 000063c5 | .D.K .......c..|
[00013.303] 01123.01174> 0x000074c24b6dcc00: 4b6dce50 000074c2 0d449140 0000204b |P.mK.t..@.D.K ..|
[00013.303] 01123.01174> 0x000074c24b6dcc10: 00000001 00000000 c2198d03 00006f46 |............Fo..|
[00013.303] 01123.01174> 0x000074c24b6dcc20: a54aa020 00000afa 00000000 00000000 | .J.............|
[00013.303] 01123.01174> 0x000074c24b6dcc30: 00000000 00000000 ed5576da 00002d97 |.........vU..-..|
[00013.303] 01123.01174> 0x000074c24b6dcc40: 00000001 00000000 76630e50 000074e6 |........P.cv.t..|
[00013.303] 01123.01174> 0x000074c24b6dcc50: 00000000 00000000 db986aee 00002776 |.........j..v'..|
[00013.303] 01123.01174> 0x000074c24b6dcc60: 4b6dcca0 000074c2 e4874020 00007d03 |..mK.t.. @...}..|
[00013.303] 01123.01174> arch: x86_64
[00013.303] 01123.01174> dso: id=0ee2c1798ac9282426341de1bfb7f8d509010a73 base=0x700f9f64b000 name=libmxio.so
[00013.303] 01123.01174> dso: id=7232e5e96679f46409d5b78b2b94bccadd2de16a base=0x6f46c2122000 name=libc.so
[00013.303] 01123.01174> dso: id=2b6882f235d28d75ef453da68f39774ef99c8388 base=0x4b39eca33000 name=<vDSO>
[00013.303] 01123.01174> dso: id=6682d527b69ec892f730643f7ff711e8f78f4116 base=0x46b56707d000 name=liblaunchpad.so
[00013.303] 01123.01174> dso: id=765f3f6ec08e043f base=0x2d97ed4ab000 name=libc++.so.2
[00013.303] 01123.01174> dso: id=fd295ef1fd598df5 base=0x2776dadcf000 name=app:/system/base_unittests
[00013.303] 01123.01174> dso: id=05a640477f586007 base=0x25c63b95a000 name=libc++abi.so.1
[00013.303] 01123.01174> dso: id=aeae5a3cc3b99502 base=0x19e739f63000 name=libunwind.so.1
[00013.303] 01123.01174> bt#01: pc 0x2776db98f7a9 sp 0x74c24b6dcb70 (app:/system/base_unittests,0xbc07a9)
[00013.303] 01123.01174> bt#02: pc 0x2776db98f484 sp 0x74c24b6dccc0 (app:/system/base_unittests,0xbc0484)
[00013.304] 01123.01174> bt#03: pc 0x2776db98937a sp 0x74c24b6dce30 (app:/system/base_unittests,0xbba37a)
[00013.304] 01123.01174> bt#04: pc 0x2776db999cd2 sp 0x74c24b6dcfb0 (app:/system/base_unittests,0xbcacd2)
[00013.304] 01123.01174> bt#05: pc 0x6f46c213b63f sp 0x74c24b6dcfd0 (libc.so,0x1963f)
[00013.304] 01123.01174> bt#06: pc 0x6f46c21b30da sp 0x74c24b6dcff0 (libc.so,0x910da)
[00013.304] 01123.01174> bt#07: pc 0 sp 0x74c24b6dd000
[00013.304] 01123.01174> bt#08: end
----- start symbolized stack
#01: std::__2::__atomic_base<int, true>::fetch_add(int, std::__2::memory_order) volatile at /work/cr/src/out/fuch/../../third_party/fuchsia-sdk/sysroot/x86_64-fuchsia/include/c++/v1/atomic:1001
(inlined by) base::subtle::NoBarrier_AtomicIncrement(int volatile*, int) at /work/cr/src/out/fuch/../../base/atomicops_internals_portable.h:86
(inlined by) base::internal::TaskTracker::State::DecrementNumTasksBlockingShutdown() at /work/cr/src/out/fuch/../../base/task_scheduler/task_tracker.cc:157
(inlined by) base::internal::TaskTracker::AfterRunTask(base::TaskShutdownBehavior) at /work/cr/src/out/fuch/../../base/task_scheduler/task_tracker.cc:503
#02: base::time_internal::TimeBase<base::TimeTicks>::is_null() const at /work/cr/src/out/fuch/../../base/time/time.h:338
(inlined by) base::internal::TaskTracker::RunNextTask(base::internal::Sequence*) at /work/cr/src/out/fuch/../../base/task_scheduler/task_tracker.cc:248
#03: base::internal::SchedulerWorker::Thread::ThreadMain() at /work/cr/src/out/fuch/../../base/task_scheduler/scheduler_worker.cc:80
#04: base::(anonymous namespace)::ThreadFunc(void*) at /work/cr/src/out/fuch/../../base/threading/platform_thread_posix.cc:73
#05: ?? ??:0
#06: ?? ??:0
----- end symbolized stack
,
Jun 24 2017
Got this again
----- start symbolized stack
#01: std::__2::__atomic_base<int, true>::fetch_add(int, std::__2::memory_order) volatile at /work/cr/src/out/fuch/../../third_party/fuchsia-sdk/sysroot/x86_64-fuchsia/include/c++/v1/atomic:1001
(inlined by) base::subtle::NoBarrier_AtomicIncrement(int volatile*, int) at /work/cr/src/out/fuch/../../base/atomicops_internals_portable.h:86
(inlined by) base::internal::TaskTracker::State::DecrementNumTasksBlockingShutdown() at /work/cr/src/out/fuch/../../base/task_scheduler/task_tracker.cc:157
(inlined by) base::internal::TaskTracker::AfterRunTask(base::TaskShutdownBehavior) at /work/cr/src/out/fuch/../../base/task_scheduler/task_tracker.cc:503
#02: base::time_internal::TimeBase<base::TimeTicks>::is_null() const at /work/cr/src/out/fuch/../../base/time/time.h:338
(inlined by) base::internal::TaskTracker::RunNextTask(base::internal::Sequence*) at /work/cr/src/out/fuch/../../base/task_scheduler/task_tracker.cc:248
#03: base::internal::SchedulerWorker::Thread::ThreadMain() at /work/cr/src/out/fuch/../../base/task_scheduler/scheduler_worker.cc:80
#04: base::(anonymous namespace)::ThreadFunc(void*) at /work/cr/src/out/fuch/../../base/threading/platform_thread_posix.cc:73
#05: ?? ??:0
#06: ?? ??:0
----- end symbolized stack
on ContinueOnShutdown/TaskSchedulerTaskTrackerTest.WillPostAndRunLongTaskBeforeShutdown/0 this time.
Very strange, but at least occasionally reproducible now.
,
Jun 26 2017
(doh, wrong bug, sorry, removing #7)
,
Jun 26 2017
Issue 735160 has been merged into this issue.
,
Jun 26 2017
Similar but different in AllModes/TaskSchedulerSingleThreadTaskRunnerManagerCommonTest.PostTaskAfterShutdown/0:
[00014.224] 01123.01174> <== fatal exception: process [156691] thread [157726]
[00014.224] 01123.01174> <== fatal page fault, PC at 0x5abf2bfa915a
[00014.224] 01123.01174> CS: 0 RIP: 0x5abf2bfa915a EFL: 0x10206 CR2: 0x18
[00014.224] 01123.01174> RAX: 0 RBX: 0x432e2b913e18 RCX: 0x3c8a1d3b000 RDX: 0x1
[00014.224] 01123.01174> RSI: 0x1 RDI: 0 RBP: 0x1 RSP: 0x8c0069ff9c0
[00014.224] 01123.01174> R8: 0 R9: 0 R10: 0x2c1db8b45100 R11: 0x1000f
[00014.224] 01123.01174> R12: 0 R13: 0x18 R14: 0x432e2b913e30 R15: 0x2
[00014.224] 01123.01174> errc: 0x6
[00014.224] 01123.01174> bottom of user stack:
[00014.224] 01123.01174> 0x000008c0069ff9c0: 9317e210 00003fd1 92eff286 00000000 |.....?..........|
[00014.224] 01123.01174> 0x000008c0069ff9d0: 2b913e50 00000000 57fc453c ff27ac6b |P>.+....<E.Wk.'.|
[00014.224] 01123.01174> 0x000008c0069ff9e0: 2b913e30 0000432e 2b913e18 0000432e |0>.+.C...>.+.C..|
[00014.224] 01123.01174> 0x000008c0069ff9f0: a464c960 0000742e 00000000 00000000 |`.d..t..........|
[00014.224] 01123.01174> 0x000008c0069ffa00: 069ffe50 000008c0 2bfa8c9e 00005abf |P..........+.Z..|
[00014.224] 01123.01174> 0x000008c0069ffa10: 00000002 00000000 92f6a0be 00003fd1 |.............?..|
[00014.224] 01123.01174> 0x000008c0069ffa20: 069ffcc8 000008c0 2c006d03 00005abf |.........m.,.Z..|
[00014.224] 01123.01174> 0x000008c0069ffa30: 00000000 00000000 00000000 00000000 |................|
[00014.224] 01123.01174> 0x000008c0069ffa40: 069ffa58 000008c0 069ffa57 000008c0 |X.......W.......|
[00014.224] 01123.01174> 0x000008c0069ffa50: 00000000 00000000 2c06baa0 00005abf |...........,.Z..|
[00014.224] 01123.01174> 0x000008c0069ffa60: 00000000 00000000 00000000 00000000 |................|
[00014.224] 01123.01174> 0x000008c0069ffa70: 00000000 00000000 00000000 00000000 |................|
[00014.224] 01123.01174> 0x000008c0069ffa80: 57fc453c ff27ac6b 931f1001 00003fd1 |<E.Wk.'......?..|
[00014.224] 01123.01174> 0x000008c0069ffa90: 9317e220 00003fd1 931f1028 00003fd1 | ....?..(....?..|
[00014.224] 01123.01174> 0x000008c0069ffaa0: 069ffcc8 000008c0 92f9568c 00003fd1 |.........V...?..|
[00014.224] 01123.01174> 0x000008c0069ffab0: 931f10b0 00003fd1 00000000 00000000 |.....?..........|
[00014.224] 01123.01174> arch: x86_64
[00014.225] 01123.01174> dso: id=aeae5a3cc3b99502 base=0x6fb8a8425000 name=libunwind.so.1
[00014.225] 01123.01174> dso: id=2b6882f235d28d75ef453da68f39774ef99c8388 base=0x66005e9cb000 name=<vDSO>
[00014.225] 01123.01174> dso: id=7232e5e96679f46409d5b78b2b94bccadd2de16a base=0x5abf2bf90000 name=libc.so
[00014.225] 01123.01174> dso: id=05a640477f586007 base=0x44bdc7ba5000 name=libc++abi.so.1
[00014.225] 01123.01174> dso: id=49cf2f4e3d6e4c96 base=0x3fd1923b7000 name=app:/system/base_unittests
[00014.225] 01123.01174> dso: id=0ee2c1798ac9282426341de1bfb7f8d509010a73 base=0x1cd0e5f64000 name=libmxio.so
[00014.225] 01123.01174> dso: id=6682d527b69ec892f730643f7ff711e8f78f4116 base=0x1a00b90d1000 name=liblaunchpad.so
[00014.225] 01123.01174> dso: id=765f3f6ec08e043f base=0x850f9f3d000 name=libc++.so.2
[00014.225] 01123.01174> bt#01: pc 0x5abf2bfa915a sp 0x8c0069ff9c0 (libc.so,0x1915a)
[00014.225] 01123.01174> bt#02: pc 0x5abf2bfa8c9e sp 0x8c0069ffa10 (libc.so,0x18c9e)
[00014.225] 01123.01174> bt#03: pc 0x3fd192f6a0be sp 0x8c0069ffa20 (app:/system/base_unittests,0xbb30be)
[00014.226] 01123.01174> bt#04: pc 0x3fd192f7eaa4 sp 0x8c0069ffb70 (app:/system/base_unittests,0xbc7aa4)
[00014.226] 01123.01174> bt#05: pc 0x3fd192f7e6a1 sp 0x8c0069ffcc0 (app:/system/base_unittests,0xbc76a1)
[00014.226] 01123.01174> bt#06: pc 0x3fd192f7858a sp 0x8c0069ffe30 (app:/system/base_unittests,0xbc158a)
[00014.226] 01123.01174> bt#07: pc 0x3fd192f88ee2 sp 0x8c0069fffb0 (app:/system/base_unittests,0xbd1ee2)
[00014.226] 01123.01174> bt#08: pc 0x5abf2bfa963f sp 0x8c0069fffd0 (libc.so,0x1963f)
[00014.226] 01123.01174> bt#09: pc 0x5abf2c0210da sp 0x8c0069ffff0 (libc.so,0x910da)
[00014.226] 01123.01174> bt#10: pc 0 sp 0x8c006a00000
[00014.226] 01123.01174> bt#11: end
----- start symbolized stack
#01: ?? ??:0
#02: ?? ??:0
#03: base::ConditionVariable::Signal() at /work/cr/src/out/fuch/../../base/synchronization/condition_variable_posix.cc:?
#04: ~AutoSchedulerLock at /work/cr/src/out/fuch/../../base/task_scheduler/scheduler_lock.h:75
(inlined by) base::internal::TaskTracker::DecrementNumPendingUndelayedTasks() at /work/cr/src/out/fuch/../../base/task_scheduler/task_tracker.cc:526
#05: base::internal::TaskTracker::RunNextTask(base::internal::Sequence*) at /work/cr/src/out/fuch/../../base/task_scheduler/task_tracker.cc:258
#06: base::internal::SchedulerWorker::Thread::ThreadMain() at /work/cr/src/out/fuch/../../base/task_scheduler/scheduler_worker.cc:80
#07: base::(anonymous namespace)::ThreadFunc(void*) at /work/cr/src/out/fuch/../../base/threading/platform_thread_posix.cc:73
#08: ?? ??:0
#09: ?? ??:0
----- end symbolized stack
It feels like TaskTracker and therefore state_ must be getting prematurely destroyed, but I don't immediately see how that could happen.
,
Jun 26 2017
This one seems to be the most common:
[00021.731] 02404.02591> ContinueOnShutdown/TaskSchedulerTaskTrackerTest.WillPostAndRunLongTaskBeforeShutdown/0 (../../base/task_scheduler/task_tracker_unittest.cc:253)
[00018.398] 01123.01174> <== fatal exception: process [171679] thread [174237]
[00018.398] 01123.01174> <== fatal page fault, PC at 0x3edfe7863789
[00018.398] 01123.01174> CS: 0 RIP: 0x3edfe7863789 EFL: 0x10297 CR2: 0
[00018.398] 01123.01174> RAX: 0xfffffffe RBX: 0x4dad76be6140 RCX: 0 RDX: 0x9
[00018.398] 01123.01174> RSI: 0 RDI: 0x4dad76be6140 RBP: 0 RSP: 0x4b1398028b70
[00018.398] 01123.01174> R8: 0x8 R9: 0x4dad76be R10: 0x43d78acef500 R11: 0x2 f
[00018.398] 01123.01174> R12: 0 R13: 0x4b1398028e50 R14: 0x4dad76be66e0 R15: 0x1
[00018.398] 01123.01174> errc: 0x6
[00018.398] 01123.01174> bottom of user stack:
[00018.398] 01123.01174> 0x00004b1398028b70: b3916740 000074c9 e784fcac 00003edf |@g...t.......>..|
[00018.398] 01123.01174> 0x00004b1398028b80: 1295a2f8 00005f32 1295a2e0 00005f32 |....2_......2_..|
[00018.398] 01123.01174> 0x00004b1398028b90: 00000000 00000000 e785047f 00003edf |.............>..|
[00018.398] 01123.01174> 0x00004b1398028ba0: ffffffff 7fffffff faa878c8 00005941 |.........x..AY..|
[00018.398] 01123.01174> 0x00004b1398028bb0: 98028bc8 00004b13 9cf18420 000047e4 |.....K.. ....G..|
[00018.398] 01123.01174> 0x00004b1398028bc0: faa878c8 00005941 9cf12880 000047e4 |.x..AY...(...G..|
[00018.398] 01123.01174> 0x00004b1398028bd0: 9cf148a0 000047e4 00000000 00000000 |.H...G..........|
[00018.398] 01123.01174> 0x00004b1398028be0: 9cf18401 000047e4 dd300165 00007855 |.....G..e.0.Ux..|
[00018.398] 01123.01174> 0x00004b1398028bf0: 00000001 00000000 faa878c8 00000000 |.........x......|
[00018.398] 01123.01174> 0x00004b1398028c00: 98028d48 00004b13 00000008 00000000 |H....K..........|
[00018.398] 01123.01174> 0x00004b1398028c10: 98028c50 00004b13 00000001 00000000 |P....K..........|
[00018.398] 01123.01174> 0x00004b1398028c20: 3525f020 00006395 00000000 00000000 | .%5.c..........|
[00018.398] 01123.01174> 0x00004b1398028c30: 00000000 00000000 6f8006da 000041be |...........o.A..|
[00018.398] 01123.01174> 0x00004b1398028c40: 00000001 00000000 76be6178 00004dad |........xa.v.M..|
[00018.398] 01123.01174> 0x00004b1398028c50: 00000000 00000000 e785aace 00003edf |.............>..|
[00018.398] 01123.01174> 0x00004b1398028c60: 98028ca0 00004b13 76be6178 00004dad |.....K..xa.v.M..|
[00018.398] 01123.01174> arch: x86_64
[00018.399] 01123.01174> dso: id=7232e5e96679f46409d5b78b2b94bccadd2de16a base=0x7855dd290000 name=libc.so
[00018.399] 01123.01174> dso: id=05a640477f586007 base=0x6a240b40f000 name=libc++abi.so.1
[00018.399] 01123.01174> dso: id=6682d527b69ec892f730643f7ff711e8f78f4116 base=0x659a812ce000 name=liblaunchpad.so
[00018.399] 01123.01174> dso: id=2b6882f235d28d75ef453da68f39774ef99c8388 base=0x5bd2c8035000 name=<vDSO>
[00018.399] 01123.01174> dso: id=0ee2c1798ac9282426341de1bfb7f8d509010a73 base=0x5702e53c7000 name=libmxio.so
[00018.399] 01123.01174> dso: id=aeae5a3cc3b99502 base=0x5215cc633000 name=libunwind.so.1
[00018.399] 01123.01174> dso: id=765f3f6ec08e043f base=0x41be6f754000 name=libc++.so.2
[00018.399] 01123.01174> dso: id=2b214fc470c9e4e5 base=0x3edfe6ca3000 name=app:/system/base_unittests
[00018.399] 01123.01174> bt#01: pc 0x3edfe7863789 sp 0x4b1398028b70 (app:/system/base_unittests,0xbc0789)
[00018.400] 01123.01174> bt#02: pc 0x3edfe7863464 sp 0x4b1398028cc0 (app:/system/base_unittests,0xbc0464)
[00018.400] 01123.01174> bt#03: pc 0x3edfe785d35a sp 0x4b1398028e30 (app:/system/base_unittests,0xbba35a)
[00018.400] 01123.01174> bt#04: pc 0x3edfe786dcb2 sp 0x4b1398028fb0 (app:/system/base_unittests,0xbcacb2)
[00018.400] 01123.01174> bt#05: pc 0x7855dd2a963f sp 0x4b1398028fd0 (libc.so,0x1963f)
[00018.400] 01123.01174> bt#06: pc 0x7855dd3210da sp 0x4b1398028ff0 (libc.so,0x910da)
[00018.400] 01123.01174> bt#07: pc 0 sp 0x4b1398029000
[00018.401] 01123.01174> bt#08: end
----- start symbolized stack
#01: std::__2::__atomic_base<int, true>::fetch_add(int, std::__2::memory_order) volatile at /work/cr/src/out/fuch/../../third_party/fuchsia-sdk/sysroot/x86_64-fuchsia/include/c++/v1/atomic:1001
(inlined by) base::subtle::NoBarrier_AtomicIncrement(int volatile*, int) at /work/cr/src/out/fuch/../../base/atomicops_internals_portable.h:86
(inlined by) base::internal::TaskTracker::State::DecrementNumTasksBlockingShutdown() at /work/cr/src/out/fuch/../../base/task_scheduler/task_tracker.cc:157
(inlined by) base::internal::TaskTracker::AfterRunTask(base::TaskShutdownBehavior) at /work/cr/src/out/fuch/../../base/task_scheduler/task_tracker.cc:503
#02: base::time_internal::TimeBase<base::TimeTicks>::is_null() const at /work/cr/src/out/fuch/../../base/time/time.h:338
(inlined by) base::internal::TaskTracker::RunNextTask(base::internal::Sequence*) at /work/cr/src/out/fuch/../../base/task_scheduler/task_tracker.cc:248
#03: base::internal::SchedulerWorker::Thread::ThreadMain() at /work/cr/src/out/fuch/../../base/task_scheduler/scheduler_worker.cc:80
#04: base::(anonymous namespace)::ThreadFunc(void*) at /work/cr/src/out/fuch/../../base/threading/platform_thread_posix.cc:73
#05: ?? ??:0
#06: ?? ??:0
----- end symbolized stack
,
Jun 26 2017
Hey Scott, this looks like a racy access to an atomic variable... could it be a bug in atomics on Fuchsia?
,
Jun 26 2017
That's definitely possible too. It happens "sometimes" but not often enough to really debug it yet. I'll try to work on making it more reproducible I think.
,
Jun 26 2017
I just got a memory corruption warning running on a plain Linux build, so I think something's messed up here. Unfortunately no symbols on the build.
I'll try ASan or TSan I guess, maybe that'll clear things up.
Found a corrupted memory buffer in MallocBlock (may be offset from user ptr): buffer index: 0, buffer ptr: 0x3722fa4fd600, size of buffer: 680
Buffer byte 144 is 0x02 (should be 0xcd).
Buffer byte 145 is 0x00 (should be 0xcd).
Buffer byte 146 is 0x00 (should be 0xcd).
Buffer byte 147 is 0x00 (should be 0xcd).
Deleted by thread 0x7fbe5eb3a740
*** WARNING: Cannot convert addresses to symbols in output below.
*** Reason: Cannot find 'pprof' (is PPROF_PATH set correctly?)
*** If you cannot fix this, try running pprof directly.
@ 0x7fbe5e65a233
@ 0x7fbe5e5f38ad
@ 0x7fbe5e5f33b3
@ 0x7fbe5e5f2d95
@ 0xc28b02
@ 0xf3ad4b
@ 0xf497de
@ 0xf3a9e2
@ 0xf2fa05
@ 0xf2ff5f
@ 0xf3530c
@ 0xf4d76e
@ 0xf3c2e2
@ 0xf34faf
@ 0xf82e51
@ 0xf81b92
Memory was written to after being freed. MallocBlock: 0x3722fa4fd600, user ptr: 0x3722fa4fd620, size: 680. If you can't find the source of the error, try using ASan (http://code.google.com/p/address-sanitizer/), Valgrind, or Purify, or study the output of the deleter's stack printed above.
Received signal 11 SEGV_MAPERR 000000000039
#0 0x7fbe5e3263bb base::debug::StackTrace::StackTrace()
#1 0x7fbe5e3250bc base::debug::StackTrace::StackTrace()
#2 0x7fbe5e325ecf base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7fbe5e7be390 <unknown>
#4 0x7fbe5e601cb8 tcmalloc::Abort()
#5 0x7fbe5e609cda LogPrintf()
#6 0x7fbe5e609b5b RAW_VLOG()
#7 0x7fbe5e63202e MallocBlock::CheckForCorruptedBuffer()
#8 0x7fbe5e631d1b MallocBlock::CheckForDanglingWrites()
#9 0x7fbe5e62fb76 MallocBlock::ProcessFreeQueue()
#10 0x7fbe5e633704 MallocBlock::Deallocate()
#11 0x7fbe5e62c3e5 DebugDeallocate()
#12 0x7fbe5e65a233 tc_free
#13 0x7fbe5e5f38ad (anonymous namespace)::TCFree()
#14 0x7fbe5e5f33b3 ShimCppDelete
#15 0x7fbe5e5f2d95 operator delete()
#16 0x00000053564d __gnu_cxx::new_allocator<>::deallocate()
#17 0x00000053561b std::string::_Rep::_M_destroy()
#18 0x000000535595 std::string::_Rep::_M_dispose()
#19 0x000000533c57 std::string::~string()
#20 0x000000f2f513 testing::TestInfo::~TestInfo()
#21 0x000000f2fce7 testing::internal::Delete<>()
#22 0x000000f4a304 std::for_each<>()
#23 0x000000f3b01e testing::internal::ForEach<>()
#24 0x000000f2fc68 testing::TestCase::~TestCase()
#25 0x000000f2fd19 testing::TestCase::~TestCase()
#26 0x000000f35a7b testing::internal::Delete<>()
#27 0x000000f42914 std::for_each<>()
#28 0x000000f3c7ee testing::internal::ForEach<>()
#29 0x000000f3590a testing::internal::UnitTestImpl::~UnitTestImpl()
#30 0x000000f35ae9 testing::internal::UnitTestImpl::~UnitTestImpl()
#31 0x000000f35611 testing::UnitTest::~UnitTest()
#32 0x7fbe5cf6aff8 <unknown>
#33 0x7fbe5cf6b045 exit
#34 0x7fbe5cf51837 __libc_start_main
#35 0x000000532c14 <unknown>
r8: 00007fffe38c0b40 r9: 000000000000002a r10: 000000000000c0e0 r11: 0000000000000202
r12: 0000000000000001 r13: 00007fbe5d2f6c40 r14: 0000000000000000 r15: 0000000000000000
di: 0000000000000002 si: 00007fffe38c0b40 bp: 00007fffe38c0ae0 bx: 0000000000000000
dx: 0000000000000125 ax: 0000000000000000 cx: 0000000000000000 sp: 00007fffe38c0ae0
ip: 00007fbe5e601cb8 efl: 0000000000010246 cgf: 0000000000000033 erf: 0000000000000006
trp: 000000000000000e msk: 0000000000000000 cr2: 0000000000000039
[end of stack trace]
Calling _exit(1). Core file will not be generated.
[1341/2480] TaskSchedulerSingleThreadTaskRunnerManagerTest.DifferentThreadsUsed (2 ms)
[ RUN ] TaskSchedulerSingleThreadTaskRunnerManagerTest.SameThreadUsed
[ OK ] TaskSchedulerSingleThreadTaskRunnerManagerTest.SameThreadUsed (8 ms)
[ RUN ] TaskSchedulerSingleThreadTaskRunnerManagerTest.RunsTasksOnCurrentThread
[ OK ] TaskSchedulerSingleThreadTaskRunnerManagerTest.RunsTasksOnCurrentThread (14 ms)
[ RUN ] TaskSchedulerSingleThreadTaskRunnerManagerTest.SharedWithBaseSyncPrimitivesDCHECKs
[WARNING] ../../third_party/googletest/src/googletest/src/gtest-death-test.cc:836:: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 3 threads.
[ OK ] TaskSchedulerSingleThreadTaskRunnerManagerTest.SharedWithBaseSyncPrimitivesDCHECKs (60 ms)
[----------] 4 tests from TaskSchedulerSingleThreadTaskRunnerManagerTest (85 ms total)
[----------] 1 test from TaskSchedulerSingleThreadTaskRunnerManagerJoinTest
[ RUN ] TaskSchedulerSingleThreadTaskRunnerManagerJoinTest.ConcurrentJoin
[ OK ] TaskSchedulerSingleThreadTaskRunnerManagerJoinTest.ConcurrentJoin (9 ms)
[----------] 1 test from TaskSchedulerSingleThreadTaskRunnerManagerJoinTest (9 ms total)
[----------] 5 tests from TaskSchedulerWorkerStackTest
[ RUN ] TaskSchedulerWorkerStackTest.PushPop
[ OK ] TaskSchedulerWorkerStackTest.PushPop (1 ms)
[ RUN ] TaskSchedulerWorkerStackTest.PeekPop
[ OK ] TaskSchedulerWorkerStackTest.PeekPop (0 ms)
[ RUN ] TaskSchedulerWorkerStackTest.Contains
[ OK ] TaskSchedulerWorkerStackTest.Contains (1 ms)
[ RUN ] TaskSchedulerWorkerStackTest.Remove
[ OK ] TaskSchedulerWorkerStackTest.Remove (0 ms)
[ RUN ] TaskSchedulerWorkerStackTest.PushAfterRemove
[ OK ] TaskSchedulerWorkerStackTest.PushAfterRemove (1 ms)
[----------] 5 tests from TaskSchedulerWorkerStackTest (3 ms total)
[----------] Global test environment tear-down
[==========] 10 tests from 3 test cases ran. (99 ms total)
[ PASSED ] 10 tests.
,
Jun 26 2017
@fdoray/robliao: can you think of a situation that would cause a use-after-free of TaskTracker from SchedulerWorker::ThreadMain() in some scheduler unit tests? (The DecrementNumTasksBlockingShutdown() instance on the stack is likely bogus; it just happens to be the first member access on the freed TaskTracker in that stack, which results in the first deref of the invalid |this|.)
@scottmg: trying to repro with ASan or TSan sure would help.
,
Jun 27 2017
Here are a few hits under ASan Linux:
https://gist.github.com/sgraham/7792bce3a1748cbecf0f2a4bb5c05a14
https://gist.github.com/sgraham/54f7bd309bbc34e8bbc5a6fb471f9c7a
https://gist.github.com/sgraham/f5390f5b93e234860538066fb18063c1
I seem to be able to repro fairly reliably on my current config, so I'll poke at it a bit. The 54f7... one looks similar to the stack I frequently see on Fuchsia. Looks like the test is getting torn down, but the SchedulerWorker::Thread hasn't shut down yet. (?)
,
Jun 27 2017
OK, so worker threads are somehow outliving TaskTracker. This means that
  virtual void TearDownSingleThreadTaskRunnerManager() {
    single_thread_task_runner_manager_->JoinForTesting();
    single_thread_task_runner_manager_.reset();
  }
which is called from TearDown(), doesn't properly join all worker threads before TaskTracker is destroyed along with the fixture...?
@robliao?
Most stacks are in TaskSchedulerSingleThreadTaskRunnerManagerTest, but there are a few in other fixtures above? Might be two different bugs. At this point I'd lean toward saying these are likely test-only races.
,
Jun 27 2017
Nico pointed out https://build.chromium.org/p/chromium.memory/builders/Linux%20ASan%20LSan%20Tests%20%281%29/builds/36788 running base_unittests happily. I can repro maybe 50% of the time, but I'm on 16.04, so maybe that's a relevant variable somehow. Otherwise fairly plain, I think:
sgraham@river:/work/cr/src$ cat out/Debug/args.gn
proprietary_codecs = true
ffmpeg_branding = "Chrome"
is_debug = true
symbol_level = 2
is_component_build = true
is_clang = true
enable_nacl = false
use_goma = true
is_asan = true
(Or maybe it's due to the component build?)
,
Jun 27 2017
On Linux it does seem to be only in TaskSchedulerSingleThreadTaskRunnerManagerTest. I can't get it to repro when running only those tests though, unfortunately; it only happens when running all tests, which makes it awkward to narrow down. Another one that I hadn't seen before, but which is still indirectly that test fixture:
https://gist.github.com/sgraham/a240720e9daace1c0fe6b05a13d3da6a
Hmm. My current suspicion is aimed at SchedulerSingleThreadTaskRunnerManager::JoinForTesting() doing ReleaseSharedSchedulerWorkers(). That could release a worker via Cleanup() and remove it from workers_, but then nothing would join it, so it's still running after test TearDown. But I haven't totally convinced myself of that yet.
,
Jun 27 2017
Hmm, but then that'd only fail in SHARED mode, but it also fails in DEDICATED mode. So maybe that's not it.
,
Jun 27 2017
Thought I'd hit this in a new and exciting place, but no, it's just some other broken thing O_o https://bugs.chromium.org/p/chromium/issues/detail?id=737311.
,
Jun 28 2017
I didn't notice this in the spray of long class names in the output before, but in https://gist.github.com/sgraham/49e9912258fff6068b2637e7ae016912 it seems clear that I've generally copied the wrong test name in most cases. In particular, it _appears_ that AllModes/TaskSchedulerSingleThreadTaskRunnerManagerCommonTest.PostTaskAfterShutdown/0 is crashing, but the memory that's being U-A-F'd is both allocated and freed by base::internal::TaskSchedulerSingleThreadTaskRunnerManagerCommonTest_ThreadNamesSet_Test, that is, the test immediately before it. The crashing thread (T12) was also created by base::internal::TaskSchedulerSingleThreadTaskRunnerManagerCommonTest_ThreadNamesSet_Test::TestBody(). So the fact that it's still running during the subsequent test seems likely not good. I would have thought I could make this more reproducible by adding delays in the SchedulerWorker thread proc, but I haven't been able to, so it's more subtle than that, somehow. I might try scripting something to soak with and scrape the ASan output, to see which tests are really the ones that are leaking threads that outlive the TaskTracker.
,
Jun 28 2017
Ah, I think I found one. I read this comment https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker.cc?l=328 and even fell for it the first time! But despite the fact that they're all one statement, there's still a race. :) If should_exit_ and join_called_for_testing_ are checked first, then the thread is suspended, then the test is torn down, then TaskTracker is gone by the time the rest of the statement is evaluated. With https://chromium-review.googlesource.com/c/551358/1/base/task_scheduler/scheduler_worker.cc applied, even without ASan, a Linux build spews lock errors, because TaskTracker and shutdown_lock_ have been destroyed by the time we get here: https://cs.chromium.org/chromium/src/base/task_scheduler/task_tracker.cc?l=268 . I think there might be more similar issues though, because not all the crash stacks are in the lock code or SchedulerWorker::ShouldExit().
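To make the race concrete, here's a minimal standalone sketch (illustrative only; FakeTracker, worker_should_exit and this ShouldExit are made-up names, not the Chromium code). The exit check is "one statement", but its operands are evaluated one after another, so the object read by the second operand can be destroyed in between:

#include <atomic>
#include <chrono>
#include <memory>
#include <thread>

struct FakeTracker {
  bool IsShutdownComplete() const { return shutdown_complete_; }
  bool shutdown_complete_ = false;
};

std::atomic<bool> worker_should_exit{false};
std::unique_ptr<FakeTracker> tracker = std::make_unique<FakeTracker>();

bool ShouldExit() {
  // One statement, but two reads: if the main thread destroys |tracker| after
  // worker_should_exit is loaded (as false) and before the second operand is
  // evaluated, the worker dereferences freed memory, like the stacks above.
  return worker_should_exit.load() || tracker->IsShutdownComplete();
}

int main() {
  std::thread worker([] {
    while (!ShouldExit())
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
  });
  // Teardown that doesn't wait for the worker to finish its current check:
  worker_should_exit.store(true);
  tracker.reset();  // The worker may still be inside ShouldExit() here.
  worker.join();
}

Being a race, this usually runs fine; the point is only that the window exists.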
,
Jun 28 2017
Aaah, nice one. I think other stacks may also be related: say ShouldExit() returns false (so it doesn't crash there) but TaskTracker is freed immediately after; the body executed based on !ShouldExit() then uses TaskTracker (and races with it being freed at any moment). We will need to be stricter about joining SingleThread workers in unit tests. We will fix.
,
Jun 28 2017
@scottmg: Can you try https://chromium-review.googlesource.com/c/552379/ locally?
,
Jun 28 2017
Still seeing hits in ShouldExit() with that patch:
[2048/2446] Allocator/StatisticsRecorderTest.SetCallbackFailsWithoutHistogramTest/0 (1 ms)
[2049/2446] Allocator/StatisticsRecorderTest.SetCallbackFailsWithoutHistogramTest/1 (0 ms)
[2050/2446] Allocator/StatisticsRecorderTest.SetCallbackFailsWithHistogramTest/0 (0 ms)
[2051/2446] AllModes/TaskSchedulerSingleThreadTaskRunnerManagerCommonTest.PostTaskAfterShutdown/1 (24 ms)
[ RUN ] AllModes/TaskSchedulerSingleThreadTaskRunnerManagerCommonTest.PostDelayedTask/0
[24126:24208:0628/100035.513555:1209272186027:FATAL:lock_impl_posix.cc(63)] Check failed: rv == 0 (22 vs. 0). Invalid argument
#0 0x00000066dd31 in __interceptor_backtrace ??:0:0
#1 0x7f1f2a07422a in base::debug::StackTrace::StackTrace(unsigned long) /work/cr/src/out/Debug/../../base/debug/stack_trace_posix.cc:729:41
#2 0x7f1f2a06e30c in base::debug::StackTrace::StackTrace() /work/cr/src/out/Debug/../../base/debug/stack_trace.cc:199:28
#3 0x7f1f2a201a3c in logging::LogMessage::~LogMessage() /work/cr/src/out/Debug/../../base/logging.cc:553:29
#4 0x7f1f2a5dbe6a in base::internal::LockImpl::Lock() /work/cr/src/out/Debug/../../base/synchronization/lock_impl_posix.cc:63:3
#5 0x7f1f29f6d963 in base::Lock::Acquire() /work/cr/src/out/Debug/../../base/synchronization/lock.h:45:11
#6 0x7f1f2a648869 in base::internal::SchedulerLockImpl::Acquire() /work/cr/src/out/Debug/../../base/task_scheduler/scheduler_lock_impl.cc:137:9
#7 0x7f1f2a62b4b1 in base::internal::AutoSchedulerLock::AutoSchedulerLock(base::internal::SchedulerLock&) /work/cr/src/out/Debug/../../base/task_scheduler/scheduler_lock.h:71:11
#8 0x7f1f2a6af036 in base::internal::TaskTracker::IsShutdownComplete() const /work/cr/src/out/Debug/../../base/task_scheduler/task_tracker.cc:268:21
#9 0x7f1f2a671aea in base::internal::SchedulerWorker::ShouldExit() /work/cr/src/out/Debug/../../base/task_scheduler/scheduler_worker.cc:328:25
#10 0x7f1f2a6735a3 in base::internal::SchedulerWorker::Thread::ThreadMain() /work/cr/src/out/Debug/../../base/task_scheduler/scheduler_worker.cc:54:21
#11 0x7f1f2a6e28f9 in base::(anonymous namespace)::ThreadFunc(void*) /work/cr/src/out/Debug/../../base/threading/platform_thread_posix.cc:71:13
#12 0x7f1f2886f6ba in start_thread ??:0:0
#13 0x7f1f27a333dd in clone /build/glibc-bfm8X4/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:109:0
...
[2048/2446] MemoryPeakDetectorTest.PeakCallbackThrottling (128 ms)
[2049/2446] MemoryPeakDetectorTest.StdDev (189 ms)
[2050/2446] MemoryPeakDetectorTest.Throttle (197 ms)
[2051/2446] AllModes/TaskSchedulerSingleThreadTaskRunnerManagerCommonTest.PostTaskAfterShutdown/1 (4 ms)
[ RUN ] AllModes/TaskSchedulerSingleThreadTaskRunnerManagerCommonTest.PostDelayedTask/0
[26976:27040:0628/100253.888508:1209410561037:FATAL:lock_impl_posix.cc(63)] Check failed: rv == 0 (22 vs. 0). Invalid argument
#0 0x00000066dd31 in __interceptor_backtrace ??:0:0
#1 0x7fc469c0e22a in base::debug::StackTrace::StackTrace(unsigned long) /work/cr/src/out/Debug/../../base/debug/stack_trace_posix.cc:729:41
#2 0x7fc469c0830c in base::debug::StackTrace::StackTrace() /work/cr/src/out/Debug/../../base/debug/stack_trace.cc:199:28
#3 0x7fc469d9ba3c in logging::LogMessage::~LogMessage() /work/cr/src/out/Debug/../../base/logging.cc:553:29
#4 0x7fc46a175e6a in base::internal::LockImpl::Lock() /work/cr/src/out/Debug/../../base/synchronization/lock_impl_posix.cc:63:3
#5 0x7fc469b07963 in base::Lock::Acquire() /work/cr/src/out/Debug/../../base/synchronization/lock.h:45:11
#6 0x7fc46a1e2869 in base::internal::SchedulerLockImpl::Acquire() /work/cr/src/out/Debug/../../base/task_scheduler/scheduler_lock_impl.cc:137:9
#7 0x7fc46a1c54b1 in base::internal::AutoSchedulerLock::AutoSchedulerLock(base::internal::SchedulerLock&) /work/cr/src/out/Debug/../../base/task_scheduler/scheduler_lock.h:71:11
#8 0x7fc46a249036 in base::internal::TaskTracker::IsShutdownComplete() const /work/cr/src/out/Debug/../../base/task_scheduler/task_tracker.cc:268:21
#9 0x7fc46a20baea in base::internal::SchedulerWorker::ShouldExit() /work/cr/src/out/Debug/../../base/task_scheduler/scheduler_worker.cc:328:25
#10 0x7fc46a20d5a3 in base::internal::SchedulerWorker::Thread::ThreadMain() /work/cr/src/out/Debug/../../base/task_scheduler/scheduler_worker.cc:54:21
#11 0x7fc46a27c8f9 in base::(anonymous namespace)::ThreadFunc(void*) /work/cr/src/out/Debug/../../base/threading/platform_thread_posix.cc:71:13
#12 0x7fc4684096ba in start_thread ??:0:0
#13 0x7fc4675cd3dd in clone /build/glibc-bfm8X4/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:109:0
[2052/2446] AllModes/TaskSchedulerSingleThreadTaskRunnerManagerCommonTest.PostDelayedTask/0 (CRASHED)
I don't seem to hit the crash on Fuchsia though with that CL, so that's progress! :)
,
Jun 28 2017
Another problem is in ~SchedulerSingleThreadTaskRunner(). In DEDICATED mode, that releases worker threads without joining them.
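For reference, a generic sketch (assumed names, not the Chromium implementation) of why releasing a worker thread without joining it is dangerous: once the thread is detached, nothing guarantees it finishes before the objects it uses are destroyed.

#include <chrono>
#include <thread>

class Owner {
 public:
  ~Owner() {
    // Safe teardown: join before the members the worker touches go away.
    if (worker_.joinable())
      worker_.join();
  }

  void Start() {
    worker_ = std::thread([this] {
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
      ++state_;  // Use-after-free if |this| has already been destroyed.
    });
  }

  // The hazardous pattern described above: the worker is released/detached and
  // may now outlive |this| (and, in the scheduler's case, the TaskTracker).
  void ReleaseWithoutJoining() { worker_.detach(); }

 private:
  std::thread worker_;
  int state_ = 0;
};

int main() {
  Owner owner;
  owner.Start();
  // If owner.ReleaseWithoutJoining() were called here, the destructor's join
  // would be a no-op and the worker could keep running after |owner| is gone.
}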
,
Jun 29 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/02c80fa9c3cc98fe3ae5e94e881f4a05d9cd174c

commit 02c80fa9c3cc98fe3ae5e94e881f4a05d9cd174c
Author: Scott Graham <scottmg@chromium.org>
Date: Thu Jun 29 00:52:30 2017

fuchsia: Disable TaskScheduler tests

This is unfortunately ~300 tests, but I'd like to get the bot to a reliable state (or at least identify the next racy code).

Bug: 735701
Change-Id: I45aaac819dcb15fda1cc64a9407142a885fe193b
Reviewed-on: https://chromium-review.googlesource.com/553637
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Scott Graham <scottmg@chromium.org>
Cr-Commit-Position: refs/heads/master@{#483234}

[modify] https://crrev.com/02c80fa9c3cc98fe3ae5e94e881f4a05d9cd174c/testing/buildbot/filters/fuchsia.base_unittests.filter
,
Jul 15 2017
,
Jul 17 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/a2136defe9b4d7c0d8ea340862e66b3ab1560e00

commit a2136defe9b4d7c0d8ea340862e66b3ab1560e00
Author: Robert Liao <robliao@chromium.org>
Date: Mon Jul 17 19:40:45 2017

Release Shared SchedulerWorkers After Joining

If shared SchedulerWorkers are released before joining, they are effectively detached and may outlive the SchedulerSingleThreadTaskRunnerManager.

BUG= 735701
Change-Id: Ie83b50b99698a4059041454f579f4a2c1fbaabc0
Reviewed-on: https://chromium-review.googlesource.com/572233
Reviewed-by: Gabriel Charette <gab@chromium.org>
Commit-Queue: Robert Liao <robliao@chromium.org>
Cr-Commit-Position: refs/heads/master@{#487197}

[modify] https://crrev.com/a2136defe9b4d7c0d8ea340862e66b3ab1560e00/base/task_scheduler/scheduler_single_thread_task_runner_manager.cc
,
Jul 17 2017
Thanks Rob. @scottmg: can you let us know what is still failing (ideally under ASan or TSan)?
,
Jul 17 2017
scottmg is OOO right now. We have all the TaskScheduler tests filtered out at the moment, but I'm doing a try run with the filter removed to see what fails. We don't yet have ASan/TSan, though.
,
Jul 17 2017
OK, thanks. Scott was doing ASan/TSan locally, I think, to pinpoint the issues.
,
Jul 20 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4a3ab2babaf8960a0c4747d3fd8e67b66facdc83

commit 4a3ab2babaf8960a0c4747d3fd8e67b66facdc83
Author: Wez <wez@chromium.org>
Date: Thu Jul 20 23:55:39 2017

Remove *TaskScheduler* filter, and more specific suppressions.

Bug: 735701
Change-Id: I99b4e0c075c8c4787ae84eaf45351e5e69d1fa20
Reviewed-on: https://chromium-review.googlesource.com/575308
Reviewed-by: Sergey Ulanov <sergeyu@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#488496}

[modify] https://crrev.com/4a3ab2babaf8960a0c4747d3fd8e67b66facdc83/testing/buildbot/filters/fuchsia.base_unittests.filter
,
Jul 27 2017
Did r488496 re-enable the TaskScheduler tests on Fuchsia? Are we good here? If not, let's sprint through the issues, as they could be product-facing issues.
,
Jul 27 2017
Yes, the tests are enabled now.
,
Jul 27 2017
Awesome, so r487197 was the only fix required?
,
Jul 27 2017
Yes; there were other crbugs for other issues in test code. But this one only looked like multiple tests were failing, because it was crashing in the test after the one that was actually leaking the worker.
,
Jul 27 2017
Oh wow, r487197 fixed it? Nice!
,
Jul 31 2017
Looks like we're not quite out of the woods yet; still seeing hangs on the Fuchsia waterfall:
https://luci-milo.appspot.com/buildbot/chromium.fyi/Fuchsia/7725
[01150.774] 02484.02512> 5 tests timed out:
[01150.775] 02484.02512> HeapAndPersistent/HistogramTest.FactoryTime/1 (../../base/metrics/histogram_unittest.cc:94)
[01150.775] 02484.02512> OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelay/12 (../../base/task_scheduler/task_scheduler_impl_unittest.cc:221)
[01150.775] 02484.02512> OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelay/13 (../../base/task_scheduler/task_scheduler_impl_unittest.cc:221)
[01150.775] 02484.02512> OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelay/3 (../../base/task_scheduler/task_scheduler_impl_unittest.cc:221)
[01150.775] 02484.02512> OneTraitsExecutionModePair/TaskSchedulerImplTest.PostTasksViaTaskRunner/1 (../../base/task_scheduler/task_scheduler_impl_unittest.cc:221)
https://luci-milo.appspot.com/buildbot/chromium.fyi/Fuchsia/7727
[00390.568] 02602.02675> 3 tests timed out:
[00390.569] 02602.02675> OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelay/14 (../../base/task_scheduler/task_scheduler_impl_unittest.cc:221)
[00390.569] 02602.02675> OneTraitsExecutionModePair/TaskSchedulerImplTest.PostDelayedTaskWithTraitsWithDelay/7 (../../base/task_scheduler/task_scheduler_impl_unittest.cc:221)
[00390.569] 02602.02675> OneTraitsExecutionModePair/TaskSchedulerImplTest.PostTasksViaTaskRunner/11 (../../base/task_scheduler/task_scheduler_impl_unittest.cc:221)
,
Jul 31 2017
,
Jul 31 2017
Any way to get better logs? Try under TSAN again maybe?
,
Jul 31 2017
,
Aug 1 2017
There's no TSan on Fuchsia yet. I tried TSan on Linux for a while, but it didn't repro there.
,
Aug 2 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/9759fff7fc48f5d70e0555d51a54540536cf9cc7

commit 9759fff7fc48f5d70e0555d51a54540536cf9cc7
Author: Scott Graham <scottmg@chromium.org>
Date: Wed Aug 02 01:53:03 2017

fuchsia: Filter out some known-flake tests in base_unittests

Bug: 738275, 735701, 750370, 750756
Change-Id: Ibaa143616ce60ae6d94854f9cadc8daf2d84a37c
Reviewed-on: https://chromium-review.googlesource.com/596914
Reviewed-by: Wez <wez@chromium.org>
Commit-Queue: Scott Graham <scottmg@chromium.org>
Cr-Commit-Position: refs/heads/master@{#491200}

[modify] https://crrev.com/9759fff7fc48f5d70e0555d51a54540536cf9cc7/testing/buildbot/filters/fuchsia.base_unittests.filter
,
Aug 3 2017
Some different ones flaked here:
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F8174%2F%2B%2Frecipes%2Fsteps%2Fbase_unittests%2F0%2Fstdout
[01212.357] 02669.02715> TaskSchedulerImplTest.FileDescriptorWatcherNoOpsAfterShutdown (../../base/task_scheduler/task_scheduler_impl_unittest.cc:471)
[01212.357] 02669.02715> TaskSchedulerImplTest.MultipleTraitsExecutionModePairs (../../base/task_scheduler/task_scheduler_impl_unittest.cc:347)
[01212.357] 02669.02715> TaskSchedulerWorkerPoolHistogramTest.NumTasksBetweenWaits (../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:559)
,
Aug 4 2017
Any way to get the hang stack?
,
Aug 11 2017
Another perhaps more debuggable flake: TaskSchedulerWorkerPoolHistogramTest.NumTasksBetweenWaitsWithCleanup
[00119.528] 02951.03022> [1468/2456] TemplateUtil.Less (0 ms)
[00119.528] 02951.03022> [1469/2456] HistogramTesterTest.Scope (4 ms)
[00119.528] 02951.03022> [1470/2456] HistogramTesterTest.GetHistogramSamplesSinceCreationNotNull (1 ms)
[00119.579] 02951.03022> [1471/2456] TaskSchedulerWorkerPoolHistogramTest.NumTasksBetweenWaits (44 ms)
[00119.580] 02951.03022> [1472/2456] TaskSchedulerWorkerPoolHistogramTest.NumTasksBetweenWaitsWithCleanup (1692 ms)
[00119.580] 02951.03022> [ RUN ] TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup
[00119.580] 02951.03022> ../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:767: Failure
[00119.580] 02951.03022> Expected: 1
[00119.580] 02951.03022> To be equal to: histogram->SnapshotSamples()->GetCount(0)
[00119.580] 02951.03022> Which is: 0
[00119.580] 02951.03022> [ FAILED ] TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup (1395 ms)
[00119.580] 02951.03022> [1473/2456] TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup (1395 ms)
[00119.581] 02951.03022> [1474/2456] TaskSchedulerWorkerPoolStandbyPolicyTest.InitOne (18 ms)
[00119.581] 02951.03022> [1475/2456] TaskSchedulerWorkerPoolStandbyPolicyTest.VerifyStandbyThread (1549 ms)
[00119.581] 02951.03022> [1476/2456] TaskSchedulerWorkerStackTest.PushPop (2 ms)
[00119.581] 02951.03022> [1477/2456] TaskSchedulerWorkerStackTest.PeekPop (6 ms)
[00119.581] 02951.03022> [1478/2456] TaskSchedulerWorkerStackTest.Contains (2 ms)
Or maybe something completely different.
,
Aug 12 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/c4822834e91f44835e3fcdd49fcf9ccafe6a84ec

commit c4822834e91f44835e3fcdd49fcf9ccafe6a84ec
Author: Wez <wez@chromium.org>
Date: Sat Aug 12 01:59:30 2017

Fix PlatformThread::CurrentId() under Fuchsia.

Under Fuchsia CurrentId() was returning the pthread_t result of calling pthread_self() as the PlatformThreadId, which under POSIX platforms is defined to pid_t. Under Fuchsia these are an unsigned long derived from the pthread's internal point (i.e. 64-bit), and an int (32-bit) respectively. This is changed to use the platform thread handle, which is a 32-bit value as well.

Bug: 751894, 735701, 738275
Change-Id: Ie5ab21c3c01d609b9b2d5b6a43d04ae57990eaf3
Reviewed-on: https://chromium-review.googlesource.com/611853
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
Cr-Commit-Position: refs/heads/master@{#493935}

[modify] https://crrev.com/c4822834e91f44835e3fcdd49fcf9ccafe6a84ec/base/threading/platform_thread.h
[modify] https://crrev.com/c4822834e91f44835e3fcdd49fcf9ccafe6a84ec/base/threading/platform_thread_posix.cc
,
Aug 12 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/0cdc0f4c4db9d2766b29d3a93b3617c107f9f8b0

commit 0cdc0f4c4db9d2766b29d3a93b3617c107f9f8b0
Author: Wez <wez@chromium.org>
Date: Sat Aug 12 16:45:16 2017

Fix inappropriate use of pthread_self() in PlatformThread::SetName().

We updated PlatformThread::CurrentID() to work correctly for Fuchsia in https://chromium-review.googlesource.com/c/611853, but ironically the Fuchsia version of PlatformThread::SetName() was still mis-using pthread_self() as the PlatformThreadId as well, causing swathes of tests to fail due to ThreadIdManager::Set/Get being called with different Ids (pthread_t vs mx_handle_t) and therefore mis-matching.

TBR: scottmg@chromium.org
Bug: 751894, 735701, 738275
Change-Id: I520560050c18efe624f701b4b9c5925b92bbec9a
Reviewed-on: https://chromium-review.googlesource.com/612027
Reviewed-by: Wez <wez@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#493980}

[modify] https://crrev.com/0cdc0f4c4db9d2766b29d3a93b3617c107f9f8b0/base/threading/platform_thread_fuchsia.cc
,
Aug 15 2017
Failure in https://build.chromium.org/p/chromium.fyi/builders/Fuchsia/builds/8805 on TaskSchedulerWorkerPoolImplTest.PostDelayedTask.
[00215.689] 02705.02792> [2431/2472] Sequenced/TaskSchedulerWorkerPoolImplTest.PostTaskAfterShutdown/0 (219 ms)
[00215.690] 02705.02792> [ RUN ] Sequenced/TaskSchedulerWorkerPoolImplTest.PostDelayedTask/0
[00215.690] 02705.02792> ../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:340: Failure
[00215.690] 02705.02792> Expected: (actual_delay) < (TimeDelta::FromMilliseconds(250) + TestTimeouts::tiny_timeout()), actual: 0.364163 s vs 0.35 s
[00215.690] 02705.02792> [ FAILED ] Sequenced/TaskSchedulerWorkerPoolImplTest.PostDelayedTask/0, where GetParam() = 4-byte object <01-00 00-00> (658 ms)
,
Aug 16 2017
https://build.chromium.org/p/chromium.fyi/builders/Fuchsia/builds/8817 hit this one again:
[00126.073] 02369.02432> [ RUN ] TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup
[00126.073] 02369.02432> ../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:787: Failure
[00126.073] 02369.02432> Expected: 1
[00126.073] 02369.02432> To be equal to: histogram->SnapshotSamples()->GetCount(0)
[00126.073] 02369.02432> Which is: 0
[00126.073] 02369.02432> [ FAILED ] TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup (7765 ms)
,
Aug 18 2017
https://build.chromium.org/p/chromium.fyi/builders/Fuchsia/builds/8927 had a timeout of
[00418.681] 02400.02428> TaskSchedulerTaskTrackerTest.LoadWillPostAndRunBeforeShutdown (../../base/task_scheduler/task_tracker_unittest.cc:728)
Haven't investigated.
,
Sep 11 2017
MAY_BLOCK/TaskSchedulerWorkerPoolBlockingEnterExitTest.ThreadBlockedUnblocked/0 https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F9291%2F%2B%2Frecipes%2Fsteps%2Fbase_unittests%2F0%2Fstdout
,
Sep 11 2017
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F9313%2F%2B%2Frecipes%2Fsteps%2Fbase_unittests%2F0%2Fstdout
[00057.449] 02788.02888> [ RUN ] TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup
[00057.449] 02788.02888> ../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:620: Failure
[00057.451] 02788.02888> Expected: *thread_ref
[00057.451] 02788.02888> Which is: 8-byte object <30-DB F0-75 B7-0B 00-00>
[00057.451] 02788.02888> To be equal to: PlatformThread::CurrentRef()
[00057.452] 02788.02888> Which is: 8-byte object <30-DB 9C-9C 25-1B 00-00>
[00057.453] 02788.02888> ../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:651: Failure
[00057.454] 02788.02888> Expected: (thread_ref) != (PlatformThread::CurrentRef()), actual: 8-byte object <30-DB F0-75 B7-0B 00-00> vs 8-byte object <30-DB F0-75 B7-0B 00-00>
[00057.455] 02788.02888> Worker reused. Worker will not cleanup and the histogram value will be wrong.
[00057.455] 02788.02888> ../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:678: Failure
[00057.456] 02788.02888> Expected: 0
[00057.456] 02788.02888> To be equal to: histogram->SnapshotSamples()->GetCount(1)
[00057.456] 02788.02888> Which is: 1
[00057.456] 02788.02888> ../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:680: Failure
[00057.457] 02788.02888> Expected: 1
[00057.457] 02788.02888> To be equal to: histogram->SnapshotSamples()->GetCount(3)
[00057.457] 02788.02888> Which is: 0
[00057.458] 02788.02888> [ FAILED ] TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup (1340 ms)
[00057.458] 02788.02888> [1520/2530] TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup (1340 ms)
,
Sep 11 2017
Similarly:
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F9328%2F%2B%2Frecipes%2Fsteps%2Fbase_unittests%2F0%2Fstdout
[00054.894] 02742.02836> [ RUN ] TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup
[00054.894] 02742.02836> ../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:620: Failure
[00054.894] 02742.02836> Expected: *thread_ref
[00054.895] 02742.02836> Which is: 8-byte object <30-3B C6-F6 E9-0E 00-00>
[00054.895] 02742.02836> To be equal to: PlatformThread::CurrentRef()
[00054.896] 02742.02836> Which is: 8-byte object <30-2B 96-07 E7-72 00-00>
[00054.896] 02742.02836> ../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:651: Failure
[00054.896] 02742.02836> Expected: (thread_ref) != (PlatformThread::CurrentRef()), actual: 8-byte object <30-3B C6-F6 E9-0E 00-00> vs 8-byte object <30-3B C6-F6 E9-0E 00-00>
[00054.897] 02742.02836> Worker reused. Worker will not cleanup and the histogram value will be wrong.
[00054.897] 02742.02836> ../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:678: Failure
[00054.897] 02742.02836> Expected: 0
[00054.897] 02742.02836> To be equal to: histogram->SnapshotSamples()->GetCount(1)
[00054.898] 02742.02836> Which is: 1
[00054.898] 02742.02836> ../../base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:680: Failure
[00054.898] 02742.02836> Expected: 1
[00054.899] 02742.02836> To be equal to: histogram->SnapshotSamples()->GetCount(3)
[00054.899] 02742.02836> Which is: 0
[00054.899] 02742.02836> [ FAILED ] TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup (1697 ms)
[00054.899] 02742.02836> [1520/2530] TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup (1697 ms)
,
Sep 11 2017
Urgh, this one's ugly:
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F9351%2F%2B%2Frecipes%2Fsteps%2Fbase_unittests%2F0%2Fstdout
[00123.963] 02704.02749> 10 tests failed on exit:
[00123.963] 02704.02749> TaskSchedulerLock.AcquireLocksDifferentThreadsSafely (../../base/task_scheduler/scheduler_lock_unittest.cc:178)
[00123.964] 02704.02749> TaskSchedulerLock.AcquireLocksWithPredecessorDifferentThreadsSafelyNoInterference (../../base/task_scheduler/scheduler_lock_unittest.cc:230)
[00123.964] 02704.02749> TaskSchedulerLock.AcquireLocksWithPredecessorDifferentThreadsSafelyPredecessorFirst (../../base/task_scheduler/scheduler_lock_unittest.cc:192)
[00123.964] 02704.02749> TaskSchedulerLock.AcquireLocksWithPredecessorDifferentThreadsSafelyPredecessorLast (../../base/task_scheduler/scheduler_lock_unittest.cc:211)
[00123.966] 02704.02749> TaskSchedulerLock.PredecessorCycle (../../base/task_scheduler/scheduler_lock_unittest.cc:264)
[00123.966] 02704.02749> TaskSchedulerLock.PredecessorLongerCycle (../../base/task_scheduler/scheduler_lock_unittest.cc:275)
[00123.966] 02704.02749> TaskSchedulerLock.SelfReferentialLock (../../base/task_scheduler/scheduler_lock_unittest.cc:254)
[00123.966] 02704.02749> TaskSchedulerSingleThreadTaskRunnerManagerTest.DifferentThreadsUsed (../../base/task_scheduler/scheduler_single_thread_task_runner_manager_unittest.cc:96)
[00123.967] 02704.02749> TaskSchedulerSingleThreadTaskRunnerManagerTest.RunsTasksInCurrentSequence (../../base/task_scheduler/scheduler_single_thread_task_runner_manager_unittest.cc:149)
[00123.967] 02704.02749> TaskSchedulerSingleThreadTaskRunnerManagerTest.SameThreadUsed (../../base/task_scheduler/scheduler_single_thread_task_runner_manager_unittest.cc:122)
,
Jan 19 2018
Filed issue 803900 for what looks like timing-dependent flake in a TaskScheduler test.
,
Jan 19 2018
gab: We have a hook in place so that when a test job process hangs we should get a full set of thread dumps - you can see an example in issue 803900 (though that is a straightforward case of invalid expectations, I think). We can re-enable some of the filtered-out tests on our FYI bots to gather hang backtraces.
,
Jan 22 2018
@wez: sounds very exciting, let's wait until issue 803900 is fixed and then enable it to diagnose remaining errors?
,
Jan 22 2018
Re #60: The thread dumps are in issue 803900. We see a variety of tests hang[1] on the Fuchsia bots now and again, so we have a load of these diagnostics in place to try to track down the issues.
[1] The thread dumps are triggered when the TestLauncher thinks that a test job process (one of the processes that actually runs a batch of tests) has taken too long, so technically the process may not actually be hung, just excessively slow. In practice, though, our timeouts are so huge that processes hitting them _are_ hung. :)
,
Feb 8 2018
,
Feb 8 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/2cf7a75a8ee8fe4d1a28dedd0841e3806df2896e

commit 2cf7a75a8ee8fe4d1a28dedd0841e3806df2896e
Author: Scott Graham <scottmg@chromium.org>
Date: Thu Feb 08 19:55:40 2018

fuchsia: Filter TaskSchedulerWorkerPoolHistogramTest.NumTasksBeforeCleanup

https://build.chromium.org/p/chromium.fyi/builders/Fuchsia%20%28dbg%29/builds/16001

Bug: 735701
Change-Id: Ic21baf4b50b66ecd3b765763f10a4a82189e3171
Reviewed-on: https://chromium-review.googlesource.com/909233
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#535479}

[modify] https://crrev.com/2cf7a75a8ee8fe4d1a28dedd0841e3806df2896e/testing/buildbot/filters/fuchsia.base_unittests.filter
,
Feb 21 2018
,
Feb 21 2018
,
Feb 21 2018
,
Feb 24 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/35b3464dccd14d07d154bd660adc9a608ed5bd5c

commit 35b3464dccd14d07d154bd660adc9a608ed5bd5c
Author: Wez <wez@chromium.org>
Date: Sat Feb 24 23:49:27 2018

Broaden test filter for TaskSchedulerWorkerPoolHistogramTests.

These tests rely on thread sleeps for synchronization, which makes them racey when run on heavily-loaded systems.

TBR: scottmg
Bug: 816170, 735701
Change-Id: I9cad471cf8bc88fd91dc33662489983f13f18786
Reviewed-on: https://chromium-review.googlesource.com/936446
Reviewed-by: Wez <wez@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#539025}

[modify] https://crrev.com/35b3464dccd14d07d154bd660adc9a608ed5bd5c/testing/buildbot/filters/fuchsia.base_unittests.filter
,
Feb 26 2018
,
Mar 1 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/274f2e7f66d77271128910677f72b802a92e6c78

commit 274f2e7f66d77271128910677f72b802a92e6c78
Author: Gabriel Charette <gab@chromium.org>
Date: Thu Mar 01 15:03:00 2018

[SchedulerWorkerPoolImpl] Make sure workers do not cleanup during WaitForWorkersIdle*().

While auditing the code I noticed that this was a problem: if a worker cleans up while waiting for n workers to be idle, the idle stack will shrink and this condition may never be reached.

This prevents this completely for WaitForAllWorkersIdleForTesting() and greatly reduces the likelihood for WaitForWorkersIdleForTesting(n). WaitForAllWorkersIdleForTesting() is safe because "n" is obtained under the lock. Whereas an external call to WaitForWorkersIdleForTesting(n) could wait on a worker which was already detached. But in practice if the reclaim timeout is long enough and the main thread invokes this immediately after unblocking a test task and waiting for it to resolve, the race would require the main thread to be preempted for the full reclaim timeout for no reason which seems highly unlikely. Audited current usage and added a comment as such.

R=fdoray@chromium.org, robliao@chromium.org

Bug: 816170, 735701
Change-Id: Ie7b4fd81c04f53cc9597ebc02a25b39d4f1d658f
Reviewed-on: https://chromium-review.googlesource.com/937361
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Reviewed-by: Robert Liao <robliao@chromium.org>
Cr-Commit-Position: refs/heads/master@{#540154}

[modify] https://crrev.com/274f2e7f66d77271128910677f72b802a92e6c78/base/task_scheduler/scheduler_worker_pool_impl.cc
[modify] https://crrev.com/274f2e7f66d77271128910677f72b802a92e6c78/base/task_scheduler/scheduler_worker_pool_impl.h
,
Mar 1 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/0fa2041e49fc2c94f22c314bde7ef0ab5ffe105c

commit 0fa2041e49fc2c94f22c314bde7ef0ab5ffe105c
Author: Gabriel Charette <gab@chromium.org>
Date: Thu Mar 01 15:30:20 2018

[SchedulerWorkerPoolImpl] Introduce WaitForNWorkersCleanedUpForTesting()

And use it instead of flaky sleep timings in tests that exercise the reclaim timeout.

Bonus: this speeds up these tests from ~1500ms to ~500ms in the normal case.

R=fdoray@chromium.org, robliao@chromium.org

Bug: 735701, 803900
Change-Id: I3b271fda6c7b42033ac7b3c128e8663e80acebb3
Reviewed-on: https://chromium-review.googlesource.com/937505
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Reviewed-by: Robert Liao <robliao@chromium.org>
Cr-Commit-Position: refs/heads/master@{#540162}

[modify] https://crrev.com/0fa2041e49fc2c94f22c314bde7ef0ab5ffe105c/base/task_scheduler/scheduler_worker_pool_impl.cc
[modify] https://crrev.com/0fa2041e49fc2c94f22c314bde7ef0ab5ffe105c/base/task_scheduler/scheduler_worker_pool_impl.h
[modify] https://crrev.com/0fa2041e49fc2c94f22c314bde7ef0ab5ffe105c/base/task_scheduler/scheduler_worker_pool_impl_unittest.cc
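For reference, a generic sketch of the wait-for-condition pattern that CL describes, using made-up names (CleanupWaiter etc.), not the actual Chromium API: instead of sleeping for a fixed time and hoping the workers have cleaned up, the test blocks until the workers have actually reported cleanup.

#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class CleanupWaiter {
 public:
  // Called by each worker as its last action before exiting.
  void NotifyCleanedUp() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++cleaned_up_;
    cv_.notify_all();
  }

  // Called by the test: returns once |n| workers have reported cleanup, so the
  // test never depends on how long a sleep "should" be on a loaded bot.
  void WaitForNCleanedUp(int n) {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [&] { return cleaned_up_ >= n; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int cleaned_up_ = 0;
};

int main() {
  CleanupWaiter waiter;
  std::vector<std::thread> workers;
  for (int i = 0; i < 3; ++i)
    workers.emplace_back([&] { waiter.NotifyCleanedUp(); });
  waiter.WaitForNCleanedUp(3);  // Deterministic, unlike a fixed sleep.
  for (auto& t : workers)
    t.join();
}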
,
Apr 26 2018
,
Apr 27 2018
I think we can call this fixed. Issue 768436 is tracking the last flakes in TaskSchedulerWorkerPoolBlockingTests (fix incoming). Woohoo :)! Thanks Fuchsia for being the best stress test of TaskScheduler tests!