AliveCheckerTest.NotifyThenStop regularly fails on Fuchsia |
|||||
Issue descriptionWe often get a "fatal exception" from the AliveCheckerThread, crashing the test: [00005.644] 01194.01247> <== fatal exception: process media_unittests[3908] thread AliveCheckerThread[4925] [00005.645] 01194.01247> <== fatal page fault, PC at 0x44f71b5f51c0 [00005.645] 01194.01247> CS: 0 RIP: 0x44f71b5f51c0 EFL: 0x10202 CR2: 0x28 [00005.645] 01194.01247> RAX: 0x2a1d3c275100 RBX: 0 RCX: 0x44f71b016530 RDX: 0x1b58 [00005.646] 01194.01247> RSI: 0x1 RDI: 0 RBP: 0x1 RSP: 0x5ba589798b40 [00005.646] 01194.01247> R8: 0 R9: 0 R10: 0 R11: 0x246 [00005.647] 01194.01247> R12: 0x2a1d3c0e6dcc R13: 0x5ba589798dc8 R14: 0x1b58 R15: 0x2a1d3c275100 [00005.647] 01194.01247> errc: 0x4 [00005.647] 01194.01247> bottom of user stack: [00005.647] 01194.01247> 0x00005ba589798b40: 3c10eb01 00002a1d 00000001 00000000 |...<.*..........| [00005.647] 01194.01247> 0x00005ba589798b50: 00000000 00000000 89798d50 00005ba5 |........P.y..[..| [00005.647] 01194.01247> 0x00005ba589798b60: 00000094 00000000 89798d50 00005ba5 |........P.y..[..| [00005.648] 01194.01247> 0x00005ba589798b70: 3c0e6dcc 00002a1d 89798d50 00005ba5 |.m.<.*..P.y..[..| [00005.648] 01194.01247> 0x00005ba589798b80: 89798dc8 00005ba5 3c0e6dcc 00002a1d |..y..[...m.<.*..| [00005.648] 01194.01247> 0x00005ba589798b90: 89798dc8 00005ba5 00001b58 00000000 |..y..[..X.......| [00005.648] 01194.01247> 0x00005ba589798ba0: 3c275100 00002a1d 1b016552 000044f7 |.Q'<.*..Re...D..| [00005.648] 01194.01247> 0x00005ba589798bb0: 00000000 00000000 1ab4f82a 000044f7 |........*....D..| [00005.648] 01194.01247> 0x00005ba589798bc0: 1ab67144 000044f7 0000007d 00002a1d |Dq...D..}....*..| [00005.648] 01194.01247> 0x00005ba589798bd0: 1b8b937d 000044f7 89798dc8 00005ba5 |}....D....y..[..| [00005.649] 01194.01247> 0x00005ba589798be0: 3c0e6dcc 00002a1d 89798dc8 00005ba5 |.m.<.*....y..[..| [00005.649] 01194.01247> 0x00005ba589798bf0: 1ab7426b 000044f7 1bf9c5d0 000044f7 |kB...D.......D..| [00005.649] 01194.01247> 0x00005ba589798c00: 89798c78 00005ba5 1b8b9417 000044f7 |x.y..[.......D..| [00005.649] 01194.01247> 0x00005ba589798c10: efefefef efefefef 1b8e7f3f 000044f7 |........?....D..| [00005.650] 01194.01247> 0x00005ba589798c20: 00000000 00000000 00000000 00000000 |................| [00005.650] 01194.01247> 0x00005ba589798c30: 00000000 00000000 00000000 00000000 |................| [00005.650] 01194.01247> arch: x86_64 [00005.651] 01194.01247> dso: id=2ed9c9cc859c68fc12b4ef7cbc7e8d7d3a614e9d base=0x7ea041422000 name=liblaunchpad.so [00005.651] 01194.01247> dso: id=d6e9fb1e62bfd459ed2dec036b7ab03b8af647b7 base=0x7be2be175000 name=libc.so [00005.651] 01194.01247> dso: id=08bb51a867fccef7b32bd0803d15f9780df4969f base=0x79058432c000 name=<vDSO> [00005.651] 01194.01247> dso: id=849db4a46a2865a4 base=0x78947bae7000 name=libfxl_logging.so [00005.651] 01194.01247> dso: id=a5c49bbb8fad70ffa18dc0d21afe48a906e654ce base=0x6ec1d5147000 name=libfdio.so [00005.652] 01194.01247> dso: id=43e40effc277a0f3 base=0x63a67adad000 name=libc++.so.2 [00005.653] 01194.01247> dso: id=4f5dba440e49f4ca base=0x463f0e974000 name=libfsl.so [00005.653] 01194.01247> dso: id=5fb8214852d0f1d1 base=0x44f71ab05000 name=app:media_unittests [00005.654] 01194.01247> dso: id=b11d9e4b67c7edf2e8a271c2856fdea3cd820d52 base=0x2b5df8d4a000 name=libtrace-engine.so [00005.654] 01194.01247> dso: id=309d651ab2f7577f base=0x174d69a6b000 name=libfxl.so [00005.654] 01194.01247> dso: id=626ed768c4f9e078 base=0xfd568ce2000 name=libunwind.so.1 [00005.654] 01194.01247> dso: id=944d3dc1d4acace1444ae8357308aa0099259115 base=0xa7d6fecf000 name=libasync-default.so [00005.655] 01194.01247> dso: id=8f7dacc26d4b893d base=0x4ff4b743000 name=libmedia_client.so [00005.655] 01194.01247> dso: id=f034f421787360bc base=0x2e14ab4e000 name=libc++abi.so.1 #01: media::AliveChecker::NotifyAlive() at ??:? #02: media::AliveCheckerTest::NotifyAliveMultipleTimes(int, base::TimeDelta) at ??:? #03: base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) at ??:? #04: base::MessageLoop::RunTask(base::PendingTask*) at ??:? #05: base::MessageLoop::DoDelayedWork(base::TimeTicks*) at ??:? #06: base::MessagePumpDefault::Run(base::MessagePump::Delegate*) at ??:? #07: base::RunLoop::Run() at ??:? #08: base::Thread::ThreadMain() at ??:? #09: base::(anonymous namespace)::ThreadFunc(void*) at ??:? #10: pc 0x7be2be18d896 sp 0x5ba589798fe0 (libc.so,0x18896) #11: pc 0x7be2be20464a sp 0x5ba589798ff0 (libc.so,0x8f64a) #12: pc 0 sp 0x5ba589799000 Looking at the design of the test, it may simply be that when it runs too slowly it risks an Unretained() Bind() to the AliveChecker object, which will be torn-down during test termination, which may occur if the test times-out.
,
Nov 30 2017
Not sure what this class is for. +STO folk.
,
Dec 1 2017
I wrote AliveChecker and its test. It sure seems like it could be a test problem. I can take a closer look next week.
,
Dec 2 2017
,
Dec 5 2017
wez@: is there any way you can confirm it happens at a timeout? I looks like what happens is that the |alive_checker_thread_| is still running and executes the delayed task NotifyAliveMultipleTimes(), but the |alive_checker_| is gone.
,
Dec 7 2017
Re #5: Looking at the test, the NotifyAliveMultipleTimes posts a fixed number of delayed tasks on the AliveChecker, regardless of whether it has been stopped, or destroyed (see https://cs.chromium.org/chromium/src/media/audio/alive_checker_unittest.cc?sq=package:chromium&l=120). The test will therefore crash if there is enough of a delay between the ResetAliveCheckerOnAliveCheckerThread task being processed, and the thread being torn-down. Incidentally, we also seem to have NotifyAliveMultipleTimes notifying the AliveChecker the first time on the calling thread, but on subsequent times on the AliveChecker thread. There also seems to be a fair amount of redundant thread-hopping, e.g. the test posts a task to create the AliveChecker on the alive-checker-thread, and then waits for that task to complete; since all the other interaction with the AliveChecker is by tasks posted to the thread, the waiting doesn't seem necessary, and just adds complexity.
,
Dec 7 2017
I'm able to repro a crash with this patch:
--- a/media/audio/alive_checker_unittest.cc
+++ b/media/audio/alive_checker_unittest.cc
@@ -86,6 +86,11 @@ class AliveCheckerTest : public testing::Test {
base::BindOnce(&AliveCheckerTest::ResetAliveCheckerOnAliveCheckerThread,
base::Unretained(this), &done));
done.Wait();
+
+ // Allow the AliveChecker thread to continue running long enough to process
+ // one more notification after it has processed the reset task.
+ base::PlatformThread::Sleep(base::TimeDelta::FromMilliseconds(kNotifyIntervalMs));
+
}
void CreateAliveChecker(bool stop_at_first_alive_notification,
@@ -121,6 +126,10 @@ class AliveCheckerTest : public testing::Test {
base::TimeDelta delay) {
alive_checker_->NotifyAlive();
if (remaining_notifications > 1) {
+
+ // Pause long enough for the AliveChecker to think we're "dead".
+ base::PlatformThread::Sleep(base::TimeDelta::FromMilliseconds(kTimeoutMs));
+
alive_checker_thread_.task_runner()->PostDelayedTask(
FROM_HERE,
base::BindOnce(&AliveCheckerTest::NotifyAliveMultipleTimes,
This introduces a delay after NotifyAlive(), long enough to cause there to be NotifyAliveMultipleTimes tasks pending even after the WaitUntilDetectedDeadWithTimeout deadline (normally all the tasks would run within kNotifyIntervalMs * 10). We then introduce a delay after the reset task is posted, long enough to allow any in-flight NotifyAliveMultipleTimes task to run - then we see one of those execute, and crash trying to use the already-reset AliveChecker.
You can get the same effect by adding the Sleep() during teardown, but increasing the number of NotifyAliveMultipleTimes() notifications, rather than putting the Sleep in there, since that also causes there to be a delayed task pending that will touch the deleted object if they run.
I'm not sure whether the intention is actually that NotifyAliveMultipleTimes() is a blocking call; that isn't clear from the thread, and since most of the other calls are blocking, it may be that the test author assumed that one is blocking as well, and the design of the test certainly implies that - with non-blocking NotifyAliveMultipleTimes I don't think the test waits for long enough to detect a dead signal, right now.
Incidentally, it would be helpful to add some comments to describe the behaviours the tests are expecting - the current comment describes more or less verbatim what calls the test makes, without explaining _why_ those calls are testing something useful. In the case of NotifyThenStop, IIUC the idea is that after Stop()ing the checker, you should never get a "dead" notification, no matter what.
HTH
,
Dec 7 2017
FWIW, my recommendation would be to remove the alive-checker-thread and instead rebuild these tests using a mock clock.
,
Dec 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/165013536a7eee0361dd46be6fe210dcec094d28 commit 165013536a7eee0361dd46be6fe210dcec094d28 Author: Wez <wez@chromium.org> Date: Wed Dec 13 00:37:49 2017 Disable some AliveCheckerTests, which are inherently flaky. Bug: 789804 Change-Id: I58f9b55c85efd1e3dc6cca47798aa6e3b2aad1b7 Reviewed-on: https://chromium-review.googlesource.com/822732 Reviewed-by: Dale Curtis <dalecurtis@chromium.org> Commit-Queue: Wez <wez@chromium.org> Cr-Commit-Position: refs/heads/master@{#523624} [modify] https://crrev.com/165013536a7eee0361dd46be6fe210dcec094d28/media/audio/alive_checker_unittest.cc
,
Dec 20 2017
AliveCheckterTest.NotifyThenDetectDead flaked on the Fuchsia (dbg) bot: https://build.chromium.org/p/chromium.fyi/builders/Fuchsia%20%28dbg%29/builds/15163 This is another test which uses NotifyAliveMultipleTimes(), leading to a crash: #01: pc 0x7d89e3a74416 sp 0xb731cdfcd80 (libmedia.so,0x253416) #02: pc 0x7d89e3a73e63 sp 0xb731cdfcef0 (libmedia.so,0x252e63) #03: media::AliveCheckerTest::NotifyAliveMultipleTimes(int, base::TimeDelta) at ??:? #04: void base::internal::FunctorTraits<void (media::AliveCheckerTest::*)(int, base::TimeDelta), void>::Invoke<media::AliveCheckerTest*, int, base::TimeDelta>(void (media::AliveCheckerTest::*)(int, base::TimeDelta), media::AliveCheckerTest*&&, int&&, base::TimeDelta&&) at ??:? #05: void base::internal::InvokeHelper<false, void>::MakeItSo<void (media::AliveCheckerTest::*)(int, base::TimeDelta), media::AliveCheckerTest*, int, base::TimeDelta>(void (media::AliveCheckerTest::*&&)(int, base::TimeDelta), media::AliveCheckerTest*&&, int&&, base::TimeDelta&&) at ??:? #06: void base::internal::Invoker<base::internal::BindState<void (media::AliveCheckerTest::*)(int, base::TimeDelta), base::internal::UnretainedWrapper<media::AliveCheckerTest>, int, base::TimeDelta>, void ()>::RunImpl<void (media::AliveCheckerTest::*)(int, base::TimeDelta), std::__1::tuple<base::internal::UnretainedWrapper<media::AliveCheckerTest>, int, base::TimeDelta>, 0ul, 1ul, 2ul>(void (media::AliveCheckerTest::*&&)(int, base::TimeDelta), std::__1::tuple<base::internal::UnretainedWrapper<media::AliveCheckerTest>, int, base::TimeDelta>&&, std::__1::integer_sequence<unsigned long, 0ul, 1ul, 2ul>) at ??:? #07: base::internal::Invoker<base::internal::BindState<void (media::AliveCheckerTest::*)(int, base::TimeDelta), base::internal::UnretainedWrapper<media::AliveCheckerTest>, int, base::TimeDelta>, void ()>::RunOnce(base::internal::BindStateBase*) at ??:? ...
,
Dec 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8a5ef6b01900b3ca19827854fbafd754f3e2e197 commit 8a5ef6b01900b3ca19827854fbafd754f3e2e197 Author: Wez <wez@chromium.org> Date: Wed Dec 20 21:11:46 2017 Disable some more AliveCheckerTests. This disables the remaining tests which use the NotifyAliveMultipleTimes helper function, until they can be rewritten to use e.g. a MOCK_TIME ScopedTaskEnvironment instance. Bug: 789804 Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel Change-Id: I496fd5c76ebaed8df56a6c8b93225a2d3b0fe493 Reviewed-on: https://chromium-review.googlesource.com/836736 Commit-Queue: Yuri Wiitala <miu@chromium.org> Reviewed-by: Yuri Wiitala <miu@chromium.org> Cr-Commit-Position: refs/heads/master@{#525454} [modify] https://crrev.com/8a5ef6b01900b3ca19827854fbafd754f3e2e197/media/audio/alive_checker_unittest.cc
,
Jan 24 2018
,
Mar 7 2018
AliveCheckerTest.SuspendResumeWithAutoStop_NotifyBeforeSuspendAndRestart failed on fuchsia/x64/Debug in https://build.chromium.org/p/chromium.fyi/builders/Fuchsia%20%28dbg%29/builds/16935; the test itself failed some expectations: [00462.339] 04060.04119> [ RUN ] AliveCheckerTest.SuspendResumeWithAutoStop_NotifyBeforeSuspendAndRestart [00462.339] 04060.04119> ../../media/audio/alive_checker_unittest.cc:455: Failure [00462.340] 04060.04119> Value of: GetDetectedDead() [00462.340] 04060.04119> Actual: true [00462.340] 04060.04119> Expected: false [00462.340] 04060.04119> ../../media/audio/alive_checker_unittest.cc:462: Failure [00462.340] 04060.04119> Value of: WaitUntilDetectedDeadWithTimeout( base::TimeDelta::FromMilliseconds(kTimeoutMs + kCheckIntervalMs + 10)) [00462.341] 04060.04119> Actual: true [00462.341] 04060.04119> Expected: false [00462.341] 04060.04119> ../../media/audio/alive_checker_unittest.cc:463: Failure [00462.342] 04060.04119> Value of: GetDetectedDead() [00462.342] 04060.04119> Actual: true [00462.343] 04060.04119> Expected: false and it then appears to have stalled or taken sufficiently long that it was terminated by the TestLauncher. |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by w...@chromium.org
, Nov 30 2017