New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 789804 link

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 2
Type: Bug



Sign in to add a comment

AliveCheckerTest.NotifyThenStop regularly fails on Fuchsia

Project Member Reported by w...@chromium.org, Nov 30 2017

Issue description

We often get a "fatal exception" from the AliveCheckerThread, crashing the test:

[00005.644] 01194.01247> <== fatal exception: process media_unittests[3908] thread AliveCheckerThread[4925]
[00005.645] 01194.01247> <== fatal page fault, PC at 0x44f71b5f51c0
[00005.645] 01194.01247>  CS:                   0 RIP:     0x44f71b5f51c0 EFL:            0x10202 CR2:               0x28
[00005.645] 01194.01247>  RAX:     0x2a1d3c275100 RBX:                  0 RCX:     0x44f71b016530 RDX:             0x1b58
[00005.646] 01194.01247>  RSI:                0x1 RDI:                  0 RBP:                0x1 RSP:     0x5ba589798b40
[00005.646] 01194.01247>   R8:                  0  R9:                  0 R10:                  0 R11:              0x246
[00005.647] 01194.01247>  R12:     0x2a1d3c0e6dcc R13:     0x5ba589798dc8 R14:             0x1b58 R15:     0x2a1d3c275100
[00005.647] 01194.01247>  errc:               0x4
[00005.647] 01194.01247> bottom of user stack:
[00005.647] 01194.01247> 0x00005ba589798b40: 3c10eb01 00002a1d 00000001 00000000 |...<.*..........|
[00005.647] 01194.01247> 0x00005ba589798b50: 00000000 00000000 89798d50 00005ba5 |........P.y..[..|
[00005.647] 01194.01247> 0x00005ba589798b60: 00000094 00000000 89798d50 00005ba5 |........P.y..[..|
[00005.648] 01194.01247> 0x00005ba589798b70: 3c0e6dcc 00002a1d 89798d50 00005ba5 |.m.<.*..P.y..[..|
[00005.648] 01194.01247> 0x00005ba589798b80: 89798dc8 00005ba5 3c0e6dcc 00002a1d |..y..[...m.<.*..|
[00005.648] 01194.01247> 0x00005ba589798b90: 89798dc8 00005ba5 00001b58 00000000 |..y..[..X.......|
[00005.648] 01194.01247> 0x00005ba589798ba0: 3c275100 00002a1d 1b016552 000044f7 |.Q'<.*..Re...D..|
[00005.648] 01194.01247> 0x00005ba589798bb0: 00000000 00000000 1ab4f82a 000044f7 |........*....D..|
[00005.648] 01194.01247> 0x00005ba589798bc0: 1ab67144 000044f7 0000007d 00002a1d |Dq...D..}....*..|
[00005.648] 01194.01247> 0x00005ba589798bd0: 1b8b937d 000044f7 89798dc8 00005ba5 |}....D....y..[..|
[00005.649] 01194.01247> 0x00005ba589798be0: 3c0e6dcc 00002a1d 89798dc8 00005ba5 |.m.<.*....y..[..|
[00005.649] 01194.01247> 0x00005ba589798bf0: 1ab7426b 000044f7 1bf9c5d0 000044f7 |kB...D.......D..|
[00005.649] 01194.01247> 0x00005ba589798c00: 89798c78 00005ba5 1b8b9417 000044f7 |x.y..[.......D..|
[00005.649] 01194.01247> 0x00005ba589798c10: efefefef efefefef 1b8e7f3f 000044f7 |........?....D..|
[00005.650] 01194.01247> 0x00005ba589798c20: 00000000 00000000 00000000 00000000 |................|
[00005.650] 01194.01247> 0x00005ba589798c30: 00000000 00000000 00000000 00000000 |................|
[00005.650] 01194.01247> arch: x86_64
[00005.651] 01194.01247> dso: id=2ed9c9cc859c68fc12b4ef7cbc7e8d7d3a614e9d base=0x7ea041422000 name=liblaunchpad.so
[00005.651] 01194.01247> dso: id=d6e9fb1e62bfd459ed2dec036b7ab03b8af647b7 base=0x7be2be175000 name=libc.so
[00005.651] 01194.01247> dso: id=08bb51a867fccef7b32bd0803d15f9780df4969f base=0x79058432c000 name=<vDSO>
[00005.651] 01194.01247> dso: id=849db4a46a2865a4 base=0x78947bae7000 name=libfxl_logging.so
[00005.651] 01194.01247> dso: id=a5c49bbb8fad70ffa18dc0d21afe48a906e654ce base=0x6ec1d5147000 name=libfdio.so
[00005.652] 01194.01247> dso: id=43e40effc277a0f3 base=0x63a67adad000 name=libc++.so.2
[00005.653] 01194.01247> dso: id=4f5dba440e49f4ca base=0x463f0e974000 name=libfsl.so
[00005.653] 01194.01247> dso: id=5fb8214852d0f1d1 base=0x44f71ab05000 name=app:media_unittests
[00005.654] 01194.01247> dso: id=b11d9e4b67c7edf2e8a271c2856fdea3cd820d52 base=0x2b5df8d4a000 name=libtrace-engine.so
[00005.654] 01194.01247> dso: id=309d651ab2f7577f base=0x174d69a6b000 name=libfxl.so
[00005.654] 01194.01247> dso: id=626ed768c4f9e078 base=0xfd568ce2000 name=libunwind.so.1
[00005.654] 01194.01247> dso: id=944d3dc1d4acace1444ae8357308aa0099259115 base=0xa7d6fecf000 name=libasync-default.so
[00005.655] 01194.01247> dso: id=8f7dacc26d4b893d base=0x4ff4b743000 name=libmedia_client.so
[00005.655] 01194.01247> dso: id=f034f421787360bc base=0x2e14ab4e000 name=libc++abi.so.1
#01: media::AliveChecker::NotifyAlive() at ??:?
#02: media::AliveCheckerTest::NotifyAliveMultipleTimes(int, base::TimeDelta) at ??:?
#03: base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) at ??:?
#04: base::MessageLoop::RunTask(base::PendingTask*) at ??:?
#05: base::MessageLoop::DoDelayedWork(base::TimeTicks*) at ??:?
#06: base::MessagePumpDefault::Run(base::MessagePump::Delegate*) at ??:?
#07: base::RunLoop::Run() at ??:?
#08: base::Thread::ThreadMain() at ??:?
#09: base::(anonymous namespace)::ThreadFunc(void*) at ??:?
#10: pc 0x7be2be18d896 sp 0x5ba589798fe0 (libc.so,0x18896)
#11: pc 0x7be2be20464a sp 0x5ba589798ff0 (libc.so,0x8f64a)
#12: pc 0 sp 0x5ba589799000

Looking at the design of the test, it may simply be that when it runs too slowly it risks an Unretained() Bind() to the AliveChecker object, which will be torn-down during test termination, which may occur if the test times-out.
 

Comment 1 by w...@chromium.org, Nov 30 2017

Cc: dalecur...@chromium.org
Cc: grunell@chromium.org olka@chromium.org
Components: Internals>Media>Audio
Not sure what this class is for. +STO folk.
I wrote AliveChecker and its test. It sure seems like it could be a test problem. I can take a closer look next week.

Comment 4 by w...@chromium.org, Dec 2 2017

Owner: grunell@chromium.org
Cc: w...@chromium.org
wez@: is there any way you can confirm it happens at a timeout?

I looks like what happens is that the |alive_checker_thread_| is still running and executes the delayed task NotifyAliveMultipleTimes(), but the |alive_checker_| is gone.

Comment 6 by w...@chromium.org, Dec 7 2017

Re #5: Looking at the test, the  NotifyAliveMultipleTimes posts a fixed number of delayed tasks on the AliveChecker, regardless of whether it has been stopped, or destroyed (see 
https://cs.chromium.org/chromium/src/media/audio/alive_checker_unittest.cc?sq=package:chromium&l=120).

The test will therefore crash if there is enough of a delay between the ResetAliveCheckerOnAliveCheckerThread task being processed, and the thread being torn-down.

Incidentally, we also seem to have NotifyAliveMultipleTimes notifying the AliveChecker the first time on the calling thread, but on subsequent times on the AliveChecker thread. There also seems to be a fair amount of redundant thread-hopping, e.g. the test posts a task to create the AliveChecker on the alive-checker-thread, and then waits for that task to complete; since all the other interaction with the AliveChecker is by tasks posted to the thread, the waiting doesn't seem necessary, and just adds complexity.

Comment 7 by w...@chromium.org, Dec 7 2017

I'm able to repro a crash with this patch:

--- a/media/audio/alive_checker_unittest.cc
+++ b/media/audio/alive_checker_unittest.cc
@@ -86,6 +86,11 @@ class AliveCheckerTest : public testing::Test {
         base::BindOnce(&AliveCheckerTest::ResetAliveCheckerOnAliveCheckerThread,
                        base::Unretained(this), &done));
     done.Wait();
+
+    // Allow the AliveChecker thread to continue running long enough to process
+    // one more notification after it has processed the reset task.
+    base::PlatformThread::Sleep(base::TimeDelta::FromMilliseconds(kNotifyIntervalMs));
+
   }
 
   void CreateAliveChecker(bool stop_at_first_alive_notification,
@@ -121,6 +126,10 @@ class AliveCheckerTest : public testing::Test {
                                 base::TimeDelta delay) {
     alive_checker_->NotifyAlive();
     if (remaining_notifications > 1) {
+
+      // Pause long enough for the AliveChecker to think we're "dead".
+      base::PlatformThread::Sleep(base::TimeDelta::FromMilliseconds(kTimeoutMs));
+
       alive_checker_thread_.task_runner()->PostDelayedTask(
           FROM_HERE,
           base::BindOnce(&AliveCheckerTest::NotifyAliveMultipleTimes,

This introduces a delay after NotifyAlive(), long enough to cause there to be NotifyAliveMultipleTimes tasks pending even after the WaitUntilDetectedDeadWithTimeout deadline (normally all the tasks would run within kNotifyIntervalMs * 10).  We then introduce a delay after the reset task is posted, long enough to allow any in-flight NotifyAliveMultipleTimes task to run - then we see one of those execute, and crash trying to use the already-reset AliveChecker.

You can get the same effect by adding the Sleep() during teardown, but increasing the number of NotifyAliveMultipleTimes() notifications, rather than putting the Sleep in there, since that also causes there to be a delayed task pending that will touch the deleted object if they run.

I'm not sure whether the intention is actually that NotifyAliveMultipleTimes() is a blocking call; that isn't clear from the thread, and since most of the other calls are blocking, it may be that the test author assumed that one is blocking as well, and the design of the test certainly implies that - with non-blocking NotifyAliveMultipleTimes I don't think the test waits for long enough to detect a dead signal, right now.

Incidentally, it would be helpful to add some comments to describe the behaviours the tests are expecting - the current comment describes more or less verbatim what calls the test makes, without explaining _why_ those calls are testing something useful. In the case of NotifyThenStop, IIUC the idea is that after Stop()ing the checker, you should never get a "dead" notification, no matter what.

HTH

Comment 8 by w...@chromium.org, Dec 7 2017

FWIW, my recommendation would be to remove the alive-checker-thread and instead rebuild these tests using a mock clock.
Project Member

Comment 9 by bugdroid1@chromium.org, Dec 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/165013536a7eee0361dd46be6fe210dcec094d28

commit 165013536a7eee0361dd46be6fe210dcec094d28
Author: Wez <wez@chromium.org>
Date: Wed Dec 13 00:37:49 2017

Disable some AliveCheckerTests, which are inherently flaky.

Bug: 789804
Change-Id: I58f9b55c85efd1e3dc6cca47798aa6e3b2aad1b7
Reviewed-on: https://chromium-review.googlesource.com/822732
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#523624}
[modify] https://crrev.com/165013536a7eee0361dd46be6fe210dcec094d28/media/audio/alive_checker_unittest.cc

Comment 10 by w...@chromium.org, Dec 20 2017

AliveCheckterTest.NotifyThenDetectDead flaked on the Fuchsia (dbg) bot:
https://build.chromium.org/p/chromium.fyi/builders/Fuchsia%20%28dbg%29/builds/15163

This is another test which uses NotifyAliveMultipleTimes(), leading to a crash:
#01: pc 0x7d89e3a74416 sp 0xb731cdfcd80 (libmedia.so,0x253416)
#02: pc 0x7d89e3a73e63 sp 0xb731cdfcef0 (libmedia.so,0x252e63)
#03: media::AliveCheckerTest::NotifyAliveMultipleTimes(int, base::TimeDelta) at ??:?
#04: void base::internal::FunctorTraits<void (media::AliveCheckerTest::*)(int, base::TimeDelta), void>::Invoke<media::AliveCheckerTest*, int, base::TimeDelta>(void (media::AliveCheckerTest::*)(int, base::TimeDelta), media::AliveCheckerTest*&&, int&&, base::TimeDelta&&) at ??:?
#05: void base::internal::InvokeHelper<false, void>::MakeItSo<void (media::AliveCheckerTest::*)(int, base::TimeDelta), media::AliveCheckerTest*, int, base::TimeDelta>(void (media::AliveCheckerTest::*&&)(int, base::TimeDelta), media::AliveCheckerTest*&&, int&&, base::TimeDelta&&) at ??:?
#06: void base::internal::Invoker<base::internal::BindState<void (media::AliveCheckerTest::*)(int, base::TimeDelta), base::internal::UnretainedWrapper<media::AliveCheckerTest>, int, base::TimeDelta>, void ()>::RunImpl<void (media::AliveCheckerTest::*)(int, base::TimeDelta), std::__1::tuple<base::internal::UnretainedWrapper<media::AliveCheckerTest>, int, base::TimeDelta>, 0ul, 1ul, 2ul>(void (media::AliveCheckerTest::*&&)(int, base::TimeDelta), std::__1::tuple<base::internal::UnretainedWrapper<media::AliveCheckerTest>, int, base::TimeDelta>&&, std::__1::integer_sequence<unsigned long, 0ul, 1ul, 2ul>) at ??:?
#07: base::internal::Invoker<base::internal::BindState<void (media::AliveCheckerTest::*)(int, base::TimeDelta), base::internal::UnretainedWrapper<media::AliveCheckerTest>, int, base::TimeDelta>, void ()>::RunOnce(base::internal::BindStateBase*) at ??:?
...

Project Member

Comment 11 by bugdroid1@chromium.org, Dec 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8a5ef6b01900b3ca19827854fbafd754f3e2e197

commit 8a5ef6b01900b3ca19827854fbafd754f3e2e197
Author: Wez <wez@chromium.org>
Date: Wed Dec 20 21:11:46 2017

Disable some more AliveCheckerTests.

This disables the remaining tests which use the NotifyAliveMultipleTimes
helper function, until they can be rewritten to use e.g. a MOCK_TIME
ScopedTaskEnvironment instance.

Bug: 789804
Cq-Include-Trybots: master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Change-Id: I496fd5c76ebaed8df56a6c8b93225a2d3b0fe493
Reviewed-on: https://chromium-review.googlesource.com/836736
Commit-Queue: Yuri Wiitala <miu@chromium.org>
Reviewed-by: Yuri Wiitala <miu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#525454}
[modify] https://crrev.com/8a5ef6b01900b3ca19827854fbafd754f3e2e197/media/audio/alive_checker_unittest.cc

Labels: -M-65 M-66

Comment 13 by w...@chromium.org, Mar 7 2018

AliveCheckerTest.SuspendResumeWithAutoStop_NotifyBeforeSuspendAndRestart failed on fuchsia/x64/Debug in https://build.chromium.org/p/chromium.fyi/builders/Fuchsia%20%28dbg%29/builds/16935; the test itself failed some expectations:

[00462.339] 04060.04119> [ RUN      ] AliveCheckerTest.SuspendResumeWithAutoStop_NotifyBeforeSuspendAndRestart
[00462.339] 04060.04119> ../../media/audio/alive_checker_unittest.cc:455: Failure
[00462.340] 04060.04119> Value of: GetDetectedDead()
[00462.340] 04060.04119>   Actual: true
[00462.340] 04060.04119> Expected: false
[00462.340] 04060.04119> ../../media/audio/alive_checker_unittest.cc:462: Failure
[00462.340] 04060.04119> Value of: WaitUntilDetectedDeadWithTimeout( base::TimeDelta::FromMilliseconds(kTimeoutMs + kCheckIntervalMs + 10))
[00462.341] 04060.04119>   Actual: true
[00462.341] 04060.04119> Expected: false
[00462.341] 04060.04119> ../../media/audio/alive_checker_unittest.cc:463: Failure
[00462.342] 04060.04119> Value of: GetDetectedDead()
[00462.342] 04060.04119>   Actual: true
[00462.343] 04060.04119> Expected: false

and it then appears to have stalled or taken sufficiently long that it was terminated by the TestLauncher.

Sign in to add a comment