New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 831386 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
OOO Dec 22 - Jan 8
Closed: Jun 2018
EstimatedDays: ----
NextAction: ----
OS: Linux , Android
Pri: 2
Type: Bug


Participants' hotlists:
Audio-Service


Sign in to add a comment

audio service unittest is flaking in services_unittests

Project Member Reported by roc...@chromium.org, Apr 10 2018

Issue description

ServiceQuitsWhenLastClientDisconnects is flaky, at least on Android.

The oldest instance of failure I can find is https://ci.chromium.org/buildbot/tryserver.chromium.android/linux_android_rel_ng/525501 but it may go back further than that.

 

Comment 1 by olka@chromium.org, Apr 11 2018

Thanks!

An earlier failure https://ci.chromium.org/buildbot/tryserver.chromium.android/linux_android_rel_ng/524630

Log
[  FAILED  ] InProcessAudioService/ServiceLifetimeTestTemplate/0.ServiceQuitsWhenLastClientDisconnects, where TypeParam = <type>
 1 FAILED TEST
<<ScopedMainEntryLogger
[ RUN      ] InProcessAudioService/ServiceLifetimeTestTemplate/0.ServiceQuitsWhenLastClientDisconnects
../../services/audio/test/service_lifetime_test_template.h:88: Failure
Mock function called more times than expected - returning directly.
    Function call: ServiceStarted()
         Expected: to be never called
           Actual: called once - over-saturated and active
../../services/audio/test/service_lifetime_test_template.h:86: Failure
Mock function called more times than expected - returning directly.
    Function call: ServiceStopped()
         Expected: to be called once
           Actual: called twice - over-saturated and active
../../services/audio/test/service_lifetime_test_template.h:86: Failure
Mock function called more times than expected - returning directly.
    Function call: ServiceStopped()
         Expected: to be called once
           Actual: called 3 times - over-saturated and active

Comment 3 by olka@chromium.org, Apr 11 2018

I'm able to flakily repro it locally on my Linux desktop when setting service quit timeout to 1 microsecond in the test.

The failing test is this one [1]. I added some traces locally [2], and I see that BindingSet [3] is reserving OnConnectionError() calls, and service context ref counter becomes 0 => service quits.
For some reason this is easier to hit when running a bunch of tests rather than only the failing one.

Example (this one did no fail but ref counting still looks odd):
 RUN      ] InProcessAudioService/ServiceLifetimeTestTemplate/0.ServiceQuitsWhenLastClientDisconnects
[67685:67685:0411/195800.113797:4436064809357:ERROR:service.cc(50)] audio::Service::OnBindInterface: audio::mojom::SystemInfo
[67685:67685:0411/195800.114219:4436064809533:ERROR:service_context_ref.cc(78)] **AddRef
[67685:67685:0411/195800.114930:4436064810244:ERROR:service_observer_mock.cc(25)] Start: Service identity, name:audio user_id:505C0EE9-3013-43C0-82B0-A84F50CF8D84 instance:
[67685:67685:0411/195800.116290:4436064811604:ERROR:service_context_ref.cc(83)] **Release <<<<<<<<<< Why?
[67685:67685:0411/195800.116377:4436064811689:ERROR:service.cc(107)] Starting a timer
[67685:67685:0411/195800.116993:4436064812307:ERROR:service.cc(50)] audio::Service::OnBindInterface: audio::mojom::SystemInfo
[67685:67685:0411/195800.117084:4436064812397:ERROR:service_context_ref.cc(78)] **AddRef
[67685:67685:0411/195800.117797:4436064813111:ERROR:service.cc(50)] audio::Service::OnBindInterface: audio::mojom::SystemInfo
[67685:67685:0411/195800.117883:4436064813195:ERROR:service_context_ref.cc(78)] **AddRef
[67685:67685:0411/195800.118847:4436064814161:ERROR:service_context_ref.cc(83)] **Release
[67685:67685:0411/195800.119317:4436064814631:ERROR:service_context_ref.cc(83)] **Release
[67685:67685:0411/195800.119392:4436064814704:ERROR:service.cc(107)] Starting a timer
[67685:67685:0411/195800.119666:4436064814980:ERROR:service.cc(115)] MaybeRequestQuit
[67685:67685:0411/195800.121714:4436064817029:ERROR:service_observer_mock.cc(35)] Stop: Service identity, name:audio user_id:505C0EE9-3013-43C0-82B0-A84F50CF8D84 instance:




[1] https://cs.chromium.org/chromium/src/services/audio/test/service_lifetime_test_template.h?type=cs&l=73

[2] https://chromium-review.googlesource.com/#/c/chromium/src/+/1007057 

[3] https://cs.chromium.org/chromium/src/services/audio/system_info.h?type=cs&l=59

Comment 4 by olka@chromium.org, Apr 11 2018

Cc: olka@chromium.org
Labels: -Pri-3 OS-Linux Pri-2
Owner: roc...@chromium.org
So we have ServiceContextRefFactory::AddRef() followed by Release(), and then AddRef()/AddRef()/Release()/Release() - which does not make sense to me taking into account how the test looks. I also saw runs where the test fails and the sequence is AddRef()/Release()/AddRef()/Release()/AddRef()/Release().

Below is how backtrace looks like when ref count is decreased.
mojo::SimpleWatcher decides that something is wrong? Is it right?

Ken, could you comment/suggest where to look at?




#0 0x7ff01ed1a23d base::debug::StackTrace::StackTrace()
#1 0x7ff01ed187fc base::debug::StackTrace::StackTrace()
#2 0x7ff01be02a03 service_manager::ServiceContextRefFactory::Release()
#3 0x7ff01be02cc3 service_manager::ServiceContextRefImpl::~ServiceContextRefImpl()
#4 0x7ff01be02e09 service_manager::ServiceContextRefImpl::~ServiceContextRefImpl()
#5 0x0000014d3952 mojo::BindingSetBase<>::Entry::~Entry()
#6 0x0000014d515a mojo::BindingSetBase<>::OnConnectionError()
#7 0x0000014d4a6b mojo::BindingSetBase<>::Entry::OnConnectionError()
#8 0x0000006f1d3f _ZN4base8internal13FunctorTraitsIMN4mojo14BindingSetBaseIN15service_manager5mojom14ServiceFactoryENS2_7BindingIS6_NS2_19RawPtrImplRefTraitsIS6_EEEEvE5EntryEFvjRKNSt3__112basic_stringIcNSD_11char_traitsIcEENSD_9allocatorIcEEEEEvE6InvokeIPSC_JjSL_EEEvSN_OT_DpOT0_
#9 0x0000006f1c6a _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIMN4mojo14BindingSetBaseIN15service_manager5mojom14ServiceFactoryENS4_7BindingIS8_NS4_19RawPtrImplRefTraitsIS8_EEEEvE5EntryEFvjRKNSt3__112basic_stringIcNSF_11char_traitsIcEENSF_9allocatorIcEEEEEJPSE_jSN_EEEvOT_DpOT0_
#10 0x0000006f1be5 _ZN4base8internal7InvokerINS0_9BindStateIMN4mojo14BindingSetBaseIN15service_manager5mojom14ServiceFactoryENS3_7BindingIS7_NS3_19RawPtrImplRefTraitsIS7_EEEEvE5EntryEFvjRKNSt3__112basic_stringIcNSE_11char_traitsIcEENSE_9allocatorIcEEEEEJNS0_17UnretainedWrapperISD_EEEEEFvjSM_EE7RunImplISO_NSE_5tupleIJSQ_EEEJLm0EEEEvOT_OT0_NSE_16integer_sequenceImJXspT1_EEEEOjSM_
#11 0x0000006f1b18 _ZN4base8internal7InvokerINS0_9BindStateIMN4mojo14BindingSetBaseIN15service_manager5mojom14ServiceFactoryENS3_7BindingIS7_NS3_19RawPtrImplRefTraitsIS7_EEEEvE5EntryEFvjRKNSt3__112basic_stringIcNSE_11char_traitsIcEENSE_9allocatorIcEEEEEJNS0_17UnretainedWrapperISD_EEEEEFvjSM_EE7RunOnceEPNS0_13BindStateBaseEjSM_
#12 0x7ff01f489a5f _ZNO4base12OnceCallbackIFvjRKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEEE3RunEjS9_
#13 0x7ff01f488b65 mojo::InterfaceEndpointClient::NotifyError()
#14 0x7ff01f495341 mojo::internal::MultiplexRouter::ProcessNotifyErrorTask()
#15 0x7ff01f4926fd mojo::internal::MultiplexRouter::ProcessTasks()
#16 0x7ff01f490e08 mojo::internal::MultiplexRouter::OnPipeConnectionError()
#17 0x7ff01f4820cd _ZN4base8internal13FunctorTraitsIMN4mojo8internal19ControlMessageProxyEFvvEvE6InvokeIPS4_JEEEvS6_OT_DpOT0_
#18 0x7ff01f482044 _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKMN4mojo8internal19ControlMessageProxyEFvvEJPS6_EEEvOT_DpOT0_
#19 0x7ff01f481ff5 _ZN4base8internal7InvokerINS0_9BindStateIMN4mojo8internal19ControlMessageProxyEFvvEJNS0_17UnretainedWrapperIS5_EEEEEFvvEE7RunImplIRKS7_RKNSt3__15tupleIJS9_EEEJLm0EEEEvOT_OT0_NSG_16integer_sequenceImJXspT1_EEEE
#20 0x7ff01f481f8c _ZN4base8internal7InvokerINS0_9BindStateIMN4mojo8internal19ControlMessageProxyEFvvEJNS0_17UnretainedWrapperIS5_EEEEEFvvEE3RunEPNS0_13BindStateBaseE
#21 0x7ff01f476cee _ZNO4base12OnceCallbackIFvvEE3RunEv
#22 0x7ff01f4742d3 mojo::Connector::HandleError()
#23 0x7ff01f4758c4 mojo::Connector::OnHandleReadyInternal()
#24 0x7ff01f4757bb mojo::Connector::OnWatcherHandleReady()
#25 0x7ff01f47917f _ZN4base8internal13FunctorTraitsIMN4mojo9ConnectorEFvjEvE6InvokeIPS3_JjEEEvS5_OT_DpOT0_
#26 0x7ff01f4790df _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKMN4mojo9ConnectorEFvjEJPS5_jEEEvOT_DpOT0_
#27 0x7ff01f479075 _ZN4base8internal7InvokerINS0_9BindStateIMN4mojo9ConnectorEFvjEJNS0_17UnretainedWrapperIS4_EEEEEFvjEE7RunImplIRKS6_RKNSt3__15tupleIJS8_EEEJLm0EEEEvOT_OT0_NSF_16integer_sequenceImJXspT1_EEEEOj
#28 0x7ff01f478fab _ZN4base8internal7InvokerINS0_9BindStateIMN4mojo9ConnectorEFvjEJNS0_17UnretainedWrapperIS4_EEEEEFvjEE3RunEPNS0_13BindStateBaseEj
#29 0x7ff01f47009e _ZNKR4base17RepeatingCallbackIFvjEE3RunEj
#30 0x7ff01f4781ff mojo::SimpleWatcher::DiscardReadyState()
#31 0x7ff01f478474 _ZN4base8internal13FunctorTraitsIPFvRKNS_17RepeatingCallbackIFvjEEEjRKN4mojo18HandleSignalsStateEEvE6InvokeIJS6_jSA_EEEvSC_DpOT_
#32 0x7ff01f478420 _ZN4base8internal12InvokeHelperILb0EvE8MakeItSoIRKPFvRKNS_17RepeatingCallbackIFvjEEEjRKN4mojo18HandleSignalsStateEEJS8_jSC_EEEvOT_DpOT0_
#33 0x7ff01f4783c0 _ZN4base8internal7InvokerINS0_9BindStateIPFvRKNS_17RepeatingCallbackIFvjEEEjRKN4mojo18HandleSignalsStateEEJS5_EEEFvjSB_EE7RunImplIRKSD_RKNSt3__15tupleIJS5_EEEJLm0EEEEvOT_OT0_NSK_16integer_sequenceImJXspT1_EEEEOjSB_
#34 0x7ff01f4782f6 _ZN4base8internal7InvokerINS0_9BindStateIPFvRKNS_17RepeatingCallbackIFvjEEEjRKN4mojo18HandleSignalsStateEEJS5_EEEFvjSB_EE3RunEPNS0_13BindStateBaseEjSB_
#35 0x7ff01f40529e _ZNKR4base17RepeatingCallbackIFvjRKN4mojo18HandleSignalsStateEEE3RunEjS4_
#36 0x7ff01f404be9 mojo::SimpleWatcher::OnHandleReady()
#37 0x7ff01f405a93 _ZN4base8internal13FunctorTraitsIMN4mojo13SimpleWatcherEFvijRKNS2_18HandleSignalsStateEEvE6InvokeIRKNS_7WeakPtrIS3_EEJRKiRKjS6_EEEvS8_OT_DpOT0_
#38 0x7ff01f4059d5 _ZN4base8internal12InvokeHelperILb1EvE8MakeItSoIRKMN4mojo13SimpleWatcherEFvijRKNS4_18HandleSignalsStateEERKNS_7WeakPtrIS5_EEJRKiRKjS8_EEEvOT_OT0_DpOT1_
#39 0x7ff01f405932 _ZN4base8internal7InvokerINS0_9BindStateIMN4mojo13SimpleWatcherEFvijRKNS3_18HandleSignalsStateEEJNS_7WeakPtrIS4_EEijS5_EEEFvvEE7RunImplIRKS9_RKNSt3__15tupleIJSB_ijS5_EEEJLm0ELm1ELm2ELm3EEEEvOT_OT0_NSI_16integer_sequenceImJXspT1_EEEE
#40 0x7ff01f4057ac _ZN4base8internal7InvokerINS0_9BindStateIMN4mojo13SimpleWatcherEFvijRKNS3_18HandleSignalsStateEEJNS_7WeakPtrIS4_EEijS5_EEEFvvEE3RunEPNS0_13BindStateBaseE
#41 0x7ff01ecc71de _ZNO4base12OnceCallbackIFvvEE3RunEv
#42 0x7ff01ed1e3f2 base::debug::TaskAnnotator::RunTask()
#43 0x7ff01edc1bc9 base::internal::IncomingTaskQueue::RunTask()
#44 0x7ff01edcb007 base::MessageLoop::RunTask()
#45 0x7ff01edcb278 base::MessageLoop::DeferOrRunPendingTask()
#46 0x7ff01edcb5a9 base::MessageLoop::DoWork()
#47 0x7ff01edce226 base::MessagePumpGlib::Run()
#48 0x7ff01edca7dc base::MessageLoop::Run()
#49 0x7ff01ee82c3d base::RunLoop::Run()
#50 0x000000a601bf audio::gtest_case_ServiceLifetimeTestTemplate_::ServiceQuitsWhenLastClientDisconnects<>::TestBody()
#51 0x000000a5b9de _ZN7testing8internal12InvokeHelperIvNSt3__15tupleIJEEEE12InvokeMethodIN4base7RunLoopEMS8_FvvEEEvPT_T0_RKS4_
#52 0x0000011c74e2 testing::internal::HandleExceptionsInMethodIfSupported<>()
#53 0x0000011aaa46 testing::Test::Run()
#54 0x0000011ab470 testing::TestInfo::Run()
#55 0x0000011abf1f testing::TestCase::Run()
#56 0x0000011be2d8 testing::internal::UnitTestImpl::RunAllTests()
#57 0x0000011d223e testing::internal::HandleSehExceptionsInMethodIfSupported<>()
#58 0x0000011c8c72 testing::internal::HandleExceptionsInMethodIfSupported<>()
#59 0x0000011bdf37 testing::UnitTest::Run()
#60 0x000001cf3491 RUN_ALL_TESTS()
#61 0x000001cefe5b base::TestSuite::Run()

Comment 5 by roc...@chromium.org, Apr 11 2018

SimpleWatcher is probably not wrong. That likely means the remote end of the interface has been torn down.

I'm not sure what the problem is here, but it seems specific to the audio service. We don't observe similar problems elsewhere.

If you're saying you see the first AddRef() followed immediately by a Release(), this means the ref-count goes to zero and the ref factory is going to invoke its callback. It should not be possible for more AddRef() calls to be made after this, which implies something is amiss with how the service is managing refs.

Comment 6 by olka@chromium.org, Apr 11 2018

>> SimpleWatcher is probably not wrong. That likely means the remote end of the interface has been torn down.

But the remote end has not been torn down, according to the test?

>> I'm not sure what the problem is here, but it seems specific to the audio service. We don't observe similar problems elsewhere.

Most likely, but not necessarily: the test uses a very short timeout which may be revealing some race.

>> If you're saying you see the first AddRef() followed immediately by a Release(), this means the ref-count goes to zero and the ref factory is going to invoke its callback. It should not be possible for more AddRef() calls to be made after this, which implies something is amiss with how the service is managing refs.

Could you clarify why no more AddRef() calls is possible? Ref factory callback in our case just turns on a timer[1].
Ref factory callback is repeating, it is called each time ref count reaches zero [2].

[1] https://cs.chromium.org/chromium/src/services/audio/service.cc?l=101
[2] https://cs.chromium.org/chromium/src/services/service_manager/public/cpp/service_context_ref.cc?sq=package:chromium&l=80

Comment 7 by olka@chromium.org, Apr 12 2018

Cc: -olka@chromium.org
Owner: olka@chromium.org

Comment 8 by roc...@chromium.org, Apr 12 2018

FYI I found a relevant service manager bug which I'm fixing, but it only
fixes one failure mode where a duplicate OnServiceStopped notification is
dispatched.

The test is still flaky with that fix because it's also possible for the
service to detect info disconnecting and call RequestQuit (and have the
request granted) all before the service manager even processes the requests
for info2 and info3. I haven't found a convincing explanation for why
that's the case yet, but it appears to be the case.

I would expect RequestQuit in that case to be scheduled on the service
manager thread after the other two BindInterface calls.

In any case, the root of that issue does seem to be one of invalid ordering
assumptions in the test. There are no real guarantees that the above
expectations should be met, so we should find a way to restructure the
test. A good option would be to introduce a third runloop to drive
disconnection instead of doing it in the second loop.

Comment 9 by olka@chromium.org, Apr 12 2018

Yes, thanks - I think I found the problem: the service manager manager runs on a thread other than the main thread of the test, so despite the fact that both the service and the client are on the same thread, operations ordering is messed up because of the thread hop. I'm fixing the test now.

Comment 10 by olka@chromium.org, Apr 12 2018

Runloops do not help because of a thread hop, so I'm going to call into an interface and wait for its response to make sure AddRef is dispatched before we proceed to disconnecting from the service.
Note that you can use FlushForTesting on any InterfacePtr to synchronize
against end-to-end binding being established.

Comment 13 by olka@chromium.org, Apr 12 2018

Ah great, then I'll change my CL. Thanks!
Project Member

Comment 14 by bugdroid1@chromium.org, Apr 12 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/74d30d0b031c2a5d01bafcfa6421ef3b4b6ae148

commit 74d30d0b031c2a5d01bafcfa6421ef3b4b6ae148
Author: Ken Rockot <rockot@chromium.org>
Date: Thu Apr 12 14:03:19 2018

Avoid duplicate instance stopping at Service Manager shutdown

Instances are explicitly stopped during Service Manager shutdown, but
then "stopped" again during individual Instance destruction. That
results in listeners being notified twice about each instance being
stopped if they aren't stopped before Service Manager shutdown.

This CL fixes that.

Bug:  831386 
Change-Id: I210c9ce07cfa386d39bba7aa42244e00519b6f81
Reviewed-on: https://chromium-review.googlesource.com/1008968
Reviewed-by: Jay Civelli <jcivelli@chromium.org>
Commit-Queue: Ken Rockot <rockot@chromium.org>
Cr-Commit-Position: refs/heads/master@{#550179}
[modify] https://crrev.com/74d30d0b031c2a5d01bafcfa6421ef3b4b6ae148/services/service_manager/service_manager.cc

Project Member

Comment 15 by bugdroid1@chromium.org, Apr 18 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d1f25a6b0977ff18e897228982f9f2997fc374af

commit d1f25a6b0977ff18e897228982f9f2997fc374af
Author: Olga Sharonova <olka@chromium.org>
Date: Wed Apr 18 11:21:32 2018

Audio service lifetime testing: wait synchronously for AddRef()

Synchronously wait while AddRef() of the service context is called.
(There is a thread hop through the service manager thread there).

Bug:  831386 
Change-Id: I848652f647a106c8f9a1dfcd847de56819aded41
Reviewed-on: https://chromium-review.googlesource.com/1010202
Commit-Queue: Olga Sharonova <olka@chromium.org>
Reviewed-by: Ken Rockot <rockot@chromium.org>
Cr-Commit-Position: refs/heads/master@{#551635}
[modify] https://crrev.com/d1f25a6b0977ff18e897228982f9f2997fc374af/services/audio/owning_audio_manager_accessor.h
[modify] https://crrev.com/d1f25a6b0977ff18e897228982f9f2997fc374af/services/audio/test/service_lifetime_test_template.h

Comment 16 by olka@chromium.org, Apr 19 2018

Status: Started (was: Assigned)

Comment 17 by olka@chromium.org, Jun 7 2018

Status: Fixed (was: Started)

Sign in to add a comment