New issue
Advanced search Search tips

Issue 794105 link

Starred by 4 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug



Sign in to add a comment

DCHECK in MessageWindow::WindowClass::~WindowClass fails due to BatteryStatusService not shutdown

Reported by andr...@nvidia.com, Dec 12 2017

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36

Steps to reproduce the problem:
1. start content_shell.exe https://www.youtube.com/embed/OXM8rXDzSp0?autoplay=1
2. wait for youtube to start loading video (but not yet started to play)
3. close content_shell app -> crash 

What is the expected behavior?
no crash

What went wrong?
base.dll!base::debug::BreakDebugger() Line 21 C++ Symbols loaded.
  base.dll!logging::LogMessage::~LogMessage() Line 847 C++ Symbols loaded.
  base.dll!base::win::MessageWindow::WindowClass::~WindowClass() Line 69 C++ Symbols loaded.
  [External Code] Annotated Frame
  base.dll!base::LazyInstanceTraitsBase<base::win::MessageWindow::WindowClass>::CallDestructor(base::win::MessageWindow::WindowClass * instance) Line 64 C++ Symbols loaded.
  base.dll!base::internal::DestructorAtExitLazyInstanceTraits<base::win::MessageWindow::WindowClass>::Delete(base::win::MessageWindow::WindowClass * instance) Line 87 C++ Symbols loaded.
  base.dll!base::LazyInstance<base::win::MessageWindow::WindowClass,base::internal::DestructorAtExitLazyInstanceTraits<base::win::MessageWindow::WindowClass> >::OnExit(void * lazy_instance) Line 242 C++ Symbols loaded.
  base.dll!base::internal::FunctorTraits<void (__cdecl*)(void * __ptr64),void>::Invoke<void * __ptr64 const & __ptr64>(void(*)(void *) function, void * const & <args_0>) Line 150 C++ Symbols loaded.
  base.dll!base::internal::InvokeHelper<0,void>::MakeItSo<void (__cdecl*const & __ptr64)(void * __ptr64),void * __ptr64 const & __ptr64>(void(*)(void *) & functor, void * const & <args_0>) Line 279 C++ Symbols loaded.
  base.dll!base::internal::Invoker<base::internal::BindState<void (__cdecl*)(void * __ptr64),void * __ptr64>,void __cdecl(void)>::RunImpl<void (__cdecl*const & __ptr64)(void * __ptr64),std::tuple<void * __ptr64> const & __ptr64,0>(void(*)(void *) & functor, const std::tuple<void *> & bound, std::integer_sequence<unsigned __int64,0> __formal) Line 355 C++ Symbols loaded.
  base.dll!base::internal::Invoker<base::internal::BindState<void (__cdecl*)(void * __ptr64),void * __ptr64>,void __cdecl(void)>::Run(base::internal::BindStateBase * base) Line 336 C++ Symbols loaded.
  base.dll!base::RepeatingCallback<void __cdecl(void)>::Run() Line 95 C++ Symbols loaded.
  base.dll!base::AtExitManager::ProcessCallbacksNow() Line 91 C++ Symbols loaded.
  base.dll!base::AtExitManager::~AtExitManager() Line 46 C++ Symbols loaded.
  [External Code] Annotated Frame
  content.dll!std::default_delete<base::AtExitManager>::operator()(base::AtExitManager * _Ptr) Line 1999 C++ Symbols loaded.
  content.dll!std::unique_ptr<base::AtExitManager,std::default_delete<base::AtExitManager> >::reset(base::AtExitManager * _Ptr) Line 2239 C++ Symbols loaded.
  content.dll!content::ContentMainRunnerImpl::Shutdown() Line 729 C++ Symbols loaded.
  content.dll!content::ContentServiceManagerMainDelegate::ShutDownEmbedderProcess() Line 58 C++ Symbols loaded.
  embedder.dll!service_manager::Main(const service_manager::MainParams & params) Line 478 C++ Symbols loaded.
  content.dll!content::ContentMain(const content::ContentMainParams & params) Line 19 C++ Symbols loaded.
  content_shell.exe!wWinMain(HINSTANCE__ * instance, HINSTANCE__ * __formal, wchar_t * __formal, int __formal) Line 33 C++ Symbols loaded

Did this work before? N/A 

Chrome version: 62.0.3202.94  Channel: n/a
OS Version: 10.0
Flash Version: 

git commit id 588a5825421e715b6047e3b30a903d22252a06dd "Allow WebStateObserver to observe N WebStates [47/N]".

While exiting content_shell, mojo services release procedure partially happens on IO thread while it is alive. But BatteryStatusService which requires operation on UI thread during it's release may not have time to be released as there is no guaranty that IO thread completed it's part of release procedure before being killed. Due to this it is possible to leak MessageWindow associated to the BatteryStatusService which makes that DCHECK to fail.
Some other mojo services may also leak silently (for ex. device::DeviceService::~DeviceService is never called) but this is not detected due to there are no dchecks.
 
Cc: blundell@chromium.org
Components: Internals>Services>Device
Summary: DCHECK in MessageWindow::WindowClass::~WindowClass fails due to BatteryStatusService not shutdown (was: DCHECK in MessageWindow::WindowClass::~WindowClass fails)
cc blundell@: (as per git blame on device::BatteryStatusService::GetInstance()->Shutdown() which does not get called)

Have you seen such a problem, is it a problem in Chromium perspective? (DCHECK on shutdown)

We have a usecase where it is a bit of awkward and it's a bit hard to solve in isolation.

Feel free to point us to somebody else, if you know more appropriate person.


Labels: Needs-Triage-M62
So trying to summarize with other words:

The BatteryStatusService seems to be expected to be created and shutdown in UI thread.

BatteryStatusService "shutdown" normally relies on counting the clients. When client count hits 0, the platform window that is used to obtain the battery status is deallocated.

BatteryStatusService abnormal shutdown relies on mojo connection close error, which should destroy the DeviceService object. The ~DeviceService should call BatteryStatusService::Shutdown().

However, in the abnormal shutdown is initiated by the process of shutting down the UI message loop. Thus all sequences that schedule work to IO, which then schedule work on UI are flawed, because by the time IO schedules work to UI, the shutdown process might have already progressed to destroy the UI message loop. This work is then just discarded.

Would it be possible to somehow run the service in IO, and maybe respond to connection errors so that the client count would be maintained correctly and window be destroyed this way?


Hi,

Thanks for providing such detailed feedback here! Am I understanding correctly that the root problem is that the DeviceService destructor is not invoked during shutdown?

Comment 5 by andr...@nvidia.com, Dec 13 2017

I think, answer to your question is yes.

Though I am no sure what is correct/designed way to shutdown all device services.

Another way to solve the crash could be just to dereference all device service separately and then they will destroy themselves. ~DeviceService may not be called in this case. This is what is happening in chrome and content_shell normally, except this crash case.
Cc: -blundell@chromium.org
Owner: blundell@chromium.org
Status: Started (was: Unconfirmed)
Gotcha, I'll look into reproing first of all. Thanks!
Cc: roc...@chromium.org
Hi,

Unfortunately I can't repro this behavior on Linux, and I don't have a Windows machine readily available to debug on.

+rockot@

Ken, is it a known issues that services embedded by //content might not have their destructors called on shutdown? (cf. c#3 above)?

The assert in question, base.dll!base::win::MessageWindow::WindowClass::~WindowClass() Line 69  is Windows specific.
Apologies, what I meant is that I wasn't able to repro the Device Service destructor not running.
re #7 - Realistically the only guarantees we can make are (probably, I think) that ~Service is invoked if the service runs on the IO or main thread.

At shutdown we destroy all EmbeddedServiceRunner instances, each of which in turn calls ShutDown() on corresponding EmbeddedInstanceManager[1]. Whether or not the Service dtor is invoked depends on what happens from there, i.e. if/when QuitOnServiceSequence actually runs.

[1] https://cs.chromium.org/chromium/src/services/service_manager/embedder/embedded_instance_manager.cc?rcl=766d48f58f34794918d8c547dc9a9cf957e3ef04&l=52
Hey Ken,

Thanks for the reply! So as pointed in c#4 above, ~Service being invoked if the service runs on the main thread isn't actually guaranteed here. I was able to repro this behavior on Linux, and what I see is the following:
- EmbeddedInstanceManager instance associated with the Device Service posts a task to invoke QuitOnServiceSequence().
- content::BrowserMainRunner::Shutdown() runs, shutting down the main loop.

So it looks like only services that live on the IO thread are guaranteed to have their destructors run.

What would you say is the expected behavior here? Should it just be a known expectation that service instances might leak at shutdown if they run on threads other than the IO thread? This would then transitively mean that all objects that they own would need to be changed as relevant to be leaky (e.g., base::MessageWindow::WindowClass as referenced in the OP of this bug).
I see.

It does seem like a bad DCHECK to have if any MessageWindow owner can leak at shutdown. Given that we already have leak detection through LSAN I'm inclined to say we should remove the DCHECK.

Alternatively we could encourage use of base::TaskShutdownBehavior::BLOCK_SHUTDOWN for Service task runners, but I'm not sure we want to force shutdown to block in most cases. Like most async work, it should probably just be ok to drop Service destruction on shutdown.
I agree that if we can avoid introducing more blocking on shutdown, that would be preferable. I'll start off by asking about this specific issue on chromium-dev@ to see if any Windows experts chime in on why this would be a bad idea.

If we go this route, I'll add a big caveat somewhere about service destructors likely not running on shutdown when the service is running on a thread other than the IO thread.
Issue 782500 has been merged into this issue.
Apologies for the long radio silence here.

We discussed this problem on this chromium-dev@ thread: https://www.google.com/url?q=https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/CAKQnr0RbzL7kmN%253DszoRDzXBbCDD2nUj4s5DRTsEPi-YP48fVNg%2540mail.gmail.com?utm_medium%3Demail%26utm_source%3Dfooter&source=gmail&ust=1527780090645000&usg=AFQjCNEBYrVQwwUnXRO1F9Qsia8AgztKlw and developed an approach on this services-dev@ thread:
https://groups.google.com/a/chromium.org/forum/?utm_medium=email&utm_source=footer#!msg/services-dev/orJmSM0Bb-E/_jIEF210DwAJ. 

The relevant summary is the following:

Removing this DCHECK (or making the WindowClass singleton leaky) is undesirable, as the DCHECK serves as a desired protection that MessageWindow objects aren't leaking.

Hence, we want to provide a mechanism to notify embedded services on shutdown. rockot's enthusiastic suggestion is the following:

"If you wanted to stuff a flag in EmbedderServiceInfo and hack ServiceManagerContext to do something with it (main thread only I assume), I wouldn't fight it. I would however probably try to limit the shutdown work to only tearing down the necessary (DCHECKing) bits rather than ensuring that the whole service is necessarily cleaned up."

I'll try to put together that CL over the next week or so.
After discussion with Ken, we're going to explore whether it's possible to hand the Device Service a TaskRunner that blocks shutdown but continues to run tasks on the UI thread as the Device Service does today. This would then avoid the need for introducing more coupling between //content and service implementations, which is anti the long-term direction that we want to go in. Will update this thread when we have a resolution and are starting concrete development on either this path or the one in c#16.
An update: The option described in c#17 is not viable in the short term (it will likely become viable in the Q3-Q4 timeframe thanks to planned TaskScheduler development, but that obviously doesn't match the timeframe here). Hence, going to take the approach described in c#16.
andreyf@nvidia.com / kkinnunen@nvidia.com: I have a tentative fix out at https://chromium-review.googlesource.com/c/chromium/src/+/1099245. Would you be able to patch it in and verify whether it eliminates your crash? As I stated upthread, I don't have a Windows machine to repro the crash (although I did verify that the shutdown method I added is invoked when shutting down Chrome on Linux, whereas the Device Service destructor is not). 
Project Member

Comment 20 by bugdroid1@chromium.org, Jun 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8e9dbcd3188c015f2dd67ba781168431aa51dab5

commit 8e9dbcd3188c015f2dd67ba781168431aa51dab5
Author: Colin Blundell <blundell@chromium.org>
Date: Thu Jun 14 14:47:43 2018

Have Device Service observe MessageLoop destruction

The Device Service needs to clean up parts of its internal state as part
of browser shutdown. However, it also needs to run on the UI thread,
and embedded services that run on the UI thread are not guaranteed to
be destroyed as part of browser shutdown (tasks to destroy these
services are posted from the IO thread by
ServiceManagerConnectionImpl::ShutDownOnIOThread, but the UI thread is
typically shut down before these posted tasks are run).

To solve this issue we discussed adding plumbing wherein embedded
services could inform //content that they wanted to be notified when
shutdown was occurring on the main thread. However, on investigation
this plumbing would be painful to implement: it is only
EmbeddedInstanceManager that has direct information of these service
instances, and that object lives far away from //content's
ServiceManagerContext, the object that knows when shutdown is occurring
on the main thread.

This CL takes an alternative approach of having the Device Service
implementation observe the destruction of its MessageLoop. I have
verified that the observation is triggered on shutdown of Chrome.

Bug: 794105
Change-Id: I3b383871679d42f544812be4bcb13c872cf276ff
Reviewed-on: https://chromium-review.googlesource.com/1099245
Commit-Queue: Colin Blundell <blundell@chromium.org>
Reviewed-by: Ken Rockot <rockot@chromium.org>
Cr-Commit-Position: refs/heads/master@{#567260}
[modify] https://crrev.com/8e9dbcd3188c015f2dd67ba781168431aa51dab5/services/device/device_service.cc
[modify] https://crrev.com/8e9dbcd3188c015f2dd67ba781168431aa51dab5/services/device/device_service.h

Comment 21 by andr...@nvidia.com, Jun 14 2018

blundell@chromium.org: I applied your patch. It works. 
I couldn't repro the crash anymore manually. At least before the patch it was easy to repro.
Thank you
Project Member

Comment 22 by bugdroid1@chromium.org, Jul 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0720b0228d5da8605132b93c3c5c989e5d3b8db1

commit 0720b0228d5da8605132b93c3c5c989e5d3b8db1
Author: Colin Blundell <blundell@chromium.org>
Date: Thu Jul 12 16:53:32 2018

Revert "Have Device Service observe MessageLoop destruction"

This reverts commit 8e9dbcd3188c015f2dd67ba781168431aa51dab5.

Reason for revert: It turns out that this is not the right solution to the problem. crbug.com/856771 presents a crash that this CL introduces on ChromeOS: the Device Service instance is now shut down after the DBusThreadManager global instance, on which it implicitly depends. We'll have to be more precise about the timing of when the Device Service instance is destroyed, which will mean doing the plumbing that the below CL was looking to avoid. This is all sad-making, but c'est la vie.

Bug: 856771, 794105

Original change's description:
> Have Device Service observe MessageLoop destruction
>
> The Device Service needs to clean up parts of its internal state as part
> of browser shutdown. However, it also needs to run on the UI thread,
> and embedded services that run on the UI thread are not guaranteed to
> be destroyed as part of browser shutdown (tasks to destroy these
> services are posted from the IO thread by
> ServiceManagerConnectionImpl::ShutDownOnIOThread, but the UI thread is
> typically shut down before these posted tasks are run).
>
> To solve this issue we discussed adding plumbing wherein embedded
> services could inform //content that they wanted to be notified when
> shutdown was occurring on the main thread. However, on investigation
> this plumbing would be painful to implement: it is only
> EmbeddedInstanceManager that has direct information of these service
> instances, and that object lives far away from //content's
> ServiceManagerContext, the object that knows when shutdown is occurring
> on the main thread.
>
> This CL takes an alternative approach of having the Device Service
> implementation observe the destruction of its MessageLoop. I have
> verified that the observation is triggered on shutdown of Chrome.
>
> Bug: 794105
> Change-Id: I3b383871679d42f544812be4bcb13c872cf276ff
> Reviewed-on: https://chromium-review.googlesource.com/1099245
> Commit-Queue: Colin Blundell <blundell@chromium.org>
> Reviewed-by: Ken Rockot <rockot@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#567260}

TBR=rockot@chromium.org,blundell@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: 794105
Change-Id: I6de862f5828560ea903855898e09322e9f8d1e6c
Reviewed-on: https://chromium-review.googlesource.com/1134887
Commit-Queue: Colin Blundell <blundell@chromium.org>
Reviewed-by: Colin Blundell <blundell@chromium.org>
Cr-Commit-Position: refs/heads/master@{#574603}
[modify] https://crrev.com/0720b0228d5da8605132b93c3c5c989e5d3b8db1/services/device/device_service.cc
[modify] https://crrev.com/0720b0228d5da8605132b93c3c5c989e5d3b8db1/services/device/device_service.h

That patch wasn't correct for the reasons specified in the revert above, so this bug is now present again on trunk. I will develop an alternative solution.
Cc: blundell@chromium.org
Owner: ----
Status: Available (was: Started)
Ken and I discussed offline and came up with a proposal that strikes us as the "least bad" approach:

- ServiceManagerConnectionImpl creates and owns a thin object that wraps a base::Closure
- It passes a pointer to this object through to EmbeddedServiceRunner, which passes it on to EmbeddedServiceInstance so that the latter can set the closure on the service task runner (i.e., the UI thread) the first time that it creates a service instance. The latter sets this closure to a method that allows it to call back the actual embedded service instance so that the latter can take any necessary shutdown action.
- ServiceManagerConnectionImpl just runs through invoking all the non-empty   
  closures on shutdown

Now that I look at this plan again, it seems to me that there might be some subtleties with threading, as EmbeddedServiceRunner operates entirely on the IO thread IIRC. Locking could be used if needed, as there won't be performance criticality here.

I won't be able to get to this before September; if someone else wants to take a stab at it in the meantime, please feel free (and update the bug status to reflect that).

Cc: -roc...@chromium.org rockot@google.com

Sign in to add a comment