New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 894369 link

Starred by 0 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Oct 12
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

Heap profiling service may fail to trace new created process

Project Member Reported by chiahungduan@google.com, Oct 11

Issue description

Because of lacking synchronization, the heap profiling service can't always get ack from newly created client process. As a result, we won't track some processes if we don't manually add them back.

In cast devices, they don't have the easy way to access the UI interface(e.g. chrome://memory-internals) so it depends on the trace-when-process-created function.


 
> Because of lacking synchronization, the heap profiling service can't always get ack from newly created client process.

Can you clarify? Never receiving an ack is different from a delayed ack.
Sorry I didn't state the problem clearly. This happens when we specify "--memlog=all" or "--memlog=all-renderers" and start the browser.

AFAIK, The browser process will notify heap profiling service that there's a new process, then heap profiling service will try to bridge the pipe between itself and the client process.

At here, heap profiling service starts listening the pipe. 

https://cs.chromium.org/chromium/src/components/services/heap_profiling/connection_manager.cc?gsn=StartReadingOnIOThread&g=0&l=173

and then heap profiling service requests the client process for sending data.

https://cs.chromium.org/chromium/src/components/services/heap_profiling/connection_manager.cc?rcl=042ad90ac5334bd95efe24e495bdf3274930305e&l=176

A client process will bind the profiling client interface here,

https://cs.chromium.org/chromium/src/chrome/common/chrome_content_client.cc?type=cs&g=0&l=755

If the request from heap profiling service is earlier than client process binds the interface, then the service manager can't find the related interface binding then the pipe will be closed.

https://cs.chromium.org/chromium/src/components/services/heap_profiling/receiver_pipe_posix.cc?q=receiver_pipe_posix.cc&dr&l=48

Then the heap profiling service stops tracing this client process. 

Thus I guess we need to synchronize this process, is that correct?

BTW, If we turn the "memlog=all" flags on in chrome://flags, restart the browser, then sometimes we can see not all processes are marked as traced in chrome://memory-internals(i.e. the "Report" icon) 



Cc: roc...@chromium.org
Thanks for investigating this! 

> If the request from heap profiling service is earlier than client process binds the interface, then the service manager can't find the related interface binding then the pipe will be closed.
> https://cs.chromium.org/chromium/src/components/services/heap_profiling/receiver_pipe_posix.cc?q=receiver_pipe_posix.cc&dr&l=48

+ rockot -- this doesn't match my expectations for Service Manager -- can you confirm that this is WAI? I would expect queued up Bindings to be resolved after the Client is registered.
Well just to be clear, this wouldn't be a service manager issue, but a content child process management issue. That is, if it's an issue at all. It's not clear to me that the provided explanation is accurate. Here's what I'm seeing:

The service sends the ProfilingClient request from here: https://cs.chromium.org/chromium/src/components/heap_profiling/client_connection_manager.cc?rcl=9846f362482e0d0f9895cd2828fac269415b5c09&l=38 (or the place just below).

This means a call to either content::BrowserChildProcessHostImpl::BindInterface (https://cs.chromium.org/chromium/src/content/browser/browser_child_process_host_impl.cc?rcl=9846f362482e0d0f9895cd2828fac269415b5c09&l=348) or content::RenderProcessHostImpl::BindInterface (https://cs.chromium.org/chromium/src/content/browser/renderer_host/render_process_host_impl.cc?rcl=9846f362482e0d0f9895cd2828fac269415b5c09&l=2393)

Conveniently, both things end up using content::ChildConnection::BindInterface (https://cs.chromium.org/chromium/src/content/common/service_manager/child_connection.cc?rcl=9846f362482e0d0f9895cd2828fac269415b5c09&l=42).

This will ultimately send an IPC to the Service Manager, via the ChildConnection's Connector interface, to ask SM to forward the ProfilingClient request to the appropriate service instance.

What is being claimed, AFAICT, is that the Service Manager doesn't yet know the identity of the child process (that is, child_identity_, as passed here: https://cs.chromium.org/chromium/src/content/common/service_manager/child_connection.cc?rcl=9846f362482e0d0f9895cd2828fac269415b5c09&l=61). But that's impossible unless the process has already died, because the way SM learns about child_identity_ is from a message which has already been sent on the same Connector. (here, in fact: https://cs.chromium.org/chromium/src/content/common/service_manager/child_connection.cc?rcl=9846f362482e0d0f9895cd2828fac269415b5c09&l=87)

So the only way the SM could get the ProfilingClient interface request and *not* know where to route it, would be if the child process in question already died and thus disconnected from SM.

=====

NOW, after looking at child process code, I think I see the obvious problem here.

https://cs.chromium.org/chromium/src/content/child/child_thread_impl.cc?rcl=c7542cefcc5c4dc18e14e4b10bcfe84396d4819a&l=746

This code starts the Service Manager connection, meaning the child process can start receiving incoming interface requests on the IO thread; and THEN it calls ContentClient::OnServiceManagerConnected.

Chrome's implementation thereof goes on to call AddConnectionFilter: https://cs.chromium.org/chromium/src/chrome/common/chrome_content_client.cc?rcl=c7542cefcc5c4dc18e14e4b10bcfe84396d4819a&l=765

and the point of ConnectionFilter is to give the main thread an easy way to inject interface binders into the IO thread handler.

Alas, the IO thread can receive the ProfilingClient request before the code above has run.

The very good news is that there's no reason we can't just swap the two lines in ChildThreadImpl::StartServiceManagerConnection() so the ContentClient can register interfaces *before* we start the connection. That should fix this bug.
Awesome. Thanks for the detailed debugging. Do you intend to throw up a CL or do you want me to do so?
Already on it :)
Uplaoded https://chromium-review.googlesource.com/c/chromium/src/+/1279172 to fix.

Thanks again chiahungduan@ for doing the heavy lifting investigating this!
Project Member

Comment 8 by bugdroid1@chromium.org, Oct 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/efbd9084b65903be9d68f63b689a5572dea710ed

commit efbd9084b65903be9d68f63b689a5572dea710ed
Author: Ken Rockot <rockot@chromium.org>
Date: Fri Oct 12 22:56:30 2018

[content] Fix ConnectionFilter registration

Fixes the ordering between ContentClient::OnServiceManagerConnected()
and ServiceManagerConnection::Start() in ChildThreadImpl and
UtilityThreadIMpl, ensuring embedders can register their own ConnectionFilters
prior to incoming interface requests being dispatched on the IO thread.

Bug:  894369 
Change-Id: Ibaa9290904f1c5004b53111aec687385e4e80a9d
Reviewed-on: https://chromium-review.googlesource.com/c/1279172
Reviewed-by: John Abd-El-Malek <jam@chromium.org>
Reviewed-by: Erik Chen <erikchen@chromium.org>
Commit-Queue: Ken Rockot <rockot@chromium.org>
Cr-Commit-Position: refs/heads/master@{#599393}
[modify] https://crrev.com/efbd9084b65903be9d68f63b689a5572dea710ed/content/child/child_thread_impl.cc
[modify] https://crrev.com/efbd9084b65903be9d68f63b689a5572dea710ed/content/utility/utility_thread_impl.cc

Status: Fixed (was: Untriaged)

Sign in to add a comment