New issue
Advanced search Search tips

Issue 594852 link

Starred by 2 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

Mus in Chrome blocks the main thread

Project Member Reported by roc...@chromium.org, Mar 15 2016

Issue description

It seems there is a dependency on blocking the main thread in any client process consuming mus client libraries... or something. Letting a run loop spin while waiting for displays (screen_mus.cc) breaks the display completely, so the thread has to be completely blocked on a MojoWait. This is bad.

This also in turn means that any clients consuming mus must wait for ShellClient initialization, otherwise the aforementioned Wait can block the ShellClient::Initialize call from dispatching on the same thread, resulting in an effective deadlock at startup.
 
Project Member

Comment 1 by bugdroid1@chromium.org, Mar 15 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/27eb4409b17331235123e5e0a6837aa760203816

commit 27eb4409b17331235123e5e0a6837aa760203816
Author: rockot <rockot@chromium.org>
Date: Tue Mar 15 05:57:59 2016

Reinstate wait-for-Initialize when Chrome is run in Mash

Mus Views needs to completely block the main thread during
initialization, including any pending tasks. This means
incoming messages on any other pipe will never be dispatched.
If ShellClient::Initialize is blocked, the pipe views is
waiting on will in turn never be able to signal, resulting
in an effective deadlock.

So we need to wait for Initialize before proceeding with the
rest of the main thread setup. This adds a temporary switch
to explicitly opt in to wait-for-Initialize behavior on
MojoShellConnection, and adds that switch when running in mash.

BUG= 594636 ,594852
TEST=views_mus_unittests, mus_ws_unittests, and chrome --mash renders as well as it did before I broke it.

Review URL: https://codereview.chromium.org/1797153002

Cr-Commit-Position: refs/heads/master@{#381182}

[modify] https://crrev.com/27eb4409b17331235123e5e0a6837aa760203816/chrome/app/mash/mash_runner.cc
[modify] https://crrev.com/27eb4409b17331235123e5e0a6837aa760203816/content/common/mojo/mojo_shell_connection_impl.cc
[modify] https://crrev.com/27eb4409b17331235123e5e0a6837aa760203816/content/common/mojo/mojo_shell_connection_impl.h
[modify] https://crrev.com/27eb4409b17331235123e5e0a6837aa760203816/content/public/common/content_switches.cc
[modify] https://crrev.com/27eb4409b17331235123e5e0a6837aa760203816/content/public/common/content_switches.h
[modify] https://crrev.com/27eb4409b17331235123e5e0a6837aa760203816/mojo/edk/embedder/embedder.cc
[modify] https://crrev.com/27eb4409b17331235123e5e0a6837aa760203816/mojo/shell/public/cpp/lib/shell_connection.cc
[modify] https://crrev.com/27eb4409b17331235123e5e0a6837aa760203816/mojo/shell/public/cpp/shell_connection.h
[modify] https://crrev.com/27eb4409b17331235123e5e0a6837aa760203816/ui/views/mus/platform_test_helper_mus.cc
[modify] https://crrev.com/27eb4409b17331235123e5e0a6837aa760203816/ui/views/mus/screen_mus.cc
[modify] https://crrev.com/27eb4409b17331235123e5e0a6837aa760203816/ui/views/mus/screen_mus.h

Project Member

Comment 2 by bugdroid1@chromium.org, Apr 26 2016

Components: MUS
Components: Internals>MUS
Labels: Proj-Mustash
Project Member

Comment 5 by bugdroid1@chromium.org, Feb 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599

commit 73b9848f38e00bf1b9f48cfa2da7f8cda7c41599
Author: rockot <rockot@chromium.org>
Date: Wed Feb 15 18:58:02 2017

Make browser process a singleton service

Changes the primordial browser process service manager connection
to identify as a singleton service named content_packaged_services.
This is used to package any existing global services previously
associated with the global root content_browser connection.

The root content_browser connection still exists in order to serve
as a client process host for non-renderer child processes.

One result of this change is that no services other than content_*
are allowed to connect directly to content_browser, in order to avoid
racing with the (now asynchronous) content_browser instance creation
by ServiceManagerContext. As such, code which previously connected to
content_browser has also been fixed to do something less weird. To wit:

 - An "ash" service is now embedded in the browser process when not
   running in mash mode - this allows simplification of ash client code
   as there is no longer a need to switch the target service name at
   runtime when connecting to ash interfaces
 - A new "chrome" service is packaged in content_packaged_services by
   within Chrome. This hosts a mash::mojom::Launchable interface, allowing
   Chrome to be launched within mash by connecting to "chrome"
 - ChromeInterfaceFactory is deleted

Somewhat unrelated but because this CL exposed a timing issue between packaged
service connection and main thread browser initialization, this also removes the
unnecessary blocking of startup on service manager connection (see change in
chrome_browser_main_extra_parts_views.cc.)

BUG= 649407 ,594852
TEST=chrome --mash functions normally, standalone mash runner functions normally with chrome, chrome non-mash functions normally, browser_tests work with and without --run-in-mash
TBR=tsepez@chromium.org

Review-Url: https://codereview.chromium.org/2695803004
Cr-Commit-Position: refs/heads/master@{#450750}

[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/ash/mus/manifest.json
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/app/BUILD.gn
[add] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/app/chrome_manifest.json
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/app/mash/BUILD.gn
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/app/mash/mash_runner.cc
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/DEPS
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/browser_resources.grd
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/chrome_content_browser_client.cc
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/chrome_content_browser_manifest_overlay.json
[add] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/chrome_content_packaged_services_manifest_overlay.json
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/chromeos/BUILD.gn
[delete] https://crrev.com/5e8ee75a86c1c36b53c1a6b5ec28d89b7bb6d4ad/chrome/browser/chromeos/chrome_interface_factory.cc
[delete] https://crrev.com/5e8ee75a86c1c36b53c1a6b5ec28d89b7bb6d4ad/chrome/browser/chromeos/chrome_interface_factory.h
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/prefs/preferences_connection_manager.cc
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/ui/ash/ash_util.cc
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/ui/ash/ash_util.h
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/ui/views/chrome_browser_main_extra_parts_views.cc
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/test/BUILD.gn
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/test/base/mojo_test_connector.cc
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/BUILD.gn
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/browser/service_manager/service_manager_context.cc
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/browser/service_manager/service_manager_context.h
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/common/service_manager/service_manager_connection_impl.cc
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/common/service_manager/service_manager_connection_impl.h
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/content_resources.grd
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/public/app/BUILD.gn
[add] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/public/app/mojo/content_packaged_services_manifest.json
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/public/common/service_manager_connection.h
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/public/common/service_names.mojom
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/mash/BUILD.gn
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/mash/session/session.cc
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/services/service_manager/public/cpp/interface_registry.h
[modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/services/service_manager/public/cpp/lib/interface_registry.cc

Cc: jonr...@chromium.org
Labels: -Pri-2 Pri-1
Status: Available (was: Untriaged)
This wait is being hit during mash_browser_tests, causing the run of subsequent tests to hang.

Comment 7 by roc...@chromium.org, Feb 23 2017

Owner: roc...@chromium.org
Status: Assigned (was: Available)
Ah, hmm. I wasn't able to repro the hang when landing the CL above, which is why I went ahead and deleted this (what I thought was obsolete) logic.

I'm trying to wrap my head around why this is still an issue given that we now bind the browser's Service impl on the IO thread which is certainly not blocked by the wait in the mus client library.
I just ran into this, so I haven't had the chance to dig further into why the hang is happening.

To repro, run two browser_tests in succession. The second one hangs while initializing ScreenMus:

./out/${OUT_DIR}/browser_tests --run-in-mash --gtest_filter=BrowserTest.*Title
The issue seen in #8 was caused by being unable to connect to the UI service. So the wait never ended. Still slightly concerning to have these waits at startup, however it is unlikely to be ran into outside of the testing framework.

I'm separately from this looking at changing the test startup.
Cc: roc...@chromium.org
Labels: -Pri-1 Pri-3
Owner: ----
Status: Available (was: Assigned)
I was just trying to change the ash process in mash to delay part of its initialization until it can make a connection to the prefs service to get local state prefs. When I do that there's a startup deadlock with chrome stuck here:

void ScreenMus::Init(service_manager::Connector* connector) {
  ...

  // We need the set of displays before we can continue. Wait for it.
  //
  // TODO(rockot): Do something better here. This should not have to block tasks
  // from running on the calling thread. http://crbug.com/594852.
  bool success = display_manager_observer_binding_.WaitForIncomingMethodCall();

Is there anything that can be done about this? I'm guessing no, but if there's a simple-ish way to fix this it would save a moderate amount of complexity in ash code, since ash would be able to assume local state prefs exist during initial UI construction.

I don't know what we can do. My understanding is that this is necessary
because other code which subsequently runs on the main thread will make
assumptions about the state of the display subsystem. sky@ probably has
more insight into the details.
We'd likely need a rework of the code that assumes the display subsystem is available.

This block has also caused issues in the tests before. Where tests could infinitely block on this line until timeout.
This is currently leading to a deadlock in the mash_browser_tests

I'm on ed7555f98074e69a08a0fe4cf03dccb421d7fa6c and the following test is consistently locking up the test suite:

./out/${OUT_DIR}/browser_tests --test-launcher-bot-mode --override-use-software-gl-for-tests --run-in-mash --gtest_filter=FormAutocompleteTest.AjaxSucceeded_FilledFormStillVisible

What happens is that QuickLaunch::OnStart is blocked by ScreenMus waiting, as in #11

Then when the test completes we attempt to tear down all ServiceContext. BackgroundServiceManager blocks on this. 

Unfortunately ServiceManager::Instance ends up blocking on shutting down the associated quick_launch service.

QuickLaunch ends up never getting a signal, since the display manager is already destroyed. However the ServiceManager can then never finish shutting down.

I'm going to look at disabling the quick_launch in the tests to deal with the symptom.


Project Member

Comment 15 by bugdroid1@chromium.org, Aug 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/912d50035d9b0770acfe7c26b29e68e8058f52f2

commit 912d50035d9b0770acfe7c26b29e68e8058f52f2
Author: Jonathan <jonross@chromium.org>
Date: Wed Aug 16 14:40:58 2017

Disable QuickLaunch in Mash Session

Currently QuickLaunch startup can deadlock with the shutdown of the
ServiceManager. This is being encountered in the mash_browser_tests where tests
can complete and begin teardown before an unused QuickLaunch has finished
startup.

This change disables the starting of QuickLaunch within MashSession, which is
currently only used by tests.

TEST=mash_browser_tests

Bug: 594852
Change-Id: Ie9458d764f07182099588abfdac1d645379453b5
Reviewed-on: https://chromium-review.googlesource.com/615387
Reviewed-by: Scott Violet <sky@chromium.org>
Commit-Queue: Jonathan Ross <jonross@chromium.org>
Cr-Commit-Position: refs/heads/master@{#494774}
[modify] https://crrev.com/912d50035d9b0770acfe7c26b29e68e8058f52f2/mash/session/session.cc

Cc: sky@chromium.org
Labels: -Pri-3 Pri-2
sky, we talked about this before, but can you think of anything we can do about this?

I took a look at our audio initialization (issue 762306), which depends on local state prefs, which aren't available at ash Shell init time. I tried naively splitting ash Shell init into pre- and post-local state stages, but I hit deadlocks on this call.

I would really like to defer Shell init until local state prefs are available (see https://chromium-review.googlesource.com/c/chromium/src/+/604169) but I can't do that with this blocking call in place.

It feels like we need some kind of very explicit startup flow where the various mash processes walk through a state machine as they gradually discover all the startup state.

Comment 17 by sky@chromium.org, Sep 7 2017

I think there are multiple issues at play here:
1. the quick launch hang. I think this is happening because quick launch is waiting on mus and mus it at a point where it won't ack to the quick launch. I suspect mus has either shutdown, or is in the process of shutting down. I'm not sure which, have to investigate.
2. desire to have prefs available before initializing shell. This would mean you need to gate shell init on chrome init (prefs). That is certainly possible, it just means you have to get the order right. Currently that means you would need to get prefs ready, send it to ash, then have chrome init blocked until ash init completes.
Cc: sadrul@chromium.org jamescook@chromium.org
What if we start with some dummy Display during init? Once mus sends the real display info to the client, ScreenMus will be updated accordingly, and things should normally fall in place. Something like this: https://chromium-review.googlesource.com/c/chromium/src/+/654478 The trybots don't seem too upset about this. With this change, we would be doing some extra work during initialization, but I'd claim it's still better than actually blocking everything during init.

Thoughts?
sky@ #17 1) For the quick launch hang, I've filed  issue 760257  which dives deeper into the deadlock between ServiceManager shutdown and blocked children process. Quick launch is one such example (caused by the wait mentioned in this issue) however the same symptoms are occurring for other process.

jamescooke@ #16 I've recently seen the local state process get into a deadlocked state with ServiceManager on tests. Now that you mentioned that these aren't ready in time for Ash Shell init. Both of these two lead me to believe that there is some long blocking call occurring in local state. I haven't had a chance to go look at that. Have you?

sadrul@ #18 +1 to handling the display init asynchronously. I think the extra work is worth avoiding deadlock. We could also probably quickly metric it to see how much longer it is.

Comment 20 by sky@chromium.org, Sep 7 2017

The risk with not blocking is that we'll use the fake display to make decisions that aren't right for the real display. I suspect ash and mus will reject a request to create a window on a display that doesn't exist.
My hope/wishful thinking is that chrome would/should deal with it correctly. What does chrome on a chromebox do if there's no monitor attached during startup? We have a similar situation there, right? (i.e. zero displays?)
+kylechar, do you know what we do when there's no monitor attached to a chromebox on startup? I think we create a fake display, but maybe we do a special dance when a monitor is connected later?

jonross, re: local state, what do you mean by "local state process"? Do you mean the ash process attempting to connect to the mojo preferences service in chrome? I haven't seen deadlocks related to local state in my own work.

Cc: kylec...@chromium.org
Really +kylechar this time, see #22

@jamescook: when running in mus-ash, sometimes the ServiceManager cannot shutdown the ServieManager::Instance that is associated with running the "local_state"

Specifically: prefs::mojom::kLocalStateServiceName

So I believe that the mojo preferences service is deadlocking with the ServiceManager.

I haven't seen ash deadlocking with it. However I haven't tried to split ash's startup like you have.
If there is no display attached then DisplayManager creates pretty normal looking ManagedDisplayInfo with a fake size and continues initialization. As far as ash knows after that, there is a display with that fake size and does everything normally. Ash would tell the WS about the fake display and things would continue.

If a normal display gets attached then the next time DisplayManager::OnNativeDisplaysChanged() is called it looks like the fake display was removed and the actual display was added. There are some optimizations to reuse the WindowTreeHost from the fake display with the real display, so I think most of the work would be redrawing at the correct size and some work in WS/Ozone creating new ws::Display, ws::PlatformDisplay, ws::FrameGenerator, etc.

Comment 27 by sky@chromium.org, Sep 8 2017

I think a fake display case is problematic too. We may end up resizing/moving windows during restore.
Components: Internals>Services>ServiceManager
Bulk applying component Internals>Services>ServiceManager to issues referencing the text ServiceManager.  This may not be 100% accurate, so please feel free to pull the component as needed.
Cc: steve...@chromium.org
+stevenjb FYI re: local state and ash startup, see comment 11 on down

Components: -Internals>MUS Internals>Services>WindowService
Components: -MUS
Labels: -Proj-Mustash Proj-Mash-MultiProcess
Bug scrub: Still a headache for mustash.

Cc: -roc...@chromium.org rockot@google.com

Sign in to add a comment