Mus in Chrome blocks the main thread |
|||||||||||||||
Issue descriptionIt seems there is a dependency on blocking the main thread in any client process consuming mus client libraries... or something. Letting a run loop spin while waiting for displays (screen_mus.cc) breaks the display completely, so the thread has to be completely blocked on a MojoWait. This is bad. This also in turn means that any clients consuming mus must wait for ShellClient initialization, otherwise the aforementioned Wait can block the ShellClient::Initialize call from dispatching on the same thread, resulting in an effective deadlock at startup.
,
Apr 26 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2d26a030a05d92fbb7489d62cfb7759b8f045335 commit 2d26a030a05d92fbb7489d62cfb7759b8f045335 Author: sadrul <sadrul@chromium.org> Date: Tue Apr 26 17:37:26 2016 mus/chrome: Always wait for the mojo shell in the browser process. When chrome is launched in mus, it always needs to block on Initialize to complete, regardless of whether run as 'chrome --mash' or as 'mojo_runner mojo:mash_session'. BUG=594852 Review URL: https://codereview.chromium.org/1915053004 Cr-Commit-Position: refs/heads/master@{#389816} [modify] https://crrev.com/2d26a030a05d92fbb7489d62cfb7759b8f045335/chrome/app/mash/mash_runner.cc [modify] https://crrev.com/2d26a030a05d92fbb7489d62cfb7759b8f045335/chrome/test/base/mojo_test_connector.cc [modify] https://crrev.com/2d26a030a05d92fbb7489d62cfb7759b8f045335/chrome/test/base/mojo_test_connector.h [modify] https://crrev.com/2d26a030a05d92fbb7489d62cfb7759b8f045335/content/browser/browser_main_loop.cc [modify] https://crrev.com/2d26a030a05d92fbb7489d62cfb7759b8f045335/content/common/mojo/mojo_shell_connection_impl.cc [modify] https://crrev.com/2d26a030a05d92fbb7489d62cfb7759b8f045335/content/common/mojo/mojo_shell_connection_impl.h [modify] https://crrev.com/2d26a030a05d92fbb7489d62cfb7759b8f045335/content/public/common/content_switches.cc [modify] https://crrev.com/2d26a030a05d92fbb7489d62cfb7759b8f045335/content/public/common/content_switches.h
,
Jul 25 2016
,
Oct 4 2016
,
Feb 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599 commit 73b9848f38e00bf1b9f48cfa2da7f8cda7c41599 Author: rockot <rockot@chromium.org> Date: Wed Feb 15 18:58:02 2017 Make browser process a singleton service Changes the primordial browser process service manager connection to identify as a singleton service named content_packaged_services. This is used to package any existing global services previously associated with the global root content_browser connection. The root content_browser connection still exists in order to serve as a client process host for non-renderer child processes. One result of this change is that no services other than content_* are allowed to connect directly to content_browser, in order to avoid racing with the (now asynchronous) content_browser instance creation by ServiceManagerContext. As such, code which previously connected to content_browser has also been fixed to do something less weird. To wit: - An "ash" service is now embedded in the browser process when not running in mash mode - this allows simplification of ash client code as there is no longer a need to switch the target service name at runtime when connecting to ash interfaces - A new "chrome" service is packaged in content_packaged_services by within Chrome. This hosts a mash::mojom::Launchable interface, allowing Chrome to be launched within mash by connecting to "chrome" - ChromeInterfaceFactory is deleted Somewhat unrelated but because this CL exposed a timing issue between packaged service connection and main thread browser initialization, this also removes the unnecessary blocking of startup on service manager connection (see change in chrome_browser_main_extra_parts_views.cc.) BUG= 649407 ,594852 TEST=chrome --mash functions normally, standalone mash runner functions normally with chrome, chrome non-mash functions normally, browser_tests work with and without --run-in-mash TBR=tsepez@chromium.org Review-Url: https://codereview.chromium.org/2695803004 Cr-Commit-Position: refs/heads/master@{#450750} [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/ash/mus/manifest.json [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/app/BUILD.gn [add] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/app/chrome_manifest.json [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/app/mash/BUILD.gn [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/app/mash/mash_runner.cc [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/DEPS [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/browser_resources.grd [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/chrome_content_browser_client.cc [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/chrome_content_browser_manifest_overlay.json [add] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/chrome_content_packaged_services_manifest_overlay.json [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/chromeos/BUILD.gn [delete] https://crrev.com/5e8ee75a86c1c36b53c1a6b5ec28d89b7bb6d4ad/chrome/browser/chromeos/chrome_interface_factory.cc [delete] https://crrev.com/5e8ee75a86c1c36b53c1a6b5ec28d89b7bb6d4ad/chrome/browser/chromeos/chrome_interface_factory.h [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/prefs/preferences_connection_manager.cc [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/ui/ash/ash_util.cc [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/ui/ash/ash_util.h [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/browser/ui/views/chrome_browser_main_extra_parts_views.cc [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/test/BUILD.gn [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/chrome/test/base/mojo_test_connector.cc [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/BUILD.gn [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/browser/service_manager/service_manager_context.cc [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/browser/service_manager/service_manager_context.h [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/common/service_manager/service_manager_connection_impl.cc [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/common/service_manager/service_manager_connection_impl.h [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/content_resources.grd [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/public/app/BUILD.gn [add] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/public/app/mojo/content_packaged_services_manifest.json [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/public/common/service_manager_connection.h [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/content/public/common/service_names.mojom [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/mash/BUILD.gn [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/mash/session/session.cc [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/services/service_manager/public/cpp/interface_registry.h [modify] https://crrev.com/73b9848f38e00bf1b9f48cfa2da7f8cda7c41599/services/service_manager/public/cpp/lib/interface_registry.cc
,
Feb 23 2017
This wait is being hit during mash_browser_tests, causing the run of subsequent tests to hang.
,
Feb 23 2017
Ah, hmm. I wasn't able to repro the hang when landing the CL above, which is why I went ahead and deleted this (what I thought was obsolete) logic. I'm trying to wrap my head around why this is still an issue given that we now bind the browser's Service impl on the IO thread which is certainly not blocked by the wait in the mus client library.
,
Feb 23 2017
I just ran into this, so I haven't had the chance to dig further into why the hang is happening.
To repro, run two browser_tests in succession. The second one hangs while initializing ScreenMus:
./out/${OUT_DIR}/browser_tests --run-in-mash --gtest_filter=BrowserTest.*Title
,
Feb 28 2017
The issue seen in #8 was caused by being unable to connect to the UI service. So the wait never ended. Still slightly concerning to have these waits at startup, however it is unlikely to be ran into outside of the testing framework. I'm separately from this looking at changing the test startup.
,
May 29 2017
,
Aug 3 2017
I was just trying to change the ash process in mash to delay part of its initialization until it can make a connection to the prefs service to get local state prefs. When I do that there's a startup deadlock with chrome stuck here:
void ScreenMus::Init(service_manager::Connector* connector) {
...
// We need the set of displays before we can continue. Wait for it.
//
// TODO(rockot): Do something better here. This should not have to block tasks
// from running on the calling thread. http://crbug.com/594852.
bool success = display_manager_observer_binding_.WaitForIncomingMethodCall();
Is there anything that can be done about this? I'm guessing no, but if there's a simple-ish way to fix this it would save a moderate amount of complexity in ash code, since ash would be able to assume local state prefs exist during initial UI construction.
,
Aug 3 2017
I don't know what we can do. My understanding is that this is necessary because other code which subsequently runs on the main thread will make assumptions about the state of the display subsystem. sky@ probably has more insight into the details.
,
Aug 4 2017
We'd likely need a rework of the code that assumes the display subsystem is available. This block has also caused issues in the tests before. Where tests could infinitely block on this line until timeout.
,
Aug 15 2017
This is currently leading to a deadlock in the mash_browser_tests I'm on ed7555f98074e69a08a0fe4cf03dccb421d7fa6c and the following test is consistently locking up the test suite: ./out/${OUT_DIR}/browser_tests --test-launcher-bot-mode --override-use-software-gl-for-tests --run-in-mash --gtest_filter=FormAutocompleteTest.AjaxSucceeded_FilledFormStillVisible What happens is that QuickLaunch::OnStart is blocked by ScreenMus waiting, as in #11 Then when the test completes we attempt to tear down all ServiceContext. BackgroundServiceManager blocks on this. Unfortunately ServiceManager::Instance ends up blocking on shutting down the associated quick_launch service. QuickLaunch ends up never getting a signal, since the display manager is already destroyed. However the ServiceManager can then never finish shutting down. I'm going to look at disabling the quick_launch in the tests to deal with the symptom.
,
Aug 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/912d50035d9b0770acfe7c26b29e68e8058f52f2 commit 912d50035d9b0770acfe7c26b29e68e8058f52f2 Author: Jonathan <jonross@chromium.org> Date: Wed Aug 16 14:40:58 2017 Disable QuickLaunch in Mash Session Currently QuickLaunch startup can deadlock with the shutdown of the ServiceManager. This is being encountered in the mash_browser_tests where tests can complete and begin teardown before an unused QuickLaunch has finished startup. This change disables the starting of QuickLaunch within MashSession, which is currently only used by tests. TEST=mash_browser_tests Bug: 594852 Change-Id: Ie9458d764f07182099588abfdac1d645379453b5 Reviewed-on: https://chromium-review.googlesource.com/615387 Reviewed-by: Scott Violet <sky@chromium.org> Commit-Queue: Jonathan Ross <jonross@chromium.org> Cr-Commit-Position: refs/heads/master@{#494774} [modify] https://crrev.com/912d50035d9b0770acfe7c26b29e68e8058f52f2/mash/session/session.cc
,
Sep 7 2017
sky, we talked about this before, but can you think of anything we can do about this? I took a look at our audio initialization (issue 762306), which depends on local state prefs, which aren't available at ash Shell init time. I tried naively splitting ash Shell init into pre- and post-local state stages, but I hit deadlocks on this call. I would really like to defer Shell init until local state prefs are available (see https://chromium-review.googlesource.com/c/chromium/src/+/604169) but I can't do that with this blocking call in place. It feels like we need some kind of very explicit startup flow where the various mash processes walk through a state machine as they gradually discover all the startup state.
,
Sep 7 2017
I think there are multiple issues at play here: 1. the quick launch hang. I think this is happening because quick launch is waiting on mus and mus it at a point where it won't ack to the quick launch. I suspect mus has either shutdown, or is in the process of shutting down. I'm not sure which, have to investigate. 2. desire to have prefs available before initializing shell. This would mean you need to gate shell init on chrome init (prefs). That is certainly possible, it just means you have to get the order right. Currently that means you would need to get prefs ready, send it to ash, then have chrome init blocked until ash init completes.
,
Sep 7 2017
What if we start with some dummy Display during init? Once mus sends the real display info to the client, ScreenMus will be updated accordingly, and things should normally fall in place. Something like this: https://chromium-review.googlesource.com/c/chromium/src/+/654478 The trybots don't seem too upset about this. With this change, we would be doing some extra work during initialization, but I'd claim it's still better than actually blocking everything during init. Thoughts?
,
Sep 7 2017
sky@ #17 1) For the quick launch hang, I've filed issue 760257 which dives deeper into the deadlock between ServiceManager shutdown and blocked children process. Quick launch is one such example (caused by the wait mentioned in this issue) however the same symptoms are occurring for other process. jamescooke@ #16 I've recently seen the local state process get into a deadlocked state with ServiceManager on tests. Now that you mentioned that these aren't ready in time for Ash Shell init. Both of these two lead me to believe that there is some long blocking call occurring in local state. I haven't had a chance to go look at that. Have you? sadrul@ #18 +1 to handling the display init asynchronously. I think the extra work is worth avoiding deadlock. We could also probably quickly metric it to see how much longer it is.
,
Sep 7 2017
The risk with not blocking is that we'll use the fake display to make decisions that aren't right for the real display. I suspect ash and mus will reject a request to create a window on a display that doesn't exist.
,
Sep 7 2017
My hope/wishful thinking is that chrome would/should deal with it correctly. What does chrome on a chromebox do if there's no monitor attached during startup? We have a similar situation there, right? (i.e. zero displays?)
,
Sep 8 2017
+kylechar, do you know what we do when there's no monitor attached to a chromebox on startup? I think we create a fake display, but maybe we do a special dance when a monitor is connected later? jonross, re: local state, what do you mean by "local state process"? Do you mean the ash process attempting to connect to the mojo preferences service in chrome? I haven't seen deadlocks related to local state in my own work.
,
Sep 8 2017
Really +kylechar this time, see #22
,
Sep 8 2017
@jamescook: when running in mus-ash, sometimes the ServiceManager cannot shutdown the ServieManager::Instance that is associated with running the "local_state" Specifically: prefs::mojom::kLocalStateServiceName So I believe that the mojo preferences service is deadlocking with the ServiceManager. I haven't seen ash deadlocking with it. However I haven't tried to split ash's startup like you have.
,
Sep 8 2017
If there is no display attached then DisplayManager creates pretty normal looking ManagedDisplayInfo with a fake size and continues initialization. As far as ash knows after that, there is a display with that fake size and does everything normally. Ash would tell the WS about the fake display and things would continue. If a normal display gets attached then the next time DisplayManager::OnNativeDisplaysChanged() is called it looks like the fake display was removed and the actual display was added. There are some optimizations to reuse the WindowTreeHost from the fake display with the real display, so I think most of the work would be redrawing at the correct size and some work in WS/Ozone creating new ws::Display, ws::PlatformDisplay, ws::FrameGenerator, etc.
,
Sep 8 2017
Here is where the fake display gets added: https://cs.chromium.org/chromium/src/ui/display/manager/display_manager.cc?sq=package:chromium&dr=CSs&l=561
,
Sep 8 2017
I think a fake display case is problematic too. We may end up resizing/moving windows during restore.
,
Nov 8 2017
Bulk applying component Internals>Services>ServiceManager to issues referencing the text ServiceManager. This may not be 100% accurate, so please feel free to pull the component as needed.
,
Feb 21 2018
+stevenjb FYI re: local state and ash startup, see comment 11 on down
,
Feb 26 2018
,
Feb 26 2018
,
Aug 14
Bug scrub: Still a headache for mustash.
,
Oct 17
|
|||||||||||||||
►
Sign in to add a comment |
|||||||||||||||
Comment 1 by bugdroid1@chromium.org
, Mar 15 2016