libdbus fork()s causing hangs in a multithreaded process like chrome |
||||||
Issue description
Forking this bug from 693668
TL;DR:
- we use (indirectly) libdbus
- On trusty there is evidence that libdbus fork()+exec() out of the blue.
- fork() in a multithreaded process is known to be harmful.
- Specifically, it leaves locks held causing deadlocks.
More details:
Evidence that libdbus fork + execs:
-----------------------------------
From git:///anongit.freedesktop.org/git/dbus/dbus @ dbus-1.3.0 (1.3 is what trusty ships)
_dbus_transport_open()
_dbus_transport_open_autolaunch()
_dbus_transport_new_for_autolaunch()
_dbus_get_autolaunch_address()
_read_subprocess_line_argv()
fork()
wait() (on the pipe)
Also from strace on Issue 693668#c36:
open("/var/lib/dbus/machine-id", O_RDONLY) = 178
fstat(178, {st_mode=S_IFREG|0644, st_size=33, ...}) = 0
read(178, "06a8bb708fa536f777c2fdbc5507239a"..., 33) = 33
close(178) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
pipe([178, 179]) = 0
pipe([180, 181]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fc778d65cd0) = 6446
close(179) = 0
close(181) = 0
read(178,
Why is that harmful for chrome?
-------------------------------
TCmalloc, well, like most allocators, has a spinlock to protect the global heap.
If a fork() happens while the spinlock is held, the forked process will have the spinlock flagged as taken. But since fork() doesn't clone threads, the thread that was holding the spinlock will not be there in the forked process, hence that spinlock will be never released.
At this point, the first time that the forked process will try to do an allocation, tcmalloc will try to re-acquire the spinlock and hang forever.
But there is more, dbus makes it worse than this.
In fact, the code that forks (_read_subprocess_line_argv() in dbus/dbus-sysdeps-unix.c) creates a pipe between the main process (chrome) and the forked one, and blocks the main process for data coming from the child process.
Other than the obvious jank and delay that this will introduce in the hosting process (chrome), this means that once the child process gets stuck because of the deadlock, the main process will hang together.
Now note how this is not really specific to tcmalloc. Any sort of lock might be held by any of the various chrome threads at the time the fork() happens.
As such we should refrain from using gconf (or directly dbus) after we have created any thread, or we should use it in an isolated utility process to limit the damage.
On top of this, If I am reading the code correctly, this means that during our initialization we'll have to wait for gconf to read a bunch of files and spawn the dbus-autolauncher. Currently this seems to happen in ChromeBrowserMainParts::PreMainMessageLoopRun().
I have no idea in how many places we use gconf. Definitely ui::AtkUtilAuraLinux::Initialize() is one which emerged in Issue 693668#c24
,
Feb 23 2017
,
Feb 23 2017
,
Feb 24 2017
I assume this is for latest chromium versions?
,
Feb 24 2017
Looking at dbus-transport.c, several open methods are defined:
} open_funcs[] = {
{ _dbus_transport_open_socket },
{ _dbus_transport_open_platform_specific },
{ _dbus_transport_open_autolaunch }
_dbus_transport_open_autolaunch() seems to be used when dbus-daemon is not yet running. I guess that's unusual on Linux desktop?
Maybe Chrome should check if the dbus-daemon is already running and quit if it's not the case, with a friendly error message.
,
Feb 24 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/56741ec45a112dcf94dc8d2d571646a2210e367c commit 56741ec45a112dcf94dc8d2d571646a2210e367c Author: thomasanderson <thomasanderson@google.com> Date: Fri Feb 24 03:17:05 2017 Remove gconf usage from atk_util_auralinux Excerpt from https://developer.gnome.org/accessibility-devel-guide/stable/gad-how-it-works.html.en Whether or not applications on the GNOME desktop automatically load these accessibility support libraries depends on the value of a gconf key, "/desktop/gnome/interface/accessibility"; a boolean value of "true" enables support for assistive technologies and applications which call gnome_program_init will automatically load the appropriate accessibility libraries at runtime. "Pure GTK+ applications", e.g. those that use gtk+ but do not link to libgnome, rely on the value of the GTK_MODULES environment variable, which must be set to "gail:atk-bridge" in order to enable assistive technology support. This CL changes Chromium to determine it's accessibility preferences from the GTK_MODULES variable rather than the /desktop/gnome/interface/accessibility gconf key. This has the added advantage of being desktop-agnostic, so now accessibility settings should be recognized on more environments than just Gnome. BUG=693668, 695643 TBR=aboxhall@chromium.org R=thestig@chromium.org CC=dpranke@chromium.org,primiano@chromium.org Review-Url: https://codereview.chromium.org/2709963004 Cr-Commit-Position: refs/heads/master@{#452735} [modify] https://crrev.com/56741ec45a112dcf94dc8d2d571646a2210e367c/ui/accessibility/BUILD.gn [modify] https://crrev.com/56741ec45a112dcf94dc8d2d571646a2210e367c/ui/accessibility/platform/atk_util_auralinux.cc [modify] https://crrev.com/56741ec45a112dcf94dc8d2d571646a2210e367c/ui/accessibility/platform/atk_util_auralinux.h
,
Feb 25 2017
The CL in #6 fixes the major symptom. It's unknown at this point if there are other places that use dbus that cause this problem.
,
Feb 25 2017
What version of chromium does this fix apply to?
,
Feb 25 2017
re: comment 8 - 58.0.3023.0 and newer.
,
Apr 26 2017
,
Apr 26 2017
,
Apr 26 2017
,
May 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8511820ec8280caacbd4f81f3ecd13b6c61681b0 commit 8511820ec8280caacbd4f81f3ecd13b6c61681b0 Author: primiano <primiano@chromium.org> Date: Mon May 08 15:09:51 2017 Linux: Disable DBus auto-launch This is a workaround (ETA ~ 2-3 years) for libdbus not being multi-threading friendly and causing random hangs when running chrome outside of Linux desktop environments. Background: ----------- Typically, Linux desktop environments set the DBUS_SESSION_BUS_ADDRESS environment variable. This variable allows the dbus client library to directly connect to the existing bus, which is started by the desktop environment or systemd. When this variable is missing, the dbus client library will fallback to auto-launch mode [1], which causes 4 nested fork() + exec() calls. Doing this has two problems: (i) slows down startup; (ii) can hang the browser if the fork() happens while another thread is in a malloc() (Chrome's tcmalloc has no at-fork handlers). This situation (no env variable) is very common in test scenarios (browsertests, chromedriver, etc). Change introduced by this CL: ----------------------------- This CL sets the bus address env variable to "disabled:" if not set. This effectively shuts down the dbus auto-launch. If necessary, this behavior can be restored by setting, before launching chrome, DBUS_SESSION_BUS_ADDRESS="autolaunch:" . This workaround will be necessary until libdbus and gspawn are fixed to be multi-threading friendly [2,3] and that fix rolls into the various distributions. The change is introduced in the main embedder rather than in the google-chrome wrapper, as several binaries can be affected by this, for instance: - browser tests (http://crbug.com/693668) - chrome --headless - webdriver/selenium which seem to directly invoke "chrome" see https://github.com/SeleniumHQ/docker-selenium/issues/87 [1] https://dbus.freedesktop.org/doc/dbus-launch.1.html [2] https://bugs.freedesktop.org/show_bug.cgi?id=100843 [3] https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699 BUG= 715658 , 695643 , 713947 TEST=strace -ff -o trace chrome; grep dbus-launch trace* Review-Url: https://codereview.chromium.org/2861163002 Cr-Commit-Position: refs/heads/master@{#469987} [modify] https://crrev.com/8511820ec8280caacbd4f81f3ecd13b6c61681b0/services/service_manager/embedder/main.cc
,
May 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/1e78cb7863da28bb3411286cdbcc4fb4510ce173 commit 1e78cb7863da28bb3411286cdbcc4fb4510ce173 Author: iclelland <iclelland@chromium.org> Date: Mon May 08 18:41:31 2017 Revert of Linux: Disable DBus auto-launch (patchset #1 id:1 of https://codereview.chromium.org/2861163002/ ) Reason for revert: Speculative revert -- the TSan bots have been reporting a data race when setting Envvars (in this case, appending to the python path to start a websocket server). The race appeared immediately after this patch landed, so it may be legitimate. Reverting this to see if it clears the failures up; if so, we'll probably just have to serialize the calls to setenv. Filed crbug.com/719633 for this as well. Original issue's description: > Linux: Disable DBus auto-launch > > This is a workaround (ETA ~ 2-3 years) for libdbus not being multi-threading > friendly and causing random hangs when running chrome outside of Linux > desktop environments. > > Background: > ----------- > Typically, Linux desktop environments set the DBUS_SESSION_BUS_ADDRESS > environment variable. This variable allows the dbus client library to > directly connect to the existing bus, which is started by the desktop > environment or systemd. > When this variable is missing, the dbus client library will fallback > to auto-launch mode [1], which causes 4 nested fork() + exec() calls. > Doing this has two problems: (i) slows down startup; (ii) can hang > the browser if the fork() happens while another thread is in a malloc() > (Chrome's tcmalloc has no at-fork handlers). > This situation (no env variable) is very common in test scenarios > (browsertests, chromedriver, etc). > > Change introduced by this CL: > ----------------------------- > This CL sets the bus address env variable to "disabled:" if not set. > This effectively shuts down the dbus auto-launch. If necessary, this > behavior can be restored by setting, before launching chrome, > DBUS_SESSION_BUS_ADDRESS="autolaunch:" . > This workaround will be necessary until libdbus and gspawn are fixed > to be multi-threading friendly [2,3] and that fix rolls into the > various distributions. > The change is introduced in the main embedder rather than in the > google-chrome wrapper, as several binaries can be affected by this, > for instance: > - browser tests (http://crbug.com/693668) > - chrome --headless > - webdriver/selenium which seem to directly invoke "chrome" > see https://github.com/SeleniumHQ/docker-selenium/issues/87 > > [1] https://dbus.freedesktop.org/doc/dbus-launch.1.html > [2] https://bugs.freedesktop.org/show_bug.cgi?id=100843 > [3] https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699 > > BUG= 715658 , 695643 , 713947 > TEST=strace -ff -o trace chrome; grep dbus-launch trace* > > Review-Url: https://codereview.chromium.org/2861163002 > Cr-Commit-Position: refs/heads/master@{#469987} > Committed: https://chromium.googlesource.com/chromium/src/+/8511820ec8280caacbd4f81f3ecd13b6c61681b0 TBR=satorux@google.com,thestig@chromium.org,jam@chromium.org,satorux@chromium.org,primiano@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 715658 , 695643 , 713947 Review-Url: https://codereview.chromium.org/2869843003 Cr-Commit-Position: refs/heads/master@{#470059} [modify] https://crrev.com/1e78cb7863da28bb3411286cdbcc4fb4510ce173/services/service_manager/embedder/main.cc
,
May 9 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2fc330d0b93d4bfd7bd04b9fdd3102e529901f91 commit 2fc330d0b93d4bfd7bd04b9fdd3102e529901f91 Author: primiano <primiano@chromium.org> Date: Tue May 09 13:07:12 2017 Reland of Linux: Disable DBus auto-launch (patchset #1 id:1 of https://codereview.chromium.org/2869843003/ ) Reason for reland: Adding TSan suppression. The race is independent of this CL, see crbug.com/719633 Original issue's description: > Revert of Linux: Disable DBus auto-launch (patchset #1 id:1 of https://codereview.chromium.org/2861163002/ ) > > Reason for revert: > Speculative revert -- the TSan bots have been reporting a data race when setting Envvars (in this case, appending to the python path to start a websocket server). The race appeared immediately after this patch landed, so it may be legitimate. Reverting this to see if it clears the failures up; if so, we'll probably just have to serialize the calls to setenv. > > Filed crbug.com/719633 for this as well. > > Original issue's description: > > Linux: Disable DBus auto-launch > > > > This is a workaround (ETA ~ 2-3 years) for libdbus not being multi-threading > > friendly and causing random hangs when running chrome outside of Linux > > desktop environments. > > > > Background: > > ----------- > > Typically, Linux desktop environments set the DBUS_SESSION_BUS_ADDRESS > > environment variable. This variable allows the dbus client library to > > directly connect to the existing bus, which is started by the desktop > > environment or systemd. > > When this variable is missing, the dbus client library will fallback > > to auto-launch mode [1], which causes 4 nested fork() + exec() calls. > > Doing this has two problems: (i) slows down startup; (ii) can hang > > the browser if the fork() happens while another thread is in a malloc() > > (Chrome's tcmalloc has no at-fork handlers). > > This situation (no env variable) is very common in test scenarios > > (browsertests, chromedriver, etc). > > > > Change introduced by this CL: > > ----------------------------- > > This CL sets the bus address env variable to "disabled:" if not set. > > This effectively shuts down the dbus auto-launch. If necessary, this > > behavior can be restored by setting, before launching chrome, > > DBUS_SESSION_BUS_ADDRESS="autolaunch:" . > > This workaround will be necessary until libdbus and gspawn are fixed > > to be multi-threading friendly [2,3] and that fix rolls into the > > various distributions. > > The change is introduced in the main embedder rather than in the > > google-chrome wrapper, as several binaries can be affected by this, > > for instance: > > - browser tests (http://crbug.com/693668) > > - chrome --headless > > - webdriver/selenium which seem to directly invoke "chrome" > > see https://github.com/SeleniumHQ/docker-selenium/issues/87 > > > > [1] https://dbus.freedesktop.org/doc/dbus-launch.1.html > > [2] https://bugs.freedesktop.org/show_bug.cgi?id=100843 > > [3] https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699 > > > > BUG= 715658 , 695643 , 713947 > > TEST=strace -ff -o trace chrome; grep dbus-launch trace* > > > > Review-Url: https://codereview.chromium.org/2861163002 > > Cr-Commit-Position: refs/heads/master@{#469987} > > Committed: https://chromium.googlesource.com/chromium/src/+/8511820ec8280caacbd4f81f3ecd13b6c61681b0 > Review-Url: https://codereview.chromium.org/2869843003 > Cr-Commit-Position: refs/heads/master@{#470059} > Committed: https://chromium.googlesource.com/chromium/src/+/1e78cb7863da28bb3411286cdbcc4fb4510ce173 BUG= 715658 , 695643 , 713947 , 719633 TBR=satorux@google.com,thestig@chromium.org,jam@chromium.org Review-Url: https://codereview.chromium.org/2865283002 Cr-Commit-Position: refs/heads/master@{#470301} [modify] https://crrev.com/2fc330d0b93d4bfd7bd04b9fdd3102e529901f91/build/sanitizers/tsan_suppressions.cc [modify] https://crrev.com/2fc330d0b93d4bfd7bd04b9fdd3102e529901f91/services/service_manager/embedder/main.cc
,
May 15 2017
Workaround landed in #15. please reopen if this happens again. |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by thestig@chromium.org
, Feb 23 2017