New issue
Advanced search Search tips

Issue 695643 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug

Blocked on:
issue 309093

Blocking:
issue 713947
issue 715658



Sign in to add a comment

libdbus fork()s causing hangs in a multithreaded process like chrome

Project Member Reported by primiano@chromium.org, Feb 23 2017

Issue description

Forking this bug from 693668

TL;DR:
- we use (indirectly) libdbus
- On trusty there is evidence that libdbus fork()+exec() out of the blue.
- fork() in a multithreaded process is known to be harmful.
- Specifically, it leaves locks held causing deadlocks.

More details:
Evidence that libdbus fork + execs:
-----------------------------------
From git:///anongit.freedesktop.org/git/dbus/dbus @ dbus-1.3.0 (1.3 is what trusty ships)

_dbus_transport_open()
  _dbus_transport_open_autolaunch()
    _dbus_transport_new_for_autolaunch()
      _dbus_get_autolaunch_address()
        _read_subprocess_line_argv()
          fork()
            wait() (on the pipe)

Also from strace on Issue 693668#c36:
open("/var/lib/dbus/machine-id", O_RDONLY) = 178
fstat(178, {st_mode=S_IFREG|0644, st_size=33, ...}) = 0
read(178, "06a8bb708fa536f777c2fdbc5507239a"..., 33) = 33
close(178)                              = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
pipe([178, 179])                        = 0
pipe([180, 181])                        = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fc778d65cd0) = 6446
close(179)                              = 0
close(181)                              = 0
read(178,


Why is that harmful for chrome?
-------------------------------
TCmalloc, well, like most allocators, has a spinlock to protect the global heap.
If a fork() happens while the spinlock is held, the forked process will have the spinlock flagged as taken. But since fork() doesn't clone threads, the thread that was holding the spinlock will not be there in the forked process, hence that spinlock will be never released.
At this point, the first time that the forked process will try to do an allocation, tcmalloc will try to re-acquire the spinlock and hang forever.

But there is more, dbus makes it worse than this.
In fact, the code that forks (_read_subprocess_line_argv() in dbus/dbus-sysdeps-unix.c) creates a pipe between the main process (chrome) and the forked one, and blocks the main process for data coming from the child process.
Other than the obvious jank and delay that this will introduce in the hosting process (chrome), this means that once the child process gets stuck because of the deadlock, the main process will hang together.


Now note how this is not really specific to tcmalloc. Any sort of lock might be held by any of the various chrome threads at the time the fork() happens.
As such we should refrain from using gconf (or directly dbus) after we have created any thread, or we should use it in an isolated utility process to limit the damage.
On top of this, If I am reading the code correctly, this means that during our initialization we'll have to wait for gconf to read a bunch of files and spawn the dbus-autolauncher. Currently this seems to happen in  ChromeBrowserMainParts::PreMainMessageLoopRun().

I have no idea in how many places we use gconf. Definitely ui::AtkUtilAuraLinux::Initialize() is one which emerged in Issue 693668#c24
 
Cc: satorux@chromium.org
+satorux for DBus things.  Bug 692174  is also complaining about deadlocks inside ui::AtkUtilAuraLinux::Initialize(). Merge into this bug?
Cc: primiano@chromium.org
 Issue 692174  has been merged into this issue.
Owner: thomasanderson@chromium.org
Status: Started (was: Untriaged)
I assume this is for latest chromium versions?
Looking at dbus-transport.c, several open methods are defined:

} open_funcs[] = {
  { _dbus_transport_open_socket },
  { _dbus_transport_open_platform_specific },
  { _dbus_transport_open_autolaunch }

_dbus_transport_open_autolaunch() seems to be used when dbus-daemon is not yet running. I guess that's unusual on Linux desktop?

Maybe Chrome should check if the dbus-daemon is already running and quit if it's not the case, with a friendly error message.
Project Member

Comment 6 by bugdroid1@chromium.org, Feb 24 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/56741ec45a112dcf94dc8d2d571646a2210e367c

commit 56741ec45a112dcf94dc8d2d571646a2210e367c
Author: thomasanderson <thomasanderson@google.com>
Date: Fri Feb 24 03:17:05 2017

Remove gconf usage from atk_util_auralinux

Excerpt from https://developer.gnome.org/accessibility-devel-guide/stable/gad-how-it-works.html.en
Whether or not applications on the GNOME desktop automatically load
these accessibility support libraries depends on the value of a gconf
key, "/desktop/gnome/interface/accessibility"; a boolean value of
"true" enables support for assistive technologies and applications
which call gnome_program_init will automatically load the appropriate
accessibility libraries at runtime. "Pure GTK+ applications",
e.g. those that use gtk+ but do not link to libgnome, rely on the
value of the GTK_MODULES environment variable, which must be set to
"gail:atk-bridge" in order to enable assistive technology support.

This CL changes Chromium to determine it's accessibility preferences
from the GTK_MODULES variable rather than the
/desktop/gnome/interface/accessibility gconf key.  This has the added
advantage of being desktop-agnostic, so now accessibility settings
should be recognized on more environments than just Gnome.

BUG=693668, 695643 

TBR=aboxhall@chromium.org
R=thestig@chromium.org
CC=dpranke@chromium.org,primiano@chromium.org

Review-Url: https://codereview.chromium.org/2709963004
Cr-Commit-Position: refs/heads/master@{#452735}

[modify] https://crrev.com/56741ec45a112dcf94dc8d2d571646a2210e367c/ui/accessibility/BUILD.gn
[modify] https://crrev.com/56741ec45a112dcf94dc8d2d571646a2210e367c/ui/accessibility/platform/atk_util_auralinux.cc
[modify] https://crrev.com/56741ec45a112dcf94dc8d2d571646a2210e367c/ui/accessibility/platform/atk_util_auralinux.h

Owner: ----
Status: Available (was: Started)
The CL in #6 fixes the major symptom.  It's unknown at this point if there are other places that use dbus that cause this problem.
What version of chromium does this fix apply to?
re: comment 8 - 58.0.3023.0 and newer.

Comment 10 by kbr@chromium.org, Apr 26 2017

Blockedon: 309093
Linking this to the bug where this was investigated the first time.

Comment 11 by kbr@chromium.org, Apr 26 2017

Blocking: 713947

Comment 12 by kbr@chromium.org, Apr 26 2017

Blocking: 715658
Project Member

Comment 13 by bugdroid1@chromium.org, May 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8511820ec8280caacbd4f81f3ecd13b6c61681b0

commit 8511820ec8280caacbd4f81f3ecd13b6c61681b0
Author: primiano <primiano@chromium.org>
Date: Mon May 08 15:09:51 2017

Linux: Disable DBus auto-launch

This is a workaround (ETA ~ 2-3 years) for libdbus not being multi-threading
friendly and causing random hangs when running chrome outside of Linux
desktop environments.

Background:
-----------
Typically, Linux desktop environments set the DBUS_SESSION_BUS_ADDRESS
environment variable. This variable allows the dbus client library to
directly connect to the existing bus, which is started by the desktop
environment or systemd.
When this variable is missing, the dbus client library will fallback
to auto-launch mode [1], which causes 4 nested fork() + exec() calls.
Doing this has two problems: (i) slows down startup; (ii) can hang
the browser if the fork() happens while another thread is in a malloc()
(Chrome's tcmalloc has no at-fork handlers).
This situation (no env variable) is very common in test scenarios
(browsertests, chromedriver, etc).

Change introduced by this CL:
-----------------------------
This CL sets the bus address env variable to "disabled:" if not set.
This effectively shuts down the dbus auto-launch. If necessary, this
behavior can be restored by setting, before launching chrome,
DBUS_SESSION_BUS_ADDRESS="autolaunch:" .
This workaround will be necessary until libdbus and gspawn are fixed
to be multi-threading friendly [2,3] and that fix rolls into the
various distributions.
The change is introduced in the main embedder rather than in the
google-chrome wrapper, as several binaries can be affected by this,
for instance:
- browser tests (http://crbug.com/693668)
- chrome --headless
- webdriver/selenium which seem to directly invoke "chrome"
   see https://github.com/SeleniumHQ/docker-selenium/issues/87

[1] https://dbus.freedesktop.org/doc/dbus-launch.1.html
[2] https://bugs.freedesktop.org/show_bug.cgi?id=100843
[3] https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699

BUG= 715658 , 695643 , 713947 
TEST=strace -ff -o trace chrome; grep dbus-launch trace*

Review-Url: https://codereview.chromium.org/2861163002
Cr-Commit-Position: refs/heads/master@{#469987}

[modify] https://crrev.com/8511820ec8280caacbd4f81f3ecd13b6c61681b0/services/service_manager/embedder/main.cc

Project Member

Comment 14 by bugdroid1@chromium.org, May 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1e78cb7863da28bb3411286cdbcc4fb4510ce173

commit 1e78cb7863da28bb3411286cdbcc4fb4510ce173
Author: iclelland <iclelland@chromium.org>
Date: Mon May 08 18:41:31 2017

Revert of Linux: Disable DBus auto-launch (patchset #1 id:1 of https://codereview.chromium.org/2861163002/ )

Reason for revert:
Speculative revert -- the TSan bots have been reporting a data race when setting Envvars (in this case, appending to the python path to start a websocket server). The race appeared immediately after this patch landed, so it may be legitimate. Reverting this to see if it clears the failures up; if so, we'll probably just have to serialize the calls to setenv.

Filed  crbug.com/719633  for this as well.

Original issue's description:
> Linux: Disable DBus auto-launch
>
> This is a workaround (ETA ~ 2-3 years) for libdbus not being multi-threading
> friendly and causing random hangs when running chrome outside of Linux
> desktop environments.
>
> Background:
> -----------
> Typically, Linux desktop environments set the DBUS_SESSION_BUS_ADDRESS
> environment variable. This variable allows the dbus client library to
> directly connect to the existing bus, which is started by the desktop
> environment or systemd.
> When this variable is missing, the dbus client library will fallback
> to auto-launch mode [1], which causes 4 nested fork() + exec() calls.
> Doing this has two problems: (i) slows down startup; (ii) can hang
> the browser if the fork() happens while another thread is in a malloc()
> (Chrome's tcmalloc has no at-fork handlers).
> This situation (no env variable) is very common in test scenarios
> (browsertests, chromedriver, etc).
>
> Change introduced by this CL:
> -----------------------------
> This CL sets the bus address env variable to "disabled:" if not set.
> This effectively shuts down the dbus auto-launch. If necessary, this
> behavior can be restored by setting, before launching chrome,
> DBUS_SESSION_BUS_ADDRESS="autolaunch:" .
> This workaround will be necessary until libdbus and gspawn are fixed
> to be multi-threading friendly [2,3] and that fix rolls into the
> various distributions.
> The change is introduced in the main embedder rather than in the
> google-chrome wrapper, as several binaries can be affected by this,
> for instance:
> - browser tests (http://crbug.com/693668)
> - chrome --headless
> - webdriver/selenium which seem to directly invoke "chrome"
>    see https://github.com/SeleniumHQ/docker-selenium/issues/87
>
> [1] https://dbus.freedesktop.org/doc/dbus-launch.1.html
> [2] https://bugs.freedesktop.org/show_bug.cgi?id=100843
> [3] https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699
>
> BUG= 715658 , 695643 , 713947 
> TEST=strace -ff -o trace chrome; grep dbus-launch trace*
>
> Review-Url: https://codereview.chromium.org/2861163002
> Cr-Commit-Position: refs/heads/master@{#469987}
> Committed: https://chromium.googlesource.com/chromium/src/+/8511820ec8280caacbd4f81f3ecd13b6c61681b0

TBR=satorux@google.com,thestig@chromium.org,jam@chromium.org,satorux@chromium.org,primiano@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 715658 , 695643 , 713947 

Review-Url: https://codereview.chromium.org/2869843003
Cr-Commit-Position: refs/heads/master@{#470059}

[modify] https://crrev.com/1e78cb7863da28bb3411286cdbcc4fb4510ce173/services/service_manager/embedder/main.cc

Project Member

Comment 15 by bugdroid1@chromium.org, May 9 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2fc330d0b93d4bfd7bd04b9fdd3102e529901f91

commit 2fc330d0b93d4bfd7bd04b9fdd3102e529901f91
Author: primiano <primiano@chromium.org>
Date: Tue May 09 13:07:12 2017

Reland of Linux: Disable DBus auto-launch (patchset #1 id:1 of https://codereview.chromium.org/2869843003/ )

Reason for reland:
Adding TSan suppression. The race is independent of this CL, see  crbug.com/719633 

Original issue's description:
> Revert of Linux: Disable DBus auto-launch (patchset #1 id:1 of https://codereview.chromium.org/2861163002/ )
>
> Reason for revert:
> Speculative revert -- the TSan bots have been reporting a data race when setting Envvars (in this case, appending to the python path to start a websocket server). The race appeared immediately after this patch landed, so it may be legitimate. Reverting this to see if it clears the failures up; if so, we'll probably just have to serialize the calls to setenv.
>
> Filed  crbug.com/719633  for this as well.
>
> Original issue's description:
> > Linux: Disable DBus auto-launch
> >
> > This is a workaround (ETA ~ 2-3 years) for libdbus not being multi-threading
> > friendly and causing random hangs when running chrome outside of Linux
> > desktop environments.
> >
> > Background:
> > -----------
> > Typically, Linux desktop environments set the DBUS_SESSION_BUS_ADDRESS
> > environment variable. This variable allows the dbus client library to
> > directly connect to the existing bus, which is started by the desktop
> > environment or systemd.
> > When this variable is missing, the dbus client library will fallback
> > to auto-launch mode [1], which causes 4 nested fork() + exec() calls.
> > Doing this has two problems: (i) slows down startup; (ii) can hang
> > the browser if the fork() happens while another thread is in a malloc()
> > (Chrome's tcmalloc has no at-fork handlers).
> > This situation (no env variable) is very common in test scenarios
> > (browsertests, chromedriver, etc).
> >
> > Change introduced by this CL:
> > -----------------------------
> > This CL sets the bus address env variable to "disabled:" if not set.
> > This effectively shuts down the dbus auto-launch. If necessary, this
> > behavior can be restored by setting, before launching chrome,
> > DBUS_SESSION_BUS_ADDRESS="autolaunch:" .
> > This workaround will be necessary until libdbus and gspawn are fixed
> > to be multi-threading friendly [2,3] and that fix rolls into the
> > various distributions.
> > The change is introduced in the main embedder rather than in the
> > google-chrome wrapper, as several binaries can be affected by this,
> > for instance:
> > - browser tests (http://crbug.com/693668)
> > - chrome --headless
> > - webdriver/selenium which seem to directly invoke "chrome"
> >    see https://github.com/SeleniumHQ/docker-selenium/issues/87
> >
> > [1] https://dbus.freedesktop.org/doc/dbus-launch.1.html
> > [2] https://bugs.freedesktop.org/show_bug.cgi?id=100843
> > [3] https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699
> >
> > BUG= 715658 , 695643 , 713947 
> > TEST=strace -ff -o trace chrome; grep dbus-launch trace*
> >
> > Review-Url: https://codereview.chromium.org/2861163002
> > Cr-Commit-Position: refs/heads/master@{#469987}
> > Committed: https://chromium.googlesource.com/chromium/src/+/8511820ec8280caacbd4f81f3ecd13b6c61681b0

> Review-Url: https://codereview.chromium.org/2869843003
> Cr-Commit-Position: refs/heads/master@{#470059}
> Committed: https://chromium.googlesource.com/chromium/src/+/1e78cb7863da28bb3411286cdbcc4fb4510ce173

BUG= 715658 , 695643 , 713947 , 719633 
TBR=satorux@google.com,thestig@chromium.org,jam@chromium.org

Review-Url: https://codereview.chromium.org/2865283002
Cr-Commit-Position: refs/heads/master@{#470301}

[modify] https://crrev.com/2fc330d0b93d4bfd7bd04b9fdd3102e529901f91/build/sanitizers/tsan_suppressions.cc
[modify] https://crrev.com/2fc330d0b93d4bfd7bd04b9fdd3102e529901f91/services/service_manager/embedder/main.cc

Owner: primiano@chromium.org
Status: Fixed (was: Available)
Workaround landed in #15. please reopen if this happens again.

Sign in to add a comment