New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 715658 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 3
Type: Bug

Blocked on:
issue 309093
issue 537776
issue 695643
issue 713947



Sign in to add a comment

dbus autolaunch causes chrome to hang

Project Member Reported by primiano@chromium.org, Apr 26 2017

Issue description

General problem: some third party libraries we depend on (glibc and friends) do a sequence of fork() -> malloc() -> exec(). This can cause deadlocks (full explanation in  Issue 695643 ) if another thread is in a malloc() at the time of the fork(). 
This is because chromium's tcmalloc doesn't have the atfork handler which was later introduced in tcmalloc [1]. I looked into that and backporting the fix is non-trivial. chromium's tcmalloc is lagging behind and has some other security patches on top, so a backport would be a stability risk. Also, the real problem here is that these libraries shouldn't just do anything else other than an exec() after a fork(). tcmalloc is only one of the many possible problems here, and changing it will not solve the others.

There is a very specific instance of this problem: dbus. the dbus client library has this annonying feature of auto-launching dbus when it's not there in the desktop session. This happens very frequently in test environment which run under xvfb and don't have a full session manager with dbus.
This struck internal and external developers in a number of occasions, such as:
- Issue 693668
- https://github.com/w3c/web-platform-tests/issues/5406
- https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699

A proposal here is to workaround this specific problem by disabling dbus autolaunch.
My understanding is that dbus is not a hard requirement for chrome, and we can live without it.
If this is the case, let's figure out what is the best way to prevent dbus autolaunch.
I have two proposals, but don't know too much about it to judge the side effects , would appreciate more knowledgeable people to jump in.

Option 1)
if getenv(DBUS_SESSION_BUS_ADDRESS) == ""
  setenv(DBUS_SESSION_BUS_ADDRESS, "/dev/null")
This will effectively disable the autolaunch feature when chrome is invoked outside a desktop session (e.g., if you launch chrome from ssh or a cron job).
There is a potential side effect here: in this case, the dbus auto launch would have first tried to connect to one of the existing sessions under ~/.dbus/ and eventually found one if there was a real desktop session somewhere. forcing the DBUS_SESSION_BUS_ADDRESS to /dev/null will prevent the dbus-launch to early out sooner.

Option 2)
Force a dbus connection in browser_main_loop or somewhere else very early, before we start any other thread. This will kick the auto-launch in a controlled environment when we know that no other thread can be causing a malloc (Because there is no other thread). I don't know if this will prevent any future dbus-launch attempt.



[1] https://github.com/gperftools/gperftools/issues/499
 

Comment 1 by kbr@chromium.org, Apr 26 2017

Blockedon: 713947 309093 695643
Linking this to related bugs. The open ones on this topic should be coalesced for clarity.

Bonus question, if the proposals above don't fly, is there any cmdline flag we could use to detect that we are in a testing harness, so we could clear DBUS_SESSION_BUS_ADDRESS only in testing cases?

Comment 3 by kbr@chromium.org, Apr 26 2017

Blockedon: 537776
Most of Chromium's tests pass the --test-type flag into the main process, and that's used to tell whether or not we're in a testing environment. See  Issue 537776 .

Comment 4 by kbr@chromium.org, Apr 26 2017

One point: in gspawn.c:
https://github.com/GNOME/glib/blob/master/glib/gspawn.c

Fixing this bug would amount to refactoring fork_exec_with_pipes and do_exec so that if "close_descriptors" is true, it calls a helper function to do the fdwalk and gather the results *before* it calls fork(). At least, I think that's the correct fix.

kbr@ I think I have missed some other bug here, where is that gspawn called from?

Last time I checked dbus (see  Issue 695643 ) it has its own implementation of "fork+pipe+exec", where they end up malloc-ing via opendir/readdir [1] entirely from libdbus. libdbus doesn't seem to use gspawn, but maybe something else does? (in which case now we have two problems to deal with, not just one :/)

> Fixing this bug would amount to refactoring fork_exec_with_pipes and do_exec so that if "close_descriptors" is true, it calls a helper function to do the fdwalk and gather the results *before* it calls fork(). > At least, I think that's the correct fix.
Well not quite. You don't really want to close all file descriptor before fork()-ing, because doing so will just tear down all resources in the parent process that called gspawn (say, chrome), which essentially would mean: using gspawn() causes the caller process to become unusable, which is quite undesirable :)

The common practice is to close() after fork() to make sure that the child process doesn't leak / keep refcounted resources and capabilities from the parent process.
close() itself is harmless. The real problem here is:
- fork()s should be made only in a controlled environment (say before bringing up any threads)
- if one (say libdbus in our case) wants to take the risk and fork() at any random point (when there are other threads) at very least they have to be careful and not using any function that uses malloc() (which is quite tricky because lot of gblibc functions do these days).

Specifically with reference to the example above, the less-harmful thing to do would be IMHO just for(i in [3, MAX_FDs]) close(i).
(Which ironically, is what dbus does as a fallback in NON-Linux cases (See https://cgit.freedesktop.org/dbus/dbus/tree/dbus/dbus-sysdeps-unix.c?id=d42d167cc3ba0d20aa891a75aa1cd80ec5f490f1#n4413))

The bug we are hitting with dbus is because on Linux they are attempting to do something too clever and they use opendir(/proc/self/fds) to reduce the number of close() calls. See 
https://cgit.freedesktop.org/dbus/dbus/tree/dbus/dbus-sysdeps-unix.c?id=d42d167cc3ba0d20aa891a75aa1cd80ec5f490f1#n4368



[1] https://cgit.freedesktop.org/dbus/dbus/tree/dbus/dbus-sysdeps-unix.c?id=d42d167cc3ba0d20aa891a75aa1cd80ec5f490f1#n4358
I filed a bug upstream against libdbus:
https://bugs.freedesktop.org/show_bug.cgi?id=100843
And there seem to be a similar discussion in glib about gspawn:
https://bugzilla.gnome.org/show_bug.cgi?id=746604

I had some back and forth with the libdbus maintainer (See https://bugs.freedesktop.org/show_bug.cgi?id=100843).
I think his ultimate opinion is "this is bad, but I'd stick with whatever glib does".
I feel that both libdbus and glib make very aggressive assumptions (i.e. everybody uses glib's malloc) and fork()+malloc() is IMHO a really unorthodox practice. However, in all honesty, I can understand the position of a library (dbus) maintainer to just stick with whatever glib does.

On other news, I asked about the idea above of clobbering DBUS_SESSION_BUS_ADDRESS when not set, copy-pasting from comment 5 there:
-----
One thing that this could easily break, though: recent versions of libdbus and other major D-Bus implementations (sd-bus, GDBus) will try "$XDG_RUNTIME_DIR/bus" (which we hope will become the dominant way to locate a session bus over the next few years) before falling back to X11 autolaunch. At the moment, we expect distributions that make use of this feature to set DBUS_SESSION_BUS_ADDRESS="unix:path=$XDG_RUNTIME_DIR/bus". However, eventually we would like to be able to stop setting that variable, which would break your proposed code.
-----

So, weighting everything together, my personal opinion is that we should, in order of priority and realism:
1. definitely set DBUS_SESSION_BUS_ADDRESS to "disabled:" if not set when having a  --test-type flag
2. consider (but I'd like other opinions here) just doing this regardless of --test-type.
3. adding some checks around third-party code using fork() (i.e. introducing some ScopedAllowFork() only in places where we actually expect it) and forbidding the use of any library that  just does that.
4. considering backporting that tcmalloc atfork, but that honestly is quite some work and would make me quite nervous about the stability risks.

Comment 8 by kbr@chromium.org, Apr 28 2017

#5: what I'm suggesting is to change the interface to do_exec so that the caller does the opendir/fdwalk before fork(), but just collects the file descriptors, and passes that array of "fds to call fcntl (fd, F_SETFD, FD_CLOEXEC) upon" to do_exec. Then do_exec (after the fork) would iterate down that array making that call on each.

Comment 9 by kbr@chromium.org, Apr 28 2017

Primiano and I talked offline and he pointed out that my proposed solution would introduce a race condition where a multithreaded program could open a new fd between the time the fdwalk was done, and when fork() was called, and that would be leaked. Ignore my suggestion as there's a better one in https://bugzilla.gnome.org/show_bug.cgi?id=746604#c8 .

Comment 10 by kbr@chromium.org, Apr 28 2017

Also, per #5: the stack trace showing the problem in gspawn is from this linked bug comment:
https://bugs.chromium.org/p/chromium/issues/detail?id=309093#c51

Owner: primiano@chromium.org
Status: Assigned (was: Unconfirmed)
Re #10, Ah thanks. If I read that stacktrace correctly I think that is just the old version of the dbus client glib which was using g_spawn, while now libdbus migrated to have its own implementation (which is almost copy/pasted from gspawn).

Essentially I am saying that:
- I would expect https://bugs.chromium.org/p/chromium/issues/detail?id=309093#c51 to NOT show up anymore on trusty in favor of  Issue 695643  (i.e. the same problem has moved from gspawn to libdbus by virtue of moving to trusty)
- the DBUS_SESSION_BUS_ADDRESS="disabled:" workaround should apply also here.
- We should still fix both libdbus and gspawn, as gspawn could be used by some other library that we don't know about yet. I'll send out a patch to that as I get some spare bandwidth
Project Member

Comment 12 by bugdroid1@chromium.org, May 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/8511820ec8280caacbd4f81f3ecd13b6c61681b0

commit 8511820ec8280caacbd4f81f3ecd13b6c61681b0
Author: primiano <primiano@chromium.org>
Date: Mon May 08 15:09:51 2017

Linux: Disable DBus auto-launch

This is a workaround (ETA ~ 2-3 years) for libdbus not being multi-threading
friendly and causing random hangs when running chrome outside of Linux
desktop environments.

Background:
-----------
Typically, Linux desktop environments set the DBUS_SESSION_BUS_ADDRESS
environment variable. This variable allows the dbus client library to
directly connect to the existing bus, which is started by the desktop
environment or systemd.
When this variable is missing, the dbus client library will fallback
to auto-launch mode [1], which causes 4 nested fork() + exec() calls.
Doing this has two problems: (i) slows down startup; (ii) can hang
the browser if the fork() happens while another thread is in a malloc()
(Chrome's tcmalloc has no at-fork handlers).
This situation (no env variable) is very common in test scenarios
(browsertests, chromedriver, etc).

Change introduced by this CL:
-----------------------------
This CL sets the bus address env variable to "disabled:" if not set.
This effectively shuts down the dbus auto-launch. If necessary, this
behavior can be restored by setting, before launching chrome,
DBUS_SESSION_BUS_ADDRESS="autolaunch:" .
This workaround will be necessary until libdbus and gspawn are fixed
to be multi-threading friendly [2,3] and that fix rolls into the
various distributions.
The change is introduced in the main embedder rather than in the
google-chrome wrapper, as several binaries can be affected by this,
for instance:
- browser tests (http://crbug.com/693668)
- chrome --headless
- webdriver/selenium which seem to directly invoke "chrome"
   see https://github.com/SeleniumHQ/docker-selenium/issues/87

[1] https://dbus.freedesktop.org/doc/dbus-launch.1.html
[2] https://bugs.freedesktop.org/show_bug.cgi?id=100843
[3] https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699

BUG= 715658 , 695643 , 713947 
TEST=strace -ff -o trace chrome; grep dbus-launch trace*

Review-Url: https://codereview.chromium.org/2861163002
Cr-Commit-Position: refs/heads/master@{#469987}

[modify] https://crrev.com/8511820ec8280caacbd4f81f3ecd13b6c61681b0/services/service_manager/embedder/main.cc

Project Member

Comment 13 by bugdroid1@chromium.org, May 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1e78cb7863da28bb3411286cdbcc4fb4510ce173

commit 1e78cb7863da28bb3411286cdbcc4fb4510ce173
Author: iclelland <iclelland@chromium.org>
Date: Mon May 08 18:41:31 2017

Revert of Linux: Disable DBus auto-launch (patchset #1 id:1 of https://codereview.chromium.org/2861163002/ )

Reason for revert:
Speculative revert -- the TSan bots have been reporting a data race when setting Envvars (in this case, appending to the python path to start a websocket server). The race appeared immediately after this patch landed, so it may be legitimate. Reverting this to see if it clears the failures up; if so, we'll probably just have to serialize the calls to setenv.

Filed  crbug.com/719633  for this as well.

Original issue's description:
> Linux: Disable DBus auto-launch
>
> This is a workaround (ETA ~ 2-3 years) for libdbus not being multi-threading
> friendly and causing random hangs when running chrome outside of Linux
> desktop environments.
>
> Background:
> -----------
> Typically, Linux desktop environments set the DBUS_SESSION_BUS_ADDRESS
> environment variable. This variable allows the dbus client library to
> directly connect to the existing bus, which is started by the desktop
> environment or systemd.
> When this variable is missing, the dbus client library will fallback
> to auto-launch mode [1], which causes 4 nested fork() + exec() calls.
> Doing this has two problems: (i) slows down startup; (ii) can hang
> the browser if the fork() happens while another thread is in a malloc()
> (Chrome's tcmalloc has no at-fork handlers).
> This situation (no env variable) is very common in test scenarios
> (browsertests, chromedriver, etc).
>
> Change introduced by this CL:
> -----------------------------
> This CL sets the bus address env variable to "disabled:" if not set.
> This effectively shuts down the dbus auto-launch. If necessary, this
> behavior can be restored by setting, before launching chrome,
> DBUS_SESSION_BUS_ADDRESS="autolaunch:" .
> This workaround will be necessary until libdbus and gspawn are fixed
> to be multi-threading friendly [2,3] and that fix rolls into the
> various distributions.
> The change is introduced in the main embedder rather than in the
> google-chrome wrapper, as several binaries can be affected by this,
> for instance:
> - browser tests (http://crbug.com/693668)
> - chrome --headless
> - webdriver/selenium which seem to directly invoke "chrome"
>    see https://github.com/SeleniumHQ/docker-selenium/issues/87
>
> [1] https://dbus.freedesktop.org/doc/dbus-launch.1.html
> [2] https://bugs.freedesktop.org/show_bug.cgi?id=100843
> [3] https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699
>
> BUG= 715658 , 695643 , 713947 
> TEST=strace -ff -o trace chrome; grep dbus-launch trace*
>
> Review-Url: https://codereview.chromium.org/2861163002
> Cr-Commit-Position: refs/heads/master@{#469987}
> Committed: https://chromium.googlesource.com/chromium/src/+/8511820ec8280caacbd4f81f3ecd13b6c61681b0

TBR=satorux@google.com,thestig@chromium.org,jam@chromium.org,satorux@chromium.org,primiano@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 715658 , 695643 , 713947 

Review-Url: https://codereview.chromium.org/2869843003
Cr-Commit-Position: refs/heads/master@{#470059}

[modify] https://crrev.com/1e78cb7863da28bb3411286cdbcc4fb4510ce173/services/service_manager/embedder/main.cc

Project Member

Comment 14 by bugdroid1@chromium.org, May 9 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2fc330d0b93d4bfd7bd04b9fdd3102e529901f91

commit 2fc330d0b93d4bfd7bd04b9fdd3102e529901f91
Author: primiano <primiano@chromium.org>
Date: Tue May 09 13:07:12 2017

Reland of Linux: Disable DBus auto-launch (patchset #1 id:1 of https://codereview.chromium.org/2869843003/ )

Reason for reland:
Adding TSan suppression. The race is independent of this CL, see  crbug.com/719633 

Original issue's description:
> Revert of Linux: Disable DBus auto-launch (patchset #1 id:1 of https://codereview.chromium.org/2861163002/ )
>
> Reason for revert:
> Speculative revert -- the TSan bots have been reporting a data race when setting Envvars (in this case, appending to the python path to start a websocket server). The race appeared immediately after this patch landed, so it may be legitimate. Reverting this to see if it clears the failures up; if so, we'll probably just have to serialize the calls to setenv.
>
> Filed  crbug.com/719633  for this as well.
>
> Original issue's description:
> > Linux: Disable DBus auto-launch
> >
> > This is a workaround (ETA ~ 2-3 years) for libdbus not being multi-threading
> > friendly and causing random hangs when running chrome outside of Linux
> > desktop environments.
> >
> > Background:
> > -----------
> > Typically, Linux desktop environments set the DBUS_SESSION_BUS_ADDRESS
> > environment variable. This variable allows the dbus client library to
> > directly connect to the existing bus, which is started by the desktop
> > environment or systemd.
> > When this variable is missing, the dbus client library will fallback
> > to auto-launch mode [1], which causes 4 nested fork() + exec() calls.
> > Doing this has two problems: (i) slows down startup; (ii) can hang
> > the browser if the fork() happens while another thread is in a malloc()
> > (Chrome's tcmalloc has no at-fork handlers).
> > This situation (no env variable) is very common in test scenarios
> > (browsertests, chromedriver, etc).
> >
> > Change introduced by this CL:
> > -----------------------------
> > This CL sets the bus address env variable to "disabled:" if not set.
> > This effectively shuts down the dbus auto-launch. If necessary, this
> > behavior can be restored by setting, before launching chrome,
> > DBUS_SESSION_BUS_ADDRESS="autolaunch:" .
> > This workaround will be necessary until libdbus and gspawn are fixed
> > to be multi-threading friendly [2,3] and that fix rolls into the
> > various distributions.
> > The change is introduced in the main embedder rather than in the
> > google-chrome wrapper, as several binaries can be affected by this,
> > for instance:
> > - browser tests (http://crbug.com/693668)
> > - chrome --headless
> > - webdriver/selenium which seem to directly invoke "chrome"
> >    see https://github.com/SeleniumHQ/docker-selenium/issues/87
> >
> > [1] https://dbus.freedesktop.org/doc/dbus-launch.1.html
> > [2] https://bugs.freedesktop.org/show_bug.cgi?id=100843
> > [3] https://bugs.chromium.org/p/chromedriver/issues/detail?id=1699
> >
> > BUG= 715658 , 695643 , 713947 
> > TEST=strace -ff -o trace chrome; grep dbus-launch trace*
> >
> > Review-Url: https://codereview.chromium.org/2861163002
> > Cr-Commit-Position: refs/heads/master@{#469987}
> > Committed: https://chromium.googlesource.com/chromium/src/+/8511820ec8280caacbd4f81f3ecd13b6c61681b0

> Review-Url: https://codereview.chromium.org/2869843003
> Cr-Commit-Position: refs/heads/master@{#470059}
> Committed: https://chromium.googlesource.com/chromium/src/+/1e78cb7863da28bb3411286cdbcc4fb4510ce173

BUG= 715658 , 695643 , 713947 , 719633 
TBR=satorux@google.com,thestig@chromium.org,jam@chromium.org

Review-Url: https://codereview.chromium.org/2865283002
Cr-Commit-Position: refs/heads/master@{#470301}

[modify] https://crrev.com/2fc330d0b93d4bfd7bd04b9fdd3102e529901f91/build/sanitizers/tsan_suppressions.cc
[modify] https://crrev.com/2fc330d0b93d4bfd7bd04b9fdd3102e529901f91/services/service_manager/embedder/main.cc

PSA discussing the fix: 
https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/-cs1V-tz06A

Thank so much Primiano!
Status: Fixed (was: Assigned)
Workaround landed in chrome. please reopen if this happens again.

Sign in to add a comment