Xvfb doesn't play nice with a remote_run-managed TEMPDIR |
||||||||
Issue descriptionSee bug 708681 for background. remote_run attempts to set TEMPDIR and TMPDIR in order to manage where the recipe-launched processes are writing to. Apparently Xvfb (one of those processes) wants to make sure that a file actually in /tmp exists, and so attempts to create a link from /tmp/foo to TEMPDIR/foo . Which works fine, as long as TEMPDIR is on the same filesystem. But, on some machines, it won't be, and that breaks. See: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FSite_Isolation_Linux%2F16115%2F%2B%2Frecipes%2Fsteps%2Fwebkit_tests%2F0%2Fstdout for the errors.
,
Apr 27 2017
,
Apr 27 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/fb4b3222dc1a951b98720c877f8b622c71857481 commit fb4b3222dc1a951b98720c877f8b622c71857481 Author: Dirk Pranke <dpranke@chromium.org> Date: Thu Apr 27 00:20:49 2017 Disable kitchen on Site Isolation Linux for now. kitchen sets TEMPDIR and TMPDIR in order to attempt to make sure that its subprocesses are only writing to directories it can manage. However, if you run Xvfb, it appears to want to create something that exists in a hard-coded /tmp/foo location, and so it seems to try and create a hard link from /tmp/foo to TEMPDIR/foo. Which doesn't work when TEMPDIR is on a different filesystem. This CL disables kitchen for now, until we can come up with a fix. TBR=dnj@chromium.org, tansell@chromium.org BUG=708681, 715848 Change-Id: I54700551de824c8c6fc8ec3d5e4d74ee24c260eb Reviewed-on: https://chromium-review.googlesource.com/488133 Reviewed-by: Dirk Pranke <dpranke@chromium.org> Commit-Queue: Dirk Pranke <dpranke@chromium.org> [modify] https://crrev.com/fb4b3222dc1a951b98720c877f8b622c71857481/scripts/slave/remote_run.py
,
Apr 27 2017
,
Apr 27 2017
https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1033179 Looks like it's known, and nobody cares. I am not sure upstream fix here will be timely or useful.
,
Apr 27 2017
It looks like the bug is confirmed, but no one's patched it; they may just take a patch.
,
Apr 27 2017
I'm more concerned about the larger process: - Prepare / submit / defend a patch. - Get new XVFB rolled into a package. - Update the package on all of our testing bots. This is before we can run Kitchen on any of them. Another bug: https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/972324 Maybe there is an XVFB flag that we can explicitly set...
,
Apr 27 2017
Problem value is hard-coded here: https://cgit.freedesktop.org/xorg/xserver/tree/xkb/ddxLoad.c#n62 Throughout this, though, I see a lot of hardcoded "/var/run/..." etc. We're not going to get a good quarantine even if we fix the TEMPDIR problem. Perhaps we should approach this from the other side and have the hosted Xvfb use a hardcoded "/tmp"? Or possibly probe "/tmp" from "xkb" or the native Xorg configuration? https://chromium.googlesource.com/chromium/src/+/master/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/port/linux.py#154
,
Apr 27 2017
can we mount /tmp to the same partition where we run bots? It might be a good change in general. We avoid using native temp dir because it is a on a small drive. +friedman
,
Apr 27 2017
Perhaps something with "unionfs"? The problem is that we would need to bootstrap our build process in some sort of isolated container so the filesystem mount is limited to that process. We can do that with cgroups or Docker, but it's a far larger change :( I'm not worried about /tmp size here. Xvfb is pretty versatile and works on any number of platforms, embedded and otherwise, so I expect it to use /tmp responsibly enough.
,
Apr 27 2017
Thats not a bad idea, especially on VMs where sdb is the disk with all the churn, but it is a fairly large one-off. What happens if you launch xvfb with a custom env var like: TEMPDIR=/tmp xvfb? Can you give me some commands to replicate this failure that I can play with?
,
Apr 27 2017
The entire problem here is that it's ignoring those variables in some places.
,
Apr 27 2017
I see. I suppose puppet could mount over /tmp with something from sdb, but I'm concerned about the existing fds there... /tmp is important. Is /tmp/$foo $foo predefined? We could pre-mount that first. There's also pivot_root which might help but ... yikes? chroot?
,
Apr 28 2017
Other options include; * Use xvnc instead of Xvfb (I actually suggest we should switch to using PyVirtualDisplay module which supports xvnc, xvfb and xeypher). * Using a chroot. * LD_PRELOAD
,
Apr 28 2017
from yesterday offline discussion between iannucci@ and dnj@: since xvfb hardcodes /tmp and /var, we could wrap invocation of xvfb with a wrapper that sets TEMPDIR back to hardcoded /tmp. I find this solution least invasive.
,
Apr 28 2017
It's already wrapped in a wrapper (liked in #8), so we just need to tweak that.
,
May 4 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9cc81c66fbe4e06f84a5d8647faf1a2fab5ecf38 commit 9cc81c66fbe4e06f84a5d8647faf1a2fab5ecf38 Author: dnj <dnj@chromium.org> Date: Thu May 04 17:48:04 2017 webkitpy: Override TEMPDIR for Xvfb. Hard-code TEMPDIR/TMPDIR for "Xvfb" to "/tmp" in webkitpy's Linux layout test port. The Xvfb tool's "xkb" library compiles in "/tmp" in several places, and hard-linking expectations get violated if TEMPDIR and "/tmp" are on different filesystems. BUG= chromium:715848 TEST=local - Ran without patch and TEMPDIR/TMPDIR overrides, verified hardlink failure from bug. - Ran with patch, no more hardlink failure! Review-Url: https://codereview.chromium.org/2859703002 Cr-Commit-Position: refs/heads/master@{#469378} [modify] https://crrev.com/9cc81c66fbe4e06f84a5d8647faf1a2fab5ecf38/third_party/WebKit/Tools/Scripts/webkitpy/common/system/executive_mock.py [modify] https://crrev.com/9cc81c66fbe4e06f84a5d8647faf1a2fab5ecf38/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/breakpad/dump_reader_win_unittest.py [modify] https://crrev.com/9cc81c66fbe4e06f84a5d8647faf1a2fab5ecf38/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/port/linux.py [modify] https://crrev.com/9cc81c66fbe4e06f84a5d8647faf1a2fab5ecf38/third_party/WebKit/Tools/Scripts/webkitpy/layout_tests/port/linux_unittest.py [modify] https://crrev.com/9cc81c66fbe4e06f84a5d8647faf1a2fab5ecf38/third_party/WebKit/Tools/Scripts/webkitpy/tool/commands/auto_rebaseline_unittest.py [modify] https://crrev.com/9cc81c66fbe4e06f84a5d8647faf1a2fab5ecf38/third_party/WebKit/Tools/Scripts/webkitpy/tool/commands/rebaseline_unittest.py
,
May 4 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/37b16786dee83e4e72e8be310c68d46d3da7b19c commit 37b16786dee83e4e72e8be310c68d46d3da7b19c Author: Daniel Jacques <dnj@chromium.org> Date: Thu May 04 19:03:55 2017 Revert "Disable kitchen on Site Isolation Linux for now." This reverts commit fb4b3222dc1a951b98720c877f8b622c71857481. Reason for revert: Xvfb issue believed to be fixed. Original change's description: > Disable kitchen on Site Isolation Linux for now. > > kitchen sets TEMPDIR and TMPDIR in order to attempt to make sure > that its subprocesses are only writing to directories it can manage. > However, if you run Xvfb, it appears to want to create something > that exists in a hard-coded /tmp/foo location, and so it seems to > try and create a hard link from /tmp/foo to TEMPDIR/foo. Which > doesn't work when TEMPDIR is on a different filesystem. > > This CL disables kitchen for now, until we can come up with a fix. > > TBR=dnj@chromium.org, tansell@chromium.org > BUG=708681, 715848 > > Change-Id: I54700551de824c8c6fc8ec3d5e4d74ee24c260eb > Reviewed-on: https://chromium-review.googlesource.com/488133 > Reviewed-by: Dirk Pranke <dpranke@chromium.org> > Commit-Queue: Dirk Pranke <dpranke@chromium.org> > TBR=iannucci@chromium.org,dpranke@chromium.org,dnj@chromium.org,tansell@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. BUG=708681, 715848 Change-Id: Ie6e07e7044623c5bc4a7fb91f21f94d4c774b9f7 Reviewed-on: https://chromium-review.googlesource.com/495852 Reviewed-by: Daniel Jacques <dnj@chromium.org> Commit-Queue: Daniel Jacques <dnj@chromium.org> [modify] https://crrev.com/37b16786dee83e4e72e8be310c68d46d3da7b19c/scripts/slave/remote_run.py
,
May 4 2017
Assigned to dnj to update. Marked for LUCI-M1-S1
,
May 4 2017
After the landed CL, a Kitchen-run version of this build seems to be working! https://luci-milo.appspot.com/buildbot/chromium.fyi/Site%20Isolation%20Linux/16342
,
May 5 2017
,
May 6 2017
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by iannu...@google.com
, Apr 27 2017