Clang MacToT buildbot failing to build clang |
||||||||
Issue descriptionFrom https://ci.chromium.org/buildbot/chromium.clang/ToTMac/1075 It's failing to build libcxx: [2102/3315] Building CXX object projects/libcxx/lib/CMakeFiles/cxx_experimental.dir/__/src/experimental/filesystem/operations.cpp.o FAILED: projects/libcxx/lib/CMakeFiles/cxx_experimental.dir/__/src/experimental/filesystem/operations.cpp.o /b/c/xcode_ios_9c40b.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DNDEBUG -D_LIBCPP_BUILDING_LIBRARY -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iprojects/libcxx/lib -I/b/c/b/ToTMac/src/third_party/llvm/projects/libcxx/lib -I/b/c/xcode_ios_9c40b.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/libxml2 -Iinclude -I/b/c/b/ToTMac/src/third_party/llvm/include -Iprojects/libcxx/include/c++build -I/b/c/b/ToTMac/src/third_party/llvm/projects/libcxx/include -DLLVM_FORCE_HEAD_REVISION -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fcolor-diagnostics -O3 -isysroot /b/c/xcode_ios_9c40b.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk -mmacosx-version-min=10.12 -DLIBCXX_BUILDING_LIBCXXABI -std=c++14 -nostdinc++ -fvisibility-inlines-hidden -Wall -Wextra -W -Wwrite-strings -Wno-unused-parameter -Wno-long-long -Werror=return-type -Wno-user-defined-literals -Wno-covered-switch-default -Wno-error -fPIC -MMD -MT projects/libcxx/lib/CMakeFiles/cxx_experimental.dir/__/src/experimental/filesystem/operations.cpp.o -MF projects/libcxx/lib/CMakeFiles/cxx_experimental.dir/__/src/experimental/filesystem/operations.cpp.o.d -o projects/libcxx/lib/CMakeFiles/cxx_experimental.dir/__/src/experimental/filesystem/operations.cpp.o -c /b/c/b/ToTMac/src/third_party/llvm/projects/libcxx/src/experimental/filesystem/operations.cpp /b/c/b/ToTMac/src/third_party/llvm/projects/libcxx/src/experimental/filesystem/operations.cpp:592:9: error: 'utimensat' is only available on macOS 10.13 or newer [-Werror,-Wunguarded-availability-new] if (::utimensat(AT_FDCWD, p.c_str(), tbuf, 0) == -1) { ^~~~~~~~~~~ /b/c/xcode_ios_9c40b.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/sys/stat.h:374:5: note: 'utimensat' has been explicitly marked partial here int utimensat(int __fd, const char *__path, const struct timespec __times[2], ^ /b/c/b/ToTMac/src/third_party/llvm/projects/libcxx/src/experimental/filesystem/operations.cpp:592:9: note: enclose 'utimensat' in a __builtin_available check to silence this warning if (::utimensat(AT_FDCWD, p.c_str(), tbuf, 0) == -1) { ^~~~~~~~~~~ 1 error generated. What's mysterious is that ToTMac(dbg) succeeds (e.g. https://ci.chromium.org/buildbot/chromium.clang/ToTMac%20%28dbg%29/605). It's on a different build slave, but they're both on macOS 10.12. Something must be different between them.
,
Feb 1 2018
SDK version mismatch? The 10.13 SDK added a lot of new availability annotations.
,
Feb 1 2018
Yes, the failing bot seems to be using an SDK under /b/c/xcode_ios_9c40b.app, but the succeeding one is using /Applications/Xcode8.0.app The failing bot's machine is also used by https://build.chromium.org/deprecated/chromium.clang/builders/ToTiOS which is maybe why that other SDK was installed..
,
Feb 1 2018
,
Feb 1 2018
and there was a patch for that, but it seems to have stalled: https://reviews.llvm.org/D34249 Nico, do you have any ideas here? I'm not sure why the build is trying to target 10.12 in the first place (for the bootstrap build I think we target 10.7). Maybe something is wonky with the SDK installation?
,
Feb 1 2018
dexonsmith says all the right things on https://reviews.llvm.org/D34249 Maybe just say "we're running into this on Chromium, can we get this landed" on that review? The issue here is that on macOS, SDK version and deployment target are different. You can use the newest SDK (in fact, you're encouraged to do so, and Xcode only ships with the newest SDK) and tell the compiler "please build me a binary that can run on macOS version X". The issue here is that utimensat() only is available on new macOS versions, so you're supposed to use if(__builtin_available(...)) around the utimensat() call. But libc++ only looks if the define UTIME_OMIT exists -- which with the new SDK always is the case, independent of deployment target. Now that I've written this, I actually think the patch might not be ideal. Let me comment on the patch.
,
Feb 1 2018
https://reviews.llvm.org/D34249#994721 If we need a workaround: Do you remember why we build libc++ on the tot bots at all? Ah, it was to bundle the libc++ headers with the compiler, right? (And maybe long ago to get a clang that can run on 10.6, but we no longer need that.) On Linux (and eventually Android and Windows) we use libc++ headers deps'd into the chromium repo instead of bundling them with the compiler (https://cs.chromium.org/chromium/src/third_party/perfetto/gn/standalone/libc%2B%2B/BUILD.gn?type=cs&q=isystem+file:%5C.gn+file:libc&sq=package:chromium&l=25) -- as a workaround we could try using that code path on mac too (while still linking against system libc++ -- we'd just change which headers we use) and then we wouldn't have to build libc++ on the tot bots (or when building clang ever) and we wouldn't run into this.
,
Feb 6 2018
`tools/clang/scripts/update.py --force-local-build --llvm-force-head-revision` worked for me locally (with the same Xcode). Attaching build log & attempting full local package.py next.
,
Feb 6 2018
Wait, the bot doesn't do full package.py either.
,
Feb 6 2018
If I manually run this locally, I see the build error:
$ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DNDEBUG -D_LIBCPP_BUILDING_LIBRARY -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iprojects/libcxx/lib -I../llvm/projects/libcxx/lib -I/b/c/xcode_ios_9c40b.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/libxml2 -Iinclude -I../llvm/include -Iprojects/libcxx/include/c++build -I../llvm/projects/libcxx/include -DLLVM_FORCE_HEAD_REVISION -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fcolor-diagnostics -O3 -mmacosx-version-min=10.12 -DLIBCXX_BUILDING_LIBCXXABI -std=c++14 -nostdinc++ -fvisibility-inlines-hidden -Wall -Wextra -W -Wwrite-strings -Wno-unused-parameter -Wno-long-long -Werror=return-type -Wno-user-defined-literals -Wno-covered-switch-default -Wno-error -fPIC -MMD -MT projects/libcxx/lib/CMakeFiles/cxx_experimental.dir/__/src/experimental/filesystem/operations.cpp.o -MF projects/libcxx/lib/CMakeFiles/cxx_experimental.dir/__/src/experimental/filesystem/operations.cpp.o.d -o projects/libcxx/lib/CMakeFiles/cxx_experimental.dir/__/src/experimental/filesystem/operations.cpp.o -c ../llvm/projects/libcxx/src/experimental/filesystem/operations.cpp -isysroot $(xcrun -show-sdk-path)
../llvm/projects/libcxx/src/experimental/filesystem/operations.cpp:602:9: error: 'utimensat' is only available on macOS 10.13 or newer
[-Werror,-Wunguarded-availability-new]
if (::utimensat(AT_FDCWD, p.c_str(), tbuf, 0) == -1) {
^~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/sys/stat.h:374:5: note: 'utimensat' has been explicitly
marked partial here
int utimensat(int __fd, const char *__path, const struct timespec __times[2],
^
../llvm/projects/libcxx/src/experimental/filesystem/operations.cpp:602:9: note: enclose 'utimensat' in a __builtin_available check to silence this warning
if (::utimensat(AT_FDCWD, p.c_str(), tbuf, 0) == -1) {
^~~~~~~~~~~
1 error generated.
,
Feb 6 2018
This builds fine: $ ninja projects/libcxx/lib/CMakeFiles/cxx_experimental.dir/__/src/experimental/filesystem/operations.cpp.o -v [1/1] /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DNDEBUG -D_LIBCPP_BUILDING_LIBRARY -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iprojects/libcxx/lib -I/Users/thakis/src/chrome/src/third_party/llvm/projects/libcxx/lib -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/libxml2 -Iinclude -I/Users/thakis/src/chrome/src/third_party/llvm/include -Iprojects/libcxx/include/c++build -I/Users/thakis/src/chrome/src/third_party/llvm/projects/libcxx/include -DLLVM_FORCE_HEAD_REVISION -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fcolor-diagnostics -O3 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk -DLIBCXX_BUILDING_LIBCXXABI -std=c++14 -nostdinc++ -fvisibility-inlines-hidden -Wall -Wextra -W -Wwrite-strings -Wno-unused-parameter -Wno-long-long -Werror=return-type -Wno-user-defined-literals -Wno-covered-switch-default -Wno-error -fPIC -MMD -MT projects/libcxx/lib/CMakeFiles/cxx_experimental.dir/__/src/experimental/filesystem/operations.cpp.o -MF projects/libcxx/lib/CMakeFiles/cxx_experimental.dir/__/src/experimental/filesystem/operations.cpp.o.d -o projects/libcxx/lib/CMakeFiles/cxx_experimental.dir/__/src/experimental/filesystem/operations.cpp.o -c /Users/thakis/src/chrome/src/third_party/llvm/projects/libcxx/src/experimental/filesystem/operations.cpp
,
Feb 6 2018
Aha, my local thing doesn't have -mmacosx-version-min=10.12. If I add that, then I can repro this with my local real build too. So I suppose updating the tot bot to 10.13 would've also gotten rid of the error. Anyhow, now that I can build, let me try a local patch.
,
Feb 6 2018
http://llvm.org/viewvc/llvm-project?view=revision&revision=324385 Let's see what the bot thinks about that.
,
Feb 6 2018
https://ci.chromium.org/buildbot/chromium.clang/ToTMac/1216 made it past runhooks
,
Feb 7 2018
made it past compile too, but some tests fail. chromedriver and telemetry might just be flakiness, but the blink heap unittests usually aren't flaky :-/
,
Feb 8 2018
I can reproduce the blink_heap_unittests failure locally. The problem is it crashes in the same way with pinned Clang too. I tried Chromium revisions back to October. I guess my machine just doesn't like it for some reason. Something must have changed since the bot went from green to red on this one, but it's not clear that there's a Clang ToT problem.
,
Feb 8 2018
The test fails for me locally as well, in a release-no-dchecks build.
$ lldb -- out/gn/blink_heap_unittests --gtest_filter=HeapTest.TraceDeepEagerlyNote: Google Test filter = HeapTest.TraceDeepEagerly
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from HeapTest
[ RUN ] HeapTest.TraceDeepEagerly
Process 75306 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7ffeef3ffd80)
frame #0: 0x00007fff53b64369 libdyld.dylib`dyld_stub_binder + 241
libdyld.dylib`dyld_stub_binder:
-> 0x7fff53b64369 <+241>: movq %rcx, (%r8)
0x7fff53b6436c <+244>: addq $0x8, %r8
0x7fff53b64370 <+248>: cmpq %r8, %r9
0x7fff53b64373 <+251>: ja 0x7fff53b64369 ; <+241>
Target 0: (blink_heap_unittests) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7ffeef3ffd80)
* frame #0: 0x00007fff53b64369 libdyld.dylib`dyld_stub_binder + 241
frame #1: 0x0000000100720008 blink_heap_unittests
frame #2: 0x00000001000c5eb0 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) + 96
frame #3: 0x00000001000c5eb0 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) + 96
frame #4: 0x00000001000c5eb0 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) + 96
frame #5: 0x00000001000c5eb0 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) + 96
(...omitting some 20000 lines of stack...)
frame #23816: 0x00000001000c5eb0 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) + 96
frame #23817: 0x00000001000c5e2a blink_heap_unittests`blink::TraceTrait<blink::DeepEagerly>::Trace(blink::Visitor*, void*) + 42
frame #23818: 0x000000010933b309 libblink_platform.dylib`blink::ThreadHeap::AdvanceMarkingStackProcessing(blink::Visitor*, double) + 153
frame #23819: 0x000000010934ca28 libblink_platform.dylib`blink::ThreadState::MarkPhaseAdvanceMarking(double) + 120
frame #23820: 0x0000000109348375 libblink_platform.dylib`blink::ThreadState::CollectGarbage(blink::BlinkGC::StackState, blink::BlinkGC::GCType, blink::BlinkGC::GCReason) + 277
frame #23821: 0x000000010934a273 libblink_platform.dylib`blink::ThreadState::ScheduleGCIfNeeded() + 883
frame #23822: 0x0000000109343715 libblink_platform.dylib`blink::NormalPageArena::OutOfLineAllocate(unsigned long, unsigned long) + 197
frame #23823: 0x00000001000758a9 blink_heap_unittests`blink::HeapTest_TraceDeepEagerly_Test::TestBody() + 41
frame #23824: 0x00000001000f4566 blink_heap_unittests`testing::Test::Run() + 246
frame #23825: 0x00000001000f5170 blink_heap_unittests`testing::TestInfo::Run() + 288
frame #23826: 0x00000001000f5797 blink_heap_unittests`testing::TestCase::Run() + 263
frame #23827: 0x00000001000fdd97 blink_heap_unittests`testing::internal::UnitTestImpl::RunAllTests() + 903
frame #23828: 0x00000001000fd9e3 blink_heap_unittests`testing::UnitTest::Run() + 163
frame #23829: 0x0000000100447b07 blink_heap_unittests`base::TestSuite::Run() + 167
frame #23830: 0x00000001000d3d15 blink_heap_unittests`runHelper(base::TestSuite*) + 85
frame #23831: 0x0000000100455eeb blink_heap_unittests`base::LaunchUnitTests(int, char**, base::RepeatingCallback<int ()> const&) + 139
frame #23832: 0x00000001000d3dab blink_heap_unittests`main + 123
frame #23833: 0x00007fff53b64115 libdyld.dylib`start + 1
frame #23834: 0x00007fff53b64115 libdyld.dylib`start + 1
The trybots all build with dchecks on so the test doesn't run there, but it seems to pass on the main waterfall release bots for some reason. (I want to find out why.)
But since it fails with pinned clang too, this isn't a roll blocker.
,
Feb 8 2018
Hm, maybe https://chromium.googlesource.com/chromium/src/+/6ffadfd65a3137b7b2d754cbf491268c5c85f258 triggered this. This would explain why it started recently, but not why it works on the main waterfall bot.
,
Feb 8 2018
Reverting that locally doesn't change anything.
,
Feb 8 2018
The CL should only affect Windows where ~0ul != ~uintptr_t{0} (as sizeof(ul) != sizeof(uintptr_t on Win64). For Mac it should be a no op. Can you check that?
The test itself checks that we properly abort recursively tracing when we are close to stack limit. There are a few platform dependent things involved (in blink::StackFrameDepth) to get this right though.
,
Feb 8 2018
We rely on __builtin_frame_address(0), see [1]. Could that be an issue? [1] https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/heap/StackFrameDepth.h?q=StackFrameDepth&l=59
,
Feb 8 2018
More breadcrumbs: https://codereview.chromium.org/946483002
,
Feb 8 2018
EXC_BAD_ACCESS means that we likely run into a stackoverflow because StackFrameDepth is not properly working on your machine. Printing of limits and current pc in StackFrameDepth would be useful.
,
Feb 8 2018
I changed StackFrameDepth::IsSafeToRecurse to print the lhs (CurrentStackFrame()) and rhs (stack_frame_limit_) before the return. It prints 7ffeee822d28 > 7ffeee02c400 ? ...20k lines... 7ffeee05cd50 > 7ffeee02c400 ? 7ffeee05cbf0 > 7ffeee02c400 ? before crashing. The stack addresses increase by 7.7MiB, so the lhs feels roughly correct. To check the rhs, I added some prints to StackFrameDepth::EnableStackLimit(). (I also verified that if I unconditionally call DisableStackLimit() at the end of that function, the test passes fine.) stack base 0x7ffeee82c000 (this is WTF::GetStackStart()). GetUnderestimatedStackSize() seems to return the 8*1024*1024 it's supposed to return for the main thread on mac. kStackRoomSize is 1024 -- is this supposed to be 1024*1024? 1MB stack room is pretty low, and we run out in 7.7MiB not in 8MiB -- so GetUnderestimatedStackSize() apparently doesn't underestimate enough.
,
Feb 8 2018
We assume that 1k stack space is enough until we hit the check again. That's essentially what kStackRoomSize does. We've had problems with it on MSAN already because this configuration also required more stack space. I think we can just raise this to e.g. 4k which would probably work. I assume that clang compiles now into code that takes a little bit more stack space. Would be interesting to know why though.
,
Feb 8 2018
Hm, actually the 8MB look fine. (with 7MB the test passes though)
I added a "stack left" field to my printf, it produces this output:
[----------] 1 test from HeapTest
[ RUN ] HeapTest.TraceDeepEagerly
returning 8mb
stack size 8192.000000 kiB
stack base 0x7ffeed38f000
set stack_frame_limit_ to 7ffeecb8f400, 8191.000000 kiB below stack start
7ffeed38dd60 > 7ffeecb8f400 ? 8186.343750 kiB stack left
returning 8mb
stack size 8192.000000 kiB
stack base 0x7ffeed38f000
set stack_frame_limit_ to 7ffeecb8f400, 8191.000000 kiB below stack start
7ffeed38dd60 > 7ffeecb8f400 ? 8186.343750 kiB stack left
7ffeed38db70 > 7ffeecb8f400 ? 8185.859375 kiB stack left
7ffeed38da10 > 7ffeecb8f400 ? 8185.515625 kiB stack left
7ffeed38d8b0 > 7ffeecb8f400 ? 8185.171875 kiB stack left
7ffeed38d750 > 7ffeecb8f400 ? 8184.828125 kiB stack left
7ffeed38d5f0 > 7ffeecb8f400 ? 8184.484375 kiB stack left
7ffeed38d490 > 7ffeecb8f400 ? 8184.140625 kiB stack left
7ffeed38d330 > 7ffeecb8f400 ? 8183.796875 kiB stack left
7ffeed38d1d0 > 7ffeecb8f400 ? 8183.453125 kiB stack left
7ffeed38d070 > 7ffeecb8f400 ? 8183.109375 kiB stack left
7ffeed38cf10 > 7ffeecb8f400 ? 8182.765625 kiB stack left
...
7ffeecbc02f0 > 7ffeecb8f400 ? 195.734375 kiB stack left
7ffeecbc0190 > 7ffeecb8f400 ? 195.390625 kiB stack left
7ffeecbc0030 > 7ffeecb8f400 ? 195.046875 kiB stack left
7ffeecbbfed0 > 7ffeecb8f400 ? 194.703125 kiB stack left
7ffeecbbfd70 > 7ffeecb8f400 ? 194.359375 kiB stack left
[1/1] HeapTest.TraceDeepEagerly (CRASHED)
1 test crashed:
HeapTest.TraceDeepEagerly (../../third_party/WebKit/Source/platform/heap/HeapTest.cpp:6057)
So not too far away from 0 I suppose...
compiler-rt used this workaround: https://reviews.llvm.org/rL200703
,
Feb 8 2018
I assume that 8M is still fine. The delta here seems to be 0.34375kiB which is 352 bytes. So 1k kStackRoomSize "should" be enough to reach the next check again. As a short term fix you could probably try kStackRoomSize of 4k.
,
Feb 8 2018
Which CL do I need to patch in to try this locally?
,
Feb 8 2018
I added some more printing based on that compiler-rt patch:
guard size 4.000000 kiB -> pthread_attr_getguardsize
official 8000.000000 kiB -> pthread_get_stacksize_np(pthread_self())
rlimit stack 8000.000000 -> getrlimit(RLIMIT_STACK, &rl);
stack size 8192.000000 kiB -> hardcoded
Note that these all print 8000.0 kiB, not 8192 kiB. So the fix is to return "8 * 1000 * 1024" I guess? With that, we get to
7ffee9a6bd70 > 7ffee9a6b400 ? 2.359375 kiB stack left
[1/1] HeapTest.TraceDeepEagerly (CRASHED)
1 test crashed:
HeapTest.TraceDeepEagerly (../../third_party/WebKit/Source/platform/heap/HeapTest.cpp:6057)
If I now also subtract the guard page size, the test passes. So I think we have two bugs here. Or maybe they were true in 10.9 and aren't in newer macOS versions.
So I think we know what the fix is -- do the compiler_rt workaround (i.e. call pthread_get_stacksize_np(pthread_self()), if that returns 1<<19 and we're on 10.9, call getrlimit() and if that looks sane use that, else use 8*1000*1024, and subtract pthread_attr_getguardsize().
But I still don't understand why the test passes on the bots. mark, do you know if macOS default stack sizes changed in newer versions? Do the bots bump it up some? Why would this fail locally for me and on the https://ci.chromium.org/buildbot/chromium.clang/ToTMac/ bot (10.12.2) but not on the main waterfall testers (https://chromium-swarm.appspot.com/task?id=3b8fab95c0cd5a10&refresh=10&show_raw=1 -- 10.12.6)
,
Feb 8 2018
mlippautz: None, just build at head. My debug printfs are the webkit/ changes in https://chromium-review.googlesource.com/#/c/chromium/src/+/909650
,
Feb 8 2018
Your investigation looks on spot and we can change the limits. Would still be interesting to know why it worked so far.
,
Feb 8 2018
The default rlimit has been 8192kB for a very long time, hasn’t changed. The main thread gets this limit. This is inherited by child processes, but I’m not sure if Chrome’s bots bump it up anywhere. (Doubt it, but you never know.) The default stack size for a pthread stack for non-main threads that don’t have their own stack size specified is 512kB. I don’t really want to fire up a 10.9 VM to check, but 10.9 did feature the “new” pthreads library, and it’s possible that they goofed and reported the pthreads default stack size for the main thread through the pthreads interface rather than its actual effective limit, which would be the rlimit. I have a vague recollection of this being the case. Apple never released the source for the new pthreads library in 10.9, they started doing that in 10.10, so I can’t tell for sure just by reading source. I can tell you that 10.8 (the last of the “old” pthreads library in libc) and 10.10 (the first of the “new” libpthread) both unconditionally say that the main thread’s stack size is 8MB, even if it’s really not. Starting in 10.12, they do suck out the actual thread size set up and passed by the kernel and properly report that for the main thread. I don’t know anything about 8000kB vs. 8192kB.
,
Feb 14 2018
Assigning to mlippautz. Can you take this?
,
Feb 15 2018
Will have a look.
,
Feb 15 2018
For me on 10.13.2 - pthread_get_stacksize_np(pthread_self()): 8192kb - rlimit(RLIMIT_STACK, &rl): 8192kb On wonder why the stack limits would include a guard page on IOS but not on regular OSX. In general, I will try to use a fix inspired by WebKit [1]. [1] https://github.com/WebKit/webkit/blob/master/Source/WTF/wtf/StackBounds.cpp#L97
,
Feb 15 2018
Any help from OSX experts is appreciated. Attached is a minimal working example that crashes on OSX 10.13.2 (High Sierra). OSX returns 8MiB as stack size but we crash earlier than that. Both pthread and getrlimit return the same (wrong) size. We crash ~1k7 bytes before reaching the limit. I could not find any documentation on whether the guard page is included in the stack size. It would be weird though as the stack size is defined as memory available to the process.
,
Feb 15 2018
Obviously printing also takes up some space, so the crasher is expected (I see that we crash in printing when accessing an address passt the actual stack space). The 8MiB limit also seems correct as it can even be found in official Apple documentation. For the GC we reserve 1k stack space until we have to do the check again. Investigating now where we actually crash during recursively tracing.
,
Feb 15 2018
Seems to be related how the visitors are compiled with this clang roll. I see "dyld_stub_binder" which indicates that we do an indirect call here.
The solution seems to be that we give the bit more stack room for a turn of checking the stack depth.
Log:
[ RUN ] HeapTest.TraceDeepEagerly
pthread limit: 8388608
rlimit: 8388608
Stack start: 0x7ffeefc00000
Current: 0x7ffeefbfedc0
limit: 0x7ffeef400400
real limit: 0x7ffeef400000
pthread limit: 8388608
rlimit: 8388608
Stack start: 0x7ffeefc00000
Current: 0x7ffeefbfedc0
limit: 0x7ffeef400400
real limit: 0x7ffeef400000
Process 5057 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7ffeef3ffdc0)
frame #0: 0x00007fff6e8bf369 libdyld.dylib`dyld_stub_binder + 241
libdyld.dylib`dyld_stub_binder:
-> 0x7fff6e8bf369 <+241>: movq %rcx, (%r8)
0x7fff6e8bf36c <+244>: addq $0x8, %r8
0x7fff6e8bf370 <+248>: cmpq %r8, %r9
0x7fff6e8bf373 <+251>: ja 0x7fff6e8bf369 ; <+241>
Target 0: (blink_heap_unittests) stopped.
(lldb) bt 15
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7ffeef3ffdc0)
* frame #0: 0x00007fff6e8bf369 libdyld.dylib`dyld_stub_binder + 241
frame #1: 0x000000010019f008 blink_heap_unittests
frame #2: 0x00000001000c5a00 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) [inlined] void blink::TraceTrait<blink::DeepEagerly>::Mark<blink::Visitor*>(visitor=<unavailable>, t=<unavailable>) at TraceTraits.h:241 [opt]
frame #3: 0x00000001000c59f8 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) [inlined] void blink::Visitor::Mark<blink::DeepEagerly>(t=<unavailable>) at Visitor.h:117 [opt]
frame #4: 0x00000001000c59f3 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) [inlined] void blink::Visitor::Trace<blink::DeepEagerly>(blink::Member<blink::DeepEagerly> const&) at Visitor.h:123 [opt]
frame #5: 0x00000001000c59f0 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) [inlined] blink::DeepEagerly::Trace(blink::Visitor*) at HeapTest.cpp:6043 [opt]
frame #6: 0x00000001000c59d9 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) [inlined] blink::TraceTrait<blink::DeepEagerly>::Trace(self=<unavailable>) at TraceTraits.h:259 [opt]
frame #7: 0x00000001000c59d9 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(visitor=0x000000010e312f30, t=0x0000003d9035a6b0) at TraceTraits.h:97 [opt]
frame #8: 0x00000001000c5a00 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) [inlined] void blink::TraceTrait<blink::DeepEagerly>::Mark<blink::Visitor*>(visitor=<unavailable>, t=<unavailable>) at TraceTraits.h:241 [opt]
frame #9: 0x00000001000c59f8 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) [inlined] void blink::Visitor::Mark<blink::DeepEagerly>(t=<unavailable>) at Visitor.h:117 [opt]
frame #10: 0x00000001000c59f3 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) [inlined] void blink::Visitor::Trace<blink::DeepEagerly>(blink::Member<blink::DeepEagerly> const&) at Visitor.h:123 [opt]
frame #11: 0x00000001000c59f0 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) [inlined] blink::DeepEagerly::Trace(blink::Visitor*) at HeapTest.cpp:6043 [opt]
frame #12: 0x00000001000c59d9 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) [inlined] blink::TraceTrait<blink::DeepEagerly>::Trace(self=<unavailable>) at TraceTraits.h:259 [opt]
frame #13: 0x00000001000c59d9 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(visitor=0x000000010e312f30, t=0x0000003d9035a6d0) at TraceTraits.h:97 [opt]
frame #14: 0x00000001000c5a00 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(blink::Visitor*, blink::DeepEagerly const*) [inlined] void blink::TraceTrait<blink::DeepEagerly>::Mark<blink::Visitor*>(visitor=<unavailable>, t=<unavailable>) at TraceTraits.h:241 [opt]
(lldb) f 7
blink_heap_unittests was compiled with optimization - stepping may behave oddly; variables may not be available.
frame #7: 0x00000001000c59d9 blink_heap_unittests`void blink::AdjustAndMarkTrait<blink::DeepEagerly, false>::Mark<blink::Visitor*>(visitor=0x000000010e312f30, t=0x0000003d9035a6b0) at TraceTraits.h:97 [opt]
94 DCHECK(visitor->Heap().GetStackFrameDepth().IsAcceptableStackUse());
95 if (LIKELY(visitor->Heap().GetStackFrameDepth().IsSafeToRecurse())) {
96 if (visitor->EnsureMarked(t)) {
-> 97 TraceTrait<T>::Trace(visitor, const_cast<T*>(t));
98 }
99 return;
100 }
(lldb) p/x $sp
(unsigned long) $0 = 0x00007ffeef4003a0
(lldb) f 0
frame #0: 0x00007fff6e8bf369 libdyld.dylib`dyld_stub_binder + 241
libdyld.dylib`dyld_stub_binder:
-> 0x7fff6e8bf369 <+241>: movq %rcx, (%r8)
0x7fff6e8bf36c <+244>: addq $0x8, %r8
0x7fff6e8bf370 <+248>: cmpq %r8, %r9
0x7fff6e8bf373 <+251>: ja 0x7fff6e8bf369 ; <+241>
(lldb) p/x $sp
(unsigned long) $1 = 0x00007ffeef3ffdc0
,
Feb 16 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/12b72e899889f0ed16bf16ef7da63e3e9c93781a commit 12b72e899889f0ed16bf16ef7da63e3e9c93781a Author: Michael Lippautz <mlippautz@chromium.org> Date: Fri Feb 16 08:26:48 2018 [oilpan] StackFrameDepth: Adjust stack room size Latest clang rolls require a bit more stack space when calling through the eager marking visitors. Bug: chromium:807994 Change-Id: I5bb2a80fdca7f5d536fd965a9cd6151e81573f9a Reviewed-on: https://chromium-review.googlesource.com/922206 Reviewed-by: Nico Weber <thakis@chromium.org> Reviewed-by: Kentaro Hara <haraken@chromium.org> Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/master@{#537255} [modify] https://crrev.com/12b72e899889f0ed16bf16ef7da63e3e9c93781a/third_party/WebKit/Source/platform/heap/StackFrameDepth.cpp
,
Feb 16 2018
The test passes now for me, would be awesome if somebody could double check. One more theory is that this is actually related to component build. Doing a build with dynamic lookups will definitely have a different call stack size than the static builds we ship.
,
Feb 16 2018
Verified that it fixes the test locally for me. I'll update the bug when the buildbot has cycled.
,
Feb 19 2018
Verified that the test passes on the buildbot now. |
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by h...@chromium.org
, Feb 1 2018