base_unittests DCHECKs on 32-bit iOS device |
|||||||||||
Issue descriptionChrome Version: 61 OS: iOS9 Tested against: 32-bit (armv7) iOS 9.2.1 iPod Touch (Model PD749LL/A) What steps will reproduce the problem? (1) Run base_unittests What is the expected result? Test suite should run without DCHECKing What happens instead? DCHECK hits in base/message_loop/message_loop.cc at line 378. Ex: [493:22019:0713/145400.881873:479292871252:FATAL:message_loop.cc(378)] Check failed: this == current() (0x1569ae90 vs. 0x0) I'm still working to bisect this completely, but so far I have narrowed down the introduction between 3505cba91e5a (LKGR) and e7ba6263e5d7 (unittests DCHECK on this revision.)
,
Jul 14 2017
,
Jul 17 2017
Who can own this? michaeldo: Please find an appropriate owner.
,
Jul 17 2017
Yes, assigned to me to fine. I am almost done bisecting the above revisions. Then I will re-assign based on my findings.
,
Jul 17 2017
It looks like the clang roll is the issue. I can fix it the DCHECK by reverting the following commit locally on top of HEAD. 3a26e05c70e63815c7c18284e5e006b9d0ac75af Roll clang 303910:305281. The clang update script says it's OK to revert rolls, but given this is a commit from a month ago, I'll ask the owners specifically. hans@, thakis@: The clang roll to 305281 breaks 32-bit iOS devices by DCHECKing as seen in #0 above. Should I revert CLANG_REVISION to 303910 or do one of you want to take a look at the failure first?
,
Jul 17 2017
There have been at least one more clang roll since then; reverting this a month later isn't ok. Why did it take a month to find this? Is this on a bot or just in a local build? Do we still do 32bit builds on iOS?
,
Jul 17 2017
Also, the android 32bit arm bots look happy -- are you sure this isn't a code bug?
,
Jul 17 2017
It took so long because this configuration isn't tested on the bots and it was a fluke that we found it at all. See crbug.com/742565 for details about working to prevent this in the future. We do still ship to 32-bit because we can't control that directly. We control our minimum iOS version and Apple decides how that correlates to supported devices/processors. I don't think this is a specific bug in //ios code, but I'm not sure the exact cause of it. Here are some additional details about the crash: The current() call in the DCHECK references a base::ThreadLocalPointer<MessageLoop> which holds a base::ThreadLocalStorage. It appears that the ThreadLocalStorage pointer isn't holing the pointer even though set is called. + jam@ as //base/threading owner.
,
Jul 17 2017
(To be clear: This doesn't affect the chrome/iOS that you ship, since that uses xcode clang. If you found this just due to a fluke, this likely isn't p-1)
,
Jul 18 2017
justincohen, here's another suspected breakage due to a clang roll. Does the ios tot bot show this failure? (I forgot its url by now :-/)
,
Jul 19 2017
Unfortunately it's failing for a different reason, just opened https://critique.corp.google.com/#review/162488470 Url is https://uberchromegw.corp.google.com/i/internal.bling.fyi/builders/clang-tot-device
,
Jul 31 2017
Do we have any update here?
,
Aug 1 2017
This does not affect production so I am removing the RBS label. This is however blocking developers from being able to debug on 32-bit devices locally, so it is still high priority to investigate what's going on and to fix it.
,
Aug 1 2017
justincohen, could you look at this? the tot bot looks happy, but it's also doing 64-bit release, so it wouldn't have caught this.
,
Aug 1 2017
thakis@, do base_unittests run on an android 32 bit bot per comment #7? I can't find this bot, but I'm not sure I'm looking in all the right places. I looked into this a bit more today and here is what I found: The issue is inside base/threading/thread_local_storage.cc: First ThreadLocalStorage::StaticSlot::Set(void* value) gets called, then when ThreadLocalStorage::StaticSlot::Get() is called later on, tls_data is null, and ConstructTlsVector() is called again. Of course, since the TlsVectorEntry is recreated, it doesn't find the data that was stored inside the Set call.
,
Aug 2 2017
No. We only have a single tot bot atm. Getting more is blocked on the public cert work.
,
Aug 3 2017
,
Aug 3 2017
thakis@ it turns out chrome doesn't run on 32bit devices at all with the latest tot clang. I'll try and bisect a bit.
,
Aug 3 2017
Justin, do you want to bisect clang? Chrome breakage happened after compiler roll: https://bugs.chromium.org/p/chromium/issues/detail?id=742563#c5
,
Aug 3 2017
bisecting: bad: 304596 good: 304253
,
Aug 3 2017
Current bisect window, from justincohen: fail: 304274 good: 304253 svn log -r304253:304274 https://nico@llvm.org/svn/llvm-project Since this is arm32-ios-specific, this looks like the most likely one to me: r304267 | matze | 2017-05-30 21:21:35 -0400 (Tue, 30 May 2017) | 13 lines ARM: Fix cmpxchg O0 expansion This is the equivalent of r304048 for ARM: - Rewrite livein calculation to use the computeLiveIns() helper function. This is slightly less efficient but easier to reason about and doesn't unnecessarily add pristine and reserved registers[1] - Zero the status register at the beginning of the loop to make sure it has a defined value. - Remove kill flags of values that need to stay alive throughout the loop. [1] An upcoming commit of mine will tighten the MachineVerifier to catch these.
,
Aug 3 2017
,
Aug 3 2017
Hit send too soon. I confirmed that rL304267 reproduces, and rL304266 does not, building base_unittests on arm32.
,
Aug 3 2017
Thanks! Filed https://bugs.llvm.org/show_bug.cgi?id=34056 upstream.
,
Aug 4 2017
making bug public
,
Aug 5 2017
I postead some comments on the llvm bug report. I am basically unable to pinpoint a bug or something actionable from the information provided. I'd appreciate some more information that helps reproducing the issue. Assembly diffs, preprocessed code[1] would be ideal. Alternatively at least some step by step instructions for someone not familiar with Chromium to reproduce the issue. [1] So that I can build the same binary without figuring out how the chromium build process works and what exact SDK/libc++ versions are at play.
,
Aug 7 2017
I managed to repro locally. It DCHECKs when running WeakPtrDeathTest.WeakPtrCopyDoesNotChangeThreadBinding . I'll try to reduce it a bit. Will post to llvm bug once I have something standalone-ish.
,
Aug 7 2017
I'm down to a 339K binary with ~20 files (https://chromium-review.googlesource.com/604687). It's single-threaded. The bkpt 0 here gets hit: #include "base/test/test_support_ios.h" // nogncheck #include "base/threading/thread_local.h" class MyClass; base::ThreadLocalPointer<MyClass>* MyGetTLSMessageLoop() { static auto* lazy_tls_ptr = new base::ThreadLocalPointer<MyClass>(); return lazy_tls_ptr; } class MyClass { public: void f() { MyGetTLSMessageLoop()->Set(this); if (this != MyGetTLSMessageLoop()->Get()) asm("bkpt 0"); } }; void do_the_thing() { MyClass c; c.f(); } int main(int argc, char* argv[]) { base::RunTestsFromIOSApp(argc, argv); }
,
Aug 7 2017
(But I think the miscompilation is somewhere in the base::ThreadLocalPointer code)
,
Aug 8 2017
I added a fairly reduced repro to the llvm bug.
,
Aug 9 2017
,
Aug 15 2017
This should be better now. Please verify. Major thanks to justincohen for bisecting!
,
Aug 15 2017
I just verified that this is fixed and I can run on 32-bit device. Thank you all! |
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by michaeldo@chromium.org
, Jul 13 2017