New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 742563 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: iOS
Pri: 1
Type: Bug

Blocked on:
issue 753944



Sign in to add a comment

base_unittests DCHECKs on 32-bit iOS device

Project Member Reported by michaeldo@chromium.org, Jul 13 2017

Issue description

Chrome Version: 61
OS: iOS9
Tested against: 32-bit (armv7) iOS 9.2.1 iPod Touch (Model PD749LL/A)

What steps will reproduce the problem?
(1) Run base_unittests

What is the expected result?
Test suite should run without DCHECKing

What happens instead?
DCHECK hits in base/message_loop/message_loop.cc at line 378.
Ex: [493:22019:0713/145400.881873:479292871252:FATAL:message_loop.cc(378)] Check failed: this == current() (0x1569ae90 vs. 0x0)

I'm still working to bisect this completely, but so far I have narrowed down the introduction between 3505cba91e5a (LKGR) and e7ba6263e5d7 (unittests DCHECK on this revision.)
 
I just finished the bisect of ios_internal:

e69ff8c4c is the Chromium roll commit that introduced the problem, so 79a50681a is the last known good downstream revision.

Roll CL:
https://chromereviews.googleplex.com/594037014

I'll continue to bisect the upstream commits. Here are the upstream commits contained in the roll:
bff5dcfbfafd Correctly update CWVWebView properties on state restoration.
eed319f1ecb6 Add tools/sublime/OWNERS
ff3e4a87dc25 Introduce ResourceFinishObserver
34174075ba5f Improve docstrings and comments in webkitpy/w3c.
05ba7bc22678 Extensions: Move targets in extensions/browser/guest_view to extensions/browser:browser_sources.
adeae69afea9 CSS: distance and rotate optional in 'offset'
75af3a8ba9bd Hook up Reader Mode InfoBar
05227511e9c0 Remove PostAccessibilityNotification from ChromeClient.
2b2fdbb4b96e [Android Webview] Refactor LoadUrlTest and work on embedded_test_server custom handler to support the java code.
fb6d187db2e8 Auto-always/never can at most triggered twice (new translate infobar)
7c0adb62e021 [MDC roll] Roll Material Components to 6d5844642b004594ba1607916032616f85c7c045
fdf158da5c17 Mojo: Introduce more new message APIs
95467ed22705 media: Implement MediaCodecVideoDecoder decoder IO
f87bf36bc630 Run sanbox_linux_unittests & breakpad_unittests on only one N5X device.
dd1e509ada57 Cache protected password entry and password on focus ping separately.
d0b70c4e593e Add //third_party/boringssl as a public dep of //net
23fe0e721fae Implementation of translation event logging.
ffb480444a84 Fold ContentMenuClientImpl into ContextMenuClient.
37df2ae52914 Disable flaky DownloadExtensionTest_FileIcon_Active.
1a6cf45a50e5 Disable flaky ProcessMemoryMetricsEmitterTest.FetchXXX tests.
fd4ba279668f Omnibox UI Experiments: Add feature flags for 3 URL elision experiments.
3a26e05c70e6 Roll clang 303910:305281.
1799a1223112 Change linux default hidden file save directory to XDG_DATA_HOME
Cc: eugene...@chromium.org

Comment 3 by pkl@chromium.org, Jul 17 2017

Owner: michaeldo@chromium.org
Status: Assigned (was: Untriaged)
Who can own this? michaeldo: Please find an appropriate owner.
Yes, assigned to me to fine. I am almost done bisecting the above revisions. Then I will re-assign based on my findings.
Cc: h...@chromium.org thakis@chromium.org
It looks like the clang roll is the issue. I can fix it the DCHECK by reverting the following commit locally on top of HEAD.

3a26e05c70e63815c7c18284e5e006b9d0ac75af Roll clang 303910:305281.

The clang update script says it's OK to revert rolls, but given this is a commit from a month ago, I'll ask the owners specifically.

hans@, thakis@:
The clang roll to 305281 breaks 32-bit iOS devices by DCHECKing as seen in #0 above. Should I revert CLANG_REVISION to 303910 or do one of you want to take a look at the failure first?

Comment 6 by thakis@chromium.org, Jul 17 2017

There have been at least one more clang roll since then; reverting this a month later isn't ok.

Why did it take a month to find this? Is this on a bot or just in a local build? Do we still do 32bit builds on iOS?

Comment 7 by thakis@chromium.org, Jul 17 2017

Also, the android 32bit arm bots look happy -- are you sure this isn't a code bug?
Cc: jam@chromium.org
It took so long because this configuration isn't tested on the bots and it was a fluke that we found it at all. See crbug.com/742565 for details about working to prevent this in the future.

We do still ship to 32-bit because we can't control that directly. We control our minimum iOS version and Apple decides how that correlates to supported devices/processors.

I don't think this is a specific bug in //ios code, but I'm not sure the exact cause of it. Here are some additional details about the crash:
The current() call in the DCHECK references a base::ThreadLocalPointer<MessageLoop> which holds a base::ThreadLocalStorage. It appears that the ThreadLocalStorage pointer isn't holing the pointer even though set is called.

+ jam@ as //base/threading owner.

Comment 9 by thakis@chromium.org, Jul 17 2017

(To be clear: This doesn't affect the chrome/iOS that you ship, since that uses xcode clang. If you found this just due to a fluke, this likely isn't p-1)
Cc: justincohen@chromium.org
justincohen, here's another suspected breakage due to a clang roll. Does the ios tot bot show this failure? (I forgot its url by now :-/)
Do we have any update here?
Labels: -ReleaseBlock-Stable
This does not affect production so I am removing the RBS label. This is however blocking developers from being able to debug on 32-bit devices locally, so it is still high priority to investigate what's going on and to fix it.
justincohen, could you look at this? the tot bot looks happy, but it's also doing 64-bit release, so it wouldn't have caught this.
thakis@, do base_unittests run on an android 32 bit bot per comment #7? I can't find this bot, but I'm not sure I'm looking in all the right places.


I looked into this a bit more today and here is what I found:

The issue is inside base/threading/thread_local_storage.cc:

First ThreadLocalStorage::StaticSlot::Set(void* value) gets called, then when ThreadLocalStorage::StaticSlot::Get() is called later on, tls_data is null, and ConstructTlsVector() is called again.

Of course, since the TlsVectorEntry is recreated, it doesn't find the data that was stored inside the Set call.
No. We only have a single tot bot atm. Getting more is blocked on the public cert work.
Cc: kkhorimoto@chromium.org
thakis@ it turns out chrome doesn't run on 32bit devices at all with the latest tot clang.

I'll try and bisect a bit.
Justin, do you want to bisect clang? Chrome breakage happened after compiler roll: https://bugs.chromium.org/p/chromium/issues/detail?id=742563#c5
bisecting: 

bad: 304596
good: 304253
Current bisect window, from justincohen:

fail: 304274
good: 304253

 svn log -r304253:304274 https://nico@llvm.org/svn/llvm-project


Since this is arm32-ios-specific, this looks like the most likely one to me:

r304267 | matze | 2017-05-30 21:21:35 -0400 (Tue, 30 May 2017) | 13 lines

ARM: Fix cmpxchg O0 expansion

This is the equivalent of r304048 for ARM:

- Rewrite livein calculation to use the computeLiveIns() helper
  function. This is slightly less efficient but easier to reason about
  and doesn't unnecessarily add pristine and reserved registers[1]
- Zero the status register at the beginning of the loop to make sure it
  has a defined value.
- Remove kill flags of values that need to stay alive throughout the loop.

[1] An upcoming commit of mine will tighten the MachineVerifier to catch
    these.

Current bisect window, from justincohen:
r304263 good
r304267 bad
Owner: thakis@chromium.org
Appears to be https://reviews.llvm.org/rL304267
Hit send too soon.  I confirmed that rL304267 reproduces, and rL304266 does not, building base_unittests on arm32.
Thanks! Filed https://bugs.llvm.org/show_bug.cgi?id=34056 upstream.
Labels: -Restrict-View-Google
making bug public
I postead some comments on the llvm bug report. I am basically unable to pinpoint a bug or something actionable from the information provided. I'd appreciate some more information that helps reproducing the issue. Assembly diffs, preprocessed code[1] would be ideal. Alternatively at least some step by step instructions for someone not familiar with Chromium to reproduce the issue.

[1] So that I can build the same binary without figuring out how the chromium build process works and what exact SDK/libc++ versions are at play.
I managed to repro locally. It DCHECKs when running WeakPtrDeathTest.WeakPtrCopyDoesNotChangeThreadBinding . I'll try to reduce it a bit. Will post to llvm bug once I have something standalone-ish.
I'm down to a 339K binary with ~20 files (https://chromium-review.googlesource.com/604687). It's single-threaded. The bkpt 0 here gets hit:

#include "base/test/test_support_ios.h"  // nogncheck
#include "base/threading/thread_local.h"

class MyClass;

base::ThreadLocalPointer<MyClass>* MyGetTLSMessageLoop() {
  static auto* lazy_tls_ptr = new base::ThreadLocalPointer<MyClass>();
  return lazy_tls_ptr;
}

class MyClass {
public:
  void f() {
    MyGetTLSMessageLoop()->Set(this);
    if (this != MyGetTLSMessageLoop()->Get())
      asm("bkpt 0");
  }
};

void do_the_thing() {
  MyClass c;
  c.f();
}

int main(int argc, char* argv[]) {
  base::RunTestsFromIOSApp(argc, argv);
}
(But I think the miscompilation is somewhere in the base::ThreadLocalPointer code)
I added a fairly reduced repro to the llvm bug.
Blockedon: 753944
Status: Fixed (was: Assigned)
This should be better now. Please verify.

Major thanks to justincohen for bisecting!
I just verified that this is fixed and I can run on 32-bit device.

Thank you all!

Sign in to add a comment