some test binaries fail to --gtest_list_tests on linux_android_rel_ng with clang enabled |
||||||
Issue descriptionhttps://codereview.chromium.org/2901093007/ enables clang by default on Android. On linux_android_rel_ng (but not, say, android_n5x_swarming_rel), several test binaries fail to even start. I focused on content_unittests (out/gnandrel/bin/run_content_unittests --gtest_filter=AppCacheTest.CleanupUnusedCache). If I build that binary locally (at the same revision, same gn args), I can't reproduce the crash. If I download the binary that the bot build and run that locally, I can reproduce. Now the weird part: To check where things go wrong, I unpacked the .apk file from the bot and copied bits into my local build directory and rebuilt my local apk. My local apk still passes if I copy over all the .so files and classes.dex from the bot, while the apk from the bot still crashes. (I tried copying the full bot apk into my local build dir; that way the "local" build crashes too, so it's not due to test runner scripts.) I unpacked both apks, and diffed the two resulting directories. The only files that differ are META-INF/{CHROMIUM.RSA,CHROMIUM.SF,MANIFEST.MF}. MANIFEST.MF from the bot has Created-By: 1.8.0_45-internal (Oracle Corporation) while my local one has Created-By: 1.8.0_112-google-v7 (Google Inc.) and other than that they're identical. The bot's CHROMIUM.SF starts Signature-Version: 1.0 SHA1-Digest-Manifest-Main-Attributes: TN5zBsqBLAij6alOeMWe+Ejwd4g= SHA1-Digest-Manifest: CZX3E7GkuMRCQpgcUx3D4Tcimqs= Created-By: 1.8.0_45-internal (Oracle Corporation) while mine starts Signature-Version: 1.0 Created-By: 1.8.0_112-google-v7 (Google Inc.) SHA1-Digest-Manifest: Vjhonb+T3qk3ILeTJPStyl5zEOM= SHA1-Digest-Manifest-Main-Attributes: XtJNzAw+PYdbqmb7DnD2wWb5eJI= (I haven't compared the .RSA file, figuring it's some signature.) The disassembly for my locally built lib_content_unittests__library.so is identical to that of the bot, so I tried symbolizing the stack I get when I run the bot binary locally using my locally-built symbols. I get this stack, which looks like it makes sense: thakis@thakis:~/src/chrome/src$ third_party/android_platform/development/scripts/stack --output-directory=out/gnandrel stack.txt Searching for native crashes in: /usr/local/google/home/thakis/src/chrome/src/stack.txt Reading Android symbols from: /usr/local/google/home/thakis/src/chrome/src Searching for Chrome symbols from within: /usr/local/google/home/thakis/src/chrome/src/out/gnandrel/lib.unstripped:/usr/local/google/home/thakis/src/chrome/src/out/gnandrel/lib:/usr/local/google/home/thakis/src/chrome/src/out/gnandrel Find ABI:arm Using toolchain from: /usr/local/google/home/thakis/src/chrome/src/third_party/android_tools/ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi- pid: 14716, tid: 14716, name: st:test_process >>> org.chromium.native_test:test_process <<< signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0x94155dc6 r0 94155dba r1 00000000 r2 0021bc9c r3 00002d51 r4 beec6074 r5 beec616c r6 0021bc9c r7 beec6074 r8 00000000 r9 00000000 sl beec61c4 fp aa9fde68 ip 985d9af8 sp beec6050 lr 9682711f pc 968273c6 Stack Trace: RELADDR FUNCTION FILE:LINE v------> GetHeaderValue /usr/local/google/home/thakis/src/chrome/src/v8/src/snapshot/serializer-common.h:280 v------> Payload /usr/local/google/home/thakis/src/chrome/src/v8/src/snapshot/snapshot-common.cc:222 01d7a3c6 Deserializer<v8::internal::SnapshotData> /usr/local/google/home/thakis/src/chrome/src/v8/src/snapshot/deserializer.h:35 01d7a11b Initialize /usr/local/google/home/thakis/src/chrome/src/v8/src/snapshot/snapshot-common.cc:43 017be6b1 IsolateNewImpl /usr/local/google/home/thakis/src/chrome/src/v8/src/api.cc:8367 00b08311 IsolateHolder /usr/local/google/home/thakis/src/chrome/src/gin/isolate_holder.cc:54 01f0cf25 V8PerIsolateData /usr/local/google/home/thakis/src/chrome/src/third_party/WebKit/Source/platform/bindings/V8PerIsolateData.cpp:55 01f0d13d Initialize /usr/local/google/home/thakis/src/chrome/src/third_party/WebKit/Source/platform/bindings/V8PerIsolateData.cpp:82 02624f97 InitializeMainThread /usr/local/google/home/thakis/src/chrome/src/third_party/WebKit/Source/bindings/core/v8/V8Initializer.cpp:442 021802a7 Initialize /usr/local/google/home/thakis/src/chrome/src/third_party/WebKit/Source/web/WebKit.cpp:84 021e305b TestBlinkWebUnitTestSupport /usr/local/google/home/thakis/src/chrome/src/content/test/test_blink_web_unit_test_support.cc:134 021ddaa9 UnitTestTestSuite /usr/local/google/home/thakis/src/chrome/src/content/public/test/unittest_test_suite.cc:34 0092b29b main /usr/local/google/home/thakis/src/chrome/src/content/test/run_all_unittests.cc:14 And indeed, https://cs.chromium.org/chromium/src/v8/src/snapshot/serializer-common.h?q=serializer-common.h+package:%5Echromium$&dr&l=280 has a (pointless) invalid-looking cast to int* at an unaligned address. So that might be the reason. However, I don't understand why I'm not getting that crash when I run my locally-built executable with all the binaries (snapshot.bin included) copied over from the bot. (cc'ing people who might be interested in this; I'll just send a patch to v8 and hope that it helps.)
,
Jun 2 2017
(...and the bot not providing symbolized stacks is issue 657040 )
,
Jun 2 2017
Wild theory for why this only repros with the bot apk: Maybe the byte difference in the META-INF shifts the snapshot around a bit in the apk zip, and so the alignment of the snapshot is different. (Shouldn't zipalign take care of that though?)
,
Jun 2 2017
That cast looks indeed pointless: the memcpy is well intended, but given the crash I wonder if clang is doing some smart optimization of the form "if you cast to int* it means that you are sure that this is sizeof(int)-aligned, so I am not going to memcpy but I am just going to emit a mov". and then the SIGBUS-BUS_ADRALN follows. As regards the "why doesn't happen on my machine": IIRC the snapshot is mmaped from the apk. So I guess in your local case you have been very lucky and the layout of the apk is such that the offset of the v8 snapshot in the apk + the load offset that gets passed as argument to GetHeaderValue() happens to be int-aligned.
,
Jun 2 2017
thakis@thakis:~/src/chrome/src$ python printzip.py out/gnandrel/content_unittests_apk/content_unittests-debug.apk META-INF/MANIFEST.MF 0 META-INF/CHROMIUM.SF 15184 META-INF/CHROMIUM.RSA 30481 AndroidManifest.xml 31538 assets/natives_blob.bin 33131 assets/snapshot_blob_32.bin 265583 classes.dex 2534880 lib/armeabi-v7a/lib_content_unittests__library.so 5363571 lib/armeabi-v7a/libosmesa.so 37189298 resources.arsc 38368798 thakis@thakis:~/src/chrome/src$ python printzip.py work/out/Release/content_unittests_apk/content_unittests-debug.apk META-INF/MANIFEST.MF 0 META-INF/CHROMIUM.SF 15188 META-INF/CHROMIUM.RSA 30490 AndroidManifest.xml 31548 assets/natives_blob.bin 33141 assets/snapshot_blob_32.bin 265593 classes.dex 2534890 lib/armeabi-v7a/lib_content_unittests__library.so 5363581 lib/armeabi-v7a/libosmesa.so 37189308 resources.arsc 38368354 (the latter one is the apk I got from the bot, using `tools/swarming_client/swarming.py reproduce -S chromium-swarm.appspot.com 36784150e5362910`) So it looks like the files are not 4-byte aligned. Now I'm confused what `zipalign` is supposed to do, but it looks like my theory might be true.
,
Jun 2 2017
What is printzip.py? I think I just hacked up the same in the meantime https://gist.github.com/primiano/7556d7df5dd9ace60dbe1164f2ed24b2 Can you have a try on both apks?
,
Jun 2 2017
Oh sorry I meant to paste this bit here too: thakis@thakis:~/src/chrome/src$ cat printzip.py from zipfile import ZipFile import sys myzip = ZipFile(sys.argv[1], 'r') for info in myzip.infolist(): if 'res/' in info.filename: continue print info.filename, info.header_offset
,
Jun 2 2017
Ah quite similar, but you are printing out the address of the per-file zip header, not the address of the content. see https://gist.github.com/primiano/7556d7df5dd9ace60dbe1164f2ed24b2 :) (don't ask me about that +2, but seems to work by doing some manual checks via xxd)
,
Jun 2 2017
Output with your thing: thakis@thakis:~/src/chrome/src$ python printzip.py out/gnandrel/content_unittests_apk/content_unittests-debug.apk 34 15134 META-INF/MANIFEST.MF 3b84 15247 META-INF/CHROMIUM.SF 7746 1006 META-INF/CHROMIUM.RSA 7b65 1544 AndroidManifest.xml 81a2 232399 assets/natives_blob.bin 40daa 2269240 assets/snapshot_blob_32.bin 26ae0b 2828650 classes.dex 51d7c4 31825648 lib/armeabi-v7a/lib_content_unittests__library.so 23776ee 883357 lib/armeabi-v7a/libosmesa.so 249764c 468248 resources.arsc thakis@thakis:~/src/chrome/src$ python printzip.py work/out/Release/content_unittests_apk/content_unittests-debug.apk 34 15138 META-INF/MANIFEST.MF 3b88 15252 META-INF/CHROMIUM.SF 774f 1007 META-INF/CHROMIUM.RSA 7b6f 1544 AndroidManifest.xml 81ac 232399 assets/natives_blob.bin 40db4 2269240 assets/snapshot_blob_32.bin 26ae15 2828650 classes.dex 51d7ce 31825648 lib/armeabi-v7a/lib_content_unittests__library.so 23776f8 883357 lib/armeabi-v7a/libosmesa.so 2497490 468248 resources.arsc snapshot_blob_32 has offset 10 vs 4 (or, without the +2, 8 vs 2 -- and the 2 looks definitely weird)
,
Jun 2 2017
right, the snapshot in the first apk is aligned@2 and would cause a sigbus. the snapshot in the 2nd apk is aligned@4 and will work by luck.
,
Jun 2 2017
not sure I get the " without the +2". my reasoning was: 0x40daa % 4 = 2 --> SIGBUS 0x40db4 % 4 = 9 --> quietly works
,
Jun 2 2017
The aa is the one that works and the b4 is the one that crashes :-) (4.3.7 here https://www.loc.gov/preservation/digital/formats/digformatspecs/APPNOTE%2820120901%29_Version_6.3.3.txt explains the +2, the header before the filename is 2 mod 4; assuming extra_info is empty)
,
Jun 2 2017
zipalign docs: "The adjustment is made by altering the size of the "extra" field in the zip Local File Header sections. Existing data in the "extra" fields may be altered by this process.". So i guess I should dump that "extra" size.
,
Jun 2 2017
Better tools:
thakis@thakis:~/src/chrome/src$ third_party/android_tools/sdk/build-tools/25.0.2/zipalign -c -v 4 out/gnandrel/content_unittests_apk/content_unittests-debug.apk | grep -v res/
Verifying alignment of out/gnandrel/content_unittests_apk/content_unittests-debug.apk (4)...
50 META-INF/MANIFEST.MF (OK - compressed)
15234 META-INF/CHROMIUM.SF (OK - compressed)
30532 META-INF/CHROMIUM.RSA (OK - compressed)
31587 AndroidManifest.xml (OK - compressed)
33184 assets/natives_blob.bin (OK)
265640 assets/snapshot_blob_32.bin (OK)
2534921 classes.dex (OK - compressed)
5363650 lib/armeabi-v7a/lib_content_unittests__library.so (OK - compressed)
37189356 lib/armeabi-v7a/libosmesa.so (OK - compressed)
38368844 resources.arsc (OK)
Verification succesful
thakis@thakis:~/src/chrome/src$ third_party/android_tools/sdk/build-tools/25.0.2/zipalign -c -v 4 work/out/Release/content_unittests_apk/content_unittests-debug.apk | grep -v res/
Verifying alignment of work/out/Release/content_unittests_apk/content_unittests-debug.apk (4)...
50 META-INF/MANIFEST.MF (OK - compressed)
15238 META-INF/CHROMIUM.SF (OK - compressed)
30541 META-INF/CHROMIUM.RSA (OK - compressed)
31597 AndroidManifest.xml (OK - compressed)
33194 assets/natives_blob.bin (BAD - 2)
265650 assets/snapshot_blob_32.bin (BAD - 2)
2534931 classes.dex (OK - compressed)
5363660 lib/armeabi-v7a/lib_content_unittests__library.so (OK - compressed)
37189366 lib/armeabi-v7a/libosmesa.so (OK - compressed)
38368398 resources.arsc (BAD - 2)
Verification FAILED
See BAD -- crazy! How does the bot end up with a badly aligned zip??
,
Jun 2 2017
Hm, this kind of looks immediately questionable: https://cs.chromium.org/chromium/src/build/android/gyp/finalize_apk.py?type=cs&q=finalize.apk.py+package:%5Echromium$&l=97
,
Jun 2 2017
v8-side patch: https://codereview.chromium.org/2915323002/ Try jobs for patch set 2 at https://codereview.chromium.org/2901093007/ check if the _NormalizeZip() call is to blame.
,
Jun 2 2017
agrieve, primiano says you might be interested to learn that our apks aren't zipaligned on at least one bot. (I believe that's already the case and we just don't notice because gcc doesn't happen to produce an "load word" instruction for the lhs of https://codereview.chromium.org/2915323002/ whiel clang does, which is why I'm only seeing this on my try job even though the zip alignment is probably already bad)
,
Jun 2 2017
The following revision refers to this bug: https://chromium.googlesource.com/v8/v8.git/+/13af45c91dac27dffc7d25ca902fb81630b49ca4 commit 13af45c91dac27dffc7d25ca902fb81630b49ca4 Author: thakis <thakis@chromium.org> Date: Fri Jun 02 19:48:50 2017 v8: Fix unaligned access when deserializing snapshots. The code was already careful to use memcpy() here, but then it added needless casts to wider types that made the compiler think that it can do 4-byte-aligned accesses when it couldn't. (It's also a bug that the snapshot got loaded at an unaligned address, but we can fix both bugs.) BUG= chromium:729059 Review-Url: https://codereview.chromium.org/2915323002 Cr-Commit-Position: refs/heads/master@{#45698} [modify] https://crrev.com/13af45c91dac27dffc7d25ca902fb81630b49ca4/src/snapshot/serializer-common.h
,
Jun 2 2017
Patch set 2 also had these errors on the trybot (https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/309488), so it's probably not due to _NormalizeZip(). From that run: python tools/swarming_client/isolateserver.py download -I https://isolateserver.appspot.com --namespace default-gzip -s 7c6a08dae3ce879740d907bc47bf2ddbb92dcf45 --target foo third_party/android_tools/sdk/build-tools/25.0.2/zipalign -c -v 4 foo/out/Release/components_unittests_apk/components_unittests-debug.apk | grep -v res/ Verifying alignment of foo/out/Release/components_unittests_apk/components_unittests-debug.apk (4)... 50 META-INF/MANIFEST.MF (OK - compressed) 24958 META-INF/CHROMIUM.SF (OK - compressed) 50093 META-INF/CHROMIUM.RSA (OK - compressed) 51148 AndroidManifest.xml (OK - compressed) 52745 assets/natives_blob.bin (BAD - 1) 285201 assets/snapshot_blob_32.bin (BAD - 1) 2554498 classes.dex (OK - compressed) 3854235 classes2.dex (OK - compressed) 6077255 lib/armeabi-v7a/lib_components_unittests__library.so (OK - compressed) 41685793 lib/armeabi-v7a/libosmesa.so (OK - compressed) 43052759 resources.arsc (BAD - 3) Verification FAILED So for some reason that bot creates badly aligned zip files.
,
Jun 5 2017
I'm pretty sure the culprit here is this gem: https://cs.chromium.org/chromium/src/tools/determinism/remove_build_metadata.py?q=remove_build_metadata.py&sq=package:chromium&l=80 I tried running zipalign -c on a locally built Chrome.apk and it all checked out. I then ran remove_build_metadata.py on it, and it came out non-zipaligned. I'd guess we could just delete this logic, as our build system does already produce timestamp-normalized apks. John - do you happen to know of any reason we might still want this logic?
,
Jun 5 2017
Yeah, this lets us achieve idempotency of build products. Swarming deduplication wouldn't work for APKs without it.
,
Jun 5 2017
I believe that since this was added: https://cs.chromium.org/chromium/src/build/android/gyp/finalize_apk.py?type=cs&q=finalize.apk.py+package:%5Echromium$&l=97 GN-based android_apk() create idempotent zip files. Assuming we only care about uploading .apks built from GN, it should be safe to remove.
,
Jun 5 2017
ah, I'd either missed or forgotten about that. In that case, yeah, removing remove_zip_timestamps should be fine. We don't care about non-GN .apks.
,
Jun 6 2017
,
Jun 6 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/28d5fba595834d633f36b0613ecdd7300a10f267 commit 28d5fba595834d633f36b0613ecdd7300a10f267 Author: agrieve <agrieve@chromium.org> Date: Tue Jun 06 19:29:07 2017 swarming: Don't normalize .apk files (breaks alignment) The normalization is no longer necessary, as ninja now generates .apk files with zeroed out timestamps. The normalization turned out to be actively harmful, as it was not maintaining alignment of zip entries. BUG= 729059 Review-Url: https://codereview.chromium.org/2930483002 Cr-Commit-Position: refs/heads/master@{#477372} [modify] https://crrev.com/28d5fba595834d633f36b0613ecdd7300a10f267/tools/determinism/remove_build_metadata.py
,
Jun 7 2017
,
Jun 7 2017
,
Jun 10 2017
Thanks for the fix! I filed issue 732069 for investigating why content_browsertests still requires an aligned apk. Also, I filed issue 732071 about scripts modifying built products in-place after the fact.
,
Sep 10
|
||||||
►
Sign in to add a comment |
||||||
Comment 1 by thakis@chromium.org
, Jun 2 2017