New issue
Advanced search Search tips

Issue 729059 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Bug

Blocking:
issue 481675
issue 882231



Sign in to add a comment

some test binaries fail to --gtest_list_tests on linux_android_rel_ng with clang enabled

Project Member Reported by thakis@chromium.org, Jun 2 2017

Issue description

https://codereview.chromium.org/2901093007/ enables clang by default on Android. On linux_android_rel_ng (but not, say, android_n5x_swarming_rel), several test binaries fail to even start.

I focused on content_unittests (out/gnandrel/bin/run_content_unittests --gtest_filter=AppCacheTest.CleanupUnusedCache).


If I build that binary locally (at the same revision, same gn args), I can't reproduce the crash. If I download the binary that the bot build and run that locally, I can reproduce. Now the weird part: To check where things go wrong, I unpacked the .apk file from the bot and copied bits into my local build directory and rebuilt my local apk. My local apk still passes if I copy over all the .so files and classes.dex from the bot, while the apk from the bot still crashes. (I tried copying the full bot apk into my local build dir; that way the "local" build crashes too, so it's not due to test runner scripts.)

I unpacked both apks, and diffed the two resulting directories. The only files that differ are META-INF/{CHROMIUM.RSA,CHROMIUM.SF,MANIFEST.MF}.

MANIFEST.MF from the bot has
Created-By: 1.8.0_45-internal (Oracle Corporation)
while my local one has
Created-By: 1.8.0_112-google-v7 (Google Inc.)
and other than that they're identical.

The bot's CHROMIUM.SF starts
Signature-Version: 1.0
SHA1-Digest-Manifest-Main-Attributes: TN5zBsqBLAij6alOeMWe+Ejwd4g=
SHA1-Digest-Manifest: CZX3E7GkuMRCQpgcUx3D4Tcimqs=
Created-By: 1.8.0_45-internal (Oracle Corporation)

while mine starts
Signature-Version: 1.0
Created-By: 1.8.0_112-google-v7 (Google Inc.)
SHA1-Digest-Manifest: Vjhonb+T3qk3ILeTJPStyl5zEOM=
SHA1-Digest-Manifest-Main-Attributes: XtJNzAw+PYdbqmb7DnD2wWb5eJI=

(I haven't compared the .RSA file, figuring it's some signature.)


The disassembly for my locally built lib_content_unittests__library.so is identical to that of the bot, so I tried symbolizing the stack I get when I run the bot binary locally using my locally-built symbols. I get this stack, which looks like it makes sense:

thakis@thakis:~/src/chrome/src$ third_party/android_platform/development/scripts/stack --output-directory=out/gnandrel stack.txt
Searching for native crashes in: /usr/local/google/home/thakis/src/chrome/src/stack.txt
Reading Android symbols from: /usr/local/google/home/thakis/src/chrome/src
Searching for Chrome symbols from within: /usr/local/google/home/thakis/src/chrome/src/out/gnandrel/lib.unstripped:/usr/local/google/home/thakis/src/chrome/src/out/gnandrel/lib:/usr/local/google/home/thakis/src/chrome/src/out/gnandrel
Find ABI:arm
Using toolchain from: /usr/local/google/home/thakis/src/chrome/src/third_party/android_tools/ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-
pid: 14716, tid: 14716, name: st:test_process  >>> org.chromium.native_test:test_process <<<
signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0x94155dc6
     r0 94155dba  r1 00000000  r2 0021bc9c  r3 00002d51
     r4 beec6074  r5 beec616c  r6 0021bc9c  r7 beec6074
     r8 00000000  r9 00000000  sl beec61c4  fp aa9fde68
     ip 985d9af8  sp beec6050  lr 9682711f  pc 968273c6

Stack Trace:
  RELADDR   FUNCTION                                  FILE:LINE
  v------>  GetHeaderValue                            /usr/local/google/home/thakis/src/chrome/src/v8/src/snapshot/serializer-common.h:280
  v------>  Payload                                   /usr/local/google/home/thakis/src/chrome/src/v8/src/snapshot/snapshot-common.cc:222
  01d7a3c6  Deserializer<v8::internal::SnapshotData>  /usr/local/google/home/thakis/src/chrome/src/v8/src/snapshot/deserializer.h:35
  01d7a11b  Initialize                                /usr/local/google/home/thakis/src/chrome/src/v8/src/snapshot/snapshot-common.cc:43
  017be6b1  IsolateNewImpl                            /usr/local/google/home/thakis/src/chrome/src/v8/src/api.cc:8367
  00b08311  IsolateHolder                             /usr/local/google/home/thakis/src/chrome/src/gin/isolate_holder.cc:54
  01f0cf25  V8PerIsolateData                          /usr/local/google/home/thakis/src/chrome/src/third_party/WebKit/Source/platform/bindings/V8PerIsolateData.cpp:55
  01f0d13d  Initialize                                /usr/local/google/home/thakis/src/chrome/src/third_party/WebKit/Source/platform/bindings/V8PerIsolateData.cpp:82
  02624f97  InitializeMainThread                      /usr/local/google/home/thakis/src/chrome/src/third_party/WebKit/Source/bindings/core/v8/V8Initializer.cpp:442
  021802a7  Initialize                                /usr/local/google/home/thakis/src/chrome/src/third_party/WebKit/Source/web/WebKit.cpp:84
  021e305b  TestBlinkWebUnitTestSupport               /usr/local/google/home/thakis/src/chrome/src/content/test/test_blink_web_unit_test_support.cc:134
  021ddaa9  UnitTestTestSuite                         /usr/local/google/home/thakis/src/chrome/src/content/public/test/unittest_test_suite.cc:34
  0092b29b  main                                      /usr/local/google/home/thakis/src/chrome/src/content/test/run_all_unittests.cc:14



And indeed, https://cs.chromium.org/chromium/src/v8/src/snapshot/serializer-common.h?q=serializer-common.h+package:%5Echromium$&dr&l=280 has a (pointless) invalid-looking cast to int* at an unaligned address.

So that might be the reason.

However, I don't understand why I'm not getting that crash when I run my locally-built executable with all the binaries (snapshot.bin included) copied over from the bot.

(cc'ing people who might be interested in this; I'll just send a patch to v8 and hope that it helps.)
 
I also tried sending the change to the unswarmed bot in hopes to get a stack there, but there this particular problem also doesn't happen (https://build.chromium.org/p/tryserver.chromium.android/builders/android_unswarmed_n5_rel/builds/12).
(...and the bot not providing symbolized stacks is  issue 657040 )
Wild theory for why this only repros with the bot apk: Maybe the byte difference in the META-INF shifts the snapshot around a bit in the apk zip, and so the alignment of the snapshot is different. (Shouldn't zipalign take care of that though?)
That cast looks indeed pointless: the memcpy is well intended, but given the crash I wonder if clang is doing some smart optimization of the form "if you cast to int* it means that you are sure that this is sizeof(int)-aligned, so I am not going to memcpy but I am just going to emit a mov". and then the SIGBUS-BUS_ADRALN follows.

As regards the "why doesn't happen on my machine": IIRC the snapshot is mmaped from the apk. So I guess in your local case you have been very lucky and the layout of the apk is such that the offset of the v8 snapshot in the apk + the load offset that gets passed as argument to GetHeaderValue() happens to be int-aligned.

thakis@thakis:~/src/chrome/src$ python printzip.py out/gnandrel/content_unittests_apk/content_unittests-debug.apk 
META-INF/MANIFEST.MF 0
META-INF/CHROMIUM.SF 15184
META-INF/CHROMIUM.RSA 30481
AndroidManifest.xml 31538
assets/natives_blob.bin 33131
assets/snapshot_blob_32.bin 265583
classes.dex 2534880
lib/armeabi-v7a/lib_content_unittests__library.so 5363571
lib/armeabi-v7a/libosmesa.so 37189298
resources.arsc 38368798
thakis@thakis:~/src/chrome/src$ python printzip.py work/out/Release/content_unittests_apk/content_unittests-debug.apk 
META-INF/MANIFEST.MF 0
META-INF/CHROMIUM.SF 15188
META-INF/CHROMIUM.RSA 30490
AndroidManifest.xml 31548
assets/natives_blob.bin 33141
assets/snapshot_blob_32.bin 265593
classes.dex 2534890
lib/armeabi-v7a/lib_content_unittests__library.so 5363581
lib/armeabi-v7a/libosmesa.so 37189308
resources.arsc 38368354

(the latter one is the apk I got from the bot, using `tools/swarming_client/swarming.py reproduce -S chromium-swarm.appspot.com 36784150e5362910`)


So it looks like the files are not 4-byte aligned. Now I'm confused what `zipalign` is supposed to do, but it looks like my theory might be true.
What is printzip.py? I think I just hacked up the same in the meantime
https://gist.github.com/primiano/7556d7df5dd9ace60dbe1164f2ed24b2
Can you have a try on both apks?
Oh sorry I meant to paste this bit here too:

thakis@thakis:~/src/chrome/src$ cat printzip.py 
from zipfile import ZipFile
import sys

myzip = ZipFile(sys.argv[1], 'r')
for info in myzip.infolist():
  if 'res/' in info.filename: continue
  print info.filename, info.header_offset

Ah quite similar, but you are printing out the address of the per-file zip header, not the address of the content. see https://gist.github.com/primiano/7556d7df5dd9ace60dbe1164f2ed24b2 :) (don't ask me about that +2, but seems to work by doing some manual checks via xxd)
Output with your thing:

thakis@thakis:~/src/chrome/src$ python printzip.py out/gnandrel/content_unittests_apk/content_unittests-debug.apk 
34               15134      META-INF/MANIFEST.MF
3b84             15247      META-INF/CHROMIUM.SF
7746             1006       META-INF/CHROMIUM.RSA
7b65             1544       AndroidManifest.xml
81a2             232399     assets/natives_blob.bin
40daa            2269240    assets/snapshot_blob_32.bin
26ae0b           2828650    classes.dex
51d7c4           31825648   lib/armeabi-v7a/lib_content_unittests__library.so
23776ee          883357     lib/armeabi-v7a/libosmesa.so
249764c          468248     resources.arsc
thakis@thakis:~/src/chrome/src$ python printzip.py work/out/Release/content_unittests_apk/content_unittests-debug.apk 
34               15138      META-INF/MANIFEST.MF
3b88             15252      META-INF/CHROMIUM.SF
774f             1007       META-INF/CHROMIUM.RSA
7b6f             1544       AndroidManifest.xml
81ac             232399     assets/natives_blob.bin
40db4            2269240    assets/snapshot_blob_32.bin
26ae15           2828650    classes.dex
51d7ce           31825648   lib/armeabi-v7a/lib_content_unittests__library.so
23776f8          883357     lib/armeabi-v7a/libosmesa.so
2497490          468248     resources.arsc


snapshot_blob_32 has offset 10 vs 4 (or, without the +2, 8 vs 2 -- and the 2 looks definitely weird)
right, the snapshot in the first apk is aligned@2 and would cause a sigbus. the snapshot in the 2nd apk is aligned@4 and will work by luck.
not sure I get the " without the +2".
my reasoning was:
0x40daa % 4 = 2 --> SIGBUS
0x40db4 % 4 = 9 --> quietly works


The aa is the one that works and the b4 is the one that crashes :-)

(4.3.7 here https://www.loc.gov/preservation/digital/formats/digformatspecs/APPNOTE%2820120901%29_Version_6.3.3.txt explains the +2, the header before the filename is 2 mod 4; assuming extra_info is empty)
zipalign docs: "The adjustment is made by altering the size of the "extra" field in the zip Local File Header sections. Existing data in the "extra" fields may be altered by this process.". So i guess I should dump that "extra" size.
Better tools:

thakis@thakis:~/src/chrome/src$ third_party/android_tools/sdk/build-tools/25.0.2/zipalign -c -v 4 out/gnandrel/content_unittests_apk/content_unittests-debug.apk  | grep -v res/
Verifying alignment of out/gnandrel/content_unittests_apk/content_unittests-debug.apk (4)...
      50 META-INF/MANIFEST.MF (OK - compressed)
   15234 META-INF/CHROMIUM.SF (OK - compressed)
   30532 META-INF/CHROMIUM.RSA (OK - compressed)
   31587 AndroidManifest.xml (OK - compressed)
   33184 assets/natives_blob.bin (OK)
  265640 assets/snapshot_blob_32.bin (OK)
 2534921 classes.dex (OK - compressed)
 5363650 lib/armeabi-v7a/lib_content_unittests__library.so (OK - compressed)
37189356 lib/armeabi-v7a/libosmesa.so (OK - compressed)
38368844 resources.arsc (OK)
Verification succesful
thakis@thakis:~/src/chrome/src$ third_party/android_tools/sdk/build-tools/25.0.2/zipalign -c -v 4 work/out/Release/content_unittests_apk/content_unittests-debug.apk | grep -v res/
Verifying alignment of work/out/Release/content_unittests_apk/content_unittests-debug.apk (4)...
      50 META-INF/MANIFEST.MF (OK - compressed)
   15238 META-INF/CHROMIUM.SF (OK - compressed)
   30541 META-INF/CHROMIUM.RSA (OK - compressed)
   31597 AndroidManifest.xml (OK - compressed)
   33194 assets/natives_blob.bin (BAD - 2)
  265650 assets/snapshot_blob_32.bin (BAD - 2)
 2534931 classes.dex (OK - compressed)
 5363660 lib/armeabi-v7a/lib_content_unittests__library.so (OK - compressed)
37189366 lib/armeabi-v7a/libosmesa.so (OK - compressed)
38368398 resources.arsc (BAD - 2)
Verification FAILED



See BAD -- crazy! How does the bot end up with a badly aligned zip??
v8-side patch: https://codereview.chromium.org/2915323002/

Try jobs for patch set 2 at https://codereview.chromium.org/2901093007/ check if the _NormalizeZip() call is to blame.
Cc: agrieve@chromium.org
agrieve, primiano says you might be interested to learn that our apks aren't zipaligned on at least one bot. (I believe that's already the case and we just don't notice because gcc doesn't happen to produce an "load word" instruction for the lhs of https://codereview.chromium.org/2915323002/ whiel clang does, which is why I'm only seeing this on my try job even though the zip alignment is probably already bad)
Project Member

Comment 18 by bugdroid1@chromium.org, Jun 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/13af45c91dac27dffc7d25ca902fb81630b49ca4

commit 13af45c91dac27dffc7d25ca902fb81630b49ca4
Author: thakis <thakis@chromium.org>
Date: Fri Jun 02 19:48:50 2017

v8: Fix unaligned access when deserializing snapshots.

The code was already careful to use memcpy() here, but then it
added needless casts to wider types that made the compiler think
that it can do 4-byte-aligned accesses when it couldn't.

(It's also a bug that the snapshot got loaded at an unaligned
address, but we can fix both bugs.)

BUG= chromium:729059 

Review-Url: https://codereview.chromium.org/2915323002
Cr-Commit-Position: refs/heads/master@{#45698}

[modify] https://crrev.com/13af45c91dac27dffc7d25ca902fb81630b49ca4/src/snapshot/serializer-common.h

Patch set 2 also had these errors on the trybot (https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/309488), so it's probably not due to _NormalizeZip(). From that run:

python tools/swarming_client/isolateserver.py download -I https://isolateserver.appspot.com --namespace default-gzip -s 7c6a08dae3ce879740d907bc47bf2ddbb92dcf45 --target foo
third_party/android_tools/sdk/build-tools/25.0.2/zipalign -c -v 4 foo/out/Release/components_unittests_apk/components_unittests-debug.apk  | grep -v res/
Verifying alignment of foo/out/Release/components_unittests_apk/components_unittests-debug.apk (4)...
      50 META-INF/MANIFEST.MF (OK - compressed)
   24958 META-INF/CHROMIUM.SF (OK - compressed)
   50093 META-INF/CHROMIUM.RSA (OK - compressed)
   51148 AndroidManifest.xml (OK - compressed)
   52745 assets/natives_blob.bin (BAD - 1)
  285201 assets/snapshot_blob_32.bin (BAD - 1)
 2554498 classes.dex (OK - compressed)
 3854235 classes2.dex (OK - compressed)
 6077255 lib/armeabi-v7a/lib_components_unittests__library.so (OK - compressed)
41685793 lib/armeabi-v7a/libosmesa.so (OK - compressed)
43052759 resources.arsc (BAD - 3)
Verification FAILED

So for some reason that bot creates badly aligned zip files.
Cc: thakis@chromium.org
Owner: jbudorick@chromium.org
I'm pretty sure the culprit here is this gem:

https://cs.chromium.org/chromium/src/tools/determinism/remove_build_metadata.py?q=remove_build_metadata.py&sq=package:chromium&l=80

I tried running zipalign -c on a locally built Chrome.apk and it all checked out. I then ran remove_build_metadata.py on it, and it came out non-zipaligned.

I'd guess we could just delete this logic, as our build system does already produce timestamp-normalized apks. John - do you happen to know of any reason we might still want this logic?
Status: Assigned (was: Started)
Yeah, this lets us achieve idempotency of build products. Swarming deduplication wouldn't work for APKs without it.
I believe that since this was added:
https://cs.chromium.org/chromium/src/build/android/gyp/finalize_apk.py?type=cs&q=finalize.apk.py+package:%5Echromium$&l=97

GN-based android_apk() create idempotent zip files. Assuming we only care about uploading .apks built from GN, it should be safe to remove.
ah, I'd either missed or forgotten about that. In that case, yeah, removing remove_zip_timestamps should be fine. We don't care about non-GN .apks.
Project Member

Comment 25 by bugdroid1@chromium.org, Jun 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/28d5fba595834d633f36b0613ecdd7300a10f267

commit 28d5fba595834d633f36b0613ecdd7300a10f267
Author: agrieve <agrieve@chromium.org>
Date: Tue Jun 06 19:29:07 2017

swarming: Don't normalize .apk files (breaks alignment)

The normalization is no longer necessary, as ninja now generates
.apk files with zeroed out timestamps. The normalization turned out to
be actively harmful, as it was not maintaining alignment of zip entries.

BUG= 729059 

Review-Url: https://codereview.chromium.org/2930483002
Cr-Commit-Position: refs/heads/master@{#477372}

[modify] https://crrev.com/28d5fba595834d633f36b0613ecdd7300a10f267/tools/determinism/remove_build_metadata.py

Cc: jbudorick@chromium.org
Owner: agrieve@chromium.org
Status: Fixed (was: Assigned)
Latest trybot succeeded:
https://codereview.chromium.org/2901093007/
Thanks for the fix!

I filed issue 732069 for investigating why content_browsertests still requires an aligned apk.

Also, I filed  issue 732071  about scripts modifying built products in-place after the fact.
Blocking: 882231

Sign in to add a comment