//chrome:unit_tests are flaky on Android official builds due to ODR violation due to multiple .so(s) due to resource_whitelist deps |
|||||||||
Issue description
I am going to tell you a story. It's a bit of a complicated story so sit down and get a comfy chair.
TL;DR
When building the unit_tests_apk with {is_official_build=true, is_component_build=false}, the final apk ends up with two .so(s) libchrome.so and lib_unit_tests__library.so. libchrome.so is not needed an shouldn't be there. It's presence is not only unnecessary but harmful.
Because we LoadLibrary() both, we end up in a weird situation where, depending on the scenario, we can hit code from both .so(s) **within the same call-stack**. libchrome.so is there because of a build-time only dependency required for the resource_whitelist generation and accidentally ends up in the apk causing problems.
Chapter 1: follow the white rabbit in the ODR violation hole
------------------------------------------------------------
Everything started with Issue 739510. bauerb@ got a minified repro case that was creating a call stack like this:
test_runner::main() -> [A] C++ function -> [B] call into a Java function via JNI -> [C] call into another C++ function via JNI.
bauerb@ got into an absurd situation where the state of global variables in [A] was different from the ones in [C]. Looked like globals were re-initialized in [C].
After a long debugging session we figured out that even if the functions in [A] and [C] were defined in the same translation unit, their addresses were strongly different. Specifically they were being pulled from two different .so(s). So the actual call stack looked like this:
test_runner::main() [lib_unit_tests__library.so]
A_CppFunc() [lib_unit_tests__library.so]
B_javaFunction() [Some classes.dex]
C_AnotherCppFunc() [libchrome.so]
Not only the addresses of these functions were different but, because of position-independent-code, also the addresses of their globals. So effectively [C] would see an entirely new world where everything is fresh and clean. But then imagine things like TLS pointers, which instead get initialized early and stashed in a register, which still refer to the old [A] world. At this point you can imagine the surprise, the shock, the sense of outrage, the indignation and the grief coming from those printf("%p %p %p\n", ...) statements.
(Also, because of reasons I'll tell you later, this happens only in official builds. The days of those leaving on the is_official_build=false planet are joyful, trouble-less and ODR-compliant)
The main reason for this odd situation is:
- the presence of the two .so() in the apk
- the fact that we System.LoadLibrary() both.
Chapter 2: Deciding which .so to load, how hard can it be?
----------------------------------------------------------
Why do we load both? In some cases (the all CrazyLinker story makes this a bit more complicated, but let's skip this part) they are loaded according to the list defined in //base/android/java/templates/NativeLibraries.template.
That content of that template is written by //build/android/gyp/write_ordered_libraries.py (yes it says "gyp", but that's just a legacy name, let's also skip this).
That script is ran by build/config/android/rules.gni, which in turn uses GN's magic write_runtime_deps = ... [1] to ask the build-system to expose the runtime-dependencies for a target into a file.
So essentially GN, at build time, writes the runtime dependencies of the target to a file which is consumed by write_ordered_libraries.py which produces a .json file, which is fed back to another GN action that uses jinja to expand the NativeLibraries.template into a NativeLibraries.java which then becomes part of the build, which then contains an array of ["filename.so"] which then is iterated at runtime calling System.LoadLibrary(). It's just that easy!
Chapter 3: What is "deps", baby don't hurt me, don't hurt me, no more.
----------------------------------------------------------------------
The action target itself is a template (generate_resource_whitelist) [2] which in turn is used by targets like //chrome:resource_whitelist [3].
I am not too familiar with resource_whitelist, but this seems to be a host tool that scans a .so (well some extra artifacts produced by the compiler_wrapper, really) and extracts a whitelist of various IDS_ resources (translations strings and that stuff) referenced by the various translation units.
Hold on, how do we get from unit_tests to this point? Here's the dependency chain:
$ gn path out/droid //chrome/test:unit_tests //chrome/android:chrome
//chrome/test:unit_tests --[private]-->
//chrome:packed_resources --[private]-->
//chrome:resource_whitelist --[private]-->
//chrome/android:chrome
Essentially the unit_tests target, other than depending on the lib_unit_tests__library.so itself (this is right and expected), depends on the packed_resources. This is also expected, as there is nothing bad about the test depending on our .pak files.
But in official_builds (which makes enable_resource_whitelist_generation=true), packed_resources depends on the resource_whitelist target.
This is also fine, I guess we want to get coverage for that whitelist.
In turn resource_whitelist depends on libchrome.so. Also this seems legit. We want to whitelist the resources used by the main executable, not by the test.
But now we ended up with two .so(s) in the same apk: lib_unit_tests__library.so. libchrome.so. These .so re-define a lot of the same symbols. They both have their own copy of v8, content, etc. This leads to the weird situation above (let's not get into symbol-resolution rules for JNI->C++ calls)
Chapter 4: run-time vs build-time deps, where's the line?
---------------------------------------------------------
So, where is the problem then? Let's merge the two discussions above.
resource_whitelist needs to "depend" on libchrome.so only at build-time in order to generate the resource whitelist.
the test apk needs to "depend" on lib_unit_tests__library.so on build-time AND run-time because, well, that contains the code of the test.
This is the part where I start getting lost. GN seems to have different notions of deps and I think we are mixing them.
I think that what we want to say here is:
the test apk depends:
- At build time: on both libchrome.so and lib_unit_tests__library.so
- At run-time (which in turn generates the .so list that end up in the apk): on lib_unit_tests__library.so only
I don't know what is wrong link of the chain here. I think that the issue starts when chrome:resource_whitelist declares a
generate_resource_whitelist("resource_whitelist") {
deps = [ "//chrome/android:chrome" ]
...
}
At this point libchrome.so becomes a runtime_dep for the test apk, which in turn causes the write_ordered_libraries to include it.
I have no idea how to express the fact that chrome:packed_resources depends at build-time only on //chrome/android:chrome.
The other GN deps I found, data_deps, seem more oriented at the opposite scenario (run-time only dependency).
This is where our story ends for today...
[1] https://chromium.googlesource.com/chromium/src/+/master/tools/gn/docs/reference.md#write_runtime_deps
[2] tools/resources/generate_resource_whitelist.gni
[3] https://cs.chromium.org/chromium/src/chrome/BUILD.gn?type=cs&q=chrome+%5C%22resource_whitelist&sq=package:chromium&l=1777
,
Jul 24 2017
Brett might know more about deps and GN.
,
Jul 25 2017
WOW! You win a prize for best bug description evar! :) +agrieve who is probably best able to wrestle this situation into something sensible.
,
Jul 25 2017
Here's a work-around: https://chromium-review.googlesource.com/c/584170/ Although I think it's probably better to change GN to not automatically consider shared_libraries through actions as runtime_deps
,
Jul 25 2017
One of possible solutions to fix it on GN side https://chromium-review.googlesource.com/c/584832 However I'm not convinced that we should fix it that way (of fix it in GN at all).
,
Jul 25 2017
,
Jul 25 2017
Fix lgtm!
,
Jul 25 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/8e65260fedd5792d9f10eefae3f6650936aace34 commit 8e65260fedd5792d9f10eefae3f6650936aace34 Author: Andrew Grieve <agrieve@chromium.org> Date: Tue Jul 25 16:46:14 2017 Cludge to prevent .so from being a data_dep in resource_whitelist BUG= 748113 Change-Id: I9aa6fa7cc0f8b7e4c84d0eea11686033ebf45e6e Reviewed-on: https://chromium-review.googlesource.com/584170 Reviewed-by: Dirk Pranke <dpranke@chromium.org> Commit-Queue: Andrew Grieve <agrieve@chromium.org> Cr-Commit-Position: refs/heads/master@{#489338} [modify] https://crrev.com/8e65260fedd5792d9f10eefae3f6650936aace34/tools/resources/OWNERS [add] https://crrev.com/8e65260fedd5792d9f10eefae3f6650936aace34/tools/resources/dummy.c [modify] https://crrev.com/8e65260fedd5792d9f10eefae3f6650936aace34/tools/resources/generate_resource_whitelist.gni
,
Jul 26 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/a207fe68faa4ba90ea6c82a4cbada06059c40e3f commit a207fe68faa4ba90ea6c82a4cbada06059c40e3f Author: Andrew Grieve <agrieve@chromium.org> Date: Wed Jul 26 13:40:48 2017 Revert "Cludge to prevent .so from being a data_dep in resource_whitelist" This reverts commit 8e65260fedd5792d9f10eefae3f6650936aace34. Reason for revert: Breaks monochrome arm64 builds https://bugs.chromium.org/p/chromium/issues/detail?id=749003 Original change's description: > Cludge to prevent .so from being a data_dep in resource_whitelist > > BUG= 748113 > > Change-Id: I9aa6fa7cc0f8b7e4c84d0eea11686033ebf45e6e > Reviewed-on: https://chromium-review.googlesource.com/584170 > Reviewed-by: Dirk Pranke <dpranke@chromium.org> > Commit-Queue: Andrew Grieve <agrieve@chromium.org> > Cr-Commit-Position: refs/heads/master@{#489338} TBR=dpranke@chromium.org,agrieve@chromium.org,kraynov@google.com Change-Id: I65ccd9a1501a902efc8d87de7fb90732fd72ddb5 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 748113 , 749003 Reviewed-on: https://chromium-review.googlesource.com/586787 Reviewed-by: Andrew Grieve <agrieve@chromium.org> Commit-Queue: Andrew Grieve <agrieve@chromium.org> Cr-Commit-Position: refs/heads/master@{#489626} [modify] https://crrev.com/a207fe68faa4ba90ea6c82a4cbada06059c40e3f/tools/resources/OWNERS [delete] https://crrev.com/3526d3a7f45d3c25fdffac894ab5d0c501b087c2/tools/resources/dummy.c [modify] https://crrev.com/a207fe68faa4ba90ea6c82a4cbada06059c40e3f/tools/resources/generate_resource_whitelist.gni
,
Jul 26 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/670ff2ddfd9022f321edd5357dad129de6c42c1b commit 670ff2ddfd9022f321edd5357dad129de6c42c1b Author: Andrew Grieve <agrieve@chromium.org> Date: Wed Jul 26 17:22:50 2017 Revert "Cludge to prevent .so from being a data_dep in resource_whitelist" This reverts commit 8e65260fedd5792d9f10eefae3f6650936aace34. Reason for revert: Breaks monochrome arm64 builds https://bugs.chromium.org/p/chromium/issues/detail?id=749003 Original change's description: > Cludge to prevent .so from being a data_dep in resource_whitelist > > BUG= 748113 > > Change-Id: I9aa6fa7cc0f8b7e4c84d0eea11686033ebf45e6e > Reviewed-on: https://chromium-review.googlesource.com/584170 > Reviewed-by: Dirk Pranke <dpranke@chromium.org> > Commit-Queue: Andrew Grieve <agrieve@chromium.org> > Cr-Commit-Position: refs/heads/master@{#489338} TBR=agrieve@chromium.org, dpranke@chromium.org, kraynov@google.com (cherry picked from commit a207fe68faa4ba90ea6c82a4cbada06059c40e3f) Change-Id: I65ccd9a1501a902efc8d87de7fb90732fd72ddb5 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: 748113 , 749003 Reviewed-on: https://chromium-review.googlesource.com/586787 Reviewed-by: Andrew Grieve <agrieve@chromium.org> Commit-Queue: Andrew Grieve <agrieve@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#489626} Reviewed-on: https://chromium-review.googlesource.com/586473 Cr-Commit-Position: refs/branch-heads/3167@{#3} Cr-Branched-From: 5ad9ff6c130597b324fe94e4decc5a9c64b07218-refs/heads/master@{#489499} [modify] https://crrev.com/670ff2ddfd9022f321edd5357dad129de6c42c1b/tools/resources/OWNERS [delete] https://crrev.com/3f2a19c18ca5f6e2930d39e360fd61f348858fb6/tools/resources/dummy.c [modify] https://crrev.com/670ff2ddfd9022f321edd5357dad129de6c42c1b/tools/resources/generate_resource_whitelist.gni
,
Jul 27 2017
So since this got reverted due to it breaking monochrome, this is broken again now? Monochrome by default does indeed build both architectures in the same build as mentioned in issue 749003. I don't really know a lot about how this is hooked up and works; Michael did the work here (added cc). Do we want to try relanding some form of this dependency change again, and just need to figure out how to fix monochrome builds, or is the plan to do a different fix?
,
Jul 27 2017
Reland in CQ: https://chromium-review.googlesource.com/c/586469/5
,
Jul 27 2017
Ah, thanks.
,
Jul 27 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/dde1afca9fb6b691f8bdc42a92051a7d1ccdcbda commit dde1afca9fb6b691f8bdc42a92051a7d1ccdcbda Author: Andrew Grieve <agrieve@chromium.org> Date: Thu Jul 27 15:57:07 2017 Reland: "Cludge to prevent .so from being a data_dep in resource_whitelist" This reverts commit a207fe68faa4ba90ea6c82a4cbada06059c40e3f. Reason for reland: Now tested with monochrome_public official build on arm64 Bug: 748113 , 749003 Change-Id: I0f2aeaa27857ec0456a65395c02a2c466f3261ed Reviewed-on: https://chromium-review.googlesource.com/586469 Commit-Queue: Andrew Grieve <agrieve@chromium.org> Reviewed-by: Dirk Pranke <dpranke@chromium.org> Reviewed-by: Andrew Grieve <agrieve@chromium.org> Cr-Commit-Position: refs/heads/master@{#489947} [modify] https://crrev.com/dde1afca9fb6b691f8bdc42a92051a7d1ccdcbda/chrome/android/BUILD.gn [add] https://crrev.com/dde1afca9fb6b691f8bdc42a92051a7d1ccdcbda/tools/resources/dummy.c [modify] https://crrev.com/dde1afca9fb6b691f8bdc42a92051a7d1ccdcbda/tools/resources/generate_resource_whitelist.gni
,
Jul 27 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0c7bf24bd2f2b73cd3dec5383d5ec7834475682e commit 0c7bf24bd2f2b73cd3dec5383d5ec7834475682e Author: Bernhard Bauer <bauerb@chromium.org> Date: Thu Jul 27 23:03:50 2017 Revert "Grab ThreadTaskRunnerHandle in JsonSanitizerAndroid before calling into Java." This reverts commit d52d72afb9af50dfdedf7c26bd6fc2810ecdeb21. Reason for revert: Removing workaround now that the underlying issue has been fixed. Original change's description: > Grab ThreadTaskRunnerHandle in JsonSanitizerAndroid before calling into Java. > > This works around an issue where in official builds of unit_tests the > ThreadTaskRunnerHandle is null when calling into native code from Java. > > Bug: 739510 > Change-Id: Ide93e67076f905d03a6a9271a812e49553dcf0d8 > Reviewed-on: https://chromium-review.googlesource.com/570251 > Reviewed-by: Robert Sesek <rsesek@chromium.org> > Commit-Queue: Bernhard Bauer <bauerb@chromium.org> > Cr-Commit-Position: refs/heads/master@{#486750} Bug: 748113 , 739510 Change-Id: I145618710a3c2b3eeecfd6e45a0e929362efe352 Reviewed-on: https://chromium-review.googlesource.com/586127 Commit-Queue: Bernhard Bauer <bauerb@chromium.org> Reviewed-by: Robert Sesek <rsesek@chromium.org> Cr-Commit-Position: refs/heads/master@{#490058} [modify] https://crrev.com/0c7bf24bd2f2b73cd3dec5383d5ec7834475682e/components/safe_json/json_sanitizer_android.cc
,
Aug 15 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/655a12bcfb7014f35fe14afd293bfb9066878f46 commit 655a12bcfb7014f35fe14afd293bfb9066878f46 Author: Greg Kraynov <kraynov@chromium.org> Date: Tue Aug 15 18:26:21 2017 Exclude action's dep on shared library from runtime dep propagation. Shared library in deps of action and action_foreach was treated as a runtime dependency, which is unwanted behaviour. Bug: 748113 Change-Id: Ifdf103a71a8f15fdd6ba6b802a2cd5df182b3bd2 Reviewed-on: https://chromium-review.googlesource.com/584832 Reviewed-by: Andrew Grieve <agrieve@chromium.org> Reviewed-by: Brett Wilson <brettw@chromium.org> Commit-Queue: Grigoriy Kraynov <kraynov@google.com> Cr-Commit-Position: refs/heads/master@{#494453} [modify] https://crrev.com/655a12bcfb7014f35fe14afd293bfb9066878f46/tools/gn/docs/reference.md [modify] https://crrev.com/655a12bcfb7014f35fe14afd293bfb9066878f46/tools/gn/runtime_deps.cc [modify] https://crrev.com/655a12bcfb7014f35fe14afd293bfb9066878f46/tools/gn/runtime_deps_unittest.cc
,
Aug 15 2017
So GN has fixed :) Next steps: 1. Roll GN 2. Revert https://chromium-review.googlesource.com/c/586469
,
Aug 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/276a02c9f0daf896e2ddd08646544b7b27fb4067 commit 276a02c9f0daf896e2ddd08646544b7b27fb4067 Author: Greg Kraynov <kraynov@chromium.org> Date: Wed Aug 16 02:07:37 2017 Remove cludge preventing .so being runtime dep of resource_whitelist. Revert https://chromium-review.googlesource.com/c/586469 Fixed in GN https://chromium-review.googlesource.com/c/584832 Bug: 748113 Change-Id: Ie0fc0ceaaed017f8ee86ff223617cecf0a4296d5 Reviewed-on: https://chromium-review.googlesource.com/616280 Commit-Queue: Grigoriy Kraynov <kraynov@chromium.org> Reviewed-by: Andrew Grieve <agrieve@chromium.org> Cr-Commit-Position: refs/heads/master@{#494667} [modify] https://crrev.com/276a02c9f0daf896e2ddd08646544b7b27fb4067/chrome/android/BUILD.gn [delete] https://crrev.com/49124acbdd202db7fbf4db80b9f86eacfd0e48ab/tools/resources/dummy.c [modify] https://crrev.com/276a02c9f0daf896e2ddd08646544b7b27fb4067/tools/resources/generate_resource_whitelist.gni
,
Aug 16 2017
GN fixed https://chromium-review.googlesource.com/c/584832 GN rolled https://chromium-review.googlesource.com/c/616041 https://chromium-review.googlesource.com/c/616043 Temporary workaround removed https://chromium-review.googlesource.com/c/616280 Original issue confirmed resolved on tip of tree. Bug closed :) |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by primiano@chromium.org
, Jul 24 2017