Issue metadata
Sign in to add a comment
|
Flaky failure to load chrome on Android Release (Nexus 9) GPU.FYI bot |
||||||||||||||||||||
Issue descriptionRandom tests are failing since https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%209%29/builds/9102 gpu_process_launch_tests trace_test webgl_conformance_tests hardware_accelerated_feature_tests info_collection_tests For some reason, Chrome fails to load occasionally: 11-03 17:02:34.022 4986 4997 I cr_LibraryLoader: Using linker: org.chromium.base.library_loader.ModernLinker 11-03 17:02:34.036 4986 5001 I cr_LibraryLoader: Loading chrome from within /data/app/org.chromium.chrome-1/base.apk 11-03 17:02:34.036 4986 5001 E cr_ChromiumAndroidLinker: LoadLibrary: Failed to obtain fixed address for load 11-03 17:02:34.036 4986 5001 E cr_LibraryLoader: Unable to load library: /data/app/org.chromium.chrome-1/base.apk!/lib/arm64-v8a/crazy.libchrome.so 11-03 17:02:34.036 4986 5001 E cr_LibraryLoader: Unable to load library: chrome 11-03 17:02:34.036 4986 5001 W cr_ContentCPSDelegate: Failed to load native library with shared RELRO, retrying without --------- beginning of crash 11-03 17:02:34.037 4986 5001 E AndroidRuntime: FATAL EXCEPTION: ChildProcessMain 11-03 17:02:34.037 4986 5001 E AndroidRuntime: Process: org.chromium.chrome:sandboxed_process1, PID: 4986 11-03 17:02:34.037 4986 5001 E AndroidRuntime: java.lang.AssertionError 11-03 17:02:34.037 4986 5001 E AndroidRuntime: at org.chromium.base.library_loader.ModernLinker.prepareLibraryLoad(ModernLinker.java:108) 11-03 17:02:34.037 4986 5001 E AndroidRuntime: at org.chromium.base.library_loader.LibraryLoader.loadAlreadyLocked(LibraryLoader.java:317) 11-03 17:02:34.037 4986 5001 E AndroidRuntime: at org.chromium.base.library_loader.LibraryLoader.loadNowOverrideApplicationContext(LibraryLoader.java:186) 11-03 17:02:34.037 4986 5001 E AndroidRuntime: at org.chromium.content.app.ContentChildProcessServiceDelegate.loadNativeLibrary(ContentChildProcessServiceDelegate.java:131) 11-03 17:02:34.037 4986 5001 E AndroidRuntime: at org.chromium.base.process_launcher.ChildProcessServiceImpl$2.run(ChildProcessServiceImpl.java:188) 11-03 17:02:34.037 4986 5001 E AndroidRuntime: at java.lang.Thread.run(Thread.java:818) Will try to log in to the bot to get more info.
,
Nov 13 2017
The logcat has the same errors in passing runs... hmm.
,
Nov 13 2017
Where do you find logcats for passing runs?
,
Nov 14 2017
Since this bot isn't on swarming, you can get the logcat for the full build at the "logcat dump" link under the "gsutil upload" step. And yeah, that linker error also shows up on a passing test. Not sure what's happening there.
,
Nov 16 2017
So, the linker error does not appear in https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%209%29/builds/9101, but appears from https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%209%29/builds/9102 onward, even when tests pass. My guess is that the test manages to pass before the link error, or it happens during a test with Fail expectation.
,
Nov 16 2017
Actually, seems that in successful runs the crash happens before the tests start running. Here: https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gpu.fyi%2FAndroid_Release__Nexus_9_%2F9251%2F%2B%2Frecipes%2Fsteps%2Fcontext_lost_tests%2F0%2Fstdout We have: CRITICAL:root:Failed to connect to browser. ERROR:root:Failed with BrowserConnectionGoneException while starting the browser backend. WARNING:root:Browser start failed (attempt 1 of 3) INFO:root:Browser started (pid=3120). And in the logcat I see that the browser which started before 3120 was 2807, which started SandboxedProcessService0 2833, which crashed loading. 01479: 11-15 22:16:56.756 623 1095 I ActivityManager: Start proc 2833:org.chromium.chrome:sandboxed_process0/u0i0 for service org.chromium.chrome/org.chromium.content.app.SandboxedProcessService0 01479: 11-15 22:16:56.757 2833 2833 I art : Late-enabling -Xcheck:jni 01479: 11-15 22:16:56.803 2833 2833 W dboxed_process0: type=1400 audit(0.0:46): avc: denied { search } for name="tmp" dev="dm-2" ino=3512069 scontext=u:r:isolated_app:s0:c512,c768 tcontext=u:object_r:shell_data_file:s0 tclass=dir permissive=0 01479: 11-15 22:16:56.813 2833 2833 I cr_ChildProcessService: Creating new ChildProcessService pid=2833 01479: 11-15 22:16:57.635 2833 2846 I cr_LibraryLoader: Using linker: org.chromium.base.library_loader.ModernLinker 01479: 11-15 22:16:57.685 2833 2847 I cr_LibraryLoader: Loading chrome from within /data/app/org.chromium.chrome-1/base.apk 01479: 11-15 22:16:57.685 2833 2847 E cr_ChromiumAndroidLinker: LoadLibrary: Failed to obtain fixed address for load 01479: 11-15 22:16:57.685 2833 2847 E cr_LibraryLoader: Unable to load library: /data/app/org.chromium.chrome-1/base.apk!/lib/arm64-v8a/crazy.libchrome.so 01479: 11-15 22:16:57.685 2833 2847 E cr_LibraryLoader: Unable to load library: chrome 01479: 11-15 22:16:57.685 2833 2847 W cr_ContentCPSDelegate: Failed to load native library with shared RELRO, retrying without 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: FATAL EXCEPTION: ChildProcessMain 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: Process: org.chromium.chrome:sandboxed_process0, PID: 2833 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: java.lang.AssertionError 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: at org.chromium.base.library_loader.ModernLinker.prepareLibraryLoad(ModernLinker.java:108) 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: at org.chromium.base.library_loader.LibraryLoader.loadAlreadyLocked(LibraryLoader.java:317) 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: at org.chromium.base.library_loader.LibraryLoader.loadNowOverrideApplicationContext(LibraryLoader.java:186) 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: at org.chromium.content.app.ContentChildProcessServiceDelegate.loadNativeLibrary(ContentChildProcessServiceDelegate.java:131) 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: at org.chromium.base.process_launcher.ChildProcessServiceImpl$2.run(ChildProcessServiceImpl.java:188) 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: at java.lang.Thread.run(Thread.java:818) 01479: 11-15 22:18:06.473 2833 2833 I cr_ChildProcessService: Destroying ChildProcessService pid=2833 But SandboxedProcessService0 of browser 3120 was 3146, and it loaded successfully: 01479: 11-15 22:18:09.758 3146 3146 I art : Late-enabling -Xcheck:jni 01479: 11-15 22:18:09.796 3146 3146 W dboxed_process0: type=1400 audit(0.0:79): avc: denied { search } for name="tmp" dev="dm-2" ino=3512069 scontext=u:r:isolated_app:s0:c512,c768 tcontext=u:object_r:shell_data_file:s0 tclass=dir permissive=0 01479: 11-15 22:18:09.806 3146 3146 I cr_ChildProcessService: Creating new ChildProcessService pid=3146 01479: 11-15 22:18:10.482 3146 3157 I cr_LibraryLoader: Using linker: org.chromium.base.library_loader.ModernLinker 01479: 11-15 22:18:10.515 3146 3159 I cr_LibraryLoader: Loading chrome from within /data/app/org.chromium.chrome-1/base.apk 01479: 11-15 22:18:10.640 3146 3159 I cr_LibraryLoader: Time to load native libraries: 157 ms (timestamps 789-946) https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%209%29/builds/9101 also had "Failed to obtain fixed address for load" errors, but they always happened in Browser process, never in GPU. I think browser handle that error because it is not sandboxed, but GPU can't. So, in 9101 browser always finished loading before GPU, that's why loading GPU succeeds. And in 9102 some scheduling was changed, and GPU attempts to load before browser and fails.
,
Nov 16 2017
The end of my last comment is wrong, even when browser finished loading before GPU, GPU still fails after build 9102. Alas, I can't find something obvious in 9102 blamelist, and I can't get a stable repro case on the bot to bisect.
,
Nov 17 2017
agrieve@, can you please take a look at this, you seem to be somewhat familiar with linker, and your https://chromium-review.googlesource.com/751284 is one of my suspects for this flakiness. Looks to me that if loading is so unstable that it can be broken by some innocent change in https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%209%29/builds/9102 blamelist, then something should be done to fix it. Perhaps you will be able to find a good owner for this bug.
,
Nov 17 2017
The stack in #6 looks telling to me: 01479: 11-15 22:16:57.635 2833 2846 I cr_LibraryLoader: Using linker: org.chromium.base.library_loader.ModernLinker 01479: 11-15 22:16:57.685 2833 2847 I cr_LibraryLoader: Loading chrome from within /data/app/org.chromium.chrome-1/base.apk 01479: 11-15 22:16:57.685 2833 2847 E cr_ChromiumAndroidLinker: LoadLibrary: Failed to obtain fixed address for load 01479: 11-15 22:16:57.685 2833 2847 E cr_LibraryLoader: Unable to load library: /data/app/org.chromium.chrome-1/base.apk!/lib/arm64-v8a/crazy.libchrome.so 01479: 11-15 22:16:57.685 2833 2847 E cr_LibraryLoader: Unable to load library: chrome 01479: 11-15 22:16:57.685 2833 2847 W cr_ContentCPSDelegate: Failed to load native library with shared RELRO, retrying without 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: FATAL EXCEPTION: ChildProcessMain 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: Process: org.chromium.chrome:sandboxed_process0, PID: 2833 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: java.lang.AssertionError 01479: 11-15 22:16:57.693 2833 2847 E AndroidRuntime: at org.chromium.base.library_loader.ModernLinker.prepareLibraryLoad(ModernLinker.java:108) It says it's failing to load using shared relros (which can happen from time to time), and it looks like the retry is hitting a java assert. Retry logic: https://cs.chromium.org/chromium/src/base/android/java/src/org/chromium/base/library_loader/LibraryLoader.java?rcl=694aa30260f2b13d8ae8d97db6190b7e559b33d3&l=285 Assert: https://cs.chromium.org/chromium/src/base/android/java/src/org/chromium/base/library_loader/ModernLinker.java?rcl=694aa30260f2b13d8ae8d97db6190b7e559b33d3&l=107 Best guess is that the asserted mPrepareLibraryLoadCalled variable should be reset to false if the load fails. Another very strange thing: 01479: 11-15 22:16:57.685 2833 2847 W cr_ContentCPSDelegate: Failed to load native library with shared RELRO, retrying without The log tag should be "LibraryLoader". No clue why that would be off... +torne as someone who knows about shared relros.
,
Nov 22 2017
The retry logic looks fine to me; prepare is only called once (before any library is loaded) and is not called again even if there are retries. The sole call to prepare should have happened before those other log messages are printed, so it shouldn't be possible to crash there :/ Note that pasko@ is in the middle of removing the modern linker..
,
Nov 22 2017
No, prepare() can be called multiple times. See how ContentChildProcessServiceDelegate::loadNativeLibrary() calls loadNowOverrideApplicationContext() twice. This will call LibraryLoader::loadAlreadyLocked() twice, which will call ModernLinker::prepareLibraryLoad() twice. What I'm worried about, however, is the flakiness - sometimes crazy.libchrome.so can be loaded with shared RELRO and sometimes it can't. This just doesn't seem right to me.
,
Nov 22 2017
Oh. The version in ContentChildProcessServiceDelegate is wrong, then. (I didn't look at that one). The retrying should be wrapped up inside of LibraryLoader as it is in the other case - ContentChildProcessServiceDelegate shouldn't call LibraryLoader twice. Shared RELRO depends on address space compatibility, which is unpredictable due to ASLR. Whether it works is basically random.
,
Nov 23 2017
torne - sounds like you have a good idea of the fix? (making you owner)
,
Dec 4 2017
Moving Infra>Client>Android -> Infra>Client>Chrome+OS=Android
,
Dec 4 2017
,
Jan 8 2018
I think this is obsolete after removing ModernLinker in issue 719977.
,
Jan 9 2018
No, the buggy code isn't in ModernLinker and applies to both linkers; taking a quick look at the current code it looks like it's still doing the wrong thing. I just haven't had a chance to look at this yet :/
,
Feb 26 2018
gpu triage: ping on idle p1 bug.
,
Nov 9
Is this still something we want to investigate? If so, but not pressing, let's drop priority?
,
Nov 20
Someone should just fix it as I described if the code is still like that; I haven't had time to look (this isn't really on my radar). The priority could be dropped but when I looked at it before it was a real bug with a known cause that isn't too hard to fix... |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by kbr@chromium.org
, Nov 13 2017Components: Internals>GPU>Testing Infra>Client>Android