New issue
Advanced search Search tips

Issue 784584 link

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug-Regression

Blocked on:
issue 719977



Sign in to add a comment

Flaky failure to load chrome on Android Release (Nexus 9) GPU.FYI bot

Project Member Reported by ynovikov@chromium.org, Nov 13 2017

Issue description

Random tests are failing since https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%209%29/builds/9102

gpu_process_launch_tests
trace_test
webgl_conformance_tests
hardware_accelerated_feature_tests
info_collection_tests

For some reason, Chrome fails to load occasionally:
  	11-03 17:02:34.022  4986  4997 I cr_LibraryLoader: Using linker: org.chromium.base.library_loader.ModernLinker
  	11-03 17:02:34.036  4986  5001 I cr_LibraryLoader: Loading chrome from within /data/app/org.chromium.chrome-1/base.apk
  	11-03 17:02:34.036  4986  5001 E cr_ChromiumAndroidLinker: LoadLibrary: Failed to obtain fixed address for load
  	11-03 17:02:34.036  4986  5001 E cr_LibraryLoader: Unable to load library: /data/app/org.chromium.chrome-1/base.apk!/lib/arm64-v8a/crazy.libchrome.so
  	11-03 17:02:34.036  4986  5001 E cr_LibraryLoader: Unable to load library: chrome
  	11-03 17:02:34.036  4986  5001 W cr_ContentCPSDelegate: Failed to load native library with shared RELRO, retrying without
  	--------- beginning of crash
  	11-03 17:02:34.037  4986  5001 E AndroidRuntime: FATAL EXCEPTION: ChildProcessMain
  	11-03 17:02:34.037  4986  5001 E AndroidRuntime: Process: org.chromium.chrome:sandboxed_process1, PID: 4986
  	11-03 17:02:34.037  4986  5001 E AndroidRuntime: java.lang.AssertionError
  	11-03 17:02:34.037  4986  5001 E AndroidRuntime: 	at org.chromium.base.library_loader.ModernLinker.prepareLibraryLoad(ModernLinker.java:108)
  	11-03 17:02:34.037  4986  5001 E AndroidRuntime: 	at org.chromium.base.library_loader.LibraryLoader.loadAlreadyLocked(LibraryLoader.java:317)
  	11-03 17:02:34.037  4986  5001 E AndroidRuntime: 	at org.chromium.base.library_loader.LibraryLoader.loadNowOverrideApplicationContext(LibraryLoader.java:186)
  	11-03 17:02:34.037  4986  5001 E AndroidRuntime: 	at org.chromium.content.app.ContentChildProcessServiceDelegate.loadNativeLibrary(ContentChildProcessServiceDelegate.java:131)
  	11-03 17:02:34.037  4986  5001 E AndroidRuntime: 	at org.chromium.base.process_launcher.ChildProcessServiceImpl$2.run(ChildProcessServiceImpl.java:188)
  	11-03 17:02:34.037  4986  5001 E AndroidRuntime: 	at java.lang.Thread.run(Thread.java:818)

Will try to log in to the bot to get more info.
 

Comment 1 by kbr@chromium.org, Nov 13 2017

Cc: jbudorick@chromium.org
Components: Internals>GPU>Testing Infra>Client>Android
I wonder whether the Chrome Android Infra folks will have a better idea of what's going on?

Anything suspicious in the blame list?

The logcat has the same errors in passing runs... hmm.
Where do you find logcats for passing runs?
Since this bot isn't on swarming, you can get the logcat for the full build at the "logcat dump" link under the "gsutil upload" step.

And yeah, that linker error also shows up on a passing test. Not sure what's happening there.
So, the linker error does not appear in https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%209%29/builds/9101, but appears from https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%209%29/builds/9102 onward, even when tests pass.
My guess is that the test manages to pass before the link error, or it happens during a test with Fail expectation.
Actually, seems that in successful runs the crash happens before the tests start running. Here:
https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.gpu.fyi%2FAndroid_Release__Nexus_9_%2F9251%2F%2B%2Frecipes%2Fsteps%2Fcontext_lost_tests%2F0%2Fstdout
We have:
CRITICAL:root:Failed to connect to browser.
ERROR:root:Failed with BrowserConnectionGoneException while starting the browser backend.
WARNING:root:Browser start failed (attempt 1 of 3)
INFO:root:Browser started (pid=3120).

And in the logcat I see that the browser which started before 3120 was 2807, which started SandboxedProcessService0 2833, which crashed loading.

01479:  11-15 22:16:56.756   623  1095 I ActivityManager: Start proc 2833:org.chromium.chrome:sandboxed_process0/u0i0 for service org.chromium.chrome/org.chromium.content.app.SandboxedProcessService0
01479:  11-15 22:16:56.757  2833  2833 I art     : Late-enabling -Xcheck:jni
01479:  11-15 22:16:56.803  2833  2833 W dboxed_process0: type=1400 audit(0.0:46): avc: denied { search } for name="tmp" dev="dm-2" ino=3512069 scontext=u:r:isolated_app:s0:c512,c768 tcontext=u:object_r:shell_data_file:s0 tclass=dir permissive=0
01479:  11-15 22:16:56.813  2833  2833 I cr_ChildProcessService: Creating new ChildProcessService pid=2833
01479:  11-15 22:16:57.635  2833  2846 I cr_LibraryLoader: Using linker: org.chromium.base.library_loader.ModernLinker
01479:  11-15 22:16:57.685  2833  2847 I cr_LibraryLoader: Loading chrome from within /data/app/org.chromium.chrome-1/base.apk
01479:  11-15 22:16:57.685  2833  2847 E cr_ChromiumAndroidLinker: LoadLibrary: Failed to obtain fixed address for load
01479:  11-15 22:16:57.685  2833  2847 E cr_LibraryLoader: Unable to load library: /data/app/org.chromium.chrome-1/base.apk!/lib/arm64-v8a/crazy.libchrome.so
01479:  11-15 22:16:57.685  2833  2847 E cr_LibraryLoader: Unable to load library: chrome
01479:  11-15 22:16:57.685  2833  2847 W cr_ContentCPSDelegate: Failed to load native library with shared RELRO, retrying without
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: FATAL EXCEPTION: ChildProcessMain
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: Process: org.chromium.chrome:sandboxed_process0, PID: 2833
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: java.lang.AssertionError
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: 	at org.chromium.base.library_loader.ModernLinker.prepareLibraryLoad(ModernLinker.java:108)
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: 	at org.chromium.base.library_loader.LibraryLoader.loadAlreadyLocked(LibraryLoader.java:317)
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: 	at org.chromium.base.library_loader.LibraryLoader.loadNowOverrideApplicationContext(LibraryLoader.java:186)
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: 	at org.chromium.content.app.ContentChildProcessServiceDelegate.loadNativeLibrary(ContentChildProcessServiceDelegate.java:131)
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: 	at org.chromium.base.process_launcher.ChildProcessServiceImpl$2.run(ChildProcessServiceImpl.java:188)
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: 	at java.lang.Thread.run(Thread.java:818)
01479:  11-15 22:18:06.473  2833  2833 I cr_ChildProcessService: Destroying ChildProcessService pid=2833

But SandboxedProcessService0 of browser 3120 was 3146, and it loaded successfully:
01479:  11-15 22:18:09.758  3146  3146 I art     : Late-enabling -Xcheck:jni
01479:  11-15 22:18:09.796  3146  3146 W dboxed_process0: type=1400 audit(0.0:79): avc: denied { search } for name="tmp" dev="dm-2" ino=3512069 scontext=u:r:isolated_app:s0:c512,c768 tcontext=u:object_r:shell_data_file:s0 tclass=dir permissive=0
01479:  11-15 22:18:09.806  3146  3146 I cr_ChildProcessService: Creating new ChildProcessService pid=3146
01479:  11-15 22:18:10.482  3146  3157 I cr_LibraryLoader: Using linker: org.chromium.base.library_loader.ModernLinker
01479:  11-15 22:18:10.515  3146  3159 I cr_LibraryLoader: Loading chrome from within /data/app/org.chromium.chrome-1/base.apk
01479:  11-15 22:18:10.640  3146  3159 I cr_LibraryLoader: Time to load native libraries: 157 ms (timestamps 789-946)

https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%209%29/builds/9101 also had "Failed to obtain fixed address for load" errors, but they always happened in Browser process, never in GPU. I think browser handle that error because it is not sandboxed, but GPU can't. So, in 9101 browser always finished loading before GPU, that's why loading GPU succeeds. And in 9102 some scheduling was changed, and GPU attempts to load before browser and fails.
The end of my last comment is wrong, even when browser finished loading before GPU, GPU still fails after build 9102.
Alas, I can't find something obvious in 9102 blamelist, and I can't get a stable repro case on the bot to bisect.
Owner: agrieve@chromium.org
Status: Assigned (was: Started)
agrieve@, can you please take a look at this, you seem to be somewhat familiar with linker, and your https://chromium-review.googlesource.com/751284 is one of my suspects for this flakiness.
Looks to me that if loading is so unstable that it can be broken by some innocent change in https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%209%29/builds/9102 blamelist, then something should be done to fix it. Perhaps you will be able to find a good owner for this bug.
Cc: torne@chromium.org
The stack in #6 looks telling to me:

01479:  11-15 22:16:57.635  2833  2846 I cr_LibraryLoader: Using linker: org.chromium.base.library_loader.ModernLinker
01479:  11-15 22:16:57.685  2833  2847 I cr_LibraryLoader: Loading chrome from within /data/app/org.chromium.chrome-1/base.apk
01479:  11-15 22:16:57.685  2833  2847 E cr_ChromiumAndroidLinker: LoadLibrary: Failed to obtain fixed address for load
01479:  11-15 22:16:57.685  2833  2847 E cr_LibraryLoader: Unable to load library: /data/app/org.chromium.chrome-1/base.apk!/lib/arm64-v8a/crazy.libchrome.so
01479:  11-15 22:16:57.685  2833  2847 E cr_LibraryLoader: Unable to load library: chrome
01479:  11-15 22:16:57.685  2833  2847 W cr_ContentCPSDelegate: Failed to load native library with shared RELRO, retrying without
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: FATAL EXCEPTION: ChildProcessMain
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: Process: org.chromium.chrome:sandboxed_process0, PID: 2833
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: java.lang.AssertionError
01479:  11-15 22:16:57.693  2833  2847 E AndroidRuntime: 	at org.chromium.base.library_loader.ModernLinker.prepareLibraryLoad(ModernLinker.java:108)


It says it's failing to load using shared relros (which can happen from time to time), and it looks like the retry is hitting a java assert.
Retry logic: https://cs.chromium.org/chromium/src/base/android/java/src/org/chromium/base/library_loader/LibraryLoader.java?rcl=694aa30260f2b13d8ae8d97db6190b7e559b33d3&l=285
Assert: https://cs.chromium.org/chromium/src/base/android/java/src/org/chromium/base/library_loader/ModernLinker.java?rcl=694aa30260f2b13d8ae8d97db6190b7e559b33d3&l=107

Best guess is that the asserted mPrepareLibraryLoadCalled variable should be reset to false if the load fails.

Another very strange thing:
01479:  11-15 22:16:57.685  2833  2847 W cr_ContentCPSDelegate: Failed to load native library with shared RELRO, retrying without

The log tag should be "LibraryLoader". No clue why that would be off...

+torne as someone who knows about shared relros.

Comment 10 Deleted

Comment 11 by torne@chromium.org, Nov 22 2017

The retry logic looks fine to me; prepare is only called once (before any library is loaded) and is not called again even if there are retries. The sole call to prepare should have happened before those other log messages are printed, so it shouldn't be possible to crash there :/

Note that pasko@ is in the middle of removing the modern linker..
No, prepare() can be called multiple times.
See how ContentChildProcessServiceDelegate::loadNativeLibrary() calls loadNowOverrideApplicationContext() twice.
This will call LibraryLoader::loadAlreadyLocked() twice, which will call ModernLinker::prepareLibraryLoad() twice.

What I'm worried about, however, is the flakiness - sometimes crazy.libchrome.so can be loaded with shared RELRO and sometimes it can't. This just doesn't seem right to me.

Comment 13 by torne@chromium.org, Nov 22 2017

Oh. The version in ContentChildProcessServiceDelegate is wrong, then. (I didn't look at that one). The retrying should be wrapped up inside of LibraryLoader as it is in the other case - ContentChildProcessServiceDelegate shouldn't call LibraryLoader twice.

Shared RELRO depends on address space compatibility, which is unpredictable due to ASLR. Whether it works is basically random.
Owner: torne@chromium.org
torne - sounds like you have a good idea of the fix? (making you owner)
Components: Infra>Client>Chrome
Moving Infra>Client>Android -> Infra>Client>Chrome+OS=Android
Components: -Infra>Client>Android
Blockedon: 719977
Status: WontFix (was: Assigned)
I think this is obsolete after removing ModernLinker in issue 719977.
Status: Assigned (was: WontFix)
No, the buggy code isn't in ModernLinker and applies to both linkers; taking a quick look at the current code it looks like it's still doing the wrong thing. I just haven't had a chance to look at this yet :/
gpu triage: ping on idle p1 bug.
Is this still something we want to investigate? If so, but not pressing, let's drop priority?
Labels: -Pri-1 Pri-2
Owner: ----
Status: Available (was: Assigned)
Someone should just fix it as I described if the code is still like that; I haven't had time to look (this isn't really on my radar). The priority could be dropped but when I looked at it before it was a real bug with a known cause that isn't too hard to fix...

Sign in to add a comment