Android Release (Nexus 5X) flake OnGpuChannelTimeout crash |
||
Issue descriptionForked from issue 664341. Pasting relevant info here: failed to load the native library, yet it did *not* kill the child process. 08-03 01:21:28.068 3818 3843 I cr_LibraryLoader: Using linker: org.chromium.base.library_loader.ModernLinker 08-03 01:21:28.107 3818 3846 I cr_LibraryLoader: Loading chrome from within /data/app/org.chromium.chrome-1/base.apk 08-03 01:21:28.107 3818 3846 E cr_ChromiumAndroidLinker: LoadLibrary: Failed to obtain fixed address for load 08-03 01:21:28.107 3818 3846 E cr_LibraryLoader: Unable to load library: /data/app/org.chromium.chrome-1/base.apk!/lib/arm64-v8a/crazy.libchrome.so 08-03 01:21:28.107 3818 3846 W cr_ChildProcessService: Failed to load native library with shared RELRO, retrying without 08-03 01:21:28.108 3818 3846 E AndroidRuntime: FATAL EXCEPTION: ChildProcessMain 08-03 01:21:28.108 3818 3846 E AndroidRuntime: Process: org.chromium.chrome:privileged_process0, PID: 3818 08-03 01:21:28.108 3818 3846 E AndroidRuntime: java.lang.AssertionError 08-03 01:21:28.108 3818 3846 E AndroidRuntime: at org.chromium.base.library_loader.ModernLinker.disableSharedRelros(ModernLinker.java:254) 08-03 01:21:28.108 3818 3846 E AndroidRuntime: at org.chromium.content.app.ChildProcessServiceImpl$2.run(ChildProcessServiceImpl.java:195) 08-03 01:21:28.108 3818 3846 E AndroidRuntime: at java.lang.Thread.run(Thread.java:818) The library load failure in #112 is a major (5.5% of builds) source of flakiness on "Android Release (Nexus 5X)" bot, which prevents us from adding its mirror android_optional_gpu_tests_rel to CQ. https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5187 https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5186 https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5172 https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5147 https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5124 https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5116 https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5113 https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5069 https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5055
,
Jan 23 2017
I don't see any new occurrences since https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Android%20Release%20%28Nexus%205X%29/5187. Which was 2 weeks and 300 builds ago. Since you suspect this was caused by some slowness in issue 680777, I think maybe this was caused by problems with NTP (New Task Page) and FRE (First Run Experience) fixed in issues 673831, 632199, 672255.
,
Jan 25 2017
I don't think any of that matters to linker or asserts. agrieve: can you summarize how java assert works on M?
,
Jan 29 2017
When asserts are enabled: https://cs.chromium.org/chromium/src/build/config/android/internal_rules.gni?q=is_java_debug&sq=package:chromium&dr=C&l=1048 Then they throw an AssertionError() when the conditions fail (same as pre-L). These exceptions are catchable via "catch (Throwable e)".
,
Jan 30 2017
Ahh, I know why the assertion is not crashing the process now, at least in this build configuration. I tested by just inserting an assertion failure in child start up code. When the exception is thrown, android pops up the "Unfortunately, Chromium has stopped" dialog. The process will hang around until that dialog is dismissed!! I know in release (or official?) builds, that dialog is supposed to be suppressed, so I think this isn't an issue in production. That just leaves actually fixing the wrong assertion, which is trivial Side comment.. modifying java bytecode to get assertions? That's crazy. I didn't even know that's possible.. https://cs.chromium.org/chromium/src/build/android/java_assertion_enabler/java/org/chromium/javaassertionenabler/AssertionEnabler.java
,
Jan 31 2017
Pasting from original bug for reference. The assert in disableSharedRelros is wrong, and can be triggered if sharing relro region in child process fails (not sure why it fails though): 08-03 01:21:28.068 3818 3843 I cr_LibraryLoader: Using linker: org.chromium.base.library_loader.ModernLinker 08-03 01:21:28.107 3818 3846 I cr_LibraryLoader: Loading chrome from within /data/app/org.chromium.chrome-1/base.apk 08-03 01:21:28.107 3818 3846 E cr_ChromiumAndroidLinker: LoadLibrary: Failed to obtain fixed address for load 08-03 01:21:28.107 3818 3846 E cr_LibraryLoader: Unable to load library: /data/app/org.chromium.chrome-1/base.apk!/lib/arm64-v8a/crazy.libchrome.so 08-03 01:21:28.107 3818 3846 W cr_ChildProcessService: Failed to load native library with shared RELRO, retrying without 08-03 01:21:28.108 3818 3846 E AndroidRuntime: FATAL EXCEPTION: ChildProcessMain 08-03 01:21:28.108 3818 3846 E AndroidRuntime: Process: org.chromium.chrome:privileged_process0, PID: 3818 08-03 01:21:28.108 3818 3846 E AndroidRuntime: java.lang.AssertionError 08-03 01:21:28.108 3818 3846 E AndroidRuntime: at org.chromium.base.library_loader.ModernLinker.disableSharedRelros(ModernLinker.java:254) 08-03 01:21:28.108 3818 3846 E AndroidRuntime: at org.chromium.content.app.ChildProcessServiceImpl$2.run(ChildProcessServiceImpl.java:195) 08-03 01:21:28.108 3818 3846 E AndroidRuntime: at java.lang.Thread.run(Thread.java:818)
,
Jan 31 2017
Worth filing a bug about the dialog causing a timeout. I'm not sure if there's already code somewhere that works around this, but it is suppressible via Thread.setDefaultUncaughtExceptionHandler() Here's the default uncaughtExceptionHandler: https://github.com/android/platform_frameworks_base/blob/master/core/java/com/android/internal/os/RuntimeInit.java#L116 Perhaps we want to disable it when instrumentation is active?
,
Jan 31 2017
> I know in release (or official?) builds, that dialog is supposed to be suppressed, so I think this isn't an issue in production. Err.. don't remember where I got that idea from. But using chrome://gpucrash on 55.0.2883.91 (current stable) official build, that most definitely causes this dialog. But gpucrash is native crash, so there is no chance of process surviving longer than the crash. Might be worthwhile to test this with a java exception instead. If this crashes as well, this could potentially explain a lot of the gpu timeouts in production.
,
Jan 31 2017
> Might be worthwhile to test this with a java exception instead. Actually, gonna skip that step. Seems there is enough evidence already..
,
Jan 31 2017
> > I know in release (or official?) builds, that dialog is supposed to be suppressed, so I think this isn't an issue in production. > > Err.. don't remember where I got that idea from. But using chrome://gpucrash on 55.0.2883.91 (current stable) official build, that most definitely causes this dialog. That suppression does exist, for native crashes, but is only enabled on user devices; see FinalizeCrashDoneAndroid in breakpad_linux.cc. Can remove the userdebug/eng check, and then add --enable-crash-reporter-for-testing, then chrome://crash won't throw up that dialog anymore. But if I insert the java exception in native library loading, that crash most definitely does still pop up in this configuration. I believe this is true for uncaught java exceptions even after native library is loaded, since breakpad has nothing to do with java crashes. Chrome does have JavaExceptionReporter, which inserts itself into the Thread.setDefaultUncaughtExceptionHandler chain. But it's only enabled for the browser process, not child processes. Also it's a chrome-only, and I don't know why, other than probably due to historical reasons. So there's a gap in our crash reporting for child processes. Yay..
,
Jan 31 2017
JavaExceptionReporter works by triggering a breakpad crash, and then trigger a breakpad dump, and have breakpad pick up the java exception (as a string). That obviously won't work, at least for the case I'm worried about, that exception happens before native libraries are even loaded. Other than that, JavaExceptionReporter should work just fine in child processes, I *think*. So ideally, we silence the dialog as early as possible, but then replace that with JavaExceptionReporter after native side is ready. Still need to test everything here though..
,
Feb 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9d06c9d1dbba894cb8cc79230675d3fefd189296 commit 9d06c9d1dbba894cb8cc79230675d3fefd189296 Author: boliu <boliu@chromium.org> Date: Wed Feb 01 15:57:52 2017 android: Remove wrong assert in disableSharedRelros This assert can be hit in practice: In child process, if sharing relro section fails, child will try to load the library again with relro disabled. But at that point, prepareLibraryLoad has already been called. So just remove the assert. BUG= 680775 Review-Url: https://codereview.chromium.org/2661073002 Cr-Commit-Position: refs/heads/master@{#447520} [modify] https://crrev.com/9d06c9d1dbba894cb8cc79230675d3fefd189296/base/android/java/src/org/chromium/base/library_loader/ModernLinker.java
,
Feb 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e commit 7a81c252f5f9cf8981f5df770a5acbb8d736eb6e Author: boliu <boliu@chromium.org> Date: Fri Feb 03 03:41:04 2017 android: Add debug url to throw uncaught exception Currently there are no crash handlers for uncaught java exceptions in child processes. Add a chrome debug url to crash the GPU process with an uncaught java exception to aid in testing this code path. The plumbing is same as chrome://gpucrash, except everything is behind OS_ANDROID guards. The actual implementation of throwing the exception is in base. BUG= 680775 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation Review-Url: https://codereview.chromium.org/2664373002 Cr-Commit-Position: refs/heads/master@{#447908} [modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/base/BUILD.gn [add] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/base/android/java/src/org/chromium/base/ThrowUncaughtException.java [add] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/base/android/throw_uncaught_exception.cc [add] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/base/android/throw_uncaught_exception.h [modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/chrome/common/url_constants.cc [modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/browser/frame_host/debug_urls.cc [modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/browser/gpu/gpu_process_host_ui_shim.cc [modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/browser/gpu/gpu_process_host_ui_shim.h [modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/common/gpu_host_messages.h [modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/gpu/gpu_child_thread.cc [modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/gpu/gpu_child_thread.h [modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/public/common/url_constants.cc [modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/public/common/url_constants.h
,
Feb 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5 commit 9b51b5184179dd0a76fdc4e0e851b22a8aa842d5 Author: boliu <boliu@chromium.org> Date: Fri Feb 03 15:35:05 2017 android: Move JavaExceptionReporter to base So that it can be used in other processes as well. There as some comment changes, and added @MainDex. But otherwise a straightforward move. BUG= 680775 Review-Url: https://codereview.chromium.org/2667263002 Cr-Commit-Position: refs/heads/master@{#448000} [modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/base/BUILD.gn [modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/base/android/base_jni_registrar.cc [rename] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/base/android/java/src/org/chromium/base/JavaExceptionReporter.java [rename] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/base/android/java_exception_reporter.cc [add] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/base/android/java_exception_reporter.h [modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/chrome/android/java/src/org/chromium/chrome/browser/ChromeStrictMode.java [modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/chrome/android/java_sources.gni [modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/chrome/app/chrome_main_delegate.cc [modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/chrome/browser/BUILD.gn [modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/chrome/browser/android/chrome_jni_registrar.cc [delete] https://crrev.com/9f800ec0044b9c7c7c88553f406cc5e146e72516/chrome/browser/android/java_exception_reporter.h
,
Feb 6 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/29b572b3ffeb491584ce5e6cd24a5eca9ac4d738 commit 29b572b3ffeb491584ce5e6cd24a5eca9ac4d738 Author: boliu <boliu@chromium.org> Date: Mon Feb 06 18:14:36 2017 android: Crash child immediately on uncaught exception By installing our own handler that kills the process immediately. The default handler will pop up a dialog, and the process will stay alive until that dialog is dismissed by user. This is generally bad because browser will not notice this crash until user dismisses the dialog as well. Note that suppressing this dialog matches the behavior for native crashes. See breakpad::FinalizeCrashDoneAndroid for the equivalent logic. Refactor base a bit so the logic to check for release android build can be shared. BUG= 680775 Review-Url: https://codereview.chromium.org/2675013002 Cr-Commit-Position: refs/heads/master@{#448308} [modify] https://crrev.com/29b572b3ffeb491584ce5e6cd24a5eca9ac4d738/base/android/java/src/org/chromium/base/BuildInfo.java [modify] https://crrev.com/29b572b3ffeb491584ce5e6cd24a5eca9ac4d738/base/android/java/src/org/chromium/base/CommandLineInitUtil.java [modify] https://crrev.com/29b572b3ffeb491584ce5e6cd24a5eca9ac4d738/content/public/android/BUILD.gn [modify] https://crrev.com/29b572b3ffeb491584ce5e6cd24a5eca9ac4d738/content/public/android/java/src/org/chromium/content/app/ChildProcessServiceImpl.java [add] https://crrev.com/29b572b3ffeb491584ce5e6cd24a5eca9ac4d738/content/public/android/java/src/org/chromium/content/app/KillChildUncaughtExceptionHandler.java
,
Feb 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/e76f62c07f2dfbddef18d4011c576f678fc1f54b commit e76f62c07f2dfbddef18d4011c576f678fc1f54b Author: boliu <boliu@chromium.org> Date: Wed Feb 08 15:30:36 2017 android: Report java exception in child Enable JavaExceptionReporter for child processes so that unhandled java exceptions generate crash reports. Note this also requires fixing InitNonBrowserCrashReporterForAndroid so that SetDumpWithoutCrashingFunction is called in child processes. Tested with chrome://gpu-java-crash added in r447908, manually enabling breakpad microdumps with local build, and adding --enable-crash-reporter-for-testing. Verified microdump is generated with the java stack information in logcat BUG= 680775 Review-Url: https://codereview.chromium.org/2677563006 Cr-Commit-Position: refs/heads/master@{#449000} [modify] https://crrev.com/e76f62c07f2dfbddef18d4011c576f678fc1f54b/chrome/app/chrome_main_delegate.cc [modify] https://crrev.com/e76f62c07f2dfbddef18d4011c576f678fc1f54b/components/crash/content/app/breakpad_linux.cc
,
Feb 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/16397c1b5df27f30968f55166048f14d4d80f0cd commit 16397c1b5df27f30968f55166048f14d4d80f0cd Author: boliu <boliu@chromium.org> Date: Wed Feb 08 17:05:43 2017 Revert of android: Report java exception in child (patchset #2 id:20001 of https://codereview.chromium.org/2677563006/ ) Reason for revert: tobiasjs@ pointed out: DumpWithoutCrash doesn't really work on android in child processes since android child processes sends the dump through a FD, and browser only picks up the FD after it detects child has crashed. So if child calls DumpWithoutCrash and *doesn't* crash, then a subsequent real crash report won't be picked up. Unhandled java exception is actually not the problem here. The problem is all the other places in child processes that calls DumpWithoutCrash. Risk of breaking normal crash reports seems high to me, so reverting to be safe. Original issue's description: > android: Report java exception in child > > Enable JavaExceptionReporter for child processes so that unhandled java > exceptions generate crash reports. > > Note this also requires fixing InitNonBrowserCrashReporterForAndroid so > that SetDumpWithoutCrashingFunction is called in child processes. > > Tested with chrome://gpu-java-crash added in r447908, manually > enabling breakpad microdumps with local build, and adding > --enable-crash-reporter-for-testing. Verified microdump is generated > with the java stack information in logcat > > BUG= 680775 > > Review-Url: https://codereview.chromium.org/2677563006 > Cr-Commit-Position: refs/heads/master@{#449000} > Committed: https://chromium.googlesource.com/chromium/src/+/e76f62c07f2dfbddef18d4011c576f678fc1f54b TBR=rsesek@chromium.org,mariakhomenko@chromium.org,grt@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 680775 Review-Url: https://codereview.chromium.org/2684893003 Cr-Commit-Position: refs/heads/master@{#449018} [modify] https://crrev.com/16397c1b5df27f30968f55166048f14d4d80f0cd/chrome/app/chrome_main_delegate.cc [modify] https://crrev.com/16397c1b5df27f30968f55166048f14d4d80f0cd/components/crash/content/app/breakpad_linux.cc
,
Feb 10 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/328eda136de7172f4970f94774ba68528ab22adc commit 328eda136de7172f4970f94774ba68528ab22adc Author: boliu <boliu@chromium.org> Date: Fri Feb 10 16:39:57 2017 android: Report child exception by crashing This is the second attempt of https://codereview.chromium.org/2677563006/ DumpWithoutCrash should not be enabled for child processes on Android because the browser is not set up to read multiple crash reports for the same child process. So enabling DumpWithoutCrash could lead to real crash reports getting dropped. This new implementation just crashes the child process to cause a crash report to be generated. Note this does not replace the need for KillChildUncaughtExceptionHandler, which is installed much earlier during start up and is designed to not cause hangs for any problems during start up as well. BUG= 680775 Review-Url: https://codereview.chromium.org/2687193003 Cr-Commit-Position: refs/heads/master@{#449632} [modify] https://crrev.com/328eda136de7172f4970f94774ba68528ab22adc/base/android/java/src/org/chromium/base/JavaExceptionReporter.java [modify] https://crrev.com/328eda136de7172f4970f94774ba68528ab22adc/base/android/java_exception_reporter.cc [modify] https://crrev.com/328eda136de7172f4970f94774ba68528ab22adc/base/android/java_exception_reporter.h [modify] https://crrev.com/328eda136de7172f4970f94774ba68528ab22adc/chrome/app/chrome_main_delegate.cc [modify] https://crrev.com/328eda136de7172f4970f94774ba68528ab22adc/components/crash/content/app/breakpad_linux.cc
,
Feb 10 2017
No plan to revert that again. Also all this work is in an attempt to fix remaining OnGpuChannelTimeout in production, so block 680775 |
||
►
Sign in to add a comment |
||
Comment 1 by boliu@chromium.org
, Jan 12 2017