New issue
Advanced search Search tips

Issue 680775 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug

Blocking:
issue 680777



Sign in to add a comment

Android Release (Nexus 5X) flake OnGpuChannelTimeout crash

Project Member Reported by boliu@chromium.org, Jan 12 2017

Issue description

Forked from issue 664341. Pasting relevant info here:

failed to load the native library, yet it did *not* kill the child process.

08-03 01:21:28.068  3818  3843 I cr_LibraryLoader: Using linker: org.chromium.base.library_loader.ModernLinker
08-03 01:21:28.107  3818  3846 I cr_LibraryLoader: Loading chrome from within /data/app/org.chromium.chrome-1/base.apk
08-03 01:21:28.107  3818  3846 E cr_ChromiumAndroidLinker: LoadLibrary: Failed to obtain fixed address for load
08-03 01:21:28.107  3818  3846 E cr_LibraryLoader: Unable to load library: /data/app/org.chromium.chrome-1/base.apk!/lib/arm64-v8a/crazy.libchrome.so
08-03 01:21:28.107  3818  3846 W cr_ChildProcessService: Failed to load native library with shared RELRO, retrying without
08-03 01:21:28.108  3818  3846 E AndroidRuntime: FATAL EXCEPTION: ChildProcessMain
08-03 01:21:28.108  3818  3846 E AndroidRuntime: Process: org.chromium.chrome:privileged_process0, PID: 3818
08-03 01:21:28.108  3818  3846 E AndroidRuntime: java.lang.AssertionError
08-03 01:21:28.108  3818  3846 E AndroidRuntime:  at org.chromium.base.library_loader.ModernLinker.disableSharedRelros(ModernLinker.java:254)
08-03 01:21:28.108  3818  3846 E AndroidRuntime:  at org.chromium.content.app.ChildProcessServiceImpl$2.run(ChildProcessServiceImpl.java:195)
08-03 01:21:28.108  3818  3846 E AndroidRuntime:  at java.lang.Thread.run(Thread.java:818)


The library load failure in #112 is a major (5.5% of builds) source of flakiness on "Android Release (Nexus 5X)" bot, which prevents us from adding its mirror android_optional_gpu_tests_rel to CQ.

https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5187
https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5186
https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5172
https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5147
https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5124
https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5116
https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5113
https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5069
https://build.chromium.org/p/chromium.gpu.fyi/builders/Android%20Release%20%28Nexus%205X%29/builds/5055
 

Comment 1 by boliu@chromium.org, Jan 12 2017

> Is this blocked on https://github.com/catapult-project/catapult/issues/3110 to be fixed?

No
I don't see any new occurrences since https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Android%20Release%20%28Nexus%205X%29/5187. Which was 2 weeks and 300 builds ago.
Since you suspect this was caused by some slowness in issue 680777, I think maybe this was caused by problems with NTP (New Task Page) and FRE (First Run Experience) fixed in issues 673831, 632199, 672255.

Comment 3 by boliu@chromium.org, Jan 25 2017

I don't think any of that matters to linker or asserts.

agrieve: can you summarize how java assert works on M?
When asserts are enabled: 
https://cs.chromium.org/chromium/src/build/config/android/internal_rules.gni?q=is_java_debug&sq=package:chromium&dr=C&l=1048

Then they throw an AssertionError() when the conditions fail (same as pre-L). These exceptions are catchable via "catch (Throwable e)".

Comment 5 by boliu@chromium.org, Jan 30 2017

Ahh, I know why the assertion is not crashing the process now, at least in this build configuration. I tested by just inserting an assertion failure in child start up code. When the exception is thrown, android pops up the "Unfortunately, Chromium has stopped" dialog. The process will hang around until that dialog is dismissed!!

I know in release (or official?) builds, that dialog is supposed to be suppressed, so I think this isn't an issue in production.

That just leaves actually fixing the wrong assertion, which is trivial

Side comment.. modifying java bytecode to get assertions? That's crazy. I didn't even know that's possible..
https://cs.chromium.org/chromium/src/build/android/java_assertion_enabler/java/org/chromium/javaassertionenabler/AssertionEnabler.java

Comment 6 by boliu@chromium.org, Jan 31 2017

Pasting from original bug for reference. The assert in disableSharedRelros is wrong, and can be triggered if sharing relro region in child process fails (not sure why it fails though):

08-03 01:21:28.068  3818  3843 I cr_LibraryLoader: Using linker: org.chromium.base.library_loader.ModernLinker
08-03 01:21:28.107  3818  3846 I cr_LibraryLoader: Loading chrome from within /data/app/org.chromium.chrome-1/base.apk
08-03 01:21:28.107  3818  3846 E cr_ChromiumAndroidLinker: LoadLibrary: Failed to obtain fixed address for load
08-03 01:21:28.107  3818  3846 E cr_LibraryLoader: Unable to load library: /data/app/org.chromium.chrome-1/base.apk!/lib/arm64-v8a/crazy.libchrome.so
08-03 01:21:28.107  3818  3846 W cr_ChildProcessService: Failed to load native library with shared RELRO, retrying without
08-03 01:21:28.108  3818  3846 E AndroidRuntime: FATAL EXCEPTION: ChildProcessMain
08-03 01:21:28.108  3818  3846 E AndroidRuntime: Process: org.chromium.chrome:privileged_process0, PID: 3818
08-03 01:21:28.108  3818  3846 E AndroidRuntime: java.lang.AssertionError
08-03 01:21:28.108  3818  3846 E AndroidRuntime:  at org.chromium.base.library_loader.ModernLinker.disableSharedRelros(ModernLinker.java:254)
08-03 01:21:28.108  3818  3846 E AndroidRuntime:  at org.chromium.content.app.ChildProcessServiceImpl$2.run(ChildProcessServiceImpl.java:195)
08-03 01:21:28.108  3818  3846 E AndroidRuntime:  at java.lang.Thread.run(Thread.java:818)
Worth filing a bug about the dialog causing a timeout. I'm not sure if there's already code somewhere that works around this, but it is suppressible via Thread.setDefaultUncaughtExceptionHandler()

Here's the default uncaughtExceptionHandler:
https://github.com/android/platform_frameworks_base/blob/master/core/java/com/android/internal/os/RuntimeInit.java#L116

Perhaps we want to disable it when instrumentation is active?

Comment 8 by boliu@chromium.org, Jan 31 2017

> I know in release (or official?) builds, that dialog is supposed to be suppressed, so I think this isn't an issue in production.

Err.. don't remember where I got that idea from. But using chrome://gpucrash on 55.0.2883.91 (current stable) official build, that most definitely causes this dialog.

But gpucrash is native crash, so there is no chance of process surviving longer than the crash. Might be worthwhile to test this with a java exception instead. If this crashes as well, this could potentially explain a lot of the gpu timeouts in production.

Comment 9 by boliu@chromium.org, Jan 31 2017

> Might be worthwhile to test this with a java exception instead.

Actually, gonna skip that step. Seems there is enough evidence already..

Comment 10 by boliu@chromium.org, Jan 31 2017

> > I know in release (or official?) builds, that dialog is supposed to be suppressed, so I think this isn't an issue in production.
>
> Err.. don't remember where I got that idea from. But using chrome://gpucrash on 55.0.2883.91 (current stable) official build, that most definitely causes this dialog.

That suppression does exist, for native crashes, but is only enabled on user devices; see FinalizeCrashDoneAndroid in breakpad_linux.cc. Can remove the userdebug/eng check, and then add --enable-crash-reporter-for-testing, then chrome://crash won't throw up that dialog anymore.

But if I insert the java exception in native library loading, that crash most definitely does still pop up in this configuration. I believe this is true for uncaught java exceptions even after native library is loaded, since breakpad has nothing to do with java crashes.

Chrome does have JavaExceptionReporter, which inserts itself into the Thread.setDefaultUncaughtExceptionHandler chain. But it's only enabled for the browser process, not child processes. Also it's a chrome-only, and I don't know why, other than probably due to historical reasons.

So there's a gap in our crash reporting for child processes. Yay..

Comment 11 by boliu@chromium.org, Jan 31 2017

JavaExceptionReporter works by triggering a breakpad crash, and then trigger a breakpad dump, and have breakpad pick up the java exception (as a string). That obviously won't work, at least for the case I'm worried about, that exception happens before native libraries are even loaded.

Other than that, JavaExceptionReporter should work just fine in child processes, I *think*. So ideally, we silence the dialog as early as possible, but then replace that with JavaExceptionReporter after native side is ready.

Still need to test everything here though..
Project Member

Comment 12 by bugdroid1@chromium.org, Feb 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9d06c9d1dbba894cb8cc79230675d3fefd189296

commit 9d06c9d1dbba894cb8cc79230675d3fefd189296
Author: boliu <boliu@chromium.org>
Date: Wed Feb 01 15:57:52 2017

android: Remove wrong assert in disableSharedRelros

This assert can be hit in practice:
In child process, if sharing relro section fails, child will try to load
the library again with relro disabled. But at that point,
prepareLibraryLoad has already been called.

So just remove the assert.

BUG= 680775 

Review-Url: https://codereview.chromium.org/2661073002
Cr-Commit-Position: refs/heads/master@{#447520}

[modify] https://crrev.com/9d06c9d1dbba894cb8cc79230675d3fefd189296/base/android/java/src/org/chromium/base/library_loader/ModernLinker.java

Project Member

Comment 13 by bugdroid1@chromium.org, Feb 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e

commit 7a81c252f5f9cf8981f5df770a5acbb8d736eb6e
Author: boliu <boliu@chromium.org>
Date: Fri Feb 03 03:41:04 2017

android: Add debug url to throw uncaught exception

Currently there are no crash handlers for uncaught java exceptions
in child processes. Add a chrome debug url to crash the GPU process
with an uncaught java exception to aid in testing this code path.

The plumbing is same as chrome://gpucrash, except everything is
behind OS_ANDROID guards. The actual implementation of throwing
the exception is in base.

BUG= 680775 
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_site_isolation

Review-Url: https://codereview.chromium.org/2664373002
Cr-Commit-Position: refs/heads/master@{#447908}

[modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/base/BUILD.gn
[add] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/base/android/java/src/org/chromium/base/ThrowUncaughtException.java
[add] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/base/android/throw_uncaught_exception.cc
[add] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/base/android/throw_uncaught_exception.h
[modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/chrome/common/url_constants.cc
[modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/browser/frame_host/debug_urls.cc
[modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/browser/gpu/gpu_process_host_ui_shim.cc
[modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/browser/gpu/gpu_process_host_ui_shim.h
[modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/common/gpu_host_messages.h
[modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/gpu/gpu_child_thread.cc
[modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/gpu/gpu_child_thread.h
[modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/public/common/url_constants.cc
[modify] https://crrev.com/7a81c252f5f9cf8981f5df770a5acbb8d736eb6e/content/public/common/url_constants.h

Project Member

Comment 14 by bugdroid1@chromium.org, Feb 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5

commit 9b51b5184179dd0a76fdc4e0e851b22a8aa842d5
Author: boliu <boliu@chromium.org>
Date: Fri Feb 03 15:35:05 2017

android: Move JavaExceptionReporter to base

So that it can be used in other processes as well.
There as some comment changes, and added @MainDex. But otherwise
a straightforward move.

BUG= 680775 

Review-Url: https://codereview.chromium.org/2667263002
Cr-Commit-Position: refs/heads/master@{#448000}

[modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/base/BUILD.gn
[modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/base/android/base_jni_registrar.cc
[rename] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/base/android/java/src/org/chromium/base/JavaExceptionReporter.java
[rename] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/base/android/java_exception_reporter.cc
[add] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/base/android/java_exception_reporter.h
[modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/chrome/android/java/src/org/chromium/chrome/browser/ChromeStrictMode.java
[modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/chrome/android/java_sources.gni
[modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/chrome/app/chrome_main_delegate.cc
[modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/chrome/browser/BUILD.gn
[modify] https://crrev.com/9b51b5184179dd0a76fdc4e0e851b22a8aa842d5/chrome/browser/android/chrome_jni_registrar.cc
[delete] https://crrev.com/9f800ec0044b9c7c7c88553f406cc5e146e72516/chrome/browser/android/java_exception_reporter.h

Project Member

Comment 15 by bugdroid1@chromium.org, Feb 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/29b572b3ffeb491584ce5e6cd24a5eca9ac4d738

commit 29b572b3ffeb491584ce5e6cd24a5eca9ac4d738
Author: boliu <boliu@chromium.org>
Date: Mon Feb 06 18:14:36 2017

android: Crash child immediately on uncaught exception

By installing our own handler that kills the process immediately.
The default handler will pop up a dialog, and the process will stay
alive until that dialog is dismissed by user. This is generally bad
because browser will not notice this crash until user dismisses the
dialog as well.

Note that suppressing this dialog matches the behavior for native
crashes. See breakpad::FinalizeCrashDoneAndroid for the equivalent
logic.

Refactor base a bit so the logic to check for release android build
can be shared.

BUG= 680775 

Review-Url: https://codereview.chromium.org/2675013002
Cr-Commit-Position: refs/heads/master@{#448308}

[modify] https://crrev.com/29b572b3ffeb491584ce5e6cd24a5eca9ac4d738/base/android/java/src/org/chromium/base/BuildInfo.java
[modify] https://crrev.com/29b572b3ffeb491584ce5e6cd24a5eca9ac4d738/base/android/java/src/org/chromium/base/CommandLineInitUtil.java
[modify] https://crrev.com/29b572b3ffeb491584ce5e6cd24a5eca9ac4d738/content/public/android/BUILD.gn
[modify] https://crrev.com/29b572b3ffeb491584ce5e6cd24a5eca9ac4d738/content/public/android/java/src/org/chromium/content/app/ChildProcessServiceImpl.java
[add] https://crrev.com/29b572b3ffeb491584ce5e6cd24a5eca9ac4d738/content/public/android/java/src/org/chromium/content/app/KillChildUncaughtExceptionHandler.java

Project Member

Comment 16 by bugdroid1@chromium.org, Feb 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e76f62c07f2dfbddef18d4011c576f678fc1f54b

commit e76f62c07f2dfbddef18d4011c576f678fc1f54b
Author: boliu <boliu@chromium.org>
Date: Wed Feb 08 15:30:36 2017

android: Report java exception in child

Enable JavaExceptionReporter for child processes so that unhandled java
exceptions generate crash reports.

Note this also requires fixing InitNonBrowserCrashReporterForAndroid so
that SetDumpWithoutCrashingFunction is called in child processes.

Tested with chrome://gpu-java-crash added in r447908, manually
enabling breakpad microdumps with local build, and adding
--enable-crash-reporter-for-testing. Verified microdump is generated
with the java stack information in logcat

BUG= 680775 

Review-Url: https://codereview.chromium.org/2677563006
Cr-Commit-Position: refs/heads/master@{#449000}

[modify] https://crrev.com/e76f62c07f2dfbddef18d4011c576f678fc1f54b/chrome/app/chrome_main_delegate.cc
[modify] https://crrev.com/e76f62c07f2dfbddef18d4011c576f678fc1f54b/components/crash/content/app/breakpad_linux.cc

Project Member

Comment 17 by bugdroid1@chromium.org, Feb 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/16397c1b5df27f30968f55166048f14d4d80f0cd

commit 16397c1b5df27f30968f55166048f14d4d80f0cd
Author: boliu <boliu@chromium.org>
Date: Wed Feb 08 17:05:43 2017

Revert of android: Report java exception in child (patchset #2 id:20001 of https://codereview.chromium.org/2677563006/ )

Reason for revert:
tobiasjs@ pointed out:
DumpWithoutCrash doesn't really work on android in
child processes since android child processes sends
the dump through a FD, and browser only picks up
the FD after it detects child has crashed. So if
child calls DumpWithoutCrash and *doesn't* crash,
then a subsequent real crash report won't be picked
up.

Unhandled java exception is actually not the
problem here. The problem is all the other places
in child processes that calls DumpWithoutCrash.

Risk of breaking normal crash reports seems high
to me, so reverting to be safe.

Original issue's description:
> android: Report java exception in child
>
> Enable JavaExceptionReporter for child processes so that unhandled java
> exceptions generate crash reports.
>
> Note this also requires fixing InitNonBrowserCrashReporterForAndroid so
> that SetDumpWithoutCrashingFunction is called in child processes.
>
> Tested with chrome://gpu-java-crash added in r447908, manually
> enabling breakpad microdumps with local build, and adding
> --enable-crash-reporter-for-testing. Verified microdump is generated
> with the java stack information in logcat
>
> BUG= 680775 
>
> Review-Url: https://codereview.chromium.org/2677563006
> Cr-Commit-Position: refs/heads/master@{#449000}
> Committed: https://chromium.googlesource.com/chromium/src/+/e76f62c07f2dfbddef18d4011c576f678fc1f54b

TBR=rsesek@chromium.org,mariakhomenko@chromium.org,grt@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 680775 

Review-Url: https://codereview.chromium.org/2684893003
Cr-Commit-Position: refs/heads/master@{#449018}

[modify] https://crrev.com/16397c1b5df27f30968f55166048f14d4d80f0cd/chrome/app/chrome_main_delegate.cc
[modify] https://crrev.com/16397c1b5df27f30968f55166048f14d4d80f0cd/components/crash/content/app/breakpad_linux.cc

Project Member

Comment 18 by bugdroid1@chromium.org, Feb 10 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/328eda136de7172f4970f94774ba68528ab22adc

commit 328eda136de7172f4970f94774ba68528ab22adc
Author: boliu <boliu@chromium.org>
Date: Fri Feb 10 16:39:57 2017

android: Report child exception by crashing

This is the second attempt of
https://codereview.chromium.org/2677563006/

DumpWithoutCrash should not be enabled for child processes on Android
because the browser is not set up to read multiple crash reports for the
same child process. So enabling DumpWithoutCrash could lead to real crash
reports getting dropped.

This new implementation just crashes the child process to cause a crash
report to be generated.

Note this does not replace the need for
KillChildUncaughtExceptionHandler, which is installed much earlier
during start up and is designed to not cause hangs for any problems
during start up as well.

BUG= 680775 

Review-Url: https://codereview.chromium.org/2687193003
Cr-Commit-Position: refs/heads/master@{#449632}

[modify] https://crrev.com/328eda136de7172f4970f94774ba68528ab22adc/base/android/java/src/org/chromium/base/JavaExceptionReporter.java
[modify] https://crrev.com/328eda136de7172f4970f94774ba68528ab22adc/base/android/java_exception_reporter.cc
[modify] https://crrev.com/328eda136de7172f4970f94774ba68528ab22adc/base/android/java_exception_reporter.h
[modify] https://crrev.com/328eda136de7172f4970f94774ba68528ab22adc/chrome/app/chrome_main_delegate.cc
[modify] https://crrev.com/328eda136de7172f4970f94774ba68528ab22adc/components/crash/content/app/breakpad_linux.cc

Comment 19 by boliu@chromium.org, Feb 10 2017

Blocking: 680777
Status: Fixed (was: Available)
No plan to revert that again.

Also all this work is in an attempt to fix remaining OnGpuChannelTimeout in production, so block 680775

Sign in to add a comment