New issue
Advanced search Search tips

Issue 592556 link

Starred by 9 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Bug


Sign in to add a comment

WebView doesn't handle app callbacks throwing Java exceptions well

Project Member Reported by torne@chromium.org, Mar 7 2016

Issue description

WebView frequently calls application callbacks (via WebViewClient and WebChromeClient), and does not deal well with the application throwing a Java exception in one of these callbacks (hits assertion in base::android::CheckException). This results in a native crash and breakpad dump, which can obscure the original Java exception and mislead people into thinking that WebView has a problem when it's actually the app.

Over 25% of our native crashes are caused by Java exceptions being thrown and the vast majority of these are application code throwing exceptions and not real webview issues. Many of these exceptions are thrown in shouldOverrideUrlLoading, with onGeolocationPermissionsShowPrompt as the runner-up.

It'd be really nice to come up with a way to not have this result in a native crash, both to make it clearer to developers that their app is broken, and to remove this source of noise from WebView crash data so we can more easily spot *other* java exceptions that *are* caused by WebView code (there are a few, but they are somewhat lost in the huge noise of app exceptions).

Some options I've thought of in increasing order of difficulty:

1) We could catch application exceptions in the Java side code, log them, and return a default value (e.g. true for shouldOverrideUrlLoading to stop the load happening, or "denied" for geolocation permission). This probably isn't acceptable, because it will mean the app doesn't crash, and so will only be noticed if the developer happens to spot the logged error when running the app locally (or if a user reports it), instead of the developer getting crash reports from the field.

2) We could catch the exception and then rethrow it by posting a task to the UI thread's Handler. This would mean that the app would still crash, but from Java instead of native. We'd still have to return some default value as with 1, to keep the chromium native code happy. The crash might be delayed by an arbitrarily long time since it would depend how soon the task got run (there might be other queued work) and it's possible that the app would end up crashing in some OTHER way first because it's now in a somewhat ill-defined state, but I think generally this would work. It would result in a weird looking Java crash stack, though: we'd have to wrap the existing exception in a new one to rethrow it while preserving the original backtrace and it'd still require developers to notice that the WebView exception was caused-by their own exception.

3) For the callbacks which happen on the app's UI thread already (which I think includes the two mentioned above), we could, with some considerable refactoring to Chromium, actually handle the exception as JNI intends by changing the way that nativeDoRunLoopOnce works:

 - call the callback in an "unchecked" way such that it doesn't immediately assert when noticing it raised an exception

 - check for the JNI exception status ourselves in the native code

 - if exception occurred, we'd then need to set some thread-local "java exception, everybody panic" flag and return from the task we're currently running

 - base::MessageLoop should notice the flag is set and bail out of the loop, not running any later tasks (it must not try to call any java methods now that we're in an exception state)

 - nativeDoRunLoopOnce can then return to the JVM *without* clearing the exception state, which will cause the VM to actually throw the *original* Java exception with the original Java stack, with no gross native stuff stuck in there, just as if it was all Java the whole time :)

This should be viable for any callbacks that we execute on an existing Android Java thread from inside nativeDoRunLoopOnce, since we were always going to have returned to the VM at some point. It won't work for cases where we're running on a Chromium thread, but I don't think that applies to most of the problematic callbacks.

Any other ideas/comments on the above? Anyone fancy working on it? :p
 
I would say that something like 3) is the only viable option, and 2) can only be considered for prototyping and testing waters. Since WebView already works under quite complex conditions and people try doing crazy things with it, introducing unpredictable and confusing behavior isn't what we really want. 

Comment 2 by torne@chromium.org, Mar 9 2016

Yeah, probably the first two are going to be too weird for app developers to deal with. I think the most critical cases are invoked from nativeDoRunLoopOnce, so the tricky part would just be adding the code path to the message loop code to allow JNI code to bail in this very specific way. But I guess I'm a base OWNER now so hey ;)

Comment 3 by torne@chromium.org, Apr 22 2016

Issue 605990 has been merged into this issue.

Comment 4 by hush@chromium.org, Jun 3 2016

Cc: hush@chromium.org
Issue 629428 has been merged into this issue.
Owner: gsennton@chromium.org
Status: Assigned (was: Available)
Hmmm, could we add a test to ensure this doesn't regress somehow? (at least in specific cases such as for ShouldOverrideUrlLoading) We would have to catch the wanted Java exception somehow but AFAIK we don't have any way of reaching our UI thread SystemMessageHandler from e.g. an instrumentation test (to catch the exception).

I guess setting the uncaughtExceptionHandler wouldn't actually help us when creating a test since we can't prevent the exception from causing the process to crash? (or can we?) 

Comment 8 by torne@chromium.org, Jul 20 2016

I don't think this is going to be easy to test, no; it's explicitly not supposed to be something the app can catch. There's no way afaik to prevent the process crashing after the uncaught exception handler returns, so unless we can have a test that can flag itself as passed and then crash without that being a fail, I'm not sure we can do much here?
So we would have to start a separate process to load WebView in, and then use the uncaught-exception-handler for the UI thread of that process to notify our test (running in the original process) whether everything went as expected?

Could we just start a service in a separate process and have that service load WebView? 

Comment 10 by boliu@chromium.org, Jul 20 2016

I thought the uncaught exception handler *is* the thing that's causing the crash, so if we replace it, then it won't crash on an unhandled exception. That's not actually the case..?

Comment 11 by boliu@chromium.org, Jul 20 2016

> Could we just start a service in a separate process and have that service load WebView?

Too overkill maybe..? Is it possible to unit test the message loop changes only?
I created a test (currently not running because we have "temporarily" blocked the functionality under test) that creates a second browser instance inside a service running as a separate process, but within the same app. You can check it out here:

https://cs.chromium.org/chromium/src/android_webview/javatests/src/org/chromium/android_webview/test/AwSecondBrowserProcessTest.java

https://cs.chromium.org/chromium/src/android_webview/test/shell/src/org/chromium/android_webview/test/SecondBrowserProcess.java

Note that such a Service can't have any UI.

Comment 13 by torne@chromium.org, Jul 21 2016

The default Android uncaught exception handler logs to logcat, displays the crash dialog, and then exits the entire process.

I think I was actually wrong that there's no way to prevent the *process* crashing, though. The *thread* will always die when you return to the VM from the uncaught exception handler, there's no way to prevent that (unlike native signals), but if you set a different uncaught exception handler that doesn't kill the process then the other threads should keep going.

So, that might be easier to test than I thought. If the main thread dies then that might be a special case, though? Even if the VM/OS don't care (which I'm not sure about either way), then Android-specific will still be in a weird state since ActivityThread won't be able to receive messages any more, so it might be necessary to test this using a non-main thread?

Comment 14 by boliu@chromium.org, Jul 21 2016

that means apps can in theory make these not crash as well, which means we have to leave message loop in a consistent state after an exception (I guess was always an requirement)

Comment 15 by torne@chromium.org, Jul 21 2016

The current thread will still die, which will mean that message loop will never get executed again. We should leave *global* state consistent so that other threads don't crash while the crashing thread is in the process of dying, but Chromium is definitely going to be broken if the UI thread is no longer executing tasks.

Apps are already able to do worse today, by intercepting SIGABRT and not exiting the process, which will cause us to continue executing after an intentional native abort(). I think it's fine to tell anyone doing something totally unexpected that they shouldn't be doing it :)

Comment 16 by boliu@chromium.org, Jul 21 2016

that's terrible :/

Comment 17 by torne@chromium.org, Jul 21 2016

That's what happens when you're a library in a process someone else controls, alas.

I don't think anyone is *intentionally* going to resume from a fatal signal, but our own signal handler for breakpad has had bugs where we don't do things right before, and it's possible other people may make similar mistakes :/
[to confirm #13] I tried setting the UncaughtExceptionHandler and it does indeed prevent the process from crashing (the instrumentation test which runs from a different thread continues to run after the UnCaughtExceptionHandler is triggered).
(though it seems the shell command that started the instrumentation test times out, so a test where the UI thread crashes does end up failing anyway)

re Mikhail (#12): does it matter that the Service can't have any UI given that we only want to load some url with the WebView to cause shouldOverrideUrlLoading to crash?

re Bo (#11): we can probably unit test some of our requirements (like not calling any new tasks after we 'abort' the current task because it crashed), but AFAICT it would be fiddly (or impossible?) to ensure that we don't call back into java at all after crashing (there are other methods such as message_pump_android.Quit() that call into java). It would be nice to have an instrumentation test for this to ensure that we don't regress this behaviour anywhere in the call stack down to shouldOverrideUrlLoading.
Maybe not, I guess an off-screen WebView should work. Be aware that you shouldn't be creating any WebViews in the main process, as the data dir is shared.

Comment 21 by boliu@chromium.org, Jul 28 2016

do you need the all the service stuff now that you've tested UncaughtExceptionHandler can prevent crashes on an exception?

I was mainly objecting to having to a huge amount of set up (service and whatnot) just for a single test. Adding an instrumentation test in the main process sounds fine.
The instrumentation command (adb shell am instrument -w -r -e class org.chromium.android_webview.test\.JniCrashTest#t    estShouldOverrideUrlLoadingCrash org.chromium.android_webview.test/org.chrom    ium.android_webview.test.AwInstrumentationTestRunner) times out when I throw an exception from shouldOverrideUrlLoading in the main process and override its uncaughtExceptionHandler. Not sure why this is (I guess the instrumentation command is waiting for some kind of response from the main thread).
The instrumentation command is stuck at
com.android.commands.am.Am$InstrumentationWatcher.waitForFinish(Am.java:1739)

which waits for InstrumentationWatcher.instrumentationFinished to be called. This happens when the ActivityManagerService calls finishInstrumentationLocked which is called from Instrumentation.finish(..).

Comment 24 by boliu@chromium.org, Jul 29 2016

I guess UI thread is just dead, so it didn't do its clean shutdown and notify the instrumentation thread, something like that..?
yeah, probably, something like that :P (but that doesn't help :/)

Comment 26 by boliu@chromium.org, Jul 29 2016

hmm, can you write the test with a callback that's not on the UI thread? I guess the only other option is javabridge thread

alternative is somehow force the test to shutdown after the exception is caught, but unless that's a supported thing, it's always going to be brittle
When you say "a callback that's not on the UI thread" I assume you mean 'rather than shouldOverrideUrlLoading'? The point of the test is to test this for shouldOverrideUrlLoading (since that is the only callback I'm fixing right now) so we will always be running on the UI thread.
I'm leaning towards just causing the crash in a separate process.
Uploaded CL for testing this in a separate process in
https://codereview.chromium.org/2201783003/
Project Member

Comment 30 by bugdroid1@chromium.org, Aug 18 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac

commit ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac
Author: gsennton <gsennton@chromium.org>
Date: Thu Aug 18 22:34:12 2016

Properly throw java exceptions from shouldOverrideUrlLoading

[WebView]
Since all our java methods called through the jni-interface are
annotated with @CalledByNative (rather than @CalledByNativeUnchecked)
any java crash in those methods will cause a native crash since we call
jni_android::CheckException (which aborts in native code) when returning
from those methods.

In this CL we annotate the jni-call for shouldOverrideUrlLoading with
@CalledByNativeUnchecked to avoid crashing directly after the jni-call.

We also add an Abort function for the android specific message loop and
message pump to ensure no new tasks will be run before we return to the
Java handler (at which point the original Java exception will be
rethrown and thus correctly reported).

BUG=592556

Review-Url: https://codereview.chromium.org/2169553002
Cr-Commit-Position: refs/heads/master@{#412956}

[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/android_webview/browser/aw_content_browser_client.cc
[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/android_webview/java/src/org/chromium/android_webview/AwContentsClientBridge.java
[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/android_webview/native/aw_contents_client_bridge.cc
[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/BUILD.gn
[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/android/java/src/org/chromium/base/SystemMessageHandler.java
[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/android/java_handler_thread.cc
[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/android/java_handler_thread.h
[add] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/android/java_message_handler_factory.h
[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/message_loop/message_loop.cc
[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/message_loop/message_loop.h
[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/message_loop/message_loop_unittest.cc
[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/message_loop/message_pump_android.cc
[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/message_loop/message_pump_android.h
[modify] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/test/BUILD.gn
[add] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/test/android/java/src/org/chromium/base/TestSystemMessageHandler.java
[add] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/test/android/java_handler_thread_for_testing.cc
[add] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/test/android/java_handler_thread_for_testing.h
[add] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/test/android/test_system_message_handler_link_android.cc
[add] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/test/android/test_system_message_handler_link_android.h
[add] https://crrev.com/ebe2e203bd421bc1a5e4c28a48f7d54d407c6fac/base/test/run_all_base_unittests.cc

As we've explained, the native crashes are caused by the app throwing a java exception and this is an app bug; this bug is about making this clearer to both us and the app developer, not fixing the crash.
Blockedon: 718921
Blockedon: 718929
I'm assigning the bugs to me for now, to avoid the bugcop queue.
Blockedon: 718936
Blockedon: 718938
Note that in our new bugcop workflow, you don't really need to assign them to anybody unless you really want to. If you are not planning to do it for current milestone, you can lower it to P3 and leave it as available. 
Blockedon: 718949
Right, thanks Selim, I'll un-assign the ones I won't look at anytime soon.

For prioritization:
Numbers from WebView 58:
RunJavaScriptDialog        30% of JNI crashes
shouldInterceptRequest     19% of JNI crashes
InputStreamImpl::Read      7%
VideoCaptureDeviceFactoryAndroid::GetSupportedFormats  6%

and the rest if the calls take up less than 5% each of the JNI crashes. So atm, it looks like we should prioritize RunJavaScriptDialog.
Blockedon: 719394
Blockedon: 719395
Blockedon: 719396
Blockedon: 719400
Blockedon: 719406
Blockedon: 723699
Cc: -sgu...@chromium.org
Blockedon: -719394
Cc: -hush@chromium.org -mnaga...@chromium.org
Owner: ----
Status: Available (was: Assigned)
Unassigning this since I'm no longer on WebView, this is really just a tracker bug anyway.
Cc: tobiasjs@chromium.org
Labels: -Pri-2 Pri-3
Lowering priority because this doesn't belong in the bugcop queue.
> 3) For the callbacks which happen on the app's UI thread already (which I think includes the two mentioned above)

I would have expected all our callbacks to be on the UI thread. Which ones don't guarantee that?
shouldInterceptRequest happens on whichever chromium network thread we were already on, to avoid a thread hop for performance in the default case of "do not intercept". Reading from the input stream in the case where a request is intercepted happens on a dedicated thread pool. JS interface callbacks also happen on some arbitrary thread I think?

Almost all our callbacks do happen on the UI thread but there's a few exceptions like these - that *might* be all of them but it's hard to be sure without checking a lot ;)
Also there's a number of Java exceptions that don't come from app callbacks, just regular calls from our java code directly to the framework, that appear in the data, which might happen on just about any thread - but we typically want to fix these by catching the error in our java code and handling it without crashing, not by rethrowing carefully, since it usually represents weird framework compat issues on third party devices that we need to work around.

Sign in to add a comment