signal 5 (SIGTRAP), code 4 (TRAP_HWBKPT) errors in Android App using embedded WebView
Reported by m...@mikehardy.net, Dec 19
Issue description

Steps to reproduce the problem:
1. Use AnkiDroid (a published open source app on the Play Store)
2. ??
3. Crash

What is the expected behavior? No crash.

What went wrong? I'm a developer (one of the AnkiDroid devs), so I want to provide as many details as possible, but I don't have much to go on, my apologies. People using our app started seeing this crash increase in frequency around Dec 3, with another large increase on Dec 6. The application itself has not been updated, so I suspect it was a Chrome / WebView update.

Crash report ID:
How much crashed? Just one tab
Is it a problem with a plugin? No
Did this work before? Unknown - only seen on Android 7, 7.1, 8.0 and 8.1, possibly on different Chrome / WebView versions
Chrome version: 71.0.3578.98
Channel: stable
OS Version: 8.0
Flash Version:

You can see the Play Store entry here; I understand that may allow you to look up WebView crashes in an internal system: https://play.google.com/store/apps/details?id=com.ichi2.anki

The errors look like this in the Play Console:

4 minutes ago on app version 20804300
Motorola Moto G (5) Plus (potter_n), Android 8.1
Report 1
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 0  >>> com.ichi2.anki <<<
backtrace:
  #00 pc 0000000000e558a4  /data/app/com.android.chrome-bMimQoOQMudnJIFwXv3JiQ==/base.apk
  #01 pc 0000000000e557bd  /data/app/com.android.chrome-bMimQoOQMudnJIFwXv3JiQ==/base.apk

As mentioned, I wish I had more information to offer any adventurous triage person, but I don't. If you could offer guidance on how to troubleshoot this, I would appreciate it. We have about 1 million installs on active devices and currently see about 100 of these crashes a day.
Dec 20
Unable to reproduce this issue on a Pixel XL or a Samsung J7; installed the https://play.google.com/store/apps/details?id=com.ichi2.anki app and saw no crash. @mike: Is this issue consistently reproducible? Is it seen only on the Moto G5 or on other devices as well? This information would help in further triaging of the issue. Could someone from the WebView team please look at the backtrace attached in C#0? Thanks!
Dec 20
Hi there! Thanks very much for the triage. I'll do my best to be responsive to any requests for information as we investigate. I'm unable to reproduce it myself; I'm not sure what combination of versions (Android + WebView) and user-provided data (flashcard collection data) can trigger it, unfortunately. The main bit of data I have is that with no new app release, and not enough time for a slow change in user behavior to explain a broad-based increase in error reports, we saw errors increase across multiple devices. I will attach the Play Console crash view showing version and device distribution. The Galaxy S8 (SCV36) looks like our top crasher, out of step with its representation in our userbase: a total of maybe 60k users out of our almost 1MM total.
Dec 20
Thank you for providing more feedback. Adding the requester to the cc list. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Dec 20
Just to be super clear - I'm not sure exactly which device code the S8 SCV36 maps to (dreamlte? dream2lte? etc.), so my device count quote was for all S8 models totaled; if the SCV36 is only one of them, then the statistical overrepresentation would be even sharper. That said, you can see that we get this crash on all sorts of devices.
Dec 20
If that's all it shows on the Play Store, that's surprising - it used to include more information than that :|
We have ~1000 crashes reported for this app, but there's no spike in December. There's a large number of crashes from one particular WebView version, but unfortunately that was a version in which we messed up our crash sampling rate and started collecting a larger fraction than intended, which means I can't make valid statistical comparisons there. The numbers for other versions are so low that it doesn't look like there was any dramatic change, though; we have only single-digit numbers of crashes for other times/versions. It's hard to compare the data we have to the Play Store dashboard because our 1% sampling rate leaves too few entries to get a good distribution.
There's only one identifiable crash with a good stack trace happening at any noticeable rate, and it's issue 889460 (sorry, not publicly visible), which is specific to the Redmi Note 5 ("whyred") and currently suspected to be a hardware/driver/kernel issue. The other devices don't show any particularly interesting patterns; most of their crashes are bad stacks that aren't easy to interpret, but some of them are the exact crash you posted above (same address offset).
It's a LOG(FATAL), which is what TRAP_HWBKPT is expected to indicate, but determining *which* LOG(FATAL) is a challenge. The logcat output would tell you immediately, but that's not available in our crash system or in the Play developer console.
One fairly likely cause is virtual address space exhaustion - this is what's happening in a few of the decodable stacks I see, and it unfortunately causes the crash dumping process to not always work correctly, resulting in this form of bad stack. If this is what's happening, then there's not a lot we or you can do about it without a way to reproduce the issue: running out of address space is not uncommon, and may be caused either by the content just actually using a lot of memory, or by something leaking memory in our code (or your app), and it's not usually possible to figure out what really happened from a crash dump. If you see a spike that roughly coincides with a WebView update, then it's definitely plausible that we've got a bug that is affecting memory usage.
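If you wanted to sanity-check the address space theory on your side, here is a minimal sketch of the kind of thing I mean (this is not something our crash system does; it assumes a standard Linux /proc layout, and on OS versions with an out-of-process renderer it only reflects your own process):

import java.io.BufferedReader;
import java.io.FileReader;

// Minimal sketch: log this process's virtual address space usage so the
// app can see whether VmSize creeps toward the ~4 GB limit of a 32-bit
// process before one of these crashes. The /proc field name is standard
// Linux; treat the parsing as illustrative.
public final class AddressSpaceLogger {
    public static long vmSizeKb() {
        try (BufferedReader r = new BufferedReader(new FileReader("/proc/self/status"))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.startsWith("VmSize:")) {
                    // Line format: "VmSize:  1234567 kB"
                    return Long.parseLong(line.split("\\s+")[1]);
                }
            }
        } catch (Exception e) {
            // Best-effort diagnostics only; ignore failures.
        }
        return -1;
    }
}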
I'm not sure there's much else we can do here to investigate this, though - you'd need to communicate with your users to see if any of them can reproduce it, and if so to explain how, or at least try to grab logcat output during the crash.
We are replacing our crash dump generation system in the near future with one that will handle this case much better and produce more useful dumps in scenarios where the current one fails, which will at least help a little with this type of issue :|
Dec 20
Hmm, we have ACRA integrated with the app (acra.ch is the homepage; it's also open source, and a good crash report system), so we can get logcat output, but we filter for our package name. Should we expand the filter to include something WebView-related and maybe catch something? Unfortunately I don't think ACRA is able to get these, since they are native crashes, but it's worth a try for the future maybe. Sounds like there really isn't anything actionable then, unfortunately; the crash system change in the future sounds like the real action item. I should mention the word "spike" is a bit strong - I chose words like "increase" or "uptick" because it seemed like 400% or so, but off a low base. Shame it wasn't useful for digging an error out...
Dec 20
If you're capturing logcat on Android 4.1 and later, you shouldn't need to filter it: the system only lets you read the logcat output produced by your own process, so it doesn't matter what the tag etc. actually is - it's all code running in your app. Unfortunately, the multiprocess WebView renderer on recent OS versions does run in a *different* process to yours, and I'm not sure if the logcat output from that will be included at all (it's still associated with your application in the framework, so it might be, but I haven't tried). If your crash tool doesn't work for native crashes, though, then it's not going to help here.

We don't capture logcat, intentionally, because we don't want to accidentally catch any PII or other user/app data; our crash system is designed to give us only the minimal information to get a stack backtrace for the parts that are related to Chromium. Switching to our new crash system will make getting to the point I'm at here easier, but if I'm right that this is caused by running out of address space, then it's not going to help us understand *why* any better; it'll just confirm that that is actually the reason. :|

If this is a flashcard app with user-generated content, I'd probably be suspicious that users are just using really big images in their content; if you flick through enough very large images that have to be decoded and resized for rendering, then that uses a significant amount of memory even in the best case, and it's not hard to have a bug in the page content/JS that causes too many images to be retained at once (nor is it impossible for us to have a bug in the image decoder that causes it to use more memory than is really necessary).
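For illustration, a minimal sketch of unfiltered capture along those lines (the class and method names here are placeholders, not ACRA API):

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Minimal sketch: dump this process's recent logcat output. On Android
// 4.1+ the unprivileged `logcat` command only returns lines from the
// calling app, so no tag filter is needed.
public final class LogcatDump {
    public static String dumpOwnLogcat() throws Exception {
        Process logcat = Runtime.getRuntime().exec(new String[] {"logcat", "-d", "-v", "time"});
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(logcat.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }
}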
Dec 20
We still support API 15 (!) so we need to filter for now, I guess, but I've logged an issue in our tracker to test unfiltered logcat in our reporting when we move up API levels - always nice to simplify things, and maybe we'll catch related WebView items. I just re-verified, and ACRA is not able to catch a native crash unless it bubbles up as an Exception on the Java side of JNI; there are no known stable ways to do that, apparently, unless you're inside the platform, I suppose.

With regard to large images, that is definitely a possibility. However, we actually create a new WebView for each display. Arguably that's a terrible idea (and we have an enhancement request to recycle the WebView so people don't have to use session-local HTML5 storage to persist state on card flips from question to answer), but we do create a new one: https://github.com/mikehardy/Anki-Android/blob/49950bfc94154520c8bd96db6c94c76638c59d55/AnkiDroid/src/main/java/com/ichi2/anki/AbstractFlashcardViewer.java#L2616

Is it possible that even though we are instantiating a WebView each time, inside the platform there is recycling going on that could eventually exhaust memory? Or is it possible that a single image could be large enough to crash the WebView, even in a new WebView? I'm trying to think of ways I could generate a repro here. Additionally, if this over-represented device is South Korean (as the internet seems to indicate), it's likely the users have a complex web font installed, maybe further stressing the WebView.
Dec 20
You could just make an API level check for the filtering? There are various crash analytics tools that do support native crashes, but I don't know how easy any of them are to use. A reasonable number of applications use Chromium's own Breakpad, which is what we also use (though I'm not sure I'd recommend it, since we've deprecated it and haven't been actively developing it for some time; it's being replaced).

Are you actually disposing of the *old* WebView? It's hard to follow your code from a quick look, but it seems like you might just be creating a new WebView and leaving the old one to eventually be garbage collected - in the best case this temporarily uses a bunch of extra memory (since it may take a while for GC to notice), and in the worst case, where there's a reference leak somewhere (which is a bug we've had in the past, though I don't know of a current one), they will actually leak forever. It is important to call WebView.destroy() at the appropriate time, because the Java GC does not know how much memory the WebView is using (almost all of it is native allocations) and will not know that it's a good idea to clean it up right away. You do have a destroyWebview function, but it doesn't look like you call it until the activity is destroyed, and it also looks wrong: you must detach the WebView from the view hierarchy before calling destroy(), because no methods may be called on it after destroy(), and attached views get methods called on them by the framework.

For your case you should almost certainly just reuse the same WebView, though; even if you destroy it correctly, it's much more expensive to recreate it every time. There's not really any advantage to making a new one: every time you load a new page into the WebView, the old page is just gone (other than its navigation history entries, which are small), so that should not cause memory to grow unbounded.

A single image absolutely can exhaust all address space and crash, but it has to be pretty big for that, and it depends what format the image is in and how much of it is actually on screen (some image decoders have optimisations for decoding down to a reduced scale, and we may not fully decode images that are only partially on screen). This isn't very common.
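To be concrete about the destroy order, a minimal sketch (the mWebView field is a placeholder, not your actual code):

import android.view.ViewGroup;
import android.webkit.WebView;

// Minimal sketch of the teardown order described above.
class WebViewHolder {
    private WebView mWebView;

    void tearDownWebView() {
        if (mWebView == null) return;
        ViewGroup parent = (ViewGroup) mWebView.getParent();
        if (parent != null) {
            parent.removeView(mWebView); // detach BEFORE destroy(): attached
                                         // views still get framework callbacks
        }
        mWebView.destroy(); // frees native allocations the Java GC can't see
        mWebView = null;    // no method calls are legal after destroy()
    }
}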
Dec 20
I could filter on API levels, yes, but if the goal is to simplify, I could also just wait a couple months before the last API 15 user exits the room and turns the lights off :-) and then bump minSdk - it'll be soon. I'm guessing your quick analysis of our WebView cleanup is also accurate; I'll investigate on our side to make sure we clean up all WebViews, and do so correctly. I want to implement WebView re-use as well, of course - just working through the immediate crash issues and API changes first, since the app ran well enough without much change for a couple of years that it accumulated a good deal of bitrot... Either way, thanks for all the time looking at this. I'll see if I can produce a crash reproduction, and maybe also log something to warn users when images are too large. If there were a reference table for the types and sizes of images that are iffy, I could maybe use that? If not, I could try to cook something up. Cheers
Dec 20
The main thing to avoid with images is using large images and rendering them at small sizes - i.e. an 8000x8000 JPG that is going to be rendered in an <img width=800 height=800>, or the like. As with any web development, it's better to resize the images once, when you make the content, instead of in the browser every time it's rendered. Images that are not at least several times larger than the screen resolution of the device you're viewing them on are not (at least individually) a problem.

One other thing to avoid (as the app developer, rather than a content author) is using wrap_content for the dimensions of the WebView (including having it set to match_parent where the parent is itself using wrap_content). This can lead to the WebView being larger than the screen, which forces us to render more (we can't always avoid rendering the parts that are offscreen in this scenario). Make sure the WebView's dimensions are either fixed or derived from a parent whose dimensions are no larger than the screen :)
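For the "reference table" question above, a rough rule of thumb: a decoded bitmap costs about width x height x 4 bytes, so that 8000x8000 source is ~244 MiB before any rescaling. A minimal app-side sketch of resizing once at decode time (the names are placeholders, and this is standard BitmapFactory usage rather than anything WebView-specific):

import android.graphics.Bitmap;
import android.graphics.BitmapFactory;

final class ImageScaler {
    // Decode at a power-of-two sample size so the result's longest side
    // ends up near maxDimension instead of the full source resolution.
    static Bitmap decodeScaled(String path, int maxDimension) {
        BitmapFactory.Options bounds = new BitmapFactory.Options();
        bounds.inJustDecodeBounds = true; // read dimensions without decoding pixels
        BitmapFactory.decodeFile(path, bounds);

        int sample = 1;
        while (bounds.outWidth / (sample * 2) >= maxDimension
                || bounds.outHeight / (sample * 2) >= maxDimension) {
            sample *= 2;
        }

        BitmapFactory.Options opts = new BitmapFactory.Options();
        opts.inSampleSize = sample; // e.g. 8 cuts an 8000x8000 decode by 64x
        return BitmapFactory.decodeFile(path, opts);
    }
}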
Dec 20
We fix the WebView at 100% x 90% (https://github.com/ankidroid/Anki-Android/blob/master/AnkiDroid/src/main/assets/flashcard.css#L62), but we happily let the user use their camera to take a photo and include it on a card that will display on the tiny screen, with no source resize whatsoever (another enhancement request in our system, of course - one which may get priority if this is a crash source!). Users may put in their own CSS if they like, unfortunately, so this is unconstrained; but since I see the crash on so many different devices, I think it's probably not CSS overrides, as that's pretty advanced and I don't believe it's done frequently - though I just added that to our list of things to tag in analytics, to see if that guess is supported by data. Hmm. All in all I don't believe this has helped you + Chromium, but I've got a lot to think about and little things to clean up on our side, which should in the end result in happier users. Thanks again.
Dec 20
I mean the dimensions of the actual WebView object itself, in your Android layout, not anything inside the HTML content. But yes, if you're resizing all images down to the screen size with CSS, then people won't notice when they are using very large images, and if the images are big enough, then they're going to eat a *lot* of RAM in the course of being decoded and rescaled to fit the screen. There's no need for users to be overriding the CSS to get this problem: your default CSS is doing it already.
Dec 20
Ah, I see. For the first time in this conversation, I don't think this will generate a bug (or bug annotation) on my side - I looked through our layout XML, and while I will double-check, none of the WebViews appears to be going into anything that isn't either fill_parent, or match_parent chaining up to fill_parent. The resize we do via CSS is nice for hiding the underlying potentially-gigantic source image issue, of course.
Dec 20
Just in case this goes somewhere in the future / for posterity: I examined our WebView usage, and we are actually re-using the WebView. Within the same Activity (i.e., one review session in one deck - possibly a hundred or more question/answer card flips) we just call WebView.loadDataWithBaseURL over and over again. As for the eventual WebView.destroy(), it is only done when the Activity goes down in onDestroy(), but that mirrors the fact that the WebView is only created as part of control flow from Activity.onCreate(). During the destroy we first call its View parent's .removeAllViews() method, then we null some things in the WebView before calling .destroy() itself. So that is actually more efficient than I thought - but if repeated loads of large images scaled to the screen could crash the WebView, that's our pattern: https://github.com/ankidroid/Anki-Android/issues/5175#issuecomment-449147746 I'll try to re-prioritize some auto-scaling-by-default for images as they are captured, and some analytics to get data on image size.
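In sketch form, this is the reuse pattern (the base URL and class names here are illustrative, not our exact values):

import android.webkit.WebView;

// Minimal sketch: one long-lived WebView, one loadDataWithBaseURL call
// per card flip; each load discards the previous page's contents.
class CardRenderer {
    private final WebView mWebView;

    CardRenderer(WebView webView) {
        mWebView = webView;
    }

    void showCard(String cardHtml) {
        mWebView.loadDataWithBaseURL(
                "file:///android_asset/", // base for relative media/CSS refs
                cardHtml,                 // the generated card HTML
                "text/html",
                "utf-8",
                null);                    // no history URL; entries stay small
    }
}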
Dec 20
OK, that doesn't sound like you have an issue there then - I didn't look through your code in great detail, so if I misparsed it that's not surprising ;) Every time you load a new card with loadDataWithBaseURL, the previous page contents inside the WebView get thrown away, so in theory it shouldn't matter if you show a long sequence of different images - it should only be the "worst case" single image that is a problem. The repeated nature is only an issue if there's something accidentally keeping old ones around, and it doesn't sound like this is likely in your case (unless there's actually a bug in our code).

If you want to experiment a bit to see if you can reproduce this (though this may not really prove anything conclusively), you might try making a couple of decks where each one contains a large number of images that are roughly the same size, but for a few different sizes (i.e. a deck of images that are, say, 4x the screen size, another deck that are 16x the screen size, etc.), and then seeing if you can provoke a crash by flipping through one of them:

1) If it happens consistently, in a reasonably small number of flips, for a particular size of image, but it doesn't crash at all for smaller images even after lots of flips, then I would guess that certain single images are just so big that they blow through too much memory on their own, and there's probably not a lot we can do about this other than saying "try not to use really massive images" (although it's not necessarily impossible to handle it more gracefully).

2) If you can get it to happen with several different sizes but it takes flipping through quite a lot before it happens (and especially if it happens *sooner* with bigger images than smaller ones), then it's more likely to be something that is accumulating over time, which is much more likely to be an actual bug that we could fix (or help you to fix, if it's something in your app that we've overlooked in this discussion so far).

In either case, if you do actually take the time to do this and you *can* get it to crash at all, give us a repro case and we can double-check exactly what happens.

Thanks for being willing to discuss this - it's really helpful when we get a chance to investigate a problem with a developer, even if we can't ultimately come to a good resolution, because it gives us an idea of the kinds of problems developers actually run into in the wild, and may give us ideas for future improvements. I'm going to close this bug out for now, as it's not clear that we can do much more for you, but if you do find something, feel free to comment here and I can reopen it (I've added myself to the cc list).
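If it helps, a minimal sketch for generating test images at a given multiple of the screen area (desktop Java, using a nominal 1080x1920 screen; all names and sizes are assumptions - and note decoded memory depends only on pixel dimensions, so even simple generated content should exercise the same memory path):

import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.Random;
import javax.imageio.ImageIO;

public final class TestImageWriter {
    public static void main(String[] args) throws Exception {
        int screenW = 1080, screenH = 1920; // nominal phone screen (assumed)
        int[] areaMultiples = {4, 16};      // larger multiples may need more JVM heap
        Random rng = new Random(42);
        for (int mult : areaMultiples) {
            int scale = (int) Math.ceil(Math.sqrt(mult));
            BufferedImage img = new BufferedImage(
                    screenW * scale, screenH * scale, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = img.createGraphics();
            // Random blobs just make the file nontrivial; memory cost at
            // decode time is width * height * 4 bytes regardless of content.
            for (int i = 0; i < 20000; i++) {
                g.setColor(new Color(rng.nextInt(0x1000000)));
                g.fillOval(rng.nextInt(img.getWidth()), rng.nextInt(img.getHeight()), 40, 40);
            }
            g.dispose();
            ImageIO.write(img, "jpg", new File("test_" + mult + "x.jpg"));
        }
    }
}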
Dec 21
I agree there's nothing that seems actionable here, so closing is the right course. The discussion was very productive for me at least - I've got some ideas in mind that should help our users going forward, and it was useful to audit that area of our code. If I can get a repro, I'll definitely chime in again. Thanks for taking the time.