Safe Mode for WebView Variations (a.k.a. Finch)
Issue description
In bug 727984, I am implementing Safe Mode for Chrome Variations/Finch. The intention is to provide an automated recovery fallback in case of a bad configuration push. In https://crrev.com/c/830661, I am stubbing out this implementation for WebView for now. We'll need a bit of WebView-custom logic if we wish to support a Safe Mode fallback for WebView as well.
,
Jan 16 2018
Context: https://chromium-review.googlesource.com/c/chromium/src/+/830661#message-0fbaaba7b965e0fd33876dee1f051bbd9c1d2b1b
iiuc, this needs a way to detect, early in startup (?), whether the previous launch ended with a browser crash. I can't think of a reliable way to do that for WebView, but maybe others who work on crash reporting do?
,
Jan 16 2018
We've discussed this a bit before in regards to the UMA crash stats and didn't come up with any realistic way to do this. Clank persists its state when it goes into the background and sets itself "not crashed" at that time (and then resets itself to "crashed" when it comes back into the foreground again), but we don't have dependable signals for these events in WebView, so there's no point at which we can be reasonably confident that a subsequent process death is a crash as opposed to just the android system reclaiming memory "as normal".
,
Jan 16 2018
We might not be able to detect when a Finch config causes crashes but we could detect when a Finch config breaks downloading Finch seeds. Would that still be worth it?
,
Jan 16 2018
> we could detect when a Finch config breaks downloading Finch seeds
hmm, how do you do that?
,
Jan 16 2018
We could remember the config under which we last successfully downloaded a new config. After some time/attempts failing to download a newer config, we could revert to the known-good one.
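The idea in this comment could be sketched roughly as below. This is only an illustration, not WebView or Chrome code: the class `SeedFallback`, its fields, and the threshold `max_failed_fetches` are all hypothetical names, and seeds are represented as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeedFallback:
    """Hypothetical tracker: revert to a known-good seed after repeated fetch failures."""

    max_failed_fetches: int  # assumed threshold, not a value from the thread
    active_seed: str
    known_good_seed: Optional[str] = None
    failed_fetches: int = 0

    def on_fetch_success(self, new_seed: str) -> None:
        # The active seed allowed a newer seed to be downloaded,
        # so remember it as known-good before switching.
        self.known_good_seed = self.active_seed
        self.active_seed = new_seed
        self.failed_fetches = 0

    def on_fetch_failure(self) -> None:
        # Count consecutive failures; revert once the threshold is reached.
        self.failed_fetches += 1
        if (self.failed_fetches >= self.max_failed_fetches
                and self.known_good_seed is not None):
            self.active_seed = self.known_good_seed
            self.failed_fetches = 0


tracker = SeedFallback(max_failed_fetches=3, active_seed="seed-A")
tracker.on_fetch_success("seed-B")   # seed-A is now the known-good seed
for _ in range(3):
    tracker.on_fetch_failure()       # three failures in a row under seed-B
print(tracker.active_seed)           # seed-A
```

Note that this sketch cannot tell a failure caused by the seed from one caused by the network or a killed service; it only counts consecutive failures.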
,
Jan 17 2018
there are legitimate reasons why downloading a config could fail though, right? no network, android killing the service, just a general "unlucky sequence of events" because it takes a few launches to update in webview. how do you distinguish legit failures from failures caused by the current config?
,
Jan 17 2018
I agree, you can't. But then, Chrome doesn't know whether a crash was caused by a bad config either.
,
Jan 17 2018
Correct, we can't be sure whether a bad config is to blame. For Chrome, our (almost, but not-quite-yet implemented) approach is to watch for N consecutive failures. For a sufficiently large value of N, we assume that the Finch config might be to blame, and roll back to an older, observed-to-be-safe config. This will definitely have some false positives for any useful value of N that we might choose; the hope is that we can find a value of N that strikes a good balance, i.e. generates relatively few false positives while still providing a meaningful fallback mechanism.
,
Jan 17 2018
But there are unlikely to be false positives for crashes (I would hope). If we detect a crash, then something bad has very likely happened. Whereas I'd say in most cases, seed fetch failures are "legitimate" failures. What's N for Chrome?
,
Jan 17 2018
I think "false positive" in this case means "there was a problem, but it wasn't caused by the config" (which is likely), not "there was a crash, but it's not a problem".
,
Feb 1 2018
Apologies for the long delay here! I needed to do some research to provide the value for N, and kind of lost track of the task for a while... I've recently uploaded a CL that proposes concrete N values for Chrome: https://chromium-review.googlesource.com/c/chromium/src/+/895182. We're currently discussing, on that CL, how to choose a reasonable value. TL;DR: for crashes, N is probably 3 or 4; for fetch failures, it's much harder to pin down – maybe as low as 4, or as high as ~25, depending on how low we want the false positive rate to be.
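To illustrate how the false positive rate trades off against N, here is a rough sketch under a simplifying assumption that is not from this thread: each failure unrelated to the config happens independently with probability `p_fail`, so N such failures in a row occur with probability `p_fail ** N`. The function name and the example probabilities are hypothetical, and this is not how the values on the CL were derived.

```python
def min_consecutive_failures(p_fail: float, target_fp: float) -> int:
    """Smallest N with p_fail ** N <= target_fp, assuming independent failures."""
    if not (0.0 < p_fail < 1.0 and 0.0 < target_fp < 1.0):
        raise ValueError("both probabilities must be strictly between 0 and 1")
    n = 1
    prob = p_fail  # probability of n unrelated failures in a row
    while prob > target_fp:
        n += 1
        prob *= p_fail
    return n


# Made-up inputs: if half of all fetches fail for unrelated reasons,
# four failures in a row happen by chance 6.25% of the time.
print(min_consecutive_failures(0.5, 0.1))  # 4
```

In practice failures are likely correlated (for example, a device that stays offline), so a real choice of N would need measured data rather than this model.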
Comment 1 by isherman@chromium.org
, Jan 13 2018