Issue 801771

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 3
Type: Feature

Blocked on:
issue 678288
issue 727984




Safe Mode for WebView Variations (a.k.a. Finch)

Project Member Reported by isherman@chromium.org, Jan 13 2018

Issue description

In bug 727984, I am implementing Safe Mode for Chrome Variations/Finch. The intention is to provide an automated recovery fallback in case of a bad configuration push.

In https://crrev.com/c/830661, I am stubbing out this implementation for WebView for now. We'll need a bit of WebView-custom logic if we wish to support a Safe Mode fallback for WebView as well.
 
Cc: paulmiller@chromium.org boliu@chromium.org

Comment 2 by boliu@chromium.org, Jan 16 2018

Cc: torne@chromium.org tobiasjs@chromium.org
Context: https://chromium-review.googlesource.com/c/chromium/src/+/830661#message-0fbaaba7b965e0fd33876dee1f051bbd9c1d2b1b

iiuc, this needs a way to detect, early in start up, whether the previous launch ended with a browser crash (?)

I can't think of a reliable way to do that for webview, but maybe others who work in crash reporting do?

Comment 3 by torne@chromium.org, Jan 16 2018

We've discussed this a bit before with regard to the UMA crash stats and didn't come up with any realistic way to do this.

Clank persists its state when it goes into the background and marks itself "not crashed" at that point, then resets itself to "crashed" when it returns to the foreground. We don't have dependable signals for these events in WebView, so there's no point at which we can be reasonably confident that a subsequent process death is a crash as opposed to just the Android system reclaiming memory "as normal".
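
For reference, the Clank scheme described above can be sketched as a persisted flag. This is only a minimal illustration, not Chrome's actual code: the class name, the JSON file and the method names are all hypothetical. The WebView problem is that nothing dependable plays the role of the on_foreground/on_background calls.

```python
import json
import os
import tempfile


class CleanExitBeacon:
    """Hypothetical persisted flag: True means the last session ended cleanly."""

    def __init__(self, path):
        self._path = path

    def _write(self, exited_cleanly):
        with open(self._path, "w") as f:
            json.dump({"exited_cleanly": exited_cleanly}, f)

    def previous_session_crashed(self):
        """Call once at startup, before on_foreground()."""
        if not os.path.exists(self._path):
            return False  # First run: there is no previous session.
        with open(self._path) as f:
            return not json.load(f)["exited_cleanly"]

    def on_foreground(self):
        # From here on, any process death is counted as a crash.
        self._write(False)

    def on_background(self):
        # State is saved; the OS may now kill the process "as normal".
        self._write(True)


# Simulate a process that dies while in the foreground.
path = os.path.join(tempfile.mkdtemp(), "beacon.json")
CleanExitBeacon(path).on_foreground()
print(CleanExitBeacon(path).previous_session_crashed())  # prints True
```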
Blockedon: 678288
Cc: isherman@chromium.org
Labels: Pri-3
Owner: paulmiller@chromium.org
Status: Assigned (was: Untriaged)

We might not be able to detect when a Finch config causes crashes, but we could detect when a Finch config breaks downloading Finch seeds. Would that still be worth it?

Comment 5 by boliu@chromium.org, Jan 16 2018

> we could detect when a Finch config breaks downloading Finch seeds

hmm, how do you do that?

We could remember the config under which we last successfully downloaded a new config. After some time/attempts failing to download a newer config, we could revert to the known-good one.
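
A rough sketch of that bookkeeping, under stated assumptions: the class and method names are hypothetical, seeds are stand-in strings, and only the "attempts" trigger is modelled (a time-based trigger would need a clock and is left out). It is not the actual WebView seed loader.

```python
class KnownGoodSeedTracker:
    """Hypothetical: remembers the seed under which a fetch last succeeded."""

    def __init__(self, initial_seed, max_failed_fetches):
        self.active_seed = initial_seed
        self.known_good_seed = None  # Nothing proven good yet.
        self.failed_fetches = 0
        self.max_failed_fetches = max_failed_fetches

    def on_fetch_succeeded(self, new_seed):
        # The active seed did not prevent fetching, so remember it as the
        # fallback before switching to the newly downloaded seed.
        self.known_good_seed = self.active_seed
        self.active_seed = new_seed
        self.failed_fetches = 0

    def on_fetch_failed(self):
        self.failed_fetches += 1
        if (self.failed_fetches >= self.max_failed_fetches
                and self.known_good_seed is not None):
            # Too many failures in a row: revert to the known-good seed.
            self.active_seed = self.known_good_seed
            self.failed_fetches = 0


tracker = KnownGoodSeedTracker("seed-A", max_failed_fetches=3)
tracker.on_fetch_succeeded("seed-B")   # seed-A becomes the known-good seed
for _ in range(3):
    tracker.on_fetch_failed()          # seed-B never manages to fetch
print(tracker.active_seed)             # prints seed-A
```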

Comment 7 by boliu@chromium.org, Jan 17 2018

there are legitimate reasons why downloading config could fail though, right? no network, android killing the service, just general "unlucky sequence of events" because it takes a few launches to update in webview.

how do you distinguish between legit cases vs cases that are broken due to the current config?

I agree, you can't. But then, Chrome doesn't know whether a crash was caused by a bad config either.

Correct, we can't be sure whether a bad config is to blame.  For Chrome, our (almost, but not-quite-yet implemented) approach is to watch for N consecutive failures.  For a sufficiently large value of N, we assume that the Finch config might be to blame, and roll back to an older, observed-to-be-safe config.

This will definitely have some false positives for any useful value of N that we might choose; the hope is that we can find a value of N that strikes a good balance, i.e. generates relatively few false positives while still providing a meaningful fallback mechanism.
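
To make the "N consecutive failures" idea concrete, here is a minimal sketch that keeps one streak per signal. Everything in it is an assumption for illustration: the class name, the method names and the threshold values are hypothetical, not the Chrome implementation.

```python
class SafeModeStreaks:
    """Hypothetical streak counters; a success resets the matching streak."""

    def __init__(self, crash_threshold, fetch_failure_threshold):
        self.crash_threshold = crash_threshold
        self.fetch_failure_threshold = fetch_failure_threshold
        self.crash_streak = 0
        self.fetch_failure_streak = 0

    def record_crash(self):
        self.crash_streak += 1

    def record_clean_exit(self):
        self.crash_streak = 0

    def record_fetch_failure(self):
        self.fetch_failure_streak += 1

    def record_fetch_success(self):
        self.fetch_failure_streak = 0

    def should_use_safe_seed(self):
        # Either streak reaching its N is treated as "the config might be
        # to blame", which is where the false positives come from.
        return (self.crash_streak >= self.crash_threshold
                or self.fetch_failure_streak >= self.fetch_failure_threshold)


streaks = SafeModeStreaks(crash_threshold=3, fetch_failure_threshold=4)
for _ in range(3):
    streaks.record_crash()
print(streaks.should_use_safe_seed())  # prints True
```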

Comment 10 by boliu@chromium.org, Jan 17 2018

But there are unlikely to be false positives for crashes (I would hope). If we detect a crash, then something bad has very likely happened. Whereas I'd say for most cases, seed fetch failures are "legitimate" failures.

What's N for Chrome?

I think "false positive" in this case means "there was a problem, but it wasn't caused by the config" (which is likely), not "there was a crash, but it's not a problem".

Apologies for the long delay here!  I needed to do some research to provide the value for N, and kind of lost track of the task for a while...

I've recently uploaded a CL that proposes concrete N values for Chrome: https://chromium-review.googlesource.com/c/chromium/src/+/895182.  We're currently discussing, on that CL, how to choose a reasonable value.  TL;DR: for crashes, N is probably 3 or 4; for fetch failures, it's much harder to pin down – maybe as low as 4, or as high as ~25, depending on how low we want the false positive rate to be.
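
One way to see why the fetch-failure N is so sensitive to the target false positive rate is a toy model. It assumes, purely for illustration, that each fetch fails for config-unrelated reasons independently with a fixed probability p, so a false positive needs N such failures in a row and occurs with probability p^N. Real failures are probably correlated (e.g. a device that stays offline), so this is not how the values on the CL were derived; the function names and sample numbers are hypothetical.

```python
def false_positive_rate(p_unrelated_failure, n):
    """Chance of n unrelated failures in a row, assuming independence."""
    return p_unrelated_failure ** n


def smallest_n(p_unrelated_failure, target_rate):
    """Smallest N whose false positive rate is at or below target_rate."""
    if not 0 < p_unrelated_failure < 1 or target_rate <= 0:
        raise ValueError("need 0 < p < 1 and target_rate > 0")
    n = 1
    rate = p_unrelated_failure
    while rate > target_rate:
        rate *= p_unrelated_failure
        n += 1
    return n


# Illustrative inputs only: half of all fetches failing for unrelated reasons.
print(false_positive_rate(0.5, 4))  # prints 0.0625
print(smallest_n(0.5, 0.001))       # prints 10
```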
