New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 637154 link

Starred by 1 user

Issue metadata

Status: WontFix
Owner:
OOO until Feb 4th
Closed: Aug 2016
Cc:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug



Sign in to add a comment

TERMINATION_STATUS_OOM_PROTECTED not handled by stability metrics provider

Project Member Reported by wfh@chromium.org, Aug 12 2016

Issue description

I noticed when looking at  issue 636541  that TERMINATION_STATUS_OOM_PROTECTED , which is Android only, is not handled at all by the stability_metrics_helper.cc which is used by chrome_stability_metrics_provider.cc.

I have left the behavior the same in https://codereview.chromium.org/2233303002/ but this bug should investigate whether we are missing data for this termination type.

 

Comment 1 by wfh@chromium.org, Aug 12 2016

Cc: wnwen@chromium.org yfried...@chromium.org
amineer@ tells me yfriedman@ or wnwen@ might know the answer here?
Cc: siev...@chromium.org
Owner: wnwen@chromium.org
Status: Assigned (was: Untriaged)
Peter touched this most recently and should be able to speak authoritatively. +sievers as well.

I honestly don't remember how this works but OOM_PROTECTED is how we know that it was a real crash. If it's not OOM_PROTECTED we ignore the renderer crash since it's inactionable as it's likely Android OOM killer killing a background renderer to recoup ram

Comment 3 by wfh@chromium.org, Aug 12 2016

If this is "a real crash" then it sounds like it should be incrementing the crash counts. Does anyone know how many of these are not being uncounted? Perhaps before "fixing" the code, we should add another count to see how many we've been not counting?
yfriedman@, wnwen@ - my assumption with the OOM accounting was that we'd still capture *all* crashes (including system kills), but we'd use histograms to determine which items were system kills vs. which weren't.  The idea here is that we can then track to ensure the system doesn't start killing us at unexpected levels.  This is what we're currently tracking by having Chrome Dash (which includes all "crashes") and UMA side by side on the stability page (check it out, I finally made that update!).

In c#0 it sounds like we're no longer uploading data for system kills at all, which differs from my expectation.  Is my understanding here wrong?
Oh, that's because the renderer crash is ticked elsewhere on Android.

It bubbles up to a higher layer where we know visibility:

https://cs.chromium.org/chromium/src/chrome/android/java/src/org/chromium/chrome/browser/tab/TabWebContentsObserver.java?q=umasessionstats.logrenderercrash&sq=package:chromium&l=128&dr=C

It's effectively WebContentsObserver.renderProcessGone.
It's possibly that this could be cleared up to be more like other platforms but I don't know for certain.

Re: #4:  We do still upload system kills in a different histogram (also in that code I linked)

Comment 6 by wfh@chromium.org, Aug 12 2016

okay thanks! This ends up here - https://cs.chromium.org/chromium/src/chrome/browser/android/metrics/uma_session_stats.cc?sq=package:chromium&l=125&dr=CSs

are you sure we're not double counting some crashes then?
Yep!

Last time we audited this, we weren't double-counting
In which way do you think we may be? Because of multiple observers or because of this seprate path in android vs the shared path? Android has always done rendrer crashes at this higher level so we probably would've seen a regression in metrics if we started double counting.

Comment 8 by wfh@chromium.org, Aug 12 2016

I was really just double checking it sounds like this code has been very well looked at (and I haven't looked at it as much as you have!) so sgtm.

My only thoughts were based on (not knowing Android Chrome at all, so maybe this is all wrong) but the stack for creating ChromeStabilityMetricsProvider looks like this:

ChromeStabilityMetricsProvider::ChromeStabilityMetricsProvider
ChromeMetricsServiceClient::Initialize
ChromeMetricsServiceClient::Create
ChromeMetricsServicesManagerClient::CreateMetricsServiceClient
metrics_services_manager::MetricsServicesManager::GetMetricsServiceClient
metrics_services_manager::MetricsServicesManager::GetMetricsService
ChromeBrowserMainParts::SetupMetricsAndFieldTrials
ChromeBrowserMainParts::PreCreateThreadsImpl
trace_event_internal::ScopedTracer::Initialize
ChromeBrowserMainParts::PreCreateThreads
ChromeBrowserMainPartsWin::PreCreateThreads
content::BrowserMainLoop::PreCreateThreads
base::Callback<int (), base::internal::CopyMode::Copyable>::Run
content::StartupTaskRunner::RunAllTasksNow
content::BrowserMainLoop::CreateStartupTasks
content::BrowserMainRunnerImpl::Initialize
content::BrowserMain
content::RunNamedProcessTypeMain
content::ContentMainRunnerImpl::Run
content::ContentMain
ChromeMain

And I couldn't find any if defined(OS_ANDROID) wrappings around any of this (by my scant looking) and it certainly looks like so it looks like chrome_stability_metrics_provider.cc is linked with the chrome binary on Android, so assuming ChromeStabilityMetricsProvider gets constructed, at least, and it registers for the browser child notifications in the constructor.

So (again, not knowing really how Android Chrome works, sorry I have never even built it...) either 1) Android Chrome doesn't send browser child notifications or sends them in a different way; or 2) I found a MetricsServiceClient implementation in Java Land called "AwBrowserContext.java" so maybe this is being used instead somehow? or 3) Start up in Android is completely different and ChromeBrowserMainParts::SetupMetricsAndFieldTrials is never called.

Either way, it sounds like you would have spotted any double counting so I'm happy to close this bug.

Comment 9 by wnwen@chromium.org, Aug 15 2016

Status: WontFix (was: Assigned)
On android we need to know the java-side state of the activity/application to know whether we should log a crash, and even then we have a separate native-side histogram to help us isolate which of those crashes are system kills and which are actual crashes in our code.

Both cases use TERMINATION_STATUS_OOM_PROTECTED as one of the conditions to logging, and since they log separately and the numbers roughly line up (due to timing and delays in UMA uploading, stability and histograms never completely line up, always a 2-5% difference), likelihood of double counting is very small.

Since you kept the behaviour of not logging in stability_metrics_helper for Android, it should still be the same as before, where we only log in uma_session_stats.cc's LogRendererCrash.

Sign in to add a comment