TERMINATION_STATUS_OOM_PROTECTED not handled by stability metrics provider |
|||
Issue descriptionI noticed when looking at issue 636541 that TERMINATION_STATUS_OOM_PROTECTED , which is Android only, is not handled at all by the stability_metrics_helper.cc which is used by chrome_stability_metrics_provider.cc. I have left the behavior the same in https://codereview.chromium.org/2233303002/ but this bug should investigate whether we are missing data for this termination type.
,
Aug 12 2016
Peter touched this most recently and should be able to speak authoritatively. +sievers as well. I honestly don't remember how this works but OOM_PROTECTED is how we know that it was a real crash. If it's not OOM_PROTECTED we ignore the renderer crash since it's inactionable as it's likely Android OOM killer killing a background renderer to recoup ram
,
Aug 12 2016
If this is "a real crash" then it sounds like it should be incrementing the crash counts. Does anyone know how many of these are not being uncounted? Perhaps before "fixing" the code, we should add another count to see how many we've been not counting?
,
Aug 12 2016
yfriedman@, wnwen@ - my assumption with the OOM accounting was that we'd still capture *all* crashes (including system kills), but we'd use histograms to determine which items were system kills vs. which weren't. The idea here is that we can then track to ensure the system doesn't start killing us at unexpected levels. This is what we're currently tracking by having Chrome Dash (which includes all "crashes") and UMA side by side on the stability page (check it out, I finally made that update!). In c#0 it sounds like we're no longer uploading data for system kills at all, which differs from my expectation. Is my understanding here wrong?
,
Aug 12 2016
Oh, that's because the renderer crash is ticked elsewhere on Android. It bubbles up to a higher layer where we know visibility: https://cs.chromium.org/chromium/src/chrome/android/java/src/org/chromium/chrome/browser/tab/TabWebContentsObserver.java?q=umasessionstats.logrenderercrash&sq=package:chromium&l=128&dr=C It's effectively WebContentsObserver.renderProcessGone. It's possibly that this could be cleared up to be more like other platforms but I don't know for certain. Re: #4: We do still upload system kills in a different histogram (also in that code I linked)
,
Aug 12 2016
okay thanks! This ends up here - https://cs.chromium.org/chromium/src/chrome/browser/android/metrics/uma_session_stats.cc?sq=package:chromium&l=125&dr=CSs are you sure we're not double counting some crashes then?
,
Aug 12 2016
Yep! Last time we audited this, we weren't double-counting In which way do you think we may be? Because of multiple observers or because of this seprate path in android vs the shared path? Android has always done rendrer crashes at this higher level so we probably would've seen a regression in metrics if we started double counting.
,
Aug 12 2016
I was really just double checking it sounds like this code has been very well looked at (and I haven't looked at it as much as you have!) so sgtm. My only thoughts were based on (not knowing Android Chrome at all, so maybe this is all wrong) but the stack for creating ChromeStabilityMetricsProvider looks like this: ChromeStabilityMetricsProvider::ChromeStabilityMetricsProvider ChromeMetricsServiceClient::Initialize ChromeMetricsServiceClient::Create ChromeMetricsServicesManagerClient::CreateMetricsServiceClient metrics_services_manager::MetricsServicesManager::GetMetricsServiceClient metrics_services_manager::MetricsServicesManager::GetMetricsService ChromeBrowserMainParts::SetupMetricsAndFieldTrials ChromeBrowserMainParts::PreCreateThreadsImpl trace_event_internal::ScopedTracer::Initialize ChromeBrowserMainParts::PreCreateThreads ChromeBrowserMainPartsWin::PreCreateThreads content::BrowserMainLoop::PreCreateThreads base::Callback<int (), base::internal::CopyMode::Copyable>::Run content::StartupTaskRunner::RunAllTasksNow content::BrowserMainLoop::CreateStartupTasks content::BrowserMainRunnerImpl::Initialize content::BrowserMain content::RunNamedProcessTypeMain content::ContentMainRunnerImpl::Run content::ContentMain ChromeMain And I couldn't find any if defined(OS_ANDROID) wrappings around any of this (by my scant looking) and it certainly looks like so it looks like chrome_stability_metrics_provider.cc is linked with the chrome binary on Android, so assuming ChromeStabilityMetricsProvider gets constructed, at least, and it registers for the browser child notifications in the constructor. So (again, not knowing really how Android Chrome works, sorry I have never even built it...) either 1) Android Chrome doesn't send browser child notifications or sends them in a different way; or 2) I found a MetricsServiceClient implementation in Java Land called "AwBrowserContext.java" so maybe this is being used instead somehow? or 3) Start up in Android is completely different and ChromeBrowserMainParts::SetupMetricsAndFieldTrials is never called. Either way, it sounds like you would have spotted any double counting so I'm happy to close this bug.
,
Aug 15 2016
On android we need to know the java-side state of the activity/application to know whether we should log a crash, and even then we have a separate native-side histogram to help us isolate which of those crashes are system kills and which are actual crashes in our code. Both cases use TERMINATION_STATUS_OOM_PROTECTED as one of the conditions to logging, and since they log separately and the numbers roughly line up (due to timing and delays in UMA uploading, stability and histograms never completely line up, always a 2-5% difference), likelihood of double counting is very small. Since you kept the behaviour of not logging in stability_metrics_helper for Android, it should still be the same as before, where we only log in uma_session_stats.cc's LogRendererCrash. |
|||
►
Sign in to add a comment |
|||
Comment 1 by wfh@chromium.org
, Aug 12 2016