
Issue 803621

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: 2018-02-26
OS: Windows
Pri: 1
Type: Bug-Regression




crashes are not seen by metrics

Project Member Reported by wfh@chromium.org, Jan 18 2018

Issue description

Chrome Version: 63.0.3239.132 (Official Build) (32-bit) (cohort: Stable Installs Only)
OS: Windows 7 build 7601 service pack 1 32-bit

What steps will reproduce the problem?
(1) crash renderer using chrome://crash or chrome://memory-exhaust
(2) check for evidence that this was caught/logged correctly in chrome://crashes, chrome://histograms and chrome://local-state

What is the expected result?

1. A crash is reported in chrome://crashes and it gets bucketed into "[Out of Memory] content::ExhaustMemory" e.g. c55eb75c0f595905
2. CrashExitCodes.Renderer gets entries for each crash, the OOM type for chrome://memory-exhaust (536870904) and the exception type for chrome://crash (-1073741819)
3. renderer_crash_count is incremented in the system profile cache in chrome://local-state
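
The two exit codes above are easier to recognize once written as 32-bit Windows status values. A minimal sketch of that conversion follows. The helper name `to_ntstatus_hex` is made up for illustration. Reading 536870904 as the absolute value of a negative exit code is an assumption about how the histogram stores its samples, not something stated in this report.

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Format a signed 32-bit process exit code as an unsigned hex status."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# Exit code quoted above for chrome://crash.
print(to_ntstatus_hex(-1073741819))

# Sample quoted above for chrome://memory-exhaust, negated on the ASSUMPTION
# that the histogram recorded the absolute value of a negative exit code.
print(to_ntstatus_hex(-536870904))
```

The first value comes out as 0xC0000005, the Windows access-violation status (STATUS_ACCESS_VIOLATION). The second comes out as 0xE0000008.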

What happens instead?

1 happens, but 2 and 3 don't happen. This is concerning.


Comment 1 by wfh@chromium.org, Jan 18 2018

We even have tests for this, so I do not know how this could be happening on stable.

https://cs.chromium.org/chromium/src/chrome/browser/metrics/metrics_service_browsertest.cc?l=151

Comment 2 by rkaplow@google.com, Jan 18 2018

Have you tried on other platforms/channels yet?

Comment 3 by wfh@chromium.org, Jan 18 2018

well I couldn't repro this for a bisect so perhaps this is some experiment on stable.

Comment 4 by wfh@chromium.org, Jan 18 2018

and I can't (easily) bisect with matching experiment config from the machine(s) I can repro on, because of issue 694675

Comment 5 by wfh@chromium.org, Jan 18 2018

Summary: crashes are not seen by metrics (was: crashes are not seen by UMA)
For future reference and for when I can come back to this bug - the experiments from the Chrome Stable I can repro the bug on are:

c134752e-b8b72c88
3095aa95-3f4a17df
6c43306f-ca7d8d80
47e5d3db-3d47f4f4
1210a805-ecd831c
b1edbc38-cf4f6ead
ba3f87da-45bda656
776de70c-eadfd437
79616653-3f4a17df
9e201a2b-6e3ce1c
68812885-4d2fac87
5e3a236d-4113a79e
f347910c-3d47f4f4
4b61504a-d25ea691
9773d3bd-f23d1dea
8e3b2dc5-93702590
9e5c75f1-ffd2375f
f79cb77b-3d47f4f4
4ea303a6-49c9e003
d92562a9-ca7d8d80
90bcbadc-3f4a17df
447469ba-13d9f35f
7aa46da5-c946b150
25fc488a-4d2fac87
58a025e3-c2b41702
1bced4a3-90fa85cd
b2f0086-93053e47
ef25c1eb-3f4a17df
494d8760-6843eff2
f47ae82a-86f22ee5
3ac60855-486e2a9c
f296190c-a90023b1
4442aae2-6e597ede
ed1d377-e1cc0f14
75f0f0a0-6bdfffe7
e2b18481-bd104136
e7e71889-4ad60575
94e68624-803f8fc4
f141d4bc-28ad44a
e9ce63c1-36ab09a2
da4aaa01-ca7d8d80

Comment 6 by rkaplow@google.com, Jan 18 2018

I believe gayane is working on 694675 right now actually. I don't believe it's finished so you can't use it yet unfortunately.

Comment 7 by wfh@chromium.org, Jan 19 2018

Cc: mark@chromium.org pam@chromium.org scottmg@chromium.org
I can also repro on a win10 machine from a fresh stable (64-bit) install. I cannot repro with any bisect, and obviously the tests are passing, so perhaps this is something to do with being on a real channel. This does still seem quite concerning to me.

Comment 8 by wfh@chromium.org, Jan 19 2018

Debugging the running process, content::RenderProcessHostImpl::ProcessDied is being called correctly, but ChromeStabilityMetricsProvider::Observe is not being called, almost as if it is not registered as an observer. Still trying to work out why this would be the case.

Comment 9 by wfh@chromium.org, Jan 19 2018

metrics is being stopped here:

0:000> kn
 # ChildEBP RetAddr  
00 04daed18 6a89da90 chrome_69190000!ChromeStabilityMetricsProvider::OnRecordingDisabled [c:\src\gclient\src\chrome\browser\metrics\chrome_stability_metrics_provider.cc @ 66]
01 04daed28 6a896f58 chrome_69190000!metrics::DelegatingProvider::OnRecordingDisabled+0x16 [c:\src\gclient\src\components\metrics\delegating_provider.cc @ 50]
02 04daedf0 6b1657b2 chrome_69190000!metrics::MetricsService::DisableRecording+0x88 [c:\src\gclient\src\components\metrics\metrics_service.cc @ 336]
03 04daeec0 6b165886 chrome_69190000!metrics_services_manager::MetricsServicesManager::UpdateRunningServices+0xca [c:\src\gclient\src\components\metrics_services_manager\metrics_services_manager.cc @ 129]
04 04daef8c 6b165959 chrome_69190000!metrics_services_manager::MetricsServicesManager::UpdatePermissions+0x96 [c:\src\gclient\src\components\metrics_services_manager\metrics_services_manager.cc @ 105]
05 04daefac 69cbda7d chrome_69190000!metrics_services_manager::MetricsServicesManager::UpdateUploadPermissions+0x45 [c:\src\gclient\src\components\metrics_services_manager\metrics_services_manager.cc @ 166]
06 04daf018 69cbf981 chrome_69190000!ChromeBrowserMainParts::StartMetricsRecording+0xb5 [c:\src\gclient\src\chrome\browser\chrome_browser_main.cc @ 759]
07 04daf190 69cbf82f chrome_69190000!ChromeBrowserMainParts::PreMainMessageLoopRunImpl+0x93 [c:\src\gclient\src\chrome\browser\chrome_browser_main.cc @ 1411]
08 04daf1d8 695bf9bc chrome_69190000!ChromeBrowserMainParts::PreMainMessageLoopRun+0xad [c:\src\gclient\src\chrome\browser\chrome_browser_main.cc @ 1218]
09 04daf220 698e05c4 chrome_69190000!content::BrowserMainLoop::PreMainMessageLoopRun+0x44 [c:\src\gclient\src\content\browser\browser_main_loop.cc @ 1182]
0a (Inline) -------- chrome_69190000!base::RepeatingCallback<int ()>::Run+0xb [c:\src\gclient\src\base\callback.h @ 94]
0b 04daf238 695be48c chrome_69190000!content::StartupTaskRunner::RunAllTasksNow+0x1e [c:\src\gclient\src\content\browser\startup_task_runner.cc @ 42]
0c 04daf340 695c24a8 chrome_69190000!content::BrowserMainLoop::CreateStartupTasks+0x292 [c:\src\gclient\src\content\browser\browser_main_loop.cc @ 968]
0d 04daf3b4 695bccb6 chrome_69190000!content::BrowserMainRunnerImpl::Initialize+0x210 [c:\src\gclient\src\content\browser\browser_main_runner.cc @ 117]
0e 04daf3fc 69c28a2e chrome_69190000!content::BrowserMain+0x8a [c:\src\gclient\src\content\browser\browser_main.cc @ 42]
0f 04daf4cc 69c28f6a chrome_69190000!content::RunNamedProcessTypeMain+0xee [c:\src\gclient\src\content\app\content_main_runner.cc @ 426]
10 04daf5c8 69c40785 chrome_69190000!content::ContentMainRunnerImpl::Run+0x118 [c:\src\gclient\src\content\app\content_main_runner.cc @ 720]
11 04daf6d8 69c28917 chrome_69190000!service_manager::Main+0x2a5 [c:\src\gclient\src\services\service_manager\embedder\main.cc @ 456]
12 04daf718 6919119e chrome_69190000!content::ContentMain+0x33 [c:\src\gclient\src\content\app\content_main.cc @ 19]
13 04daf788 008c59aa chrome_69190000!ChromeMain+0x122 [c:\src\gclient\src\chrome\app\chrome_main.cc @ 131]
14 04daf814 008c1551 chrome!MainDllLoader::Launch+0x230 [c:\src\gclient\src\chrome\app\main_dll_loader_win.cc @ 199]
15 04daf98c 009a5dd8 chrome!wWinMain+0x551 [c:\src\gclient\src\chrome\app\chrome_exe_main_win.cc @ 231]
16 (Inline) -------- chrome!invoke_main+0x1a [f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl @ 118]
17 04daf9d8 778a8654 chrome!__scrt_common_main_seh+0xf6 [f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl @ 283]
WARNING: Stack unwind information not available. Following frames may be wrong.
18 04daf9ec 779d4a77 KERNEL32!BaseThreadInitThunk+0x24
19 04dafa34 779d4a47 ntdll!RtlGetAppContainerNamedObjectPath+0x137
1a 04dafa44 00000000 ntdll!RtlGetAppContainerNamedObjectPath+0x107

void MetricsServicesManager::UpdateRunningServices() {
  DCHECK(thread_checker_.CalledOnValidThread());
  metrics::MetricsService* metrics = GetMetricsService();

  const base::CommandLine* cmdline = base::CommandLine::ForCurrentProcess();
  if (cmdline->HasSwitch(metrics::switches::kMetricsRecordingOnly)) {
    metrics->StartRecordingForTests();
    GetRapporServiceImpl()->Update(true, false);
    return;
  }

  client_->UpdateRunningServices(may_record_, may_upload_);

  if (may_record_) {
    if (!metrics->recording_active())
      metrics->Start();
    if (may_upload_)
      metrics->EnableReporting();
    else
      metrics->DisableReporting();
  } else {
    metrics->Stop();  // <- HERE
  }

  UpdateUkmService();

  GetRapporServiceImpl()->Update(may_record_, may_upload_);
}

For some reason metrics thinks it's disabled, but settings shows that it's not; see the screenshot.
Attachment: metrics_enabled.png (3.2 KB)
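
To make the branch in the quoted function easier to follow, here is a small Python model of only its may_record_/may_upload_ decision. It is a sketch, not the real code: the function name `planned_actions` is hypothetical, and the command-line switch check, the client_ call, UKM and Rappor are deliberately left out.

```python
def planned_actions(may_record, may_upload, recording_active=False):
    """Return the MetricsService calls the quoted branch would make, in order."""
    actions = []
    if may_record:
        if not recording_active:
            actions.append("Start")
        actions.append("EnableReporting" if may_upload else "DisableReporting")
    else:
        # The branch marked HERE in the quoted C++.
        actions.append("Stop")
    return actions


for may_record in (True, False):
    for may_upload in (True, False):
        print(may_record, may_upload, planned_actions(may_record, may_upload))
```

In this model Stop is reached only when may_record is false, whatever the value of may_upload, which is the path the stack trace shows.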

Comment 10 by wfh@chromium.org, Jan 19 2018

Cc: asvitk...@chromium.org
turning reporting off, restarting, turning it back on, restarting, and then performing the steps in #0 results in the metrics being recorded again.

Comment 11

Is it possible your client is being sampled out by the UMA opt-out sampling? We only receive data from 10% of users on Windows by design. go/uma-opt-out-faq

Comment 12 by wfh@chromium.org, Jan 19 2018

re: #11 would this correspond to having running experiment:

MetricsAndCrashSampling-OutOfReportingSample

Comment 13

I think toggling UMA reporting state will reset your UMA client id. The UMA client id is used in Finch experiment randomization. So toggling the state as you've done in comment 10 will re-roll Finch experiments and possibly put you in a different "being sampled" state.

Presumably if you do it 100 times, the expected value is 10 of those times would be reported while 90 wouldn't be.
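
A rough illustration of why a new client id can change the sampling outcome. This is a toy model only: hashing the id with SHA-256 and comparing against a 10% threshold is an assumption made for the sketch, not how Finch actually randomizes, and the names `in_reporting_sample` and `count_in_sample` are hypothetical.

```python
import hashlib
import uuid

SAMPLE_RATE = 0.10  # "10% of users on Windows", per the sampling comment


def in_reporting_sample(client_id, rate=SAMPLE_RATE):
    """Toy bucketing: map the id to a 64-bit value and keep the lowest `rate`."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") < rate * 2**64


def count_in_sample(client_ids, rate=SAMPLE_RATE):
    """Count how many of the given ids fall inside the reporting sample."""
    return sum(in_reporting_sample(c, rate) for c in client_ids)


# Each reset of the reporting state is modelled as a brand new random id.
fresh_ids = [str(uuid.uuid4()) for _ in range(100)]
print(count_in_sample(fresh_ids), "of 100 fresh client ids landed in the sample")
```

The same id always gets the same answer, so only a reset of the id re-rolls the outcome; over many resets the count hovers around 10 per 100.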

Comment 14

re: #12, yeah - I just added a Q about it to go/uma-opt-out-faq

Comment 15 by wfh@chromium.org, Jan 19 2018

when trying to do a bisect I went through each of the field trial configs from #5 that I considered might have an effect on this bug (based on guesswork, and not wanting to have to generate command line for every experiment), and came up with:

LoadingWithMojo-Enabled_Launch
MetricsAndCrashSampling-OutOfReportingSample
ResourceLoadScheduler-Default
SignInProcessIsolation-Enabled_100_20180103

(Given issue 694675 is still being worked on) I manually went through the experiment configs and came up with the command line switches to replicate this environment, which came up as:

--enable-features=LoadingWithMojo,sign-in-process-isolation --disable-features=MetricsReporting,ResourceLoadScheduler

I was surprised that even with that command line, I was still unable to reproduce this...? Does "--disable-features=MetricsReporting" not equate to being in the MetricsAndCrashSampling-OutOfReportingSample experiment group?
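
For anyone repeating this, the switch string above can be assembled mechanically once the feature names are known. A minimal sketch: the helper `feature_switches` is hypothetical, and the mapping from experiment groups to feature names is copied from this comment rather than derived from the experiment configs.

```python
def feature_switches(enabled, disabled):
    """Build --enable-features / --disable-features switches from name lists."""
    switches = []
    if enabled:
        switches.append("--enable-features=" + ",".join(enabled))
    if disabled:
        switches.append("--disable-features=" + ",".join(disabled))
    return switches


# Feature names exactly as given in the command line above.
switches = feature_switches(
    enabled=["LoadingWithMojo", "sign-in-process-isolation"],
    disabled=["MetricsReporting", "ResourceLoadScheduler"],
)
print(" ".join(switches))
```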

Comment 16

Hmm. I would expect --disable-features=MetricsReporting to be equivalent to MetricsAndCrashSampling-OutOfReportingSample.

Can you confirm that it still works when you run with that switch (in your current install, where it's now working)?

Comment 17 by wfh@chromium.org, Jan 19 2018

hmm I was doing --disable-features=MetricsReporting on a bisect command line:

python bisect_builds.py -a win64 -r -g 63.0.3239.0 -b 63.0.3239.132 --verify-range -- --enable-features=LoadingWithMojo,sign-in-process-isolation --disable-features=MetricsReporting,ResourceLoadScheduler

For both ends of the range I saw that CrashExitCodes.Renderer was being incremented when I visited chrome://crash. Perhaps metrics reporting is force-enabled for developer/unknown builds?

I can try the switch on a real stable build, but I've already reverted my VM so I'll have to keep rolling the die until I get into the 10% group again :)
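
As a back-of-the-envelope estimate of how long "rolling the die" takes, the sketch below treats every reset as an independent draw with a 10% chance of landing in the reporting sample. That independence is an assumption, and the function names are made up for illustration.

```python
P_IN_SAMPLE = 0.10  # 10% Windows sampling rate mentioned earlier in the thread


def expected_resets(p=P_IN_SAMPLE):
    """Mean number of resets until the first in-sample result (geometric)."""
    return 1.0 / p


def chance_within(n, p=P_IN_SAMPLE):
    """Probability of at least one in-sample result within n resets."""
    return 1.0 - (1.0 - p) ** n


print("expected resets:", expected_resets())
for n in (1, 10, 22, 44):
    print(n, "resets ->", round(chance_within(n), 3))
```

Under that assumption the average is 10 resets, though any single run can take noticeably more or fewer.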

Comment 18 by wfh@chromium.org, Jan 19 2018

okay I rolled a '1' and was opted into metrics reporting which I verified by seeing 5e3a236d-59e286d0 in chrome://version. In this configuration I see CrashExitCodes.Renderer correctly incremented upon a crash.

I then add --disable-features=MetricsReporting to command line and then the behavior reverts to not recording metrics as you predicted it would.

So, I can't explain why running builds from a bisect with --disable-features=MetricsReporting shows metrics for crashes (in #17 and #7).

Comment 19 by wfh@chromium.org, Feb 22 2018

Cc: brucedaw...@chromium.org
Thinking about this more, I think we should always connect the Chrome stability metrics logger to the render process host, because it's often really useful when diagnosing local issues to see these histograms and the entries in the system profile. We can log them and simply not upload them to UMA.

Can someone in metrics do this, or shall I land a cl?

Comment 20 by wfh@chromium.org, Feb 22 2018

NextAction: 2018-02-26

Comment 21

The NextAction date has arrived: 2018-02-26

Comment 22

Owner: wfh@chromium.org
Status: Assigned (was: Untriaged)
Will, I think you're the most likely person to pick this up, assuming you're still interested in it.

Comment 23 by bugdroid1@chromium.org, Mar 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a613a90c3a45fa037344903d0c00f5b43add78c2

commit a613a90c3a45fa037344903d0c00f5b43add78c2
Author: Will Harris <wfh@chromium.org>
Date: Mon Mar 05 00:04:04 2018

Only stop recording Chrome stability metrics during destruction.

BUG= 803621 

Change-Id: I2a028ce864808a88f71c0b0a0e0d3f493e0f1f77
Reviewed-on: https://chromium-review.googlesource.com/940616
Reviewed-by: Steven Holte <holte@chromium.org>
Commit-Queue: Will Harris <wfh@chromium.org>
Cr-Commit-Position: refs/heads/master@{#540773}
[modify] https://crrev.com/a613a90c3a45fa037344903d0c00f5b43add78c2/chrome/browser/metrics/chrome_stability_metrics_provider.cc
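
The effect of the change can be pictured with a toy observer model. This is a sketch of the behavior described in this thread, not the contents of the CL: `ProcessNotifier`, `StabilityProvider` and `crashes_seen` are hypothetical names, and the real registration points may differ.

```python
class ProcessNotifier:
    """Hypothetical stand-in for the browser's process-death notifications."""

    def __init__(self):
        self._observers = []

    def add(self, observer):
        if observer not in self._observers:
            self._observers.append(observer)

    def remove(self, observer):
        if observer in self._observers:
            self._observers.remove(observer)

    def renderer_crashed(self, exit_code):
        for observer in list(self._observers):
            observer.observe(exit_code)


class StabilityProvider:
    """Toy provider; `unregister_on_disable` selects old vs. new behavior."""

    def __init__(self, notifier, unregister_on_disable):
        self.notifier = notifier
        self.unregister_on_disable = unregister_on_disable
        self.crash_exit_codes = []
        notifier.add(self)

    def on_recording_disabled(self):
        if self.unregister_on_disable:
            self.notifier.remove(self)  # old: stop listening when disabled

    def observe(self, exit_code):
        self.crash_exit_codes.append(exit_code)

    def close(self):
        self.notifier.remove(self)  # new: stop listening only at teardown


def crashes_seen(unregister_on_disable):
    """Disable recording, crash one renderer, report how many were observed."""
    notifier = ProcessNotifier()
    provider = StabilityProvider(notifier, unregister_on_disable)
    provider.on_recording_disabled()
    notifier.renderer_crashed(-1073741819)
    seen = len(provider.crash_exit_codes)
    provider.close()
    return seen


print("unregister on disable:", crashes_seen(True))
print("unregister only at teardown:", crashes_seen(False))
```

With unregistration tied to teardown, a sampled-out client still records the crash locally, which matches the verification in the final comment.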

Comment 24 by wfh@chromium.org, Mar 6 2018

Cc: wfh@chromium.org
Issue 697461 has been merged into this issue.

Comment 25 by wfh@chromium.org, May 31 2018

Status: Verified (was: Assigned)
verified fixed on m67 stable (67.0.3396.62 (Official Build) (64-bit) (cohort: 67_win_62)) by checking client is in MetricsAndCrashSampling/OutOfReportingSample group and still seeing entry in CrashExitCodes.Renderer after visiting chrome://crash
