Issue 732927

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug
Hotlist-MemoryInfra

Blocked on:
issue 733165




"ProcessMemoryMetricsEmitterTest.FetchThreeTimes" is flaky

Project Member Reported by chromium...@appspot.gserviceaccount.com, Jun 13 2017

Issue description

"ProcessMemoryMetricsEmitterTest.FetchThreeTimes" is flaky.

This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label.

We have detected 4 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyOgsSBUZsYWtlIi9Qcm9jZXNzTWVtb3J5TWV0cmljc0VtaXR0ZXJUZXN0LkZldGNoVGhyZWVUaW1lcww.

Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs
 
Owner: erikc...@chromium.org
Hi erikchen@, please help triage this bug.
Cc: erikc...@chromium.org hjd@chromium.org
Components: Internals>Instrumentation>Memory
Labels: OS-Mac
Owner: primiano@chromium.org
Status: Assigned (was: Untriaged)
[38409:771:0613/094402.412234:FATAL:coordinator_impl.cc(250)] Check failed: 0u == pmd->os_dump.resident_set_kb (0 vs. 202924)

This looks like a real implementation bug. Over to primiano.
Cc: primiano@chromium.org
 Issue 732972  has been merged into this issue.
FetchDuringTrace is flaky too. Going to disable both since they are flaking on the try bots.
Project Member

Comment 6 by bugdroid1@chromium.org, Jun 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1a6cf45a50e580ccae7ced17ee73602cbf9fa324

commit 1a6cf45a50e580ccae7ced17ee73602cbf9fa324
Author: Dale Curtis <dalecurtis@chromium.org>
Date: Wed Jun 14 00:07:04 2017

Disable flaky ProcessMemoryMetricsEmitterTest.FetchXXX tests.

BUG= 732927 
TBR=isherman

Change-Id: I0850e40550134b5c834801c807ff6e29c707ac2c
Reviewed-on: https://chromium-review.googlesource.com/534503
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#479216}
[modify] https://crrev.com/1a6cf45a50e580ccae7ced17ee73602cbf9fa324/chrome/browser/metrics/process_memory_metrics_emitter_browsertest.cc

Labels: -Sheriff-Chromium
Status: Started (was: Assigned)
I can't reproduce the flakiness, but I can reproduce an odd situation that is very likely causing it.
I think the problem is that some processes come back with PID 0, and we then hit that DCHECK when they try to overwrite each other's data.
I remember seeing this during my flight, and I can see it again today after adding some logging:

OnServiceStarted content_browser.[7ce95b0d-e23c-40de-b266-010e073af5f3] = 96782
OnServiceStarted preferences.[7ce95b0d-e23c-40de-b266-010e073af5f3] = 0
OnServiceStarted content_renderer.3_1[7ce95b0d-e23c-40de-b266-010e073af5f3] = 96788
OnServiceStarted device.[7ce95b0d-e23c-40de-b266-010e073af5f3] = 0
OnServiceStarted content_gpu.2[505C0EE9-3013-43C0-82B0-A84F50CF8D84] = 96787
OnServiceStarted content_renderer.4_1[7ce95b0d-e23c-40de-b266-010e073af5f3] = 96790

The part after the = is the PID. As you can see, two processes (preferences and device) report PID=0.
I will figure something out.
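To illustrate the suspected failure mode, here is a minimal sketch of a coordinator that keys per-process results by PID alone. It is not the actual coordinator_impl.cc code: `NaiveCoordinator`, `OsDump`, `OnResponse` and `OwnerOf` are hypothetical names, and the RSS values used with it are arbitrary. It only models the idea that a second response carrying an already-seen PID finds a non-empty slot, which is the condition the failing check guards against.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the per-process OS dump. Only the field named in
// the failing check is modelled.
struct OsDump {
  std::uint32_t resident_set_kb = 0;
};

// Hypothetical, simplified coordinator that keys results by PID alone.
class NaiveCoordinator {
 public:
  // Stores one response. Returns false where the real code would trip the
  // "0u == resident_set_kb" check, i.e. when the slot for this PID was
  // already filled by an earlier response.
  bool OnResponse(const std::string& service, int pid, std::uint32_t rss_kb) {
    // A zero RSS would be indistinguishable from an empty slot in this model.
    assert(rss_kb != 0);
    Entry& entry = entries_[pid];
    if (entry.dump.resident_set_kb != 0)
      return false;  // Collision: entry.service already reported this PID.
    entry.service = service;
    entry.dump.resident_set_kb = rss_kb;
    return true;
  }

  // Name of the service that owns the slot for |pid|, or "" if there is none.
  std::string OwnerOf(int pid) const {
    auto it = entries_.find(pid);
    return it == entries_.end() ? std::string() : it->second.service;
  }

 private:
  struct Entry {
    std::string service;
    OsDump dump;
  };
  std::map<int, Entry> entries_;
};
```

Feeding this sketch the services from the log above, "preferences" (PID 0) is stored first and the later "device" response (also PID 0) is rejected, while the services with distinct non-zero PIDs never collide.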
On top of that, because the browser process takes a shortcut (leveraging the knowledge that it is the same process that hosts the service) and doesn't use mojo (I am getting rid of this in [1]), we also record it as PID 0:

[98426:775:0614/102210.278048:ERROR:coordinator_impl.cc(211)] Received response from "content_gpu.2", pid=98431
[98426:775:0614/102210.294322:ERROR:coordinator_impl.cc(211)] Received response from "content_renderer.4_1", pid=98434
[98426:775:0614/102210.294322:ERROR:coordinator_impl.cc(211)] Received response from "content_renderer.3_1", pid=98432
[98426:775:0614/102210.294404:ERROR:coordinator_impl.cc(211)] Received response from ".", pid=0

"." here is our perception of the identity of the browser process (...).

So what is biting us is a combination of Issue 733153 (some PIDs being 0), the "." browser identity (I'm on it), and our code not being robust against zero PIDs.
Here's what I am going to do:
 1. Right now, make the code more robust against null PIDs, add a special hack for ".", and re-enable the tests.
 2. Soon after, go back to [1], make the browser process use the mojo interface like everything else, and get rid of the aforementioned "." hack.

[1] https://chromium-review.googlesource.com/c/533215/
Blockedon: 733165
Project Member

Comment 11 by chromium...@appspot.gserviceaccount.com, Jun 14 2017

Labels: Sheriff-Chromium
Detected 3 new flakes for test/step "ProcessMemoryMetricsEmitterTest.FetchThreeTimes". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyOgsSBUZsYWtlIi9Qcm9jZXNzTWVtb3J5TWV0cmljc0VtaXR0ZXJUZXN0LkZldGNoVGhyZWVUaW1lcww. This message was posted automatically by the chromium-try-flakes app. Since flakiness is ongoing, the issue was moved back into Sheriff Bug Queue (unless already there).
Project Member

Comment 12 by bugdroid1@chromium.org, Jun 15 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7cb90dcac57732686408a127941865df2aa2b577

commit 7cb90dcac57732686408a127941865df2aa2b577
Author: Primiano Tucci <primiano@chromium.org>
Date: Thu Jun 15 08:14:30 2017

Make the memory-infra service robust in presence of null pids.

Two things are happening here:
1) Some services seem to report PID==0 ( crbug.com/733153 )
2) The browser process itself reports an invalid identity AND
   PID == 0 because of  crbug.com/733165  . A proper fix to
   this will come next.

In the meantime this CL makes the service more robust and
drops on the floor dumps received from clients with an invalid
identity. Also re-enables the tests that were disabled
because of this.

BUG= 732927 
TBR=isherman (for re-enabling tests)

Change-Id: Ie8b598946593681ab640a9ede7161c22ff1def88
Reviewed-on: https://chromium-review.googlesource.com/535675
Reviewed-by: Primiano Tucci <primiano@chromium.org>
Reviewed-by: Hector Dearman <hjd@chromium.org>
Commit-Queue: Primiano Tucci <primiano@chromium.org>
Cr-Commit-Position: refs/heads/master@{#479635}
[modify] https://crrev.com/7cb90dcac57732686408a127941865df2aa2b577/chrome/browser/metrics/process_memory_metrics_emitter_browsertest.cc
[modify] https://crrev.com/7cb90dcac57732686408a127941865df2aa2b577/services/resource_coordinator/memory_instrumentation/coordinator_impl.cc
[modify] https://crrev.com/7cb90dcac57732686408a127941865df2aa2b577/services/resource_coordinator/memory_instrumentation/process_map.cc
[modify] https://crrev.com/7cb90dcac57732686408a127941865df2aa2b577/services/resource_coordinator/memory_instrumentation/process_map.h
[modify] https://crrev.com/7cb90dcac57732686408a127941865df2aa2b577/services/resource_coordinator/memory_instrumentation/process_map_unittest.cc
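
The commit message above describes dropping dumps from clients with an invalid identity. A rough sketch of that kind of filtering follows. It is not the landed change: `RobustCoordinator`, `IsValidIdentity` and `kNullPid` are hypothetical names, treating "." and the empty string as invalid identities is an assumption based on the earlier log, and dropping every PID-0 response (rather than handling it some other way) is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Assumption: 0 is the "null" PID, as seen in the logs in this thread.
constexpr int kNullPid = 0;

// Hypothetical stand-in for the per-process OS dump.
struct OsDump {
  std::uint32_t resident_set_kb = 0;
};

// Assumption: an empty name or the "." placeholder counts as invalid.
bool IsValidIdentity(const std::string& identity) {
  return !identity.empty() && identity != ".";
}

// Hypothetical coordinator that filters responses before storing them.
class RobustCoordinator {
 public:
  // Returns true if the response was stored, false if it was dropped.
  bool OnResponse(const std::string& identity, int pid, std::uint32_t rss_kb) {
    if (!IsValidIdentity(identity) || pid == kNullPid) {
      ++dropped_;  // Drop on the floor instead of crashing.
      return false;
    }
    OsDump& dump = dumps_[pid];
    // Mirrors the original check: a valid PID is expected to report once.
    assert(dump.resident_set_kb == 0);
    dump.resident_set_kb = rss_kb;
    return true;
  }

  std::size_t stored() const { return dumps_.size(); }
  std::size_t dropped() const { return dropped_; }

 private:
  std::map<int, OsDump> dumps_;
  std::size_t dropped_ = 0;
};
```

Replaying the four responses logged in the earlier comment (with arbitrary RSS values), the three responses with real PIDs are stored and the "." response with PID 0 is dropped. The original uniqueness check can then stay in place for the PIDs that remain.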

Labels: -Sheriff-Chromium

Comment 14 by hjd@chromium.org, Jun 19 2017

Status: Fixed (was: Started)
Hopefully fixed now that the CL has landed.
