
Issue 750696

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug-Regression
Hotlist-MemoryInfra




Discount memory-infra service memory use.

Project Member Reported by majidvp@google.com, Jul 31 2017

Issue description

Seems to be limited to android bots.
 
Project Member

Comment 1 by 42576172...@developer.gserviceaccount.com, Jul 31 2017

All graphs for this bug:
  https://chromeperf.appspot.com/group_report?bug_id=750696

(For debugging:) Original alerts at time of bug-filing:
  https://chromeperf.appspot.com/group_report?sid=8ea8945be330001acf8922704fa563e0a70ac275dce91258b116e9c6cd1b8d14


Bot(s) for this bug's original alert(s):

android-nexus5
android-nexus6
android-nexus7v2
android-webview-nexus5X
Project Member

Comment 3 by 42576172...@developer.gserviceaccount.com, Jul 31 2017

Cc: hjd@google.com
Owner: hjd@google.com

=== Auto-CCing suspected CL author hjd@google.com ===

Hi hjd@google.com, the bisect results pointed to your CL, please take a look at the
results.


=== BISECT JOB RESULTS ===
Perf regression found with culprit

Suspected Commit
  Author : Hector Dearman
  Commit : 3d197c0b3c6b2ff9cbce4990775f7bb120838231
  Date   : Fri Jul 28 11:34:31 2017
  Subject: memory-infra: Start using new RequestOSMemoryDump API

Bisect Details
  Configuration: android_nexus5_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:chrome:all_processes:reported_by_chrome:malloc:effective_size_avg/load_games/load_games_spychase
  Change       : 3.03% | 20707537.3333 -> 21334402.6667

Revision             Result                  N
chromium@490179      20707537 +- 201410      6      good
chromium@490280      20826725 +- 197181      6      good
chromium@490330      20772580 +- 292103      6      good
chromium@490355      20816724 +- 139181      6      good
chromium@490359      20810340 +- 244497      6      good
chromium@490360      20812675 +- 190797      6      good
chromium@490361      21380413 +- 119745      6      bad       <--
chromium@490362      21355447 +- 107814      6      bad
chromium@490368      21309777 +- 157651      6      bad
chromium@490380      21334403 +- 213126      6      bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=load.games.spychase system_health.memory_mobile

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8972540375208027568


For feedback, file a bug with component Speed>Bisection
Project Member

Comment 4 by 42576172...@developer.gserviceaccount.com, Jul 31 2017

Issue 750654 has been merged into this issue.
Owner: hjd@chromium.org
Cc: sande...@chromium.org
Issue 750920 has been merged into this issue.
Cc: erikc...@chromium.org
Issue 750769 has been merged into this issue.

Comment 8 by hjd@chromium.org, Aug 1 2017

Cc: primiano@chromium.org ssid@chromium.org
Status: Started (was: Untriaged)
+primiano@, +ssid@, perezju@

Using the separate OS dumps mechanism, memory-infra seems to have regressed memory by up to 1MB :( I find it a little hard to believe this is accurate; it seems more likely I messed something up during the switch. Taking a look now.

Comment 10 by hjd@chromium.org, Aug 1 2017

Split off regressions which were before my CL:  issue 751052 

=== BISECT JOB RESULTS ===
NO Perf regression found

Bisect Details
  Configuration: android_webview_arm64_aosp_perf_bisect
  Benchmark    : system_health.memory_mobile
  Metric       : memory:webview:all_processes:reported_by_chrome:v8:effective_size_avg/load_tools/load_tools_stackoverflow

Revision             Result                 N
chromium@490076      5980704 +- 959941      21      good
chromium@490135      6015339 +- 861047      21      bad

Please refer to the following doc on diagnosing memory regressions:
  https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md

To Run This Test
  src/tools/perf/run_benchmark -v --browser=android-webview --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=load.tools.stackoverflow system_health.memory_mobile

More information on addressing performance regressions:
  http://g.co/ChromePerformanceRegressions

Debug information about this bisect:
  https://chromeperf.appspot.com/buildbucket_job_status/8972471000360007152


For feedback, file a bug with component Speed>Bisection
Hmm, take a look at the traces; this feels more like we are missing something in the new dumps, or maybe it's just timings.
When you look at the traces, where does the 1MB come from?

Comment 13 by hjd@chromium.org, Aug 1 2017

After getting sidetracked a lot, I ran load:search:amazon with and without my CL with --enable-heap-profiling.

I am deeply suspicious of the following entry which changes:

        Type                                            Total Size    Count
Before: memory_instrumentation::mojom::ClientProcess       2.3KiB        21
After:  memory_instrumentation::mojom::ClientProcess     553.2KiB     5,727

Why do we have 5,706 more instances of memory_instrumentation::mojom::ClientProcess after my CL? :(

This probably explains the how; now to work out the why...

Before: https://screenshot.googleplex.com/M6kDnkfq4ot.png
After: https://screenshot.googleplex.com/oVQxdtb8oeF.png

Before: https://hjd.users.x20web.corp.google.com/www/2017-08-01/load_search_amazon22017-08-01_16-35-17.html 
After: https://hjd.users.x20web.corp.google.com/www/2017-08-01/load_search_amazon22017-08-01_16-25-50.html

Comment 15 by hjd@chromium.org, Aug 3 2017

Actually, misleading heap profile aside, this makes sense:
a test on my Z840 shows that opening a new Chrome and taking a memory dump leads to us sending 7745 memory maps to the browser. Each memory map is a struct like this:
https://cs.chromium.org/chromium/src/services/resource_coordinator/public/interfaces/memory_instrumentation/memory_instrumentation.mojom?l=50

Optimistically that is at least:
uint64 * 9 + uint32 * 1 + string* * 1 = 84 bytes
84 bytes * 7745 = 650,580 bytes ≈ 635.3 KiB

635kb is the right order of magnitude (and this doesn't include the string itself and the overhead from mojo unique_ptrs for each map).

So all our infrastructure works correctly!

We don't take detailed dumps in the wild (right?), so this shouldn't regress for actual users, but it kind of sucks that all our benchmarks are going to be affected by these allocations.
Also note that this puts +/-500KB of noise in browser malloc (since we may or may not see these allocations depending on the ordering of os_dumps vs chrome_dumps).
Ahh right, good point.
So w.r.t. overhead: we have a mechanism to discount it using TraceEventMemoryOverhead, so it won't show up in benchmarks.
About sequencing, I need to understand the problem a bit more. Where does the memory accumulate: client or service side?

Comment 17 by hjd@chromium.org, Aug 3 2017

Mostly in the service; overhead in the clients should have doubled for now, but that should disappear when my CL to remove process_metrics_dump_provider lands.
I see, so I guess it accumulates until all clients reply and the data gets serialized.
I think that if we use TraceEventMemoryOverhead correctly, that should go away.

Issue 750707 has been merged into this issue.
Issue 750815 has been merged into this issue.
Project Member

Comment 22 by 42576172...@developer.gserviceaccount.com, Aug 10 2017

Issue 753955 has been merged into this issue.
Project Member

Comment 23 by 42576172...@developer.gserviceaccount.com, Aug 11 2017

Issue 753957 has been merged into this issue.
hjd: any update on this?

Comment 25 by hjd@chromium.org, Sep 23 2017

Components: Internals>Instrumentation>Memory
Labels: -Performance-Sheriff
Status: Assigned (was: Started)
Summary: Discount memory-infra service memory use. (was: 1%-5.2% regression in system_health.memory_mobile at 490180:490381)
After the dust settles from the remaining memory-infra refactoring, I'll figure out how to add this to our discount logic. I've updated the bug title and removed the sheriff label (since this is a measurement artifact).
 
