New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 733714 link

Starred by 4 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 733323



Sign in to add a comment

Potential memory leak of CSSSelectors.

Project Member Reported by erikc...@chromium.org, Jun 15 2017

Issue description

I've been running Chrome with memory profiling enabled. I started with a gmail tab that I used for ~2 weeks. Then I forced a GC, navigated to about:blank, and then forced another GC. The renderer that was hosting the about:blank frame still has >100 thousand objects created from core/CSS that have not been destroyed. This seems likely to be a CSS-related leak. 

Raw & processed data: https://bugs.chromium.org/p/chromium/issues/detail?id=733323#c2
Methodology: https://docs.google.com/document/d/1fN5balfyrd7sRpd6DRaUI1TwoOwYjLyRSd7mwZT5US8/edit#

These stats come from a renderer that is only hosting about:blank.

Each screenshot shows:
  1) # of objects created [that have not been destroyed]
  2) The stack trace of the code that created the object.

The screenshots from partition-alloc show >100k instances of CSSSelector, allocated from blink::CSSSelectorParser::ConsumeComplexSelectorList(blink::CSSParserTokenRange&).

The screenshots from oilpan also show >100k CSS-related objects that are still alive.

 
Screen Shot 2017-06-15 at 10.29.11 AM.png
154 KB View Download
Screen Shot 2017-06-15 at 10.29.18 AM.png
155 KB View Download
Screen Shot 2017-06-15 at 10.29.24 AM.png
173 KB View Download
Screen Shot 2017-06-15 at 10.29.30 AM.png
148 KB View Download
Screen Shot 2017-06-15 at 10.33.43 AM.png
128 KB View Download
Screen Shot 2017-06-15 at 10.33.50 AM.png
142 KB View Download
Screen Shot 2017-06-15 at 10.33.56 AM.png
157 KB View Download
Summary: Potential memory leak of CSSSelectors. (was: Potential memory leaks in CSSSelectorParser::ConsumeComplexSelectorList.)
Cc: keishi@chromium.org haraken@chromium.org
When I spoke with ikilpatrick, we weren't able to figure out the cause of the leak, since all we have are allocation stack traces. Given that this is potentially a severe issue, what are the appropriate next steps here?

Should we be doing something to better track CSS objects? e.g. should we make a MemoryDumpProvider that can emit # of selectors, etc.?
Blocking: 733323

Comment 4 by r...@opera.com, Jun 15 2017

Did you clear the memory cache? Could it be the cached StyleSheetContents stored with the resources? I don't know if they are disposed once none of the documents reference them.

Hm. How do you clear the "memory cache"? [I'm not sure which cache you're referring to]. I still have the renderer open and untouched since navigation to about:blank.
Cc: hirosh...@chromium.org
hiroshige: Do you have any thoughts on this? Is it possible that the MemoryCache is holding the style sheet resources?

Hm. I closed all tabs other than about:blank and a chrome://tracing tab and took a memory infra trace. Trace:
https://drive.google.com/a/google.com/file/d/0B6ho_7_ut5e1NmtlTXdjZVJpa3M/view?usp=sharing

The relevant process is 10717. According to the tracing process title, it's still a gmail tab, but I can confirm that it's definitely about:blank. It has 80MB in blink_gc, spread across ~88k live objects. It has ~34MB size in partition_alloc. # of allocated objects unclear. web_cache has nothing emitted, so I'm guessing it's empty?
Status: Available (was: Untriaged)
Labels: -Pri-1 Pri-2
Labels: Update-Fortnightly

Comment 11 by r...@opera.com, Jun 16 2017

Just did some simple testing here with a file:/ url loading a google site stylesheet with a StyleSheetContents added to the resource cache. It is garbage collected after a couple of seconds after going to about:blank. I haven't looked at the resource memory cache implementation to see if it would hold on to anything not being referenced by any documents under certain circumstances.

Do you know if we leak StyleSheetContents as well as the selectors, or if we have dangling selector objects?

Hm. I took another trace of about:blank today, and there are still many live CSS objects, but fewer than before. Maybe ~100k total between oilpan and partition alloc.

partition alloc: 18k CSSSelectors, 13k atomicstrings [owned by CSSParserSelector], 12k DescendantInvalidationSet

oilpan: 18k rules, 18k StyleRules, 18k ImmutableStylePropertySet, 12k rules...

re: StyleSheetContents - don't know. It sounds like it might be helpful to add a Blink-related memory dump provider, b/c that will help us determine the type of root object that's leaking?

haraken, meade - what are next steps here?
process_10717_blink_gc-leaks.json
89.4 KB View Download
process_10717_malloc-leaks.json
75.1 KB View Download
process_10717_partition_alloc-leaks.json
79.7 KB View Download
> MemoryCache
MemoryCache doesn't hold strong references to Resources and thus doesn't affect the lifetime of Resources (and thus StyleSheetContents).

If Resources are leaking, then someone else should hold strong references (e.g. ResourceFetcher's preloading control mechanism or someone outside loading code).

So what's the repro step?

1) Crawl many websites.
2) Go to about:blank
3) Force GCs

?

Comment 16 by meade@chromium.org, Jun 20 2017

Hmm, I don't have any particular insights from the style side. It'd be pretty hard to diagnose with only the allocation traces - much more useful would be tools to help find the root object that's leaking, which haraken/the oilpan folks would be in the best position to help.
Cc: mariakho...@chromium.org dskiba@chromium.org ssid@chromium.org
Cc: yuzus@chromium.org
+yuzus. Blink leak detector might be very useful on real sites like gmail.

Repro steps:

Leave open a gmail tab [and use it regularly]. After a week, navigate to about:blank and force a GC, and then notice that there are many leaked CSSSelectors.

To turn on native-heap-profiling, navigate to chrome://flags and set #enable-heap-profiling to Native. Then restart Chrome.

Comment 19 by yuzus@chromium.org, Jun 28 2017

Agreed!
I'm currently implementing a DevTools Protocol method to detect Blink memory leaks and it should be able to catch this kind of leak.
yuzus I have very little context about the leak detector so what I am going to say might make little sense.
would it be possible to expose the data that the leak detector uses via a MemoryDumpProvider (see [1])?
My weak understanding that it is not based on counting actual memory but number of objects. How invasive is that perf-wise and code-wise?
I'd be happy if we could expose even just the number objects in tracing (there is already the possibility to have some non-bytes columns) and create a metrics that alerts on that on the waterfall.


[1] https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/adding_memory_infra_tracing.md
It's technically easy to expose the number of those objects (Nodes, Documents, WorkerGlobalScopes etc) but what do you want to do with it on the waterfall? If we can run a long-running browsing stories on the waterfall (e.g., crawl 100 pages), it would be useful though.

The leak detector is working by comparing the count between before loading the page and after navigating to about:blank.


How many navigations it takes to detect get a leak with good confidence? If it's a matter of tens of pages should be doable. If is hundreds might be a stretch timing-wise for the waterfall capacity.
If we expose this via a MemoryDumpProvider, Etienne is currently working on super-long-running tests [e.g. 1000s of navigations], and any leaks will be very obvious. Let's confirm whether exposing these objects provides utility, and then if it does, figure out a way to build non-noisy waterfall tests?
> How many navigations it takes to detect get a leak with good confidence?

Only one. about:blank => website => about:blank would be enough to detect leaks of the website.

I'm a bit confused. The current plan is to use telemetry to do the navigation for hundreds of websites on the performance waterfall. How is it different from the leak detector with MemoryDumpProvider you're talking about here? It seems like the difference is just whether we use telemetry or MemoryDumpProvider...?


If this information is exposed in a MemoryDumpProvider, it will be automatically available in memory dumps, and could be exposed in memory-UMA. This may help with debugging real world examples of leaks.
Ah, that makes sense.

And the short answer to Primino's question is: It's easy to expose the numbers to MemoryDumpProvider.

Comment 27 Deleted

Comment 28 Deleted

Project Member

Comment 29 by ClusterFuzz, Aug 9 2017

Labels: ClusterFuzz-Verified
Status: Verified (was: Available)
ClusterFuzz testcase 5515632539074560 is verified as fixed, so closing issue as verified.

If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.
Labels: ClusterFuzz-Wrong
Status: Available (was: Verified)
Don't see how this is related to clusterfuzz or why the issue was marked as verified
It's not related to the custerfuzz bug.
I incorrectly merged this bug, and unmerged it later.
I bet clusterfuzz is not aware of the unmerge.
Labels: -ClusterFuzz-Wrong
We have made a bunch of changes on ClusterFuzz side, so resetting ClusterFuzz-Wrong label.
Labels: Code-Parser
Labels: ApproachableBug
Labels: -Update-Fortnightly
Labels: -ClusterFuzz-Verified

Sign in to add a comment