Potential memory leak of CSSSelectors. |
||||||||||||||||
Issue descriptionI've been running Chrome with memory profiling enabled. I started with a gmail tab that I used for ~2 weeks. Then I forced a GC, navigated to about:blank, and then forced another GC. The renderer that was hosting the about:blank frame still has >100 thousand objects created from core/CSS that have not been destroyed. This seems likely to be a CSS-related leak. Raw & processed data: https://bugs.chromium.org/p/chromium/issues/detail?id=733323#c2 Methodology: https://docs.google.com/document/d/1fN5balfyrd7sRpd6DRaUI1TwoOwYjLyRSd7mwZT5US8/edit# These stats come from a renderer that is only hosting about:blank. Each screenshot shows: 1) # of objects created [that have not been destroyed] 2) The stack trace of the code that created the object. The screenshots from partition-alloc show >100k instances of CSSSelector, allocated from blink::CSSSelectorParser::ConsumeComplexSelectorList(blink::CSSParserTokenRange&). The screenshots from oilpan also show >100k CSS-related objects that are still alive.
,
Jun 15 2017
When I spoke with ikilpatrick, we weren't able to figure out the cause of the leak, since all we have are allocation stack traces. Given that this is potentially a severe issue, what are the appropriate next steps here? Should we be doing something to better track CSS objects? e.g. should we make a MemoryDumpProvider that can emit # of selectors, etc.?
,
Jun 15 2017
,
Jun 15 2017
Did you clear the memory cache? Could it be the cached StyleSheetContents stored with the resources? I don't know if they are disposed once none of the documents reference them.
,
Jun 15 2017
Hm. How do you clear the "memory cache"? [I'm not sure which cache you're referring to]. I still have the renderer open and untouched since navigation to about:blank.
,
Jun 15 2017
hiroshige: Do you have any thoughts on this? Is it possible that the MemoryCache is holding the style sheet resources?
,
Jun 16 2017
Hm. I closed all tabs other than about:blank and a chrome://tracing tab and took a memory infra trace. Trace: https://drive.google.com/a/google.com/file/d/0B6ho_7_ut5e1NmtlTXdjZVJpa3M/view?usp=sharing The relevant process is 10717. According to the tracing process title, it's still a gmail tab, but I can confirm that it's definitely about:blank. It has 80MB in blink_gc, spread across ~88k live objects. It has ~34MB size in partition_alloc. # of allocated objects unclear. web_cache has nothing emitted, so I'm guessing it's empty?
,
Jun 16 2017
,
Jun 16 2017
,
Jun 16 2017
,
Jun 16 2017
Just did some simple testing here with a file:/ url loading a google site stylesheet with a StyleSheetContents added to the resource cache. It is garbage collected after a couple of seconds after going to about:blank. I haven't looked at the resource memory cache implementation to see if it would hold on to anything not being referenced by any documents under certain circumstances. Do you know if we leak StyleSheetContents as well as the selectors, or if we have dangling selector objects?
,
Jun 16 2017
Hm. I took another trace of about:blank today, and there are still many live CSS objects, but fewer than before. Maybe ~100k total between oilpan and partition alloc. partition alloc: 18k CSSSelectors, 13k atomicstrings [owned by CSSParserSelector], 12k DescendantInvalidationSet oilpan: 18k rules, 18k StyleRules, 18k ImmutableStylePropertySet, 12k rules... re: StyleSheetContents - don't know. It sounds like it might be helpful to add a Blink-related memory dump provider, b/c that will help us determine the type of root object that's leaking? haraken, meade - what are next steps here?
,
Jun 16 2017
> MemoryCache MemoryCache doesn't hold strong references to Resources and thus doesn't affect the lifetime of Resources (and thus StyleSheetContents). If Resources are leaking, then someone else should hold strong references (e.g. ResourceFetcher's preloading control mechanism or someone outside loading code).
,
Jun 17 2017
So what's the repro step? 1) Crawl many websites. 2) Go to about:blank 3) Force GCs ?
,
Jun 19 2017
,
Jun 20 2017
Hmm, I don't have any particular insights from the style side. It'd be pretty hard to diagnose with only the allocation traces - much more useful would be tools to help find the root object that's leaking, which haraken/the oilpan folks would be in the best position to help.
,
Jun 27 2017
,
Jun 27 2017
+yuzus. Blink leak detector might be very useful on real sites like gmail. Repro steps: Leave open a gmail tab [and use it regularly]. After a week, navigate to about:blank and force a GC, and then notice that there are many leaked CSSSelectors. To turn on native-heap-profiling, navigate to chrome://flags and set #enable-heap-profiling to Native. Then restart Chrome.
,
Jun 28 2017
Agreed! I'm currently implementing a DevTools Protocol method to detect Blink memory leaks and it should be able to catch this kind of leak.
,
Jun 28 2017
yuzus I have very little context about the leak detector so what I am going to say might make little sense. would it be possible to expose the data that the leak detector uses via a MemoryDumpProvider (see [1])? My weak understanding that it is not based on counting actual memory but number of objects. How invasive is that perf-wise and code-wise? I'd be happy if we could expose even just the number objects in tracing (there is already the possibility to have some non-bytes columns) and create a metrics that alerts on that on the waterfall. [1] https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/adding_memory_infra_tracing.md
,
Jun 28 2017
It's technically easy to expose the number of those objects (Nodes, Documents, WorkerGlobalScopes etc) but what do you want to do with it on the waterfall? If we can run a long-running browsing stories on the waterfall (e.g., crawl 100 pages), it would be useful though. The leak detector is working by comparing the count between before loading the page and after navigating to about:blank.
,
Jun 28 2017
How many navigations it takes to detect get a leak with good confidence? If it's a matter of tens of pages should be doable. If is hundreds might be a stretch timing-wise for the waterfall capacity.
,
Jun 28 2017
If we expose this via a MemoryDumpProvider, Etienne is currently working on super-long-running tests [e.g. 1000s of navigations], and any leaks will be very obvious. Let's confirm whether exposing these objects provides utility, and then if it does, figure out a way to build non-noisy waterfall tests?
,
Jun 28 2017
> How many navigations it takes to detect get a leak with good confidence? Only one. about:blank => website => about:blank would be enough to detect leaks of the website. I'm a bit confused. The current plan is to use telemetry to do the navigation for hundreds of websites on the performance waterfall. How is it different from the leak detector with MemoryDumpProvider you're talking about here? It seems like the difference is just whether we use telemetry or MemoryDumpProvider...?
,
Jun 28 2017
If this information is exposed in a MemoryDumpProvider, it will be automatically available in memory dumps, and could be exposed in memory-UMA. This may help with debugging real world examples of leaks.
,
Jun 28 2017
Ah, that makes sense. And the short answer to Primino's question is: It's easy to expose the numbers to MemoryDumpProvider.
,
Aug 9 2017
ClusterFuzz testcase 5515632539074560 is verified as fixed, so closing issue as verified. If this is incorrect, please add ClusterFuzz-Wrong label and re-open the issue.
,
Aug 9 2017
Don't see how this is related to clusterfuzz or why the issue was marked as verified
,
Aug 9 2017
It's not related to the custerfuzz bug. I incorrectly merged this bug, and unmerged it later. I bet clusterfuzz is not aware of the unmerge.
,
Sep 18 2017
We have made a bunch of changes on ClusterFuzz side, so resetting ClusterFuzz-Wrong label.
,
Oct 30 2017
,
Nov 27 2017
,
Dec 6 2017
,
Apr 25 2018
|
||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||
Comment 1 by erikc...@chromium.org
, Jun 15 2017