Crazy heap churn in safe browsing
Issue description

I'm working on a CL to expose memory metrics in chrome://profiler. What consistently sticks out as a massive heap churner is the SafeBrowsing database. See attached HTML file. Looks like the task is LocalSafeBrowsingDatabaseManager::DatabaseUpdateFinished. This churns through an average of 1.4M heap allocations and frees per run, which consumes >1.5s per run on my Z620. It also looks like this is dealing with a large number of small allocations, so a large fraction of the memory used is wasted on heap overhead. Something like 25% of the memory allocated appears to be overhead by my reading.
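For illustration only (my assumed numbers, not measurements from the attached profile): with this many tiny blocks, the fixed per-allocation cost of the heap dominates. A minimal sketch, assuming ~48 bytes of useful payload and ~16 bytes of allocator header/rounding per block, reproduces the ~25% overhead figure:

  // Back-of-envelope only; payload and per-block overhead are assumptions.
  // The 1.4M allocation count is taken from the profile above.
  #include <cstddef>
  #include <cstdio>

  int main() {
    const size_t kAllocations = 1400000;  // ~1.4M allocs/frees per update
    const size_t kPayloadBytes = 48;      // assumed useful bytes per allocation
    const size_t kOverheadBytes = 16;     // assumed heap header + rounding per block
    const double total = static_cast<double>(kAllocations) * (kPayloadBytes + kOverheadBytes);
    const double overhead = static_cast<double>(kAllocations) * kOverheadBytes;
    printf("overhead: %.0f%% of %.0f MB total\n",
           100.0 * overhead / total, total / (1 << 20));
    return 0;
  }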
Nov 30 2016
Interesting. Though... that code is all being replaced with new structures in components/safe_browsing_db, launching in M56. We shouldn't spend time optimizing the old code at this point. It may be useful to run the same analysis on the new code.
Nov 30 2016
This is tip-of-tree in my local build. Sadly my instrumentation only goes to the task level, so I can't say where the allocs are actually happening.
Nov 30 2016
My CL is hopefully landing this week (https://codereview.chromium.org/2386123003/), at which point the memory metrics will be available in all dev builds in chrome://profiler. There are also macros in https://cs.chromium.org/chromium/src/base/profiler/scoped_profile.h that allow profiling a specific scope, reported in the same UI.
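For anyone who wants to narrow this down below task granularity, a minimal sketch of how I believe the scoped_profile.h macro is used (the macro name and signature here are from memory, so treat them as an assumption and check the header):

  // Assumed usage of base/profiler/scoped_profile.h -- verify the macro name
  // against the header before relying on this.
  #include "base/profiler/scoped_profile.h"

  void DatabaseUpdateHotPath() {  // hypothetical function to be measured
    // Attributes everything that runs in this scope to its own row in
    // chrome://profiler, next to the per-task rows.
    TRACK_RUN_IN_THIS_SCOPED_REGION(DatabaseUpdateHotPath);

    // ... the work to be measured ...
  }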
Nov 30 2016
We'll look at the new code with //profiler. Thx.
Dec 2 2016
Dec 13 2016
Just took a look at this on my official SyzyASAN canary. Looks like the V4 implementation runs in 134ms on average on a SyzyASAN-instrumented build, and makes ~150 allocs on average. The high-watermark of allocation is larger at 10MB, but overhead is down to well below 1%, from ~25%. Cool beans - the new implementation on a SyzyASAN build runs about 10X faster than the old implementation did on an uninstrumented build. SyzyASAN slows you down about 3X, so that's perhaps on the order of a 30X speedup :).
Dec 13 2016
Excellent! Thanks for the update. I wonder if we're counting the right functions, since 30X seems large. But it should be a lot more streamlined (more large-block allocation and manipulation), with the trade-off of being less compressed in RAM (hence the higher total RAM usage). I'll call this fixed.
Dec 13 2016
I question the "less compressed in RAM" statement. If your allocations are large, then you have less heap overhead (which accounted for ~25% of the previous implementation's usage). You should also have better locality, at least at the page level. Have you considered keeping the DB in a mapped file? This would keep the at-rest (on-disk) format identical to the in-memory format, allow demand paging on use, and let the OS manage write-out.
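To make the mapped-file idea concrete, a rough sketch (not existing SafeBrowsing code): it assumes the at-rest format is the same flat, sorted array of 4-byte prefixes used in memory, and uses base::MemoryMappedFile so a lookup only faults in the pages it touches. The file name and layout are made up for illustration.

  #include <cstdint>
  #include <cstring>

  #include "base/files/file_path.h"
  #include "base/files/memory_mapped_file.h"

  // Binary-search the mapped bytes in place, treating them as a sorted array
  // of 4-byte prefixes; only the pages actually touched get faulted in.
  bool ContainsPrefix(const base::MemoryMappedFile& store, uint32_t prefix) {
    const uint8_t* bytes = store.data();
    size_t lo = 0, hi = store.length() / sizeof(uint32_t);
    while (lo < hi) {
      size_t mid = lo + (hi - lo) / 2;
      uint32_t value;
      memcpy(&value, bytes + mid * sizeof(uint32_t), sizeof(value));
      if (value == prefix) return true;
      if (value < prefix) lo = mid + 1; else hi = mid;
    }
    return false;
  }

  // Usage (hypothetical store file name):
  //   base::MemoryMappedFile store;
  //   if (store.Initialize(base::FilePath(FILE_PATH_LITERAL("SafeBrowsingV4.store"))))
  //     bool hit = ContainsPrefix(store, some_prefix);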
Dec 15 2016
Ah, sorry -- The Pver3 in-memory format actually compressed the data by storing a sort of prefix tree, so some of the data is de-dupped. The Pver4 format doesn't do that. It's simpler but I expect it uses more RAM (that's what we see, yes?).
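Roughly the difference, as a toy sketch (not the real PVer3/PVer4 structures): the old store shared common leading bytes, while the new one keeps every full prefix in one contiguous, sorted block.

  #include <cstdint>
  #include <map>
  #include <vector>

  // PVer4-ish: one flat, sorted vector of full 4-byte prefixes -- a single
  // large allocation, no per-node bookkeeping, but every prefix stored whole.
  using FlatPrefixes = std::vector<uint32_t>;

  // PVer3-ish (very loosely): shared 2-byte stems stored once, each mapping to
  // the distinct 2-byte suffixes beneath it -- smaller when stems repeat, at
  // the cost of many more small allocations and pointer chasing.
  using DeduppedPrefixes = std::map<uint16_t, std::vector<uint16_t>>;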
Dec 15 2016
Can't say more RAM; the instrumentation really doesn't allow looking at outstanding allocs. The high-watermark, i.e. the maximum number of bytes outstanding at any one time, is definitely higher.
Dec 15 2016
As for mapped file: The on-disk format is a proto, which requires unmarshalling. So we'd need to do that and then... maybe map the block of memory to a new file on disk. Would that write it immediately? Either way, I'm wary of the latency incurred by paging in parts of the blacklist, since there aren't really any hot/cold spots to it -- it's really random access.
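A sketch of the "unmarshal once, then write a mappable blob" step discussed above; the flat layout is hypothetical and the proto parsing is left out. Standard streams are used to avoid assuming any particular base:: file API.

  #include <cstdint>
  #include <fstream>
  #include <string>
  #include <vector>

  // Assumes the proto has already been parsed into a sorted prefix list.
  bool WriteFlatStore(const std::vector<uint32_t>& sorted_prefixes,
                      const std::string& path) {
    // Writing the raw array makes the at-rest bytes identical to the in-memory
    // layout, so a later open can map the file instead of re-parsing a proto.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
      return false;
    out.write(reinterpret_cast<const char*>(sorted_prefixes.data()),
              static_cast<std::streamsize>(sorted_prefixes.size() * sizeof(uint32_t)));
    // When the bytes actually hit disk is up to the OS page cache; an explicit
    // flush/fsync would be needed to force them out immediately.
    return static_cast<bool>(out);
  }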