
Issue 670002

Starred by 3 users

Issue metadata

Status: Fixed
Closed: Dec 2016
Pri: 2
Type: Bug




Crazy heap churn in safe browsing

Project Member Reported by siggi@chromium.org, Nov 30 2016

Issue description

I'm working on a CL to expose memory metrics in chrome://profiler. What consistently sticks out as a massive heap churner is the SafeBrowsing database. See attached HTML file.

Looks like the task is LocalSafeBrowsingDatabaseManager::DatabaseUpdateFinished. This seems to churn through an average of 1.4M heap allocations and frees per run, which consumes >1.5s on my Z620. It also looks like this is dealing with a large number of small allocations, so a large fraction of the memory used is wasted on heap overhead - something like 25% of the allocated memory by my reading.
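For intuition on where that overhead comes from, here's a back-of-envelope sketch (not the actual instrumentation; malloc_usable_size() is glibc-specific, Windows has _msize()):

    // Small requests get rounded up to the allocator's bucket size, and
    // some allocators add per-block metadata on top. A 20-byte request
    // landing in a 32-byte bucket already wastes ~37% before metadata.
    #include <cstdio>
    #include <cstdlib>
    #include <malloc.h>  // malloc_usable_size (glibc)

    int main() {
      void* p = malloc(20);
      size_t usable = malloc_usable_size(p);
      printf("requested 20 bytes, block holds %zu -> %.0f%% slack\n",
             usable, 100.0 * (usable - 20) / usable);
      free(p);
      return 0;
    }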
 
profiler.html (24.1 KB)

Comment 1 by siggi@chromium.org, Nov 30 2016

Attaching an image of the HTML attachment, since it doesn't render well here.
heap_churn.png (201 KB)
Comment 2, Nov 30 2016

Interesting. Though... that code is all being replaced with new structures in components/safe_browsing_db, launching in M56. We shouldn't spend time optimizing the old code at this point.

It may be useful to run the same analysis on the new code.

Comment 3 by siggi@chromium.org, Nov 30 2016

This is the tip of tree in my local build. Sadly my instrumentation goes only to the task level, so I can't say where the allocs are actually happening.

Comment 4 by siggi@chromium.org, Nov 30 2016

Cc: nparker@chromium.org
Owner: vakh@chromium.org
My CL is hopefully landing this week (https://codereview.chromium.org/2386123003/), at which point the memory metrics will be available in all dev builds in chrome://profiler. 
There are also macros in https://cs.chromium.org/chromium/src/base/profiler/scoped_profile.h that allow profiling a specific scope, with the results presented in the same UI.
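E.g. something like this (macro name from memory, so double-check scoped_profile.h; the function below is just a stand-in, not real V4 code):

    // Hypothetical usage: give a specific scope its own row in
    // chrome://profiler. V4Database::ApplyUpdate is a stand-in name.
    #include "base/profiler/scoped_profile.h"

    void V4Database::ApplyUpdate() {
      TRACK_RUN_IN_THIS_SCOPED_REGION(V4Database_ApplyUpdate);
      // ... expensive update work, now attributed to this scope ...
    }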
Comment 5

We'll look at the new code with //profiler. Thx.

Comment 6 by vakh@chromium.org, Dec 2 2016

Labels: SafeBrowsing-Triaged

Comment 7 by siggi@chromium.org, Dec 13 2016

Just took a look at this on my official SyzyASAN canary. Looks like the V4 implementation runs in 134ms on average on a SyzyASAN-instrumented build, making ~150 allocs on average. The high-watermark of allocation is larger at 10MB, but overhead is down to well below 1%, from ~25%.

Cool beans - this seems to be running 10X faster in the new implementation on a SyzyASAN build than the old impl did on an uninstrumented build (1.5s / 134ms is ~11X). SyzyASAN slows you down about 3X, so that's perhaps on the order of 30X speedup :).
safebrowsing_v4.png (212 KB)
Comment 8, Dec 13 2016

Status: Fixed (was: Untriaged)

Excellent! Thanks for the update. I wonder if we're counting the right functions, since 30x seems large. But it should be a lot more streamlined (more large-block allocation and manipulation), with a trade-off of being less compressed in RAM (hence the higher total RAM usage).

I'll call this fixed.

Comment 9 by siggi@chromium.org, Dec 13 2016

I question the "less compressed in RAM" statement. If your allocations are large, then you have less heap overhead (which accounted for ~25% of the previous implementation's usage). You should also have better locality, at least at the page level.

Have you considered keeping the DB in a mapped file? This would keep the at-rest format identical to the in-memory format, allow demand paging on use, and let the OS manage write-out.
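Roughly what I have in mind (a sketch only; base::MemoryMappedFile is the real class, everything else here is made up for illustration):

    #include "base/files/file_path.h"
    #include "base/files/memory_mapped_file.h"

    // Hypothetical loader: map the store read-only and use it in place.
    // No unmarshalling, no heap churn; the OS demand-pages the data in
    // on first touch and manages eviction/write-out itself.
    const uint8_t* MapPrefixStore(const base::FilePath& path,
                                  base::MemoryMappedFile* mapped) {
      if (!mapped->Initialize(path))
        return nullptr;
      return mapped->data();  // valid for the lifetime of |mapped|
    }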


Comment 10

Ah, sorry -- the Pver3 in-memory format actually compressed the data by storing a sort of prefix tree, so some of the data is de-duped. The Pver4 format doesn't do that. It's simpler, but I expect it uses more RAM (that's what we see, yes?).
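For the record, one way that kind of de-dup works is delta-encoding the sorted prefixes (an illustrative sketch of the idea, not the actual PrefixSet code):

    #include <cstdint>
    #include <limits>
    #include <vector>

    // Sorted 32-bit prefixes stored as full-width anchors plus 16-bit
    // deltas: dense regions cost ~2 bytes/prefix instead of 4. Pver4
    // instead keeps the flat sorted array - simpler, but bigger in RAM.
    struct Run {
      uint32_t anchor;  // first prefix of the run, stored in full
      uint32_t count;   // number of 16-bit deltas that follow it
    };

    struct DeltaEncoded {
      std::vector<Run> runs;
      std::vector<uint16_t> deltas;
    };

    DeltaEncoded Encode(const std::vector<uint32_t>& sorted_prefixes) {
      DeltaEncoded out;
      uint32_t prev = 0;
      for (uint32_t v : sorted_prefixes) {
        if (out.runs.empty() ||
            v - prev > std::numeric_limits<uint16_t>::max()) {
          out.runs.push_back({v, 0});  // gap too big: start a new run
        } else {
          out.deltas.push_back(static_cast<uint16_t>(v - prev));
          ++out.runs.back().count;
        }
        prev = v;
      }
      return out;
    }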

Comment 11 by siggi@chromium.org, Dec 15 2016

Can't say more RAM; the instrumentation really doesn't allow looking at outstanding allocs. The high-watermark, i.e. the number of bytes outstanding at any one time, is definitely higher.
Comment 12

As for the mapped file: the on-disk format is a proto, which requires unmarshalling. So we'd need to do that and then... maybe map the block of memory to a new file on disk. Would that write it immediately? Either way, I'm wary of the latency incurred by paging in parts of the blacklist, since there aren't really any hot/cold spots to it -- it's really random access.
