Project: chromium Issues People Development process History Sign in
New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Starred by 7 users
Status: Fixed
Owner:
ex-Googler
Closed: Oct 2012
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug



Sign in to add a comment
Create a smaller alternative to the bloom filter for safe-browsing's in-memory structure.
Project Member Reported by sh...@chromium.org, Feb 3 2011 Back to list
[A bug to capture and track some work I'm doing on replacing the bloom filter.]

Currently, my safe-browsing database has about 650k prefixes in the bloom filter.  This requires 1.94M in the bloom filter (25 bits per prefix).  Previously, the cap was at 2M, but the safe-browsing team pushed above that temporarily (see  issue 66096 ), and wishes to keep the new 3M cap for future growth and/or transitional spikes.

http://codereview.chromium.org/6286072/ implements an alternative structure which stores a small index and 16-bit deltas for most prefixes.  On the same input data it requires 1.28M, meaning that it can store 50% more prefixes in the same space as the bloom filter, and has no false positives.  Searches take about 3x as long, but are it should scale well (logN, so the size would have to grow a lot to make things much worse).  Memory usage is essentially linear in input size for more than 200k or so inputs, but the index overhead grows for smaller sets (though the set is smaller, so the worst-case efficiency for smaller sets is still only .5M of memory usage).

My plan is to first roll it to the trunk/dev channel with both PrefixSet and BloomFilter in operation, recording histograms.  Then the bloom filter version can be removed.  If reviewers are super confident I could go direct to the new version pretty easily.
 
Labels: Feature-Safebrowsing
Project Member Comment 2 by bugdroid1@chromium.org, Feb 10 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=74487

------------------------------------------------------------------------
r74487 | shess@chromium.org | Thu Feb 10 13:50:43 PST 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.cc?r1=74487&r2=74486&pathrev=74487
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/chrome_browser.gypi?r1=74487&r2=74486&pathrev=74487
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.h?r1=74487&r2=74486&pathrev=74487
 A http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set_unittest.cc?r1=74487&r2=74486&pathrev=74487
 A http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.h?r1=74487&r2=74486&pathrev=74487
 A http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.cc?r1=74487&r2=74486&pathrev=74487
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/chrome_tests.gypi?r1=74487&r2=74486&pathrev=74487

PrefixSet as an alternate to BloomFilter for safe-browsing.

The safe-browsing prefix data is uniformly distributed across the
32-bit integer space.  When sorted, the average delta between items is
about 8,000, which can be encoded in a 16-bit integer.  PrefixSet
takes advantage of this to compress the prefixes into a structure
which is relatively efficient to query.

The current CL adds the new structure, but continues to use the bloom
filter to control things.  Histograms are logged to track differences
in results.

BUG= 71832 
TEST=none

Review URL: http://codereview.chromium.org/6286072
------------------------------------------------------------------------
Project Member Comment 3 by bugdroid1@chromium.org, Mar 4 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=76853

------------------------------------------------------------------------
r76853 | shess@chromium.org | Thu Mar 03 17:16:58 PST 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.cc?r1=76853&r2=76852&pathrev=76853
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set_unittest.cc?r1=76853&r2=76852&pathrev=76853
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.h?r1=76853&r2=76852&pathrev=76853
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.cc?r1=76853&r2=76852&pathrev=76853

Additional validation code for PrefixSet.

The PrefixSet-vs-BloomFilter histograms showed a minor discrepency,
with a very small number of PREFIX_SET_EVENT_BLOOM_MISS_PREFIX_HIT
reports.  This CL adds code to regenerate the prefix list and
manually double-check.

Additionally, reduce memory use by requiring the input prefix vector
to be pre-sorted.

BUG= 71832 
TEST=none

Review URL: http://codereview.chromium.org/6591087
------------------------------------------------------------------------
Comment 4 by sh...@chromium.org, Mar 7 2011
Labels: -Mstone-11 Mstone-12
Due to the discrepancy noted in earlier comments, I'm bumping this to M12.  Really aren't any user-visible changes to this.   Issue 75177  is for tracking reverting for M11.
Project Member Comment 5 by bugdroid1@chromium.org, Mar 8 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=77314

------------------------------------------------------------------------
r77314 | shess@chromium.org | Tue Mar 08 11:10:05 PST 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set_unittest.cc?r1=77314&r2=77313&pathrev=77314
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.h?r1=77314&r2=77313&pathrev=77314
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.cc?r1=77314&r2=77313&pathrev=77314

safe_browsing::PrefixSet persistence code.

Just the ability to read/write the data structure to a file.  File
format is a simple header, the contents of the vectors, and a
checksum.

BUG= 71832 
TEST=none

Review URL: http://codereview.chromium.org/6625002
------------------------------------------------------------------------
Project Member Comment 6 by bugdroid1@chromium.org, Mar 18 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=78667

------------------------------------------------------------------------
r78667 | shess@chromium.org | Thu Mar 17 22:25:48 PDT 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.cc?r1=78667&r2=78666&pathrev=78667
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set_unittest.cc?r1=78667&r2=78666&pathrev=78667
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.h?r1=78667&r2=78666&pathrev=78667
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.cc?r1=78667&r2=78666&pathrev=78667
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_util.h?r1=78667&r2=78666&pathrev=78667

Safe-browsing PrefixSet cleanups.

Make sure SBPrefix is a fixed size.
PrefixSet tests for single-element set, set with large deltas, and
  int32 space edge cases.
PrefixSet::GetPrefixes() can be const.
Consolidate the SafeBrowsingDatabase GetPrefixes() checking code.
Check whether deltas fit by directly checking whether the delta fit.
Add a histogram for checking if SBPrefix really was crazy.

BUG= 71832 
TEST=none

Review URL: http://codereview.chromium.org/6711021
------------------------------------------------------------------------
Project Member Comment 7 by bugdroid1@chromium.org, Mar 18 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=78764

------------------------------------------------------------------------
r78764 | shess@chromium.org | Fri Mar 18 15:21:51 PDT 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.cc?r1=78764&r2=78763&pathrev=78764

Remove std::set from PrefixSet checking code.

Some implementations of std::set<> make an allocation per element,
with the allocations stitched into a list.  This allows iterators to
remain valid while other elements in the set are added or removed.
Unfortunately, for small elements like SBPrefix in large sets like
safe-browsing, this temporarily uses an unexpectedly large amount of
memory.  [Apologies if I got that wrong!]

This refactors things to allow the std::vector<> version to be used.

BUG= 71832 
TEST=none

Review URL: http://codereview.chromium.org/6711044
------------------------------------------------------------------------
Project Member Comment 8 by bugdroid1@chromium.org, Mar 19 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=78807

------------------------------------------------------------------------
r78807 | shess@chromium.org | Fri Mar 18 19:46:27 PDT 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.cc?r1=78807&r2=78806&pathrev=78807

Check PrefixSet results for sorting and duplication.

Check whether PrefixSet::GetPrefixes() returns results that are
unsorted or contain duplicates, and indicate how many items are unique
to the restored vector (or aren't in the restored vector).

BUG= 71832 
TEST=histograms

Review URL: http://codereview.chromium.org/6711054
------------------------------------------------------------------------
Project Member Comment 10 by bugdroid1@chromium.org, Apr 2 2011
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=80256

------------------------------------------------------------------------
r80256 | shess@chromium.org | Fri Apr 01 21:53:10 PDT 2011

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.cc?r1=80256&r2=80255&pathrev=80256
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.h?r1=80256&r2=80255&pathrev=80256
 M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.cc?r1=80256&r2=80255&pathrev=80256

Add a primitive checksum to PrefixSet construction.

There's a bug where the PrefixSet doesn't seem to work in a small
fraction of cases.  One possibility is that memory is being corrupted
during construction.  Though this seems very unlikely, this should
make sure...

BUG= 71832 
TEST=Monitor histograms

Review URL: http://codereview.chromium.org/6706032
------------------------------------------------------------------------
Comment 11 by k...@google.com, Apr 25 2011
Labels: -Mstone-12 Mstone-13 MovedFrom12
Moving out of M12.
Labels: -Mstone-13 Mstone-14 MovedFrom-13
Moving !type=meta|regression and !releaseblocker to next mstone
Labels: -MovedFrom12 MovedFrom-12
Comment 14 by k...@google.com, Jul 28 2011
Labels: -Mstone-14 Mstone-15 MovedFrom-14
Punting out non-critical bugs.  Please move back to 14 if you believe this was done in error.
Comment 15 by sh...@chromium.org, Aug 30 2011
Labels: -Mstone-15 Mstone-16
Comment 16 by laforge@google.com, Oct 24 2011
Labels: -Mstone-16 MovedFrom-16 Mstone-17
Comment 17 by k...@google.com, Dec 19 2011
Labels: -Mstone-17 Mstone-18 MovedFrom-17
Moving bugs marked as Started but not blockers from M17 to M18.  Please move back if you think this is a blocker, and add the ReleaseBlock-Stable label.  If you're able.
Comment 18 by sh...@chromium.org, Jan 19 2012
Labels: -Mstone-18 Mstone-19
I've resurrected this.  I'm pretty convinced that the mis-matches seen in the wild are memory corruptions, and have a tiny bit of new code to see if it can be isolated a bit more.  Unless that code shows up a smoking gun against PrefixSet, I'm going to roll forward with it.

Analyzing the hit rates, over the fleet PrefixSet gives almost exactly the results expected when you account for the volume of checks, the BloomFilter hit rate, and the false-positive rate implied by how we construct the BloomFilter.  The cases of mismatch don't seem to correlate with volume at all.
Labels: -Mstone-19 Mstone-20
Doubt I'll get to these in M-19.
Comment 20 by k...@google.com, Apr 27 2012
Labels: -Mstone-20 MovedFrom-20
These bugs have hit their move limit.  Please re-target as appropriate.
Comment 21 by sh...@chromium.org, Jul 11 2012
Blocking: chromium:136668
Comment 22 by sh...@chromium.org, Jul 24 2012
Blocking: -chromium:136668
Project Member Comment 23 by bugdroid1@chromium.org, Jul 25 2012
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=148299

------------------------------------------------------------------------
r148299 | shess@chromium.org | 2012-07-25T07:07:07.647315Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/bloom_filter_unittest.cc?r1=148299&r2=148298&pathrev=148299
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/bloom_filter.cc?r1=148299&r2=148298&pathrev=148299
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/bloom_filter.h?r1=148299&r2=148298&pathrev=148299

Add construction checksum to safe-browsing's BloomFilter.

Intending to use this to track down suspected memory corruption.

BUG= 71832 
TEST=none


Review URL: https://chromiumcodereview.appspot.com/10818042
------------------------------------------------------------------------
Project Member Comment 24 by bugdroid1@chromium.org, Jul 25 2012
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=148301

------------------------------------------------------------------------
r148301 | shess@chromium.org | 2012-07-25T08:26:45.801308Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.cc?r1=148301&r2=148300&pathrev=148301

Cleanup histograms tracking prefix-set issues in safe-browsing.

Remove code for histograms which aren't yielding additional useful
info.  Cleanup naming slightly.

BUG= 71832 
TEST=none


Review URL: https://chromiumcodereview.appspot.com/10802098
------------------------------------------------------------------------
Project Member Comment 25 by bugdroid1@chromium.org, Aug 7 2012
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=150403

------------------------------------------------------------------------
r150403 | shess@chromium.org | 2012-08-07T21:09:07.014895Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.h?r1=150403&r2=150402&pathrev=150403
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.cc?r1=150403&r2=150402&pathrev=150403
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.cc?r1=150403&r2=150402&pathrev=150403

Instrument to catch corruption constructing safe-browsing filters.

Prior instrumentation provided some evidence that memory corruption
was causing inconsistencies between PrefixSet and BloomFilter results.
This CL attempts to detect three types of memory corruption:
 - internal to the PrefixSet data structure.
 - internal to the BloomFilter data structure.
 - the input data used to construct either.

BUG= 71832 
TEST=none


Review URL: https://chromiumcodereview.appspot.com/10834045
------------------------------------------------------------------------
Project Member Comment 26 by bugdroid1@chromium.org, Aug 15 2012
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=151641

------------------------------------------------------------------------
r151641 | shess@chromium.org | 2012-08-15T02:42:30.373146Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.cc?r1=151641&r2=151640&pathrev=151641

Additional corruption instrumentation in safe-browsing filters.

Prior instrumentation indicates potential memory corruption detected
in creating the bloom filter and prefix set.  Add new code to see if
it's transient (probably memory corruption) or persistent (probably
code flaws).

BUG= 71832 


Review URL: https://chromiumcodereview.appspot.com/10857002
------------------------------------------------------------------------
Comment 27 by sh...@chromium.org, Aug 28 2012
The results from the latest CL are that every single case where PrefixSet failed PREFIX_SET_CREATE_PREFIX_SET_CHECKSUM, it succeeded PREFIX_SET_CREATE_PREFIX_SET_CHECKSUM_SUCCEEDED.  There are actually many more cases where the BloomFilter fails PREFIX_SET_CREATE_BLOOM_FILTER_CHECKSUM and it always succeeds PREFIX_SET_CREATE_BLOOM_FILTER_CHECKSUM_SUCCEEDED, except that for each day it seems to fail the latter once (so the values will be 20 and 19, or 15 and 14, like that).  I'm inclined to think that that's an artifact in the system somewhere.  Also there are a small number of checksum failures found in the input data, which means they changed while things were being constructed.

As a result, I'm going to move the transition forward.
Project Member Comment 28 by bugdroid1@chromium.org, Aug 29 2012
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=153816

------------------------------------------------------------------------
r153816 | shess@chromium.org | 2012-08-29T01:02:24.414678Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.h?r1=153816&r2=153815&pathrev=153816
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.cc?r1=153816&r2=153815&pathrev=153816
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set_unittest.cc?r1=153816&r2=153815&pathrev=153816
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.cc?r1=153816&r2=153815&pathrev=153816

Strip out safe-browsing prefix-set checksumming code.

Stripping out the instrumentation in preparation for landing code to
transition from bloom filter to prefix set.  The histograms indicate
that virtually all of the detected problems can be explained by memory
corruption, and it's reasonable to assume that the rest are, too (just
in a form the tests didn't catch).

BUG= 71832 


Review URL: https://chromiumcodereview.appspot.com/10892016
------------------------------------------------------------------------
Project Member Comment 29 by bugdroid1@chromium.org, Sep 8 2012
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=155543

------------------------------------------------------------------------
r155543 | shess@chromium.org | 2012-09-08T00:54:19.862488Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database_unittest.cc?r1=155543&r2=155542&pathrev=155543
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.cc?r1=155543&r2=155542&pathrev=155543
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.h?r1=155543&r2=155542&pathrev=155543
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set_unittest.cc?r1=155543&r2=155542&pathrev=155543
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/prefix_set.cc?r1=155543&r2=155542&pathrev=155543

Transition safe browsing from bloom filter to prefix set.

If there is a saved prefix set file, use it and delete the bloom
filter.  Otherwise if there is a bloom filter file, use it until first
update.  On update, construct and write a prefix set file, and delete
the bloom filter.

BUG= 71832 


Review URL: https://chromiumcodereview.appspot.com/10896048
------------------------------------------------------------------------
Comment 30 by sh...@chromium.org, Sep 19 2012
For 23.0.1262.0 (Dev), SB2.FilterLoad is around 2% BLOOM_FILTER and 98% PREFIX_SET.  File sizes are down as expected, and load times also down, though there are more outliers (corresponding to outlier browse database sizes, see  issue 148722 ).  Looks likely to stick!
Project Member Comment 31 by bugdroid1@chromium.org, Oct 19 2012
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=162949

------------------------------------------------------------------------
r162949 | shess@chromium.org | 2012-10-19T06:28:21.211103Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database_unittest.cc?r1=162949&r2=162948&pathrev=162949
   D http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/bloom_filter_unittest.cc?r1=162949&r2=162948&pathrev=162949
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.cc?r1=162949&r2=162948&pathrev=162949
   D http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/bloom_filter.cc?r1=162949&r2=162948&pathrev=162949
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_database.h?r1=162949&r2=162948&pathrev=162949
   D http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/bloom_filter.h?r1=162949&r2=162948&pathrev=162949
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/PRESUBMIT.py?r1=162949&r2=162948&pathrev=162949
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/chrome_tests.gypi?r1=162949&r2=162948&pathrev=162949
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/chrome_browser.gypi?r1=162949&r2=162948&pathrev=162949

Finish conversion from bloom filter to prefix set in safe browsing.

Always use the prefix set, delete any old bloom filter.  Users who
haven't launched since M-22 (so more than six weeks) will run without
protection until first update.

BUG= 71832 
TBR=ben@chromium.org

Review URL: https://chromiumcodereview.appspot.com/11196036
------------------------------------------------------------------------
Comment 32 by sh...@chromium.org, Oct 19 2012
Status: Fixed
Project Member Comment 33 by bugdroid1@chromium.org, Mar 10 2013
Labels: -Area-Internals -Feature-Safebrowsing Cr-UI-Browser-SafeBrowsing Cr-Internals
Sign in to add a comment