New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 608885 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Aug 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Feature



Sign in to add a comment

Make Courgette use libdivsufsort

Project Member Reported by hua...@chromium.org, May 3 2016

Issue description

Courgette uses BSDiff, which uses QSufSort to generate suffix arrays.  deymo@ suggested that we switch to replace QSufSort with libdivsufsort, which is faster and uses less memory; indeed, AOSP as been doing this since 2011:

https://android.googlesource.com/platform/external/bsdiff/+/c2c68a78ac72e056cf23b7017206f8a530c0aa63

We want try this in Courgette.

Another issue: Creating chrome.7z patch requires suffix array with 270+ MB entries, i.e., ~1.1 GB of RAM!  To be robust against memory fragmentation, Courgette uses PagedArray, which allocates memory in chunks and pretend it's one big array.  We have to revisit this issue, in light of the fact that in production we use courgette64 anyway.
 
Preliminary experiment for Courgette-gen on Win32 chrome.7z.

Summary: Speeds up Courgette-gen by ~25%  (code: http://crrev.com/1948843002)

Details:
- Testing chrome.7z from 48.0.2563.0 to 49.0.2623.0.
- Final suffix array sort computation is applied on 275,581,363 bytes of data. (Suffix array sort occurs elsewhere, but this is the biggest chunk).
- Experiment is run using courgette64.exe (32-bit courgette.exe may fail)

- Variations:
A: Baseline: QSufSort + PagedArray.
B: Variation 1: QSufSort + std::vector
C: Variation 2: libdivsufsort + std::vector

- Test code (for C only) http://crrev.com/1948843002

- Results for a *single run*:
                         A         B         C
   EXE size (bytes):  602,624   601,600   622,592
Final sort time (s):    184.7     122.9      32.0
     Total time (s):    893.2     799.3     665.9

- Percent changes relative to baseline:
                         A         B         C
           EXE size:    (0)      -0.2%      +3.3%
    Final sort time:    (0)     -33.5%     -82.7%
         Total time:    (0)     -10.5%     -25.4%   <== ~25% faster.

The last case of "libdivsufsort + PagedArray" would need more involved code change.

Meanwhile, we should decide whether to replace PagedArray with std::vector.  If so then it should be a separate change.  If not (e.g., to be conservative, or usabilitiy of 32-bit version) then we would need to change to libdivsufsort, and get less gain.

Comment 3 by de...@chromium.org, May 3 2016

nit: CrOS has been doing this since 2011. AOSP merged CrOS' changes with a git merge, so the changes look like from 2011 but they indeed landed in AOSP last year. CrOS uses bsdiff on basically every incremental update since 2011 and there were corner-case fixes to bsdiff at the time. tbao@ tested that version on Android updates and also had some improvement.

Also, note that libdivsuffsort compiles two different variants: 32 and 64 bits; regardless of the arch you are building for, these are size of the suffix array indices. If your data is less than 2GB, then it is safe to use the 32 bits version (the 64 bit one requires twice as much RAM because each index is 64 bit instead of 32).

Thanks for the info!  I'll likely stick with 32-bit (index size); nearly halving RAM usage unwound the death-clock of 32-bit Courgette-gen.

I want to reformat libdivsort to conform to our style (with tests), and try to adapt PagedArray.

It would also be interesting to investigate using Courgette on CrOS.
divsufsort()'s third param |n| seems to be documented wrong.  deymo@: This might create a benign bug in AOSP's usage?

I'm adapting
https://code.google.com/p/chromium/codesearch#chromium/src/courgette/third_party/qsufsort_unittest.cc
To get it to pass I had to +1 on length, e.g., change:
  courgette::qsuf::qsufsort<int*>(&I[0], &V[0], s, len);
to
  divsufsort(s, &I[0], len + 1);
                           ^^^

Digging deeper: the interface claims:

/**
 * Constructs the suffix array of a given string.
 * @param T[0..n-1] The input string.
 * @param SA[0..n] The output array of suffixes.
 * @param n The length of the given string.
 * @return 0 if no error occurred, -1 or -2 otherwise.
 */
saint_t
divsufsort(const sauchar_t *T, saidx_t *SA, saidx_t n);

But really you should pass length of |T| + 1 as |n|, i.e., one more than qsufsort()'s interface.  Indeed, if you examine divsufsort()'s implementation:

saint_t
divsufsort(const sauchar_t *T, saidx_t *SA, saidx_t n) {
  ...
  /* Check arguments. */
  if((T == NULL) || (SA == NULL) || (n < 0)) { return -1; }
  else if(n == 0) { return 0; }
  else if(n == 1) { SA[0] = 0; return 0; }
  else if(n == 2) { m = (T[0] < T[1]); SA[m ^ 1] = 0, SA[m] = 1; return 0; }

For T = ['A'], we'd expect SA = [0, 1], for [[], ['A']].  However, passing n = 1 == len(T) yields SA = [0].  This appears to apply for longer inputs.

Comment 6 by wfh@chromium.org, May 4 2016

re: comment 4 we can still use the 32-bit index variant but compile for 64-bit - I think other parts of courgette might still need the extra memory space that 64-bit gives us - or is the sort the part with the largest mem usage?

Comment 7 by de...@chromium.org, May 5 2016

re #4: dgarrett@ had an intern 2 or 3 years ago who worked on courgette for arm to see if we could use it in Chrome OS. The TL;DR answer is that courgette requires a LOT of RAM that we are not ok with for background updates.

I would recommend against modifying libdivsufsort and I would treat it as an external library instead.

re #5:
https://github.com/y-256/libdivsufsort in the readme shows something a little different WRT SA:


/*
 * Constructs the suffix array of a given string.
 * @param T[0..n-1] The input string.
 * @param SA[0..n-1] The output array or suffixes.
 * @param n The length of the given string.
 * @return 0 if no error occurred, -1 or -2 otherwise.
 */
saint_t
divsufsort(const sauchar_t *T, saidx_t *SA, saidx_t n);


I don't know where did you get the interface documentation, but https://github.com/y-256/libdivsufsort/blob/master/include/divsufsort.h.cmake#L67 also says SA[0..n-1] and not SA[0..n].

For T=['A'], the suffix array of that string is an array of length 1: SA = [0] by definition and that's what we get from divsufsort(). 
Re. #4: Yes we're working on the client-side RAM problem. The intermediate data can take 100's of MB, but these currently use memory-mapped IO so don't need RAM (needs address space though). As for peak RAM for supporting data structures, we have a CL underway to reduce it by 25%, and can probably reduce by 15% more with another change.  There's a more ambitious goal to also reduce the 100's of MB mentioned earlier, but that's lofty.

Regarding modifying libdivsufsort, I want to measure performance impact of adapting it to use PagedArray.

Re. #5: I looked back a my original cmake-generated divsufsort.h and found "n-1", so you're right; sorry for the noise (accidentally deleted??).  Meanwhile, it still stands that BSDiff followed QSufSort's convention of including empty string in suffix array.  Indeed, your modified version still have I[] allocated with size |oldsize+1|.  And since you continued to use BSDiff's search(), I don't know if there will be some edge case.  Anyway, don't worry about this for now.

Comment 9 by de...@chromium.org, May 17 2016

I think we hit the edge-case you mention recently only when enabling bsdiff improvements in imgdiff's android implementation. I haven't seen this problem in CrOS, maybe it just manifests as a worse-than-ideal patch size when you are searching for the lexicographically largest string, but still valid patch. We test all the bsdiff patches on the server side with bspatch.

Anyway, the fix is here: https://android-review.googlesource.com/#/c/223480/

Thanks for the update!  The convention used by qsufsort and search() is that to include '', thus giving rise to n+1 suffixes, which search() relies on.  Meanwhile, divsufsort excludes '', and its accompanying sa_search() uses the convention.  However, sa_search() applies lower_bound() rather than finding longest match.  Although it's more efficient, some work is needed to get it to do what search() does.

So right now Android's bsdiff (and soon Courgette) applies a "Frankenstein" fusion of divsufsort() and (BSDiff's) search().  Re.  Bug 223480 , I don't think the "oldsize" -> "oldsize-1" fix is correct.  I'd propose to leave search() alone, but adapt divsufsort() to produce the same output as qsufsort.  So instead of

if(divsufsort(old_buf, I, oldsize)) err(1, "divsufsort");

perhaps do:

I[0] = oldsize;  // Empty string
if(divsufsort(old_buf, &I[1], oldsize)) err(1, "divsufsort");

Meanwhile here's experiment for Courgette: https://codereview.chromium.org/1948843002/
This is complicated by Courgette's usage of PagedArray (and my decision to keep using it to extend the usability of 32-bit Courgette). I followed your advice and sought to minimize code changes to divsufsort().  :)
And if you want to add unit tests, please feel free to take Courgette's tests as a starting point.  https://codereview.chromium.org/1948843002/ as a version for Franken-divsufsort.
Project Member

Comment 12 by bugdroid1@chromium.org, Jun 2 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/804ed8a1e0f236f22e19abef7013e9641a3ad49d

commit 804ed8a1e0f236f22e19abef7013e9641a3ad49d
Author: huangs <huangs@chromium.org>
Date: Thu Jun 02 06:26:36 2016

[Courgette] PagedArray: Add Iterators and Parametrize Page Size as int Template.

This is a refactoring CL to enable PagedArray usage by libdivsufsort.
In addition to overloading operator[], for more general usage we need
need pointer-like accessors to PagedArray. To this end we implement
PagedArray_const_iterator and PagedArray_const_iterator, which merely
wraps a PagedArray pointer along with an index. We also add various
operators needed by libdivsufsort. For optimization, '<' and '<='
operators omits pointer checks.

By default PagedArray page size is 2**18 elements (1 MiB for int32_t).
To enable better testing, we made (log) page size a tepmlate parameter.

BUG= 608885 

Review-Url: https://codereview.chromium.org/2008553007
Cr-Commit-Position: refs/heads/master@{#397311}

[modify] https://crrev.com/804ed8a1e0f236f22e19abef7013e9641a3ad49d/courgette/third_party/bsdiff/paged_array.h
[modify] https://crrev.com/804ed8a1e0f236f22e19abef7013e9641a3ad49d/courgette/third_party/bsdiff/paged_array_unittest.cc

Third-party library inclusion proposal: go/courgette-use-libdivsufsort
Project Member

Comment 14 by bugdroid1@chromium.org, Jul 26 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089

commit 7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089
Author: huangs <huangs@chromium.org>
Date: Tue Jul 26 21:46:13 2016

[Courgette] Refactor BSDiff namespaces and bsdiff::search() interface.

Details:
- Move BSDiff (but not PagedArray) from namespace courgette to bsdiff.
- Change namespace courgette::qsuf to qsuf.
- Change bsdiff:search() to return struct {pos, size} so we don't need
  awkward pointer passing; update callers.
- Updated BSDiff callers. Also fix weird hybrid usage by setup_util.cc,
  which calls Courgette's BSDiff, but using OK from BSPatch.

BUG= 608885 

Review-Url: https://codereview.chromium.org/2031193002
Cr-Commit-Position: refs/heads/master@{#407924}

[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/chrome/installer/setup/setup_util.cc
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/chrome/utility/chrome_content_utility_client.cc
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/components/update_client/component_patcher_operation.cc
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/courgette/bsdiff_memory_unittest.cc
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/courgette/courgette_tool.cc
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/courgette/simple_delta.cc
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/courgette/third_party/bsdiff/README.chromium
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/courgette/third_party/bsdiff/bsdiff.h
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/courgette/third_party/bsdiff/bsdiff_apply.cc
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/courgette/third_party/bsdiff/bsdiff_create.cc
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/courgette/third_party/bsdiff/bsdiff_search.h
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/courgette/third_party/bsdiff/bsdiff_search_unittest.cc
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/courgette/third_party/bsdiff/qsufsort.h
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/courgette/third_party/bsdiff/qsufsort_unittest.cc
[modify] https://crrev.com/7054b5a2e3b2ce7f5b2e868e91d54b5ec0773089/courgette/versioning_unittest.cc

Project Member

Comment 15 by bugdroid1@chromium.org, Jul 27 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/d63dcb01c3a8fc30a840ce5aaab8ea5e6c9107e4

commit d63dcb01c3a8fc30a840ce5aaab8ea5e6c9107e4
Author: huangs <huangs@chromium.org>
Date: Wed Jul 27 16:07:51 2016

[Courgette] Add third-party library: libdivsufsort.

We wish to use third-party library libdivsufsort in Courgette to replace
QSufSort in Courgette-gen. This will reduce time and memory usage. This
CL focuses on adding libdivsufsort and tests. We will make the switch in
a follow-up CL.

Proposal: go/courgette-use-libdivsufsort
Code source: https://github.com/y-256/libdivsufsort

BUG= 631482 , 608885 

Review-Url: https://codereview.chromium.org/2156223002
Cr-Commit-Position: refs/heads/master@{#408141}

[modify] https://crrev.com/d63dcb01c3a8fc30a840ce5aaab8ea5e6c9107e4/courgette/BUILD.gn
[modify] https://crrev.com/d63dcb01c3a8fc30a840ce5aaab8ea5e6c9107e4/courgette/courgette.gyp
[add] https://crrev.com/d63dcb01c3a8fc30a840ce5aaab8ea5e6c9107e4/courgette/third_party/divsufsort/LICENSE
[add] https://crrev.com/d63dcb01c3a8fc30a840ce5aaab8ea5e6c9107e4/courgette/third_party/divsufsort/README.chromium
[add] https://crrev.com/d63dcb01c3a8fc30a840ce5aaab8ea5e6c9107e4/courgette/third_party/divsufsort/divsufsort.cc
[add] https://crrev.com/d63dcb01c3a8fc30a840ce5aaab8ea5e6c9107e4/courgette/third_party/divsufsort/divsufsort.h
[add] https://crrev.com/d63dcb01c3a8fc30a840ce5aaab8ea5e6c9107e4/courgette/third_party/divsufsort/divsufsort_private.h
[add] https://crrev.com/d63dcb01c3a8fc30a840ce5aaab8ea5e6c9107e4/courgette/third_party/divsufsort/divsufsort_unittest.cc
[add] https://crrev.com/d63dcb01c3a8fc30a840ce5aaab8ea5e6c9107e4/courgette/third_party/divsufsort/sssort.cc
[add] https://crrev.com/d63dcb01c3a8fc30a840ce5aaab8ea5e6c9107e4/courgette/third_party/divsufsort/trsort.cc

Did more experiments, confirmed the 25% time reduction (for courgette64.exe):

CL: http://crrev.com/2187953003/
Data: https://docs.google.com/a/chromium.org/spreadsheets/d/1gVyy5olmXVy7aGr0ibCLDGVmoD4ljxAwCI_WQKwZP8g/edit?usp=sharing

Summary:
- 32-bit Courgette-gen (courgette.exe) used to fail for 64-bit Chrome, that's fixed (due to significant memory reduction, ~1 GB).
- 64-bit Courgette-gen: Time reduced by 25% (for 64-bit Chrome, 960s to 660s)
- 32-bit Courgette-gen: Time reduced by 22% (32-bit only, since baseline failed).
- Suffix sorting time: Reduced by 70% (for 64-bit Chrome, 400s to 100s).
- courgette.exe and courgette64.exe each increased in size by 7%.  Note: should not affect client-side code.

Details:
- 2 configurations:
  - Baseline
  - DivSufSort.
- 2 executables: courgette.exe (32-bit) and courgette64.exe.
- 3 data sets (patching chrome.7z):
  - A: 32-bit 50.0.2661.106 to 51.0.2704.101
  - B: 51.0.2704.101 to 52.0.2743.39
  - C: 51.0.2704.101 to 52.0.2743.39
- 6 trial runs.
- Super nitty note: etiennep@ reduced time by 4% in http://crrev.com/2055343002 by an orthogonal fashion.
  - If we factor this *out*: Normalized to current baseline = 1, old time would be 1/(0.04) = 1.04167, and our reduction of 0.25 would become 1-(1.04167-0.25)/1.04167 = 24%. But we're already being conservative (64-bit Chrome had 31% reduction). So I think it's fair to claim 25% reduction.

Status: Fixed (was: Assigned)

Sign in to add a comment