New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 620867 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

BSDiff search() routine uses incorrect comparison.

Project Member Reported by hua...@chromium.org, Jun 16 2016

Issue description

Courgette uses BSDiff's search(), which finds a subarray of |a| that maximally matches the prefix |b|.  However, for comparison it checks

  memcmp(a + pos, b, min(a_size - pos, b_size)) < 0

when it really should check

  std::lexicographical_compare(a + pos, a + a_size, b + pos, b + b_size).

In obscure cases this lead to sub-optimal solutions.  Although the impact is likely small, it's still good to nip this bug.

This bug was discovered when adding unit tests to search().
 

Comment 1 by hua...@chromium.org, Jun 16 2016

There have been much refactoring in search(), but the issue appear to have been around in the original BSDiff.

An example where the bug manifests: Let A = "aaaaa" and B = "aa".  The sorted suffixes of "A" are (using $ as end sentinel):

  0:$        => pos[0] = 5
  1:a$       => pos[1] = 4
  2:aa$      => pos[2] = 3
  3:aaa$     => pos[3] = 2
  4:aaaa$    => pos[4] = 1
  5:aaaaa$   => pos[5] = 0

Applying search() on A and B should yield 2, with match position at any {2,3,4,5}.  The binary search in search() looks for the highest-rank prefix that's *strictly* less than B = "aa" (1:a$).  It then compares it against the next prefix (2:aa$), and choose the best matching one.

However, under the existing code, the binary search on [lo, hi) would yield:

[0, 5) => (0+5)/2 = 2 => 2:aa$: "aa" < "aa" is false, so hi := 2.
[0, 2) => (0+2)/1 = 1 => 1:a$: We'd expect "a" < "aa" to be true, and we'd assign lo := 1.  However, when we apply memcpy(), we truncate comparison length to
  
  min(a_size - pos[1], b_size) = min(5 - 4, 2) = 1

So we're effectively comparing "a" < "a".  This is false, and we'd get hi := 1!!

[0, 1): Done binary search.  The final choices are between "0:$" and "1:a$", and best match was 1 (should be 2).

Comment 2 by hua...@chromium.org, Jun 27 2016

In #1 I mean memcmp(), not memcpy().

Comment 3 by hua...@chromium.org, Jun 28 2016

CL: http://crrev.com/2078743002

Ran Courgette-gen performance experiments. Only timing CreateBinaryPatch() loop that calls search().

Old = Original (buggy) algorithm.
New = std::lexicographical_compare().

Only ran for chrome.7z (32-bit) for 51.0.2704.101 => 52.0.2743.39. 3 trials each. Results:

- Patch estimate create: 32-bit (s):
  - Old: 13.6691; 13.907; 13.3397
  - New: 17.5757; 13.997; 13.6073
- Patch estimate create: 64-bit (s):
  - Old: 14.4447; 13.7234; 13.704
  - New: 15.0255; 14.8204; 16.0695
- Patch correction: 32-bit (s):
  - Old: 5.24407; 6.38889; 5.77116
  - New: 5.21153; 5.01745; 5.30495
- Patch correction: 64-bit (s):
  - Old: 5.22656; 6.41601; 6.04272
  - New: 4.98933; 5.03482; 5.21103

So new algorithm might be slightly slower (~10%?), but impact of < 3 s over 800-900 s patch generation time is negligible.

Going to commit.

Comment 6 by hua...@chromium.org, Jun 29 2016

Status: Fixed (was: Started)

Sign in to add a comment