BSDiff search() routine uses incorrect comparison. |
||
Issue descriptionCourgette uses BSDiff's search(), which finds a subarray of |a| that maximally matches the prefix |b|. However, for comparison it checks memcmp(a + pos, b, min(a_size - pos, b_size)) < 0 when it really should check std::lexicographical_compare(a + pos, a + a_size, b + pos, b + b_size). In obscure cases this lead to sub-optimal solutions. Although the impact is likely small, it's still good to nip this bug. This bug was discovered when adding unit tests to search().
,
Jun 27 2016
In #1 I mean memcmp(), not memcpy().
,
Jun 28 2016
CL: http://crrev.com/2078743002 Ran Courgette-gen performance experiments. Only timing CreateBinaryPatch() loop that calls search(). Old = Original (buggy) algorithm. New = std::lexicographical_compare(). Only ran for chrome.7z (32-bit) for 51.0.2704.101 => 52.0.2743.39. 3 trials each. Results: - Patch estimate create: 32-bit (s): - Old: 13.6691; 13.907; 13.3397 - New: 17.5757; 13.997; 13.6073 - Patch estimate create: 64-bit (s): - Old: 14.4447; 13.7234; 13.704 - New: 15.0255; 14.8204; 16.0695 - Patch correction: 32-bit (s): - Old: 5.24407; 6.38889; 5.77116 - New: 5.21153; 5.01745; 5.30495 - Patch correction: 64-bit (s): - Old: 5.22656; 6.41601; 6.04272 - New: 4.98933; 5.03482; 5.21103 So new algorithm might be slightly slower (~10%?), but impact of < 3 s over 800-900 s patch generation time is negligible. Going to commit.
,
Jun 28 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/01c1c6c377ec5df7ea13e931794dd5d393736f2c commit 01c1c6c377ec5df7ea13e931794dd5d393736f2c Author: huangs <huangs@chromium.org> Date: Tue Jun 28 15:27:06 2016 [Courgette] Make BSDiff search() use lexicographical_compare(). Other changes: - Moved search() to new file bsdiff_search.h. - Created BSDiffSearchTest, taking 1 test from QSufSortTest and adding 1 new test. BUG= 620867 Review-Url: https://codereview.chromium.org/2078743002 Cr-Commit-Position: refs/heads/master@{#402476} [modify] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/BUILD.gn [modify] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/courgette.gyp [modify] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/third_party/bsdiff/bsdiff_create.cc [add] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/third_party/bsdiff/bsdiff_search.h [add] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/third_party/bsdiff/bsdiff_search_unittest.cc [modify] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/third_party/bsdiff/qsufsort.h [modify] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/third_party/bsdiff/qsufsort_unittest.cc
,
Jun 28 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/01c1c6c377ec5df7ea13e931794dd5d393736f2c commit 01c1c6c377ec5df7ea13e931794dd5d393736f2c Author: huangs <huangs@chromium.org> Date: Tue Jun 28 15:27:06 2016 [Courgette] Make BSDiff search() use lexicographical_compare(). Other changes: - Moved search() to new file bsdiff_search.h. - Created BSDiffSearchTest, taking 1 test from QSufSortTest and adding 1 new test. BUG= 620867 Review-Url: https://codereview.chromium.org/2078743002 Cr-Commit-Position: refs/heads/master@{#402476} [modify] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/BUILD.gn [modify] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/courgette.gyp [modify] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/third_party/bsdiff/bsdiff_create.cc [add] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/third_party/bsdiff/bsdiff_search.h [add] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/third_party/bsdiff/bsdiff_search_unittest.cc [modify] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/third_party/bsdiff/qsufsort.h [modify] https://crrev.com/01c1c6c377ec5df7ea13e931794dd5d393736f2c/courgette/third_party/bsdiff/qsufsort_unittest.cc
,
Jun 29 2016
|
||
►
Sign in to add a comment |
||
Comment 1 by hua...@chromium.org
, Jun 16 2016There have been much refactoring in search(), but the issue appear to have been around in the original BSDiff. An example where the bug manifests: Let A = "aaaaa" and B = "aa". The sorted suffixes of "A" are (using $ as end sentinel): 0:$ => pos[0] = 5 1:a$ => pos[1] = 4 2:aa$ => pos[2] = 3 3:aaa$ => pos[3] = 2 4:aaaa$ => pos[4] = 1 5:aaaaa$ => pos[5] = 0 Applying search() on A and B should yield 2, with match position at any {2,3,4,5}. The binary search in search() looks for the highest-rank prefix that's *strictly* less than B = "aa" (1:a$). It then compares it against the next prefix (2:aa$), and choose the best matching one. However, under the existing code, the binary search on [lo, hi) would yield: [0, 5) => (0+5)/2 = 2 => 2:aa$: "aa" < "aa" is false, so hi := 2. [0, 2) => (0+2)/1 = 1 => 1:a$: We'd expect "a" < "aa" to be true, and we'd assign lo := 1. However, when we apply memcpy(), we truncate comparison length to min(a_size - pos[1], b_size) = min(5 - 4, 2) = 1 So we're effectively comparing "a" < "a". This is false, and we'd get hi := 1!! [0, 1): Done binary search. The final choices are between "0:$" and "1:a$", and best match was 1 (should be 2).