New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 605565 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug



Sign in to add a comment

Courgette fails diffing 32-bit 50.0.2661.75 -> 50.0.2661.86

Project Member Reported by mmoss@chromium.org, Apr 21 2016

Issue description

Three attempts to generate the 32-bit 50.0.2661.86_50.0.2661.75_chrome_updater.exe have all failed with:

courgette64.exe exit status -1073741571

Note that the diff of the 64-bit builds of those versions worked fine, as have all other recent previous and subsequent 32-bit build diffs. This failure only seems to crop up diffing those exact builds.


The script that generates the diff is gen_diffs.py in:
https://chrome-internal.googlesource.com/chrome/tools/release/scripts/

The input installers are:
old: https://storage.googleapis.com/chrome-signed/desktop-5c0tCh/50.0.2661.75/win/50.0.2661.75_chrome_installer.exe
new: https://storage.googleapis.com/chrome-signed/desktop-5c0tCh/50.0.2661.86/win/50.0.2661.86_chrome_installer.exe

and the courgette inputs are:
old: https://storage.googleapis.com/chrome-unsigned/desktop-5c0tCh/50.0.2661.75/win/courgette64.exe
new: https://storage.googleapis.com/chrome-unsigned/desktop-5c0tCh/50.0.2661.86/win/courgette64.exe

You can see the exact command-line used at the top of the logs.


The logs for the failed runs are at:
https://uberchromegw.corp.google.com/i/official.diffs/builders/Official%20Win%20Diffs/builds/23158/steps/Create%20Diff%20Installer%20diff-request.20160420-081933-982608.txt.Gen%20Diffs%2050.0.2661.75_chrome_installer.exe%3A50.0.2661.86_chrome_installer.exe/logs/stdio

https://uberchromegw.corp.google.com/i/official.diffs/builders/Official%20Win%20Diffs/builds/23129/steps/Create%20Diff%20Installer%20diff-request.20160419-232326-275286.txt.Gen%20Diffs%2050.0.2661.75_chrome_installer.exe%3A50.0.2661.86_chrome_installer.exe/logs/stdio

https://uberchromegw.corp.google.com/i/official.diffs/builders/Official%20Win%20Diffs/builds/23126/steps/Create%20Diff%20Installer%20diff-request.20160419-220840-766679.txt.Gen%20Diffs%2050.0.2661.75_chrome_installer.exe%3A50.0.2661.86_chrome_installer.exe/logs/stdio


 

Comment 1 by wfh@chromium.org, Apr 21 2016

Status: Started (was: Assigned)
I can reproduce this crash locally. It is a stack overflow.

Comment 2 by wfh@chromium.org, Apr 21 2016

Cc: wfh@chromium.org
Labels: -Pri-3 Pri-2
Owner: hua...@chromium.org
Status: Assigned (was: Started)
It's this recursion:

  if (start < j)
    split<T>(I, V, start, j, h); <- here

https://code.google.com/p/chromium/codesearch#chromium/src/courgette/third_party/qsufsort.h&sq=package:chromium&l=104

last touched by huangs@ in 21773eb242e96704d0ec486d1adf95f2da5d55f7

huangs@ -> can you take a look at this crash?

Comment 3 by hua...@chromium.org, Apr 22 2016

Reproduced the problem on courgette64.exe.  Read the QsufSort algo and didn't find anything obvious.

However, courgette.exe (32-bit) successfully generates the patch! TODO:
- Run with courgette.exe from before switch to VS2015; it might be a compiler issue?
- Look at disassembly of courgette::qsuf::split() in courgette64.exe, see if its recursion does something funky.

Will do these tomorrow.

Comment 4 by wfh@chromium.org, Apr 22 2016

Cc: brucedaw...@chromium.org
hmm I wonder if it's an optimization or compiler issus with VS2015? perhaps try different optimization settings for 64-bit. +brucedawson

Comment 5 by hua...@chromium.org, Apr 22 2016

Might not be compiler.  I extracted the offending 271,430,404 bytes in to a file "big" and ran "courgette[64].exe -genbsdiff big dummy out" for some small file "dummy".  Then I dump the recursion depth in split() if it exceeds 1000.

For courgette64.exe: Depth got to ~9230 on offset 200,881,164 then crashed due to stack .
For courgette.exe: Depth got to 10746 on the same offset, then succeeded.

So the success of 32-bit is due to less stack usage per frame.  There's an algorithm issue.  Potentially, hitting worst-case behavior of 3-way quicksort??  Digging some more.

Comment 6 by hua...@chromium.org, Apr 22 2016

I think I figured out the (at least "a") problem: it's due to bad choice of pivot in quicksort.  So split() applies a 3-way quicksort on I[start:end] using V[I[i] + h] as the key.  The pivot chosen is the key for the middle element:

V[I[(start + end) / 2] + h]

If keys are arranged in a "mountain" shape, e.g.:
  [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]
then pivot chooses 5.  The partition algorithm is stable for the "<" portion, so we get
  [1, 2 ,3, 4, 4, 3, 2, 1, 5, 5]
then recursion is applied to
  [1, 2, 3, 4, 4, 3, 2, 1].
So recursion depth would be size / 2, i.e., linear in the array size, and overflow ensues.

A number of ways to fix:
- Find better pivot selection: this is a pretty old problem!
- Use a different in-place sort; maybe heap sort?
- Modify this to use std::sort(); do V[] update outside of sort.

Comment 7 by wfh@chromium.org, Apr 22 2016

what about a max recursion depth, at which point it will revert back to std::sort.

Comment 8 by hua...@chromium.org, Apr 22 2016

Confirmed: Attaching the face of "Mount Doom" from data.  A small difference from the explanation above is that values are distinct; the summit looks like

  200874810
  200874812
  200874814
  200874816  <= Frodo
  200874815
  200874813
  200874811

The algorithm does quicksort manually because it updates the V[] array during the process. So yes I think std::sort() will be the cleanest solution, but we may as well do it for the entire thing. I hope we don't need a quick fix this weekend?
mount_doom.png
16.2 KB View Download

Comment 9 by mmoss@chromium.org, Apr 22 2016

I don't think the fix is super urgent. As I noted up top, the error only triggered (so far) in this one specific instance. Of course, it came at pretty much the worst time in the release cycle, but stuff happens.
Cc: chrisha@chromium.org sebmarchand@chromium.org
Dug more into this and found an simpler input case that would crash stack overflow in QSufSort: 'A' * 32767 + 'B' + 'A' * 32767 + 'C'.  Demo (Windows):

echo print("import sys"+chr(10)+"sys.stdout.write('A'*32767+'B'+'A'*32767+'C')") | python | python > big
echo print("import sys"+chr(10)+"sys.stdout.write('A'*4)") | python | python > dummy
courgette64.exe -genbsdiff big dummy out

The mechanism is rather complex. First, my 3-way partition optimization https://codereview.chromium.org/1288703002 affects this, but the problem exists also without it.  The new 3-way partition can alter order of elements, and leads to the "Mount Doom" shape.  But even if 3-way partition were stable, the problem still exists (though "Mount Doom" becomes double saw-tooth, i.e.: /|/|).

Explanation to follow.
QSufSort creates lexicographical ordered suffixes of input data.

Example 1: 'AAABAAAC' has 9 suffixes indexed by position ('$' is sentinel).  QSufSort generates:

0:AAABAAAC$       8:$
1:AABAAAC$        0:AAABAAAC$
2:ABAAAC$         4:AAAC$
3:BAAAC$          1:AABAAAC$
4:AAAC$       =>  5:AAC$
5:AAC$            2:ABAAAC$
6:AC$             6:AC$
7:C$              3:BAAAC$
8:$               7:C$

Outputs (representation as permutation):
I = [8, 0, 4, 1, 5, 2, 6, 3, 7]  (I[v] = index of the suffix with rank v)
V = [1, 3, 5, 7, 2, 4, 6, 8, 0]  (V[i] = rank of the suffix with index i) = inverse of I.

The the algorithm iterates on length-h prefixes of suffixes, for h = 1, 2, 4, 8, etc. Each iteration doubles sorted prefixes. When ambiguity prefixes exist, split() is called. So for h = 1, we'd sort ['0:A', '1:A', '2:A', '4:A', '5:A', '6:A']; and for h = 2, ['0:AA', '1:AA', '4:AA', '5:AA'].

Throughout I[] and V[] get updated with intermediate values, but we need not care about intermediate values.

For h = 2 with common prefix 'AA', we would have placed other suffixes, i.e., we'd known
V = [?, ?, 5, 7, ?, ?, 6, 8, 0]; and need to solve V[0], V[1], V[4], V[5].  The sort key used (algorithm detail) is:

key('0:AAABAAAC$') = V['2:ABAAAC$']  since 2 = 0 + h
key('1:AABAAAC$')  = V['3:BAAAC$']   since 3 = 1 + h
key('4:AAAC$')     = V['6:AC$']      since 6 = 4 + h
key('5:AAC$')      = V['7:C$']       since 7 = 5 + h

*Assuming these elements are in the original order*, the keys are V[2, 3, 6, 7] = [5, 7, 6, 8].


Example 2: 'AAAAAAABAAAAAAAC'.  The final sort is h = 4 on 'AAAA', with 8 candidates at indexes [0, 1, 2, 3, 8, 9, 10, 11].
The keys are V['4:AAABA...$', '5:AABA...$', '6:ABA...$', '7:BA...$', '12:AAAC$', '13:AAC$', '14:AC$', '15:C$'].  '$' and the 8 'AAAA...' are ranked above them.  By inspection, the keys are [9, 11, 13, 15, 10, 12, 14, 16].


The two keys seen both have the "double saw-tooth" pattern "/|/|".  Suppose we apply quicksort on these keys, using the middle element as pivot, and assuming partition is stable.  We'd have the same problem as "Mount Doom":

[9, 11, 13, 15, 10, 12, 14, 16] => pivot = 10, partition to {[9, 10], [11, 13, 15, 12, 14, 16]}, also /|/|.
[11, 13, 15, 12, 14, 16] => pivot = 12, partition to {[11, 12], [13, 15, 14, 16]}, also /|/|.
[13, 15, 14, 16] => pivot = 12, partition to {[13, 14], [15, 16]}, also /|/|.

So recursion depth is linear to stack size => bad.

The case we had uses 2^n = 65536.  In this case the final sort involves h = 16384 = length of list.  Recursion depth would be 8192, which is too much for courgette64.exe to handle on my machine.

We've assumed that partition is stable, but currently it's not.  This somehow changes the case to "Mount Doom", and is hard to analyze.  But the analysis here shows that making partition stable would not help!

Labels: Restrict-View-Google
This bug might allow user to crash internal programs running QSufSort or BSDiff?

Restricting view to be cautious.  Will check internal usages next week.
Cc: grt@chromium.org
- The original QSufSort by N. Jesper Larsson carefully chooses pivots, so likely doesn't have this problem:
http://www.larsson.dogma.net/qsufsort.c

- In BSDiff 4.3 (most recent), pivot selection is simplified to taking the element in middle, and the bug arises (reproed):
http://www.daemonology.net/bsdiff/

- The Python library bsdiffr 1.1.4 copies the .c source, and likely has the problem (didn't test):
https://pypi.python.org/pypi/bsdiff4/1.1.4

- The Go QSufSort implementation uses regular sort followed by updates, and likely avoids this problem (didn't test):
https://golang.org/src/index/suffixarray/qsufsort.go

To fix this I'll go with the Go approach of applying regular sort, followed by fixing the V[] array.
Cc: andrewhayden@chromium.org
+andrewhayden@: This is a bug in BSDiff.
Cc: hartma...@chromium.org
+hartmanng@, as an FYI (recently did some work porting bsdiff from C -> Java)
Implemented a prototype fix in Python (taking qsufsort() from courpy.py), since I'm on a train.

It's viable to rewrite split() using sort on I[start:end], using V[x + h] as key, for x in I[start:end].  The tricky part is to update I and V after!

The update loops over i for start < i <= end, look for consecutive keys (now sorted) to extract blocks, and size-1 blocks are marked as complete by assigning I[i] = -1. For each block with range [a, b), V[I[a]] = V[I[a + 1] = ... = V[I[b - 1]] = b - 1.  Problem: We write to V as we scan through V[I[i] + h] (the key, now sorted) for block boundaries -- collision may occur!

The Go version solves this by scanning for block boundaries, put into temp storage, then do a second pass.  We really want to avoid allocating more memories.

Solution: Precondition of split() is that V[I[i]] = end - 1 for all i in [start, end); and post condition is start <= V[I[i]] <= end - 1 for all i in [start, end), and these values are *never used* outside range.  So we can just write the naive code, but when we read V[I[i] + h], we "revert" to the old value by mapping anything in [start, end) to end - 1!

The algorithm change probably will also speed up BSDiff and Courgette-gen!  Should measure that.

Attached demo.  Will port to C++ and test on Courgette tomorrow.
test_qsufsort.py
4.0 KB View Download

Comment 17 by de...@chromium.org, Apr 26 2016

Hi there. With the risk of being supper annoying about this, may I suggest to take the Chrome OS (now also AOSP) implementation of bsdiff which replaced the larsson-sadakane qusufsort algorihtm with the libdivsufsort implementation which is faster and requires less RAM?

Latest bsdiff code is here: https://android.googlesource.com/platform/external/bsdiff

The CL(s) that replaced the suffix array construction algorithm is ~5 years old now and we happily use bsdiff in Chrome OS since then ( https://android.googlesource.com/platform/external/bsdiff/+/c2c68a78ac72e056cf23b7017206f8a530c0aa63 ).
Thanks for the suggestion!  The memory saving alone is definitely a big advantage.  In the short term (to fix this bug) I'd prefer with going with smaller change.  Meanwhile this is definitely worth experimenting on, and potentially adoption.  Added to list of Courgette ideas (go/courgette-ideas-2016).
Another issue: For QSufSort uses PagedArray to store results, presumably to overcome memory fragmentation by representing a large array as array of arrays.  When quicksort was done manually, operator[] wraps around this.  But to use std::sort() we'd need to implement PagedArray::iterator as RandomAccessIterator.  I'll look into this.

Tried with std::deque for this, but it's exceedingly slow.
Project Member

Comment 20 by bugdroid1@chromium.org, Apr 27 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a937d0438bab5dc41c6cfc571a4b16c0d68a6312

commit a937d0438bab5dc41c6cfc571a4b16c0d68a6312
Author: huangs <huangs@chromium.org>
Date: Wed Apr 27 18:32:22 2016

[Courgette] Fix sorting in QSufSort's split().

BSDiff implements Quicksort in its adaptation of QSufSort. To improve
robustness, we reverted the simplified pivot selection algorithm to
the original one due to Bentley & McIlroy used in QSufSort:

http://www.larsson.dogma.net/qsufsort.c

Also reverting split() selection sort threshold from 6 back to 16.

Resulting Courgette-gen and QSufSort has same performance as before
(preliminary: ~1% faster).

BUG= 605565 

Review-Url: https://codereview.chromium.org/1914303002
Cr-Commit-Position: refs/heads/master@{#390134}

[modify] https://crrev.com/a937d0438bab5dc41c6cfc571a4b16c0d68a6312/courgette/third_party/qsufsort.h

Status: Fixed (was: Assigned)
Went with changing the pivot selection algorithm, since the std::sort() method was more complex than originally thought, and made things slower.  Setting to fixed.
Labels: -Restrict-View-Google
Removing restriction label: BSDiff was known to have stack overflow issues:

http://stackoverflow.com/questions/12751775/why-does-bsdiff-exe-have-trouble-with-this-smaller-file/20305493
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=409664

Cc: thomasvl@chromium.org mark@chromium.org
Just found /src/chrome/installer/mac/third_party/bsdiff/goobsdiff.c , so CC mark@ and thomasvl@.

Maybe BSDiff, or suffix sorting should become a component?

Sign in to add a comment