As more projects archive large amounts of data with low churn, keeping a local cache of the hashes becomes necessary for performance. This enables the case where more than 100GB of data is archived repeatedly.
The mechanism is:
- add a flag to "isolate archive" that specifies where to keep the hashing cache
- when the flag is specified, keep a lookup table {inode:hash} covering all files
The lookup table key is not actually the "inode" but a combination of metadata from the filesystem via os.Stat(). The namespace (mainly to determine the hashing algorithm) must also be kept.
https://www.kernel.org/pub/software/scm/git/docs/technical/racy-git.txt is a good primer on the subject.
Comment 1 by sheriffbot@chromium.org, Jan 10
Status: Untriaged (was: Available)