LLD should support reproducible builds
Issue description
Tracking bug for LLD reproducible builds.
,
Mar 2 2018
,
Mar 2 2018
,
Mar 2 2018
This is also currently blocked on https://bugs.llvm.org/show_bug.cgi?id=35914 - setting the link timestamp to something non-zero (non-constant) but deterministic. It clearly needs to be a hash of something and we need to figure out what.

Microsoft has started doing reproducible builds, and this is discussed in this timestamp post: https://blogs.msdn.microsoft.com/oldnewthing/20180103-00/?p=97705

However, no deep details are shared. In particular, the switch used to enable reproducible builds (if there is one; maybe it's a post-build process) is not mentioned. Also, Raymond gives an inaccurate summary of the importance of the timestamp. He says:

"""The timestamp is really a unique ID that tells the loader, "The exports of this DLL have not changed since the last time anybody bound to it." And a hash is a reproducible unique ID."""

It is true that that is one use of the timestamp. Another critical use is for symbol servers, where the timestamp is often the only thing that differentiates two entries.

One tempting strategy is to set the timestamp to a 32-bit hash of the binary, not counting its possibly-changeable debug records. This means that if you do a build (and add the PDB and EXE to a symbol server), then add some comments, and then do another build - generating the same code bytes but different debug records and PDB - (and add that PDB and EXE to the symbol server), then the first EXE will be overwritten in the symbol server. The question is, does this matter? There are some edge cases where it could be confusing, but most people will never hit these, or will never notice them if they hit them, so ???
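To make the trade-off concrete, here is a minimal Python sketch of the "hash the binary, skip the debug records" idea. The helper name, the choice of SHA-1 truncated to 32 bits, and the byte strings standing in for linker output are all illustrative assumptions; this is not what LLD or the Microsoft linker actually does.

```python
import hashlib
import struct


def timestamp_from_code(code_bytes: bytes) -> int:
    """Return a deterministic 32-bit value derived only from the code bytes.

    Hypothetical helper: truncates a SHA-1 digest to 32 bits. The debug
    records are deliberately left out of the hash.
    """
    digest = hashlib.sha1(code_bytes).digest()
    return struct.unpack("<I", digest[:4])[0]


# Two builds that differ only in source comments: identical code bytes,
# different debug records (placeholder byte strings, not real PE data).
build_a = {"code": b"\x55\x8b\xec\xc3", "debug": b"debug-records-A"}
build_b = {"code": b"\x55\x8b\xec\xc3", "debug": b"debug-records-B"}

ts_a = timestamp_from_code(build_a["code"])
ts_b = timestamp_from_code(build_b["code"])

print(f"build A timestamp: {ts_a:08X}")
print(f"build B timestamp: {ts_b:08X}")
# Same code bytes give the same timestamp, so a store keyed on the
# timestamp would treat the two EXEs as the same entry.
print("timestamps collide:", ts_a == ts_b)
```

The collision on the last line is the overwrite scenario described above: the two EXEs are byte-different only in their debug records, yet they end up with the same timestamp.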
,
Mar 2 2018
,
Mar 2 2018
> One tempting strategy is to set the timestamp to a 32-bit hash of the binary, not counting its possibly-changeable debug records. This means that if you do a build (and add the PDB and EXE to a symbol server) and then add some comments and then do another build - generating the same code bytes but different debug records and PDB - (and add the PDB and EXE to a symbol server) then the first EXE will be overwritten in the symbol server. The question is, does this matter? There are some edge cases where it could be confusing but most people will never hit these, or will never notice them if they hit them, so ???

Can't we get around this by counting the debug records, instead of ignoring them as you suggest? What's the disadvantage in doing that?
,
Mar 2 2018
Hashing the debug records as well is an excellent solution as long as we still get reproducibility. I thought there were obstacles to including them but if not then make-it-so.
,
Mar 2 2018
We can include them; it's just that what we are including right now is itself a source of non-reproducibility (see bug 818241). But we can certainly just include it for now, and once that bug is fixed, things will "just work".
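As a rough illustration of the alternative discussed here, the sketch below hashes the debug records along with the code. The function name, the SHA-1 truncation, and the placeholder byte strings are assumptions made for the example, not the actual LLD implementation. It only stays reproducible if the debug records themselves are deterministic, which is the caveat raised above.

```python
import hashlib
import struct


def timestamp_from_image(code_bytes: bytes, debug_records: bytes) -> int:
    """Return a 32-bit value derived from both code and debug records.

    Hypothetical helper for illustration only.
    """
    hasher = hashlib.sha1()
    # Length-prefix each part so the boundary between code and debug
    # data is part of the hashed input.
    for part in (code_bytes, debug_records):
        hasher.update(struct.pack("<Q", len(part)))
        hasher.update(part)
    return struct.unpack("<I", hasher.digest()[:4])[0]


code = b"\x55\x8b\xec\xc3"  # placeholder code bytes

first = timestamp_from_image(code, b"debug-records-v1")
rebuilt = timestamp_from_image(code, b"debug-records-v1")
edited = timestamp_from_image(code, b"debug-records-v2")

# Identical inputs always give the same timestamp (reproducible).
print("rebuild matches:", first == rebuilt)
# A comment-only edit changes the debug records, so the timestamp is
# expected to change too (barring a 32-bit hash collision).
print("edit differs:", first != edited)
```

If the debug records contain anything non-deterministic, `first` and `rebuilt` would differ in a real build, and the timestamp would inherit that non-reproducibility until the underlying bug is fixed.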
,
Mar 2 2018
Perfect! I can't think of any problems with that.
,
Aug 3
This bug has an owner, thus, it's been triaged. Changing status to "assigned".
,
Aug 22
(Putting a hash of the binary in the timestamp broke win7, see issue 843199. This may get better if we do https://bugs.llvm.org/show_bug.cgi?id=38429 but I wouldn't bet on it)
Comment 1 by zturner@chromium.org
, Mar 2 2018