
Issue 818235

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 818243
issue 801349
issue 818241




LLD should support reproducible builds

Project Member Reported by zturner@chromium.org, Mar 2 2018

Issue description

Tracking bug for LLD reproducible builds.
 
Cc: ruiu@google.com dxf@google.com
Blockedon: 818241
Blockedon: 818243
This is also currently blocked on https://bugs.llvm.org/show_bug.cgi?id=35914 - setting the link timestamp to something non-zero (non-constant) but deterministic. It clearly needs to be a hash of something and we need to figure out what.

Microsoft has started doing reproducible builds and this is discussed in this timestamp post:

https://blogs.msdn.microsoft.com/oldnewthing/20180103-00/?p=97705

However no deep details are shared. In particular, the switch used to enable reproducible builds (if any, maybe it's a post-build process) is not mentioned.

Also, Raymond gives an incomplete summary of why the timestamp matters. He says:

"""The timestamp is really a unique ID that tells the loader, "The exports of this DLL have not changed since the last time anybody bound to it." And a hash is a reproducible unique ID."""

That is indeed one use of the timestamp. Another critical use is symbol servers, where the timestamp is often the only thing that differentiates two entries.
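
To illustrate the symbol server point: as far as I know, a symbol server files a PE binary under a key built from the header's TimeDateStamp and SizeOfImage. The sketch below assumes the usual `%08X%x` layout; the function name and the sample values are made up for illustration.

```python
def symbol_server_key(file_name: str, time_date_stamp: int, size_of_image: int) -> str:
    """Build the relative symbol-server path for a PE file (assumed layout).

    The timestamp is written as 8 uppercase hex digits, followed directly by
    SizeOfImage in lowercase hex without padding.
    """
    return f"{file_name}/{time_date_stamp:08X}{size_of_image:x}/{file_name}"


# Two builds with the same image size differ only by timestamp.
first = symbol_server_key("app.exe", 0x5A990000, 0x1F000)
second = symbol_server_key("app.exe", 0x5A990001, 0x1F000)
```

If both builds carried the same timestamp (for example a constant zero), the two keys would be identical and the second upload would replace the first.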


One tempting strategy is to set the timestamp to a 32-bit hash of the binary, not counting its possibly-changeable debug records. This means that if you do a build (and add the PDB and EXE to a symbol server) and then add some comments and then do another build - generating the same code bytes but different debug records and PDB - (and add the PDB and EXE to a symbol server) then the first EXE will be overwritten in the symbol server. The question is, does this matter? There are some edge cases where it could be confusing but most people will never hit these, or will never notice them if they hit them, so ???
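
A minimal sketch of that strategy, not what LLD actually does: zero the timestamp field and any byte ranges to ignore, hash what is left, and store the first 32 bits of the digest as the timestamp. The choice of SHA-256, the function names, and the `skip_ranges` parameter are all assumptions for illustration; locating the real debug records in a PE file is left out.

```python
import hashlib
import struct


def timestamp_offset(image: bytes) -> int:
    """Return the file offset of the COFF TimeDateStamp field."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    # Skip the signature (4 bytes), Machine (2) and NumberOfSections (2).
    return e_lfanew + 8


def stamp_with_hash(image: bytes, skip_ranges=()) -> bytes:
    """Return a copy of image whose timestamp is a 32-bit content hash.

    skip_ranges is a hypothetical list of (start, end) file offsets, e.g. the
    debug records, that must not influence the hash.
    """
    ts = timestamp_offset(image)
    scratch = bytearray(image)
    scratch[ts:ts + 4] = b"\0\0\0\0"  # the old timestamp must not feed the hash
    for start, end in skip_ranges:
        scratch[start:end] = bytes(end - start)
    digest = hashlib.sha256(scratch).digest()
    stamped = bytearray(image)
    stamped[ts:ts + 4] = digest[:4]
    return bytes(stamped)
```

With `skip_ranges` covering the debug records, two builds that differ only there get the same timestamp, which is exactly the symbol server collision described above. Passing no ranges hashes everything except the timestamp field itself.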

Blockedon: 801349
> One tempting strategy is to set the timestamp to a 32-bit hash of the binary, not counting its possibly-changeable debug records. This means that if you do a build (and add the PDB and EXE to a symbol server) and then add some comments and then do another build - generating the same code bytes but different debug records and PDB - (and add the PDB and EXE to a symbol server) then the first EXE will be overwritten in the symbol server. The question is, does this matter? There are some edge cases where it could be confusing but most people will never hit these, or will never notice them if they hit them, so ???

Can't we get around this by counting the debug records, instead of ignoring them as you suggest?  What's the disadvantage in doing that?
Hashing the debug records as well is an excellent solution as long as we still get reproducibility. I thought there were obstacles to including them but if not then make-it-so.

We can include them, it's just that what we are including right now is itself a source of non-reproducibility (see bug 818241). But we can certainly just include it for now, and once that bug is fixed, things will "just work".
Perfect! I can't think of any problems with that.
Status: Assigned (was: Untriaged)
This bug has an owner, thus, it's been triaged. Changing status to "assigned".
(Putting a hash of the binary in the timestamp broke win7, see issue 843199. This may get better if we do https://bugs.llvm.org/show_bug.cgi?id=38429 but I wouldn't bet on it)
