New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 804869 link

Starred by 4 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug-Regression

Blocking:
issue 869348



Sign in to add a comment

linker appears non-deterministic on incremental builds

Project Member Reported by tandrii@chromium.org, Jan 23 2018

Issue description

Reproduction:
 run chromium "Linux Builder" builder on the same revision twice.
Expected: same isolate hashes
Actual: different ones

However, actual files isolated are the same, but tar-ed groups of small files result in different tar file hashes, likely due to varying order of small files in tar archives.

Good news is that this doesn't affect huge binaries, these aren't uploaded twice, so isolate server load is substantially worse. Bad news is that swarming can't dedup tasks, because ultimate merkle tree hash varies.

In practice, this has biggest effect in needlessly increasing load on swarming test pool during go/migrate2luci of a try builder.
 
To be clear: this isn't blocking luci migration, at least we can use lower % in luci-migration app to ensure lower impact on swarming test pool.

Comment 2 by mar...@chromium.org, Jan 23 2018

Cc: jbudorick@chromium.org
Labels: -Pri-2 Pri-1
John will not be happy about this for android test load.

Comment 3 by estaab@chromium.org, Jan 23 2018

Maybe something to do with timestamps? I think we have to override them (--mtime) if we don't want tar storing them.

Comment 4 by estaab@chromium.org, Jan 23 2018

Yeah, it's probably order. Should we crawl the file tree and sort the list before adding to the tar?
I found the relevant code, here is the fix except tests are a lot harder to fix: https://chromium-review.googlesource.com/c/infra/luci/luci-go/+/881949

I'll first rework tests, and then rebase the fix above.
Project Member

Comment 6 by bugdroid1@chromium.org, Jan 24 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/835197b7c3d4e6f3fb375d40522a3f80e7ebd174

commit 835197b7c3d4e6f3fb375d40522a3f80e7ebd174
Author: Andrii Shyshkalov <tandrii@chromium.org>
Date: Wed Jan 24 17:27:28 2018

[isolate] refactor tests of tar_archiver.

Preparatory work for follow up fix to bug below.

R=maruel@chromium.org

Bug: 804869
Change-Id: I0bb7a40959460df74c1797aaef43428059ac20b8
Reviewed-on: https://chromium-review.googlesource.com/882125
Commit-Queue: Andrii Shyshkalov <tandrii@chromium.org>
Reviewed-by: Erik Staab <estaab@chromium.org>

[modify] https://crrev.com/835197b7c3d4e6f3fb375d40522a3f80e7ebd174/client/cmd/isolate/tar_archiver_test.go

Comment 7 by efoo@chromium.org, Jan 30 2018

Labels: LUCI-Backlog

Comment 8 by efoo@chromium.org, Jan 30 2018

Labels: -LUCI-M0-Backlog
Project Member

Comment 9 by bugdroid1@chromium.org, Feb 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/2a0fccb696f11512d051636af077e3ed78b2a636

commit 2a0fccb696f11512d051636af077e3ed78b2a636
Author: Andrii Shyshkalov <tandrii@chromium.org>
Date: Thu Feb 01 00:44:00 2018

[isolate] Making tarring isolate deterministic.

Bug: 804869
Change-Id: I724bfd7dabc23b460aaafb446bd786ebe3eb3460
Reviewed-on: https://chromium-review.googlesource.com/881949
Commit-Queue: Andrii Shyshkalov <tandrii@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Erik Staab <estaab@chromium.org>

[modify] https://crrev.com/2a0fccb696f11512d051636af077e3ed78b2a636/client/cmd/isolate/tar_archiver.go
[modify] https://crrev.com/2a0fccb696f11512d051636af077e3ed78b2a636/client/cmd/isolate/tar_archiver_test.go

Owner: tandrii@chromium.org
Status: Started (was: Available)
Proposed roll to chromium/src/
https://chromium-review.googlesource.com/#/c/chromium/src/+/898524
Project Member

Comment 11 by bugdroid1@chromium.org, Feb 2 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2b41002ea64e29af9c50403dac86551940588532

commit 2b41002ea64e29af9c50403dac86551940588532
Author: Andrii Shyshkalov <tandrii@chromium.org>
Date: Fri Feb 02 02:27:08 2018

Update isolate binary.

https://crrev.com/c/881949 Making tarring isolate deterministic.

R=maruel

Bug: 804869
Change-Id: I3d4582eb9aa55aa764778b48027081e1d106d310
Reviewed-on: https://chromium-review.googlesource.com/898524
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>
Commit-Queue: Andrii Shyshkalov <tandrii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#533925}
[modify] https://crrev.com/2b41002ea64e29af9c50403dac86551940588532/tools/luci-go/README.md
[modify] https://crrev.com/2b41002ea64e29af9c50403dac86551940588532/tools/luci-go/linux64/isolate.sha1
[modify] https://crrev.com/2b41002ea64e29af9c50403dac86551940588532/tools/luci-go/mac64/isolate.sha1
[modify] https://crrev.com/2b41002ea64e29af9c50403dac86551940588532/tools/luci-go/win64/isolate.exe.sha1

Status: Fixed (was: Started)
Deployed and fixed.
Status: Started (was: Fixed)
It appears files are actually different between runs.

2686c2686
< ./out/Debug/libGLESv2.so a22baf441d8e01f3de151fcd7eb264bd
---
> ./out/Debug/libGLESv2.so 93d171f049ecfc34e484ab251215467f
2726c2726
< ./out/Debug/libcontent.so 19efcab15f131a1798394eeffcea65e0
---
> ./out/Debug/libcontent.so f58e139f9756c36a899d1e1349bdc670
2773c2773
< ./out/Debug/libgles2.so b63970d3812b267a466d6aa17bfaba49
---
> ./out/Debug/libgles2.so cdff8af5df1799a9221896d76dc547f0
2936c2936
< ./out/Debug/v8_context_snapshot.bin dab4593c4ae68f4a7d300c09e218d210
---
> ./out/Debug/v8_context_snapshot.bin a2e70d652287d75a21b88b9d96896600
maruel@ is "Debug" build on linux supposed to be deterministic?
Cc: yyanagisawa@chromium.org
Oh, I don't think so. +Yoshisato for confirmation.
NVM about debug, the same applies to release bot:

https://chromium-review.googlesource.com/c/chromium/src/+/867511/13
linux_chromium_rel_ng
https://logs.chromium.org/v/?s=chromium%2Fbuildbucket%2Fcr-buildbucket.appspot.com%2F8955226458052322176%2F%2B%2Fsteps%2Fisolate_tests%2F0%2Flogs%2Fjson.output%2F0
https://logs.chromium.org/v/?s=chromium%2Fbuildbucket%2Fcr-buildbucket.appspot.com%2F8955223993082620688%2F%2B%2Fsteps%2Fisolate_tests%2F0%2Flogs%2Fjson.output%2F0
https://logs.chromium.org/v/?s=chromium%2Fbuildbucket%2Fcr-buildbucket.appspot.com%2F8955221570541907920%2F%2B%2Fsteps%2Fisolate_tests%2F0%2Flogs%2Fjson.output%2F0

gives for app_shell_unittests 3 diff hashes:
b66c74e2f62a78569ca506f7db51b618e1894c09
44fbfd8e5dd5cf0bad152ab7184a90f80e32ddfb
98806ce774459ced7020545aa79800bc7adb01a3


Comparing resulting trees shows this diff:

diff 44fbfd8e5dd5cf0bad152ab7184a90f80e32ddfb.files 98806ce774459ced7020545aa79800bc7adb01a3.files
2678c2678
< ./out/Release/app_shell_unittests ee12388757f6e40e0b8b7e48aba10850
---
> ./out/Release/app_shell_unittests 068c9d34a21ebc3bea4d2e20dd1c1970
2686c2686
< ./out/Release/libGLESv2.so 7bab9fcbedb563f91efa0b50ce284250
---
> ./out/Release/libGLESv2.so bcf3acd1b2dffdaa1e234d6a02ec83dd
2737c2737
< ./out/Release/v8_context_snapshot.bin a806462d53d1b5cea28d07db2a55ef49
---
> ./out/Release/v8_context_snapshot.bin 93b86f09944c78d34a4bd0350dfdb934

Note that corresponding release builder which does two clobber builds on the same revision and compares resulting artifacts has been consistently green over last couple of days except for a few flukes unrelated to determinism.
https://ci.chromium.org/buildbot/chromium.linux/Deterministic%20Linux/

I've talked to V8 person(jkummerow@) and he said that at least v8_context_snapshot.bin should be deterministic, and it's a bug if it's not.
I've also compared with vbindiff the differing binaries and it not just a few bytes, but lots of bytes differ.
Re: #16, #17
I think Linux debug build deterministic: 
https://uberchromegw.corp.google.com/i/chromium.fyi/builders/Linux%20deterministic%20%28dbg%29

If not, there should be regression.
Questions:
(1) is this only on LUCI or on buildbot, too?
(2) is this only on incremental builds?
    Based on deterministic CI (waterfall) builder, it certainly seems so.
Ah, let me check configuration on build....I am considering unexpected config given to this builder.
Re: #21
I understand my radar broken.
GOMA_USE_LOCAL: false
Re: #24
However, following also have the same config and I think there should not be any difference caused by remote compile and local compile because all compile goes to remote.
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_rel_ng/28892
Andrii, is it possible for you to show what is different?
Or, is it possible for you to upload/send me two binaries that are expected to be the same but different?
Something like this should do:

#!/bin/bash
set -e
export D=<EDIT path/tmp/iso-repro>
mkdir $D
cd <INFRA_CEHCKOUT>/infra/luci/client

export P=b66c74e2f62a78569ca506f7db51b618e1894c09
python isolateserver.py download -I https://isolateserver.appspot.com --namespace default-gzip -s $P --target $D/$P

export P=44fbfd8e5dd5cf0bad152ab7184a90f80e32ddfb
python isolateserver.py download -I https://isolateserver.appspot.com --namespace default-gzip -s $P --target $D/$P

export P=98806ce774459ced7020545aa79800bc7adb01a3
python isolateserver.py download -I https://isolateserver.appspot.com --namespace default-gzip -s $P --target $D/$P

cd $D
find . -name "v8_context_snapshot.bin"
find . -name "libGLESv2.so"
Re: #22

I come up with several questions on this.
(1)
Can binaries change if hostname etc change?  If it depends on a system's archive (.a) files, it could be?
I do not think linux changes its output based on environment it runs compile but I am not sure.

(2)
How linker read object files?  If it reads timestamp order, incremental build may change the result?  However, it may cause parallel build to change the result, so maybe no?
> (1) Can binaries change if hostname etc change?  If it depends on a system's archive (.a) files, it could be?

No idea.


> (2) How linker read object files?  If it reads timestamp order, incremental build may change the result?  However, it may cause parallel build to change the result, so maybe no?

Agree that no. Judging from deterministic CI builders, where all object files are re-created after a clobber by ninja executing lots of parallel jobs, hence I think linker is fine.
Cc: ruiu@google.com
Re: #27 & Re:#29 (2)

I would like to listen to Rui's advice on this.
As far as I see libGLESv2.so of two builds, order of strings in .rodata ELF section seems to be different.  Note that size and offset of sections seems to be the same, I suppose a linker changed the order between two builds.
Re: #31
Not only .rodata but also .rela.dyn and .text are different.
Re #30: 3 buildbot builds also produced different hashes, so the problem seems to be general to incremental builds.
Summary: linker appears non-deterministic on incremental builds (was: isolate go batcharchive is non-deterministic wrt to files being isolated)
changing title to better represent the problem.
Discussed with ruiu@ in IM: much more likely to be a bug with different object files, than a linker.

I guess I should have done this before: get a snapshot of all files in Release directory.
In go/deterministic, it uplaods the difference but it only compares objects in the same machine.  In this case, I guess we need to upload all object files, dll and executables that could be different.

Comment 37 by efoo@chromium.org, Feb 28 2018

Labels: -LUCI-Backlog LUCI-Chromium-CQSets LUCI-Blocker-Chromium-CQSets

Comment 38 by efoo@chromium.org, Mar 6 2018

Labels: -LUCI-Blocker-Chromium-CQSets
Cc: tandrii@chromium.org
Owner: ----
Status: Available (was: Started)
Components: -Infra>Platform>Swarming Infra>Platform>Swarming>Admin
Cc: -no...@chromium.org tikuta@chromium.org
Added Takuto in case he is interested in.
Cc: -yyanagisawa@chromium.org no...@chromium.org
Oops, unexpected removal of Cc X(
Cc: yyanagisawa@chromium.org
Blocking: 869348
Labels: -Restrict-View-Google
(doesn't look like this needs to be private)
Cc: erikc...@chromium.org
Owner: erikc...@chromium.org
Status: Assigned (was: Available)
Assigning to erikchen@ since what he is fixing deterministic clobber builds, which is a necessary requirement for making deterministic incremental builds.

Sign in to add a comment