Issue metadata
Sign in to add a comment
|
linker appears non-deterministic on incremental builds |
||||||||||||||||||||
Issue descriptionReproduction: run chromium "Linux Builder" builder on the same revision twice. Expected: same isolate hashes Actual: different ones However, actual files isolated are the same, but tar-ed groups of small files result in different tar file hashes, likely due to varying order of small files in tar archives. Good news is that this doesn't affect huge binaries, these aren't uploaded twice, so isolate server load is substantially worse. Bad news is that swarming can't dedup tasks, because ultimate merkle tree hash varies. In practice, this has biggest effect in needlessly increasing load on swarming test pool during go/migrate2luci of a try builder.
,
Jan 23 2018
John will not be happy about this for android test load.
,
Jan 23 2018
Maybe something to do with timestamps? I think we have to override them (--mtime) if we don't want tar storing them.
,
Jan 23 2018
Yeah, it's probably order. Should we crawl the file tree and sort the list before adding to the tar?
,
Jan 23 2018
I found the relevant code, here is the fix except tests are a lot harder to fix: https://chromium-review.googlesource.com/c/infra/luci/luci-go/+/881949 I'll first rework tests, and then rebase the fix above.
,
Jan 24 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-go.git/+/835197b7c3d4e6f3fb375d40522a3f80e7ebd174 commit 835197b7c3d4e6f3fb375d40522a3f80e7ebd174 Author: Andrii Shyshkalov <tandrii@chromium.org> Date: Wed Jan 24 17:27:28 2018 [isolate] refactor tests of tar_archiver. Preparatory work for follow up fix to bug below. R=maruel@chromium.org Bug: 804869 Change-Id: I0bb7a40959460df74c1797aaef43428059ac20b8 Reviewed-on: https://chromium-review.googlesource.com/882125 Commit-Queue: Andrii Shyshkalov <tandrii@chromium.org> Reviewed-by: Erik Staab <estaab@chromium.org> [modify] https://crrev.com/835197b7c3d4e6f3fb375d40522a3f80e7ebd174/client/cmd/isolate/tar_archiver_test.go
,
Jan 30 2018
,
Jan 30 2018
,
Feb 1 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-go.git/+/2a0fccb696f11512d051636af077e3ed78b2a636 commit 2a0fccb696f11512d051636af077e3ed78b2a636 Author: Andrii Shyshkalov <tandrii@chromium.org> Date: Thu Feb 01 00:44:00 2018 [isolate] Making tarring isolate deterministic. Bug: 804869 Change-Id: I724bfd7dabc23b460aaafb446bd786ebe3eb3460 Reviewed-on: https://chromium-review.googlesource.com/881949 Commit-Queue: Andrii Shyshkalov <tandrii@chromium.org> Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org> Reviewed-by: Erik Staab <estaab@chromium.org> [modify] https://crrev.com/2a0fccb696f11512d051636af077e3ed78b2a636/client/cmd/isolate/tar_archiver.go [modify] https://crrev.com/2a0fccb696f11512d051636af077e3ed78b2a636/client/cmd/isolate/tar_archiver_test.go
,
Feb 2 2018
Proposed roll to chromium/src/ https://chromium-review.googlesource.com/#/c/chromium/src/+/898524
,
Feb 2 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2b41002ea64e29af9c50403dac86551940588532 commit 2b41002ea64e29af9c50403dac86551940588532 Author: Andrii Shyshkalov <tandrii@chromium.org> Date: Fri Feb 02 02:27:08 2018 Update isolate binary. https://crrev.com/c/881949 Making tarring isolate deterministic. R=maruel Bug: 804869 Change-Id: I3d4582eb9aa55aa764778b48027081e1d106d310 Reviewed-on: https://chromium-review.googlesource.com/898524 Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> Commit-Queue: Andrii Shyshkalov <tandrii@chromium.org> Cr-Commit-Position: refs/heads/master@{#533925} [modify] https://crrev.com/2b41002ea64e29af9c50403dac86551940588532/tools/luci-go/README.md [modify] https://crrev.com/2b41002ea64e29af9c50403dac86551940588532/tools/luci-go/linux64/isolate.sha1 [modify] https://crrev.com/2b41002ea64e29af9c50403dac86551940588532/tools/luci-go/mac64/isolate.sha1 [modify] https://crrev.com/2b41002ea64e29af9c50403dac86551940588532/tools/luci-go/win64/isolate.exe.sha1
,
Feb 5 2018
Deployed and fixed.
,
Feb 8 2018
This appears to not be fixed, though I'm not yet sure due to bug in isolatego OR due to non-deterministic build. For example, on this CL https://chromium-review.googlesource.com/c/chromium/src/+/905929/2 linux_chromium_dbg_ng builder has 3 builds, each produing different isolated hashes: https://logs.chromium.org/v/?s=chromium%2Fbuildbucket%2Fcr-buildbucket.appspot.com%2F8955223535610423264%2F%2B%2Fsteps%2Fisolate_tests%2F0%2Flogs%2Fjson.output%2F0 https://logs.chromium.org/v/?s=chromium%2Fbuildbucket%2Fcr-buildbucket.appspot.com%2F8955219793337700240%2F%2B%2Fsteps%2Fisolate_tests%2F0%2Flogs%2Fjson.output%2F0 https://logs.chromium.org/v/?s=chromium%2Fbuildbucket%2Fcr-buildbucket.appspot.com%2F8955215889715200144%2F%2B%2Fsteps%2Fisolate_tests%2F0%2Flogs%2Fjson.output%2F0
,
Feb 8 2018
,
Feb 8 2018
It appears files are actually different between runs. 2686c2686 < ./out/Debug/libGLESv2.so a22baf441d8e01f3de151fcd7eb264bd --- > ./out/Debug/libGLESv2.so 93d171f049ecfc34e484ab251215467f 2726c2726 < ./out/Debug/libcontent.so 19efcab15f131a1798394eeffcea65e0 --- > ./out/Debug/libcontent.so f58e139f9756c36a899d1e1349bdc670 2773c2773 < ./out/Debug/libgles2.so b63970d3812b267a466d6aa17bfaba49 --- > ./out/Debug/libgles2.so cdff8af5df1799a9221896d76dc547f0 2936c2936 < ./out/Debug/v8_context_snapshot.bin dab4593c4ae68f4a7d300c09e218d210 --- > ./out/Debug/v8_context_snapshot.bin a2e70d652287d75a21b88b9d96896600
,
Feb 8 2018
maruel@ is "Debug" build on linux supposed to be deterministic?
,
Feb 8 2018
Oh, I don't think so. +Yoshisato for confirmation.
,
Feb 8 2018
NVM about debug, the same applies to release bot: https://chromium-review.googlesource.com/c/chromium/src/+/867511/13 linux_chromium_rel_ng https://logs.chromium.org/v/?s=chromium%2Fbuildbucket%2Fcr-buildbucket.appspot.com%2F8955226458052322176%2F%2B%2Fsteps%2Fisolate_tests%2F0%2Flogs%2Fjson.output%2F0 https://logs.chromium.org/v/?s=chromium%2Fbuildbucket%2Fcr-buildbucket.appspot.com%2F8955223993082620688%2F%2B%2Fsteps%2Fisolate_tests%2F0%2Flogs%2Fjson.output%2F0 https://logs.chromium.org/v/?s=chromium%2Fbuildbucket%2Fcr-buildbucket.appspot.com%2F8955221570541907920%2F%2B%2Fsteps%2Fisolate_tests%2F0%2Flogs%2Fjson.output%2F0 gives for app_shell_unittests 3 diff hashes: b66c74e2f62a78569ca506f7db51b618e1894c09 44fbfd8e5dd5cf0bad152ab7184a90f80e32ddfb 98806ce774459ced7020545aa79800bc7adb01a3 Comparing resulting trees shows this diff: diff 44fbfd8e5dd5cf0bad152ab7184a90f80e32ddfb.files 98806ce774459ced7020545aa79800bc7adb01a3.files 2678c2678 < ./out/Release/app_shell_unittests ee12388757f6e40e0b8b7e48aba10850 --- > ./out/Release/app_shell_unittests 068c9d34a21ebc3bea4d2e20dd1c1970 2686c2686 < ./out/Release/libGLESv2.so 7bab9fcbedb563f91efa0b50ce284250 --- > ./out/Release/libGLESv2.so bcf3acd1b2dffdaa1e234d6a02ec83dd 2737c2737 < ./out/Release/v8_context_snapshot.bin a806462d53d1b5cea28d07db2a55ef49 --- > ./out/Release/v8_context_snapshot.bin 93b86f09944c78d34a4bd0350dfdb934
,
Feb 8 2018
Note that corresponding release builder which does two clobber builds on the same revision and compares resulting artifacts has been consistently green over last couple of days except for a few flukes unrelated to determinism. https://ci.chromium.org/buildbot/chromium.linux/Deterministic%20Linux/ I've talked to V8 person(jkummerow@) and he said that at least v8_context_snapshot.bin should be deterministic, and it's a bug if it's not.
,
Feb 8 2018
I've also compared with vbindiff the differing binaries and it not just a few bytes, but lots of bytes differ.
,
Feb 8 2018
Re: #16, #17 I think Linux debug build deterministic: https://uberchromegw.corp.google.com/i/chromium.fyi/builders/Linux%20deterministic%20%28dbg%29 If not, there should be regression.
,
Feb 8 2018
Questions:
(1) is this only on LUCI or on buildbot, too?
(2) is this only on incremental builds?
Based on deterministic CI (waterfall) builder, it certainly seems so.
,
Feb 8 2018
Ah, let me check configuration on build....I am considering unexpected config given to this builder.
,
Feb 8 2018
Re: #21 I understand my radar broken. GOMA_USE_LOCAL: false
,
Feb 8 2018
Re: #24 However, following also have the same config and I think there should not be any difference caused by remote compile and local compile because all compile goes to remote. https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_rel_ng/28892
,
Feb 8 2018
Andrii, is it possible for you to show what is different? Or, is it possible for you to upload/send me two binaries that are expected to be the same but different?
,
Feb 8 2018
Something like this should do: #!/bin/bash set -e export D=<EDIT path/tmp/iso-repro> mkdir $D cd <INFRA_CEHCKOUT>/infra/luci/client export P=b66c74e2f62a78569ca506f7db51b618e1894c09 python isolateserver.py download -I https://isolateserver.appspot.com --namespace default-gzip -s $P --target $D/$P export P=44fbfd8e5dd5cf0bad152ab7184a90f80e32ddfb python isolateserver.py download -I https://isolateserver.appspot.com --namespace default-gzip -s $P --target $D/$P export P=98806ce774459ced7020545aa79800bc7adb01a3 python isolateserver.py download -I https://isolateserver.appspot.com --namespace default-gzip -s $P --target $D/$P cd $D find . -name "v8_context_snapshot.bin" find . -name "libGLESv2.so"
,
Feb 8 2018
Re: #22 I come up with several questions on this. (1) Can binaries change if hostname etc change? If it depends on a system's archive (.a) files, it could be? I do not think linux changes its output based on environment it runs compile but I am not sure. (2) How linker read object files? If it reads timestamp order, incremental build may change the result? However, it may cause parallel build to change the result, so maybe no?
,
Feb 8 2018
> (1) Can binaries change if hostname etc change? If it depends on a system's archive (.a) files, it could be? No idea. > (2) How linker read object files? If it reads timestamp order, incremental build may change the result? However, it may cause parallel build to change the result, so maybe no? Agree that no. Judging from deterministic CI builders, where all object files are re-created after a clobber by ninja executing lots of parallel jobs, hence I think linker is fine.
,
Feb 8 2018
Meanwhile, I'm re-running linux_chromium_rel_ng on buildbot with the same fixed revision: https://ci.chromium.org/buildbot/tryserver.chromium.linux/linux_chromium_rel_ng/642405 https://ci.chromium.org/buildbot/tryserver.chromium.linux/linux_chromium_rel_ng/642406 https://ci.chromium.org/buildbot/tryserver.chromium.linux/linux_chromium_rel_ng/642407
,
Feb 8 2018
Re: #27 & Re:#29 (2) I would like to listen to Rui's advice on this. As far as I see libGLESv2.so of two builds, order of strings in .rodata ELF section seems to be different. Note that size and offset of sections seems to be the same, I suppose a linker changed the order between two builds.
,
Feb 8 2018
Re: #31 Not only .rodata but also .rela.dyn and .text are different.
,
Feb 8 2018
Re #30: 3 buildbot builds also produced different hashes, so the problem seems to be general to incremental builds.
,
Feb 12 2018
changing title to better represent the problem.
,
Feb 12 2018
Discussed with ruiu@ in IM: much more likely to be a bug with different object files, than a linker. I guess I should have done this before: get a snapshot of all files in Release directory.
,
Feb 13 2018
In go/deterministic, it uplaods the difference but it only compares objects in the same machine. In this case, I guess we need to upload all object files, dll and executables that could be different.
,
Feb 28 2018
,
Mar 6 2018
,
Jun 15 2018
,
Jun 21 2018
,
Aug 3
Added Takuto in case he is interested in.
,
Aug 3
Oops, unexpected removal of Cc X(
,
Aug 3
,
Aug 3
,
Aug 28
Assigning to erikchen@ since what he is fixing deterministic clobber builds, which is a necessary requirement for making deterministic incremental builds. |
|||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||
Comment 1 by tandrii@chromium.org
, Jan 23 2018