ClangToTWinCFI has never been green |
|||||
Issue descriptionhttps://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FClangToTWinCFI%2F62%2F%2B%2Frecipes%2Fsteps%2Fcompile%2F0%2Fstdout FAILED: pepper_hash_for_uma.exe pepper_hash_for_uma.exe.pdb C:/b/depot_tools/win_tools-2_7_6_bin/python/bin/python.exe ../../build/toolchain/win/tool_wrapper.py link-wrapper environment.x86 False ../../third_party/llvm-build/Release+Asserts/bin/lld-link.exe /nologo /OUT:./pepper_hash_for_uma.exe /PDB:./pepper_hash_for_uma.exe.pdb @./pepper_hash_for_uma.exe.rsp Warning: request a ThreadPool with 8 threads, but LLVM_ENABLE_THREADS has been turned off LLVM ERROR: Failed to open cache file thinlto-cache\llvmcache-78E75E6C3EB0BC7B5E65B982AD80974205C86931: permission denied Two issues: 1. can't open cache file 2. lld warnings aren't fatal (also "symbol scopes are not balanced") -- they should be
,
Oct 3 2017
I was going to look at the "failed to open cache file" today.
,
Oct 3 2017
There's a third issue, "flatc.exe failed with exit code -1073741819". e.g. https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FClangToTWinCFI%2F62%2F%2B%2Frecipes%2Fsteps%2Fcompile%2F0%2Fstdout (near the end)
,
Oct 3 2017
That one sounds similar to https://bugs.chromium.org/p/chromium/issues/detail?id=707586#c6
,
Oct 5 2017
https://reviews.llvm.org/D38570 seems to fix "failed to open cache file" locally.
,
Oct 6 2017
D38570 landed in r315079. I next ran into some duplicate symbol errors which I fixed in r315026. Now I'm looking at this (linking mini_installer.exe): ..\..\..\..\llvm-project\ra\bin\lld-link.exe: warning: lto.tmp: undefined symbol: memset ..\..\..\..\llvm-project\ra\bin\lld-link.exe: warning: lto.tmp: undefined symbol: memset error: link failed (side note: why are these warnings and not errors?)
,
Oct 6 2017
> (side note: why are these warnings and not errors?) I believe that's an artifact of the older implementation of error(). IIRC, error() in COFF lld exited immediately. Here, we want to find as many undefined symbols as we can, so I chose to use warn() instead of error(). Now error() does not exit immediately, so it doesn't make sense to keep them as warnings. I'll change that.
,
Oct 6 2017
If you change the undefined symbol message from warning to error, lld will exit after the error limit is exceeded. The behavior we have now is that we report all the undefined symbols (instead of up to error limit), and then fatal. I think the behavior we have is better.
,
Oct 6 2017
But the whole point of having an ErrorLimit is to limit the total amount of errors emitted in a single run of the linker, no?
,
Oct 6 2017
> But the whole point of having an ErrorLimit is to limit the total amount of errors emitted in a single run of the linker, no? Sure, but do you see every undefined symbol as an individual error, or "there were undefined symbols" as a single error? We currently do the latter, and I am happy with it.
,
Oct 6 2017
I don't know about that. I personally prefer making it an error() because I don't want lld to wipe out everything that was previously displayed on the terminal by displaying a lot of error messages.
,
Oct 6 2017
Ok. I accepted the warn->error change.
,
Oct 7 2017
https://chromium-review.googlesource.com/#/c/chromium/src/+/706375 fixes the mini_installer.exe link failure. With that, "ninja all" succeeds! Now to deal with the test failures.
,
Oct 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/444365dd67fc5027e8e0e5b4e7a5e6a56b8176b6 commit 444365dd67fc5027e8e0e5b4e7a5e6a56b8176b6 Author: Peter Collingbourne <pcc@chromium.org> Date: Sat Oct 07 16:46:13 2017 Mark mini_installer's implementation of memset as used. Marking memset as used is necessary in order to link with LLVM link-time optimization (LTO). It prevents LTO from discarding the memset symbol, allowing for compiler-generated references to memset to be satisfied. Bug: 771194 Change-Id: I5d8c7b94772e0ee002ec4373364bfe920092f5f4 Reviewed-on: https://chromium-review.googlesource.com/706375 Reviewed-by: Nico Weber <thakis@chromium.org> Commit-Queue: Peter Collingbourne <pcc@chromium.org> Cr-Commit-Position: refs/heads/master@{#507296} [modify] https://crrev.com/444365dd67fc5027e8e0e5b4e7a5e6a56b8176b6/chrome/installer/mini_installer/mini_installer_exe_main.cc
,
Oct 9 2017
https://build.chromium.org/p/chromium.fyi/builders/ClangToTWinCFI is consistently failing with the flatc error. The bug from c#4 looks like an intermittent error, so probably not related. https://build.chromium.org/p/chromium.fyi/builders/ClangToTWinCFI64 is failing in an unusual way. Here's one recent example: https://build.chromium.org/p/chromium.fyi/builders/ClangToTWinCFI64/builds/67 It throws an exception in "steps". If I try to examine the compile log, it never finishes loading (the message at the end eventually changes to "Loading streams (has the build crashed?)..."). If I look at the log for "steps", I see this: LogDog Butler: I/O Keep-alive. LogDog Butler: I/O Keep-alive. LogDog Butler: I/O Keep-alive. LogDog Butler: I/O Keep-alive. LogDog Butler: I/O Keep-alive. LogDog Butler: I/O Keep-alive. command timed out: 2400 seconds without output, attempting to kill So I assume that one of the link jobs is being slow, and I'll need to figure out why. dpranke@: does that sound correct?
,
Oct 9 2017
Yes, that sounds correct.
,
Oct 9 2017
It seems that at least one problem is that lld seems to spend several minutes on each invocation doing nothing but pruning the ThinLTO cache directory. I tried rebuilding a small dll locally and it spent more than 10 minutes pruning a cache directory that was the result of running "ninja all". I'd imagine that the situation would be worse if multiple link processes are trying to prune the same cache directory.
,
Oct 9 2017
Is that because we have too many files in one directory? Or, is that just because it deletes a lot of files?
,
Oct 9 2017
It could be caused by too many files in one directory (of course I don't have a lot of visibility into how NTFS works, but I wouldn't be surprised to find that it deals with large directories inefficiently), but it could also be the result of inefficient use of the file system.
The ThinLTO pruner works by first doing an initial pass of the cache directory that stats every file in the cache to figure out whether to delete it (using the file size and last access time). It then does a second pass where it actually deletes files according to the results it obtained during the first pass. According to process monitor, the first pass takes just under 4 minutes, and we do end up issuing a lot of redundant syscalls in that time. Specifically, the stat on each file to figure out the size and last access time is unnecessary because FindFirstFile/FindNextFile already returns that information in a data structure.
There's also the issue that removing the files uses more system calls than necessary. Our implementation of sys::fs::remove first stats the file to determine whether it is a file or a directory, and then calls either RemoveDirectoryW or DeleteFileW based on the result. I reckon that we can eliminate the stat call by first trying DeleteFileW, and then if that doesn't work try RemoveDirectoryW.
I'm going to see what kind of speedup I can get by removing some of these stat calls. The other thing that I want to try is to change the cache layout on the file system to avoid having so many files in each directory. Basically instead of
cache-dir/
llvmcache-ABC123
llvmcache-ABC456
llvmcache-DEF123
we would have
cache-dir/
AB/
llvmcache-C123
llvmcache-C456
DE/
llvmcache-F123
,
Oct 9 2017
As far as I know, stat data is stored with the directory inode, not the file inode, in NTFS. So if you do something like
for f in files:
stat(f)
and then stat() on Windows calls the Windows stat() function (which basically goes to dirname(f), gets stat info for all files, and then returns one file), this is very inefficient.
In Ninja, I added this stat cache to mitigate against this: https://github.com/ninja-build/ninja/blob/master/src/disk_interface.cc#L174
Here's a short writeup: https://github.com/ninja-build/ninja/pull/779
,
Oct 10 2017
It appears that my change to eliminate the stat call during directory enumeration (I also switched from FindFirstFile FindFirstFileEx, and enabled a couple of performance options that are exposed by that function) as well as during sys::fs::remove improved things significantly -- lld now enumerates a cache directory containing 560k files in 1.7 seconds, and deletes 2.7k of those files in 4.5 seconds. Total link time for my small dll is reduced to 9.5 seconds. So cache pruning still dominates execution, but much less dramatically.
,
Oct 10 2017
My stat elimination patches have landed, they were https://reviews.llvm.org/D38715 (r315351) and https://reviews.llvm.org/D38716 (r315378). flatc turned out to be https://bugs.llvm.org/show_bug.cgi?id=34306 and https://reviews.llvm.org/D38769 is the fix.
,
Oct 11 2017
Fix for flatc issue landed in r315400.
,
Feb 9 2018
The first green build was https://ci.chromium.org/buildbot/chromium.clang/ToTWinCFI/620 |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by h...@chromium.org
, Oct 3 2017