Build with /DEBUG:GHASH on Windows |
||||
Issue descriptionFor faster linking. See http://blog.llvm.org/2018/01/improving-link-time-on-windows-with.html tikuta already hooked up a use_ghash gn flag in https://chromium-review.googlesource.com/c/chromium/src/+/1004335/ I don't really know what the possible downsides are. Can we just turn it on?
,
Nov 13
There was talk, but I think it needs to be motivated by more data about the compile time overhead. Here's how I was planning to measure it: 1. Sum the size of all .debug$H sections in every .obj in a chromium build, calculate percentage obj size overhead to estimate extra goma network traffic cost 2. Identify obj with maximum .debug$H size for worst case profiling, rebuild, measure how much time is spent in SHA1 hashing during the compile vs. traditional DenseMap hashing Another question is, does SHA1 have a high constant time setup cost for small records? If so, small records (think pointer types and 'const' qualified types) will dominate the cost of hashing. If not, long records (LF_FIELDLIST) will dominate the cost of hashing. We'll need to know that before deciding on a better hash.
,
Nov 16
I did some debug+component ghash builds of chrome today, and measured three things: link times, object file size vs. .debug$H size, and compile time. The summary is that links get 32% faster, but object files grow by 7%. There is no compile time impact. Compile time impact -------------------- There's no compile time impact because we the global type table builder, which calculates the SHA1 hashes, regardless of whether we plan to emit them. -emit-codeview-ghash-section just causes us to emit them in .debug$H. I profiled the compile of the object file with the largest ghash section (i.e. the one with the most type records), and 49 samples of 13K samples were in SHA1::body. I don't think we're being wasteful by computing these hashes, so it's not like we can rip out ghash and win back some extra compile time. In general compile time is dominated by the frontend. Link times ---------- I chose these five DLLs because they have the largest PDBs: blink_platform.dll blink_modules.dll blink_core.dll content.dll chrome.dll I should probably also link browser_tests, since that's an important incremental case. I built all DLLs, deleted each one, rebuild with `ninja -v -d keeprsp -n`, extracted the individual link lines, and re-ran them in a loop to measure. For each DLL, I did an initial /DEBUG link, and thew away the result to warm up the disk cache. Then I timed three links with /DEBUG and three with /DEBUG:GHASH. I averaged the results for each DLL, calcluated the speedup per DLL, and then averaged across all DLLs to get 32%. Each DLL had surprisingly different results, which I think reflects the relative amount of duplicate type information. The data, units in seconds except the speedup: /DEBUG /DEBUG:GHASH speedup % blink_platform.dll 6.69 5.65 18.43% blink_modules.dll 15.28 10.33 47.91% blink_core.dll__ 29.76 19.05 56.22% content.dll______ 20.38 17.49 16.57% chrome.dll_________ 37.10 30.57 21.35% One thing to note about my methodology is that I was using the same object files. LLD might potentially get *faster* if the .debug$H section isn't present in the input. It might be worth remeasuring for that reason. Another random statistic I gathered while doing this is that LLD uses 5.3GB of "private bytes" (I'm not sure what that means) while linking chrome.dll. I think a lot of this is allocated while doing PDB writing, and we should focus on bringing that down if we can. I already have one patch in flight to help: https://reviews.llvm.org/D54554 I was profiling with Chromium's current pinned clang, so it didn't have that patch. Object file size impact ------------------------ For this, I looked at every file ending in .obj in the chrome build directory, and used a python script (based on ml.py!) to extract the sizes of the various .debug sections. I deleted every .obj before rebuilding 'chrome' to avoid getting stale objects. I broke it down by .debug$S, .debug$T, and .debug$H. hash avg % 7.01% type avg % 47.95% sym avg % 16.99% debug size 71.96% total hash MB 2,106.53 total obj MB 30,039.45 So, we're adding 2.1GB of object file data, which may matter to goma, but we're already paying 72% to enable debug info. I think that as we pursue further build optimizations like C++ modules, we'll find a way to "home" more type information to module object files like we do for PCH files, and that will reduce the overall amount of .debug$T and .debug$H data in the objects. Conclusion ------------ I think we should ship it soon. :) It's worth going back and measuring a couple of other build configurations, like release+static and release+dll, and browser_tests.exe. We should also compare link times between distinct build directories, i.e. when using object files with no .debug$H section. That'll reduce the speedup, so it's important. We also need to know how much the 7% extra network traffic matters to goma. If the build is dominated by sending object files over the wire, then the 7% object file size matters more. I also want to wait until after I land the change to rename the command line flag to "-gcodeview-ghash" from "-mllvm -emit-codeview-ghash-section" (https://reviews.llvm.org/D54370) and roll that in.
,
Nov 17
From some (old) comments of mine on crbug.com/687728 . "I found that lld used about 18% less commit than link.exe, peaking at 6.7 GB for unit_tests.exe. Most links used less than 5 GB of commit." Which is to say that I don't think that the private bytes used by lld-link.exe are a concern. Back in the VS days I would sometimes see 20+ GB of private bytes for PGO and LTCG builds and on a 64-GB machine that made it very difficult to get the right amount of link parallelism. At 5 GB per link it is easy. Therefore, while reducing memory would have some modest advantages I would not think it should be a priority. I don't think that the 2.1 GB of object data will matter much to goma. It does mean we will put more pressure on disk space and disk cache but it's a small enough percentage that it is probably still worthwhile.
,
Nov 21
,
Dec 21
There's a CL for this in the CQ at https://chromium-review.googlesource.com/c/chromium/src/+/1388554 Chatting with rnk, we realized that this has only an effect with symbol_level=2 and that none of the bots we could find use that. So this should help devs building with full debug info, but it won't help bots. We also realized that win/asan builds build with -gline-tables-only, but we don't pass the two ghash lines there: https://cs.chromium.org/chromium/src/build/config/compiler/BUILD.gn?type=cs&q="is_clang+%26%26+using_sanitizer"&sq=package:chromium&g=0&l=2338 We should probably add them there too, then at least the win/asan bot would test this change and benefit from it.
,
Dec 21
Does -gline-tabls-only -gcodeview-ghash do the right thing? (I.e. only type hashes for types needed for line tables?)
,
Dec 21
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4dcb10c9b199341e5c687d98eb9d91795d212d33 commit 4dcb10c9b199341e5c687d98eb9d91795d212d33 Author: Nico Weber <thakis@chromium.org> Date: Fri Dec 21 20:31:43 2018 win: Enable ghash by default. This speeds up linking by 20%-50% (average 32%), but in return makes obj files 7% larger. Compilation speed is not affected. Also use the new driver-level -gcodeview-ghash flag instead of an (equivalent, but unofficial) -mllvm switch. This only affects symbol_level=2 builds. Bug: 904324 Change-Id: Iad4d62820b75a65df572104f9042a86cfcdbec96 Reviewed-on: https://chromium-review.googlesource.com/c/1388554 Commit-Queue: Nico Weber <thakis@chromium.org> Reviewed-by: Reid Kleckner <rnk@chromium.org> Cr-Commit-Position: refs/heads/master@{#618584} [modify] https://crrev.com/4dcb10c9b199341e5c687d98eb9d91795d212d33/build/config/compiler/BUILD.gn
,
Dec 21
> Does -gline-tabls-only -gcodeview-ghash do the right thing? (I.e. only type hashes for types needed for line tables?) I'm pretty sure it does, but I doubt there will be a measurable win, since most records would be short.
,
Dec 23
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4b5096c89d65a6a86b9797a7252c3ff3f338187c commit 4b5096c89d65a6a86b9797a7252c3ff3f338187c Author: Nico Weber <thakis@chromium.org> Date: Sun Dec 23 15:39:25 2018 Revert "win: Enable ghash by default." This reverts commit 4dcb10c9b199341e5c687d98eb9d91795d212d33. Reason for revert: Speculative, might have caused https://crbug.com/917637 Original change's description: > win: Enable ghash by default. > > This speeds up linking by 20%-50% (average 32%), but in return makes > obj files 7% larger. Compilation speed is not affected. > > Also use the new driver-level -gcodeview-ghash flag instead of an > (equivalent, but unofficial) -mllvm switch. > > This only affects symbol_level=2 builds. > > Bug: 904324 > Change-Id: Iad4d62820b75a65df572104f9042a86cfcdbec96 > Reviewed-on: https://chromium-review.googlesource.com/c/1388554 > Commit-Queue: Nico Weber <thakis@chromium.org> > Reviewed-by: Reid Kleckner <rnk@chromium.org> > Cr-Commit-Position: refs/heads/master@{#618584} TBR=thakis@chromium.org,rnk@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Bug: 904324 Change-Id: Id09357d2cc11d613387497c1580db1e1f87324de Reviewed-on: https://chromium-review.googlesource.com/c/1390106 Reviewed-by: Nico Weber <thakis@chromium.org> Commit-Queue: Nico Weber <thakis@chromium.org> Cr-Commit-Position: refs/heads/master@{#618793} [modify] https://crrev.com/4b5096c89d65a6a86b9797a7252c3ff3f338187c/build/config/compiler/BUILD.gn
,
Dec 26
The official builder uploads symbols to the symbol server like this: [00:42:23] Uploading C:\b\c\b\win64_clang\src\out\Release_x64\chrome.dll C:\b\c\b\win64_clang\src\third_party\breakpad\symupload.exe --timeout 0 C:\b\c\b\win64_clang\src\out\Release_x64\chrome.dll http://clients2.google.com/cr/symbol Failed, 30 before retrying upload. ... never succeeds after 3 retries I guess to deploy this we have to dig into breakpad's symupload.exe and see what it doesn't like about our PDBs.
,
Dec 26
I tried running it locally, but it can't find msdia*.dll: $ ../../third_party/breakpad/symupload.exe chrome_child.dll --timeout 0 http://clients2.google.com/cr/symbol CoCreateInstance CLSID_DiaSource {3BFCEA48-620F-4B6B-81F7-B9AF75454C7D} failed (msdia*.dll unregistered?) Could not get symbol data from chrome_child.dll I tried registering both msdia120.dll and msdia140.dll from an admin command prompt, and it succeeded, but symupload.exe produces the same output. I get the impression from the last change to the file that I need to use the VS 2013 toolset version of DIA to make this work: https://chromium.googlesource.com/chromium/src/+/fe1ea4fe20c985b6996585dfd5a9d707f8c22dba Anybody have any ideas for how to run and debug this tool? Based on the source, if there's something wrong with the PDB, I would've expected it to bail out with an error here: https://cs.chromium.org/chromium/src/third_party/breakpad/breakpad/src/tools/windows/symupload/symupload.cc?type=cs&q=DumpSymbolsToTempFile&sq=package:chromium&g=0&l=197
,
Dec 26
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/fdff263cdc91940ae528971e801ca7f7d6ced5c7 commit fdff263cdc91940ae528971e801ca7f7d6ced5c7 Author: Reid Kleckner <rnk@google.com> Date: Wed Dec 26 23:34:07 2018 win: Use new cflags to enable ghash This reinstates crrev.com/618584 without the part that enables ghash by default. The old flags no longer work after the last clang roll. R=thakis@chromium.org Bug: 904324 Change-Id: I723a3bc423b0ac749160decc36cb276ee5b1a41b Reviewed-on: https://chromium-review.googlesource.com/c/1391415 Commit-Queue: Reid Kleckner <rnk@chromium.org> Commit-Queue: Nico Weber <thakis@chromium.org> Reviewed-by: Nico Weber <thakis@chromium.org> Cr-Commit-Position: refs/heads/master@{#618980} [modify] https://crrev.com/fdff263cdc91940ae528971e801ca7f7d6ced5c7/build/config/compiler/BUILD.gn
,
Dec 27
I dumped the types in the PDB, and I noticed some real ghash bugs (duplicate types that were not deduplicated), so I'll work on those.
,
Dec 27
I think we need some tools to detect global type hash collisions to give us more confidence that this optimization actually works. There are some minor differences between the PDBs with and without ghash enabled. For starters, there are a few extra records in the ghash one:
Types (TPI Stream)
============================================================
- Showing 6,871,935 records
+ Showing 6,871,943 records
0x1000 | LF_ARGLIST [size = 20]
0x0603 (void*): `void*`
0x0022 (unsigned long): `unsigned long`
...
And llvm-pdbutil dump -types crashes on the ghash type stream, so I don't think it's ready for prime time.
,
Jan 4
It looks like /debug:ghash has bugs with masm-produced object files, which have topologically unsorted type streams: https://bugs.llvm.org/show_bug.cgi?id=40221 |
||||
►
Sign in to add a comment |
||||
Comment 1 by thakis@chromium.org
, Nov 12