New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 904324 link

Starred by 5 users

Issue metadata

Status: Available
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 904337

Blocking:
issue 903751



Sign in to add a comment

Build with /DEBUG:GHASH on Windows

Project Member Reported by h...@chromium.org, Nov 12

Issue description

For faster linking.

See http://blog.llvm.org/2018/01/improving-link-time-on-windows-with.html

tikuta already hooked up a use_ghash gn flag in https://chromium-review.googlesource.com/c/chromium/src/+/1004335/


I don't really know what the possible downsides are. Can we just turn it on?
 
Cc: brucedaw...@chromium.org brat...@opera.com
iirc it increases obj file size and might make the compile phase slower, in exchange for faster incremental link times. I think the main blocker is to collect some numbers.
Oh, and is the hash format finalized?

The blog post talks about full sha1 hashes but suggests truncating to 8 bytes is probably enough. Zach committed that in r332669 (and r332676), but IIRC there was also talk about switching to a different algorithm?
There was talk, but I think it needs to be motivated by more data about the compile time overhead. Here's how I was planning to measure it:

1. Sum the size of all .debug$H sections in every .obj in a chromium build, calculate percentage obj size overhead to estimate extra goma network traffic cost
2. Identify obj with maximum .debug$H size for worst case profiling, rebuild, measure how much time is spent in SHA1 hashing during the compile vs. traditional DenseMap hashing

Another question is, does SHA1 have a high constant time setup cost for small records? If so, small records (think pointer types and 'const' qualified types) will dominate the cost of hashing. If not, long records (LF_FIELDLIST) will dominate the cost of hashing. We'll need to know that before deciding on a better hash.
I did some debug+component ghash builds of chrome today, and measured three things: link times, object file size vs. .debug$H size, and compile time. The summary is that links get 32% faster, but object files grow by 7%. There is no compile time impact.

Compile time impact
--------------------

There's no compile time impact because we the global type table builder, which calculates the SHA1 hashes, regardless of whether we plan to emit them. -emit-codeview-ghash-section just causes us to emit them in .debug$H. I profiled the compile of the object file with the largest ghash section (i.e. the one with the most type records), and 49 samples of 13K samples were in SHA1::body. I don't think we're being wasteful by computing these hashes, so it's not like we can rip out ghash and win back some extra compile time. In general compile time is dominated by the frontend.


Link times
----------

I chose these five DLLs because they have the largest PDBs:
blink_platform.dll blink_modules.dll blink_core.dll content.dll chrome.dll

I should probably also link browser_tests, since that's an important incremental case.

I built all DLLs, deleted each one, rebuild with `ninja -v -d keeprsp -n`, extracted the individual link lines, and re-ran them in a loop to measure. For each DLL, I did an initial /DEBUG link, and thew away the result to warm up the disk cache. Then I timed three links with /DEBUG and three with /DEBUG:GHASH. I averaged the results for each DLL, calcluated the speedup per DLL, and then averaged across all DLLs to get 32%. Each DLL had surprisingly different results, which I think reflects the relative amount of duplicate type information.

The data, units in seconds except the speedup:
	                /DEBUG	/DEBUG:GHASH	speedup %
blink_platform.dll	6.69	5.65	18.43%
blink_modules.dll	15.28	10.33	47.91%
blink_core.dll__	29.76	19.05	56.22%
content.dll______	20.38	17.49	16.57%
chrome.dll_________	37.10	30.57	21.35%

One thing to note about my methodology is that I was using the same object files. LLD might potentially get *faster* if the .debug$H section isn't present in the input. It might be worth remeasuring for that reason.

Another random statistic I gathered while doing this is that LLD uses 5.3GB of "private bytes" (I'm not sure what that means) while linking chrome.dll. I think a lot of this is allocated while doing PDB writing, and we should focus on bringing that down if we can. I already have one patch in flight to help: https://reviews.llvm.org/D54554 I was profiling with Chromium's current pinned clang, so it didn't have that patch.


Object file size impact
------------------------

For this, I looked at every file ending in .obj in the chrome build directory, and used a python script (based on ml.py!) to extract the sizes of the various .debug sections. I deleted every .obj before rebuilding 'chrome' to avoid getting stale objects. I broke it down by .debug$S, .debug$T, and .debug$H.

hash avg %	7.01%
type avg %	47.95%
sym avg %	16.99%
debug size	71.96%
	
total hash MB	2,106.53
total obj  MB	30,039.45

So, we're adding 2.1GB of object file data, which may matter to goma, but we're already paying 72% to enable debug info. I think that as we pursue further build optimizations like C++ modules, we'll find a way to "home" more type information to module object files like we do for PCH files, and that will reduce the overall amount of .debug$T and .debug$H data in the objects.


Conclusion
------------

I think we should ship it soon. :) It's worth going back and measuring a couple of other build configurations, like release+static and release+dll, and browser_tests.exe. We should also compare link times between distinct build directories, i.e. when using object files with no .debug$H section. That'll reduce the speedup, so it's important. We also need to know how much the 7% extra network traffic matters to goma. If the build is dominated by sending object files over the wire, then the 7% object file size matters more.

I also want to wait until after I land the change to rename the command line flag to "-gcodeview-ghash" from "-mllvm -emit-codeview-ghash-section" (https://reviews.llvm.org/D54370) and roll that in.
From some (old) comments of mine on  crbug.com/687728 . "I found that lld used about 18% less commit than link.exe, peaking at 6.7 GB for unit_tests.exe. Most links used less than 5 GB of commit."

Which is to say that I don't think that the private bytes used by lld-link.exe are a concern. Back in the VS days I would sometimes see 20+ GB of private bytes for PGO and LTCG builds and on a 64-GB machine that made it very difficult to get the right amount of link parallelism. At 5 GB per link it is easy. Therefore, while reducing memory would have some modest advantages I would not think it should be a priority.

I don't think that the 2.1 GB of object data will matter much to goma. It does mean we will put more pressure on disk space and disk cache but it's a small enough percentage that it is probably still worthwhile.

Blockedon: 904337
There's a CL for this in the CQ at https://chromium-review.googlesource.com/c/chromium/src/+/1388554

Chatting with rnk, we realized that this has only an effect with symbol_level=2 and that none of the bots we could find use that. So this should help devs building with full debug info, but it won't help bots.

We also realized that win/asan builds build with -gline-tables-only, but we don't pass the two ghash lines there:

https://cs.chromium.org/chromium/src/build/config/compiler/BUILD.gn?type=cs&q="is_clang+%26%26+using_sanitizer"&sq=package:chromium&g=0&l=2338


We should probably add them there too, then at least the win/asan bot would test this change and benefit from it.
Does -gline-tabls-only -gcodeview-ghash do the right thing? (I.e. only type hashes for types needed for line tables?)
Project Member

Comment 9 by bugdroid1@chromium.org, Dec 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4dcb10c9b199341e5c687d98eb9d91795d212d33

commit 4dcb10c9b199341e5c687d98eb9d91795d212d33
Author: Nico Weber <thakis@chromium.org>
Date: Fri Dec 21 20:31:43 2018

win: Enable ghash by default.

This speeds up linking by 20%-50% (average 32%), but in return makes
obj files 7% larger. Compilation speed is not affected.

Also use the new driver-level -gcodeview-ghash flag instead of an
(equivalent, but unofficial) -mllvm switch.

This only affects symbol_level=2 builds.

Bug: 904324
Change-Id: Iad4d62820b75a65df572104f9042a86cfcdbec96
Reviewed-on: https://chromium-review.googlesource.com/c/1388554
Commit-Queue: Nico Weber <thakis@chromium.org>
Reviewed-by: Reid Kleckner <rnk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#618584}
[modify] https://crrev.com/4dcb10c9b199341e5c687d98eb9d91795d212d33/build/config/compiler/BUILD.gn

> Does -gline-tabls-only -gcodeview-ghash do the right thing? (I.e. only type hashes for types needed for line tables?)

I'm pretty sure it does, but I doubt there will be a measurable win, since most records would be short.
Project Member

Comment 11 by bugdroid1@chromium.org, Dec 23

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4b5096c89d65a6a86b9797a7252c3ff3f338187c

commit 4b5096c89d65a6a86b9797a7252c3ff3f338187c
Author: Nico Weber <thakis@chromium.org>
Date: Sun Dec 23 15:39:25 2018

Revert "win: Enable ghash by default."

This reverts commit 4dcb10c9b199341e5c687d98eb9d91795d212d33.

Reason for revert: Speculative, might have caused https://crbug.com/917637

Original change's description:
> win: Enable ghash by default.
> 
> This speeds up linking by 20%-50% (average 32%), but in return makes
> obj files 7% larger. Compilation speed is not affected.
> 
> Also use the new driver-level -gcodeview-ghash flag instead of an
> (equivalent, but unofficial) -mllvm switch.
> 
> This only affects symbol_level=2 builds.
> 
> Bug: 904324
> Change-Id: Iad4d62820b75a65df572104f9042a86cfcdbec96
> Reviewed-on: https://chromium-review.googlesource.com/c/1388554
> Commit-Queue: Nico Weber <thakis@chromium.org>
> Reviewed-by: Reid Kleckner <rnk@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#618584}

TBR=thakis@chromium.org,rnk@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: 904324
Change-Id: Id09357d2cc11d613387497c1580db1e1f87324de
Reviewed-on: https://chromium-review.googlesource.com/c/1390106
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#618793}
[modify] https://crrev.com/4b5096c89d65a6a86b9797a7252c3ff3f338187c/build/config/compiler/BUILD.gn

Owner: r...@chromium.org
The official builder uploads symbols to the symbol server like this:

[00:42:23] Uploading C:\b\c\b\win64_clang\src\out\Release_x64\chrome.dll
C:\b\c\b\win64_clang\src\third_party\breakpad\symupload.exe --timeout 0 C:\b\c\b\win64_clang\src\out\Release_x64\chrome.dll http://clients2.google.com/cr/symbol
Failed, 30 before retrying upload.
... never succeeds after 3 retries

I guess to deploy this we have to dig into breakpad's symupload.exe and see what it doesn't like about our PDBs.
Cc: mosescu@chromium.org
I tried running it locally, but it can't find msdia*.dll:

$ ../../third_party/breakpad/symupload.exe chrome_child.dll --timeout 0 http://clients2.google.com/cr/symbol
CoCreateInstance CLSID_DiaSource {3BFCEA48-620F-4B6B-81F7-B9AF75454C7D} failed (msdia*.dll unregistered?)
Could not get symbol data from chrome_child.dll

I tried registering both msdia120.dll and msdia140.dll from an admin command prompt, and it succeeded, but symupload.exe produces the same output. 

I get the impression from the last change to the file that I need to use the VS 2013 toolset version of DIA to make this work:
https://chromium.googlesource.com/chromium/src/+/fe1ea4fe20c985b6996585dfd5a9d707f8c22dba

Anybody have any ideas for how to run and debug this tool? Based on the source, if there's something wrong with the PDB, I would've expected it to bail out with an error here:
https://cs.chromium.org/chromium/src/third_party/breakpad/breakpad/src/tools/windows/symupload/symupload.cc?type=cs&q=DumpSymbolsToTempFile&sq=package:chromium&g=0&l=197
Project Member

Comment 14 by bugdroid1@chromium.org, Dec 26

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fdff263cdc91940ae528971e801ca7f7d6ced5c7

commit fdff263cdc91940ae528971e801ca7f7d6ced5c7
Author: Reid Kleckner <rnk@google.com>
Date: Wed Dec 26 23:34:07 2018

win: Use new cflags to enable ghash

This reinstates crrev.com/618584 without the part that enables ghash by
default. The old flags no longer work after the last clang roll.

R=thakis@chromium.org

Bug: 904324
Change-Id: I723a3bc423b0ac749160decc36cb276ee5b1a41b
Reviewed-on: https://chromium-review.googlesource.com/c/1391415
Commit-Queue: Reid Kleckner <rnk@chromium.org>
Commit-Queue: Nico Weber <thakis@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#618980}
[modify] https://crrev.com/fdff263cdc91940ae528971e801ca7f7d6ced5c7/build/config/compiler/BUILD.gn

I dumped the types in the PDB, and I noticed some real ghash bugs (duplicate types that were not deduplicated), so I'll work on those.
I think we need some tools to detect global type hash collisions to give us more confidence that this optimization actually works. There are some minor differences between the PDBs with and without ghash enabled. For starters, there are a few extra records in the ghash one:

                      Types (TPI Stream)                     
 ============================================================
-  Showing 6,871,935 records
+  Showing 6,871,943 records
      0x1000 | LF_ARGLIST [size = 20]
               0x0603 (void*): `void*`
               0x0022 (unsigned long): `unsigned long`
...

And llvm-pdbutil dump -types crashes on the ghash type stream, so I don't think it's ready for prime time.
It looks like /debug:ghash has bugs with masm-produced object files, which have topologically unsorted type streams:
https://bugs.llvm.org/show_bug.cgi?id=40221

Sign in to add a comment