Windows build with GN takes much longer than GYP
Issue description

As reported on IRC. Bruce has offered to take a look. Please add more labels and dependencies as needed.

At r406702, I tried to build the chrome target with GN, and towards the end, link.exe ends up taking 13+ GB of RAM and 10 minutes to link.

I started with:
* is_debug = true, use_goma = true, is_component_build = true

and then tried:
* is_debug = true, use_goma = true, is_component_build = true, target_cpu = "x86"
* is_debug = false, use_goma = true, is_component_build = true, target_cpu = "x86"

and did not see any performance increases. The out\gn directory was ~78 GB.

Then I switched to GYP and built with "component=shared_library use_goma=1" and I saw link.exe barely use 1 GB and quickly finish. My out\Debug directory is 15 GB.
,
Jul 21 2016
tl;dr: goma

My initial tests suggest that this should be a non-issue for component builds. I tested 32-bit/64-bit gyp/gn component builds of the 'chrome' target. The linker commit sizes were 12-18% higher (maximum of 6.2 GB) for gn than gyp when linking chrome.dll. Everything else was less.

On non-component builds the biggest linker commit sizes were for chrome_child.dll. Those were about 52-55% higher (maximum of 10.0 GB) for gn than gyp. The cause of this is known and is being improved as a side-effect of the work on crbug.com/624274

The biggest difference - the only difference - appears to be goma. I thought that the usual advice was to not use goma on Windows. This could explain some of the behavior we are seeing on the bots - if goma is causing memory usage to more than double for linking then that could easily be killing performance.
,
Jul 21 2016
linking doesn't go through goma, though? we certainly use goma extensively on the bots (though not the official bots).
,
Jul 21 2016
Goma changes how symbols are accumulated. It puts them in the .obj files which means that the linker then has much more work to do. The short-term solution for thestig@ is probably to stop using goma. I'm looking now to see why goma is having a larger effect on gn than gyp.
,
Jul 21 2016
+ukai FYI. Thanks for checking Bruce. I didn't know I have a goma allergy. I can do GN without goma for a bit. I have enough cores.
,
Jul 21 2016
> Goma changes how symbols are accumulated. It puts them in the .obj files which
> means that the linker then has much more work to do.

True. Perhaps that's yet another side effect of the source_set/static_library issue.
,
Jul 21 2016
Curious why others haven't noticed / complained earlier. Or perhaps they did and we just didn't turn it into a bug?
,
Jul 21 2016
Most users don't use goma on local builds, I believe, and if they did we probably just told them not to. The number of build configurations that we routinely use is daunting: gyp/gn, 32-bit/64-bit, component/static, goma/non-goma, debug/release, and for a while we had VS2013/VS2015. This makes detecting and characterizing a regression tricky.
,
Jul 22 2016
Interesting. I figured everyone was using goma everywhere. In the meantime, I'll try symbol_level = 0 and see if that gets my build going fast again. I mostly do printf debugging so the symbols are not that useful anyway.
,
Jul 22 2016
Found it. This logic exists in gyp but was not replicated to gn:
['OS=="win" and use_goma==1', {
  # goma doesn't support pch yet.
  'chromium_win_pch': 0,
  # goma doesn't support PDB yet, so win_z7=1 or fastbuild=1.
  'conditions': [
    ['win_z7==0 and fastbuild==0', {
      'fastbuild': 1,
    }],
  ],
}],
fastbuild==1 means that debugging information is disabled in the compiler. This means that object files are (based on a quick test in base) 6x smaller in gyp-goma builds than in regular gyp builds or in gn builds (with or without goma). This then makes linking much faster.
It also makes the debugging experience for gyp-goma builds pretty terrible - you get call stacks with function names but no local variables or types. But sometimes that is sufficient.
- This fully explains this regression
- This also explains gn build slowdowns on the build machines
The win_z7 flag is a GYP_DEFINE which developers can use to override the lack of symbols in goma builds, but I don't know if anybody ever uses it.
Easy fix. I will land something tomorrow.
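To make the effect of the gyp condition quoted above easier to follow, here is a minimal Python sketch that models it. The function name `resolve_goma_defaults` and the starting defaults (`chromium_win_pch=1`, `win_z7=0`, `fastbuild=0`) are assumptions made for illustration; this is not gyp code and not the fix that landed in gn.

```python
def resolve_goma_defaults(os_name, use_goma, win_z7=0, fastbuild=0, chromium_win_pch=1):
    """Model the quoted gyp condition and return the adjusted variables.

    Hypothetical helper: the default values are assumptions, not gyp's own.
    """
    settings = {
        "chromium_win_pch": chromium_win_pch,
        "win_z7": win_z7,
        "fastbuild": fastbuild,
    }
    if os_name == "win" and use_goma == 1:
        # goma doesn't support pch, so precompiled headers are turned off.
        settings["chromium_win_pch"] = 0
        # goma doesn't support PDBs: unless win_z7 or fastbuild is already
        # set, force fastbuild so the compiler emits no debug information.
        if win_z7 == 0 and fastbuild == 0:
            settings["fastbuild"] = 1
    return settings


if __name__ == "__main__":
    print(resolve_goma_defaults("win", use_goma=1))
    print(resolve_goma_defaults("win", use_goma=1, win_z7=1))
    print(resolve_goma_defaults("win", use_goma=0))
```

The second call shows the win_z7 override: fastbuild stays 0, so symbols are kept and end up in the object files.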
,
Jul 22 2016
Ah! Good catch. Yeah, you would need to change the default value of symbol_level based on use_goma, I guess. We didn't port win_z7 over to GN, and so far no one has complained.
,
Jul 22 2016
,
Jul 22 2016
Does symbol_level = 2 not work for goma? It seems weird to relate these two things. I normally don't use goma because of the poor symbols. If I could get some moderate speedup for my builds with full symbols that would be better.
,
Jul 22 2016
I believe symbol_level=2 works fine, it just doesn't produce PDBs so the object files are much larger (i.e., win_z7 means that the debugging code is in the .o itself).
,
Jul 22 2016
Goma should work with symbol_level=2. https://cs.chromium.org/chromium/src/build/config/compiler/BUILD.gn?q=build/config/compiler/BUILD.gn&sq=package:chromium&dr&l=1561 https://cs.chromium.org/chromium/src/build/config/compiler/BUILD.gn?q=build/config/compiler/BUILD.gn&sq=package:chromium&dr&l=1490 But, yeah, it includes symbols in object files, so link could be slower.
,
Jul 22 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/615601219acf13abf60ac400a5c07333da6fe897

commit 615601219acf13abf60ac400a5c07333da6fe897
Author: brucedawson <brucedawson@chromium.org>
Date: Fri Jul 22 20:33:24 2016

Disable compilation symbols on goma builds

Goma builds necessarily don't use PDBs. That means that symbols have to be put in .obj files. This means that symbols get stored redundantly, which makes the job of the linker much harder - greatly increased linker working sets and greatly increased link times.

gyp deals with this by disabling symbol generation in the compile stage for goma builds. gn has not been doing this, which has made gn goma builds *significantly* slower, unless symbol_level is explicitly set to 1.

gyp has an override switch (win_z7) but it appears that nobody uses it so for now I am leaving it unimplemented in gn.

BUG= 630074
Review-Url: https://codereview.chromium.org/2174873002
Cr-Commit-Position: refs/heads/master@{#407251}

[modify] https://crrev.com/615601219acf13abf60ac400a5c07333da6fe897/build/config/compiler/BUILD.gn
,
Jul 22 2016
> I believe symbol_level=2 works fine

For some definition of fine. It works "fine" in the sense that you get binaries and you get symbols, but the links are extremely slow. The purpose of goma is to improve build times, but using goma with symbol_level==2 leads to build times that are probably *worse*:
- incremental builds are certainly worse - thestig@ was reporting much longer link times
- full builds might also be worse, because the links are so much slower, network and disk traffic is much higher, and page-file thrashing is highly likely.

So that is why (as far as I can tell) goma plus symbol_level==2 (fastbuild==0) was prohibited by default in gyp. gyp does let you force symbol_level==2 on with goma by setting win_z7=1, but as far as I can tell nobody uses that.
,
Jul 24 2016
,
Aug 11 2016
I think we at least need a way to turn on debug info with goma. We had to add a special case for is_asan for ClusterFuzz in http://crbug.com/635715 .

Is it reasonable to try to arrange things like this?
gn gen # Defaults to symbol_level=2, i.e. /Zi
gn gen --args="use_goma=1" # Implies no debug info
gn gen --args="use_goma=1 symbol_level=2" # Implies /Z7

Or do we really need a separate win_z7 argument?

Would /debug:fastlink help here? I remember doing some measurements to show that it did.
,
Aug 11 2016
I have no objection to adding /z7 if we need it and it makes sense to do so, and your suggestion seems like it would work (apart from it needing to be use_goma=true, of course ;). However, the point of this was that goma+full debugging was slower than local builds w/ full debugging, so maybe the answer here is that if you need debug info, just don't use goma? I'm never quite sure what the situation w/ /debug:fastlink is, so maybe bruce has thoughts.
,
Aug 11 2016
I would assume /debug:fastlink doesn't work with /z7? But I don't see any specific documentation about that either way.
,
Aug 11 2016
/debug:fastlink does work with /Z7, at least somewhat. I made these numbers using clang's /Z7 output for http://crbug.com/589977#c16 , which was checking the link time of webcore_shared.dll:
/debug : clean: 2m28.148s, incremental: 0m4.037s
/debug:fastlink : clean: 1m04.115s, incremental: 0m3.522s

MSVC's /Zi is still better than clang's /Z7:
MSVC /Zi /debug : clean 1m14.861s, incremental: 0m2.646s

I didn't measure MSVC /Z7 /debug:fastlink, unfortunately.

/Zi actually only puts *type* information into the PDB. The *symbol* information is still in the object files (.debug$S). I think this is done because a lot of symbol information is from inline functions which may be discarded during linking. There are actually many comdat .debug$S sections, and I think the linker discards them in the usual way if the associated .text section is discarded. After that, the linker stuffs them in the PDB.

So, /debug:fastlink is probably getting some of its speedup by leaving .debug$S in object files, and only making an index.
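As a quick check on the clean-link timings quoted above, the following Python sketch converts them to seconds and compares each configuration against the clang /Z7 /debug baseline. The helper name `to_seconds` is made up for this example; the numbers are copied from the comment.

```python
def to_seconds(text):
    """Convert a duration such as '2m28.148s' into seconds."""
    minutes, rest = text.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


# Clean link times of webcore_shared.dll, as quoted in the comment above.
clean = {
    "clang /Z7 /debug": to_seconds("2m28.148s"),
    "clang /Z7 /debug:fastlink": to_seconds("1m04.115s"),
    "MSVC /Zi /debug": to_seconds("1m14.861s"),
}

baseline = clean["clang /Z7 /debug"]
for name, seconds in clean.items():
    # A ratio above 1.0 means the configuration links faster than the baseline.
    print(f"{name}: {seconds:.3f}s, {baseline / seconds:.2f}x vs clang /Z7 /debug")
```

On these figures /debug:fastlink cuts the clean clang /Z7 link by a bit more than half, and MSVC /Zi with plain /debug is about twice as fast as clang /Z7 with plain /debug.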
,
Aug 11 2016
> So, /debug:fastlink is probably getting some of its speedup by leaving
> .debug$S in object files, and only making an index.

Or all of its speedup from that. The only design change I am aware of with /debug:fastlink is putting in references to files containing debug information rather than copying it all to the PDB.

I think that on most (maybe all) machines goma+z7 is going to be slower than a local build with debug information. The memory consumed by linking is extreme (15 GB for chrome.dll IIRC) which can easily lead to swapping and disk thrashing on even a powerful machine.
,
Feb 23 2017
I'm not sure why this CL description didn't get appended to this bug, but I'm pasting it in now. With this change it should be practical to use "use_goma = true" with "symbol_level = 2" as long as "is_win_fastlink = true" is also specified. Read below for other args.gn suggestions and for the limitations:

Allow using goma on Windows with symbol_level == 2

Relanding with fix for amd64-generic Trusty build failure. Previous version was crrev.com/2661023010.

Traditionally goma for Chrome has been less useful on Windows than on other platforms because it was incompatible with full debug information. Building with goma requires using /Z7 instead of /Zi, and this causes the linker's memory usage and runtime to blow up as all of the debug information is merged.

However, /debug:fastlink makes this work. Because it doesn't merge all of the debug information it makes goma and /Z7 practical. Full release component builds can be done in less than fifteen minutes, with incremental builds taking just a few seconds. Without goma a full release component build of Chrome can easily take 40+ minutes, even on a Z840.

Goma's speedup comes from massively parallelizing the compile phase, however even with /debug:fastlink the linking phases are longer with the /Z7 switch that is required by goma. A /debug:fastlink of chrome.dll in a component build goes from ~32 seconds to ~110 seconds on a Z620 when /Z7 (goma) is selected. This penalty will be reduced by VC++ 2017 which claims 30% speed improvements on /debug:fastlink.

So, if you frequently need to do full relinks then goma may still be a bad choice. However for most scenarios this change should make goma a good choice for component builds of Chrome, even with full debug information enabled.

To make use of this ability you need to explicitly specify the switches below - symbol_level will otherwise default to 1 when use_goma == true.
use_goma = true
is_win_fastlink = true
symbol_level = 2

In addition, these two settings are strongly recommended if you want the fastest possible builds:
is_component_build = true
target_cpu="x86" - to ensure incremental linking always works

The fastlink PDBs work well for most scenarios but there are a few scenarios where they do not work:
- Copying PDBs to another machine (fails due to references to the .obj files)
- Reporting on all symbols/globals with SymbolSort or windbg's "x" command or similar will report nothing
- ETW tracing fails
- VS heap profiling fails

Ideally mspdbcmf.exe would let us create "normal" PDBs when needed but in practice this appears to be unusable with /Z7 created PDBs.

BUG= 630074 , 688203
Review-Url: https://codereview.chromium.org/2702203004
Cr-Commit-Position: refs/heads/master@{#452171}
Committed: https://chromium.googlesource.com/chromium/src/+/77bdd81b8517ef0166ba14549545fa1f27d9bfec
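The argument combinations described in this CL description can be summarized as a small Python sketch. `check_goma_args` is a hypothetical helper written for this illustration and is not part of gn or of the CL. The non-goma default of symbol_level = 2 is an assumption taken from an earlier comment in this thread, while the goma default of 1 comes from the description above.

```python
def check_goma_args(args):
    """Return warnings for a dict of GN args (hypothetical helper, not gn code)."""
    warnings = []
    use_goma = args.get("use_goma", False)
    # Per the CL description, symbol_level defaults to 1 when use_goma is true.
    # The default of 2 without goma is an assumption based on this thread.
    symbol_level = args.get("symbol_level", 1 if use_goma else 2)
    if use_goma and symbol_level == 2 and not args.get("is_win_fastlink", False):
        warnings.append("use_goma with symbol_level = 2 needs is_win_fastlink = true")
    if not args.get("is_component_build", False):
        warnings.append("is_component_build = true is recommended for the fastest builds")
    if args.get("target_cpu") != "x86":
        warnings.append('target_cpu = "x86" is recommended so incremental linking always works')
    return warnings


if __name__ == "__main__":
    recommended = {
        "use_goma": True,
        "is_win_fastlink": True,
        "symbol_level": 2,
        "is_component_build": True,
        "target_cpu": "x86",
    }
    print(check_goma_args(recommended))
    print(check_goma_args({"use_goma": True, "symbol_level": 2}))
```

The first call uses the full set of recommended switches and produces no warnings; the second omits is_win_fastlink and the two speed-related settings, so all three warnings are returned.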
,
Aug 7 2017
Clang never uses type server PDBs (/Zi) and always puts types in object files (/Z7). Should we revisit this check? Whether goma is used or not won't affect the final link times. The final link times are admittedly long for non-component builds, but this check currently makes it harder to reproduce a static build with symbols, which is pretty similar to the official build configuration.
,
Aug 7 2017
With VC++ we were seeing ~25 GB working sets for the linker with goma builds without fastlink. Avoiding that situation is what drove the check. If clang link memory is reasonable without fastlink then I agree that we should stop requiring fastlink, at least on clang builds. I am curious as to how clang avoids the memory explosion that plagues VC++ when building with /Z7 - I would intuitively expect them to both hit the same problems.
,
Aug 7 2017
I've noticed that VC emits a ton of types in the __vc_attributes namespace in every object file with /Z7. This only happens when compiling C++ files. You can see this with `cvdump -t`: https://github.com/Microsoft/microsoft-pdb/blob/master/cvdump/cvdump.exe They're only 100 type records per file, but that might matter.

The other thing is that clang only emits full type information for a class with its vtable if it has one. This trick and others like it reduce the total amount of type info by 35%: https://bugs.chromium.org/p/chromium/issues/detail?id=642812#c15

It's probably worth doing a direct comparison between MSVC /Z7 and clang /Z7. The 35% number was comparing clang vs "clang -fstandalone-debug".
Comment 1 by dpranke@chromium.org, Jul 21 2016
Labels: Proj-GN-Migration