Component builds restrict build parallelism, costing 60-120 seconds on a full build
Issue description

Chrome's component build transforms some components from static libraries (or source sets) to DLLs. This shrinks the size of chrome.dll (and makes chrome_child.dll go away), which allows for faster incremental builds. However, an unintended consequence of this is increased serialization in full component builds.

The first attached image is a snapshot of a ninjatracing report on a non-component debug goma build. Note that chrome.dll, chrome_child.dll, and chrome_watcher.dll build close to last, and in parallel.

In a component debug goma build we can see that blink_core.dll, content.dll, and others link serially. This is probably unavoidable because of the chain of imports between them. However, what might be avoidable is that many compilation steps are blocked until all of these links have happened - see the two stacks of compiles after the serialized links in the middle/left in the second attached image.

For example, //chrome/browser:browser depends on //content:content in debug component builds, but not in debug non-component builds, as shown by "gn path":

    >gn path out\debug //content:content //chrome/browser:browser
    No non-data paths found between these two targets.

    >gn path out\debug_component //content:content //chrome/browser:browser
    //chrome/browser:browser --[public]-->
    //chrome/browser:browser_0 --[public]-->
    //content/public/browser:browser --[public]-->
    //content:content

So, for some reason (presumably it's just the default gn-to-ninja behavior) a source file in a static library that depends on a DLL is treated differently from a source file in a library that depends on a static library.

Removing this serialization looks like it might save 60-120 s off of the ~525 s component build times I'm seeing. On non-goma builds the percentage saving is smaller but the absolute saving looks like it would be similar.
,
May 26 2017
,
May 26 2017
I went back to a gyp build (e28fd5d84cebeb446e2c35cf6b766e35aa56ec5f from August 22, 2016) to see if this issue affects GYP as well. It does, although not identically. The attached image and ninjatracing .json file show that webcore_shared.dll and content.dll both block compilation. GYP goma builds always use symbol_level = 1 (or whatever the GYP equivalent was). Settings were:

GYP_DEFINES=component=shared_library disable_nacl=1 use_goma=1

I also attached a .json file from building with GYP, although that build is from May, 2017 and therefore cannot be directly compared.
,
Jun 27 2017
See also crbug.com/578477
,
Sep 15 2017
I doubted that using the component build is what causes this serialized dependency, and I confirmed that component itself is not the reason.
I found the reason why some .cc files wait for a link to finish. The dependency is generated from a build rule like the one below.
```
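# The action lists a linked target in its deps, so it cannot run
# until component_a has finished linking.
action("action_a") {
  script = "a.py"
  outputs = [
    "$target_gen_dir/a.cc",
    "$target_gen_dir/a.h",
  ]
  deps = [ ":component_a" ]
}

component("component_a") {
}

# b.cc may include the generated a.h, so its compile waits for action_a.
component("component_b") {
  sources = [
    "b.cc",
    "b.h",
  ]
  deps = [ ":action_a" ]
}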
```
With this rule, b.cc is not compiled until action_a has finished, and action_a does not run until component_a has been linked.
Similar rules are generated by some build templates in the Chromium repository, e.g.:
https://chromium.googlesource.com/chromium/src/+/672fd5cd07981bc528210e1368b6884f2cb9bf96/tools/json_schema_compiler/json_features.gni#12
many objs -> gen of EventModules.cpp -> //device/vr:mojo_bindings_blink (libdevice_vr_mojo_bindings_blink.so from mojom template)
https://chromium.googlesource.com/chromium/src/+/5b957db1a476b8e3083d3e208dbf2a17416d0d95/third_party/WebKit/Source/bindings/modules/BUILD.gn#78
https://chromium.googlesource.com/chromium/src/+/a788bcd802e5ef17eb5a02b23bff3698281f70dd/device/vr/BUILD.gn#138
https://chromium.googlesource.com/chromium/src/+/4bec680c68c8cb2d88fb32a8412d19bf7a21107f/mojo/public/tools/bindings/mojom.gni#976
https://chromium.googlesource.com/chromium/src/+/4bec680c68c8cb2d88fb32a8412d19bf7a21107f/mojo/public/tools/bindings/mojom.gni#758
So the action serializes compile and link steps.
I'll make some patches to remove this strong dependency.
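
The effect of the action_a edge described in this comment can be illustrated with a toy earliest-finish calculation. This is only a sketch: the step names mirror the example rule, but the durations and the "relaxed" dependency layout are hypothetical, and ninja's real scheduling is limited by the number of available jobs.

```python
from functools import lru_cache


def finish_times(durations, deps):
    """Earliest finish time of every step, assuming unlimited parallelism."""

    @lru_cache(maxsize=None)
    def finish(step):
        # A step starts once all of its dependencies have finished.
        start = max((finish(dep) for dep in deps.get(step, ())), default=0)
        return start + durations[step]

    return {step: finish(step) for step in durations}


# Hypothetical durations in seconds.
durations = {
    "link_component_a": 30,
    "action_a": 2,
    "compile_b": 10,
    "link_component_b": 5,
}

# As in the rule above: action_a waits for the link of component_a.
serialized = finish_times(durations, {
    "action_a": ("link_component_a",),
    "compile_b": ("action_a",),
    "link_component_b": ("compile_b",),
})

# Hypothetical fix: only the final link waits for the link of component_a.
relaxed = finish_times(durations, {
    "compile_b": ("action_a",),
    "link_component_b": ("compile_b", "link_component_a"),
})

print(serialized["link_component_b"], relaxed["link_component_b"])  # 47 35
```

In the relaxed graph the compile of b.cc overlaps with the long link instead of queueing behind it.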
,
Sep 15 2017
My investigations to date into this issue have involved converting the .ninja_log files into tracing.json files using the ninjatracing tool, then manually looking for serialization points, and then manually looking for the long poles that appear to be causing the serialization.

It should be fairly straightforward to programmatically find serialization points and the last few tasks that complete before each serialization point, thus making it easier to measure improvement and look for more opportunities.

Another useful analysis would be to analyze the .ninja_log file to record the average parallelism. This won't identify where to make improvements but it gives a long-term stable way of comparing progress. It can also be used to compare build parallelism between platforms and between different build types. For instance, my claim is that non-component builds are more parallel than component builds. Quantifying parallelism would let us quantify this claim and mix it in with other build performance measurements such as elapsed wall time and the sum of elapsed wall times for all steps.
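
A minimal sketch of the average-parallelism measurement suggested in this comment. It assumes the tab-separated .ninja_log layout (start ms, end ms, mtime, output path, command hash) and a log that covers a single clean build. The function names and the sample data are made up for illustration.

```python
def read_steps(lines):
    """Parse .ninja_log lines into a list of (start_ms, end_ms) build steps."""
    steps = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the "# ninja log vN" header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a record in the assumed format
        start_ms, end_ms, _mtime, _output, cmd_hash = fields[:5]
        # A step with several outputs is logged once per output with the
        # same hash and times, so key on all three to count it only once.
        steps[(cmd_hash, start_ms, end_ms)] = (int(start_ms), int(end_ms))
    return list(steps.values())


def average_parallelism(steps):
    """Sum of step durations divided by the elapsed wall time."""
    if not steps:
        return 0.0
    busy = sum(end - start for start, end in steps)
    wall = max(end for _, end in steps) - min(start for start, _ in steps)
    return busy / wall if wall else 0.0


# Made-up log: two compiles in parallel, then one link with two outputs.
sample = [
    "# ninja log v5\n",
    "0\t1000\t0\tobj/a.obj\taaa\n",
    "0\t1000\t0\tobj/b.obj\tbbb\n",
    "1000\t2000\t0\tfoo.dll\tccc\n",
    "1000\t2000\t0\tfoo.dll.lib\tccc\n",
]
print(average_parallelism(read_steps(sample)))  # 1.5
```

For a real build, pass an open .ninja_log file object to read_steps in place of the sample list.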
,
Sep 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/49bdb0d5159f629084ff3d163af4af9bdef333a6

commit 49bdb0d5159f629084ff3d163af4af9bdef333a6
Author: Takuto Ikuta <tikuta@google.com>
Date: Tue Sep 19 02:36:10 2017

Optimize build dependency for json_features template

With this CL, some compile task can run without waiting link of //extensions/common

There are many compile tasks depend on some json_features targets, and such compile tasks wait finish of link in //extensions/common in deps of json_features action. But //extensions/common is not used in code generation, so we can move it to deps of source_set. After moving to source_set, compile tasks in some targets depend on json_features no need to wait finish of link in //extensions/common, because gn knows object files only depend on source files. gn can't find such unnecessary dependency if link is in deps of action.

ninja trace from building chrome without this CL.
http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.ULVV1ZrTGHxHCPKRs4uVIbYdrYmR2QkE-yjpfNJwCeI=.gz/trace.html

ninja trace from building chrome with this CL.
http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.KRBZxtDbA2CEAq1J8eBrfI_iy6pi8GL0XPj54IiiLM0=.gz/trace.html

Improved build time few seconds on Z840 linux.

Bug: 725639
Change-Id: I80739045663d08236266b0697c40758e53773c59
Reviewed-on: https://chromium-review.googlesource.com/668343
Commit-Queue: Takuto Ikuta <tikuta@google.com>
Reviewed-by: Devlin <rdevlin.cronin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#502765}

[modify] https://crrev.com/49bdb0d5159f629084ff3d163af4af9bdef333a6/tools/json_schema_compiler/json_features.gni
,
Sep 19 2017
The reason for the serialized links in the component build is this:
https://chromium.googlesource.com/chromium/src/+/16b6872f38d70af4103b23cd54b3ef8c7697b341/content/BUILD.gn#79

Let me show the current ninja traces when building the chrome target (not all). I set use_lld = true and is_debug = false.

Linux, component build
http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.KRBZxtDbA2CEAq1J8eBrfI_iy6pi8GL0XPj54IiiLM0=.gz/trace.html

Win, component build
http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.YXMP3KqFvcyLSdOghHFk0gagFvoQ1NJrq5x9oQA4L_c=.gz/trace.html

Win, non component build
http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.olMU9WznutJh9GvBx8-Pen66FhMkG02UxK-3sU4b3Kk=.gz/trace.html

From the Linux component build and Win non-component build traces, it looks like mksnapshot is the action that separates the two compile spikes. It generates snapshot.cc, which is used by targets in the second spike. Also, in the Linux build, the target obj/third_party/libvpx/libvpx_yasm/highbd_sad4d_sse2.o makes the compiles in the second spike wait.

Both targets need to be finished before some links, but they are not needed for compiling other object files. So if we could explicitly express dependencies that are required only at link time, this dependency could be removed, at the cost of modifying many BUILD.gn files.

I want to fix this, but mitigation of slow process creation/destruction on windows 10 can be high priority for me.
,
Sep 19 2017
For the yasm slowness, obj/third_party/libvpx/libvpx_yasm/highbd_sad4d_sse2.o, I've been thinking we should switch to nasm; yasm seems to have fallen out of maintenance (last release ~3 years ago) and is now much slower than nasm. I don't have the exact numbers right now, but e.g., a recent ffmpeg build with yasm takes ~minutes while with nasm it was ~seconds. Filed issue 766721.
,
Sep 19 2017
I thought yasm was relatively fast now that we changed to always having it optimized. At least, the yasm step that I was examining used to take ~35 seconds in debug builds and this dropped to ~12 seconds after I force-optimized yasm. But, if there are other even slower steps then switching to nasm sounds good. It would be worth checking to see if this is a debug/release difference (in which case maybe my change is no longer working) or just general slowness.

> I want to fix this, but mitigation of slow process creation/destruction on windows 10 can be high priority for me.

I don't think it is possible for us to mitigate this bug, unless we can actually avoid creating/destroying processes. Also, it appears that Microsoft has now fixed the bug - I will be looking at a trace from a fixed system later today - and we should get hot fixes, within a few months?
,
Nov 22 2017
Apparently the build dependencies were better in the gyp world - see crbug.com/623233 for interesting thoughts and graphs.
,
Nov 22 2017
,
Nov 22 2017
,
Nov 22 2017
,
Dec 8 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/697bba05613ccae688c7565fc4e9e4b73f453b04

commit 697bba05613ccae688c7565fc4e9e4b73f453b04
Author: Takuto Ikuta <tikuta@google.com>
Date: Fri Dec 08 10:23:29 2017

Remove a fake dependency from event_modules

event_modules does not depend on //device/vr:mojo_bindings_blink. This CL increases build parallelism by removing a fake dependency.

ninja trace log changed like below when building chrome.
Without this CL: http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.KmDhbGmTjZVDmgVYEP9VHFfXWd5rA_MXx0vuXUWzNhA=.gz/trace.html
With this CL : http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.c6Teo4mlQCT3wnYe6fCGMRmfe8hnKhkelkWG93qC9NY=.gz/trace.html

Bug: 725639
Change-Id: I860b7c9d054f99e628d09fc0b2c159db19ca0b9c
Reviewed-on: https://chromium-review.googlesource.com/816814
Commit-Queue: Takuto Ikuta <tikuta@google.com>
Reviewed-by: Yuki Shiino <yukishiino@chromium.org>
Cr-Commit-Position: refs/heads/master@{#522752}

[modify] https://crrev.com/697bba05613ccae688c7565fc4e9e4b73f453b04/third_party/WebKit/Source/bindings/modules/BUILD.gn
,
Dec 15 2017
I did some tests to see if this made any significant difference by running a batch file (shown below) that does repeated builds. I did six of each. This showed a 3 second time saving but the standard deviation was far higher than that so it doesn't mean anything.

I appreciate the fix and hope that more parallelism blockers can be found and removed. I recommend testing future changes with repeated builds in order to determine how much time is saved. I'm not suggesting this as a requirement - just a nice-to-have thing so that you get full and accurate credit for big wins.

Here is a typical version of my test batch file - adjust as needed.

@rem Set this to tell goma not to use cached compiles, gives more consistent (but slower) compiles
set GOMA_STORE_ONLY=true
@rem Set this to tell goma not to do local compiles
set GOMA_USE_LOCAL=false
set basesettings="goma_dir=\"C:\src\goma\goma-win64\" is_component_build=true is_debug=true target_cpu=\"x86\" enable_nacl=false remove_webcore_debug_symbols=true
set testsettings=symbol_level=2 use_goma=true is_win_fastlink=true use_jumbo_build=true

@rem Repeat this block multiple times.
@echo on
call git checkout master
@echo on
call gn gen out\BuildTest --args=%basesettings% %testsettings%" >nul
@echo on
call gn clean out\BuildTest & call gn gen out\BuildTest & call ninja -C out\BuildTest chrome
@echo on
call git checkout reverted
@echo on
call gn gen out\BuildTest --args=%basesettings% %testsettings%" >nul
@echo on
call gn clean out\BuildTest & call gn gen out\BuildTest & call ninja -C out\BuildTest chrome
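
One way to check whether a saving from repeated builds stands out from the noise is to compare the mean difference against its standard error (Welch's t statistic). The sketch below uses made-up timings, not the measurements from this comment, and the function name is hypothetical.

```python
import math
import statistics


def compare_builds(baseline, patched):
    """Return (mean saving, Welch t statistic) for two lists of build times."""
    saving = statistics.mean(baseline) - statistics.mean(patched)
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(
        statistics.variance(baseline) / len(baseline)
        + statistics.variance(patched) / len(patched)
    )
    return saving, saving / std_err


# Made-up wall times in seconds for six builds of each configuration.
baseline = [520, 540, 500, 560, 510, 530]
patched = [517, 537, 497, 557, 507, 527]

saving, t_stat = compare_builds(baseline, patched)
print(f"saving {saving:.1f} s, stdev {statistics.stdev(baseline):.1f} s, t = {t_stat:.2f}")
```

A t statistic well below about 2 suggests that the saving cannot be distinguished from run-to-run variation with this few samples.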
,
Jan 18 2018
> Apparently the build dependencies were better in the gyp world I'd guess that this was because in gyp you'd have to explicitly set hard_dependency in targets that had public generated headers. gn always assumes that because people forgot to do that all the time in gyp, but this restricts parallelism some.
,
Jan 18 2018
The good news is that linking in lld is fast enough that the restricted parallelism is a much smaller problem. With symbol_level=1 on a full component rebuild of the 'chrome' target I'm seeing a weighted time spent on linking (i.e., parallelism-corrected link time) of about 30 s. Some of that is unavoidable so the actual cost is necessarily less than that. The cost is greater on a symbol_level=2 build but still much better than with link.exe. It would still be great to remove parallelism blockers, but the benefits of a fix are getting lower.
,
Jun 20 2018
Do you think this bug still needs to be fixed? I think the most heavily and unnecessarily serialized parts were either removed in 578477 or have become negligible thanks to lld.
,
Jun 20 2018
I made a CL for yet another case of restricted build parallelism: https://chromium-review.googlesource.com/c/chromium/src/+/1107431 The CL improves the build time of content_shell from 316.8s to 254.5s on Z840 linux without goma backend cache. See attached screenshot.
,
Jun 20 2018
It would be interesting to see the post_build_ninja_summary.py results from the two builds, or the .ninja_log files. I can't actually see the v8_context_snapshot serialization in the first screenshot, or at least I can't identify it as such.
,
Jun 20 2018
I did a few builds with this patch and I'm not sure that this avoids serialization. I've attached a .ninja_log and pasted in the post build summary:
Longest build steps:
4.3 weighted s to build v8.dll, v8.dll.lib, v8.dll.pdb (4.3 s CPU time)
4.4 weighted s to build mksnapshot.exe, mksnapshot.exe.pdb (4.4 s CPU time)
5.8 weighted s to build blink_core.dll, blink_core.dll.lib, blink_core.dll.pdb (5.8 s CPU time)
5.9 weighted s to build obj/v8/v8_base/v8_base_jumbo_31.obj (98.9 s CPU time)
6.3 weighted s to build obj/content/browser/browser/browser_jumbo_36.obj (116.4 s CPU time)
13.0 weighted s to build obj/v8/v8_external_snapshot/v8_external_snapshot_jumbo_1.obj (13.0 s CPU time)
26.8 weighted s to build content.dll, content.dll.lib, content.dll.pdb (26.8 s CPU time)
37.7 weighted s to build chrome.dll, chrome.dll.lib, chrome.dll.pdb (37.7 s CPU time)
39.4 weighted s to build snapshot_blob.bin (39.4 s CPU time)
78.5 weighted s to build obj/v8/v8_base/v8_base_jumbo_20.obj (171.2 s CPU time)
Time by build-step type:
3.2 s weighted time to generate 4914 .stamp files (1251.4 s CPU time)
3.4 s weighted time to generate 717 mojo files (2022.9 s CPU time)
39.8 s weighted time to generate 6 .bin files (42.9 s CPU time)
94.8 s weighted time to generate 241 PEFile (linking) files (241.3 s CPU time)
325.2 s weighted time to generate 14526 .obj files (141153.1 s CPU time)
471.4 s weighted time (146011.5 s CPU time, 309.7x parallelism)
22700 build steps completed, average of 48.15/s
So, there is definitely some pre and post serialization. I can't tell if there is less serialization or not.
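
The "weighted" figures in the summary can be read as each step's share of wall time: every interval is split evenly among the steps running during it. The following is a rough sketch of that idea, not the actual post_build_ninja_summary.py code. The step names and times are hypothetical.

```python
def weighted_durations(steps):
    """Map step name -> weighted seconds, given name -> (start, end)."""
    events = []
    for name, (start, end) in steps.items():
        events.append((start, 0, name))  # 0 sorts starts before ends
        events.append((end, 1, name))
    events.sort()

    weighted = {name: 0.0 for name in steps}
    running = set()
    last_time = 0.0
    for time, kind, name in events:
        if running and time > last_time:
            # Split the elapsed slice evenly among the running steps.
            share = (time - last_time) / len(running)
            for step in running:
                weighted[step] += share
        last_time = time
        if kind == 0:
            running.add(name)
        else:
            running.discard(name)
    return weighted


# Hypothetical build: two compiles in parallel, then a serialized link.
steps = {
    "compile_a": (0.0, 10.0),
    "compile_b": (0.0, 10.0),
    "link": (10.0, 16.0),
}
print(weighted_durations(steps))
```

Under this definition a step that runs alone, such as a serialized link, has a weighted time equal to its full duration, while a step that overlaps with hundreds of others is charged only a small fraction of its duration.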
,
Jun 20 2018
Note that this was a debug jumbo goma component build, FWIW. Settings are:

is_debug = true
is_component_build = true
enable_nacl = false
target_cpu = "x86"
remove_webcore_debug_symbols=true
use_jumbo_build = true
use_goma = true

Goma settings are:

GOMA_ENABLE_MACRO_CACHE=true
GOMA_MAX_ACTIVE_TASKS=2000
GOMA_MAX_SUBPROCS=24
GOMA_STORE_ONLY=true
GOMA_USE_LOCAL=false
,
Jun 20 2018
The target of the patch is not chrome but mainly content_shell. Serialization for chrome due to v8_context_snapshot.bin should be removed now. I attached the ninja_log from building content_shell without the patch; you'll see v8_context_snapshot.bin at around 262s.

> GOMA_ENABLE_MACRO_CACHE=true

Sorry, this flag has no meaning now.
,
Jun 21 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f6e559e0817b597243e6b4eb4806d6fa48dd6adf

commit f6e559e0817b597243e6b4eb4806d6fa48dd6adf
Author: Takuto Ikuta <tikuta@chromium.org>
Date: Thu Jun 21 10:23:43 2018

Make v8_context_snapshot as data_deps

v8_context_snapshot is not the target to be linked to other libraries. This improves build time of some targets by utilize parallelization. For example, build time of -j800 content_shell reduced from 316.8s to 254.5s on Z840 linux without goma backend cache.

See the difference of build trace screenshots in https://bugs.chromium.org/p/chromium/issues/detail?id=725639#c20

Bug: 725639
Change-Id: I64d57241ff6b742db4ddcb31afc07a5e7c2e2eb1
Reviewed-on: https://chromium-review.googlesource.com/1107431
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Reviewed-by: Hitoshi Yoshida <peria@chromium.org>
Reviewed-by: Jeremy Roman <jbroman@chromium.org>
Reviewed-by: Jay Civelli <jcivelli@chromium.org>
Commit-Queue: Takuto Ikuta <tikuta@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569207}

[modify] https://crrev.com/f6e559e0817b597243e6b4eb4806d6fa48dd6adf/content/shell/BUILD.gn
[modify] https://crrev.com/f6e559e0817b597243e6b4eb4806d6fa48dd6adf/content/test/BUILD.gn
[modify] https://crrev.com/f6e559e0817b597243e6b4eb4806d6fa48dd6adf/gin/BUILD.gn
[modify] https://crrev.com/f6e559e0817b597243e6b4eb4806d6fa48dd6adf/services/data_decoder/BUILD.gn
Comment 1 by brucedaw...@chromium.org
, May 23 2017