
Issue 725639

Starred by 10 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 787983




Component builds restrict build parallelism, costing 60-120 seconds on a full build

Reported by brucedaw...@chromium.org, May 23 2017

Issue description

Chrome's component build transforms some components from static libraries (or source sets) to DLLs. This shrinks the size of chrome.dll (and makes chrome_child.dll go away) which allows for faster incremental builds.

However, an unintended consequence of this is increased serialization in full component builds.

The first attached image is a snapshot of a ninjatracing report on a non-component debug goma build. Note that chrome.dll, chrome_child.dll, and chrome_watcher.dll build close to last, and in parallel.

In a component debug goma build we can see that blink_core.dll, content.dll, and others link serially. This is probably unavoidable because of the chain of imports between them. However, what might be avoidable is that many compilation steps are blocked until all of these links have happened: see the two stacks of compiles after the serialized links in the middle/left of the second attached image.

For example, //chrome/browser:browser depends on //content:content in debug component builds, but not in debug non-component builds, as shown by "gn path":

```
>gn path out\debug //content:content //chrome/browser:browser
No non-data paths found between these two targets.

>gn path out\debug_component //content:content //chrome/browser:browser
//chrome/browser:browser --[public]-->
//chrome/browser:browser_0 --[public]-->
//content/public/browser:browser --[public]-->
//content:content
```


So, for some reason (presumably it's just the default gn-to-ninja behavior) a source file in a static library that depends on a DLL is treated differently from a source file in a library that depends on a static library.

It looks like it might save 60-120 s off of the ~525 s component build times I'm seeing. On non-goma builds the percentage saving is smaller but the absolute saving looks like it would be similar.

Attached: noncomponent build parallelism.PNG (19.0 KB), component build serialization.PNG (10.4 KB)
Goma builds with symbol_level = 2 (and is_win_fastlink = true) take about 18 minutes for debug component and 19.5 minutes for debug non-component, so in this case component builds are actually faster. But the loss of parallelism is still a problem in component builds, and I estimate that fixing this issue could save ~200 s: that's how long a full link of chrome.dll takes, and the ~370 s spent on the dependent links of blink_core.dll, content.dll, etc. should fully overlap with it. It's even possible that the savings would be closer to 370 s, or about a third of a debug component goma symbol_level = 2 build.

Comment 2 by tikuta@chromium.org, May 26 2017

Cc: tikuta@chromium.org
Owner: brucedaw...@chromium.org
Status: Assigned (was: Untriaged)
I went back to a GYP build (e28fd5d84cebeb446e2c35cf6b766e35aa56ec5f from August 22, 2016) to see whether this issue affects GYP as well. It does, although not identically. The attached image and ninjatracing .json file show that webcore_shared.dll and content.dll both block compilation. GYP goma builds always use symbol_level = 1 (or whatever the GYP equivalent was).

Settings were GYP_DEFINES=component=shared_library disable_nacl=1 use_goma=1

I also attached a .json file from building with GN, although that build is from May 2017 and therefore cannot be directly compared.
Attached: gyp_chrome_debug_goma_component_build.json (3.9 MB), gyp_chrome_debug_goma_component.PNG (16.2 KB), build_debug_component_goma.json (5.1 MB)
See also crbug.com/578477

Comment 5 by tikuta@chromium.org, Sep 15 2017

Owner: tikuta@chromium.org
I suspected that using components caused this serialized dependency, but I confirmed that the component build itself is not the reason.

I found why some .cc files wait for a link to finish. The dependency comes from build rules like the following:

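```
action("action_a") {
  script = "a.py"
  outputs = [
    "$target_gen_dir/a.cc",
    "$target_gen_dir/a.h",
  ]

  # The action cannot run until component_a has been linked.
  deps = [ ":component_a" ]
}

component("component_a") {
}

component("component_b") {
  sources = [
    "b.cc",
    "b.h",
  ]

  # GN makes the compile of b.cc wait for action_a (an order-only
  # dependency), and therefore also for the link of component_a.
  deps = [ ":action_a" ]
}
```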

With these rules, b.cc is not compiled until action_a has finished, and action_a in turn waits for component_a to link.
Similar rules are generated by some build templates in the Chromium repository,
e.g.:
https://chromium.googlesource.com/chromium/src/+/672fd5cd07981bc528210e1368b6884f2cb9bf96/tools/json_schema_compiler/json_features.gni#12

Another example: many objs -> generation of EventModules.cpp -> //device/vr:mojo_bindings_blink (libdevice_vr_mojo_bindings_blink.so from the mojom template)
https://chromium.googlesource.com/chromium/src/+/5b957db1a476b8e3083d3e208dbf2a17416d0d95/third_party/WebKit/Source/bindings/modules/BUILD.gn#78
https://chromium.googlesource.com/chromium/src/+/a788bcd802e5ef17eb5a02b23bff3698281f70dd/device/vr/BUILD.gn#138
https://chromium.googlesource.com/chromium/src/+/4bec680c68c8cb2d88fb32a8412d19bf7a21107f/mojo/public/tools/bindings/mojom.gni#976
https://chromium.googlesource.com/chromium/src/+/4bec680c68c8cb2d88fb32a8412d19bf7a21107f/mojo/public/tools/bindings/mojom.gni#758

So an action that depends on a linked target serializes compiles behind that link.
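A toy model can make this concrete. The sketch below (step names and durations are hypothetical, and it is not how GN or ninja work internally) computes each step's earliest finish time assuming unlimited parallel workers, i.e. the longest chain of prerequisites. BEFORE mirrors the rules above, where action_a depends on component_a; AFTER assumes the component_a dependency is moved from the action onto component_b, so only component_b's link has to wait for it.

```python
def earliest_finish(durations: dict[str, int], deps: dict[str, list[str]]) -> dict[str, int]:
    """Earliest finish time of every step, assuming unlimited parallel workers."""
    finish: dict[str, int] = {}

    def visit(step: str) -> int:
        if step not in finish:
            # A step can start only after all of its prerequisites have finished.
            start = max((visit(d) for d in deps.get(step, [])), default=0)
            finish[step] = start + durations[step]
        return finish[step]

    for step in durations:
        visit(step)
    return finish


# Hypothetical step durations in seconds.
DURATIONS = {"compile_a": 10, "link_a": 30, "action_a": 2, "compile_b": 10, "link_b": 5}

# action_a depends on component_a, so compiling b.cc waits for component_a's link.
BEFORE = {
    "link_a": ["compile_a"],
    "action_a": ["link_a"],
    "compile_b": ["action_a"],
    "link_b": ["compile_b", "link_a"],
}

# component_a moved to component_b's deps: only component_b's link waits for it.
AFTER = {
    "link_a": ["compile_a"],
    "compile_b": ["action_a"],
    "link_b": ["compile_b", "link_a"],
}

before = earliest_finish(DURATIONS, BEFORE)
after = earliest_finish(DURATIONS, AFTER)
print(before["compile_b"], after["compile_b"])  # 52 12
print(before["link_b"], after["link_b"])        # 57 45
```

In this model the compile of b.cc finishes 40 s earlier once it no longer waits behind component_a's link, and the final link finishes 12 s earlier; in a real build the gain depends on how busy the workers already are.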


I'll make some patches to remove these unnecessary dependencies.

My investigations into this issue so far have involved converting .ninja_log files into tracing .json files with the ninjatracing tool, manually looking for serialization points, and then manually looking for the long poles that appear to be causing the serialization. It should be fairly straightforward to find serialization points programmatically, along with the last few tasks that complete before each one, which would make it easier to measure improvements and look for more opportunities.
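A minimal sketch of that automation, assuming the v5 .ninja_log format (a "# ninja log v5" header, then tab-separated start ms, end ms, mtime, output path and command hash) and a log that holds a single clean build. It sweeps over step start/end times, finds windows in which at most max_running steps are running for at least min_ms, and reports the steps running in each window, which are the long-pole candidates. The function names and default thresholds are hypothetical choices, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    start_ms: int
    end_ms: int
    output: str


def parse_ninja_log(text: str) -> list[Step]:
    """Parse a v5 .ninja_log, keeping one entry per build step."""
    steps: dict[tuple[int, int, str], Step] = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip the "# ninja log v5" header
        start, end, _mtime, output, cmd_hash = line.split("\t")
        # A step with several outputs is logged once per output; keep the first.
        steps.setdefault((int(start), int(end), cmd_hash), Step(int(start), int(end), output))
    return list(steps.values())


def serialization_windows(steps: list[Step], max_running: int = 2, min_ms: int = 5000):
    """Return (start_ms, end_ms, outputs) for long stretches where few steps run."""
    # Ends sort before starts at the same timestamp (-1 < 1).
    events = sorted(
        [(s.start_ms, 1, s) for s in steps] + [(s.end_ms, -1, s) for s in steps],
        key=lambda ev: (ev[0], ev[1]),
    )
    active: set[Step] = set()
    windows: list[list] = []
    prev_t = 0
    for t, delta, step in events:
        if active and t > prev_t and len(active) <= max_running:
            outputs = {s.output for s in active}
            if windows and windows[-1][1] == prev_t:
                windows[-1][1] = t  # extend the adjacent low-parallelism window
                windows[-1][2] |= outputs
            else:
                windows.append([prev_t, t, outputs])
        if delta == 1:
            active.add(step)
        else:
            active.discard(step)
        prev_t = t
    return [(a, b, sorted(outs)) for a, b, outs in windows if b - a >= min_ms]
```

To use it, read the text of a build directory's .ninja_log, pass it to parse_ninja_log, and print the windows returned by serialization_windows. Counting the windows and their total length before and after a change gives a rough measure of improvement.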

Another useful analysis would be to analyze the .ninja_log file to record the average parallelism. This won't identify where to make improvements but it gives a long-term stable way of comparing progress. It can also be used to compare build parallelism between platforms and between different build types. For instance, my claim is that non-component builds are more parallel than component builds. Quantifying parallelism would let us quantify this claim and mix it in with other build performance measurements such as elapsed wall time and the sum of elapsed wall times for all steps.
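The average-parallelism metric can be sketched in a few lines: total time spent in build steps divided by the wall-clock time from the first step's start to the last step's end. This assumes the same v5 .ninja_log layout and a single clean build; steps that write several outputs appear once per output, so they are counted once by deduplicating on (start, end, command hash). For scale, the post-build summary quoted further down reports 146011.5 s of step time over 471.4 s, which is the 309.7x parallelism it prints.

```python
def average_parallelism(ninja_log_text: str) -> float:
    """Total time spent in build steps divided by elapsed wall-clock time."""
    steps: set[tuple[int, int, str]] = set()
    for line in ninja_log_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip the "# ninja log v5" header
        start, end, _mtime, _output, cmd_hash = line.split("\t")
        # Multi-output steps are logged once per output; count each step once.
        steps.add((int(start), int(end), cmd_hash))
    if not steps:
        return 0.0
    busy_ms = sum(end - start for start, end, _ in steps)
    elapsed_ms = max(end for _, end, _ in steps) - min(start for start, _, _ in steps)
    return busy_ms / elapsed_ms if elapsed_ms else 0.0
```

Because it is a single number per build, it can be tracked over time or compared between platforms and build types without reading traces.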


Comment 7 by bugdroid1@chromium.org, Sep 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/49bdb0d5159f629084ff3d163af4af9bdef333a6

commit 49bdb0d5159f629084ff3d163af4af9bdef333a6
Author: Takuto Ikuta <tikuta@google.com>
Date: Tue Sep 19 02:36:10 2017

Optimize build dependency for json_features template

With this CL, some compile task can run without waiting link of //extensions/common
There are many compile tasks depend on some json_features targets, and such compile tasks wait finish of link in //extensions/common in deps of json_features action.
But //extensions/common is not used in code generation, so we can move it to deps of source_set.
After moving to source_set, compile tasks in some targets depend on json_features no need to wait finish of link in //extensions/common, because gn knows object files only depend on source files.
gn can't find such unnecessary dependency if link is in deps of action.

ninja trace from building chrome without this CL.
http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.ULVV1ZrTGHxHCPKRs4uVIbYdrYmR2QkE-yjpfNJwCeI=.gz/trace.html

ninja trace from building chrome with this CL.
http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.KRBZxtDbA2CEAq1J8eBrfI_iy6pi8GL0XPj54IiiLM0=.gz/trace.html

Improved build time few seconds on Z840 linux.

Bug: 725639
Change-Id: I80739045663d08236266b0697c40758e53773c59
Reviewed-on: https://chromium-review.googlesource.com/668343
Commit-Queue: Takuto Ikuta <tikuta@google.com>
Reviewed-by: Devlin <rdevlin.cronin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#502765}
[modify] https://crrev.com/49bdb0d5159f629084ff3d163af4af9bdef333a6/tools/json_schema_compiler/json_features.gni

Comment 8 by tikuta@chromium.org, Sep 19 2017

The reason for the serialized links in the component build is this:
https://chromium.googlesource.com/chromium/src/+/16b6872f38d70af4103b23cd54b3ef8c7697b341/content/BUILD.gn#79



Here are the current ninja traces from building the chrome target (not all):

I set use_lld = true and is_debug = false

Linux, component build
http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.KRBZxtDbA2CEAq1J8eBrfI_iy6pi8GL0XPj54IiiLM0=.gz/trace.html

Win, component build
http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.YXMP3KqFvcyLSdOghHFk0gagFvoQ1NJrq5x9oQA4L_c=.gz/trace.html

Win, non component build
http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.olMU9WznutJh9GvBx8-Pen66FhMkG02UxK-3sU4b3Kk=.gz/trace.html


From the Linux component build and Windows non-component build traces, it looks like the mksnapshot action separates two compile spikes.
It generates snapshot.cc, which is used by targets in the second spike.
Also, in the Linux build, the target obj/third_party/libvpx/libvpx_yasm/highbd_sad4d_sse2.o makes compiles in the second spike wait.

Both targets need to finish before some links, but not before compiling other object files.
So if we could explicitly declare dependencies that are only required for linking, this dependency could be removed by modifying many BUILD.gn files.


I want to fix this, but mitigating slow process creation/destruction on Windows 10 may be a higher priority for me.
For the yasm slowness, obj/third_party/libvpx/libvpx_yasm/highbd_sad4d_sse2.o, I've been thinking we should switch to nasm; yasm seems to have fallen out of maintenance (last release ~3 years ago) and is now much slower than nasm. I don't have exact numbers right now, but e.g. a recent ffmpeg build with yasm took on the order of minutes, while with nasm it took on the order of seconds. Filed issue 766721.
I thought yasm was relatively fast now that we changed to always having it optimized. At least, the yasm step that I was examining used to take ~35 seconds in debug builds and this dropped to ~12 seconds after I force-optimized yasm. But, if there are other even slower steps then switching to nasm sounds good. It would be worth checking to see if this is a debug/release difference (in which case maybe my change is no longer working) or just general slowness.

> I want to fix this, but mitigation of slow process creation/destruction on windows 10 can be high priority for me.

I don't think it is possible for us to mitigate this bug, unless we can actually avoid creating/destroying processes. Also, it appears that Microsoft has now fixed the bug - I will be looking at a trace from a fixed system later today - and we should get hot fixes, within a few months?

Apparently the build dependencies were better in the GYP world; see crbug.com/623233 for interesting thoughts and graphs.
Blocking: 495670
Blocking: 787983
Blocking: -495670

Comment 15 by bugdroid1@chromium.org, Dec 8 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/697bba05613ccae688c7565fc4e9e4b73f453b04

commit 697bba05613ccae688c7565fc4e9e4b73f453b04
Author: Takuto Ikuta <tikuta@google.com>
Date: Fri Dec 08 10:23:29 2017

Remove a fake dependency from event_modules

event_modules does not depend on //device/vr:mojo_bindings_blink.
This CL increases build parallelism by removing a fake dependency.

ninja trace log changed like below when building chrome.
Without this CL: http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.KmDhbGmTjZVDmgVYEP9VHFfXWd5rA_MXx0vuXUWzNhA=.gz/trace.html
With this CL   : http://chromium-build-stats.appspot.com/ninja_log/upload/ninja_log.c6Teo4mlQCT3wnYe6fCGMRmfe8hnKhkelkWG93qC9NY=.gz/trace.html

Bug: 725639
Change-Id: I860b7c9d054f99e628d09fc0b2c159db19ca0b9c
Reviewed-on: https://chromium-review.googlesource.com/816814
Commit-Queue: Takuto Ikuta <tikuta@google.com>
Reviewed-by: Yuki Shiino <yukishiino@chromium.org>
Cr-Commit-Position: refs/heads/master@{#522752}
[modify] https://crrev.com/697bba05613ccae688c7565fc4e9e4b73f453b04/third_party/WebKit/Source/bindings/modules/BUILD.gn

I did some tests to see whether this made any significant difference by running a batch file (shown below) that does repeated builds; I did six builds of each. This showed a 3-second saving, but the standard deviation was far larger than that, so the difference doesn't mean anything.

I appreciate the fix and hope that more parallelism blockers can be found and removed. I recommend testing future changes with repeated builds in order to determine how much time is saved. I'm not suggesting this as a requirement - just a nice-to-have thing so that you get full and accurate credit for big wins.
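One way to judge whether a saving like that is real is Welch's t statistic, which compares the difference in mean build time with the spread of both sets of runs. This sketch uses only Python's standard statistics module; the timings in any example run are up to you, and treating |t| below about 2 as indistinguishable from noise is only a rough rule of thumb (the exact threshold depends on the degrees of freedom).

```python
import math
import statistics


def welch_t(before_s: list[float], after_s: list[float]) -> float:
    """Welch's t statistic for mean(before) - mean(after); needs >= 2 runs each."""
    standard_error = math.sqrt(
        statistics.variance(before_s) / len(before_s)
        + statistics.variance(after_s) / len(after_s)
    )
    return (statistics.mean(before_s) - statistics.mean(after_s)) / standard_error
```

With six runs per configuration, t = 3 / (σ·√(2/6)) ≈ 5.2 / σ for a 3 s saving, so it only stands out if the run-to-run standard deviation σ is around 2 s or less.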


Here is a typical version of my test batch file - adjust as needed.

```
@rem Set this to tell goma not to use cached compiles, gives more consistent (but slower) compiles
set GOMA_STORE_ONLY=true

@rem Set this to tell goma not to do local compiles
set GOMA_USE_LOCAL=false

set basesettings="goma_dir=\"C:\src\goma\goma-win64\" is_component_build=true is_debug=true target_cpu=\"x86\" enable_nacl=false remove_webcore_debug_symbols=true
set testsettings=symbol_level=2 use_goma=true is_win_fastlink=true use_jumbo_build=true


@rem Repeat this block multiple times.

@echo on
call git checkout master
@echo on
call gn gen out\BuildTest --args=%basesettings% %testsettings%" >nul
@echo on
call gn clean out\BuildTest & call gn gen out\BuildTest & call ninja -C out\BuildTest chrome

@echo on
call git checkout reverted
@echo on
call gn gen out\BuildTest --args=%basesettings% %testsettings%" >nul
@echo on
call gn clean out\BuildTest & call gn gen out\BuildTest & call ninja -C out\BuildTest chrome
```

> Apparently the build dependencies were better in the gyp world

I'd guess that this was because in GYP you had to explicitly set hard_dependency in targets that had public generated headers. GN always assumes it, because people forgot to set it all the time in GYP, but this restricts parallelism somewhat.
The good news is that linking with lld is fast enough that the restricted parallelism is a much smaller problem. With symbol_level=1 on a full component rebuild of the 'chrome' target, I'm seeing a weighted time spent on linking (i.e., parallelism-corrected link time) of about 30 s. Some of that is unavoidable, so the actual cost is necessarily less than that.

The cost is greater on a symbol_level=2 build but still much better than with link.exe.

It would still be great to remove parallelism blockers, but the benefits to a fix are getting lower.
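The weighted (parallelism-corrected) time can be computed by splitting each moment of the build evenly among the steps running at that moment, so a link that runs alone is charged its full duration while a compile running alongside 300 others is charged 1/300 of it. This appears to be the idea behind the weighted times in the post-build summary quoted below, but the code is a standalone sketch: it assumes the v5 .ninja_log layout, treats any step with a .dll, .exe, .so or .pdb output as linking (a hypothetical grouping that misses extensionless Linux executables), and otherwise groups by file extension.

```python
import os
from collections import defaultdict

# Assumption: steps producing any of these outputs are counted as linking.
LINK_EXTENSIONS = {".dll", ".exe", ".so", ".pdb"}


def step_type(outputs: list[str]) -> str:
    exts = [os.path.splitext(o)[1] for o in outputs]
    if any(e in LINK_EXTENSIONS for e in exts):
        return "linking"
    return exts[0] or "(no extension)"


def weighted_seconds_by_type(ninja_log_text: str) -> dict[str, float]:
    """Split each moment of the build evenly among the steps running then."""
    steps: dict[tuple[int, int, str], list[str]] = {}
    for line in ninja_log_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip the "# ninja log v5" header
        start, end, _mtime, output, cmd_hash = line.split("\t")
        steps.setdefault((int(start), int(end), cmd_hash), []).append(output)

    # Ends sort before starts at the same timestamp (-1 < 1).
    events = sorted(
        [(key[0], 1, key) for key in steps] + [(key[1], -1, key) for key in steps],
        key=lambda ev: (ev[0], ev[1]),
    )
    weighted: dict[str, float] = defaultdict(float)
    active: set[tuple[int, int, str]] = set()
    prev_t = 0
    for t, delta, key in events:
        if active and t > prev_t:
            share_s = (t - prev_t) / len(active) / 1000.0  # ms -> s per running step
            for running in active:
                weighted[step_type(steps[running])] += share_s
        if delta == 1:
            active.add(key)
        else:
            active.discard(key)
        prev_t = t
    return dict(weighted)
```

Because every moment is divided among the running steps, the weighted times add up to the wall-clock time during which at least one step was running, so each category's total reads as its share of the build's elapsed time.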

Do you think this bug still needs to be fixed?

I think the heavy unnecessary serialization was either removed in crbug.com/578477 or has become negligible thanks to lld.
Attached: debug_component_build_chrome.png (191 KB)
I made a CL for yet another restriction on build parallelism:
https://chromium-review.googlesource.com/c/chromium/src/+/1107431

The CL reduces the build time of content_shell from 316.8 s to 254.5 s on a Z840 Linux machine without the goma backend cache.

See the attached screenshots.
Attached: content_shell_serialize.png (359 KB), content_shell_deserialize.png (389 KB)
It would be interesting to see the post_build_ninja_summary.py results from the two builds, or the .ninja_log files. I can't actually see the v8_context_snapshot serialization in the first screenshot, or at least I can't identify it as such.
I did a few builds with this patch, and I'm not sure that it avoids the serialization. I've attached a .ninja_log and pasted in the post-build summary:

    Longest build steps:
           4.3 weighted s to build v8.dll, v8.dll.lib, v8.dll.pdb (4.3 s CPU time)
           4.4 weighted s to build mksnapshot.exe, mksnapshot.exe.pdb (4.4 s CPU time)
           5.8 weighted s to build blink_core.dll, blink_core.dll.lib, blink_core.dll.pdb (5.8 s CPU time)
           5.9 weighted s to build obj/v8/v8_base/v8_base_jumbo_31.obj (98.9 s CPU time)
           6.3 weighted s to build obj/content/browser/browser/browser_jumbo_36.obj (116.4 s CPU time)
          13.0 weighted s to build obj/v8/v8_external_snapshot/v8_external_snapshot_jumbo_1.obj (13.0 s CPU time)
          26.8 weighted s to build content.dll, content.dll.lib, content.dll.pdb (26.8 s CPU time)
          37.7 weighted s to build chrome.dll, chrome.dll.lib, chrome.dll.pdb (37.7 s CPU time)
          39.4 weighted s to build snapshot_blob.bin (39.4 s CPU time)
          78.5 weighted s to build obj/v8/v8_base/v8_base_jumbo_20.obj (171.2 s CPU time)
    Time by build-step type:
           3.2 s weighted time to generate 4914 .stamp files (1251.4 s CPU time)
           3.4 s weighted time to generate 717 mojo files (2022.9 s CPU time)
          39.8 s weighted time to generate 6 .bin files (42.9 s CPU time)
          94.8 s weighted time to generate 241 PEFile (linking) files (241.3 s CPU time)
         325.2 s weighted time to generate 14526 .obj files (141153.1 s CPU time)
    471.4 s weighted time (146011.5 s CPU time, 309.7x parallelism)
    22700 build steps completed, average of 48.15/s

So, there is definitely some pre and post serialization. I can't tell if there is less serialization or not.

Attached: .ninja_log (3.4 MB)
Note that this was a debug jumbo goma component build, FWIW. Settings are:

is_debug = true
is_component_build = true
enable_nacl = false
target_cpu = "x86"
remove_webcore_debug_symbols=true
use_jumbo_build = true
use_goma = true

Goma settings are:

GOMA_ENABLE_MACRO_CACHE=true
GOMA_MAX_ACTIVE_TASKS=2000
GOMA_MAX_SUBPROCS=24
GOMA_STORE_ONLY=true
GOMA_USE_LOCAL=false

The patch mainly targets content_shell, not chrome. The serialization of chrome on v8_context_snapshot.bin should already have been removed.

I attached the ninja_log from building content_shell without the patch; you'll see v8_context_snapshot.bin at around 262 s.

> GOMA_ENABLE_MACRO_CACHE=true

Sorry, this flag has no effect now.

Attached: ninja_log.zip (2.2 MB)

Comment 25 by bugdroid1@chromium.org, Jun 21 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f6e559e0817b597243e6b4eb4806d6fa48dd6adf

commit f6e559e0817b597243e6b4eb4806d6fa48dd6adf
Author: Takuto Ikuta <tikuta@chromium.org>
Date: Thu Jun 21 10:23:43 2018

Make v8_context_snapshot as data_deps

v8_context_snapshot is not the target to be linked to other libraries.

This improves build time of some targets by utilize parallelization.
For example, build time of -j800 content_shell reduced from 316.8s to 254.5s on Z840 linux without goma backend cache.

See the difference of build trace screenshots in
https://bugs.chromium.org/p/chromium/issues/detail?id=725639#c20

Bug: 725639
Change-Id: I64d57241ff6b742db4ddcb31afc07a5e7c2e2eb1
Reviewed-on: https://chromium-review.googlesource.com/1107431
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Reviewed-by: Hitoshi Yoshida <peria@chromium.org>
Reviewed-by: Jeremy Roman <jbroman@chromium.org>
Reviewed-by: Jay Civelli <jcivelli@chromium.org>
Commit-Queue: Takuto Ikuta <tikuta@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569207}
[modify] https://crrev.com/f6e559e0817b597243e6b4eb4806d6fa48dd6adf/content/shell/BUILD.gn
[modify] https://crrev.com/f6e559e0817b597243e6b4eb4806d6fa48dd6adf/content/test/BUILD.gn
[modify] https://crrev.com/f6e559e0817b597243e6b4eb4806d6fa48dd6adf/gin/BUILD.gn
[modify] https://crrev.com/f6e559e0817b597243e6b4eb4806d6fa48dd6adf/services/data_decoder/BUILD.gn
