Issue 695864

Starred by 3 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocked on:
issue 635308

Blocking:
issue 651908




Bindings generator becomes 5 times slower on Mac with large -j

Project Member Reported by tikuta@chromium.org, Feb 24 2017

Issue description

I found that the bindings generator on Mac becomes slow when ninja is run with a large -j.

For example, the following command took around 2 minutes on my Mac Pro.
$ time ninja -C out/Release/ -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/Release/'
[1042/1042] STAMP obj/third_party/WebKit/Source/bindings/modules/v8/generate_bindings_modules_v8_interfaces.stamp

real	1m58.599s
user	4m26.968s
sys	8m4.907s

Without an explicit -j, it completes about 5 times faster than with -j 200.
$ time ninja -C out/Release/ generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/Release/'
[1042/1042] STAMP obj/third_party/WebKit/Source/bindings/modules/v8/generate_bindings_modules_v8_interfaces.stamp

real	0m24.896s
user	5m16.037s
sys	1m23.280s


This slowness does not seem to happen on Linux or Windows.
Why does it happen only on Mac?

Internal Chromium developers using a distributed compiler may build Chromium with a large -j,
so I want this slowness to be fixed.
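
As a quick check of the "5 times" figure, here is a small Python sketch that converts the two `time` reports above to seconds and prints the ratio for each field. The helper name `to_seconds` is made up for this illustration; the numbers are copied from the output above.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` value such as '1m58.599s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError("unexpected time format: %r" % stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


# Values copied from the two ninja runs in the issue description.
with_j200 = {"real": "1m58.599s", "user": "4m26.968s", "sys": "8m4.907s"}
default_j = {"real": "0m24.896s", "user": "5m16.037s", "sys": "1m23.280s"}

for key in ("real", "user", "sys"):
    ratio = to_seconds(with_j200[key]) / to_seconds(default_j[key])
    print("%-4s -j200 / default = %.2f" % (key, ratio))
```

The wall-clock ratio comes out a little under 5, and almost all of the extra time is in `sys`, while `user` time is slightly lower with -j 200.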

 

Comment 1 by bashi@chromium.org, Feb 27 2017

Components: -Internals>Mojo>Bindings Blink>Bindings

Comment 2 by tikuta@chromium.org, Feb 28 2017

Owner: tikuta@chromium.org
Status: Started (was: Untriaged)
Let me add a script_pool for targets that use the Python generator.

Comment 3 by tikuta@chromium.org, Feb 28 2017

Blockedon: 635308
Cc: brettw@chromium.org
Owner: ----
Status: Available (was: Started)
It looks like pool can only be used in tool.
https://chromium.googlesource.com/chromium/src/+/master/tools/gn/docs/reference.md#tool_Specify-arguments-to-a-toolchain-tool


pool cannot currently be used for action in GN.
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/bindings/scripts/scripts.gni?type=cs&q=idl_compiler&l=141

Comment 4 by peria@chromium.org, Mar 2 2017

Blocking: 651908

Comment 5 by bashi@chromium.org, Mar 2 2017

Maybe we can change the code generator to take a list of all IDL files and generate all .cpp/.h files in a single process, rather than spawning per IDL file. I'm not sure this actually improves the situation though.

Also we have to make sure that gn supports something similar to [1] outside action_foreach.

[1] https://cs.chromium.org/chromium/src/third_party/WebKit/Source/bindings/scripts/scripts.gni?q=scripts.gni&dr&l=176
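
A minimal Python sketch of the single-process idea, assuming a hypothetical per-file generator callback. Both `generate_all` and `output_paths` are invented for illustration and are not the actual idl_compiler.py interface. The point is only that one interpreter start-up is paid for the whole list instead of one per IDL file.

```python
import os


def output_paths(idl_path, out_dir):
    """Assumed naming scheme for illustration: Foo.idl -> V8Foo.h, V8Foo.cpp."""
    name = os.path.splitext(os.path.basename(idl_path))[0]
    return [os.path.join(out_dir, "V8%s.%s" % (name, ext))
            for ext in ("h", "cpp")]


def generate_all(idl_paths, out_dir, generate_one):
    """Run the (hypothetical) generator for every IDL file in this process."""
    written = []
    for idl_path in idl_paths:
        written.extend(generate_one(idl_path, out_dir))
    return written


if __name__ == "__main__":
    # Stand-in generator that only reports which files it would write.
    files = ["core/css/CSSFontFaceRule.idl", "core/dom/Node.idl"]
    for path in generate_all(files, "gen", output_paths):
        print(path)
```

The trade-off is that the work inside this loop is serial, so the build system can no longer spread it across cores by itself.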

Comment 6 by bashi@chromium.org, Mar 3 2017

I wrote a WIP CL to generate all interface bindings in a single action.
https://codereview.chromium.org/2726103005/

On my Mac Pro:

* w/ CL

$ /usr/bin/time ninja -C out/gn-single -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/gn-single'
[30/30] STAMP obj/third_party/WebKit/Source/bindings/core/v8/generate_bindings_core_v8_interfaces.stamp
       15.19 real        25.17 user         2.21 sys

* w/o CL

$ /usr/bin/time ninja -C out/gn-single -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/gn-single'
[1026/1026] STAMP obj/third_party/WebKit/Source/binding...odules/v8/generate_bindings_modules_v8_interfaces.stamp
       34.48 real       241.44 user        99.08 sys

Could someone on the goma team try the CL to confirm this improvement? I didn't test the CL on other platforms such as Linux and Windows.
Thank you.

I confirmed that your CL drastically improves the idl_compiler step on my Mac Pro (late 2013).

* w/ your CL

$ /usr/bin/time ninja -C out/gn-single -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/gn-single'
[46/46] STAMP obj/third_party/WebKit/Source/bindings/core/v8/generate_bindings_core_v8_interfaces.stamp
       17.33 real        28.77 user         2.84 sys


* w/o your CL

$ /usr/bin/time ninja -C out/gn-single -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/gn-single'
[1/1] Regenerating ninja files
[1000/1001] ACTION //third_party/WebKit/Source/bindings/modules/v8:generate_bindings_modules_v8_interfaces(//build/toolchain/mac:clang_x64)
      119.34 real       269.32 user       486.35 sys


I wonder why your Mac Pro does not become very slow when using -j 200 on master.

I'll try that
bashi-san

Which OS are you using?
On my Sierra machine, -j200 gets really slow.
Building content_unittests on Z620 Linux (for a more realistic build):

w/o

ninja -j500 content_unittests  441.60s user 280.87s system 512% cpu 2:21.02 total
ninja -j500 content_unittests  449.75s user 281.42s system 517% cpu 2:21.29 total

w/

ninja -j500 content_unittests  328.26s user 336.66s system 432% cpu 2:33.66 total (maybe cache not warmed)
ninja -j500 content_unittests  342.14s user 246.72s system 474% cpu 2:03.99 total

looks much better
I'm using:

macOS Sierra
Version 10.12.3
Mac Pro (Late 2013)
Processor 3.5 GHz 6-Core Intel Xeon E5
Memory 32 GB 1866 MHz DDR3

bashi-san

not 12 cores but 6 cores?
shinyak-san: Ah, sorry for the confusion. "About This Mac" says it has 6 cores, but `sysctl -n hw.ncpu` says 12.
bashi-san

Thank you. OK. Actually on my machine, `sysctl -n hw.ncpu` is 24.
Probably this slowness happens only on this kind of monster Mac.
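
The 6 versus 12 difference is physical cores versus logical CPUs. As a small illustration, Python's standard `os.cpu_count()` also reports logical CPUs, so on the machines discussed here it would be expected to match the `sysctl -n hw.ncpu` value rather than the "About This Mac" figure. The wrapper name `logical_cpu_count` is hypothetical.

```python
import os


def logical_cpu_count():
    """Return the number of logical CPUs, falling back to 1 if unknown."""
    # os.cpu_count() may return None when the count cannot be determined.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("logical CPUs:", logical_cpu_count())
```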

Thank you for trying the CL. Do you think we should go forward with this approach? It seems the CL improves build speed somewhat, but not always :(
What are the cons? IIRC, when one IDL file is updated, we regenerate various *.h/*.cc files. If this is still true, invoking one script looks better than invoking many Python scripts. We cannot run the script in parallel in this case, though, which is a con. However, according to the above measurements, build performance improves even in this case, right?

If there is no large negative impact in the usual case, I believe we can go forward this way.

Owner: bashi@chromium.org
Yeah, my concern was that reducing the number of actions that run in parallel might slow down builds on other platforms, but that does not seem to be the case. I'll update the CL for review. Thanks!
On my Z840 Windows machine:

* w/o your CL
$ time ninja -C out/single/ -j 500 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
real    0m12.054s
user    0m0.015s
sys     0m0.060s

* w/ your CL
$ time ninja -C out/single/ -j 500 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
real    0m19.122s
user    0m0.000s
sys     0m0.060s


It becomes a bit slower, but the difference is negligible.
Hmm, building content_shell also becomes slower after just touching an IDL file on Linux.

$ touch third_party/WebKit/Source/core/css/CSSFontFaceRule.idl
$ ninja -C out/gn-release -j 200 content_shell

w/o CL
ninja -C out/gn-release -j 200 content_shell  140.14s user 37.23s system 1592% cpu 11.135 total

w/ CL
ninja -C out/gn-release -j 200 content_shell  22.46s user 2.91s system 181% cpu 13.981 total

Probably we want to use a single action on mac only.
Hmm, only 3 seconds in real time? difficult choice...


Yeah, but working with IDL files may be a typical workflow for Blink developers, and I'm a bit nervous about the slowdown.

Comment 23 by bugdroid1@chromium.org, Mar 3 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4482e5a9b79b7ae945360471e13546503d11a15d

commit 4482e5a9b79b7ae945360471e13546503d11a15d
Author: bashi <bashi@chromium.org>
Date: Fri Mar 03 10:11:48 2017

bindings: Generate all interfaces in a single action on mac

Spawning a python process per IDL file is heavy on mac. Use a single action to
generate bindings for all interface types on mac. On my mac pro, this CL makes
code generation 2x faster:

* w/ CL

$ time ninja -C out/gn-single -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/gn-single'
[30/30] STAMP obj/third_party/WebKit/Source/bindings/core/v8/generate_bindings_core_v8_interfaces.stamp
       15.19 real        25.17 user         2.21 sys

* w/o CL

$ time ninja -C out/gn-single -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/gn-single'
[1026/1026] STAMP obj/third_party/WebKit/Source/binding...odules/v8/generate_bindings_modules_v8_interfaces.stamp
       34.48 real       241.44 user        99.08 sys

We still use `action_foreach` for other platforms as using a single action slows
down builds a bit.

BUG=695864

Review-Url: https://codereview.chromium.org/2726103005
Cr-Commit-Position: refs/heads/master@{#454557}

[modify] https://crrev.com/4482e5a9b79b7ae945360471e13546503d11a15d/third_party/WebKit/Source/bindings/scripts/idl_compiler.py
[modify] https://crrev.com/4482e5a9b79b7ae945360471e13546503d11a15d/third_party/WebKit/Source/bindings/scripts/scripts.gni

Comment 24 by bashi@chromium.org, Apr 11 2017

Cc: bashi@chromium.org
Labels: -Pri-2 Pri-3
Owner: ----
I don't have further plans on this issue.

Comment 25 by bugdroid1@chromium.org, Jan 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4c48a35125352ab95d8660d2039864ba49f9f7d2

commit 4c48a35125352ab95d8660d2039864ba49f9f7d2
Author: Takuto Ikuta <tikuta@google.com>
Date: Thu Jan 25 07:47:14 2018

Support pool for action_foreach

This is follow up of pool support in action
https://codereview.chromium.org/2926013002

Using action pool can remove some overhead of many running process.
Pool support of action_foreach gives better control for some python generator step when using goma.
e.g. https://codereview.chromium.org/2726103005

Bug: 695864
Change-Id: Ibd0bbaffc59513db42119138520aee3505762eee
Reviewed-on: https://chromium-review.googlesource.com/882625
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Takuto Ikuta <tikuta@google.com>
Cr-Commit-Position: refs/heads/master@{#531844}
[modify] https://crrev.com/4c48a35125352ab95d8660d2039864ba49f9f7d2/tools/gn/ninja_action_target_writer.cc
[modify] https://crrev.com/4c48a35125352ab95d8660d2039864ba49f9f7d2/tools/gn/ninja_action_target_writer_unittest.cc
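
To illustrate what a pool buys here, the following Python sketch imitates the idea with threads: many workers are available (like a large -j), but a semaphore lets only `pool_depth` tasks run at the same time. This is only an analogy under assumed names (`run_with_pool`, `pool_depth`); it is not how GN or ninja implement pools.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_pool(tasks, parallelism, pool_depth):
    """Run tasks on `parallelism` workers, at most `pool_depth` at once.

    Returns (results in task order, peak number of concurrently running tasks).
    """
    gate = threading.Semaphore(pool_depth)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def guarded(task):
        with gate:  # blocks while pool_depth tasks are already running
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            try:
                return task()
            finally:
                with lock:
                    state["running"] -= 1

    with ThreadPoolExecutor(max_workers=parallelism) as executor:
        results = list(executor.map(guarded, tasks))
    return results, state["peak"]


if __name__ == "__main__":
    jobs = [lambda i=i: i * i for i in range(50)]
    results, peak = run_with_pool(jobs, parallelism=20, pool_depth=4)
    print("peak concurrency:", peak)
```

With a depth close to the number of cores, CPU-bound generator steps stop competing with each other even when the overall -j is very large.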


Comment 26 by bugdroid1@chromium.org, Feb 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2c098797567226115653c2567dbcfc72ee5af5ae

commit 2c098797567226115653c2567dbcfc72ee5af5ae
Author: Takuto Ikuta <tikuta@chromium.org>
Date: Fri Feb 09 00:21:48 2018

Use action pool for non-goma tasks

Invoking cpu intensive python processes more than machine cores has some overhead on Mac(crbug.com/695864) and Win.

This change introduces pool mainly for python generator to restrict the number of running process when we specify large parallelism with goma.
I took 3 time build stats using target generate_bindings_modules_v8_interfaces and generate_bindings_core_v8_interfaces which have 1148 python tasks.

With this CL on z840 windows10
TotalSeconds      : 18.2953436
TotalSeconds      : 18.6283626
TotalSeconds      : 19.2731436

Without this CL on Z840 windows10
TotalSeconds      : 23.8277797
TotalSeconds      : 23.6952018
TotalSeconds      : 23.0853999

Linux looks to have good task scheduler.
With this CL on z840 linux
0m9.067s
0m8.771s
0m8.953s

Without this CL on Z840 linux
0m8.998s
0m9.022s
0m8.958s

Also this improves UI's responsiveness when we are building chrome on windows.


Stats of clean chrome build in each major OS is like below.

5 time clean build of chrome on Z840 windows 10 with -j1000 and warm goma backend cache is like below.

With this CL
333.3425057
317.4724857
305.0217898
317.8907203
305.1031952
Avg: 315.76613934

Without this CL
369.9731363
331.296758
329.0041556
329.1472297
333.3883952
Avg: 338.56193496


5 time clean build of chrome on Z840 linux with -j1000 and warm goma backend cache is like below.

With this CL
90.42
87.91
90.45
90.50
89.02
avg: 89.66

Without this CL
89.52
86.34
86.08
85.67
85.89
avg: 86.7


3 time clean build of chrome on 24 thread Mac Pro with -j500 and warm goma backend cache is like below.

With this CL
638.28
627.28
624.69
avg: 630.083

Without this CL
667.52
663.83
655.95
avg: 662.433


Bug: 695864
Change-Id: I6838c0f71b8d8030e6eab58b2990810aaa997dfa
Reviewed-on: https://chromium-review.googlesource.com/882581
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Takuto Ikuta <tikuta@chromium.org>
Cr-Commit-Position: refs/heads/master@{#535589}
[modify] https://crrev.com/2c098797567226115653c2567dbcfc72ee5af5ae/build/dotfile_settings.gni
[modify] https://crrev.com/2c098797567226115653c2567dbcfc72ee5af5ae/build/toolchain/BUILD.gn
[modify] https://crrev.com/2c098797567226115653c2567dbcfc72ee5af5ae/build/toolchain/gcc_toolchain.gni
[add] https://crrev.com/2c098797567226115653c2567dbcfc72ee5af5ae/build/toolchain/get_cpu_count.py
[modify] https://crrev.com/2c098797567226115653c2567dbcfc72ee5af5ae/build/toolchain/mac/BUILD.gn
[modify] https://crrev.com/2c098797567226115653c2567dbcfc72ee5af5ae/build/toolchain/win/BUILD.gn
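
For comparing the clean-build numbers in the commit message above, here is a short Python sketch that recomputes the averages and the relative change per platform. The timings are copied from the commit message; the dictionary layout and the function name `percent_change` are just for this illustration.

```python
from statistics import mean

# Clean-build wall times in seconds, copied from the commit message.
TIMES = {
    "Windows 10 (Z840, -j1000)": {
        "with": [333.3425057, 317.4724857, 305.0217898, 317.8907203, 305.1031952],
        "without": [369.9731363, 331.296758, 329.0041556, 329.1472297, 333.3883952],
    },
    "Linux (Z840, -j1000)": {
        "with": [90.42, 87.91, 90.45, 90.50, 89.02],
        "without": [89.52, 86.34, 86.08, 85.67, 85.89],
    },
    "Mac Pro (24 threads, -j500)": {
        "with": [638.28, 627.28, 624.69],
        "without": [667.52, 663.83, 655.95],
    },
}


def percent_change(with_cl, without_cl):
    """Relative change of the mean; negative means faster with the CL."""
    return (mean(with_cl) - mean(without_cl)) / mean(without_cl) * 100.0


for platform, runs in TIMES.items():
    change = percent_change(runs["with"], runs["without"])
    print("%-28s %+.1f%%" % (platform, change))
```

By these figures the clean build is roughly 7% faster on Windows and roughly 5% faster on Mac with the CL, and roughly 3% slower on Linux.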

Cc: -roc...@chromium.org rockot@google.com
