bindings generator becomes 5 times slower in Mac with large -j
Issue description
I found that the bindings generator on Mac becomes slow when ninja is run with a large -j. For example, the following command took around 2 minutes on my Mac Pro.
$ time ninja -C out/Release/ -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/Release/'
[1042/1042] STAMP obj/third_party/WebKit/Source/bindings/modules/v8/generate_bindings_modules_v8_interfaces.stamp
real 1m58.599s
user 4m26.968s
sys 8m4.907s
Without an explicit -j, it completes about 5 times faster than with -j 200.
$ time ninja -C out/Release/ generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/Release/'
[1042/1042] STAMP obj/third_party/WebKit/Source/bindings/modules/v8/generate_bindings_modules_v8_interfaces.stamp
real 0m24.896s
user 5m16.037s
sys 1m23.280s
This slowness does not seem to happen on Linux or Windows. Why does it happen only on Mac?
Internal Chromium developers who use a distributed compiler may build Chromium with a large -j, so I would like this slowness to be fixed.
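To make the "5 times" figure concrete, the ratios between the two runs can be worked out from the `time` output quoted above. This is only a small arithmetic sketch; the helper name `to_seconds` is made up for illustration and is not part of any Chromium tooling.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a bash `time` value such as '1m58.599s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the two runs reported in the issue description.
with_j200 = {"real": "1m58.599s", "user": "4m26.968s", "sys": "8m4.907s"}
default_j = {"real": "0m24.896s", "user": "5m16.037s", "sys": "1m23.280s"}

# Ratio of the -j 200 run to the default run for each measurement.
ratios = {
    key: to_seconds(with_j200[key]) / to_seconds(default_j[key])
    for key in with_j200
}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.2f}x")
```

The wall-clock ratio comes out a little under 5x. User time is actually lower with -j 200, while sys time is several times higher, so the extra time in this measurement is spent in the kernel rather than in the generator itself.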
,
Feb 28 2017
Let me make a script_pool for targets that use the Python generator.
,
Feb 28 2017
It looks like pool can only be used in a tool.
https://chromium.googlesource.com/chromium/src/+/master/tools/gn/docs/reference.md#tool_Specify-arguments-to-a-toolchain-tool
So we cannot use a pool for an action in GN right now.
https://cs.chromium.org/chromium/src/third_party/WebKit/Source/bindings/scripts/scripts.gni?type=cs&q=idl_compiler&l=141
,
Mar 2 2017
,
Mar 2 2017
Maybe we can change the code generator to take a list of all IDL files and generate all .cpp/.h files in a single process, rather than spawning per IDL file. I'm not sure this actually improves the situation though. Also we have to make sure that gn supports something similar to [1] outside action_foreach. [1] https://cs.chromium.org/chromium/src/third_party/WebKit/Source/bindings/scripts/scripts.gni?q=scripts.gni&dr&l=176
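The idea of handling every IDL file in one process could be sketched roughly as below. This is not the actual idl_compiler.py; `generate_one`, `generate_all`, and the `V8<Name>.h`/`.cpp` naming are hypothetical stand-ins used only to show the shape of a batch entry point that pays the interpreter start-up cost once instead of once per file.

```python
import os
from typing import Iterable, List


def output_paths(idl_path: str, output_dir: str) -> List[str]:
    """Hypothetical naming scheme: Foo.idl -> V8Foo.h and V8Foo.cpp."""
    name = os.path.splitext(os.path.basename(idl_path))[0]
    return [os.path.join(output_dir, f"V8{name}{ext}") for ext in (".h", ".cpp")]


def generate_one(idl_path: str, output_dir: str) -> List[str]:
    """Placeholder for the per-file work (parse the IDL, render templates)."""
    # A real generator would write the files; this sketch only reports names.
    return output_paths(idl_path, output_dir)


def generate_all(idl_paths: Iterable[str], output_dir: str) -> List[str]:
    """Process every IDL file inside a single Python process."""
    generated: List[str] = []
    for idl_path in idl_paths:
        generated.extend(generate_one(idl_path, output_dir))
    return generated
```

The trade-off is that the per-file work no longer runs in parallel across processes, and the build system sees one action whose outputs are all of the generated files.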
,
Mar 3 2017
I wrote a WIP CL to generate all interface bindings in a single action.
https://codereview.chromium.org/2726103005/
On my macpro:
* w/ CL
$ /usr/bin/time ninja -C out/gn-single -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/gn-single'
[30/30] STAMP obj/third_party/WebKit/Source/bindings/core/v8/generate_bindings_core_v8_interfaces.stamp
15.19 real 25.17 user 2.21 sys
* w/o CL
$ /usr/bin/time ninja -C out/gn-single -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/gn-single'
[1026/1026] STAMP obj/third_party/WebKit/Source/binding...odules/v8/generate_bindings_modules_v8_interfaces.stamp
34.48 real 241.44 user 99.08 sys
Could someone in goma team try the CL to confirm this improvement? I didn't test the CL on other platform like Linux and Windows.
,
Mar 3 2017
Thank you.
I confirmed that your CL drastically improves the idl_compiler step on my Mac Pro (Late 2013).
* w/ your CL
$ /usr/bin/time ninja -C out/gn-single -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/gn-single'
[46/46] STAMP obj/third_party/WebKit/Source/bindings/core/v8/generate_bindings_core_v8_interfaces.stamp
17.33 real 28.77 user 2.84 sys
* w/o your CL
$ /usr/bin/time ninja -C out/gn-single -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/gn-single'
[1/1] Regenerating ninja files
[1000/1001] ACTION //third_party/WebKit/Source/bindings/modules/v8:generate_bindings_modules_v8_interfaces(//build/toolchain/mac:clang_x64)
119.34 real 269.32 user 486.35 sys
I wonder why your Mac Pro does not become very slow when using -j 200 in master.
,
Mar 3 2017
I'll try that
,
Mar 3 2017
bashi-san, which OS are you using? On my Sierra machine, -j200 gets much slower.
,
Mar 3 2017
Building content_unittests on a Z620 Linux machine (for a more realistic build):
w/o
ninja -j500 content_unittests 441.60s user 280.87s system 512% cpu 2:21.02 total
ninja -j500 content_unittests 449.75s user 281.42s system 517% cpu 2:21.29 total
w/
ninja -j500 content_unittests 328.26s user 336.66s system 432% cpu 2:33.66 total (maybe cache not warmed)
ninja -j500 content_unittests 342.14s user 246.72s system 474% cpu 2:03.99 total
Looks much better.
,
Mar 3 2017
I'm using:
macOS Sierra Version 10.12.3
Mac Pro (Late 2013)
Processor 3.5 GHz 6-Core Intel Xeon E5
Memory 32 GB 1866 MHz DDR3
,
Mar 3 2017
bashi-san not 12 cores but 6 cores?
,
Mar 3 2017
shinyak-san: Ah, sorry for the confusion. "About This Mac" says it has 6 cores, but `sysctl -n hw.ncpu` says it's 12.
,
Mar 3 2017
bashi-san Thank you. OK. Actually on my machine, `sysctl -n hw.ncpu` is 24. Probably, this slowness happens only on this kind of monster mac.
,
Mar 3 2017
Thank you for trying the CL. Do you think we should go forward with this approach? It seems the CL improves build speed somewhat but always :(
,
Mar 3 2017
but always -> but not always :(
,
Mar 3 2017
What are the cons? IIRC, when one IDL file is updated, we re-generate various *.h/*.cc files. If this is still true, invoking one script looks better than invoking a lot of Python scripts. We cannot run the script in parallel in this case, though, and that is a con. However, according to the above measurement, the build performance gets better even in this case, right? If we don't have a large negative impact in the usual case, I believe we can go forward in this way.
,
Mar 3 2017
Yeah, my concern was that reducing # of actions that run in parallel may slow down build speed on other platforms, but it seems that it's not the case. I'll update the CL for review. Thanks!
,
Mar 3 2017
On my Z840 Windows machine:
* w/o your CL
$ time ninja -C out/single/ -j 500 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
real 0m12.054s
user 0m0.015s
sys 0m0.060s
* w/ your CL
$ time ninja -C out/single/ -j 500 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
real 0m19.122s
user 0m0.000s
sys 0m0.060s
It becomes a bit slower, but that is ignorable.
,
Mar 3 2017
Hmm, building content_shell becomes slow when just touching an IDL file on Linux too.
$ touch third_party/WebKit/Source/core/css/CSSFontFaceRule.idl
$ ninja -C out/gn-release -j 200 content_shell
w/o CL
ninja -C out/gn-release -j 200 content_shell 140.14s user 37.23s system 1592% cpu 11.135 total
w/ CL
ninja -C out/gn-release -j 200 content_shell 22.46s user 2.91s system 181% cpu 13.981 total
Probably we want to use a single action on mac only.
,
Mar 3 2017
Hmm, only 3 seconds in real time? difficult choice...
,
Mar 3 2017
Yeah, but working with IDL files may be a typical workflow for Blink developers, and I'm a bit nervous about the slowdown.
,
Mar 3 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4482e5a9b79b7ae945360471e13546503d11a15d
commit 4482e5a9b79b7ae945360471e13546503d11a15d
Author: bashi <bashi@chromium.org>
Date: Fri Mar 03 10:11:48 2017
bindings: Generate all interfaces in a single action on mac
Spawning a python process per IDL file is heavy on mac. Use a single action to generate bindings for all interface types on mac.
On my mac pro, this CL makes code generation 2x faster:
* w/ CL
$ time ninja -C out/gn-single -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/gn-single'
[30/30] STAMP obj/third_party/WebKit/Source/bindings/core/v8/generate_bindings_core_v8_interfaces.stamp
15.19 real 25.17 user 2.21 sys
* w/o CL
$ time ninja -C out/gn-single -j 200 generate_bindings_modules_v8_interfaces generate_bindings_core_v8_interfaces
ninja: Entering directory `out/gn-single'
[1026/1026] STAMP obj/third_party/WebKit/Source/binding...odules/v8/generate_bindings_modules_v8_interfaces.stamp
34.48 real 241.44 user 99.08 sys
We still use `action_foreach` for other platforms as using a single action slows down builds a bit.
BUG=695864
Review-Url: https://codereview.chromium.org/2726103005
Cr-Commit-Position: refs/heads/master@{#454557}
[modify] https://crrev.com/4482e5a9b79b7ae945360471e13546503d11a15d/third_party/WebKit/Source/bindings/scripts/idl_compiler.py
[modify] https://crrev.com/4482e5a9b79b7ae945360471e13546503d11a15d/third_party/WebKit/Source/bindings/scripts/scripts.gni
,
Apr 11 2017
I don't have further plans on this issue.
,
Jan 25 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4c48a35125352ab95d8660d2039864ba49f9f7d2
commit 4c48a35125352ab95d8660d2039864ba49f9f7d2
Author: Takuto Ikuta <tikuta@google.com>
Date: Thu Jan 25 07:47:14 2018
Support pool for action_foreach
This is follow up of pool support in action
https://codereview.chromium.org/2926013002
Using action pool can remove some overhead of many running process. Pool support of action_foreach gives better control for some python generator step when using goma.
e.g. https://codereview.chromium.org/2726103005
Bug: 695864
Change-Id: Ibd0bbaffc59513db42119138520aee3505762eee
Reviewed-on: https://chromium-review.googlesource.com/882625
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Takuto Ikuta <tikuta@google.com>
Cr-Commit-Position: refs/heads/master@{#531844}
[modify] https://crrev.com/4c48a35125352ab95d8660d2039864ba49f9f7d2/tools/gn/ninja_action_target_writer.cc
[modify] https://crrev.com/4c48a35125352ab95d8660d2039864ba49f9f7d2/tools/gn/ninja_action_target_writer_unittest.cc
,
Feb 9 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/2c098797567226115653c2567dbcfc72ee5af5ae
commit 2c098797567226115653c2567dbcfc72ee5af5ae
Author: Takuto Ikuta <tikuta@chromium.org>
Date: Fri Feb 09 00:21:48 2018
Use action pool for non-goma tasks
Invoking cpu intensive python processes more than machine cores has some overhead on Mac(crbug.com/695864) and Win. This change introduces pool mainly for python generator to restrict the number of running process when we specify large parallelism with goma.
I took 3 time build stats using target generate_bindings_modules_v8_interfaces and generate_bindings_core_v8_interfaces which have 1148 python tasks.
With this CL on z840 windows10
TotalSeconds : 18.2953436
TotalSeconds : 18.6283626
TotalSeconds : 19.2731436
Without this CL on Z840 windows10
TotalSeconds : 23.8277797
TotalSeconds : 23.6952018
TotalSeconds : 23.0853999
Linux looks to have good task scheduler.
With this CL on z840 linux
0m9.067s
0m8.771s
0m8.953s
Without this CL on Z840 linux
0m8.998s
0m9.022s
0m8.958s
Also this improves UI's responsiveness when we are building chrome on windows.
Stats of clean chrome build in each major OS is like below.
5 time clean build of chrome on Z840 windows 10 with -j1000 and warm goma backend cache is like below.
With this CL
333.3425057
317.4724857
305.0217898
317.8907203
305.1031952
Avg: 315.76613934
Without this CL
369.9731363
331.296758
329.0041556
329.1472297
333.3883952
Avg: 338.56193496
5 time clean build of chrome on Z840 linux with -j1000 and warm goma backend cache is like below.
With this CL
90.42
87.91
90.45
90.50
89.02
avg: 89.66
Without this CL
89.52
86.34
86.08
85.67
85.89
avg: 86.7
3 time clean build of chrome on 24 thread Mac Pro with -j500 and warm goma backend cache is like below.
With this CL
638.28
627.28
624.69
avg: 630.083
Without this CL
667.52
663.83
655.95
avg: 662.433
Bug: 695864
Change-Id: I6838c0f71b8d8030e6eab58b2990810aaa997dfa
Reviewed-on: https://chromium-review.googlesource.com/882581
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Commit-Queue: Takuto Ikuta <tikuta@chromium.org>
Cr-Commit-Position: refs/heads/master@{#535589}
[modify] https://crrev.com/2c098797567226115653c2567dbcfc72ee5af5ae/build/dotfile_settings.gni
[modify] https://crrev.com/2c098797567226115653c2567dbcfc72ee5af5ae/build/toolchain/BUILD.gn
[modify] https://crrev.com/2c098797567226115653c2567dbcfc72ee5af5ae/build/toolchain/gcc_toolchain.gni
[add] https://crrev.com/2c098797567226115653c2567dbcfc72ee5af5ae/build/toolchain/get_cpu_count.py
[modify] https://crrev.com/2c098797567226115653c2567dbcfc72ee5af5ae/build/toolchain/mac/BUILD.gn
[modify] https://crrev.com/2c098797567226115653c2567dbcfc72ee5af5ae/build/toolchain/win/BUILD.gn
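The effect of a pool, restricting how many generator processes run at once regardless of -j, can be illustrated in plain Python. This is a minimal sketch under stated assumptions: the contents of get_cpu_count.py are not shown in this thread, and the real limit is enforced by ninja, not by code like this. `pool_depth` and `run_bounded` are hypothetical names.

```python
import multiprocessing
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def pool_depth() -> int:
    """Number of logical CPUs, used here as the concurrency limit (assumption)."""
    return multiprocessing.cpu_count()


def run_bounded(commands: Sequence[List[str]], depth: int) -> List[int]:
    """Run each command as a subprocess, with at most `depth` running at a time."""

    def run(command: List[str]) -> int:
        return subprocess.run(command, check=False).returncode

    # Each worker thread blocks on one child process, so max_workers
    # caps the number of simultaneously running children.
    with ThreadPoolExecutor(max_workers=depth) as executor:
        return list(executor.map(run, commands))


if __name__ == "__main__":
    # Stand-in tasks: trivial Python child processes instead of real generators.
    tasks = [[sys.executable, "-c", "pass"] for _ in range(8)]
    print(run_bounded(tasks, pool_depth()))
```

With a cap like this, asking for a very large parallelism only affects work that is not subject to the cap, which matches the commit's intent of keeping local Python tasks near the core count while remote compiles stay highly parallel.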
,
Oct 17
Comment 1 by bashi@chromium.org
, Feb 27 2017