
Issue 829710

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 2
Type: Bug




Improve performance of ninja when building chrome with warmed goma cache on windows

Project Member Reported by tikuta@chromium.org, Apr 6 2018

Issue description

I confirmed that ninja can become the build-speed bottleneck on Windows when goma's backend/local cache is sufficiently warm.

Repro.

Machine: 24C/48T Z840, Windows 10 version 1607


args.gn
"""
goma_dir = "C:\\src\\goma_client\\client\\out\\Release"


# A component build serializes the build around some large DLLs.
is_component_build = false

enable_nacl = false

symbol_level = 0

# Disable optimization to keep compile times short.
is_debug = true

use_goma = true
use_lld = true

# The V8 snapshot serializes part of the build and makes build times unstable.
v8_use_snapshot = false
"""

Set the environment variable GOMA_DEPS_CACHE_FILE=dep so that the include processor's speed does not affect the measurement.


With this environment variable set, I built chrome with -j1024 three times per ninja version, each time with a warm backend/local cache.
Note: I restarted goma before each build.
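The measurement procedure above could be scripted roughly as follows. This is only a sketch: the helper names `timed_run` and `benchmark` are made up for illustration, the exact commands used for the numbers in this issue are not stated, and the extra environment is applied to the untimed pre-build step as well on the assumption that the goma restart is what picks up GOMA_DEPS_CACHE_FILE.

```python
import os
import subprocess
import time


def timed_run(cmd, extra_env=None):
    """Run cmd to completion and return its wall-clock duration in seconds."""
    env = dict(os.environ)
    env.update(extra_env or {})
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


def benchmark(build_cmd, pre_cmd=None, runs=3, extra_env=None):
    """Time build_cmd `runs` times, running pre_cmd (untimed) before each run.

    pre_cmd stands in for "restart goma"; it gets the same extra environment
    as the build (an assumption, see the paragraph above).
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    durations = []
    for _ in range(runs):
        if pre_cmd:
            subprocess.run(pre_cmd, env=env, check=True)
        durations.append(timed_run(build_cmd, extra_env))
    return durations


# Hypothetical usage; the goma_ctl.py location and the output directory are
# assumptions, and cleaning the output directory between runs is left out:
#
# benchmark(
#     ["ninja", "-C", r"out\Release", "-j1024", "chrome"],
#     pre_cmd=["python", r"C:\src\goma_client\client\out\Release\goma_ctl.py", "restart"],
#     extra_env={"GOMA_DEPS_CACHE_FILE": "dep"},
# )
```

Keeping the restart outside the timed region means only the ninja invocation itself is measured.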

Build times of chrome with the current depot_tools ninja, in seconds:
206.9228757
215.3773701
197.5123948

Build times of chrome with a faster ninja, in seconds: https://github.com/atetubou/ninja/commit/2bbe20454f026c3fd3b7ec6bfa91c8ae650dbb72
(threaded process spawn and skipping cl parsing)
166.1214287
301.3087335 
153.7200107


Ignoring the accidentally slow build, the faster ninja gives a build that is roughly 1.2x faster or better.
I'd like the next ninja release to include my patches so that we get this performance improvement.
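For reference, the speedup can be recomputed from the six timings above. The snippet below is a small sketch; treating the slowest patched run as the "accidentally slow build" and the choice of mean and median as summaries are my assumptions, not something the measurements themselves dictate.

```python
from statistics import mean, median

# Wall-clock build times in seconds, copied from the two lists above.
baseline = [206.9228757, 215.3773701, 197.5123948]
patched = [166.1214287, 301.3087335, 153.7200107]

# Assumption: the slowest patched run is the accidental outlier.
patched_no_outlier = sorted(patched)[:-1]

speedup_mean = mean(baseline) / mean(patched_no_outlier)
speedup_median = median(baseline) / median(patched)
speedup_all_runs = mean(baseline) / mean(patched)

print(f"mean, outlier dropped: {speedup_mean:.2f}x")
print(f"median, all runs:      {speedup_median:.2f}x")
print(f"mean, all runs:        {speedup_all_runs:.2f}x")
```

With the outlier dropped the mean ratio is about 1.29x, and the median ratio over all runs is about 1.25x. If the slow run is kept in the mean, the two ninja versions come out nearly equal, so the conclusion depends on that run really being accidental.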

 
Interesting. I noticed that ninja.exe uses ~120 s of CPU time when building a jumbo-component version of Chrome, and ~175 s of CPU time when building a non-jumbo-component version of Chrome. A jumbo non-component version of Chrome uses the least CPU time at around 106 s.

This amount of CPU time is not enough to affect build times on machines with a small number of cores (see crbug.com/819319 for that issue) but for many-core machines doing goma builds this ninja CPU time is enough that it can be the long pole.

Note that the more that we optimize goma (both gomacc.exe and compiler_proxy.exe, crbug.com/819319) the more likely it is that ninja will become the limiting factor to build performance.

So, I look forward to seeing both ninja.exe and goma being optimized.


I notice that the patch just comments out cl parsing - what is the plan for making that patch into something that can be landed in ninja?
I couldn't find the patch for threading of the spawning of processes.

https://github.com/atetubou/ninja/commit/2bbe20454f026c3fd3b7ec6bfa91c8ae650dbb72 shows an empty diff for me.

This feels like scraping the bottom of the barrel; I think whether we want to do this depends on how complex the patch is.
Oops, here is the diff between master and my branch.
https://github.com/atetubou/ninja/compare/master...2bbe20454f026c3fd3b7ec6bfa91c8ae650dbb72?expand=1

I will send the following patches once they are well organized.
* threaded process spawn
* faster clparser. I'm considering introducing stack-based vector/string types for string manipulation and reducing the number of GetFullPathName calls.
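To make the first item concrete, here is a rough Python illustration of the general idea behind threaded process spawn: process creation is handed to a small pool of worker threads instead of being done one by one on the main loop. It is not the ninja patch (ninja is C++ and uses the Windows process APIs directly); `spawn`, `run_all`, and the thread count are hypothetical names and values, and the snippet makes no claim about the resulting speedup.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def spawn(cmd):
    """Start one child process; called on a worker thread."""
    return subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)


def run_all(cmds, spawn_threads=4):
    """Spawn all commands from a thread pool, then collect their results.

    Returns a list of (returncode, stdout) tuples in the same order as cmds.
    """
    with ThreadPoolExecutor(max_workers=spawn_threads) as pool:
        # pool.map preserves input order, so results line up with cmds.
        procs = list(pool.map(spawn, cmds))
    results = []
    for proc in procs:
        out, _ = proc.communicate()
        results.append((proc.returncode, out))
    return results


if __name__ == "__main__":
    commands = [[sys.executable, "-c", f"print({i})"] for i in range(8)]
    for code, out in run_all(commands):
        print(code, out.strip())
```

The design point is only that the thread issuing the build steps no longer pays the full cost of each process creation serially, which is where the cost would show up at -j1024.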
