
Issue 903751


Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Building Chromium is slow

Project Member Reported by h...@chromium.org, Nov 9

Issue description

This is a meta-bug for reducing build times.
 
Do you have some specific slowness in mind? Just "slowness" is too broad to focus on.

What we need to do depends on how and where the build happens:
1. Which OS and build config?
2. Full build or incremental build?
3. Focus on bots or users?
4. Goma build or non-Goma build?

By the way, I'm trying to collect ninja build logs from Chromium developers this quarter so that we can easily see where we should focus.
https://bugs.chromium.org/p/chromium/issues/detail?id=900161
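
For anyone who wants to look at their own log before sharing it, here is a minimal sketch of how the slowest build steps could be pulled out of a ninja log. It assumes the v5 `.ninja_log` layout (a `# ninja log v5` header, then tab-separated start time in ms, end time in ms, mtime, output path, and command hash); the function names `parse_ninja_log` and `slowest` are made up for this example and are not part of ninja or of the collection effort in the linked issue.

```python
def parse_ninja_log(text):
    """Return {output_path: duration_ms} from the contents of a .ninja_log.

    Assumes the tab-separated v5 layout described above. If an output
    appears more than once (rebuilt targets), the last entry wins.
    """
    durations = {}
    for line in text.splitlines():
        # Skip the "# ninja log vN" header and blank lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # Malformed line; ignore rather than fail.
        start_ms, end_ms, output = int(fields[0]), int(fields[1]), fields[3]
        durations[output] = end_ms - start_ms
    return durations


def slowest(durations, n=10):
    """Return the n (output, duration_ms) pairs with the longest durations."""
    return sorted(durations.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

To use it, read the `.ninja_log` file from the build output directory (for example `out/Default`, which is an assumed path) and pass its contents to `parse_ninja_log`, then print `slowest(...)`. Note that these are wall-clock times per step, so they are inflated when many steps run in parallel.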
It's a tracking bug for Hans, probably motivated by an internal thread. Pasting from there:

'''Short-term ideas (that Hans said he has on his list):

- build chromium's clang with PGO (someone had sent a patch for this a while ago, https://codereview.chromium.org/2793343002/, but it was during some transition -- recipes or gn or something -- and I wanted to complete that transition first and then we never got back to it)

- My gn-built clang is ~10% faster than the cmake-built one. My handwavy theory is that it's because I don't pass -fPIC. The cmake build has LLVM_ENABLE_PIC, which can be used to turn this off (it defaults to on); we should measure whether turning it off does anything and, if so, probably do that.'''


As long as Hans knows what this bug is about, it's fine :-)
Per https://chromium-review.googlesource.com/c/chromium/src/+/1317069/6/third_party/blink/renderer/core/css/css_initial_value.h#30 , having some opt-in toggle for key functions will help us emit at least somewhat fewer (virtual) inline methods. I wouldn't be super surprised if it isn't a huge win after /dllexportInlines-, but maybe it's easy to try.
> Do you have some specific slowness in mind? Just "slowness" is too broad to focus on.

The idea is to track different efforts and ideas for improvement. I don't think it's a problem that this is a wide area; that just means there's lots of room for improvement :-)
Issue 787983 has a bunch of ideas for Windows.
Blockedon: 904324
Cc: dpranke@chromium.org
For the record: we're at the point where the amount of money we spend on compilation in our CI infrastructure has become "interesting" to the powers that be, which means we'll have even more incentives for keeping things as fast and cheap as we can.

Though obviously, faster compiles tend to be interesting on their own.
Blockedon: 906037
Any additional recent thoughts on multi-threading, which I assume would be a multiple-platform benefit if implemented, and/or offering an option to bypass invoking the -cc1 child process creation for clang-cl?

https://reviews.llvm.org/D52193
https://reviews.llvm.org/D52411
https://reviews.llvm.org/D52193 does nothing for Chromium. It's meant for msbuild, which passes several cc files to a single clang invocation, which currently builds all files passed to it in sequence. That patch makes it build them in parallel. In Chromium's build, ninja spawns one clang process per cc file, which is already efficient (and "better" than msbuild's approach, since ninja is in charge of all the scheduling, instead of several systems competing).

With https://reviews.llvm.org/D52411 I couldn't measure a speedup for Chromium builds on my system when I tried it last time. It's surprising to me that aganea is seeing one.
c#11: "https://reviews.llvm.org/D52193 does nothing for Chromium."

Thanks for the FYI. I mostly just skimmed the MP proposal. It seems I was thinking more along the lines of msvc cl.exe /cgthreads (IIUC) when thinking about multi-threading on the compiler side.

About clang-cl spawning -cc1, it looks like there could be considerable variance, as aganea reports ~117ms on Haswell versus ~60ms on Skylake for -cc1 global initializers on his test systems.

Looking elsewhere, perhaps investigate replacing these hard-coded values with some type of (basic?) heuristics - cores, mem, build type, etc. - on platforms building with ThinLTO?

"/opt:lldltojobs=8",

"-Wl,--thinlto-jobs=8",

I usually just set the job limit to the total number of cores available for my local software builds using LLVM with ThinLTO, but I suspect there could be a better proc-to-mem balance to be found, especially for a project the size of Chromium.
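
As a rough illustration of the kind of basic heuristic meant here, the sketch below caps the ThinLTO job count by both core count and available memory. Everything in it is an assumption for discussion: the function names are hypothetical, and the `gib_per_job` default of 2 GiB is a placeholder rather than a measured figure for Chromium. Only the two flag spellings are taken from the build config quoted above.

```python
def suggest_thinlto_jobs(cores, mem_gib, gib_per_job=2.0):
    """Pick a ThinLTO job count limited by cores and by memory.

    gib_per_job is a placeholder guess at peak memory per backend job;
    it would need to be measured on a real build before being trusted.
    """
    by_mem = int(mem_gib // gib_per_job)
    # Never go below one job, even on very small machines.
    return max(1, min(cores, by_mem))


def thinlto_job_flags(jobs):
    """Format the job count using the two flag spellings quoted above."""
    return {
        "lld-link": f"/opt:lldltojobs={jobs}",
        "ld.lld": f"-Wl,--thinlto-jobs={jobs}",
    }
```

For example, under these assumed numbers a 32-core machine with 64 GiB of RAM would get 32 jobs, while the same machine with 16 GiB would be limited to 8. The core count could come from `os.cpu_count()`; how to query memory is platform-specific and left out of the sketch.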
These are only used if you do lto builds, which only happen in production builds, not during regular development. But if you do official builds, you can try tweaking these numbers and see how much they help!

Note that LTO currently does ~all codegen more or less serially; we have issue 877722 for making that step way more parallel. Lots of discussion over there, but we're still figuring out what exactly we want to do.
Bumping the LTO job values definitely helps here, as I do my personal Chromium builds with LTO on a 32-core system.

I was also thinking about the various ToT, CFI, etc. builds being done by the buildbots.

Would it be worthwhile for (local?) component builds to start using LTO at "-Wl,--lto-O0" (/opt:lldlto=0) so they can get ThinLTO caching for (faster?) incremental building? Though I understand ThinLTO caching is currently disabled on Windows due to a file leakage issue that would need to be resolved.
As mentioned, we haven't figured our LTO story out yet, so we won't use it in dev builds before then.
Blockedon: 917404
Blockedon: 920687
Blockedon: 919239

Comment 19 by tikuta@chromium.org, Jan 17 (6 days ago)

Blockedon: 922875
