Building Chromium is slow
Issue description
Nov 9
It's a tracking bug for Hans, probably motivated by an internal thread. Pasting from there:

'''Short-term ideas (that Hans said he has on his list):

- Build Chromium's clang with PGO. Someone had sent a patch for this a while ago (https://codereview.chromium.org/2793343002/), but it was during some transition -- recipes or gn or something -- and I wanted to complete that transition first, and then we never got back to it.

- My gn-built clang is ~10% faster than the cmake-built one. My handwavy theory is that it's because I don't pass -fPIC. The cmake build has LLVM_ENABLE_PIC, which can be used to turn this off (it defaults to on); we should measure whether turning it off does anything and, if so, probably do that.'''

As long as Hans knows what this bug is about, it's fine :-)
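For reference, a rough sketch of the LLVM_ENABLE_PIC experiment mentioned above; the source path and the other options are illustrative, not our actual bot configuration:

  # Hypothetical comparison build of clang with PIC disabled (LLVM_ENABLE_PIC defaults to ON):
  cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PIC=OFF ../llvm
  ninja clang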
Nov 9
Per https://chromium-review.googlesource.com/c/chromium/src/+/1317069/6/third_party/blink/renderer/core/css/css_initial_value.h#30, having some opt-in toggle for key functions would help us emit at least somewhat fewer (virtual) inline methods. I wouldn't be super surprised if it isn't a huge win after /dllexportInlines-, but maybe it's easy to try.
Nov 9
> Do you have some specific slowness in mind? Just "slowness" is too wide to focus on.

The idea is to track different efforts and ideas for improvement. I don't think it's a problem that this is a wide area; that just means there's lots of room for improvement :-)
Nov 9
Issue 787983 has a bunch of ideas for Windows.
Nov 16
For the record: we're at the point where the amount of money we spend on compilation in our CI infrastructure has become "interesting" to the powers that be, which means we'll have even more incentive to keep things as fast and cheap as we can. Though obviously, faster compiles tend to be interesting on their own.
Dec 2
Any additional recent thoughts on multi-threading, which I assume would be a multi-platform benefit if implemented, and/or on offering an option to bypass spawning the -cc1 child process for clang-cl? https://reviews.llvm.org/D52193 https://reviews.llvm.org/D52411
Dec 2
https://reviews.llvm.org/D52193 does nothing for Chromium. It's meant for msbuild, which passes several cc files to a single clang invocation, which currently builds all files passed to it in sequence. That patch makes it build them in parallel. In Chromium's build, ninja spawns one clang process per cc file, which is already efficient (and "better" than msbuild's approach, since ninja is in charge of all the scheduling, instead of several systems competing). With https://reviews.llvm.org/D52411 I couldn't measure a speedup for Chromium builds on my system when I tried it last time. It's surprising to me that aganea is seeing one.
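To illustrate the difference (file names made up):

  # msbuild-style: one clang-cl invocation compiles several files, currently in sequence
  clang-cl /c a.cc b.cc c.cc

  # ninja-style: one clang-cl process per file, with ninja doing all the scheduling
  clang-cl /c a.cc
  clang-cl /c b.cc
  clang-cl /c c.cc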
Dec 3
c#11: "https://reviews.llvm.org/D52193 does nothing for Chromium." Thanks for the FYI; I mostly just skimmed the MP proposal. It seems I was thinking more along the lines of msvc cl.exe /cgthreads (IIUC) when thinking about multi-threading on the compiler side. About clang-cl spawning -cc1: it looks like there could be considerable variance, as aganea reports ~117ms on Haswell versus ~60ms on Skylake for the -cc1 global initializers on his test systems.

Looking elsewhere: for platforms building with ThinLTO, perhaps investigate replacing the hard-coded job values ("/opt:lldltojobs=8", "-Wl,--thinlto-jobs=8") with some type of (basic?) heuristic based on cores, memory, build type, etc. I usually just set the job limit to the total number of cores available for my local LLVM ThinLTO builds, but I suspect there's a more optimal processor-to-memory balance to be found, especially for a project the size of Chromium.
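As a concrete (hypothetical) example of the simplest such heuristic, reusing the existing flags and the standard ways to query the core count; whether memory should also cap the value would need measuring:

  # ELF lld:
  -Wl,--thinlto-jobs=$(nproc)
  # lld-link:
  /opt:lldltojobs=%NUMBER_OF_PROCESSORS%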
Dec 3
These are only used if you do LTO builds, which only happen in production builds, not during regular development. But if you do official builds, you can try tweaking these numbers and see how much they help! Note that LTO currently does ~all codegen serially; we have issue 877722 for making that step way more parallel. Lots of discussion over there, but we're still figuring out what exactly we want to do.
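If you want to try this locally, a minimal sketch of gn args for an official ThinLTO build (arg names from memory and may have changed; the hard-coded job values themselves live in the shared build config, not in gn args):

  is_official_build = true
  use_thin_lto = true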
Dec 4
Bumping the LTO job values definitely helps here, as I do my personal Chromium builds with LTO on a 32-core system. I was also thinking about the various ToT, CFI, etc. builds being done by the buildbots. Would it be worthwhile for (local?) component builds to start using LTO at "-Wl,--lto-O0" (/opt:lldlto=0) so they can get ThinLTO caching for (faster?) incremental building? Though I understand ThinLTO caching is currently disabled on Windows due to a file leakage issue that would need to be resolved.
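For reference, the caching in question is controlled by linker flags along these lines (the cache directory path is illustrative):

  # ELF lld:
  -Wl,--thinlto-cache-dir=path/to/thinlto-cache
  # lld-link:
  /lldltocache:path\to\thinlto-cache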
Dec 4
As mentioned, we haven't figured out our LTO story yet, so we won't use it in dev builds until we have.