Yasm is on critical path and is too slow
Issue description

When doing debug builds we end up building debug versions of some of the build tools, such as yasm. The debug version of yasm is about 3x slower than the release version, which causes one of the build steps to take 33-42 s. This step is on the critical path, and it has been seen to stall the build for about fifteen seconds (draining from ~1,000 processes in a goma build down to just a couple) before ramping back up to 1,000.

Two test builds were done with the following settings, one with is_debug = true and one with is_debug = false:

is_component_build = false
target_cpu = "x86"
enable_nacl = false
treat_warnings_as_errors = false

Debug:

del obj\third_party\libvpx\libvpx_yasm\highbd_sad4d_sse2.o & ninja obj/third_party/libvpx/libvpx_yasm/highbd_sad4d_sse2.o
[1 processes, 1/1 @ 0.0/s : 33.013s ] ACTION //third_party/libvpx:libvpx_yasm_action(//build/toolchain/win:x86)

Release:

del obj\third_party\libvpx\libvpx_yasm\highbd_sad4d_sse2.o & ninja obj/third_party/libvpx/libvpx_yasm/highbd_sad4d_sse2.o
[1 processes, 1/1 @ 0.1/s : 12.436s ] ACTION //third_party/libvpx:libvpx_yasm_action(//build/toolchain/win:x86)

The yasm step is also slowed down by being run twice: run_yasm.py invokes yasm twice, and on this file, contrary to the comments in the Python code, the second invocation is only fractionally faster (14.15 s versus 14.88 s on one test run). Therefore, the debug and release builds of this file could presumably both be optimized from their current ~33 s / ~12 s times down to ~6 s. This is worthwhile only because this step ends up on the critical path in debug goma builds.

yasm spends a lot of its time in yasm__strcasecmp, so a simple #define HAVE_STRICMP 1 (or not #undefing it in the config file) will help, as will avoiding the debug heap overhead, but the biggest win would be fixing it so that it can generate the dependency files and the regular output in one pass.
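The ~6 s figure can be roughly reproduced with a little arithmetic. The sketch below assumes, as the per-invocation timings suggest, that a single combined pass would cost about as much as the slower of the two current invocations; the function names are illustrative only and are not part of run_yasm.py.

```python
# Rough estimate of the yasm step time if run_yasm.py invoked yasm once
# instead of twice. Assumption: one combined pass costs about as much as
# the slower of the two current invocations.


def single_pass_fraction(first_s: float, second_s: float) -> float:
    """Share of the two-invocation time that one combined pass would keep."""
    return max(first_s, second_s) / (first_s + second_s)


def estimate_single_pass(total_s: float, first_s: float, second_s: float) -> float:
    """Scale a measured two-invocation step time down to a one-pass estimate."""
    return total_s * single_pass_fraction(first_s, second_s)


# Per-invocation timings from one test run: first and second yasm call, seconds.
FIRST_S, SECOND_S = 14.88, 14.15

if __name__ == "__main__":
    for label, total in [("debug yasm", 33.013), ("release yasm", 12.436)]:
        est = estimate_single_pass(total, FIRST_S, SECOND_S)
        print(f"{label}: {total:.1f} s now, about {est:.1f} s with one pass")
```

With these numbers the release-yasm case comes out at about 6.4 s and the debug-yasm case at about 16.9 s, so reaching ~6 s in debug builds presumably also requires removing the roughly 2.65x debug-yasm penalty.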
May 16 2017
Nice find! We should always build yasm in opt to address the 3x debug slowdown (precedent: https://cs.chromium.org/chromium/src/third_party/ffmpeg/BUILD.gn?q=file:ffmpeg.*build.gn+package:%5Echromium$&dr&l=207).
May 16 2017
In the case of yasm, adding the optimize_max config is not sufficient to resolve the problem. That only gives about a 10-15% speedup.
The rest of the performance difference comes from the debug CRT and from having _DEBUG defined. It turns out that the static_crt/dynamic_crt configs are the most important ones for improving the performance of yasm. The /MTd flag causes about a 2.3x slowdown because it pulls in the debug CRT (with lots of extra locks and allocation overhead) and because it defines _DEBUG (which presumably triggers various asserts and other debug checks).
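As a sanity check on these numbers, the two causes can be treated as independent slowdown factors that multiply. The sketch below assumes the "10-15% speedup" means a 1.10x to 1.15x factor; under that assumption the predicted combined slowdown is 2.53x to 2.645x, and the upper end lands within about 0.01x of the measured 33.013 s / 12.436 s ratio of roughly 2.65x.

```python
# Sanity check: do the two reported causes of the debug-yasm slowdown
# multiply out to the measured debug/release ratio?
# Assumption: "10-15% speedup" means a 1.10x-1.15x factor, and the
# optimization and CRT effects are independent, so their factors multiply.

DEBUG_S = 33.013    # debug-yasm time for highbd_sad4d_sse2.o (seconds)
RELEASE_S = 12.436  # release-yasm time for the same file (seconds)
CRT_FACTOR = 2.3    # slowdown attributed to /MTd (debug CRT plus _DEBUG)


def combined_slowdown(opt_speedup: float, crt_factor: float = CRT_FACTOR) -> float:
    """Total slowdown if the missing optimizations and the debug CRT compound."""
    return (1.0 + opt_speedup) * crt_factor


measured = DEBUG_S / RELEASE_S
low, high = combined_slowdown(0.10), combined_slowdown(0.15)
print(f"measured {measured:.3f}x, predicted {low:.3f}x to {high:.3f}x")
```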
I'll have a CL for review soon. The tested changes, with their measured effects noted in the comments, are:
# Build with full optimizations even on debug configurations, because
# some yasm build steps (highbd_sad4d_sse2.asm) can take ~33 seconds
# or more in debug component builds. Enabling compiler optimizations
# saves ~5 seconds.
configs -= [ "//build/config/compiler:default_optimization" ]
configs += [ "//build/config/compiler:optimize_max" ]
# This avoids explicitly defining _DEBUG - this doesn't make a
# noticeable difference to the run time.
configs -= [ "//build/config:debug" ]
configs += [ "//build/config:release" ]
# This switches to using the release CRT. On debug component builds of
# highbd_sad4d_sse2.asm this saves about 15 s.
configs -= [ "//build/config/win:default_crt" ]
configs += [ "//build/config/win:release_crt" ]
For simplicity, not all of these changes will be included in the CL.
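Each configs pair above follows a remove-then-add pattern. The toy Python model below is not GN; it assumes GN's behavior that `-=` drops every matching item and fails when the item is absent, and DEFAULT_CONFIGS is a hypothetical stand-in for a debug Windows target's default list. It shows why each replacement is paired with removing the default it supersedes, so that the target does not end up with both (for example, two CRT configs).

```python
# Hypothetical model of the GN "configs -= / configs +=" pairs above.
# NOT GN's implementation. Assumptions: like GN, removing an item that is
# not in the list is an error, and -= removes every matching item.

DEFAULT_CONFIGS = [  # hypothetical defaults for a debug Windows target
    "//build/config/compiler:default_optimization",
    "//build/config:debug",
    "//build/config/win:default_crt",
]

YASM_SWAPS = [  # (removed default, added replacement), as in the snippet
    ("//build/config/compiler:default_optimization",
     "//build/config/compiler:optimize_max"),
    ("//build/config:debug", "//build/config:release"),
    ("//build/config/win:default_crt", "//build/config/win:release_crt"),
]


def swap_configs(configs: list[str], swaps: list[tuple[str, str]]) -> list[str]:
    """Apply (old, new) pairs: remove every occurrence of old, then append new."""
    result = list(configs)
    for old, new in swaps:
        if old not in result:
            # Mirrors GN failing on `configs -= [ old ]` when old is absent.
            raise ValueError(f"Item not found: {old}")
        result = [c for c in result if c != old]  # configs -= [ old ]
        result.append(new)                         # configs += [ new ]
    return result


if __name__ == "__main__":
    for config in swap_configs(DEFAULT_CONFIGS, YASM_SWAPS):
        print(config)
```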
May 17 2017
May 22 2017
Manually adding link to CL that fixed this issue:

commit 9c90eb122f3327c8c03bbf2171a97cb157f89459
Author: brucedawson <brucedawson@chromium.org>
Date: Thu May 18 20:45:28 2017 -0700

Optimize yasm even in debug builds

Running yasm on highbd_sad4d_sse2.asm takes about ~33 seconds on debug component builds on Windows - closer to 42 seconds when the system is under load. This is partially because compiler optimizations are disabled in debug builds (costs ~5 seconds) and partially because we link with the debug CRT (costs ~15 seconds).

This change makes it so that we always enable optimizations and always link with the release CRT, thus reducing the time to run yasm on highbd_sad4d_sse2.asm from ~33 seconds to ~13 seconds.

Further improvements could be obtained by only running yasm once on the .asm files, but such a change is left for later.

The total CPU-time savings is tiny compared to the cost of a full build, but on some goma builds this step ends up being the long pole which serializes the build and costs an estimated 5% of elapsed build time.

BUG= 722617
Review-Url: https://codereview.chromium.org/2885213002
Cr-Commit-Position: refs/heads/master@{#473066}
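The "long pole" effect is a critical-path property: the elapsed time of a parallel build can be no shorter than its longest chain of dependent steps, however many machines goma provides. The sketch below computes that chain with Python's graphlib; the graph, step names, and durations are made up for illustration, and a real build has thousands of other steps running alongside.

```python
# Sketch: compute a build graph's critical path, i.e. the longest chain of
# dependent steps, which bounds elapsed time no matter how many workers run.
# The graph and durations below are hypothetical.
from __future__ import annotations

from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(durations: dict[str, float],
                  deps: dict[str, set[str]]) -> tuple[float, list[str]]:
    """Return (length in seconds, steps) of the longest dependency chain."""
    finish: dict[str, float] = {}
    prev: dict[str, str | None] = {}
    # static_order() yields every step after all of its dependencies.
    for step in TopologicalSorter(deps).static_order():
        # A step can start only once its slowest dependency has finished.
        slowest = max(deps.get(step, set()), key=finish.__getitem__, default=None)
        start = finish[slowest] if slowest is not None else 0.0
        finish[step] = start + durations[step]
        prev[step] = slowest
    # Walk back from the step that finishes last to recover the chain.
    node: str | None = max(finish, key=finish.__getitem__)
    length = finish[node]
    path: list[str] = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return length, path[::-1]


DURATIONS = {"gen_headers": 2.0, "yasm_sad4d": 33.0,
             "cc_a": 5.0, "cc_b": 5.0, "link": 8.0}
DEPS = {"yasm_sad4d": {"gen_headers"}, "cc_a": {"gen_headers"},
        "cc_b": {"gen_headers"}, "link": {"yasm_sad4d", "cc_a", "cc_b"}}

if __name__ == "__main__":
    print(critical_path(DURATIONS, DEPS))
    print(critical_path({**DURATIONS, "yasm_sad4d": 13.0}, DEPS))
```

In this toy graph, cutting the yasm step from 33 s to 13 s shortens the critical path from 43 s to 23 s even though the step remains on it.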
Comment 1 by brucedaw...@chromium.org, May 15 2017