Issue 722617

Issue metadata

Status: Fixed
Closed: May 2017
OS: Windows
Pri: 2
Type: Feature



Yasm is on critical path and is too slow

Project Member Reported by brucedaw...@chromium.org, May 15 2017

Issue description

When doing debug builds we end up building debug versions of some of the build tools, such as yasm. The debug version of yasm is about 3x slower than the release version, which causes one of the build steps (assembling highbd_sad4d_sse2.asm) to take 33-42 s. This step is on the critical path, and it has been seen to stall the build for about fifteen seconds (draining from ~1,000 processes in a goma build down to just a couple) before the build ramps back up to 1,000. Two test builds were done with the following settings, one with is_debug = true and one with is_debug = false:

is_component_build = false
target_cpu = "x86"
enable_nacl = false
treat_warnings_as_errors = false

Debug:
del obj\third_party\libvpx\libvpx_yasm\highbd_sad4d_sse2.o & ninja obj/third_party/libvpx/libvpx_yasm/highbd_sad4d_sse2.o
[1 processes, 1/1 @ 0.0/s : 33.013s ] ACTION //third_party/libvpx:libvpx_yasm_action(//build/toolchain/win:x86)

Release:
del obj\third_party\libvpx\libvpx_yasm\highbd_sad4d_sse2.o & ninja obj/third_party/libvpx/libvpx_yasm/highbd_sad4d_sse2.o
[1 processes, 1/1 @ 0.1/s : 12.436s ] ACTION //third_party/libvpx:libvpx_yasm_action(//build/toolchain/win:x86)


The yasm step is also slowed down by being run twice: run_yasm.py invokes yasm once to generate the dependency file and once to produce the regular output. Contrary to the comments in the Python code, on this file the second invocation is only fractionally faster than the first (14.15 s versus 14.88 s on one test run).

Therefore, if the debug-build penalty were removed and yasm were run only once, the debug and release builds of this file could presumably both be brought down from their current ~33 s and ~12 s to ~6 s. This is worthwhile only because the step ends up on the critical path in debug goma builds.
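The ~6 s figure follows from simple arithmetic on the timings above. A minimal sketch, assuming both yasm invocations cost roughly the same (as the 14.88 s and 14.15 s measurements suggest) and that the debug build has first been brought to release-like speed:

```python
# Timings quoted in this issue, in seconds.
RELEASE_TWO_PASS_S = 12.436                 # release yasm, whole step (both invocations)
FIRST_PASS_S, SECOND_PASS_S = 14.88, 14.15  # the two invocations on one test run


def single_pass_estimate(total_s, n_passes=2):
    """Assume every invocation costs about the same, so one pass ~= total / n."""
    return total_s / n_passes


def second_pass_saving(first_s, second_s):
    """Fraction of the first invocation's time that the second one saves."""
    return (first_s - second_s) / first_s


if __name__ == "__main__":
    print(f"release, one pass: ~{single_pass_estimate(RELEASE_TWO_PASS_S):.1f} s")
    print(f"second pass saves only {second_pass_saving(FIRST_PASS_S, SECOND_PASS_S):.1%}")
```

This gives roughly 6.2 s for a single release pass and shows the second invocation saving only about 5% relative to the first.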

yasm spends a lot of its time in yasm__strcasecmp, so a simple #define HAVE_STRICMP 1 (or not #undefing it in the config file) will help, as will avoiding the debug heap overhead. The biggest win, however, would be fixing yasm so that it can generate the dependency files and the regular output in one pass.
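The cost of the double invocation can be seen in a minimal sketch of a two-pass wrapper that times each run. This is a hypothetical illustration, not the actual run_yasm.py. The command lists are caller-supplied stand-ins, no real yasm flags are assumed, and the helper names `run_timed` and `two_pass` are made up for this example:

```python
import os
import subprocess
import sys
import tempfile
import time


def run_timed(cmd, capture=False):
    """Run cmd (raising on failure); return (elapsed seconds, captured stdout or None)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, check=True,
                            stdout=subprocess.PIPE if capture else None)
    return time.perf_counter() - start, result.stdout


def two_pass(assemble_cmd, deps_cmd, depfile):
    """Hypothetical two-pass wrapper: assemble, then rerun the tool for deps.

    If both passes do comparable work (as the 14.88 s / 14.15 s timings
    suggest), total time is roughly twice that of a single pass.
    """
    t_asm, _ = run_timed(assemble_cmd)
    t_deps, deps = run_timed(deps_cmd, capture=True)
    with open(depfile, "wb") as f:
        f.write(deps)  # dependency list captured from the second run's stdout
    return t_asm, t_deps


if __name__ == "__main__":
    # Stand-in commands: real usage would invoke the assembler binary instead.
    with tempfile.TemporaryDirectory() as tmp:
        depfile = os.path.join(tmp, "out.o.d")
        t1, t2 = two_pass([sys.executable, "-c", "pass"],
                          [sys.executable, "-c", "print('out.o: in.inc')"],
                          depfile)
        print(f"first pass {t1:.3f} s, second pass {t2:.3f} s")
```

A one-pass fix would replace the two `run_timed` calls with a single invocation that writes both the object file and the dependency list, assuming the tool supports producing both at once.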

 
Summary: Yasm is on critical path and is too slow (was: Yasm is on critical path and are too slow)

Comment 2 by thakis@chromium.org, May 16 2017

Nice find! We should always build yasm in opt to address the 3x debug slowdown (precedent: https://cs.chromium.org/chromium/src/third_party/ffmpeg/BUILD.gn?q=file:ffmpeg.*build.gn+package:%5Echromium$&dr&l=207).
Labels: -Pri-3 Pri-2
Owner: brucedaw...@chromium.org
Status: Started (was: Untriaged)

In the case of yasm, adding the optimize_max config is not sufficient to resolve the problem; that only gives about a 10-15% speedup.

The rest of the performance difference comes from the debug CRT and from having _DEBUG defined. It turns out that the static_crt/dynamic_crt configs are the most important ones for improving the performance of yasm. The /MTd flag causes about a 2.3x slowdown because it pulls in the debug CRT (with lots of extra locks and allocation overhead) and because it defines _DEBUG (which presumably triggers various asserts and other debug checks).
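As a rough consistency check, the two factors above can be combined. This sketch treats "a 10-15% speedup" as a 10-15% reduction in run time and assumes the optimization and CRT effects are independent, so they multiply. Neither assumption comes from the measurements themselves:

```python
# Measured in this issue, in seconds.
DEBUG_S = 33.013    # debug yasm on highbd_sad4d_sse2.asm
RELEASE_S = 12.436  # release yasm on the same file

OPT_TIME_REDUCTION = (0.10, 0.15)  # optimize_max: "about a 10-15% speedup"
CRT_SLOWDOWN = 2.3                 # /MTd: debug CRT plus _DEBUG


def estimate_fixed_debug_s(debug_s, opt_reduction, crt_slowdown):
    """Estimated debug run time after both fixes, assuming the effects multiply."""
    after_opt = debug_s * (1.0 - opt_reduction)
    return after_opt / crt_slowdown


if __name__ == "__main__":
    print(f"measured debug/release ratio: {DEBUG_S / RELEASE_S:.2f}x")
    for r in OPT_TIME_REDUCTION:
        est = estimate_fixed_debug_s(DEBUG_S, r, CRT_SLOWDOWN)
        print(f"{r:.0%} from optimize_max -> ~{est:.1f} s")
```

Both estimates land at roughly 12-13 s, close to the measured 12.4 s release time and the ~13 s reported in the commit message.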

I'll have a CL for review soon. The config changes I tested, with their measured effects noted in the comments, are:

      # Build with full optimizations even on debug configurations, because
      # some yasm build steps (highbd_sad4d_sse2.asm) can take ~33 seconds
      # or more in debug component builds. Enabling compiler optimizations
      # saves ~5 seconds.
      configs -= [ "//build/config/compiler:default_optimization" ]
      configs += [ "//build/config/compiler:optimize_max" ]
      # This avoids explicitly defining _DEBUG, which doesn't make a
      # noticeable difference to the run time.
      configs -= [ "//build/config:debug" ]
      configs += [ "//build/config:release" ]
      # This switches to using the release CRT. On debug-component builds of
      # highbd_sad4d_sse2.asm this saves about 15 s.
      configs -= [ "//build/config/win:default_crt" ]
      configs += [ "//build/config/win:release_crt" ]

For simplicity, not all of these changes will be included.

Comment 4 by tikuta@chromium.org, May 17 2017

Cc: tikuta@chromium.org
Status: Fixed (was: Started)
Manually adding link to CL that fixed this issue:

commit 9c90eb122f3327c8c03bbf2171a97cb157f89459
Author: brucedawson <brucedawson@chromium.org>
Date:   Thu May 18 20:45:28 2017 -0700

    Optimize yasm even in debug builds

    Running yasm on highbd_sad4d_sse2.asm takes about 33 seconds on debug
    component builds on Windows - closer to 42 seconds when the system is
    under load. This is partially because compiler optimizations are
    disabled in debug builds (costs ~5 seconds) and partially because we
    link with the debug CRT (costs ~15 seconds). This change makes it so
    that we always enable optimizations and always link with the release
    CRT, thus reducing the time to run yasm on highbd_sad4d_sse2.asm from
    ~33 seconds to ~13 seconds.

    Further improvements could be obtained by only running yasm once on the
    .asm files, but such a change is left for later.

    The total CPU-time savings is tiny compared to the cost of a full build,
    but on some goma builds this step ends up being the long pole which
    serializes the build and costs an estimated 5% of elapsed build time.

    BUG= 722617 

    Review-Url: https://codereview.chromium.org/2885213002
    Cr-Commit-Position: refs/heads/master@{#473066}
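The "long pole" effect described in the commit message can be illustrated with a longest-path computation over a build graph. This is a minimal sketch assuming unlimited parallelism (goma builds run ~1,000 processes, not infinitely many). The step names and every duration except the ~33 s and ~13 s yasm timings are made up for illustration:

```python
def critical_path(durations, deps):
    """Return (length, steps) of the longest duration-weighted path in a build DAG.

    durations: step name -> seconds; deps: step name -> prerequisite step names.
    With unlimited parallelism, elapsed build time cannot be shorter than this.
    """
    finish, via = {}, {}

    def finish_time(step):
        # Memoized earliest finish time: latest prerequisite finish plus own duration.
        if step not in finish:
            start, best_prev = 0.0, None
            for prev in deps.get(step, ()):
                t = finish_time(prev)
                if t > start:
                    start, best_prev = t, prev
            finish[step] = start + durations[step]
            via[step] = best_prev
        return finish[step]

    end = max(durations, key=finish_time)
    length = finish[end]
    path, step = [], end
    while step is not None:  # walk back along the slowest prerequisites
        path.append(step)
        step = via[step]
    return length, path[::-1]


# Hypothetical build graph: only the yasm timing comes from this issue.
DURATIONS = {
    "compile_other_libvpx": 20.0,
    "yasm_highbd_sad4d_sse2": 33.0,
    "build_libvpx": 2.0,
    "link_chrome": 30.0,
}
DEPS = {
    "build_libvpx": ["compile_other_libvpx", "yasm_highbd_sad4d_sse2"],
    "link_chrome": ["build_libvpx"],
}

if __name__ == "__main__":
    print(critical_path(DURATIONS, DEPS))
    print(critical_path(dict(DURATIONS, yasm_highbd_sad4d_sse2=13.0), DEPS))
```

With these made-up numbers, cutting the yasm step from 33 s to 13 s shortens the critical path from 65 s to 52 s. After that, a different step becomes the long pole, so further yasm speedups would no longer shorten the build.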
