lld-link and/or clang-cl should increase function alignment to 16 bytes? |
||
Issue description(Marking this a blocker of crbug.com/792131 , though it may not actually be a blocker.) There are two connected issues here: 1) the x86 emulator issue, 2) the cfg issue 1: Microsoft reported problems running our lld-linked binaries in their x86 emulator which is used to run win32 programs on arm64 devices, because their emulator can't handle non 16-byte aligned call targets, and trips on "setup!__scrt_initialize_thread_safe_statics". They say this is a bug in the emulator, but it might also indicate a link.exe compatibility problem. They say "our linker ensures that the call targets are 16 byte aligned". 2: MS says the reason for 16-byte aligned call targets is stricter control-flow guard checks. With CFG, the kernel maintains a bitmap to indicate valid call targets in a process. Two bits map to 16 bytes in the address space. 00 means none of those 16 bytes are a valid call target, 10 means the first byte (the aligned address) is a valid call target, 11 means any address in the 16-byte range is valid. If a function is not 16-byte aligned, they have to use this last permissive bitmap value, which can lead to exploits like https://www.blackhat.com/docs/asia-18/asia-18-Lain-Back-To-The-Epilogue-How-To-Evade-Windows-Control-Flow-Guard-With-Less-Than-16-Bytes.pdf which exploited non-aligned call targets in system libraries. MS said: "when generating CFG code, MS compiler will always make sure all the CFG targets are 16 byte aligned". Note that 1) said the linker ensures alignment, and 2) said the compiler ensures alignment. It's unclear what exactly the compiler and linker should be doing here. Let's use this to track figuring out what's going on (what is not aligned in our binary? what does cl.exe and link.exe do that we don't?), how to fix, and how to ship the fixes.
,
Jun 27 2018
Okay, but the linker works with sections, not functions. What if there's an unaligned function in the middle of a section?
It seems MSVC emits functions with 16-byte alignment by default(?) so let's use assembly for an example:
a.cc:
extern "C" int foo(void);
extern "C" int bar(void);
extern "C" int baz(void);
int (*funcs[])(void) = { foo, bar, baz };
int main(int argc) { return (funcs[argc])(); }
b.asm:
.model flat,c
.code
foo proc near
mov eax, 0
ret
foo endp
bar proc near
mov eax, 1
ret
bar endp
baz proc near
mov eax, 2
ret
baz endp
end
ml /c b.asm && cl /c a.cc && link a.obj b.obj /guard:cf /map:map.txt
grep -A2 _foo map.txt
0001:00000020 _foo 00401020 f b.obj
0001:00000026 _bar 00401026 f b.obj
0001:0000002c _baz 0040102c f b.obj
Those are not 16-byte aligned. Are they in the CFG table?
dumpbin /loadconfig "a.exe" | grep -A6 "Guard CF Function Table"
Guard CF Function Table
Address
--------
00401020
00401026
0040102C
,
Jun 27 2018
Oh, MSVC reduces the alignment with /Os so we can just use that:
int foo() { return 1; }
int bar() { return 2; }
int baz() { return 3; }
int (*funcs[])(void) = { foo, bar, baz };
int main(int argc) { return (funcs[argc])();
cl /c /Os a.cc && link a.obj /guard:cf /map:map.txt && grep -A2 foo map.txt
0001:00000000 ?foo@@YAHXZ 00401000 f a.obj
0001:00000008 ?bar@@YAHXZ 00401008 f a.obj
0001:00000010 ?baz@@YAHXZ 00401010 f a.obj
Again, bar is unaligned despite being in the CFG table, because it's in the middle of a section.
One function per section (/Gy) without CFG:
cl /c /Os /Gy a.cc && link a.obj /map:map.txt && grep -A2 foo map.txt
0001:00000000 ?foo@@YAHXZ 00401000 f a.obj
0001:00000008 ?bar@@YAHXZ 00401008 f a.obj
0001:00000010 ?baz@@YAHXZ 00401010 f a.obj
And with CFG:
cl /c /Os /Gy a.cc && link a.obj /guard:cf /map:map.txt && grep -A2 foo map.txt
0001:00000000 ?foo@@YAHXZ 00401000 f a.obj
0001:00000010 ?bar@@YAHXZ 00401010 f a.obj
0001:00000020 ?baz@@YAHXZ 00401020 f a.obj
Now the function gets aligned.
So it seems clear it's the section getting aligned.
,
Jun 27 2018
Just to double check __scrt_initialize_thread_safe_statics is in a separate section... lib /list "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\lib\x86\LIBCMT.lib" | grep thread_safe_statics.obj f:\binaries\Intermediate\vctools\libcmt.nativeproj__851063217\objr\x86\thread_safe_statics.obj lib "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\lib\x86\LIBCMT.lib" /extract:f:\binaries\Intermediate\vctools\libcmt.nativeproj__851063217\objr\x86\thread_safe_statics.obj /out:thread_safe_statics.obj dumpbin /symbols thread_safe_statics.obj | grep ?__scrt_initialize_thread_safe_statics@@YAHXZ 00A 00000000 SECT4 notype () Static | ?__scrt_initialize_thread_safe_statics@@YAHXZ (int __cdecl __scrt_initialize_thread_safe_statics(void)) 079 00000000 SECT28 notype Static | __guard_fids_?__scrt_initialize_thread_safe_statics@@YAHXZ I looked and there's nothing else in Section 4. The section has 1-byte alignment.
,
Jun 27 2018
Thanks for the analysis, Hans. It seems to me that the most specific thing that would work around the problem is: If: 1. Linking with -guard:cf, and 2. Section is executable, and 3. Section contains a function referenced in the gfid table, and 4. Section alignment is less than 16 bytes Set section alignment to 16 bytes. I'll code something up for this.
,
Jun 27 2018
Is #3 going to cause a performance impact when -ffunction-sections is not present? Can we just skip #3? We'll overalign some non guarded functions, but only in rare cases.
,
Jun 27 2018
Isn't the majority of chrome's functions not in the gfid table? Why would that be rare?
,
Jun 27 2018
(Maybe #2 is redundant since it wouldn't be in the gfid table if it's not an executable function?) Why would #3 cause performance impact? Do you mean for the linking or for the executable? I'd imagine we could iterate over the list of symbols that we're going to put in the gfid table and increase their alignment. I think 16-byte aligning all functions would grow the binary a lot.
,
Jun 27 2018
To clarify, I intended the conditions to express "at least these cases need to be 16-byte aligned" so that we have criteria to evaluate a change by, not necessarily as "we need to implement exactly this". Aligning to 16 bytes outside these circumstances is ok as far as the problem we're trying to work around, and may have other benefits, but also drawbacks. For example, aligning every executable section to 16 bytes may make the branch predictor in Intel CPUs happier, but may also lead to larger binaries. Considering the specific points: If we skip #3 (check if section is referenced by the gfid table), we end up 16-byte aligning functions that link.exe does not currently 16-byte align (defined in their own section without 16-byte alignment). There are a lot of those in the C runtime. A program we're asked to link may add their own. This has the potential to substantially increase the size of the binaries we produce, particularly when linking code that specifically tries to create smaller binaries by putting every function in its own section (-ffunction-sections) or own object file and not 16-byte aligning those. The cost of making sure we only 16-byte align sections that are referenced by the gfid table is basically an extra few instructions executed for every entry in the gfid table - on top of the set operations, branching, and I/O we already do per entry. It's unfortunate, but I like it better than the alternative. I think it makes sense to skip checking if a section is executable (#2) if we are already checking that it is referenced from the gfid table.
,
Jun 27 2018
I was referring only to the case when -ffunction-sections is not present. Obviously if -ffunction-sections is present, it's easy to lookup each function (section) in the gfid table. What do we do when a section has N>1 functions and the gfid table has M functions? What does link do in that case?
,
Jun 27 2018
The linker only aligns the section, it doesn't reorder things inside sections, see above. So I'd assume that link.exe checks the first function in the section against gfid and nothing else.
,
Jun 28 2018
For your reviewing pleasure: https://reviews.llvm.org/D48690
,
Jun 28 2018
Before we submit this, is it worth pinging the Microsoft people and asking them if this will be sufficient to address their case? I’m pretty sure it will, but it would pretty annoying to find out after the fact that there was something else as well, since we have no way to test their specific failure
,
Jun 28 2018
(I guess we could just roll it to canary and then ask them)
,
Jun 28 2018
(Yeah, I think saying "can you try current canary" is probably easiest for everyone)
,
Jun 28 2018
With the lld patch landed, we should also inspect our cfg tables to see if all addresses are now aligned.
,
Jun 28 2018
Current status: inglorion++ has a patch at https://reviews.llvm.org/D48690 that landed in r335864. I optimistically kicked off a roll at that revision at https://chromium-review.googlesource.com/c/chromium/src/+/1118531 , but the tot waterfall is currently very red due to bug 857393 so we don't know how things currently look (compile should clear up soon, at least on some bots, but who knows what happened to tests while compile was red). Anyways, once we've managed to roll that in, we should do the checking we want (cf comment 16), then wait for a canary to go out, and then ask Microsoft if things look better. Then we can think about what, if anything, we want to do for m68.
,
Jun 28 2018
> Then we can think about what, if anything, we want to do for m68. I keep reading that as m68k, and thinking "Wait, we support that, too?"
,
Jun 29 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0107b1058f35a71e2d90cd9be686e82bc36d8c47 commit 0107b1058f35a71e2d90cd9be686e82bc36d8c47 Author: Nico Weber <thakis@chromium.org> Date: Fri Jun 29 11:52:36 2018 Roll clang 335608:335864. Ran `tools/clang/scripts/upload_revision.py 335864`. Contains two notable changes: - lld now increases section layout to 16 bytes for sections that are CFG targets, which makes CFG more precise on x86 and works around a crash bug in Microsoft's x86 emulator on arm. - Wconstant-logical-operand now fires more often and finds more bugs. We fixed all new instances of this warning on all bots we know about. Bug: 857012 , 857393 Change-Id: I9396f5bae9750a1abcb9f0b256e9caed7b6ccc71 Reviewed-on: https://chromium-review.googlesource.com/1118531 Reviewed-by: Hans Wennborg <hans@chromium.org> Commit-Queue: Nico Weber <thakis@chromium.org> Cr-Commit-Position: refs/heads/master@{#571444} [modify] https://crrev.com/0107b1058f35a71e2d90cd9be686e82bc36d8c47/tools/clang/scripts/update.py
,
Jul 2
This is live in the canary channel, and it's been confirmed that 69.0.3477.0 works with the MS emulator.
I looked at setup.exe for 69.0.3477.0 (32-bit), and while the wast majority of the cfg table entries are now 16-byte aligned, there are still many unaligned ones:
dumpbin /loadconfig setup.exe | grep -A9999 "Guard CF Function Table" | grep " [0-9A-F]\{7\}[1-9A-F]"
0044A258
0044A276
0044A294
0044A2B4
0044A2D2
0044A30E
0044A32C
0044A34A
0044A368
0044A388
0044A3A6
0044A3C6
004EF926 __SEH_epilog4
004F16A8 __except_handler3
004F3914
004F3924
004F392C
004F3938
004F394C
004F39D8
004F39E8
004F39FC
004F3E94
004F3EA4
004F3EAC
004F3EB8
004F3ECC
004F3F58
004F3F68
004F3F7C
004F437E ___from_strstr_to_strchr
004F49B4 __pow_default
005063F9
00506401
00506446 __local_unwind2
005064FC __NLG_Notify
0050651B __NLG_Call
00506C09
00506C59 __pow_pentium4
005079E5 __trandisp2
00507A84 __rttospop
00507AAA __rtzeronpop
00507ABD __rtonepop
00507B82 __rtindfnpop
00507B93 __rttosnpopde
00507BE7 __startOneArgErrorHandling
00507C45 __load_CW
00507C5C __convertTOStoQNaN
00507C75 __fload_withFB
00507CB8 __checkTOS_withFB
00507CCE __fast_exit
00507CDB __math_exit
00507D19 __check_range_exit
0050FB9B __exp_default
0050FBC8 __exp_pentium4
00510299 __ctrandisp2
00510428 __ctrandisp1
005113BF __log_default
00511498 __log_pentium4
00511814 __sqrt_common
(That's 60 functions.)
Using link.exe instead results in only 2 unaligned entries:
004EFA68 __except_handler3
004F0721
That's a lot fewer.
What happened to e.g. __pow_pentium4? It's still in the executable, and it's still unaligned, just not in the CFG table. From the map file:
0001:00103650 ___acrt_initialize_sse2 00504650 f libucrt:mathfcns.obj
0001:00103660 __CIpow_pentium4 00504660 f libucrt:pow_pentium4.obj
0001:00103679 __pow_pentium4 00504679 libucrt:pow_pentium4.obj <----
Interestingly, the "f" is missing..
What about __SEH_epilog4?
0001:000ea466 __SEH_epilog4 004eb466 f libcmt:_sehprolg4_.obj
That's also in the executable, also unaligned, and does have the f flag, but it's not in the CFG table.
Overall, the CFG table when linked with lld has 4291 entries, and when linked with link.exe it has 1142 entries. It seems lld is less precise in what functions need to go in the table?
,
Jul 5
> It seems lld is less precise in what functions need to go in the table?
Looking some more into this:
For object files compiles with Clang, we check llvm::Function::hasAddressTaken() and put such functions in a table in the object file, which the linker then puts into the final cfg table.
But for object files not compiled with Clang, lld looks at the relocations instead, in markSymbolsWithRelocations()
Where does __pow_pentium4 come from? From a /verbose link:
Found __pow_pentium4
Referenced in libucrt.lib(pow.obj)
Loaded libucrt.lib(pow_pentium4.obj)
lib C:\src\chromium\src\third_party\depot_tools\win_toolchain\vs_files\3bc0ec615cf20ee342f3bc29bc991b5ad66d8d2c\win_sdk\bin\..\..\win_sdk\Lib\10.0.17134.0\ucrt\x86\libucrt.lib /extract:d:\os\obj\x86fre\minkernel\crts\ucrt\src\appcrt\dll\mt\..\..\tran\mt\objfre\i386\pow.obj /out:pow.obj
dumpbin /all /disasm pow.obj
It's used in the pow function:
_pow:
00000000: 83 3D 00 00 00 00 cmp dword ptr [___use_sse2_mathfcns],0
00
00000007: 0F 84 00 00 00 00 je __pow_default
0000000D: 83 EC 08 sub esp,8
00000010: 0F AE 5C 24 04 stmxcsr dword ptr [esp+4]
00000015: 8B 44 24 04 mov eax,dword ptr [esp+4]
00000019: 25 80 7F 00 00 and eax,7F80h
0000001E: 3D 80 1F 00 00 cmp eax,1F80h
00000023: 75 0F jne jnedef
00000025: D9 3C 24 fnstcw word ptr [esp]
00000028: 66 8B 04 24 mov ax,word ptr [esp]
0000002C: 66 83 E0 7F and ax,7Fh
00000030: 66 83 F8 7F cmp ax,7Fh
jnedef:
00000034: 8D 64 24 08 lea esp,[esp+8]
00000038: 0F 85 00 00 00 00 jne __pow_default
0000003E: E9 00 00 00 00 jmp __pow_pentium4
RAW DATA #1
00000000: 83 3D 00 00 00 00 00 0F 84 00 00 00 00 83 EC 08 .=............ì.
00000010: 0F AE 5C 24 04 8B 44 24 04 25 80 7F 00 00 3D 80 .®\$..D$.%....=.
00000020: 1F 00 00 75 0F D9 3C 24 66 8B 04 24 66 83 E0 7F ...u.Ù<$f..$f.à.
00000030: 66 83 F8 7F 8D 64 24 08 0F 85 00 00 00 00 E9 00 f.ø..d$.......é.
00000040: 00 00 00 ...
RELOCATIONS #1
Symbol Symbol
Offset Type Applied To Index Name
-------- ---------------- ----------------- -------- ------
00000002 DIR32 00000000 A ___use_sse2_mathfcns
00000009 REL32 00000000 B __pow_default
0000003A REL32 00000000 B __pow_default
0000003F REL32 00000000 C __pow_pentium4
So, it's referenced by a REL32 relocation.
And looking at lld's markSymbolsWithRelocations() there's already a comment about this:
// Look for relocations in this section against symbols in executable output
// sections.
for (Symbol *Ref : SC->symbols()) {
// FIXME: Do further testing to see if the relocation type matters,
// especially for 32-bit where taking the address of something usually
// uses an absolute relocation instead of a relative one.
Based on link.exe not putting this symbol in the cfg table, it seems we can skip REL32 relocations here.
,
Jul 5
Skipping REL32 relocations brings the lld-linked setup.exe cfg table down to 1208 entries, so pretty close to 1142 with link.exe. Looking at the remaining extra entries, there is for example one for ReturnPoint from _exsup4_.obj. It's referenced by a DIR32 relocation (in __local_unwind4), but it's a label, not a function. So probably lld should skip labels too (looking at the symbol's storage class). That brings us to 1187 functions. One remaining entry is for TrailingUpVec, from memmove.obj. That also looks like a label referenced by DIR32 relocations. In the symbol table it looks like this: 010 00000264 SECT1 notype Static | TrailingUpVec 011 00000000 SECT1 notype () External | _memmove 012 00000274 SECT1 notype Label | TrailingUp0 for some reason it's not marked as a label. Looking at the source from the WinDDK, it's because TrailingUpVec isn't code, but a jump table stored in-line with the function code.
,
Jul 5
Skipping symbols that don't have the complex type set to COFF::IMAGE_SYM_DTYPE_FUNCTION brings the table down to 1067 entries, which is lower than link.exe. The question is whether it's now missing something...
,
Jul 5
Oops, I used a "break" instead of "continue" when I tried #23. Skipping non-function symbols actually brings the lld-linked table to 1154 entries, so still a few more than link.exe.
,
Jul 5
Another function that's only in the lld-linked cfg table: ?dtor$9@?0??UnPack@LzmaUtilImpl@@QAEKABVFilePath@base@@PAV23@@Z@4HA (from obj/chrome/installer/util\with_no_strings/lzma_util.obj) it is indeed a function, and it's referenced by a DIR32 relocation, but that relocation is for an entry in an .xdata table. Maybe such references don't need to go in the cfg table? I suppose that depends on the exception handling implementation..
,
Jul 5
Ignoring relocations in .xdata sections brings the table size down to 1140 entries. But actually, lzma_util.obj is compiled by clang, so there shouldn't be any need to scan it for relocations. However, it has no .gfids$y section? Adrian implemented that in January, but it looks like we never added the flag to the Chromium build :-/ We should do that, but we should also take a look to see whether that mechanism is more or less precise than checking relocations.
,
Jul 6
I'll be ooo for a few weeks so if someone wants to write a patch for this, feel free to. I think we should:
- ignore REL32 relocations on 32-bit, but not 64-bit (see below)
- ignore relocations not referring to symbols marked as functions in the symbol table
- ignore relocations in the .xdata section
- make clang emit the .gfid$y table in chromium builds (/guard:cf might not be the right flag because we're still not emitting the instrumentation)
It was interesting to see how relocations are scanned differently on 32-bit and 64-bit:
void foo() { return; }
void bar() { return; }
int main() {
foo();
void (*arr[])() = { &bar };
(*arr[0])();
return 0;
}
cl /c \src\tmp\a.cc && dumpbin /all /disasm a.obj > a.txt && link a.obj /guard:cf /map:map.txt && dumpbin /loadconfig a.exe
In a 32-bit build, main has a REL32 relocation for foo and a DIR32 for bar. Only bar ends up in the cfgid table, so it seems safe to say link.exe ignores REL32 relocations in 32-bit.
In 64-bit, main uses REL32 relocations for both foo and bar (the latter with a rip-relative lea to get the absolute address). And both foo and bar end up in the cfgid table. So it looks like on 64-bit, link.exe also considers REL32 relocations, and ends up including also regular call targets in the table.
,
Jul 13
(previous clang roll was issue 855772 )
,
Jul 23
Thanks for looking into this stuff while I was away! - ignore REL32 relocations on 32-bit, but not 64-bit (see below) - ignore relocations not referring to symbols marked as functions in the symbol table Makes sense. - ignore relocations in the .xdata section That's surprising to me, since .xdata is full of function and funclet pointers. I would imagine that you'd want to include them as possible indirect call targets in .gfids. Anyway, we'll figure it out eventually. - make clang emit the .gfid$y table in chromium builds (/guard:cf might not be the right flag because we're still not emitting the instrumentation) That makes sense. I think we'll want an extra flag, maybe spelled like "/guard:cf,nochecks" similar to "/guard:cf,nolongjmp", to emit .gfids$y without emitting instrumentation. It'll be useful for measurement.
,
Jul 23
I opened issue 866651 to cover more enhancements to LLVM's control flow guard implementation. We should also work on the delayload CFG stuff (issue 854771) at some point. It looks to me like this alignment bug is fixed. |
||
►
Sign in to add a comment |
||
Comment 1 by h...@chromium.org
, Jun 27 2018Looking at __scrt_initialize_thread_safe_statics: a.cc: int f() { return 0; } int main() { static int x = f(); return x; Linking without CFG enabled: cl -c a.cc && link a.obj /map:map.txt && grep -C3 __scrt_initialize_thread_safe_statics map.txt Static symbols 0001:00000063 ?__scrt_initialize_thread_safe_statics@@YAHXZ 00401063 f LIBCMT:thread_safe_statics.obj 0001:0000008a ?__scrt_initialize_thread_safe_statics_platform_specific@@YAXXZ 0040108a f LIBCMT:thread_safe_statics.obj So it's not aligned here. And now with CFG enabled: cl -c a.cc && link /guard:cf a.obj /map:map.txt && grep -C3 __scrt_initialize_thread_safe_statics map.txt 0002:ffff3000 __guard_fids__ 00400000 LIBCMT:thread_safe_statics.obj 0002:ffff3000 __guard_fids__ 00400000 libvcruntime:undname.obj 0002:ffff3000 __guard_fids__ 00400000 libvcruntime:undname.obj 0002:ffff3000 __guard_fids_?__scrt_initialize_thread_safe_statics@@YAHXZ 00400000 LIBCMT:thread_safe_statics.obj 0002:ffff3000 __guard_fids__ 00400000 libucrt:sse2_initializer.obj 0002:ffff3000 __guard_fids____scrt_set_unhandled_exception_filter 00400000 LIBCMT:utility_desktop.obj 0002:ffff3000 __guard_fids__ 00400000 libucrt:stdio_initializer.obj -- 0002:ffff3004 __guard_fids__ 00400004 libucrt:stdio_initializer.obj 0002:ffff3004 __guard_fids__ 00400004 LIBCMT:exe_main.obj 0002:ffff3008 __guard_fids__ 00400008 LIBCMT:exe_main.obj 0001:00000070 ?__scrt_initialize_thread_safe_statics@@YAHXZ 00401070 f LIBCMT:thread_safe_statics.obj 0001:00000097 ?__scrt_initialize_thread_safe_statics_platform_specific@@YAXXZ 00401097 f LIBCMT:thread_safe_statics.obj 0001:00000170 ?__scrt_uninitialize_thread_safe_statics@@YAXXZ 00401170 f LIBCMT:thread_safe_statics.obj 0001:000002e0 ?pre_c_initialization@@YAHXZ 004012e0 f LIBCMT:exe_main.obj 0001:00000390 ?post_pgo_initialization@@YAHXZ 00401390 f LIBCMT:exe_main.obj Okay, looks like the linker does align it. Does it align all functions? grep -A30 "Publics by Value" map.txt Address Publics by Value Rva+Base Lib:Object 0000:00000000 __except_list 00000000 <absolute> 0000:00000000 ___dynamic_value_reloc_table 00000000 <absolute> 0000:00000000 ___enclave_config 00000000 <absolute> 0000:00000000 ___guard_longjmp_count 00000000 <absolute> 0000:00000000 ___guard_longjmp_table 00000000 <absolute> 0000:00000000 ___hybrid_code_map_count 00000000 <absolute> 0000:00000000 ___guard_iat_table 00000000 <absolute> 0000:00000000 ___guard_iat_count 00000000 <absolute> 0000:00000000 ___hybrid_auxiliary_iat 00000000 <absolute> 0000:00000000 ___hybrid_code_map 00000000 <absolute> 0000:00000003 ___safe_se_handler_count 00000003 <absolute> 0000:0000002c __tls_array 0000002c <absolute> 0000:0000002d ___guard_fids_count 0000002d <absolute> 0000:10013500 ___guard_flags 10013500 <absolute> 0000:00000000 ___ImageBase 00400000 <linker-defined> 0001:00000000 ?f@@YAHXZ 00401000 f a.obj 0001:00000010 _main 00401010 f a.obj 0001:0000018c __Init_thread_footer 0040118c f LIBCMT:thread_safe_statics.obj 0001:000001d6 __Init_thread_header 004011d6 f LIBCMT:thread_safe_statics.obj 0001:00000228 __Init_thread_notify 00401228 f LIBCMT:thread_safe_statics.obj 0001:00000269 __Init_thread_wait 00401269 f LIBCMT:thread_safe_statics.obj 0001:00000530 _mainCRTStartup 00401530 f LIBCMT:exe_main.obj 0001:0000053a ??$__crt_fast_encode_pointer@PAX@@YAPAXQAX@Z 0040153a f i libvcruntime:winapi_downlevel.obj 0001:0000053a ??$__crt_fast_encode_pointer@PAP6AXXZ@@YAPAP6AXXZQAP6AXXZ@Z 0040153a f i LIBCMT:utility.obj 0001:0000059b ___scrt_acquire_startup_lock 0040159b f LIBCMT:utility.obj 0001:000005d0 ___scrt_initialize_crt 004015d0 f LIBCMT:utility.obj 0001:00000609 ___scrt_initialize_onexit_tables 00401609 f LIBCMT:utility.obj 0001:000006b3 ___scrt_is_nonwritable_in_current_image 004016b3 f LIBCMT:utility.obj 0001:0000073d ___scrt_release_startup_lock 0040173d f LIBCMT:utility.obj No, looks like there are plenty of unaligned functions here, so presumably it's only aligning those in the CFG table as suggested in the emails. I was using link.exe version 14.12.25834.0; was the behaviour different in previous versions? No, seems to have been the same in the linker I have from VS2015, 14.00.24215.1.