New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 857012 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jul 23
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug

Blocking:
issue 792131



Sign in to add a comment

lld-link and/or clang-cl should increase function alignment to 16 bytes?

Project Member Reported by h...@chromium.org, Jun 27 2018

Issue description

(Marking this a blocker of  crbug.com/792131 , though it may not actually be a blocker.)

There are two connected issues here: 1) the x86 emulator issue, 2) the cfg issue

1:
Microsoft reported problems running our lld-linked binaries in their x86 emulator which is used to run win32 programs on arm64 devices, because their emulator can't handle non 16-byte aligned call targets, and trips on "setup!__scrt_initialize_thread_safe_statics".

They say this is a bug in the emulator, but it might also indicate a link.exe compatibility problem. They say "our linker ensures that the call targets are 16 byte aligned".


2:
MS says the reason for 16-byte aligned call targets is stricter control-flow guard checks. With CFG, the kernel maintains a bitmap to indicate valid call targets in a process. Two bits map to 16 bytes in the address space. 00 means none of those 16 bytes are a valid call target, 10 means the first byte (the aligned address) is a valid call target, 11 means any address in the 16-byte range is valid. If a function is not 16-byte aligned, they have to use this last permissive bitmap value, which can lead to exploits like https://www.blackhat.com/docs/asia-18/asia-18-Lain-Back-To-The-Epilogue-How-To-Evade-Windows-Control-Flow-Guard-With-Less-Than-16-Bytes.pdf which exploited non-aligned call targets in system libraries.

MS said: "when generating CFG code, MS compiler will always make sure all the CFG targets are 16 byte aligned".


Note that 1) said the linker ensures alignment, and 2) said the compiler ensures alignment. It's unclear what exactly the compiler and linker should be doing here.


Let's use this to track figuring out what's going on (what is not aligned in our binary? what does cl.exe and link.exe do that we don't?), how to fix, and how to ship the fixes.
 

Comment 1 by h...@chromium.org, Jun 27 2018

Looking at __scrt_initialize_thread_safe_statics:

a.cc:

int f() { return 0; }
int main() { static int x = f(); return x; 

Linking without CFG enabled:

cl -c a.cc && link a.obj /map:map.txt && grep -C3 __scrt_initialize_thread_safe_statics map.txt

 Static symbols

 0001:00000063       ?__scrt_initialize_thread_safe_statics@@YAHXZ 00401063 f   LIBCMT:thread_safe_statics.obj
 0001:0000008a       ?__scrt_initialize_thread_safe_statics_platform_specific@@YAXXZ 0040108a f   LIBCMT:thread_safe_statics.obj

So it's not aligned here.


And now with CFG enabled:

cl -c a.cc && link /guard:cf a.obj /map:map.txt && grep -C3 __scrt_initialize_thread_safe_statics map.txt


 0002:ffff3000       __guard_fids__             00400000     LIBCMT:thread_safe_statics.obj
 0002:ffff3000       __guard_fids__             00400000     libvcruntime:undname.obj
 0002:ffff3000       __guard_fids__             00400000     libvcruntime:undname.obj
 0002:ffff3000       __guard_fids_?__scrt_initialize_thread_safe_statics@@YAHXZ 00400000     LIBCMT:thread_safe_statics.obj
 0002:ffff3000       __guard_fids__             00400000     libucrt:sse2_initializer.obj
 0002:ffff3000       __guard_fids____scrt_set_unhandled_exception_filter 00400000     LIBCMT:utility_desktop.obj
 0002:ffff3000       __guard_fids__             00400000     libucrt:stdio_initializer.obj
--
 0002:ffff3004       __guard_fids__             00400004     libucrt:stdio_initializer.obj
 0002:ffff3004       __guard_fids__             00400004     LIBCMT:exe_main.obj
 0002:ffff3008       __guard_fids__             00400008     LIBCMT:exe_main.obj
 0001:00000070       ?__scrt_initialize_thread_safe_statics@@YAHXZ 00401070 f   LIBCMT:thread_safe_statics.obj
 0001:00000097       ?__scrt_initialize_thread_safe_statics_platform_specific@@YAXXZ 00401097 f   LIBCMT:thread_safe_statics.obj
 0001:00000170       ?__scrt_uninitialize_thread_safe_statics@@YAXXZ 00401170 f   LIBCMT:thread_safe_statics.obj
 0001:000002e0       ?pre_c_initialization@@YAHXZ 004012e0 f   LIBCMT:exe_main.obj
 0001:00000390       ?post_pgo_initialization@@YAHXZ 00401390 f   LIBCMT:exe_main.obj


Okay, looks like the linker does align it.

Does it align all functions?

grep -A30 "Publics by Value" map.txt
  Address         Publics by Value              Rva+Base       Lib:Object

 0000:00000000       __except_list              00000000     <absolute>
 0000:00000000       ___dynamic_value_reloc_table 00000000     <absolute>
 0000:00000000       ___enclave_config          00000000     <absolute>
 0000:00000000       ___guard_longjmp_count     00000000     <absolute>
 0000:00000000       ___guard_longjmp_table     00000000     <absolute>
 0000:00000000       ___hybrid_code_map_count   00000000     <absolute>
 0000:00000000       ___guard_iat_table         00000000     <absolute>
 0000:00000000       ___guard_iat_count         00000000     <absolute>
 0000:00000000       ___hybrid_auxiliary_iat    00000000     <absolute>
 0000:00000000       ___hybrid_code_map         00000000     <absolute>
 0000:00000003       ___safe_se_handler_count   00000003     <absolute>
 0000:0000002c       __tls_array                0000002c     <absolute>
 0000:0000002d       ___guard_fids_count        0000002d     <absolute>
 0000:10013500       ___guard_flags             10013500     <absolute>
 0000:00000000       ___ImageBase               00400000     <linker-defined>
 0001:00000000       ?f@@YAHXZ                  00401000 f   a.obj
 0001:00000010       _main                      00401010 f   a.obj
 0001:0000018c       __Init_thread_footer       0040118c f   LIBCMT:thread_safe_statics.obj
 0001:000001d6       __Init_thread_header       004011d6 f   LIBCMT:thread_safe_statics.obj
 0001:00000228       __Init_thread_notify       00401228 f   LIBCMT:thread_safe_statics.obj
 0001:00000269       __Init_thread_wait         00401269 f   LIBCMT:thread_safe_statics.obj
 0001:00000530       _mainCRTStartup            00401530 f   LIBCMT:exe_main.obj
 0001:0000053a       ??$__crt_fast_encode_pointer@PAX@@YAPAXQAX@Z 0040153a f i libvcruntime:winapi_downlevel.obj
 0001:0000053a       ??$__crt_fast_encode_pointer@PAP6AXXZ@@YAPAP6AXXZQAP6AXXZ@Z 0040153a f i LIBCMT:utility.obj
 0001:0000059b       ___scrt_acquire_startup_lock 0040159b f   LIBCMT:utility.obj
 0001:000005d0       ___scrt_initialize_crt     004015d0 f   LIBCMT:utility.obj
 0001:00000609       ___scrt_initialize_onexit_tables 00401609 f   LIBCMT:utility.obj
 0001:000006b3       ___scrt_is_nonwritable_in_current_image 004016b3 f   LIBCMT:utility.obj
 0001:0000073d       ___scrt_release_startup_lock 0040173d f   LIBCMT:utility.obj

No, looks like there are plenty of unaligned functions here, so presumably it's only aligning those in the CFG table as suggested in the emails.



I was using link.exe version 14.12.25834.0; was the behaviour different in previous versions?

No, seems to have been the same in the linker I have from VS2015, 14.00.24215.1.

Comment 2 by h...@chromium.org, Jun 27 2018

Okay, but the linker works with sections, not functions. What if there's an unaligned function in the middle of a section?

It seems MSVC emits functions with 16-byte alignment by default(?) so let's use assembly for an example:

a.cc:

extern "C" int foo(void);
extern "C" int bar(void);
extern "C" int baz(void);
int (*funcs[])(void) = { foo, bar, baz };
int main(int argc) { return (funcs[argc])(); }


b.asm:

.model flat,c
.code
foo proc near
	mov eax, 0
        ret
foo endp
bar proc near
	mov eax, 1
        ret
bar endp
baz proc near
	mov eax, 2
        ret
baz endp
end


ml /c b.asm && cl /c a.cc && link a.obj b.obj /guard:cf /map:map.txt
grep -A2 _foo map.txt
 0001:00000020       _foo                       00401020 f   b.obj
 0001:00000026       _bar                       00401026 f   b.obj
 0001:0000002c       _baz                       0040102c f   b.obj

Those are not 16-byte aligned. Are they in the CFG table?


dumpbin /loadconfig "a.exe" | grep -A6 "Guard CF Function Table"
    Guard CF Function Table

          Address
          --------
          00401020
          00401026
          0040102C

Comment 3 by h...@chromium.org, Jun 27 2018

Oh, MSVC reduces the alignment with /Os so we can just use that:

int foo() { return 1; }
int bar() { return 2; }
int baz() { return 3; }

int (*funcs[])(void) = { foo, bar, baz };
int main(int argc) { return (funcs[argc])(); 

cl /c /Os a.cc && link a.obj /guard:cf /map:map.txt && grep -A2 foo map.txt
 0001:00000000       ?foo@@YAHXZ                00401000 f   a.obj
 0001:00000008       ?bar@@YAHXZ                00401008 f   a.obj
 0001:00000010       ?baz@@YAHXZ                00401010 f   a.obj

Again, bar is unaligned despite being in the CFG table, because it's in the middle of a section.



One function per section (/Gy) without CFG:

cl /c /Os /Gy a.cc && link a.obj /map:map.txt && grep -A2 foo map.txt
 0001:00000000       ?foo@@YAHXZ                00401000 f   a.obj
 0001:00000008       ?bar@@YAHXZ                00401008 f   a.obj
 0001:00000010       ?baz@@YAHXZ                00401010 f   a.obj

And with CFG:

cl /c /Os /Gy a.cc && link a.obj /guard:cf /map:map.txt && grep -A2 foo map.txt
 0001:00000000       ?foo@@YAHXZ                00401000 f   a.obj
 0001:00000010       ?bar@@YAHXZ                00401010 f   a.obj
 0001:00000020       ?baz@@YAHXZ                00401020 f   a.obj

Now the function gets aligned.

So it seems clear it's the section getting aligned.

Comment 4 by h...@chromium.org, Jun 27 2018

Just to double check __scrt_initialize_thread_safe_statics is in a separate section...

lib /list "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\lib\x86\LIBCMT.lib" | grep thread_safe_statics.obj
f:\binaries\Intermediate\vctools\libcmt.nativeproj__851063217\objr\x86\thread_safe_statics.obj

lib "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\lib\x86\LIBCMT.lib" /extract:f:\binaries\Intermediate\vctools\libcmt.nativeproj__851063217\objr\x86\thread_safe_statics.obj /out:thread_safe_statics.obj

dumpbin /symbols thread_safe_statics.obj | grep ?__scrt_initialize_thread_safe_statics@@YAHXZ
00A 00000000 SECT4  notype ()    Static       | ?__scrt_initialize_thread_safe_statics@@YAHXZ (int __cdecl __scrt_initialize_thread_safe_statics(void))
079 00000000 SECT28 notype       Static       | __guard_fids_?__scrt_initialize_thread_safe_statics@@YAHXZ

I looked and there's nothing else in Section 4. The section has 1-byte alignment.
Thanks for the analysis, Hans. It seems to me that the most specific thing that would work around the problem is:

If:

1. Linking with -guard:cf, and
2. Section is executable, and
3. Section contains a function referenced in the gfid table, and
4. Section alignment is less than 16 bytes

Set section alignment to 16 bytes.

I'll code something up for this.

Comment 6 by zturner@google.com, Jun 27 2018

Is #3 going to cause a performance impact when -ffunction-sections is not present?  Can we just skip #3?  We'll overalign some non guarded functions, but only in rare cases.

Comment 7 by thakis@chromium.org, Jun 27 2018

Isn't the majority of chrome's functions not in the gfid table? Why would that be rare?

Comment 8 by h...@chromium.org, Jun 27 2018

(Maybe #2 is redundant since it wouldn't be in the gfid table if it's not an executable function?)

Why would #3 cause performance impact? Do you mean for the linking or for the executable? I'd imagine we could iterate over the list of symbols that we're going to put in the gfid table and increase their alignment. I think 16-byte aligning all functions would grow the binary a lot.
To clarify, I intended the conditions to express "at least these cases need to be 16-byte aligned" so that we have criteria to evaluate a change by, not necessarily as "we need to implement exactly this".

Aligning to 16 bytes outside these circumstances is ok as far as 
the problem we're trying to work around, and may have other benefits, but also drawbacks. For example, aligning every executable section to 16 bytes may make the branch predictor in Intel CPUs happier, but may also lead to larger binaries.

Considering the specific points:

If we skip #3 (check if section is referenced by the gfid table), we end up 16-byte aligning functions that link.exe does not currently 16-byte align (defined in their own section without 16-byte alignment). There are a lot of those in the C runtime. A program we're asked to link may add their own. This has the potential to substantially increase the size of the binaries we produce, particularly when linking code that specifically tries to create smaller binaries by putting every function in its own section (-ffunction-sections) or own object file and not 16-byte aligning those.

The cost of making sure we only 16-byte align sections that are referenced by the gfid table is basically an extra few instructions executed for every entry in the gfid table - on top of the set operations, branching, and I/O we already do per entry. It's unfortunate, but I like it better than the alternative.

I think it makes sense to skip checking if a section is executable (#2) if we are already checking that it is referenced from the gfid table.

Comment 10 by zturner@google.com, Jun 27 2018

I was referring only to the case when -ffunction-sections is not present.  Obviously if -ffunction-sections is present, it's easy to lookup each function (section) in the gfid table.

What do we do when a section has N>1 functions and the gfid table has M functions?  What does link do in that case?
The linker only aligns the section, it doesn't reorder things inside sections, see above. So I'd assume that link.exe checks the first function in the section against gfid and nothing else.
For your reviewing pleasure: https://reviews.llvm.org/D48690
Before we submit this, is it worth pinging the Microsoft people and asking
them if this will be sufficient to address their case? I’m pretty sure it
will, but it would pretty annoying to find out after the fact that there
was something else as well, since we have no way to test their specific
failure
(I guess we could just roll it to canary and then ask them)
(Yeah, I think saying "can you try current canary" is probably easiest for everyone)

Comment 16 by h...@chromium.org, Jun 28 2018

With the lld patch landed, we should also inspect our cfg tables to see if all addresses are now aligned.
Current status: inglorion++ has a patch at https://reviews.llvm.org/D48690 that landed in r335864. I optimistically kicked off a roll at that revision at https://chromium-review.googlesource.com/c/chromium/src/+/1118531 , but the tot waterfall is currently very red due to  bug 857393  so we don't know how things currently look (compile should clear up soon, at least on some bots, but who knows what happened to tests while compile was red).

Anyways, once we've managed to roll that in, we should do the checking we want (cf comment 16), then wait for a canary to go out, and then ask Microsoft if things look better.

Then we can think about what, if anything, we want to do for m68.
> Then we can think about what, if anything, we want to do for m68.

I keep reading that as m68k, and thinking "Wait, we support that, too?"
Project Member

Comment 19 by bugdroid1@chromium.org, Jun 29 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0107b1058f35a71e2d90cd9be686e82bc36d8c47

commit 0107b1058f35a71e2d90cd9be686e82bc36d8c47
Author: Nico Weber <thakis@chromium.org>
Date: Fri Jun 29 11:52:36 2018

Roll clang 335608:335864.

Ran `tools/clang/scripts/upload_revision.py 335864`.

Contains two notable changes:
- lld now increases section layout to 16 bytes for sections that
  are CFG targets, which makes CFG more precise on x86 and
  works around a crash bug in Microsoft's x86 emulator on arm.
- Wconstant-logical-operand now fires more often and finds more bugs.
  We fixed all new instances of this warning on all bots we know about.

Bug:  857012 , 857393 
Change-Id: I9396f5bae9750a1abcb9f0b256e9caed7b6ccc71
Reviewed-on: https://chromium-review.googlesource.com/1118531
Reviewed-by: Hans Wennborg <hans@chromium.org>
Commit-Queue: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#571444}
[modify] https://crrev.com/0107b1058f35a71e2d90cd9be686e82bc36d8c47/tools/clang/scripts/update.py

This is live in the canary channel, and it's been confirmed that 69.0.3477.0 works with the MS emulator.

I looked at setup.exe for 69.0.3477.0 (32-bit), and while the wast majority of the cfg table entries are now 16-byte aligned, there are still many unaligned ones:

dumpbin /loadconfig setup.exe | grep -A9999 "Guard CF Function Table" | grep " [0-9A-F]\{7\}[1-9A-F]"
          0044A258
          0044A276
          0044A294
          0044A2B4
          0044A2D2
          0044A30E
          0044A32C
          0044A34A
          0044A368
          0044A388
          0044A3A6
          0044A3C6
          004EF926  __SEH_epilog4
          004F16A8  __except_handler3
          004F3914
          004F3924
          004F392C
          004F3938
          004F394C
          004F39D8
          004F39E8
          004F39FC
          004F3E94
          004F3EA4
          004F3EAC
          004F3EB8
          004F3ECC
          004F3F58
          004F3F68
          004F3F7C
          004F437E  ___from_strstr_to_strchr
          004F49B4  __pow_default
          005063F9
          00506401
          00506446  __local_unwind2
          005064FC  __NLG_Notify
          0050651B  __NLG_Call
          00506C09
          00506C59  __pow_pentium4
          005079E5  __trandisp2
          00507A84  __rttospop
          00507AAA  __rtzeronpop
          00507ABD  __rtonepop
          00507B82  __rtindfnpop
          00507B93  __rttosnpopde
          00507BE7  __startOneArgErrorHandling
          00507C45  __load_CW
          00507C5C  __convertTOStoQNaN
          00507C75  __fload_withFB
          00507CB8  __checkTOS_withFB
          00507CCE  __fast_exit
          00507CDB  __math_exit
          00507D19  __check_range_exit
          0050FB9B  __exp_default
          0050FBC8  __exp_pentium4
          00510299  __ctrandisp2
          00510428  __ctrandisp1
          005113BF  __log_default
          00511498  __log_pentium4
          00511814  __sqrt_common

(That's 60 functions.)

Using link.exe instead results in only 2 unaligned entries:

          004EFA68  __except_handler3
          004F0721

That's a lot fewer.

What happened to e.g. __pow_pentium4? It's still in the executable, and it's still unaligned, just not in the CFG table. From the map file:

 0001:00103650       ___acrt_initialize_sse2    00504650 f   libucrt:mathfcns.obj
 0001:00103660       __CIpow_pentium4           00504660 f   libucrt:pow_pentium4.obj
 0001:00103679       __pow_pentium4             00504679     libucrt:pow_pentium4.obj  <----

Interestingly, the "f" is missing..

What about __SEH_epilog4?

 0001:000ea466       __SEH_epilog4              004eb466 f   libcmt:_sehprolg4_.obj

That's also in the executable, also unaligned, and does have the f flag, but it's not in the CFG table.

Overall, the CFG table when linked with lld has 4291 entries, and when linked with link.exe it has 1142 entries. It seems lld is less precise in what functions need to go in the table?
> It seems lld is less precise in what functions need to go in the table?

Looking some more into this:

For object files compiles with Clang, we check llvm::Function::hasAddressTaken() and put such functions in a table in the object file, which the linker then puts into the final cfg table.

But for object files not compiled with Clang, lld looks at the relocations instead, in markSymbolsWithRelocations()

Where does __pow_pentium4 come from? From a /verbose link:

      Found __pow_pentium4
        Referenced in libucrt.lib(pow.obj)
        Loaded libucrt.lib(pow_pentium4.obj)

lib C:\src\chromium\src\third_party\depot_tools\win_toolchain\vs_files\3bc0ec615cf20ee342f3bc29bc991b5ad66d8d2c\win_sdk\bin\..\..\win_sdk\Lib\10.0.17134.0\ucrt\x86\libucrt.lib /extract:d:\os\obj\x86fre\minkernel\crts\ucrt\src\appcrt\dll\mt\..\..\tran\mt\objfre\i386\pow.obj /out:pow.obj

dumpbin /all /disasm pow.obj

It's used in the pow function:

_pow:
  00000000: 83 3D 00 00 00 00  cmp         dword ptr [___use_sse2_mathfcns],0
            00
  00000007: 0F 84 00 00 00 00  je          __pow_default
  0000000D: 83 EC 08           sub         esp,8
  00000010: 0F AE 5C 24 04     stmxcsr     dword ptr [esp+4]
  00000015: 8B 44 24 04        mov         eax,dword ptr [esp+4]
  00000019: 25 80 7F 00 00     and         eax,7F80h
  0000001E: 3D 80 1F 00 00     cmp         eax,1F80h
  00000023: 75 0F              jne         jnedef
  00000025: D9 3C 24           fnstcw      word ptr [esp]
  00000028: 66 8B 04 24        mov         ax,word ptr [esp]
  0000002C: 66 83 E0 7F        and         ax,7Fh
  00000030: 66 83 F8 7F        cmp         ax,7Fh
jnedef:
  00000034: 8D 64 24 08        lea         esp,[esp+8]
  00000038: 0F 85 00 00 00 00  jne         __pow_default
  0000003E: E9 00 00 00 00     jmp         __pow_pentium4

RAW DATA #1
  00000000: 83 3D 00 00 00 00 00 0F 84 00 00 00 00 83 EC 08  .=............ì.
  00000010: 0F AE 5C 24 04 8B 44 24 04 25 80 7F 00 00 3D 80  .®\$..D$.%....=.
  00000020: 1F 00 00 75 0F D9 3C 24 66 8B 04 24 66 83 E0 7F  ...u.Ù<$f..$f.à.
  00000030: 66 83 F8 7F 8D 64 24 08 0F 85 00 00 00 00 E9 00  f.ø..d$.......é.
  00000040: 00 00 00                                         ...

RELOCATIONS #1
                                                Symbol    Symbol
 Offset    Type              Applied To         Index     Name
 --------  ----------------  -----------------  --------  ------
 00000002  DIR32                      00000000         A  ___use_sse2_mathfcns
 00000009  REL32                      00000000         B  __pow_default
 0000003A  REL32                      00000000         B  __pow_default
 0000003F  REL32                      00000000         C  __pow_pentium4

So, it's referenced by a REL32 relocation.

And looking at lld's markSymbolsWithRelocations() there's already a comment about this:

    // Look for relocations in this section against symbols in executable output
    // sections.
    for (Symbol *Ref : SC->symbols()) {
      // FIXME: Do further testing to see if the relocation type matters,
      // especially for 32-bit where taking the address of something usually
      // uses an absolute relocation instead of a relative one.


Based on link.exe not putting this symbol in the cfg table, it seems we can skip REL32 relocations here.
Skipping REL32 relocations brings the lld-linked setup.exe cfg table down to 1208 entries, so pretty close to 1142 with link.exe.

Looking at the remaining extra entries, there is for example one for ReturnPoint from _exsup4_.obj. It's referenced by a DIR32 relocation (in __local_unwind4), but it's a label, not a function. So probably lld should skip labels too (looking at the symbol's storage class). That brings us to 1187 functions.

One remaining entry is for TrailingUpVec, from memmove.obj. That also looks like a label referenced by DIR32 relocations. In the symbol table it looks like this:

010 00000264 SECT1  notype       Static       | TrailingUpVec
011 00000000 SECT1  notype ()    External     | _memmove
012 00000274 SECT1  notype       Label        | TrailingUp0

for some reason it's not marked as a label.

Looking at the source from the WinDDK, it's because TrailingUpVec isn't code, but a jump table stored in-line with the function code.
Skipping symbols that don't have the complex type set to COFF::IMAGE_SYM_DTYPE_FUNCTION brings the table down to 1067 entries, which is lower than link.exe. The question is whether it's now missing something...
Oops, I used a "break" instead of "continue" when I tried #23.

Skipping non-function symbols actually brings the lld-linked table to 1154 entries, so still a few more than link.exe.
Another function that's only in the lld-linked cfg table:

?dtor$9@?0??UnPack@LzmaUtilImpl@@QAEKABVFilePath@base@@PAV23@@Z@4HA (from obj/chrome/installer/util\with_no_strings/lzma_util.obj)

it is indeed a function, and it's referenced by a DIR32 relocation, but that relocation is for an entry in an .xdata table.

Maybe such references don't need to go in the cfg table? I suppose that depends on the exception handling implementation..
Ignoring relocations in .xdata sections brings the table size down to 1140 entries.

But actually, lzma_util.obj is compiled by clang, so there shouldn't be any need to scan it for relocations. However, it has no .gfids$y section?

Adrian implemented that in January, but it looks like we never added the flag to the Chromium build :-/

We should do that, but we should also take a look to see whether that mechanism is more or less precise than checking relocations.
I'll be ooo for a few weeks so if someone wants to write a patch for this, feel free to. I think we should:

- ignore REL32 relocations on 32-bit, but not 64-bit (see below)
- ignore relocations not referring to symbols marked as functions in the symbol table
- ignore relocations in the .xdata section
- make clang emit the .gfid$y table in chromium builds (/guard:cf might not be the right flag because we're still not emitting the instrumentation)


It was interesting to see how relocations are scanned differently on 32-bit and 64-bit:

void foo() { return; }
void bar() { return; }

int main() {
	foo();

	void (*arr[])() = { &bar };
	(*arr[0])();

	return 0;
}

cl /c \src\tmp\a.cc && dumpbin /all /disasm a.obj > a.txt && link a.obj /guard:cf /map:map.txt  && dumpbin /loadconfig a.exe


In a 32-bit build, main has a REL32 relocation for foo and a DIR32 for bar. Only bar ends up in the cfgid table, so it seems safe to say link.exe ignores REL32 relocations in 32-bit.

In 64-bit, main uses REL32 relocations for both foo and bar (the latter with a rip-relative lea to get the absolute address). And both foo and bar end up in the cfgid table. So it looks like on 64-bit, link.exe also considers REL32 relocations, and ends up including also regular call targets in the table.
(previous clang roll was  issue 855772 )
Thanks for looking into this stuff while I was away!

- ignore REL32 relocations on 32-bit, but not 64-bit (see below)
- ignore relocations not referring to symbols marked as functions in the symbol table

Makes sense.

- ignore relocations in the .xdata section

That's surprising to me, since .xdata is full of function and funclet pointers. I would imagine that you'd want to include them as possible indirect call targets in .gfids. Anyway, we'll figure it out eventually.

- make clang emit the .gfid$y table in chromium builds (/guard:cf might not be the right flag because we're still not emitting the instrumentation)

That makes sense. I think we'll want an extra flag, maybe spelled like "/guard:cf,nochecks" similar to "/guard:cf,nolongjmp", to emit .gfids$y without emitting instrumentation. It'll be useful for measurement.
Status: Fixed (was: Assigned)
I opened issue 866651 to cover more enhancements to LLVM's control flow guard implementation. We should also work on the delayload CFG stuff (issue 854771) at some point. It looks to me like this alignment bug is fixed.

Sign in to add a comment