CFG's list of branch targets is suppressed in chrome.exe, and the Chrome DLLs |
|||
Issue descriptionSince May 2015 chrome/app/chrome_exe_load_config_win.cc has defined the special symbol _load_config_used. The linker pulls this in and uses it to customize the loader config information, as shown by dumpbin /loadconfig. Unfortunately the IMAGE_LOAD_CONFIG_DIRECTORY_2 structure has changed in the last three years, including the addition of CFG (Control Flow Guard) information. This means that our defining of _load_config_used is partially disabling CFG, with unknown consequences. This can be seen by running dumpbin /loadconfig chrome.exe on various chrome versions. I've attached the results with the custom _load_config_used present (the default, *_nocfg.txt) and with it removed or renamed (*_cfg.txt since that enables cfg). I also did the tests with lld-link.exe and link.exe so that we can see the exact differences and see that this change is about _load_config_used, not about the linker. I am unsure of the security implications of this but since this happened purely accidentally it seems like something that we should evaluate. Note that CFG *is* enabled for 64-bit processes (see crbug.com/870054 , and the 2 TB reservation in all 64-bit Chrome processes) but it is presumably less secure without the list of valid branch targets in chrome.exe. chrome.dll and chrome_child.dll don't seem to have a list of functions when run through dumpbin /loadconfig but I don't know why given that chrome_exe_load_config_win.cc is not linked in to them. Also, they do have "Guard CF address of check-function pointer" and a lot of other CFG related fields so ??? These are the build settings that I used - with and without use_lld = false. #use_lld = false is_component_build = false is_debug = false target_cpu = "x64" use_goma = true symbol_level = 2 remove_webcore_debug_symbols = true treat_warnings_as_errors = false msvc_use_absolute_paths = true If we want to fix this for chrome.exe then we can either extend the definition of _load_config_used to include the CFG fields or remove this structure - it was added in order to improve heap behavior but unless we want to reevaluate that occasionally it's not clear that we should leave that non-default setting. Regarding chrome.dll and chrome_child.dll I'm not sure what if anything needs to be done.
,
Aug 8
> chrome.dll and chrome_child.dll don't seem to have a list of functions when run through dumpbin /loadconfig but I don't know why given that chrome_exe_load_config_win.cc is not linked in to them. It's because only the .exes are built with /guard:cf after https://codereview.chromium.org/2813823006 Maybe we should revert that now?
,
Aug 9
The suppression is for different reasons in chrome.exe and the DLLs. We should decide whether to enable it.
,
Aug 14
We should also consider whether we should be using CFG in our v8 code. To do this we would have to use the CFG calling pattern in our generated code, and use SetProcessValidCallTargets to indicate which addresses in our code regions are valid indirect branch targets.
,
Aug 15
I'm a bit confused about what this is all about. What do you mean the "list of branch targets is suppressed in chrome.exe and DLLs"??? CFG checks are not compiled into our code. I'll slowly work through this (I could just be very slow in the head today due having a bad head cold): 1) I didn't know _load_config_used/chrome_exe_load_config_win.cc even existed until this ticket. Why are we defining this config header structure here? (Ah, 2015.) It doesn't appear to be referenced anywhere anymore... It was probably a very early local define before the sdk headers (winnt.h now), or related to the heap decommit override - though I'm not conviced that does anything now. chrome_exe_load_config_win.cc should definitely be deleted. It's a bad idea to drop in a redefine, and not auto catch any changes to the winnt structures. :S If the decommit override really still needs to happen, I recommend finding a better way - like manipulate the PE header value as a build step with pe_image_safe or something. Don't redefine exported OS structs if at all possible. 2) I do not believe this define was impacting the msvc builds with partial CFG support, from right before the switch to clang happened. (Nor is it impacting the clang builds now.) I was doing builds with msvc, and seeing all the expected PE header structures/values in chrome.exe. (I.e. link_cfg.txt) (Note that NTLoader patches in pointers and addresses for the CFG-header stuff while mapping.) 3) Ok, link_cfg.txt versus lld_cfg.txt: some of this is definitely from the clang linker work (not a header redefine). - Actually seems to be mostly as expected. The Guard flags are a little different, because clang/LLD hasn't been updated to support all the same flags like delayload IAT. - "Export suppression info present" is new in the msvc toolchain, and we don't have clang support yet to be taking advantage of it either. One unfortunate side-effect of moving to clang is just that we will always lag behind the evolution of MSVC <-> Windows support changes. Like cfg export suppression. We'll lag - even when we always have enough resourcing to update clang-win right away (which never happens). Be aware folks, this was always a concern of mine. Hey Reid, can you explain some of the differences, like why are the function counts so different? Just compilation/optimization and function output differences? And is there any plan or timeline for further *CFG* support on Windows (distinct from CFI work)? I seem to recall you mentioning maybe IAT... 4) Now seems a good time to clarify the current expected state of CFG on Windows builds: * We do not compile any CFG validation into our binaries at this time. We only want to take advantage of the checks built into the Windows system DLLs that will be in our process space(s). To do this, the PE header for the outer EXE needs to have the "CF instrumented" flag enabled. * This is why we still only need to enable /guard:cfg on the EXE (chrome.exe, which we use for all procs). * No DLLs need to be built with /guard:cf at this point, since we don't compile cfg into our binaries. There's nothing stopping the revert of https://codereview.chromium.org/2813823006 now though (the original clang bug was fixed). But I would recommend carefully assessing whether we want any "CF Instrumented" flags enabled in our DLL PEs, and if it's worth it. * As we are not currently planning to compile actual CFG checks into our binaries - since the move started to clang - I wouldn't recommend any using of SetProcessValidCallTargets for dynamic CFG. There's really no reason (and only cost) when we're not instrumenting the rest of our binaries. At least at this point. **While theoretically the clang-win compiler could be updated to fully support building in CFG checks (or supporting SetProcessValidCallTargets), that's not on our/LEXAN roadmap (as far as I know). The last I heard, the rough plan was to wait for full support of forward-edge CFI support in clang-win (an equivalent or possibly better solution to cfg). CFI is all done at build-time only though, there's no run-time OS support like SetProcessValidCallTargets. * During the last in-depth CFI meetings here in MTV, we decided that V8 processes were out of scope for CFG/CFI at this time, since only forward-edge control-flow in a JIT process would barely slow down an exploit chain. The first targets would be non-JIT processes (probably browser first) while carefully assessing the cost/benefit. * I'm roughly assuming any CFG action you're seeing in V8 processes currently is only related to MS code in the process. There's no benefit to it specifically in the V8 procs (given JIT). We need to keep instrumenting chrome.exe though (since it's our shared exe). Theoretically we could try to scrub the CF instrumentation out of only V8 procs... but that might actually be tricky. Can't/Shouldn't do it before NTLoader maps the image (NtCreateSection) without invalidating our binary signature, and once the image is loaded, the OS has enabled all the CFG bitmap etc. Would have to mangle our own image at run-time maybe... before sandbox lockdown. Probably nasty. Now, it's very possible that I didn't understand something about this ticket. Ref: bad head cold at the moment. Please feel free to follow-up with me Bruce, or point out what I'm missing. wfh/awhalley and lexan team will be owning Windows CFI/CFG from here on out... but you and I could always have a brain dump when I'm up in KIR too.
,
Aug 15
(Mostly independent to this bug, but a small comment: > like manipulate the PE header value as a build step with pe_image_safe or something We should try to write the right bits in the first place and stop thinking about having post-link build steps massaging the linker output. If we do need to change some bits that get written, we should give an lld flag to control that and pass said flag.)
,
Aug 15
> - Actually seems to be mostly as expected. The Guard flags are a little different, because clang/LLD hasn't been updated to support all the same flags like delayload IAT. That's https://crbug.com/854771 > - "Export suppression info present" is new in the msvc toolchain, and we don't have clang support yet to be taking advantage of it either. One unfortunate side-effect of moving to clang is just that we will always lag behind the evolution of MSVC <-> Windows support changes. Like cfg export suppression. We'll lag - even when we always have enough resourcing to update clang-win right away (which never happens). Be aware folks, this was always a concern of mine. Fair enough. We did reach out to ask Microsoft for more CFG implementation recommendations in June, but I had a long vacation in July and haven't found time to implement it. We've also heard that ThinLTO optimziations are higher priority that CFG, but it doesn't really matter, I want to do both. I'm not familiar with the export suppression info. > Hey Reid, can you explain some of the differences, like why are the function counts so different? Just compilation/optimization and function output differences? And is there any plan or timeline for further *CFG* support on Windows (distinct from CFI work)? I seem to recall you mentioning maybe IAT... This is probably https://crbug.com/866651. I don't think it's worth all that much time making sure our relocation scanning is identical to MSVC's, we should instead focus on -guard:cf,nochecks, which is what Hans is doing. That flag will have the compiler can give us the precise list of address-taken functions in .gfids$y without doing the instrumentation. I will say that one of the *nice* things about having our own toolchain is that we have the ability to add features that we need like this. Regarding timeline, I think we're putting ThinLTO (and CFI, which depends on ThinLTO) before CFG, but we can shuffle it around. --- Regarding the overall future direction of control flow hardening in Chrome, I personally think we want to do both CFI and CFG to get the benefits of both implementations. CFI can't be extended to add support for something like SetProcessValidCallTargets, so whatever we can't instrument with CFI, we'll want to fall back to CFG. For example, calls into V8 generated code needs to be protected by something CFG-like. Anyway, I'm confident we'll get this all straightened out over the next year, so focus on getting better. :) CFI and CFG are definitely in our minds, in our OKR documents, and not going to get lost.
,
Aug 16
A few other data points: When we switched to the VS 2017 linker we hit CFG crashes when we did a longjmp. I think that's one of those cases where we have some VC++ compiled code (the longjmp implementation) interacting with clang-cl generated code (the branch target) in the same module. The immediate fix was to pass /guard:cf,nolongjmp to the VC++ 2017 linker: https://chromium-review.googlesource.com/678063. I wonder if there are any other places where we have VC++ code that is checking the CFG bitmap before making a call and is finding that all address is our three main binaries are valid (because of crrev.com/c/2813823006 and _load_config_used)? > I'm roughly assuming any CFG action you're seeing in V8 > processes currently is only related to MS code in the process. Unfortunately there isn't a way to allocate a block of executable RAM without allocating CFG memory as well, so we are paying some of the cost of CFG regardless of how much we use it (although a patch landed today to greatly reduce that cost). I wanted to make sure all of the quirks of how we were (unintentionally and intentionally) disabling CFG in Chrome. If this work is tracked elsewhere then feel free to dupe this bug.
,
Aug 16
#6: 100% agree Nico. I was just enumerating possibilities, but there should definitely be a sane flag-type way of tweaking heap decommit value, if absolutely necessary. I'd like to keep as much redefining/patching/hacking code out of our code and build process as possible. #7: Thanks for the links Reid! I know you and lexan have a plan and are working through items - and you're doing a great job with the Windows-specific resources you have. Definitely prioritizing the ThinLTO work at the same time, as I recall it's perf gating the forward-edge cfi for Windows. All good! And we agree that we'll need to balance both cfg and cfi moving forward (but there's no value in SetProcessValidCallTargets specifically/only for V8/JIT processes at this time). #8: > Unfortunately there isn't a way to allocate a block of executable RAM without allocating CFG memory as well, so we are paying some of the cost of CFG regardless of how much we use it (although a patch landed today to greatly reduce that cost). Yup. I'm happy about the patch to reduce the V8 hit we were seeing in the other ticket. And yes, unfortunately given that we use the same exe for all child processes, and should actively resist doing hacky behaviour at run-time when spawning, there is a CFG overhead for chrome.exe now. Keep up your amazing work of detecting and sleuthing out unexpected inefficiencies on Windows Bruce! You're like a diviner with that stuff.
,
Aug 31
I'm shutting down this ticket for now Bruce - works as intended. Current state of cfg instrumentation is as expected. Lexan is working to get ThinLTO done (clang), which is blocking movement on forward-edge cfi experimentation on Windows. Feel free to follow-up with guts for longer-term strategic planning. We need to be careful not to turn anything else on without a very thorough cost/benefit analysis. |
|||
►
Sign in to add a comment |
|||
Comment 1 by dxf@google.com
, Aug 8