Issue 665930

Starred by 8 users

Security: ASLR bypass by MMU cache side channel: AnC or ASLR^Cache

Reported by ben.gras...@gmail.com, Nov 16 2016

Issue description

VULNERABILITY DETAILS
There is a side channel that discloses MMU lookup activity. This side channel is due to the MMU caching pagetable cachelines in the CPU cache.

This allows code to compute accessed virtual addresses by observing CPU cache activity. This allows ASLR bypass in Javascript. Javascript can compute the address of e.g. an ArrayBuffer. This works for data pointers in Chrome and code and data pointers in Firefox.

VERSION
Chrome Version: 52.0.2743.82
Operating System: Ubuntu 16.04, Linux kernel 4.4.0-38

We have verified this signal exists on Intel Ivy Bridge, Skylake and Haswell microarchitectures.

However, we do not expect this side channel to be specific to these microarchitectures.

REPRODUCTION CASE
We have a POC that works in Chrome and Firefox; it is available upon request.

Attached is the research paper, which has been accepted for publication at NDSS 2017. It details the workings of the side channel and its implementation in Chrome.

We intend to make this work public on February 15th 2017.

We have some mitigations in mind we would be happy to discuss.

Best regards,

Ben

 

Comment 1 Deleted

Cc: awhalley@chromium.org
We have set up our own info page to track the disclosure process that we'll add more info to as we have it.

URL
https://www.vusec.net/projects/anc/

PW
TeZwEysX4qza
Owner: infe...@chromium.org
This is outside of my expertise. aarya@, what are your thoughts?

Comment 5 by mea...@chromium.org, Nov 18 2016

Cc: rickyz@chromium.org xzhou@chromium.org
Components: Internals
Status: Assigned (was: Unconfirmed)
Adding some guts folks.

Comment 6 by rickyz@chromium.org, Nov 19 2016

Cc: palmer@chromium.org mnissler@chromium.org
Cc: keescook@chromium.org
+keescook, as this likely has impact beyond Javascript and browsers.

Comment 8 by mea...@chromium.org, Nov 22 2016

Cc: -keescook@chromium.org infe...@chromium.org
Labels: OS-All
Owner: keescook@chromium.org
inferno is probably not the right owner, assigning to keescook.

Comment 9 by palmer@chromium.org, Nov 22 2016

Cc: wad@chromium.org jsc...@chromium.org
Components: Blink>JavaScript
Labels: Security_Impact-Stable
+jschuh, wad: Relevant to your interests.

I'm still reading the paper.
Oh, I should add: Yes, we'd very much like to see your PoC, and to hear your thoughts on mitigation. :) Thanks.
Cc: mseaborn@chromium.org
@palmer POC:

I am working on cleaning up and testing the POC right now. (The reason is that the code used for the paper's experiments targets both Firefox and Chrome and is a bit messy, so the point is perhaps easy to miss in the noise.)

I intend to upload a POC today (Wednesday my time, CET).
@palmer mitigation:

We have collected a story on https://www.vusec.net/projects/anc/ , PW TeZwEysX4qza

Summarized in my words:

  - This attack needs a high resolution timer. performance.now() already has some jitter (right?), so we could not use this directly. In Chrome we could reliably use a shared memory based counter as predicted by mseaborn@ in https://github.com/tc39/ecmascript_sharedmem/issues/1, in fact both sources of time were predicted there (we discovered this issue text after finding them). The Jan 27 comment seems to dismiss this as either unexploitable or exploitable in other ways. So, one mitigation would be to make access to high resolution timers (even) harder - though it is probably hard to guarantee this.

  - This attack relies on uniquely identifying not only cache lines but also the pagetable slots used for the buffer lookup, in order to compute the addresses. This in turn relies on (a) a large range of contiguous virtual address space and (b) getting this buffer allocated in a very different part of the address space, so that the cachelines are not re-used after eviction and before the measurement; we call this 'blinding.' If all Javascript runtime data & code were forced into the same 4TB of virtual address space, this attack could never observe the top 9 bits (out of the current 48 virtual user address space bits available) of the buffer address. Noncontiguous buffers would also frustrate the attack greatly, so e.g. adding 1 level of indirection might already make solving for the address impossible.

 - There is also the matter of Intel CAT, which might isolate cache activity, but we probably should not count on it because it requires OS and HW support.
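
The 8GB and 4TB figures that come up in this discussion follow from pagetable arithmetic. A minimal sketch, assuming x86-64 4-level paging with 4KB pages, 8-byte pagetable entries (512 per table) and 64-byte cachelines, so 8 entries share one cacheline. The function names are illustrative and are not taken from the POC:

```python
PAGE_SHIFT = 12                   # 4KB pages (assumption)
BITS_PER_LEVEL = 9                # 512 entries per pagetable
ENTRIES_PER_CACHELINE = 64 // 8   # 64-byte cacheline, 8-byte entries (assumption)


def entry_span(level):
    """Bytes of virtual address space covered by one entry at `level` (1 = lowest)."""
    return 1 << (PAGE_SHIFT + BITS_PER_LEVEL * (level - 1))


def cacheline_span(level):
    """Bytes of virtual address space covered by one cacheline of entries at `level`."""
    return entry_span(level) * ENTRIES_PER_CACHELINE


for level in range(1, 5):
    print(level, entry_span(level), cacheline_span(level))
```

Under these assumptions one cacheline covers 8GB at the third level and 4TB at the top level, so a buffer confined to a single 4TB-aligned region would never touch a second top-level cacheline.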

Is that something to go on?

I'd be happy to clarify on a call if the above is unclear (as I find it a bit hard to write clearly).
I think this is a significant finding.  Thanks for researching it.

Do you think hardware changes could mitigate this, if hardware allowed randomising the offsets at which PTEs appear within a page table?  I can imagine two possible schemes:

 1) Stronger scheme: Each PTE could contain a 9-bit value (randomly chosen by the kernel) which is added (mod 512) or XOR'd to the index of the PTE at the next level of the page table hierarchy.  There are reserved bits in the x86-64 PTEs which could be used for this.

 2) Weaker scheme: The CPU could have a global 9-bit register for each level of the page table hierarchy which would be added (mod 512) or XOR'd to the PTE's offset when looking up the PTE.  For XORing, this would be equivalent to XORing each virtual address with an OS-chosen value before lookup.
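
The claimed equivalence for the XOR variant of the weaker scheme can be checked with a small sketch. It assumes 4KB pages and 9 index bits per level; the per-level keys are made-up values, and `slot`, `combined_key` and `randomized_slot` are hypothetical helper names, not an existing hardware or kernel interface:

```python
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9
INDEX_MASK = (1 << BITS_PER_LEVEL) - 1


def slot(vaddr, level):
    """Pagetable index used at `level` (1 = lowest) for a virtual address."""
    shift = PAGE_SHIFT + BITS_PER_LEVEL * (level - 1)
    return (vaddr >> shift) & INDEX_MASK


def randomized_slot(vaddr, level, level_keys):
    """Index after XORing with the per-level key (the hypothetical global register)."""
    return slot(vaddr, level) ^ level_keys[level - 1]


def combined_key(level_keys):
    """Pack the per-level keys (level 1 first) into one address-wide XOR mask."""
    mask = 0
    for i, key in enumerate(level_keys):
        mask |= (key & INDEX_MASK) << (PAGE_SHIFT + BITS_PER_LEVEL * i)
    return mask


keys = [0x155, 0x0AA, 0x1F3, 0x021]   # hypothetical 9-bit keys, one per level
vaddr = 0x2969eb604000
for level in range(1, 5):
    # XORing each index equals indexing the XOR-masked virtual address.
    assert randomized_slot(vaddr, level, keys) == slot(vaddr ^ combined_key(keys), level)
```

The same identity does not hold for the additive (mod 512) variant, because addition carries between bits.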

Would this work?  Would this stop the attacker from determining which attacker-accessible memory locations alias in the cache with PTE locations?

Obviously this wouldn't help with current hardware, but it would be useful to know whether this ASLR-defeating cache side channel is easier to mitigate than cache side channels in general.
I attach the POC for Chrome. Given a data buffer that crosses an 8GB boundary, it solves for the pagetable slots used at the 3 lowest levels of the pagetables for that buffer (this can take a few trials; otherwise only the 2 lower levels are known).

Solving for the top level takes a buffer that crosses a 4TB boundary, which is very rare.

A fairly detailed README plus screenshot of a successful run (for comparison) is included.

README:


Files
=====

aslr-sidechannel-poc.html: basic html file driving the POC
aslr-sidechannel.js: Javascript code implementing the main MMU sidechannel and ASLR bypass work
extra-js: directory with supporting Javascript code that is not central to the POC, e.g. solver and BigInt
css: directory with minor css settings

Remarks for Reproducing
=======================

This work has been tested successfully on the Ivy Bridge, Haswell
and Skylake microarchitectures but is not expected to be highly
microarch-specific.

This work has been tested on Chrome 54.0.2840.100 revision
ed651c97177b2ac846b27f62bb8efed6dac0f90b but is not expected to be
highly chrome-version-specific.

The test machine has 32GB of RAM, but that much should not be necessary. The POC probably will not work out of the box in a VM
and probably has to run on native HW (pagetable lookups will be different in VM guest mode).

Limitations
===========

We had some trouble understanding the layout of the JIT code in memory
and producing suitable JIT functions, and because of this were not
successful in finding JIT code addresses, only the data start address of
a large ArrayBuffer.  (Code pointer finding was successful in Firefox.) We
expect this may yet be possible with further work.

To run the POC in Chrome
========================

The POC relies on hugepages being off. (Otherwise the lowest pagetable level
does not exist.)

sudo sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'

The POC relies on the shared memory extension, which is not on by
default, to implement a high resolution timer (higher resolution than
performance.now()).

Start Chrome with shared memory extension enabled:

google-chrome --js-flags=--harmony-sharedarraybuffer --enable-blink-feature=SharedArrayBuffer --allow-file-access-from-files --user-data-dir aslr-sidechannel-poc.html

This will start running the eviction cacheline vs. target slowdown
analyzer. It should solve for the pagetable slots of the lower 2 levels
within about 30 seconds. For the next level up we need to allocate a
buffer (which is 2GB) that crosses an 8GB boundary. This often takes a few
tries. To check, use this Python script, which prints all allocations
larger than 1000MB in a Chrome renderer:

$ python watch_aslr.py `ps -auxw | grep type=render | grep -v grep | awk '{ print $2 }'`
pid 15054
0x2969eb604000 0x296a6a604000 rw-p 2032 Mbytes, slots/cachelines: slots:  82 423 347   4 cachelines: 10 52 43  0. - slots:  82 425 339   4 cachelines: 10 53 42  0.

This prints the pagetable slots active in all 4 levels when looking
up the start and the end of the buffer.  To uniquely determine the
slot at the second-highest level (L3), the starting L3 pagetable cacheline
has to be different from the ending L3 pagetable cacheline.  In this example,
these are 52 and 53, so this buffer is suitable for computing the 3 lower levels.
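
The slot and cacheline numbers in the sample output can be recomputed from the two printed addresses. This is a minimal sketch and not the actual watch_aslr.py; it assumes 4KB pages, 9 index bits per level and 8 pagetable entries per cacheline:

```python
def slots(vaddr):
    """Pagetable slots for a virtual address, from the top level down to the lowest."""
    return [(vaddr >> shift) & 0x1FF for shift in (39, 30, 21, 12)]


def cachelines(vaddr):
    """Cacheline index within each pagetable page (8 entries per cacheline assumed)."""
    return [s // 8 for s in slots(vaddr)]


start, end = 0x2969eb604000, 0x296a6a604000
print(slots(start), cachelines(start))   # [82, 423, 347, 4] [10, 52, 43, 0]
print(slots(end), cachelines(end))       # [82, 425, 339, 4] [10, 53, 42, 0]
print((end - start) >> 20, "Mbytes")     # 2032 Mbytes

# The buffer is usable for the 3 lower levels when the L3 cacheline changes.
print(cachelines(start)[1] != cachelines(end)[1])   # True
```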

The chance of the buffer crossing a 4TB boundary, needed for a full
address computation, is much lower still, but it is theoretically just a
matter of time.  (A few thousand attempts.)
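
A rough estimate of these attempt counts, assuming each allocation lands at an independent, uniformly random start address (a simplification; the real allocator behaviour is not modelled here). Under that assumption a buffer smaller than the boundary spacing crosses a boundary with probability size / spacing:

```python
def expected_attempts(buffer_bytes, boundary_bytes):
    """Expected number of allocations until the buffer crosses a boundary,
    assuming a uniformly random start address on every attempt."""
    p_cross = buffer_bytes / boundary_bytes
    return 1 / p_cross


BUFFER = 2032 << 20   # the 2032 Mbyte buffer from the sample output

print(expected_attempts(BUFFER, 8 << 30))   # roughly 4 tries for an 8GB boundary
print(expected_attempts(BUFFER, 4 << 40))   # roughly 2064 tries for a 4TB boundary
```

This is consistent with "a few tries" for the 8GB case and "a few thousand attempts" for the 4TB case.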

The Successful-demo.png was captured after around 8 trials of finding
a buffer that could solve for the lower 2 levels. As can be seen, the
pagetable slots it solved for (423, 347, 4) match the ones printed by the
python script, and the address (0x69eb604000) is a match in the 3*9+12=39
lower bits with the real starting address, which is 0x2969eb604000.
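
The 39-bit match can be verified by rebuilding the partial address from the three solved slots. A minimal sketch, assuming 4KB pages and 9 index bits per level; `partial_address` is an illustrative name:

```python
def partial_address(l3_slot, l2_slot, l1_slot):
    """Rebuild the low address bits from the slots of the 3 lower pagetable levels."""
    return (l3_slot << 30) | (l2_slot << 21) | (l1_slot << 12)


KNOWN_BITS = 3 * 9 + 12            # three 9-bit indices plus the 12-bit page offset
recovered = partial_address(423, 347, 4)
real_start = 0x2969eb604000

print(hex(recovered))              # 0x69eb604000
print(recovered == real_start & ((1 << KNOWN_BITS) - 1))   # True
```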
@mseaborn About the XOR-based mitigation. I think it will make the attacker's job much harder, but a lot of information will still be available.

For the lower two pagetable levels (where a 1GB buffer will be enough to observe all slots) I expect it won't be too hard to recover the upper 6 bits of the XOR key, as only one value will make the permutation follow the expected linear scanning pattern. However we can't know the lower 3 bits as they permute the pagetable slots within the same cacheline. We also don't know the exact pagetable slot of the transition because of this. So I think it's safe to say this scheme will preserve 3 bits of entropy at each level, but not the desired 9. So this means sacrificing only 3 bits of precious pagetable slot position.
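
The 6-bit versus 3-bit split can be illustrated with a short sketch. It assumes 8 pagetable entries per cacheline, so the cacheline index is the slot shifted right by 3; the key value is made up:

```python
def cacheline_sequence(key):
    """Cacheline touched for each of the 512 slots, in order, under a 9-bit XOR key."""
    return tuple((slot ^ key) >> 3 for slot in range(512))


base = 0b101101_000   # hypothetical key with its lower 3 bits cleared

# The lower 3 key bits only permute slots inside a cacheline, so they are invisible.
assert all(cacheline_sequence(base | low) == cacheline_sequence(base) for low in range(8))

# Changing any of the upper 6 key bits changes the observable cacheline pattern.
assert cacheline_sequence(base) != cacheline_sequence(base ^ 0b000001_000)

# Over all 512 keys only 64 distinct patterns exist: 6 bits leak, 3 stay hidden.
print(len({cacheline_sequence(key) for key in range(512)}))   # 64
```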

For the top two levels we need a huge amount of contiguous address space to see the transitions, which even without the permutation is quite hard to get in non-native environments (say JS), so that will indeed make the attack harder but does not fully stop it.

Labels: Security_Severity-Medium
Tentatively assigning a Medium security level, but this is beyond my direct expertise, so if someone more knowledgeable has a better assessment, please change it.
Project Member

Comment 18 by sheriffbot@chromium.org, Nov 29 2016

Labels: M-55
Project Member

Comment 19 by sheriffbot@chromium.org, Nov 29 2016

Labels: Pri-1
Project Member

Comment 20 by sheriffbot@chromium.org, Nov 30 2016

keescook: Uh oh! This issue is still open and hasn't been updated in the last 14 days. This is a serious vulnerability, and we want to ensure that there's progress. Could you please leave an update with the current status and any potential blockers?

If you're not the right owner for this issue, could you please remove yourself as soon as possible or help us find the right one?

If the issue is fixed or you can't reproduce it, please close the bug. If you've started working on a fix, please set the status to Started.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Cc: keescook@chromium.org
Owner: palmer@chromium.org
I don't work on the Chrome code base very often, so I'm not the right person for looking at how to mitigate this on the JS side.

On the kernel side, there isn't going to be a quick fix, I'm afraid. It's already well understood that ASLR has architectural weaknesses on x86, so Pri-1 doesn't really apply there.
Project Member

Comment 22 by sheriffbot@chromium.org, Dec 15 2016

palmer: Uh oh! This issue is still open and hasn't been updated in the last 23 days. This is a serious vulnerability, and we want to ensure that there's progress. Could you please leave an update with the current status and any potential blockers?

If you're not the right owner for this issue, could you please remove yourself as soon as possible or help us find the right one?

If the issue is fixed or you can't reproduce it, please close the bug. If you've started working on a fix, please set the status to Started.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
The camera-ready version of our paper was submitted last night and includes fairly significant new generalizations to many Intel microarchitectures, and also AMD and ARM implementations, 32 and 64 bit.
Project Member

Comment 24 by sheriffbot@chromium.org, Jan 26 2017

Labels: -M-55 M-56
Labels: -Pri-1 -M-56 -Security_Severity-Medium Security_Severity-Low M-58 Pri-2
Thanks for all info, Ben and others. Sorry I haven't been working on this bug in a while; some other things came up.

jschuh and I think this is Low severity, due in part to the need for a non-standard configuration (the shared memory extension).

Also, when I try to run the PoC, I get a JavaScript exception:

aslr-sidechannel.js:176 Uncaught DOMException: Failed to execute 'postMessage' on 'Worker': SharedArrayBuffer can not be in transfer list.
    at make_count_worker (http://localhost/js-chrome-poc/aslr-sidechannel.js:176:15)
    at init (http://localhost/js-chrome-poc/aslr-sidechannel-poc.html:70:2)
    at window.onload (http://localhost/js-chrome-poc/aslr-sidechannel-poc.html:16:59)

Maybe I'm holding it wrong? :)

There might be something for me to do here: constrain PartitionAlloc to a 4 TiB range of address space, as you suggest. I'll look into doing that. But for the long term/real fix, it's probably a hardware adventure.
Components: Blink>MemoryAllocator>Partition
@palmer Yes, the shared memory extension does make the barrier to executing the measurement nonzero. Two things about that though: (1) it seems to me that sooner or later this will be standard and on by default (tracking the SAB standardization process suggests to me it is making steady progress towards standardization) and (2) there may be more sources of time hidden here and there. 

Admittedly not here and now, so I can understand a low prioritization; but I do have a feeling that with more research and/or engineering work, a new timing source will present itself, and now is an opportunity to get out in front of that. But, 'code or gtfo' as they say, so I'll leave it at that.

As for the POC: too bad that it's failing for you. At the time I tested it and noted all the conditions quite thoroughly. Let me try it again and get back to you on that.

I agree that this is a hardware thing, but interestingly, the cpu vendors said this is a software thing :) :( :)
More accurately it's a hardware/OS issue. That is to say that the hardware and OS in combination are supposed to provide guarantees for ASLR, and in this instance they're failing to meet those guarantees. We may be able to make some hacky mitigations around this, but a proper fix would need to be either in the hardware or the OS memory management implementation (or in some combination of the two).
Labels: -Restrict-View-SecurityTeam
Dropping view restrictions since the paper is public.
Labels: allpublic
Cc: seththompson@chromium.org
Cc: bradnelson@chromium.org danno@chromium.org titzer@chromium.org hablich@chromium.org
Issue 693042 has been merged into this issue.
Speaking of the POC, that was not supposed to be public. Can someone please remove it ASAP?
Removed, sorry!
Thanks! I realized later I could also remove it and did. Double gone now I suppose.

Comment 37 Deleted

@palmer I got around to diagnosing this. Some standard evolved just in time to break this POC :-). The change is:

diff --git a/code/js-chrome-poc/aslr-sidechannel.js b/code/js-chrome-poc/aslr-sidechannel.js
index a65cfe6..7189c35 100644
--- a/code/js-chrome-poc/aslr-sidechannel.js
+++ b/code/js-chrome-poc/aslr-sidechannel.js
@@ -173,7 +173,7 @@ function make_count_worker()
 
        timing_buf = new Uint32Array(timing_array);
        count_worker = new Worker("count_worker.js");
-       count_worker.postMessage([timing_buf,0,timing_array], [timing_buf.buffer]);
+       count_worker.postMessage([timing_buf,0,timing_array]);
 }


i.e. do not put timing_buf.buffer (the shared memory) in the transfer list.

Comment 39 by aarya@google.com, May 25 2017

Cc: mstarzinger@chromium.org
Project Member

Comment 40 by sheriffbot@chromium.org, Jun 6 2017

Labels: -M-58 M-59
Project Member

Comment 41 by sheriffbot@chromium.org, Jul 26 2017

Labels: -M-59 M-60
Project Member

Comment 42 by sheriffbot@chromium.org, Sep 6 2017

Labels: -M-60 M-61
Project Member

Comment 43 by sheriffbot@chromium.org, Oct 18 2017

Labels: -M-61 M-62
Project Member

Comment 44 by sheriffbot@chromium.org, Dec 7 2017

Labels: -M-62 M-63
Project Member

Comment 45 by sheriffbot@chromium.org, Jan 25 2018

Labels: -M-63 M-64
Labels: -Pri-2 -OS-All -M-64 OS-Android OS-Chrome OS-Fuchsia OS-Linux OS-Mac OS-Windows Pri-3
So, SharedArrayBuffer has been (temporarily?) removed as part of our Spectre mitigation plan, and performance.now and Date.now have been coarsened and jittered. Other high-enough-resolution timers probably still exist, however.

It still remains for me/us to think about limiting PartitionAlloc to a 4 TiB range, but in a Spectre/Meltdown world I'm not sure how to prioritize that, given all the other stuff going on.
Status: Started (was: Assigned)
So, this class of problem got worse. ;) We are working on an overall approach to microarchitectural side channel info-leak attacks: https://www.chromium.org/Home/chromium-security/ssca We're also working on an update to our threat model for renderers that we'll be publishing soon.
Status: WontFix (was: Started)
Actually, calling it WontFix (which is not to say that it's not a real problem), since we're tracking the work elsewhere and this bug doesn't help us track anything additionally.
Why not duplicate it into that bug?
