Starred by 7 users

Issue metadata

Status: WontFix
Owner:
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Chrome , Mac , Fuchsia
Pri: 3
Type: Bug-Security




Issue 665930: Security: ASLR bypass by MMU cache side channel: AnC or ASLR^Cache

Reported by ben.gras...@gmail.com, Nov 16 2016

Issue description

VULNERABILITY DETAILS
There is a side channel that discloses MMU lookup activity. This side channel is due to the MMU caching pagetable cachelines in the CPU cache.

This allows code to compute accessed virtual addresses by observing CPU cache activity. This allows ASLR bypass in Javascript. Javascript can compute the address of e.g. an ArrayBuffer. This works for data pointers in Chrome and code and data pointers in Firefox.

VERSION
Chrome Version: 52.0.2743.82
Operating System: Ubuntu 16.04, Linux kernel 4.4.0-38

We have verified this signal exists on Intel Ivy Bridge, Skylake and Haswell microarchitectures.

However, we do not expect this side channel to be specific to these microarchitectures.

REPRODUCTION CASE
We have a POC available working in Chrome and Firefox, which is available upon request.

Attached is the research paper that has been accepted for publication at NDSS 2017. It details the workings of the side channel and the implementation details in Chrome.

We intend to make this work public on February 15th 2017.

We have some mitigations in mind we would be happy to discuss.

Best regards,

Ben
 

Comment 1 Deleted

Comment 2 by awhalley@chromium.org, Nov 16 2016

Cc: awhalley@chromium.org

Comment 3 by ben.gras...@gmail.com, Nov 17 2016

We have set up our own info page to track the disclosure process that we'll add more info to as we have it.

URL
https://www.vusec.net/projects/anc/

PW
TeZwEysX4qza

Comment 4 by kerrnel@chromium.org, Nov 18 2016

Owner: infe...@chromium.org
This is outside of my expertise. aarya@, what are your thoughts?

Comment 5 by mea...@chromium.org, Nov 18 2016

Cc: rickyz@chromium.org xzhou@chromium.org
Components: Internals
Status: Assigned (was: Unconfirmed)
Adding some guts folks.

Comment 6 by rickyz@chromium.org, Nov 19 2016

Cc: palmer@chromium.org mnissler@chromium.org

Comment 7 by mnissler@chromium.org, Nov 21 2016

Cc: keescook@chromium.org
+keescook, as this likely has impact beyond Javascript and browsers.

Comment 8 by mea...@chromium.org, Nov 22 2016

Cc: -keescook@chromium.org infe...@chromium.org
Labels: OS-All
Owner: keescook@chromium.org
inferno is probably not the right owner, assigning to keescook.

Comment 9 by palmer@chromium.org, Nov 22 2016

Cc: wad@chromium.org jsc...@chromium.org
Components: Blink>JavaScript
Labels: Security_Impact-Stable
+jschuh, wad: Relevant to your interests.

I'm still reading the paper.

Comment 10 by palmer@chromium.org, Nov 22 2016

Oh, I should add: Yes, we'd very much like to see your PoC, and to hear your thoughts on mitigation. :) Thanks.

Comment 11 by palmer@chromium.org, Nov 22 2016

Cc: mseaborn@chromium.org

Comment 12 by ben.gras...@gmail.com, Nov 23 2016

@palmer POC:

I am working on cleaning up and testing the POC right now. (The reason is that the code used for the paper's experiments targets both Firefox and Chrome and is a bit messy, so the point is perhaps easy to miss in the noise.)

I intend to upload a POC today (Wednesday my time, CET).

Comment 13 by ben.gras...@gmail.com, Nov 23 2016

@palmer mitigation:

We have collected a story on https://www.vusec.net/projects/anc/ , PW TeZwEysX4qza

Summarized in my words:

  - This attack needs a high resolution timer. performance.now() already has some jitter (right?), so we could not use it directly. In Chrome we could reliably use a shared-memory-based counter, as predicted by mseaborn@ in https://github.com/tc39/ecmascript_sharedmem/issues/1. In fact, both sources of time were predicted there (we discovered this issue text after finding them). The Jan 27 comment seems to dismiss this as either unexploitable or exploitable in other ways. So, one mitigation would be to make access to high resolution timers (even) harder, though it is probably hard to guarantee this.

  - This attack relies on uniquely identifying not only cache lines but also the pagetable slots used for the buffer lookup, in order to compute the addresses. This in turn relies on (a) a large range of contiguous virtual address space and (b) getting this buffer allocated in a very different part of the address space so that the cachelines are not re-used after eviction and before the measurement; we call this 'blinding.' If all Javascript runtime data & code were forced into the same 4TB of virtual address space, this attack could never observe the top 9 bits (out of the 48 virtual user address space bits currently available) of the buffer address. Noncontiguous buffers would also greatly frustrate the attack, so e.g. adding 1 level of indirection might already make it impossible to find a good solution for the address.

  - There is also the matter of Intel CAT, which might isolate cache activity, but we probably should not count on it because it requires OS and HW support.
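To make the 4TB figure in the second point concrete, here is a minimal sketch of how much virtual address space one pagetable slot and one pagetable cacheline cover at each level. It assumes standard x86-64 4-level paging with 4 KiB pages, 8-byte entries and 64-byte cachelines; the constants and function names are illustrative and not taken from the POC.

```python
# Assumptions (not from the POC): x86-64 4-level paging, 4 KiB pages,
# 8-byte pagetable entries, 64-byte cachelines.
PAGE_SHIFT = 12          # 4 KiB pages
BITS_PER_LEVEL = 9       # 512 entries per pagetable page
SLOTS_PER_CACHELINE = 8  # 64-byte cacheline / 8-byte entry


def slot_span(level):
    """Bytes of virtual address space mapped by one slot at `level` (1 = lowest)."""
    return 1 << (PAGE_SHIFT + BITS_PER_LEVEL * (level - 1))


def cacheline_span(level):
    """Bytes of virtual address space covered by one pagetable cacheline at `level`."""
    return SLOTS_PER_CACHELINE * slot_span(level)


for level in (1, 2, 3, 4):
    print(f"level {level}: slot covers {slot_span(level)} bytes, "
          f"cacheline covers {cacheline_span(level)} bytes")
```

Under these assumptions one top-level cacheline covers 4 TiB, so a buffer that never leaves a single aligned 4 TiB region never causes a cacheline transition at the top level, which is the signal the attack would need to solve for that level.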

Is that something to go on?

I'd be happy to clarify on a call if the above is unclear (as I find it a bit hard to write clearly).

Comment 14 by mseaborn@chromium.org, Nov 23 2016

I think this is a significant finding.  Thanks for researching it.

Do you think hardware changes could mitigate this, if hardware allowed randomising the offsets at which PTEs appear within a page table?  I can imagine two possible schemes:

 1) Stronger scheme: Each PTE could contain a 9-bit value (randomly chosen by the kernel) which is added (mod 512) or XOR'd to the index of the PTE at the next level of the page table hierarchy.  There are reserved bits in the x86-64 PTEs which could be used for this.

 2) Weaker scheme: The CPU could have a global 9-bit register for each level of the page table hierarchy which would be added (mod 512) or XOR'd to the PTE's offset when looking up the PTE.  For XORing, this would be equivalent to XORing each virtual address with an OS-chosen value before lookup.

Would this work?  Would this stop the attacker from determining which attacker-accessible memory locations alias in the cache with PTE locations?

Obviously this wouldn't help with current hardware, but it would be useful to know whether this ASLR-defeating cache side channel is easier to mitigate than cache side channels in general.
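The equivalence claimed for the weaker scheme can be checked with a short sketch. It assumes 4-level paging with 9 index bits per level and a 12-bit page offset; the per-level keys stand in for the hypothetical global registers, and all names are illustrative.

```python
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9
INDEX_MASK = 0x1FF


def slot(vaddr, level):
    """Pagetable index of `vaddr` at `level` (0 = lowest level)."""
    return (vaddr >> (PAGE_SHIFT + BITS_PER_LEVEL * level)) & INDEX_MASK


def combined_mask(keys):
    """Fold per-level 9-bit keys (keys[0] = lowest level) into one address mask."""
    mask = 0
    for level, key in enumerate(keys):
        mask |= (key & INDEX_MASK) << (PAGE_SHIFT + BITS_PER_LEVEL * level)
    return mask


def xor_scheme_slots(vaddr, keys):
    """Indices used when each level's index is XORed with that level's key."""
    return [slot(vaddr, level) ^ (key & INDEX_MASK) for level, key in enumerate(keys)]


def xored_address_slots(vaddr, keys):
    """Indices used when the whole address is XORed with the combined mask first."""
    masked = vaddr ^ combined_mask(keys)
    return [slot(masked, level) for level in range(len(keys))]


keys = [5, 300, 17, 511]  # arbitrary example keys, one per level
print(xor_scheme_slots(0x2969EB604000, keys) == xored_address_slots(0x2969EB604000, keys))
```

The two functions agree because XOR acts on each bit independently, so the per-level keys never interact. The stronger per-PTE scheme is not modelled here.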

Comment 15 by ben.gras...@gmail.com, Nov 24 2016

I attach the POC for Chrome. It solves for the pagetable slots used in the 3 lowest levels of the pagetables for a data buffer, given a data buffer that crosses an 8GB boundary (this can take a few trials; otherwise only the 2 lower levels are known).

For the top level this takes a buffer that crosses a 4TB boundary, which is very rare.
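As a rough back-of-the-envelope estimate of "a few trials" versus "very rare", the sketch below assumes the buffer start is uniformly random, which a real allocator does not guarantee, and uses the 2GB buffer size mentioned in the README. The function names are illustrative.

```python
GIB = 1 << 30
TIB = 1 << 40


def crossing_probability(buffer_size, boundary):
    """Approximate chance that a buffer with a uniformly random start
    straddles a multiple of `boundary` (assumption: uniform placement)."""
    return min(1.0, buffer_size / boundary)


def expected_attempts(buffer_size, boundary):
    """Mean number of independent allocations until one crosses the boundary."""
    return 1.0 / crossing_probability(buffer_size, boundary)


print(expected_attempts(2 * GIB, 8 * GIB))  # 8GB boundary
print(expected_attempts(2 * GIB, 4 * TIB))  # 4TB boundary
```

Under this simplified model the 8GB case needs about 4 attempts on average and the 4TB case about 2048, in line with the README's "a few tries" and "a few thousand attempts".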

A fairly detailed README plus screenshot of a successful run (for comparison) is included.

README:


Files
=====

aslr-sidechannel-poc.html: basic html file driving the POC
aslr-sidechannel.js: Javascript code implementing the main MMU sidechannel and ASLR bypass work
extra-js: directory with supporting Javascript code that is not central to the POC, e.g. solver and BigInt
css: directory with minor css settings

Remarks for Reproducing
=======================

This work has been tested successfully on the Ivy Bridge, Haswell
and Skylake microarchitectures but is not expected to be highly
microarch-specific.

This work has been tested on Chrome 54.0.2840.100 revision
ed651c97177b2ac846b27f62bb8efed6dac0f90b but is not expected to be
highly chrome-version-specific.

This machine has 32GB but that should not be necessary. It probably will not work out of the box in a VM
and probably has to run on native HW (pagetable lookups will be different in VM guest mode).

Limitations
===========

We had some trouble understanding the layout of the JIT code in memory
and producing suitable JIT functions, and because of this were not
successful in finding JIT code addresses, only the data start address of
a large ArrayBuffer.  (Code pointer finding was successful in Firefox.) We
expect this may yet be possible with further work.

To run the POC in Chrome
========================

The POC relies on hugepages being off. (Otherwise the lowest pagetable level
does not exist.)

sudo sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'

The POC relies on the shared memory extension to implement a high
resolution timer (higher than performance.now() is) that is not on
by default.

Start Chrome with shared memory extension enabled:

google-chrome --js-flags=--harmony-sharedarraybuffer --enable-blink-feature=SharedArrayBuffer --allow-file-access-from-files --user-data-dir aslr-sidechannel-poc.html

This will start running the eviction cacheline vs. target slowdown
analyzer. It should solve for the pagetable slots of the lower 2 levels
within about 30 seconds. For the next level up we need to allocate a
buffer (which is 2GB) that crosses an 8GB boundary. This often takes a few
tries. Use this Python script to check; it prints all allocations
larger than 1000MB in a Chrome renderer:

$ python watch_aslr.py `ps -auxw | grep type=render | grep -v grep | awk '{ print $2 }'`
pid 15054
0x2969eb604000 0x296a6a604000 rw-p 2032 Mbytes, slots/cachelines: slots:  82 423 347   4 cachelines: 10 52 43  0. - slots:  82 425 339   4 cachelines: 10 53 42  0.

This prints the pagetable slots active in all 4 levels when looking
up the start and the end of the buffer.  To uniquely determine the
highest-before-last slot, the starting L3 pagetable cacheline has to be
different from the ending L3 pagetable cacheline.  In this example, this
is 52 and 53, so this buffer is suitable for computing the 3 lower levels.
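The slot and cacheline numbers in the sample output can be reproduced from the two addresses with a few shifts. This is a minimal sketch, not the actual watch_aslr.py; it assumes 4-level paging with 9 index bits per level, a 12-bit page offset, and 8 pagetable entries per 64-byte cacheline.

```python
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9
SLOTS_PER_CACHELINE = 8  # 64-byte cacheline / 8-byte pagetable entry


def pagetable_slots(vaddr):
    """Slot index at each of the 4 levels, highest level first."""
    return [(vaddr >> (PAGE_SHIFT + BITS_PER_LEVEL * lvl)) & 0x1FF
            for lvl in (3, 2, 1, 0)]


def pagetable_cachelines(vaddr):
    """Cacheline (within the pagetable page) holding each level's slot."""
    return [s // SLOTS_PER_CACHELINE for s in pagetable_slots(vaddr)]


start, end = 0x2969EB604000, 0x296A6A604000  # addresses from the sample output
print(pagetable_slots(start), pagetable_cachelines(start))
print(pagetable_slots(end), pagetable_cachelines(end))
```

The second entry of each cacheline list is the one that must differ between start and end for the third-lowest level to be solvable.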

The chance of the buffer crossing a 4TB boundary, needed for a full
address computation, is much lower still, but it is theoretically just a
matter of time.  (A few thousand attempts.)

The Successful-demo.png was captured after around 8 trials of finding
a buffer that could solve for the lower 2 levels. As can be seen, the
pagetable slots it solved for (423, 347, 4) match the ones printed by the
python script, and the address (0x69eb604000) is a match in the 3*9+12=39
lower bits with the real starting address, which is 0x2969eb604000.
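The 3*9+12=39 bit match can be verified by rebuilding the partial address from the three solved slots. This is a sketch with illustrative names, assuming 9 index bits per level and a page-aligned buffer (so the 12 offset bits are zero).

```python
def address_from_slots(slots):
    """Combine solved slots (highest solved level first) into address bits,
    assuming a page-aligned address so the 12 offset bits are zero."""
    addr = 0
    for s in slots:
        addr = (addr << 9) | s
    return addr << 12


solved = address_from_slots([423, 347, 4])  # slots reported by the POC run
real = 0x2969EB604000                       # start address from the script output
known_bits = 3 * 9 + 12

print(hex(solved))
print(solved == real & ((1 << known_bits) - 1))
```

The remaining 9 bits (slot 82 in the top level) are the ones that require a buffer crossing a 4TB boundary.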

Comment 16 by ben.gras...@gmail.com, Nov 24 2016

@mseaborn About the XOR-based mitigation: I think it will make the attacker's job much harder, but a lot of information will still be available.

For the lower two pagetable levels (where a 1GB buffer will be enough to observe all slots) I expect it won't be too hard to recover the upper 6 bits of the XOR key, as only one value will make the permutation follow the expected linear scanning pattern. However we can't know the lower 3 bits as they permute the pagetable slots within the same cacheline. We also don't know the exact pagetable slot of the transition because of this. So I think it's safe to say this scheme will preserve 3 bits of entropy at each level, but not the desired 9. So this means sacrificing only 3 bits of precious pagetable slot position.
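The split between the upper 6 and lower 3 key bits can be illustrated with a small sketch. It assumes 512 slots per level and 8 slots per cacheline, and models only what a cacheline-granularity observer would see during a linear scan of all slots; the names are illustrative.

```python
SLOTS_PER_CACHELINE = 8


def observed_cacheline(slot, key):
    """Cacheline touched when a 9-bit slot index is XORed with a 9-bit key."""
    return ((slot ^ key) & 0x1FF) // SLOTS_PER_CACHELINE


def cacheline_trace(key):
    """Cachelines seen while linearly scanning all 512 slots under `key`."""
    return [observed_cacheline(slot, key) for slot in range(512)]


# Keys that differ only in their low 3 bits give identical traces ...
print(cacheline_trace(0b101010_011) == cacheline_trace(0b101010_110))
# ... while a change in the upper 6 bits is visible.
print(cacheline_trace(0b101010_011) == cacheline_trace(0b101011_011))
```

The low 3 key bits only reorder slots inside one cacheline, so they stay hidden from this observer, while each of the 64 possible upper-bit values produces a distinct trace.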

For the top two levels we need a huge amount of contiguous address space to see the transitions, which even without the permutation is quite hard to get in non-native environments (say JS), so that will indeed make the attack harder but does not fully stop it.

Comment 17 by dominickn@chromium.org, Nov 29 2016

Labels: Security_Severity-Medium
Tentatively assigning a Medium security level, but this is beyond my direct expertise, so if someone more knowledgeable has a better level, please change it.

Comment 18 by sheriffbot@chromium.org, Nov 29 2016

Project Member
Labels: M-55

Comment 19 by sheriffbot@chromium.org, Nov 29 2016

Project Member
Labels: Pri-1

Comment 20 by sheriffbot@chromium.org, Nov 30 2016

Project Member
keescook: Uh oh! This issue still open and hasn't been updated in the last 14 days. This is a serious vulnerability, and we want to ensure that there's progress. Could you please leave an update with the current status and any potential blockers?

If you're not the right owner for this issue, could you please remove yourself as soon as possible or help us find the right one?

If the issue is fixed or you can't reproduce it, please close the bug. If you've started working on a fix, please set the status to Started.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 21 by keescook@chromium.org, Nov 30 2016

Cc: keescook@chromium.org
Owner: palmer@chromium.org
I don't work on the Chrome code base very often, so I'm not the right person for looking at how to mitigate this on the JS side.

On the kernel side, there isn't going to be a quick fix, I'm afraid. It's already well understood that ASLR has architectural weaknesses on x86, so Pri-1 doesn't really apply there.

Comment 22 by sheriffbot@chromium.org, Dec 15 2016

Project Member
palmer: Uh oh! This issue still open and hasn't been updated in the last 23 days. This is a serious vulnerability, and we want to ensure that there's progress. Could you please leave an update with the current status and any potential blockers?

If you're not the right owner for this issue, could you please remove yourself as soon as possible or help us find the right one?

If the issue is fixed or you can't reproduce it, please close the bug. If you've started working on a fix, please set the status to Started.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 23 by ben.gras...@gmail.com, Dec 21 2016

The camera-ready version of our paper was submitted last night and includes fairly significant new generalizations to many Intel microarchitectures, and also AMD and ARM implementations, 32 and 64 bit.

Comment 24 by sheriffbot@chromium.org, Jan 26 2017

Project Member
Labels: -M-55 M-56

Comment 25 by palmer@chromium.org, Feb 15 2017

Labels: -Pri-1 -M-56 -Security_Severity-Medium Security_Severity-Low M-58 Pri-2
Thanks for all the info, Ben and others. Sorry I haven't been working on this bug in a while; some other things came up.

jschuh and I think this is Low severity, due in part to the need for a non-standard configuration (the shared memory extension).

Also, when I try to run the PoC, I get a JavaScript exception:

aslr-sidechannel.js:176 Uncaught DOMException: Failed to execute 'postMessage' on 'Worker': SharedArrayBuffer can not be in transfer list.
    at make_count_worker (http://localhost/js-chrome-poc/aslr-sidechannel.js:176:15)
    at init (http://localhost/js-chrome-poc/aslr-sidechannel-poc.html:70:2)
    at window.onload (http://localhost/js-chrome-poc/aslr-sidechannel-poc.html:16:59)

Maybe I'm holding it wrong? :)

There might be something for me to do here: constrain PartitionAlloc to a 4 TiB range of address space, as you suggest. I'll look into doing that. But for the long term/real fix, it's probably a hardware adventure.

Comment 26 by palmer@chromium.org, Feb 15 2017

Components: Blink>MemoryAllocator>Partition

Comment 27 by ben.gras...@gmail.com, Feb 15 2017

@palmer Yes, the shared memory extension does make the barrier to executing the measurement nonzero. Two things about that though: (1) it seems to me that sooner or later this will be standard and on by default (tracking the SAB standardization process suggests to me it is making steady progress towards standardization) and (2) there may be more sources of time hidden here and there. 

Admittedly not here and now, so I can understand a low prioritization; but I do have a feeling that with more research and/or engineering work, a new timing source will present itself, and now is an opportunity to get out in front of that. But, 'code or gtfo' as they say, so I'll leave it at that.

As for the POC: too bad that it's failing for you. At the time I tested it and noted all the conditions quite thoroughly. Let me try it again and get back to you on that.

I agree that this is a hardware thing, but interestingly, the cpu vendors said this is a software thing :) :( :)

Comment 28 by jsc...@chromium.org, Feb 15 2017

More accurately it's a hardware/OS issue. That is to say that the hardware and OS in combination are supposed to provide guarantees for ASLR, and in this instance they're failing to meet those guarantees. We may be able to make some hacky mitigations around this, but a proper fix would need to be either in the hardware or the OS memory management implementation (or in some combination of the two).

Comment 29 by jsc...@chromium.org, Feb 15 2017

Labels: -Restrict-View-SecurityTeam
Dropping view restrictions since the paper is public.

Comment 30 by palmer@chromium.org, Feb 15 2017

Labels: allpublic

Comment 31 by palmer@chromium.org, Feb 16 2017

Cc: seththompson@chromium.org

Comment 32 by seththompson@chromium.org, Feb 16 2017

Cc: bradnelson@chromium.org danno@chromium.org titzer@chromium.org hablich@chromium.org

Comment 33 by joh...@chromium.org, Feb 16 2017

 Issue 693042  has been merged into this issue.

Comment 34 by ben.gras...@gmail.com, Feb 16 2017

Speaking of the POC: that was not supposed to be public. Can someone please remove it ASAP?

Comment 35 by awhalley@chromium.org, Feb 16 2017

Removed, sorry!

Comment 36 by ben.gras...@gmail.com, Feb 16 2017

Thanks! I realized later I could also remove it and did. Double gone now I suppose.

Comment 37 Deleted

Comment 38 by ben.gras...@gmail.com, Mar 27 2017

@palmer I got around to diagnosing this. Some standard evolved just in time to break this POC :-). The change is:

diff --git a/code/js-chrome-poc/aslr-sidechannel.js b/code/js-chrome-poc/aslr-sidechannel.js
index a65cfe6..7189c35 100644
--- a/code/js-chrome-poc/aslr-sidechannel.js
+++ b/code/js-chrome-poc/aslr-sidechannel.js
@@ -173,7 +173,7 @@ function make_count_worker()
 
        timing_buf = new Uint32Array(timing_array);
        count_worker = new Worker("count_worker.js");
-       count_worker.postMessage([timing_buf,0,timing_array], [timing_buf.buffer]);
+       count_worker.postMessage([timing_buf,0,timing_array]);
 }


i.e. do not put the timing_buf buffer (shared memory) in the transfer list.

Comment 39 by aarya@google.com, May 25 2017

Cc: mstarzinger@chromium.org

Comment 40 by sheriffbot@chromium.org, Jun 6 2017

Project Member
Labels: -M-58 M-59

Comment 41 by sheriffbot@chromium.org, Jul 26 2017

Project Member
Labels: -M-59 M-60

Comment 42 by sheriffbot@chromium.org, Sep 6 2017

Project Member
Labels: -M-60 M-61

Comment 43 by sheriffbot@chromium.org, Oct 18 2017

Project Member
Labels: -M-61 M-62

Comment 44 by sheriffbot@chromium.org, Dec 7 2017

Project Member
Labels: -M-62 M-63

Comment 45 by sheriffbot@chromium.org, Jan 25 2018

Project Member
Labels: -M-63 M-64

Comment 46 by palmer@chromium.org, Jan 27 2018

Labels: -Pri-2 -OS-All -M-64 OS-Android OS-Chrome OS-Fuchsia OS-Linux OS-Mac OS-Windows Pri-3
So, SharedArrayBuffer has been (temporarily?) removed as part of our Spectre mitigation plan, and performance.now and Date.now have been coarsened and jittered. Other high-enough-resolution timers probably still exist, however.

It still remains for me/us to think about limiting Partition Alloc to a 4 TiB range, but in a Spectre/Meltdown world I'm not sure how to prioritize that, given all the other stuff going on.

Comment 47 by palmer@chromium.org, Feb 14 2018

Status: Started (was: Assigned)
So, this class of problem got worse. ;) We are working on an overall approach to microarchitectural side channel info-leak attacks: https://www.chromium.org/Home/chromium-security/ssca We're also working on an update to our threat model for renderers that we'll be publishing soon.

Comment 48 by palmer@chromium.org, Feb 14 2018

Status: WontFix (was: Started)
Actually, calling it WontFix (which is not to say that it's not a real problem), since we're tracking the work elsewhere and this bug doesn't help us track anything additionally.

Comment 49 by 93m4qau...@gmail.com, Feb 15 2018

Why not duplicate it into that bug?
