New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Long OOO (go/where-is-mgiuca)
Closed: Apr 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug-Security

Blocked on:
issue 707620

Blocking:
issue 514919



Sign in to add a comment
link

Issue 707071: Security: getInstalledRelatedApps: timing attack can leak installed status

Reported by mgiuca@chromium.org, Mar 30 2017 Project Member

Issue description

From discussion on internal design doc:
https://docs.google.com/document/d/1HLPFFk9W5vGsCkhKCnzs-ZtMH8yHk844Ccq0Hao0xxc/edit

dhnishi:
"""
A savvy malicious actor could time API call to potentially heuristically determine if a non-associated app is installed.
"""

palmer:
"""
Yes, this is a straightforward attack. I think you should deal with it before shipping.

The usual solution to this kind of timing oracle is to eliminate the oracle: ensure that all operations take a constant amount of time. However, that's not always possible.

You can try to blur the bimodality of the timing distribution by adding random noise (e.g., sleep(rand() % kFooFactor)), but an attacker who can take many samples can factor out the noise:

https://rdist.root.org/2009/05/28/timing-attack-in-google-keyczar-library/#comment-5481

(see Nate's response)

So, if I were you I'd try to either:

1. Ensure that all of these API calls take exactly N milliseconds (or whatever)

or

2. Add random noise, and then apply the same noise factor every time for that caller, so that they can't factor out the noise by taking more samples. (Different callers, different noise factor.)
"""
 

Comment 1 by mgiuca@chromium.org, Mar 30 2017

Blocking: 514919

Comment 2 by palmer@chromium.org, Mar 30 2017

Note that approach (2) could still be gamed: If you define the caller to be the origin, an attacker could take many samples by creating many different origins. (For example, generating a million {foo}.attacker.com hostnames, or opening all 65k ports on their server to get 65k guesses.)

So, it might be good to define "caller" for this purpose as a site's effective TLD + 1 label  ("eTLD+1"), and use that as the key into your map<caller, noise_factor>. (E.g. treat all {foo}.attacker.com:[port] as just "attacker.com".) Then the attacker is back to getting only 1 sample.

Comment 3 by mgiuca@chromium.org, Mar 31 2017

Cc: owe...@chromium.org
Yes, I was about to ask questions along those lines.

Approach 1 could be difficult/impossible: the algorithm works like this:

  if the app is installed:
    get the manifest from the app
    for each asset statement in the manifest:
      get and parse the asset statement
      for each related app in the asset statements:
        check if the origin matches; if so return true
  return false

Basically, the algorithm is O(N) in the number of related apps in the Android app manifest IF the app is installed, or O(1) if the app isn't installed. If the attacker knows how many apps are in the app's manifest, we can't introduce the correct amount of time delay to match it when the app isn't installed, because we don't know how big N is.

All I can think of is introducing a fixed number of steps K (say, 100), performing a dummy operation K times if the app isn't installed, and if the app is installed, performing the true N operations, followed by K-N dummy operations. Is that OK, or is this going to have no real impact?

Approach 2: Bleh, I wrote a huge wall of text to try and understand this but I ended up getting even more confused. Perhaps we can have a VC to discuss instead (mind if I set one up?)

Here's me trying to get my head around this:

- You want to generate a random artificial time delay to avoid someone being able to tell whether it's installed (slower) or not installed (faster).
- But if it's random on each query, you can aggregate it to cancel out all the random noise.
- But if we use the same time attack across a set of queries, it's effectively the same as having no noise at all.

That last point is tricky. I thought a bunch about this, whether we want to keep the same random number across some or all of these sets of queries:
1. Queries from the same domain/subdomain/however you carve up the calling entity.
2. Queries to the same user/device.
3. Queries about the same native app ID.

Any or all of the above.

You're asking for #1 and I think implicitly #2 (because you're suggesting we generate a random number on the device... otherwise it would be a hash of the domain, not a random number). But if we do #1 and #2, you can still aggregate across all users to see which users generally respond slower to a query than others.

#3 is also important, because you could otherwise do a query for 100 apps and see which ones are generally slower than others. So I think we want a different random time delay per app being queried.

I don't think #1 is actually important at all... we could use the same offset across all domains and it wouldn't matter. (If you are trying to use ETLD to make sure all domains controlled by an attacker get the same offset... why not just lump all domains on the planet together?) It's really #2 and #3 that are important.

So... does this mean we generate on-device (#2) a random number per app (#3), and cache it? This avoids repeated queries getting different results. But I'm still concerned that you can aggregate across many users, or also many apps. I'm not thoroughly confused because it seems regardless of whether you cache the number or generate a random one each time, there are timing issues.

Lastly... if we're going to cache a random number per app, why not just cache the result for that app, and then we don't need to do this at all...?

I think a VC is a good idea. This is hurting my head.

+owen: FYI this may introduce some new complexity which delays the launch.

Comment 4 by elawrence@chromium.org, Mar 31 2017

Is the API available to subframes, or only the top-level frame? If it's only in the top-level frame, I believe the likelihood of abuse declines significantly.

Could it work to do something like this:

On each call to the API, select *two* candidate apps, the TARGET app and a randomly-chosen DECOY app. If the TARGET app is not installed, select a second randomly chosen DECOY app.

Run the algorithm twice. If the app is present and origin access is allowed, return TRUE at the end. If the app is not present or is present but origin access is disallowed, return FALSE at the end.

Timing for all three cases (AppInstalled, AppInstalledButNotAffiliatedWithOrigin, and AppNotInstalled) should all be unpredictable because a potential attacker doesn't know what apps the user has and thus has no way to know the timing of the decoy.

?

Comment 5 by palmer@google.com, Mar 31 2017

I wouldn't assume that limiting the API to the top-level frame would necessarily reduce abuse. For example, a lot of ads are still done by <script src="..."> rather than by iframe. (Alas.)

Thinking about the decoy solution...

Go ahead and schedule a VC, although note that my calendar next week is pretty crazy.

Comment 6 by mgiuca@chromium.org, Apr 3 2017

#4 interesting idea about the decoy app. I'm not sure why you need two decoys.

Thinking this through, if the algorithm was:

- If TARGET app installed, check backlinks from that app (O(x) checks).
- Else, choose a DECOY app and check backlinks from that app (O(y) checks).

This will take O(x) time if installed, O(y) time if not. x is known to the attacker, so they could use the time taken to figure out if it's x or y and therefore whether it's installed.

But I think your double-decoy algorithm is this:

- Choose a random app DECOY1.
- Check backlinks from DECOY1 (O(z) checks).
- If TARGET app is installed, check backlinks from TARGET (O(x) checks).
- Else, choose a DECOY2 app, and check backlinks from DECOY2 (O(y) checks).

Now this will take O(x + z) time if it's installed, or O(y + z) time if not. Only x is known to the attacker, so they can't really estimate what the predicted time would be for installed vs not installed. OK, that explains the two decoys.

But I think this is just a roundabout way of adding a random time delay. Using a decoy app instead of a random time delay helps in that it gives you a realistic delay (rather than just adding N milliseconds). But other than that, I think this has all the downsides of a random number: do we choose a random decoy app on each query? Or do we cache the decoy ID so we use the same one again?

If we choose a random one each time, the attacker can repeat 100 times and see whether the average is O(x + <typical app>) or (<typical app> * 2). If we cache the decoy choice so we get the same app every time for a given user, then an attacker can aggregate across all users.

Another problem with this is that Digital Asset Links is basically not used at all (I couldn't find a single app that uses it) --- it is only expected to be used by apps that want to use getInstalledRelatedApps. Therefore, if you choose a random decoy app, there is a high probability that y and z will be 0.

:/

At this point I'm leaning towards my 100 operations solution to Approach 1.

Comment 7 by mgiuca@chromium.org, Apr 3 2017

I have confirmed empirically that this attack is possible (there are major timing differences between installed and not-installed apps).

I tested 1000 queries against three different app IDs:
1. A non-installed app (garbage name). Took between 2500 and 2800ms.
2. An installed app with no backlinks (com.twitter.android). Took between 3100 and 3300ms.
3. An installed app with one backlink (my test app). Took between 3600 and 4000ms.

So the difference is very noticeable.

This is mitigated somewhat by the fact that you can't just query any app you like. The function itself takes no arguments; instead the apps to be queried are in the manifest (which is controlled by the attacker, but can't be changed dynamically). So if an attacker wanted to compare queries against different apps for the same user (as I'm doing above), they'd probably have to get the user to run the query from two different domains, or change the manifest being served from the server and have the user make queries from both manifests.

To answer #4, yes this API is available from a subframe (this was discussed during development and we just went with it because it was easier that way and there was no obvious reason to block it). Having a subframe allows the attacker to instantly make requests from multiple domains. We could block subframes, but there might still be other ways to make requests from multiple domains.

Comment 8 by mgiuca@chromium.org, Apr 3 2017

Blockedon: 707620

Comment 9 by mgiuca@chromium.org, Apr 3 2017

Status: Started (was: Assigned)
I rewrote the code to just try to perform the same operations on all code paths. The old algorithm was:

1. Call PackageManager.getApplicationInfo. Early-exit if not installed.
2. Call PackageManager.getResourcesForApplication. Early-exit if not installed.
3. Get the "asset_statements" resource string from the application resource. Early-exit if not found.
4. Parse the JSON string into a JSON array. Early-exit if error.
5. For each statement, check if it matches this site. Early-exit if match.

The new algorithm is:

1. Call PackageManager.getApplicationInfo. Remember if installed or not.
2. If installed PackageManager.getResourcesForApplication on this app, and store the results. If not, call PackageManager.getResourcesForApplication on Chrome itself, and throw away the results.
3. Get the "asset_statements" resource string from the application resource. If not found, or the app is not installed, default to "[]".
4. Parse the JSON string into a JSON array. If error, empty array.
5. For i in [0, 99]: get statement #i (or if out of bounds, make a dummy statement). Check if it matches this site. Early-exit if match.

(Note that the early-exit in #5 is OK because there is no "attack" if it returns true; it's a legitimate match.) The only major difference between the two is a bit of extra JSON parsing.

The old algorithm was getting (on my Nexus 7 2013):
[Note: Different and significantly faster than what I had this morning; can't explain that but all of the following results are consistent.]
Non-installed: 1.9--2.1ms
Installed-0-backlinks: 2.1--2.8ms
Installed-1-backlink: 2.4--2.9ms

The new algorithm gets:
Non-installed: 15.6--16.1ms
Installed-0-backlinks: 15.6--16.0ms
Installed-1-backlink: Not tested yet...

Yay? But also, the speed might be a legitimate problem now (16ms for a single API call?). We can tweak the 100 statements cap; it becomes a trade-off between how many statements we support in an APK vs how slow it is.

Last experiment: new algorithm but without Step 5 (normalizes all the calls to the Android APIs but with a variable amount of local processing):
Non-installed: 2.5--3.0ms
Installed-0-backlinks: 2.5--3.0ms
Installed-1-backlink: 2.5--3.0ms

The problem with this last approach (even though it would be significantly simpler) is that if an app had a large amount of statements, you could still detect it because it would take longer (I was running an experiment on such an app but ran into problems and out of time...). Will test this tomorrow. But I think the full algorithm with the crazy 100 steps is the only way to go.

Comment 10 by elawrence@chromium.org, Apr 3 2017

> Digital Asset Links is basically not used at all

Ah. Yeah, that's probably a fatal limitation.

> I'm not sure why you need two decoys.

To foil a timing attack, you want to ensure that the attacker cannot rely upon the timing of either path (Present or NotInstalled). Selecting a decoy, even in the case that the app is installed, would mean that the timing would not be predictable because the attacker doesn't get to pick the Decoy.

> Having a subframe allows the attacker to instantly make requests 
> from multiple domains. We could block subframes, but there might 
> still be other ways to make requests from multiple domains. 

Yes, the attacker could navigate you through a series of top-level pages to build a profile. But this is far more visible to the user and impacts the user-experience, and thus is less likely to be taken advantage of by, say, an advertising network that is embedded in a page.

Comment 11 by mgiuca@chromium.org, Apr 4 2017

Note: I've started a separate bug to have the API restricted to top-level frames:  Issue 707620 .

Comment 12 by dominickn@chromium.org, Apr 4 2017

Labels: Security_Severity-Low Security_Impact-None

Comment 13 by mgiuca@chromium.org, Apr 4 2017

I did a full experimental analysis of this because the above trials were haphazard:
https://docs.google.com/a/google.com/document/d/1nfBayd6Bm0SZRsqfK3yiHtY_Cwbjt3V7Z0WsD5fzMnM/edit

The results are discouraging. I can't find a way to make the timing of an APK with a lot of backlinks look anything like that of a missing APK.

Comment 14 by mgiuca@chromium.org, Apr 5 2017

I just met with palmer@ to discuss the approach.

The "Approach #1" (trying to make the code take the same amount of time on all code paths) is likely a losing battle, since tiny variations in e.g., doing an extra if statement end up causing one path to take more time than the other. And it's brittle -- if someone refactors it or adds some new functionality it can become unbalanced. Let's avoid this and instead just try to inject random noise to the timings.

Instead of a random number, we will generate a hash based on (device-unique salt × the Android app ID). The salt would be a randomly generated number, computed once and stored in the pref store forever (or some other stable value that uniquely identifies the device but is unknown to the site).

We generate a random time from this hash between 0 and 10ms. This gives enough noise on top of the signal (which was measured on a Nexus 7 2013 to be between 2--15ms for different apps). And sleep for that amount of time on top of processing the query.

This mitigates:
- An attacker querying many times on one device and averaging out the runs (since they will always take approx. the same amount of time and that time won't reflect whether the app is installed).
- An attacker doing a distributed attack over many users (e.g., those with the same hardware in User-Agent). Since all users have different offsets, you can't establish a baseline "installed" vs "not installed" timing for known apps by averaging across users.

We also plan to limit the API to top-level contexts (not iframes), which makes it difficult for attackers to change which apps they are querying about ( Issue 707620 ).

One problem remains: an attacker could cache the times taken to query a particular app, then notice if it suddenly jumps over time, which would indicate that the user has installed or uninstalled the app. This is the downside of having a stable offset per app ID. But if we occasionally change the app ID (e.g., generate a new one each time the browser process restarts, or at random time intervals), then an attacker could perform an average over a long period of time. So it seems hard to defend against a dedicated attacker who is willing to store timing data over a long period of time for a particular user. Is this a problem we need to address somehow?

Comment 15 by palmer@chromium.org, Apr 5 2017

Re the last bit in #14: I'd say it's more of a risk that an attacker could learn that the device owner has uninstalled the app by watching the change in time than that they could wait a very long time to average out the randomness.

For example, if we created a new salt/secret per time Chrome started up, and if the attacker gets 1 guess per startup, it would take quite a long time to get a meaningful number of samples to factor out that randomness. And, the attacker would need to tie their other identifier(s) (e.g. cookies) to the guesses, and thus would need the other identifier(s) to outlast the many browser restarts.

I think at that point, the value of a successful attack is quite possibly lower than the cost of mounting it. And, the cost/benefit ratio is likely lower than that of other fingerprinting techniques.

I've been wrong before, so don't take this as gospel. But with this we'd be out of the trivially-exploitable territory.

Comment 16 by mgiuca@chromium.org, Apr 6 2017

Initial CL is up:
https://codereview.chromium.org/2802603002/
It's private but I added you to CC. Not ready for review yet.

#15: OK, so you're saying I should change what I have now (permanently store a device ID in prefs) with a temporary ID that's re-rolled on browser startup?

That still gives a window where a site can detect when an app is installed/uninstalled:

1. Visit site. Logs timing data.
2. Install app.
3. Return to site within same browsing session. Log new timing data.

The site could take as many samples as it wants before and after the installation, using the same salt, and notice when there is a significant change.

However, there is plausible deniability because it wouldn't be able to tell the difference between the app being installed/uninstalled, and the browser being restarted. So maybe this is fine?

We could also reroll it every 5--15 minutes, rather than on browser restart. (It would have to be a random amount of time, because otherwise the site could learn the restart interval.) That might add more complexity without buying us any more security.

benwells@ suggested instead of ADDING a delay, we make it so the TOTAL time taken equals the random value, by measuring the time the operation took, then sleeping for the remainder of the time. This would require that the minimum random value be greater than the longest expected time for the operation to take, and then we may as well not use a random time, but just make it always take exactly, say, 15ms. Is it worth exploring this? It would be simpler but we'd have to choose a fairly conservative value so it might make it really slow.

Comment 17 by palmer@chromium.org, Apr 6 2017

"OK, so you're saying I should change what I have now (permanently store a device ID in prefs) with a temporary ID that's re-rolled on browser startup?"

Yeah, I think so.

"However, there is plausible deniability because it wouldn't be able to tell the difference between the app being installed/uninstalled, and the browser being restarted. So maybe this is fine?"

That's what I'm thinking.

"We could also reroll it every 5--15 minutes, rather than on browser restart. (It would have to be a random amount of time, because otherwise the site could learn the restart interval.) That might add more complexity without buying us any more security."

Yeah, I would say let your sense of simplicity guide here. We shouldn't let this mechanism get too baroque.

As for constant time, that's ideal if you can swing it but I've been assuming you can't because: (a) I wonder how precisely you can realistically get it (sleep is never exact, and sleep(15 - time_taken) might still leak some information?); and (b) indeed, you might have to be so conservative that it hurts real use-cases. But, maybe not.

Comment 18 by mgiuca@chromium.org, Apr 6 2017

I've implemented the new proposal and updated the analysis doc with timing experiments:
https://docs.google.com/document/d/1nfBayd6Bm0SZRsqfK3yiHtY_Cwbjt3V7Z0WsD5fzMnM/edit

Note there's still a flaw here which is that the delta between a "normal" app and an app with 200 statements (highly unrealistic) is longer than the 8ms random variation, which means the timing range for those huge apps doesn't overlap at all with the timing range for a normal app.

I could extend the window to 16ms (then there would be a ~50% overlap), but is there a point in doing this? You could have one with 400 statements. At some point we have to draw the line.

CL is ready for review now:
https://codereview.chromium.org/2802603002/

Comment 19 by bugdroid1@chromium.org, Apr 13 2017

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/245e999d5e19bd5f4bd44feed0336b2156581df1

commit 245e999d5e19bd5f4bd44feed0336b2156581df1
Author: mgiuca <mgiuca@chromium.org>
Date: Thu Apr 13 06:32:55 2017

getInstalledRelatedApps: Introduce random delay to stop timing attacks.

Queries to this API have a sleep of 0--8ms, randomly chosen based on a
SHA-256 hash of the app ID, salted with a unique stable device ID. This
prevents websites from being able to determine whether a non-related app
is installed based on the time the query takes.

The website is unable to compare the timing of different apps because
they have different random delays, and unable to predict the random
delay for a given device, because the salt is unknown.

BUG= 707071 

Review-Url: https://codereview.chromium.org/2802603002
Cr-Commit-Position: refs/heads/master@{#464322}

[modify] https://crrev.com/245e999d5e19bd5f4bd44feed0336b2156581df1/content/public/android/BUILD.gn
[modify] https://crrev.com/245e999d5e19bd5f4bd44feed0336b2156581df1/content/public/android/java/src/org/chromium/content/browser/installedapp/InstalledAppProviderImpl.java
[add] https://crrev.com/245e999d5e19bd5f4bd44feed0336b2156581df1/content/public/android/java/src/org/chromium/content/browser/installedapp/PackageHash.java
[modify] https://crrev.com/245e999d5e19bd5f4bd44feed0336b2156581df1/content/public/android/junit/src/org/chromium/content/browser/installedapp/InstalledAppProviderTest.java
[add] https://crrev.com/245e999d5e19bd5f4bd44feed0336b2156581df1/content/public/android/junit/src/org/chromium/content/browser/installedapp/PackageHashTest.java

Comment 20 by mgiuca@chromium.org, Apr 13 2017

Status: Fixed (was: Started)

Comment 21 by sheriffbot@chromium.org, Apr 13 2017

Project Member
Labels: -Restrict-View-SecurityTeam Restrict-View-SecurityNotify

Comment 22 by palmer@chromium.org, Apr 13 2017

#18: I'd say it might a good idea to add either a hard limit, and/or a note in the developer documentation ("If you add too many statements, your app becomes more susceptible to installation detection (see  crbug.com/707071 )").

Comment 23 by mgiuca@chromium.org, Apr 18 2017

#22: It doesn't make much sense to add it to developer documentation because it would go in the Android documentation (DAL). This feature isn't part of DAL; it just uses that format. Maybe it makes sense once standardized, but not really as an experiment.

Would your hard limit be based on the number, or time? (I'd be OK with a number, but not a time.) Does this block the launch? We have missed branch point for making any more changes to this (but can of course merge).

Comment 24 by palmer@chromium.org, Jun 30 2017

Cc: mariakho...@chromium.org

Comment 25 by sheriffbot@chromium.org, Jul 20 2017

Project Member
Labels: -Restrict-View-SecurityNotify allpublic
This bug has been closed for more than 14 weeks. Removing security view restrictions.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Sign in to add a comment