Starred by 7 users

Issue metadata

Status: WontFix
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Android , Windows , Mac
Pri: 2
Type: Bug




Issue 796060: Cache Storage value rises on each refresh when Analytics code is in the html

Reported by drmrbre...@gmail.com, Dec 19 2017

Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36

Steps to reproduce the problem:
(1) open new tab
(2) browse to https://cloud3squared.com/files/sw-analytics-demo/index.html
(3) note the "using" value on the page (which matches the Cache Storage value in devtools, though a step behind)
(4) hit F5 repeatedly, noting the value increasing each time

What is the expected behavior?
Cache Storage value should remain constant... all resources are already cached in the service worker cache

What went wrong?
Cache Storage value increases forever.

Without the Google Analytics code, the Cache Storage value is stable.  The following is the same code, except that the Analytics code block has been removed:

https://cloud3squared.com/files/sw-noanalytics-demo/index.html

Did this work before? N/A 

Chrome version: 63.0.3239.108  Channel: stable
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version:
 

Comment 1 by chem...@gmail.com, Dec 19 2017

I think this problem occurs even without Analytics code (or it's something that is also present in Firebase JS).

Since 63 hit my Android device I've been getting Quota Exceeded errors. I reinstalled 62 via APKMirror and everything worked fine again. Updated to 63 and the problem returned within a day.

Comment 2 by drmrbre...@gmail.com, Dec 19 2017

Re comment #1... thanks for chiming in.  I've also seen it without analytics code... indeed I filed another bug report about it (https://bugs.chromium.org/p/chromium/issues/detail?id=795704) ... however as mentioned in that issue, neither the Chromium team nor I could actually reproduce it again... but it was *definitely* happening.  Hopefully this new demo (with analytics code) is a way to reproduce the behaviour consistently.

I also have a related issue at https://bugs.chromium.org/p/chromium/issues/detail?id=795134 which demonstrates that quota reporting seems to be all over the place.

Comment 3 by chem...@gmail.com, Dec 19 2017

Maybe the problem is easier to reproduce on Android? I've now run into Quota Exceeded twice when it should not have been possible, and I suspect a friend also hit it once (because of a particular issue she had with my hobby project web app).

I'm on Android 8.1 (Pixel 2016), she's on Android 7.x (Galaxy S7)

Comment 4 by drmrbre...@gmail.com, Dec 19 2017

Yep, certainly more likely to run into Quota Exceeded when on a space-limited environment like a mobile device.  I hit that same error not on Android but in a space-limited IDE... https://github.com/GoogleChrome/puppeteer/issues/1596 ... that is what got me into trying to work out what was going on.  But I think the problem is possibly not easier to reproduce (as such) on Android... maybe just more noticeable because you hit the quota limit quickly whereas on a desktop you don't (for a while).

Can you reproduce a problem with the following steps: https://bugs.chromium.org/p/chromium/issues/detail?id=795134#c9.  It's as if a chunk of disk usage is counted multiple times towards the quota.  The problem I'm reporting in this particular issue may be related to that... double/triple/etc counting of resource usage... and hitting Quota Exceeded even when you haven't really.

The other issue reported in 795134 is that the quota usage seems to be far above *actual* usage... I could accept a small uplift (due to internal factors) but not such a massive uplift.

I note that the Chromium team closed that issue (https://bugs.chromium.org/p/chromium/issues/detail?id=795134)... rather prematurely IMHO... and it's not clear whether it is still being investigated.

Comment 5 by krajshree@chromium.org, Dec 19 2017

Labels: Needs-Triage-M63

Comment 6 by krajshree@chromium.org, Dec 20 2017

Labels: Triaged-ET M-65 OS-Linux OS-Mac
Status: Untriaged (was: Unconfirmed)
Able to reproduce this issue on Mac 10.12.6, Win-10 and Ubuntu 14.04 with the reported Chrome version 63.0.3239.108 and the latest canary 65.0.3299.0, following the steps in comment #0.
This is not a regression, as it is also observed in older builds back to M50.

Hence, marking it as untriaged to get more inputs from dev team.

Thanks...!!

Comment 7 by krajshree@chromium.org, Dec 20 2017

Correction: In M60 and older builds, navigating to https://cloud3squared.com/files/sw-analytics-demo/index.html shows "using" as undefined. Hence, marking untriaged from M-61.

Comment 8 by alph@chromium.org, Dec 20 2017

Owner: eostroukhov@chromium.org
Status: Assigned (was: Untriaged)
Can repro on M63

Comment 9 by ericbidelman@chromium.org, Dec 21 2017

Components: Blink>Storage>CacheStorage
Adding https://bugs.chromium.org/p/chromium/issues/detail?id=795133, which is similar.

That one mentions a fix in Chrome 64, but I can still reproduce both comment #0 and https://bugs.chromium.org/p/chromium/issues/detail?id=795134#c9 in the latest Canary 65.0.3299.0 on Mac 10.12.6.

Comment 10 by jeffy@chromium.org, Jan 2 2018

FWIW, I'm also able to reproduce via the steps mentioned in #0 (reloading https://cloud3squared.com/files/sw-analytics-demo/index.html) on Chrome Canary 65.0.3309.0 on a Macbook. (As well as on earlier instances of Chrome.)


https://bugs.chromium.org/p/chromium/issues/detail?id=795133#c8 suggests that this issue is triggered by having DevTools open, but I'm able to reproduce just by opening a Chrome Canary browser window and reloading that reproduction URL repeatedly, without ever opening DevTools.

Furthermore, if you test this in an Incognito window with a comparatively smaller upper limit on Cache Storage usage, Chrome stalls indefinitely on loading the page once you've hit the quota limit. Nothing is logged and the cache storage operations don't appear to fail when this happens, as seen in the attached screen shot.

I've anecdotally started hearing from partners that have deployed service workers and are running into this issue "in the real world".

Comment 11 by bsittler@chromium.org, Jan 2 2018

Cc: bsittler@chromium.org

Comment 12 by bsittler@chromium.org, Jan 2 2018

Behavior seems to be devtools-independent and reproducible on multiple platforms

Comment 13 by bsittler@chromium.org, Jan 2 2018

Labels: OS-Android

Comment 14 by bsittler@chromium.org, Jan 2 2018

Cc: pwnall@chromium.org

Comment 15 by bsittler@chromium.org, Jan 2 2018

Labels: Needs-Feedback
I'm seeing a bunch of cache entries with unique URLs like https://www.google-analytics.com/collect?... where some parameters are time-varying. Is it possible these (which seem to be created at each page load) are responsible for the observed behavior?

Comment 16 by bsittler@chromium.org, Jan 3 2018

I think the page's SW cache code needs to change to stop creating cache entries for Google Analytics "collect" GIFs since those have per-page-view changes in their "a" query parameter (a random number, apparently.) Deleting these entries brings the cache usage back to a reasonable level in my tests.

Comment 17 by mek@chromium.org, Jan 3 2018

And the impact of these Google Analytics GIFs is particularly bad because they are opaque responses, whose exact size we don't want to reveal. As such, their size is padded by 7MB on average.
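
To make the growth concrete, here is a rough back-of-the-envelope sketch. It assumes that each reload caches exactly one new opaque "collect" response and that padding averages 7MB (taken here as 7,000,000 bytes); the real padding varies per entry. `estimatePaddedUsage` is an illustrative name, not a Chrome API.

```javascript
// Assumption: average padding per cached opaque response (the ~7MB figure above).
const AVG_OPAQUE_PADDING_BYTES = 7 * 1000 * 1000;

// Rough estimate of reported usage after `reloads` page loads, assuming each
// load caches one new opaque response. `baseBytes` covers everything else.
function estimatePaddedUsage(reloads, baseBytes = 0) {
  return baseBytes + reloads * AVG_OPAQUE_PADDING_BYTES;
}

// 20 reloads with nothing else cached.
console.log(estimatePaddedUsage(20) / 1e6 + ' MB'); // prints "140 MB"
```

Under these assumptions, a few dozen reloads would already account for hundreds of megabytes of reported usage, even though the responses themselves are tiny.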

Comment 18 by jeffy@google.com, Jan 3 2018

Thanks for the context regarding the size impact of padding for opaque responses.

Is there any special significance to the ~7MB size inflation? I do appreciate the need to avoid leaking information, but would it be sufficient to report back sizes that varied from the actual payload on the order of X bytes, or X kilobytes instead?

Can you confirm it's the padded sizes, rather than the actual bytes on disk, that are used to determine whether a QuotaExceeded exception is thrown? Assuming it is, I'm sure there's some caution needed to make sure that information doesn't leak by caching something opaque and then repeatedly trying to trigger QuotaExceeded, but at the same time the current behavior is causing developer and user pain.

Comment 19 by bsittler@chromium.org, Jan 3 2018

Status: WontFix (was: Assigned)
It's the padded size, and it must be that to prevent trivial defeat of the mitigation.

Do not cache resources that should not be cached.

Comment 20 by bsittler@chromium.org, Jan 3 2018

(We do need to work with owners of affected libraries and ensure they are updated to account for the mitigation, but that is a separate issue.)

Comment 21 by pwnall@chromium.org, Jan 3 2018

#18: Sorry about the pain!

I agree that the padding sizes are unfortunate and can cause developer pain. Sadly, after spending a long time looking into the issue, we found that anything less would not sufficiently mitigate the security vulnerability that the padding is trying to address.

If you're curious (I generally like to know what's causing me pain), you can read more about the problem I mentioned at https://www.blackhat.com/docs/us-16/materials/us-16-VanGoethem-HEIST-HTTP-Encrypted-Information-Can-Be-Stolen-Through-TCP-Windows-wp.pdf

Comment 22 by drmrbre...@gmail.com, Jan 3 2018

pwnall@ thanks for the explanations.  It makes sense.  Have to be more careful about what is being cached.  It's tricky, though, when the inclination is to cache all resources (to fully enable offline use of the page concerned), and then a third-party library like Google Analytics throws in a resource with a randomly-varying URL!

Here is a quick fix for the original demo, so that it isn't caching those collect files: https://cloud3squared.com/files/sw-analytics-demo-fixed/index.html

What I've done in the above fix is to check the cache-control header and use that as an indication as to whether the response should or shouldn't be cached in SW.  If it's opaque, the header cannot be read, so we don't cache those either.  
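
A minimal sketch of that kind of check might look like the following. The demo's actual code is not shown here, so the function name `shouldCacheResponse` and the exact set of Cache-Control directives treated as "do not cache" are assumptions.

```javascript
// Hypothetical helper: decide whether a fetched response should be written
// to the service worker cache.
function shouldCacheResponse(response) {
  // Opaque (no-cors, cross-origin) responses expose no readable headers,
  // so this policy skips them entirely.
  if (response.type === 'opaque') return false;

  const cacheControl = (response.headers.get('cache-control') || '').toLowerCase();

  // Assumed policy: treat no-store, no-cache and private as "do not cache".
  return !/\b(no-store|no-cache|private)\b/.test(cacheControl);
}
```

In a fetch handler this would gate the write, e.g. only calling `cache.put(request, response.clone())` when `shouldCacheResponse(response)` returns true.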

But then it's no longer caching all resources that IMHO it should be.  For example, it seems that most (all?) js library files stored on CDNs are returned in opaque responses... so those will no longer be cached in SW when those are prime candidates for caching.  In the above example, the jquery and webfontloader libraries are not cached in SW as a result.
 
The alternative is to control SW caching based on the URL... e.g. check for presence of 'https://www.google-analytics.com/collect' and don't cache those.  But that is very specific, and when Google (or whatever third party is involved) changes their URL then it breaks SW caching.

Any suggestions about how to decide what really *shouldn't* be cached in SW?

Comment 23 by bke...@mozilla.com, Jan 3 2018

For resources like this it's probably best to strip the query parameters when caching.  That way you only keep the most recent.  The `cache.match(req, { ignoreSearch: true })` style query also probably helps here.
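
As a rough sketch of the first suggestion, the cache key can be derived from the request URL with its query string removed, so that every variant overwrites the same entry. The helper name `cacheKeyWithoutSearch` is hypothetical.

```javascript
// Hypothetical helper: build a cache key with the query string and fragment
// removed, so /collect?a=1 and /collect?a=2 map to the same entry.
function cacheKeyWithoutSearch(requestUrl) {
  const url = new URL(requestUrl);
  url.search = '';
  url.hash = '';
  return url.href;
}

console.log(cacheKeyWithoutSearch('https://www.google-analytics.com/collect?v=1&a=123'));
```

The result could then be used as the key when storing, e.g. `cache.put(cacheKeyWithoutSearch(request.url), response.clone())`. This only makes sense for URLs whose query parameters do not change the content that should be served.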

Comment 24 by chem...@gmail.com, Jan 3 2018

@c23: except that ignoreSearch significantly slows down the lookup (in sw-toolbox it does, at least. I had to remove it because fetch calls easily took > 100ms)

Comment 25 by bke...@mozilla.com, Jan 3 2018

I believe ignoreSearch performance is a known bug in Chrome.  See:

https://bugs.chromium.org/p/chromium/issues/detail?id=682677

I thought that was making progress, but I see it hasn't gotten any attention for close to a year.  Sorry.

Comment 26 by drmrbre...@gmail.com, Jan 3 2018

Here's the original demo fixed using ignoreSearch (but only if query URL has '?' due to the performance issues presently associated with ignoreSearch noted above): https://cloud3squared.com/files/sw-analytics-demo-ignoresearch/index.html
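
The conditional use of ignoreSearch described above could be sketched as follows. This is not the demo's actual code; `matchOptionsFor` is a hypothetical helper name.

```javascript
// Hypothetical helper: choose cache.match() options for a request URL.
function matchOptionsFor(requestUrl) {
  // Only pay the ignoreSearch lookup cost when there is a query string to ignore.
  return requestUrl.includes('?') ? { ignoreSearch: true } : {};
}
```

A lookup would then read `cache.match(request, matchOptionsFor(request.url))`, so requests without a query string take the ordinary exact-match path.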

Comment 27 by bsittler@chromium.org, Jan 3 2018

#22 - given the widespread usefulness of opaque responses in building an offline-capable, faster-than-network-capable web page, I think caching them generally makes sense. Likewise I think discarding query parameters on all cache entries is fairly disastrous in widespread cases where the query parameters are actually important (e.g. profile images, where the account identifier-derived key for the profile image is often a query parameter, as are image sizes).

I think it makes a lot of sense, though, to have a list of URLs or URL patterns that are not cached. To me 'https://www.google-analytics.com/collect' seems like a good candidate for that, and I expect that maintaining a list of uncacheable URL patterns should not be too onerous for an app maintainer.
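
Such a list could be as simple as the sketch below. The constant and function names are hypothetical, and the single entry is the URL mentioned above; an app would maintain its own list.

```javascript
// Assumed list of URL prefixes that should never be written to the cache.
const UNCACHEABLE_URL_PREFIXES = [
  'https://www.google-analytics.com/collect',
];

// Returns true when the request URL starts with any listed prefix.
function isUncacheable(requestUrl) {
  return UNCACHEABLE_URL_PREFIXES.some((prefix) => requestUrl.startsWith(prefix));
}
```

A service worker would consult `isUncacheable(request.url)` before calling `cache.put`, leaving all other responses (including opaque ones) cacheable as before.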

One could even imagine a service like this (which includes a JavaScript library portion loaded by the page) advertising its uncacheable parts in a JavaScript global variable, or deleting matching requests from all caches itself during library loading.

Comment 28 by bsittler@chromium.org, Jan 3 2018

Another option might be to switch from GET to POST for your Google Analytics data collection:

https://developers.google.com/analytics/devguides/collection/protocol/v1/reference#using-post

Comment 29 by bsittler@chromium.org, Jan 3 2018

You can tell the Google Analytics JavaScript library to send reports using navigator.sendBeacon (which uses POST rather than GET) using

gtag('config', 'GA_TRACKING_ID', { 'transport_type': 'beacon'});

https://developers.google.com/analytics/devguides/collection/gtagjs/sending-data#specify_different_transport_mechanisms

Comment 30 by bsittler@chromium.org, Jan 4 2018

On average each cached opaque response's padding is expected to contribute about 7 megabytes to the storage usage estimate. You can see how many of these responses you have in your caches like this, I think:

(async ()=>{console.log((await Promise.all([].concat(...await Promise.all((await Promise.all((await caches.keys()).map(key => caches.open(key)))).map(cache => cache.keys()))).map(request=>caches.match(request)))).filter(resp=>![...resp.headers.entries()].length).length);})()

Padding is computed here, apparently:

https://cs.chromium.org/chromium/src/content/browser/cache_storage/cache_storage_cache.cc?type=cs&q=cache_storage_cache+responsepaddingint&sq=package:chromium&l=310

Comment 31 by bsittler@chromium.org, Jan 4 2018

#30 has a bug. use this instead:

(async ()=>{console.log((await Promise.all([].concat(...await Promise.all((await Promise.all((await caches.keys()).map(key => caches.open(key)))).map(cache => cache.keys()))).map(request=>caches.match(request)))).filter(response=>!response.headers.get('date')).length);})()
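
For readability, here is a multi-line sketch of the same idea. It differs from the one-liner above in two ways that are my own choices rather than anything stated in this thread: it checks `response.type === 'opaque'` instead of looking for a missing `date` header, and it matches each request within the cache it came from. The function name is hypothetical.

```javascript
// Count cached responses whose type is 'opaque' across every cache in a
// CacheStorage-like object (pass `caches` in a page or service worker).
async function countOpaqueResponses(cacheStorage) {
  let count = 0;
  for (const name of await cacheStorage.keys()) {
    const cache = await cacheStorage.open(name);
    for (const request of await cache.keys()) {
      // Look the request up in the cache it was listed in.
      const response = await cache.match(request);
      if (response && response.type === 'opaque') count += 1;
    }
  }
  return count;
}
```

In the DevTools console this could be run as `countOpaqueResponses(caches).then(console.log)`.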

Comment 32 by drmrbre...@gmail.com, Jan 8 2018

For anyone stumbling onto this thread, even though it's closed, I thought it useful to link to another interesting discussion of opaque responses: https://stackoverflow.com/a/39109790/4070848

Comment 33 by y...@yoav.ws, Jul 16 2018

Cc: jkarlin@chromium.org
bsittler@/jkarlin - do you have any info/documentation regarding the reasoning behind the "7MB per opaque response" average? It seems to have real cost in real-life applications, so I'm curious to see if there are ways to lower it without compromising SOP protections.

The padding value seems to have jumped from ~200KB to ~7MB on https://chromium-review.googlesource.com/c/chromium/src/+/663398. Maybe there's some middle-ground between the two values. Otherwise, we're effectively making the caching of opaque responses a huge footgun :/

Comment 34 by jsb...@chromium.org, Jul 23 2018

Cc: -bsittler@chromium.org cmumford@chromium.org
cmumford@ may be able to share data.

The padding was arrived at after consultations with the security team. It's not ideal, but no better solution has been proposed that satisfies the constraints.
