
Issue 682677

Starred by 16 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug

Blocked on:
issue 886861




Increasing number of items in cache decreases response speed drastically

Reported by lazyshee...@gmail.com, Jan 19 2017

Issue description

UserAgent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36

Steps to reproduce the problem:
// complete test case links given below
1. Cache some resources into Cache Storage using a Service Worker
2. Hijack responses with the SW and respond from cache using Cache.match() (https://developer.mozilla.org/en-US/docs/Web/API/Cache/match)
3. Observe the response time of any item retrieved from the cache and note it.
4. Increase the number of items in Cache Storage.
5. Observe and compare response times for items retrieved from Cache Storage. Having more items in a specific cache increases retrieval time drastically.
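The steps above can be sketched as a minimal service worker (file and cache names are illustrative; the ignoreSearch option matters here, per the diagnosis later in the thread):

```javascript
// Minimal sketch of the reproduction. In sw.js, register with:
//   self.addEventListener('install', onInstall);
//   self.addEventListener('fetch', onFetch);
const CACHE_NAME = 'bloat-test-v1';

function onInstall(event) {
  // Step 1: put some resources into Cache Storage.
  event.waitUntil(
    caches.open(CACHE_NAME).then((cache) =>
      cache.addAll(['/item-1.json', '/item-2.json'])
    )
  );
}

function onFetch(event) {
  // Step 2: hijack the response and answer it from the cache
  // via Cache.match(), falling back to the network on a miss.
  event.respondWith(
    caches.open(CACHE_NAME).then((cache) =>
      cache.match(event.request, { ignoreSearch: true })
        .then((cached) => cached || fetch(event.request))
    )
  );
}
```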

Complete test case:
// test case
> https://lazysheepherd.com/sw-test/bloat
// findings
> https://docs.google.com/spreadsheets/d/1ulErfthbzGT8Ip9pka3s76zX_HKYkPGoZF9bGGQddrU

Notes:

> The problem's scope is a single cache, not the whole local Cache Storage, meaning retrieval speed is affected only by the number of items in the cache being matched against. Retrieval therefore speeds up if you divide items across different caches and specify which cache to match against via caches.match()'s "options.cacheName" option, since the number of candidate items is reduced.

> I have made an example and derived some stats from it, but I do not have extensive knowledge of profiling in the browser, so I was only able to test and compute averages manually. While some exhaustive testing would still be good, the results are already very conclusive and visible.
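The cache-partitioning idea in the first note can be sketched as follows (the cache names and the routing rule are illustrative assumptions, not from the report):

```javascript
// Route each request to one of several smaller named caches so a match
// only has to scan that cache's items.
function cacheNameFor(url) {
  // Routing rule is illustrative: API responses vs. static assets.
  return url.indexOf('/api/') !== -1 ? 'api-v1' : 'assets-v1';
}

async function putPartitioned(request, response) {
  const cache = await caches.open(cacheNameFor(request.url));
  await cache.put(request, response);
}

async function matchPartitioned(request) {
  // options.cacheName limits caches.match() to a single named cache,
  // so only that cache's items are candidates for the match.
  return caches.match(request, {
    cacheName: cacheNameFor(request.url),
    ignoreSearch: true,
  });
}
```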

What is the expected behavior?
Much more optimized handling of cache storage item matching and retrieval.

What went wrong?
There is a solid linear correlation between number of items in the cache and response time.

Response time grows roughly in proportion to the number of items in the cache.
e.g.
~10ms with 1 item in the cache,
~20ms with 10 items in the cache,
~100ms with 100 items in the cache,
~1000ms with 1000 items in the cache, and so on.

Did this work before? No 

Does this work in other browsers? N/A

Chrome version: 55.0.2883.87  Channel: stable
OS Version: 6.1 (Windows 7, Windows Server 2008 R2)
Flash Version: Shockwave Flash 24.0 r0
 
I will add, since I forgot to mention: the problem is visible on Chrome for Android too. In fact, I discovered this issue by experiencing it in one of my PWA projects, which works many times slower offline than online.
Components: -Blink>ServiceWorker Blink>Storage>CacheStorage
Labels: -OS-Windows OS-All
Status: Available (was: Unconfirmed)
It's because you're using ignoreSearch = true. Chrome's CacheStorage has terrible performance with ignoreSearch because it searches linearly through the keys. This is an unfortunate but known issue.

Leaving the issue open as something we should eventually address.
While it is really unfortunate, I was able to come up with a very convenient workaround.

I check whether the request URL the service worker is currently handling has a query string, and set 'ignoreSearch' to true only if it does.

// check whether the request URL has a query string
var hasQuery = event.request.url.indexOf('?') !== -1;
...
// then set the option dynamically
cache.match(event.request, { ignoreSearch: hasQuery });

I have also posted this solution to Stack Overflow: http://stackoverflow.com/questions/41758252/service-worker-slow-response-times
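Fleshed out, the workaround might look like this (a sketch; the cache name and helper name are illustrative, not from the comment):

```javascript
// Only pay the ignoreSearch cost when the request URL actually carries
// a query string. In sw.js, register with:
//   self.addEventListener('fetch', handleFetch);
function hasQueryParams(url) {
  return url.indexOf('?') !== -1;
}

function handleFetch(event) {
  event.respondWith(
    caches.open('my-cache-v1').then((cache) =>
      cache.match(event.request, {
        // true only for URLs like /page?x=1, so plain URLs take the
        // fast exact-match path.
        ignoreSearch: hasQueryParams(event.request.url),
      }).then((cached) => cached || fetch(event.request))
    )
  );
}
```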

Comment 5 by bke...@mozilla.com, Jan 20 2017

FWIW, in Gecko we store "url without query" as a separate column to avoid the linear search.

But this sort of thing is why we have generally pushed back on requests for general substring matching in the Cache API. It's hard to implement without triggering this behavior.
Here's another reduced test-case for this: https://rawgit.com/jakearchibald/23bc89cf49d81c77015f0ef81a1ded75/raw/dcdaa3eb7e9841f689f2f87156affbeaaad7c55d/

Open in Canary with the console open.
Click "Cache stuff" - this caches 200 items.
Click "Fetch from cache via SW" - this fetches the 200 items using caches.match.
Click "Fetch from cache via SW ignoring search" - this fetches the 200 items using caches.match with ignoreSearch.

On my machine, the difference is 330ms vs 13000ms.
Owner: jkarlin@chromium.org
Status: Assigned (was: Available)
Wow, that's incredibly bad. I'll look into this.
Status: Available (was: Assigned)
Ah yes, I remember why we didn't address this in the first place. CacheStorage's backend (simple cache) doesn't keep the URLs in its index; it only keeps a hash of each URL. This keeps the index fairly small.

To get the URL we have to iterate over the entries, open each one, and parse its metadata. Hence the ridiculously long times.

This would be an easy fix if we had a more flexible backend such as SQLite. Instead we're left with either optionally storing URLs in the backend index, or storing them on behalf of the backend in the CacheStorage metadata and dealing with synchronization issues.

As this isn't a quick fix and I don't have much time on my hands for this, I'm marking it as available.

I have tested changing the name (and therefore the order, at least the order visible in dev tools) of a particular item in the cache to see if that would help, i.e. whether placing an item first or last still requires processing all items in the cache. It looks like it does. It seems the match does not happen while iterating over each cache entry; instead, all items' metadata is first collected into a temporary array, and only then does the match occur. Statistically speaking, if feasible, match speed would improve considerably if each item were matched against the request as its metadata is retrieved, stopping the process once a match is found.

Secondly, are the "ignoreVary" and "ignoreMethod" parameters also subject to this problem? Has there been a test for that?

And thirdly, @Jake, have you tested my workaround given above for this problem and/or come up with another (perhaps more efficient) workaround?
Also, I have spent a couple of days on this and found nothing written about it anywhere on the internet. Now we have this issue entry and a Stack Overflow question about it. Would those be enough to guide people away from this problem in the future, or would it be necessary and appropriate to place some warnings in the Google and MDN docs about it?
Status: Assigned (was: Available)
There is a hacky fix that I can do while we work out what to do with backend indexing. On the first use of ignoreSearch we can memoize the keys in the cache, keep the list updated with future puts/deletes, and use the memoized keys for subsequent ignoreSearch calls. The first call will still be slow, but the remaining ones will be fast. Jake's example will perform nicely.
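The comment above describes a Chrome-internal fix, but the same idea can be sketched at the service-worker level (helper names are mine, not Chrome's): memoize the cache's keys on first use and reuse them for later ignoreSearch lookups, invalidating the memo on writes.

```javascript
// Sketch only: memoize cache keys so repeated ignoreSearch matches are
// cheap. stripSearch/matchIgnoringSearch are illustrative names.
function stripSearch(url) {
  const i = url.indexOf('?');
  return i === -1 ? url : url.slice(0, i);
}

let memoizedKeys = null; // reset to null after any put()/delete()

async function matchIgnoringSearch(cache, requestUrl) {
  if (memoizedKeys === null) {
    // First call is still slow: enumerate all keys once.
    memoizedKeys = (await cache.keys()).map((req) => req.url);
  }
  const target = stripSearch(requestUrl);
  const hitUrl = memoizedKeys.find((url) => stripSearch(url) === target);
  // An exact-URL match avoids the linear ignoreSearch scan in the backend.
  return hitUrl ? cache.match(hitUrl) : undefined;
}
```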
Cc: jkarlin@chromium.org jsb...@chromium.org cmumford@chromium.org
Owner: ----
Status: Available (was: Assigned)
Chrome doesn't support multiple vary entries in the same cache, so ignoreVary isn't a performance issue. Same goes for ignoreMethod. Yay more brokenness.

I think it's time we took a serious look at switching to SQLite so that we can get atomic operations and support for more indexes. I don't want to spend more time hacking around our backend.





Project Member

Comment 12 by sheriffbot@chromium.org, Mar 7 2018

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Hotlist-Recharge-Cold
Status: Available (was: Untriaged)
Summary: Increasing number of items in cache decreases response speed drastically (was: Increasing number of items in cache increases response speed drastically)
Is there any progress on this issue?
No progress. If anyone on the storage team wants to take it, we'd likely need to keep a set of stored URLs in the CacheStorageCache index. I worry about this set getting out of sync with the backend. I also worry about its size in memory; URLs can be quite long. But if we had it, the slow path of QueryCache would be gone, which would be amazing.

Comment 17 by bke...@mozilla.com, Apr 19 2018

> I also worry about the size of this in memory. URLs can be quite long.

FWIW, you can index a hash of the URL to reduce the footprint of the index. We do this in the Cache API implementation in Gecko.
A list of hashed URLs is what we use for our backend index. The problem is that the "list" function (and ignore query) requires the full URLs, so we open each entry and extract the URL, which is super slow.

Blockedon: 886861
