
Issue 617620

Starred by 4 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug

Blocked on:
issue 490029




non-media cache filled with streaming media from google play music

Project Member Reported by jkarlin@chromium.org, Jun 6 2016

Issue description

Within a couple hours of listening to streaming music on my Linux Desktop (which is now running simple cache) the cache contains almost entirely music, effectively clearing my cache. This is especially painful given that the resources aren't even conditionalizable. This applies to YouTube streams as well.

Presumably this is an effect of the SimpleCache LRU eviction algorithm.
 
Labels: -Type-Bug-Regression Type-Bug
Actually, I haven't confirmed that BlockFile cache didn't do this as well, so moving to Bug instead of Bug-Regression.
Components: Internals>Media
Summary: non-media cache filled with streaming media from google play music (was: SimpleCache fills with streaming media, replacing all other content)
This data should be in the media cache instead of the general cache. Google Play Music (which is what I've seen this issue with) uses XHR to get its data, and presumably XHRs don't get tagged as media resources.
Owner: jkarlin@chromium.org

Comment 4 by gavinp@chromium.org, Jun 14 2016

What's the solution here? Tagging XHRs into the media cache in some situations? A better eviction algorithm for simple cache?
I think a better eviction algorithm is the right approach. Any reason we couldn't drop the media cache after doing so?
YT also uses XHRs for HTML5 media playback, including on mobile. 
Cc: csharrison@chromium.org
jkarlin@, why do you suppose a better eviction algorithm is the right approach here? I'm nervous in general of complicating simple cache's eviction algorithm if we can get most of the wins by tagging these for use in media cache.

I understand the desire to have one, sane cache, but it seems like a longer term goal than fixing this bug.
How do you detect if an XHR request is for media or not at request time? That's the point that we decide which cache to use. 
Good point, the only thing I can think of is to use the URL but that's probably way too hacky.

Any ideas to improve the eviction algorithm in a simple way?
Blockedon: 490029
Cc: shivanisha@chromium.org gavinp@chromium.org
Labels: -OS-Linux -Pri-3 OS-All Pri-2
gavinp@, csharrison@, jkarlin@, and shivanisha@ met and decided to proceed as follows:

We'll create separate bins within the SimpleCache based upon response content-type. Each bin will have a max-percentage of the cache that it's allowed to use. One of the bins will be a media bin. Eviction will continue to be LRU but will be run on the given bin. If a new entry will be too large for its bin it won't write to cache.

The end result will be that all media requests (with appropriately labeled content types) will wind up in the media bin of the cache, fixing the XHR problem. This will also allow us to remove the media cache.
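
For illustration only (this is not Chromium code): a minimal Python sketch of the binning idea described above. The class name, the two-bin split, the percentages, and the content-type mapping are all assumptions made up for the example; the thread does not specify them. Each bin runs its own LRU, and an entry larger than its bin is simply not written.

```python
from collections import OrderedDict


class BinnedLRUCache:
    """Toy cache: one LRU list per content-type bin, each with its own byte budget."""

    def __init__(self, max_bytes, bin_fractions):
        # bin_fractions is a hypothetical split, e.g. {"media": 0.25, "other": 0.75}.
        self.limits = {name: int(max_bytes * frac) for name, frac in bin_fractions.items()}
        self.bins = {name: OrderedDict() for name in bin_fractions}
        self.used = {name: 0 for name in bin_fractions}

    @staticmethod
    def bin_for(content_type):
        # Assumed mapping: audio/* and video/* responses land in the media bin.
        if content_type.startswith(("audio/", "video/")):
            return "media"
        return "other"

    def put(self, key, size, content_type):
        name = self.bin_for(content_type)
        if size > self.limits[name]:
            return False  # too large for its bin: do not write to cache
        entries = self.bins[name]
        if key in entries:
            self.used[name] -= entries.pop(key)
        # LRU eviction, restricted to the entry's own bin.
        while self.used[name] + size > self.limits[name]:
            _, evicted_size = entries.popitem(last=False)
            self.used[name] -= evicted_size
        entries[key] = size
        self.used[name] += size
        return True

    def get(self, key):
        for entries in self.bins.values():
            if key in entries:
                entries.move_to_end(key)  # mark as most recently used
                return True
        return False
```

With this shape, a long run of `video/mp4` writes can only push out other media entries, so a script stored in the "other" bin survives the stream.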

Next steps:

1) Add UMA to track bytes written to cache for various content types. This is to give us an idea of what people's caches look like today.

2) Wait for SimpleCache to launch everywhere

3) Wait for SimpleCache to improve its index reliability

4) Implement the SimpleCache changes
Cc: pasko@chromium.org
"If a new entry will be too large for its bin it won't write to cache." Are we able to enact this policy at response time?

cc pasko@ as FYI

Comment 12 by pasko@chromium.org, Jun 20 2016

In general, getting rid of a backend instance for Media Cache sounds good. We do not always know which bin an entry should be in when creating it, so a single backend would be more flexible. It also means fewer index files to flush to disk (yay).

> > "If a new entry will be too large for its bin it won't write to cache."
>
> Are we able to enact this policy at response time?

I am not sure this is a good idea.

We often do not know the size of the entry until we fully write to it. What this proposal seems to mean is that we need to be able to discard writing to a large entry (and return error?) when eviction starts, which complicates the (already complex) entry state machine (what if the entry is being doomed? in optimistic write?)

Meta question: I see a lot of "wait" in the plan. Can we run experiments (probably Clovis-like) to see what cache splitting strategy is better for cycling across a bunch of popular sites?
> > > "If a new entry will be too large for its bin it won't write to cache."
>
> > Are we able to enact this policy at response time?

> I am not sure this is a good idea.

Me neither, but if it's simple it's worth at least looking into when feasible.

> Meta question: I see a lot of "wait" in the plan. Can we run experiments (probably Clovis-like) to see what cache splitting strategy is better for cycling across a bunch of popular sites?

Do we have a reasonable approximation for what users' navigations look like? E.g., playing music in the background all day while navigating other sites?



Comment 14 by pasko@chromium.org, Jun 20 2016

> Do we have a reasonable approximation for what users navigations look like?
> E.g., playing music in the background all day while navigating other sites?

We both know that we don't know what it looks like :/

Even this basic question is unclear: how to estimate the proportion of users that are affected by XHR eating their caches. I think this usecase is non-typical (esp on mobile), but it is rather a wild guess of mine.

Ideas are very welcome.

I am not even sure what I am trying to say :) Perhaps that the content type heuristic is cool, but there could be something else that is simpler and/or more efficient?

Also, we should make sure that by splitting the cache into separate bins we do not regress UX for those who do not play music in the background and do not watch YT, but I did not easily understand how the plan guarantees it. To check for that I thought we could "just" cycle through X URLs simulating a user session. But now thinking a little more about it, any possible way we invent would raise a great many questions :/

Maybe it makes sense to run an origin-controlled experiment on YT that would save all YT XHRs in a separate cache backend when in the experiment group? Note: experiments around cache eviction take a very long time; we need to clear caches in the Control group and wait until caches fill up (several weeks AFAIR).
Agree that mobile is less likely to have this issue as streaming often happens in dedicated apps.

If the main cache grows by the size of the deleted media cache, and the media bin is the same size as the old media cache, then we won't regress UX. It'll be the same.

I should mention that I'm not a huge fan of binning either. It leaves some areas of the cache potentially underutilized, as you say. Ideally a simple eviction strategy would just work well, regardless of what we put in it. We have some time to come up with something better.
Project Member

Comment 17 by bugdroid1@chromium.org, Sep 23 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/81c04644f176ac5873c84db8f1cc8193762f88f3

commit 81c04644f176ac5873c84db8f1cc8193762f88f3
Author: jkarlin <jkarlin@chromium.org>
Date: Fri Sep 23 18:08:59 2016

[HttpCache] Add cache metrics for audio/video behavior

Cleans up the content-type metrics in http_cache_transaction and adds metrics
for two new content types, audio and video.

BUG=617620

Review-Url: https://codereview.chromium.org/2350183002
Cr-Commit-Position: refs/heads/master@{#420661}

[modify] https://crrev.com/81c04644f176ac5873c84db8f1cc8193762f88f3/net/http/http_cache_transaction.cc
[modify] https://crrev.com/81c04644f176ac5873c84db8f1cc8193762f88f3/tools/metrics/histograms/histograms.xml

Project Member

Comment 18 by bugdroid1@chromium.org, Sep 26 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9e25895414b4a327440262769ca8cd785fdbcddb

commit 9e25895414b4a327440262769ca8cd785fdbcddb
Author: jkarlin <jkarlin@chromium.org>
Date: Mon Sep 26 18:35:29 2016

[SimpleCache] Track some SimpleCache size stats on initialization

Once the SimpleCache has initialized, record its size, its max size, how full
it is, and how many entries it has. This gives us insight into the utilization
of caches.

BUG=617620

Review-Url: https://codereview.chromium.org/2355493003
Cr-Commit-Position: refs/heads/master@{#420941}

[modify] https://crrev.com/9e25895414b4a327440262769ca8cd785fdbcddb/net/disk_cache/simple/simple_index.cc
[modify] https://crrev.com/9e25895414b4a327440262769ca8cd785fdbcddb/tools/metrics/histograms/histograms.xml

Cc: mmenke@chromium.org
Given the delay on shipping Simple Cache everywhere, I guess this is on indefinite hold?  For servicification of the network stack, I really don't want to support an API that lets us bifurcate the cache based on resource type (or even just bifurcating the cache and providing separate mojo interfaces to use each one).
Right, I don't see anyone getting to this issue anytime soon. Given that we're already trashing the main cache anyway for lots of sites (e.g., youtube) it might be reasonable to argue that you don't need to support it. How about starting up a Finch trial and seeing how removing the media cache affects the ratio of bytes from network and page load performance?
Cc: hubbe@chromium.org
+hubbe we definitely have a bunch of bug reports from when we accidentally did this...
What about doing something like the Linux page cache (https://lwn.net/Articles/584101/, merged in 3.15): split the cache in two halves, one containing entries that were never hit and one containing entries that were hit at least once. Seems like a simple and cheap heuristic that will prevent YouTube and Play Music from thrashing the entire cache unless one plays all videos twice. :-)
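
A rough Python sketch of the "hit at least once" split proposed above, for illustration only. It is neither the Linux kernel algorithm nor SimpleCache code; the class and attribute names are hypothetical, and the fixed 50/50 split is just the starting point suggested in the comment.

```python
from collections import OrderedDict


class TwoHalvesCache:
    """Toy cache split into a 'never hit' half and a 'hit at least once' half."""

    def __init__(self, max_bytes):
        self.half = max_bytes // 2
        self.cold = OrderedDict()  # entries never hit since insertion
        self.hot = OrderedDict()   # entries hit at least once

    @staticmethod
    def _trim(entries, limit):
        # Evict least recently used entries until the half fits its budget.
        while sum(entries.values()) > limit:
            entries.popitem(last=False)

    def put(self, key, size):
        # A rewrite starts over as a never-hit entry (an assumption of this sketch).
        self.hot.pop(key, None)
        self.cold.pop(key, None)
        if size > self.half:
            return  # does not fit in a half; skip caching
        self.cold[key] = size
        self._trim(self.cold, self.half)

    def get(self, key):
        if key in self.cold:
            # First hit: promote to the protected half.
            self.hot[key] = self.cold.pop(key)
            self._trim(self.hot, self.half)
            return True
        if key in self.hot:
            self.hot.move_to_end(key)
            return True
        return False
```

Streamed fragments that are written once and never read only churn through the cold half, so an entry that has been hit at least once stays put until other hit entries crowd it out.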
Cc: -csharrison@chromium.org morlovich@chromium.org
So the SimpleCache backend now prioritizes by age and size. Media will get evicted sooner than other things. I believe the next plan is to remove the media cache and increase the size of the main cache. Of course, this doesn't work particularly well with the blockfile backend...
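
To make "prioritizes by age and size" concrete, here is a small Python sketch. The thread does not give the actual formula, so the score used here (time since last use multiplied by entry size) is an assumption for illustration, and the function name is made up.

```python
def eviction_order(entries, now):
    """Return keys ordered first-to-evict first.

    entries maps key -> (last_used_time, size_in_bytes).
    Assumed score: age * size, so large, stale entries go first.
    """
    def score(item):
        _, (last_used, size) = item
        return (now - last_used) * size

    ranked = sorted(entries.items(), key=score, reverse=True)
    return [key for key, _ in ranked]
```

Under this assumed score a recently written 20 MB video fragment can still outrank an older 1 MB script for eviction, which matches the observation that media gets evicted sooner than other things.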
Cc: -gavinp@chromium.org jkarlin@chromium.org
Owner: ----
Status: Available (was: Assigned)
Yeah, I noticed, but in the modern world all too many pages load a 1M javascript which needs to be cached for the website to be usable outside of gigabit links (in other words, outside of selected areas of the West). With the default cache size of 320M, half an hour of watching youtube evicts these cached JS files. With --disk-cache-size=2000000000, one needs to watch youtube for several hours to evict that JS away.

But that's not a theoretical scenario: after reading the source code of SimpleCache and disk_cache and whatnot and deciding that 2G of cache should help, I went home, happened to binge some dashcam videos until 2am, and then again had to wait half a minute for Jira to load (normally it'd load under 5 seconds).

I mean, what's the point of evicting a JavaScript file that has seen hundreds of cache hits every day to store a 2M fragment of a YouTube video that is never going to be played again?
On what platform? SimpleCache isn't used on Windows or Mac.
Linux.
Great, thanks. Balancing the eviction knobs is slow-going work, but should be done.
FYI:  When I implemented the age-and-size priority in simplecache, I also tried a lot of different methods for deciding what entries to evict. My observation was that most of these algorithms gave rather small improvements, and may not be worth it. Increasing the cache size made a much bigger difference.

My suggestion would be to have a more flexible upper limit for the cache size. Normally the cache size is only a couple of percent of the available storage, but it's probably ok for chrome to use more than that while people are actually using chrome. This could be implemented by simply ignoring recently used entries when deciding if it's time to evict stuff or not. Another interpretation would be to have a limit of cache size * 1.5 for as long as the browser is considered "active", then trigger a cache eviction without the 1.5 when the browser goes "idle".

Either way, the increased cache size will also make the cache eviction more efficient, as it will have more entries to choose from when evicting.
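
The second interpretation above (a limit of cache size * 1.5 while the browser is active) could look roughly like this in Python. The function names and the notion of a boolean "active" flag are assumptions for the sketch; how the browser would decide it is active or idle is not specified in the thread.

```python
def effective_limit(base_limit, browser_active, active_factor=1.5):
    """Size limit in bytes: relaxed while the browser is active, strict when idle."""
    if browser_active:
        return int(base_limit * active_factor)
    return base_limit


def bytes_to_evict(cache_size, base_limit, browser_active):
    """How many bytes eviction should free right now (0 if under the limit)."""
    return max(0, cache_size - effective_limit(base_limit, browser_active))
```

For example, with a 320M base limit and 400M currently cached, nothing is evicted while the browser is active, and 80M is evicted once it goes idle.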

An adaptive cache size is intriguing. We've previously studied the effects of statically doubling the cache size. It had a marginal improvement (1-2%) to cache hit rate.

Well, I'm just one data point of many, but:

- I'm not willing to dedicate more than cca 4G to chrome cache
- with 4G of cache, I think I can still reproduce the issue, but I'd have to watch 4K videos for a few hours :-)
- I have no idea what other effects the "hit at least once" heuristic has, I'm convinced it'd fix my problem but it may break other use cases :-/
hubbe: AFAIR the usage pattern you experimented with was not involving watching 4K videos for a few hours :)

re: the "hit at least once" heuristic:
+ clearly solving one problem, seems nice
+ easy to implement in simplecache? could occupy one bit in the index entry, and then some simple magic in eviction
+ dividing in two equal parts looks arbitrary, but also could be tuned
+ the only drawback I can see is that the cache is less efficient when a user drastically changes access pattern (and half of the cache is not enough for that) - should not happen often?

re: adaptive size:
+ attractive, the current sizing is mostly accidental
Agree that it should be fairly simple to implement. I worry about how the various eviction strategies will behave together. We need some simulations.
see also squid eviction policies
http://www.hpl.hp.com/techreports/1999/HPL-1999-142.pdf
Blockfile actually implements something like this, but more complex (of course). 

I am a bit worried about things like metadata writes on JS files producing an extra touch, etc., though maybe that's a feature :p

> I worry about how the various eviction strategies will behave together. We need some simulations.

I don't understand. I thought we have a framework of cache clearing that prevents strategies playing together in the field...

For simulations we could go the approach from hubbe@ - recording local access patterns and then simulating those. I've recently explored how to, among other things, dump data from Clank (go/dumpium - sorry, googlers only), it is more complicated than necessary for this purpose, but feel free to take the ideas/snippets. We could even try to dump access patterns for watching-videos-for-hours. That would be the most difficult part of the project.
Ah, I'm saying if we were to layer the frequency bit on top of Hubbe's age+recentness strategy, it could be strange. Yes, we can test them in whatever combination we like.
yea, I agree it is not the easiest thing to reason about
One crazy thought I had was to try to have a cache API that lets one influence eviction scoring based on in-memory hints. Only SimpleCache having those, and general difficulty making this a generically meaningful API is a big counter-argument, as is a lack of a versioning story for that... but also including  HINT_UNUSABLE_PER_CACHING_HEADERS using the same mechanism as some sort of re-used bit would be nice.

If the consumer knew that something shouldn't be stored then couldn't it request the resource with a DO_NOT_CACHE (or whatever it is) flag?
Well, yes, but rewinding videos (Or replaying short videos) probably isn't too uncommon.  So ideally, we'd cache video, but evict it more aggressively (Or only cache so much of it), particularly given how big videos can be (Which means fetching a video can be expensive, but so is caching it).
Ah, yes. So now I'm imagining a consumer that tracks media resources on a page and, once the page has been navigated away from, can tell the backend to prioritize deleting those resources.
