[tracking bug] Find a good storage mechanism for snippets |
||||||||||||||||
Issue descriptionRight now, snippets are persisted in prefs, and thumbnails will likely be stored as raw files at first. We should find a better, common storage format - maybe LevelDB or SQLite.
,
Mar 18 2016
If you want to look into this, great! Note that it's not an MVP requirement though. At least for the first iteration, it should be good enough to just store the images as files.
,
Mar 31 2016
,
Apr 11 2016
,
Apr 22 2016
,
Apr 22 2016
Moving to MVP, since we more or less need this to properly support caching of thumbnails. Also setting to available, in case someone finds time to look at this while Markus is out.
,
Apr 22 2016
,
Apr 22 2016
,
Apr 22 2016
,
Apr 22 2016
,
May 2 2016
,
May 2 2016
I was looking a little at this bug since it seems like the only zine-mr-mile-MVP bug I could realistically get spun up on quickly enough for the next branch point. Seems there is a ThumbnailDatabase class already that uses SQLite. It's actually a favicon database, but probably quite similar to what we want. I could probably build something similar, though I don't have enough context yet to know the proper interface.
,
May 2 2016
We'll want to store both snippets and their associated images in the same DB. Not sure if the thumbnail DB does that kind of thing, or it just stores images. I think the first thing to do here is figure out kind of DB we want, e.g. SQLite or LevelDB or something else. I don't have any idea about their pros and cons.
,
May 3 2016
Assigning to sfiera for the investigation - results so far: SQLite would definitely work for us (example code: the thumbnails/favicon DB), but might be more complex than what we need.
,
May 3 2016
Yeah, SQLite can be a bit hairy in places. OTOH, we could access it directly from Java even when the native library isn't loaded, which might become useful. LevelDB is simpler, but it's just a plain key-value storage, so we'd have to write our own serialization.
,
May 3 2016
The thumbnail/favicon DB (SQLite-backed) stores image blobs. It definitely stores 16/32px icons, but I think it handles the larger ones too. There are some other fields, like dimensions and urls. //components/suggestions/image_manager.h (LevelDB-backed) is a good match for what we want, I think. The favicon database is a real database; the suggestions image manager is more of a file-backed cache (though I don't see anything that does eviction). It has a generic name, but it appears to have been specifically written for thumbnails. ProtoDatabase<> (LevelDB wrapper) is very simple to use if you can use string keys (URLs are sufficient for us?) and proto values (easy enough). Don't have to write any serialization code. Major downside to LevelDB: it loads everything into memory (or at least, ProtoDatabase<>, used by the image manager, seems to work that way). On the other hand, if we have a non-trivial number of snippets to choose from, we will need to choose some subset to show, and we probably won't want to express that ranking in SQL, so we'll load everything anyway. I would say that if we're prepared to load all snippets into memory, then LevelDB is definitely the right choice. If we're not, I'll have to dig some more.
,
May 3 2016
Thanks for the investigation! I actually looked at suggestions::ImageManager (in fact, I just moved suggestions::ImageFetcher into its own components, so that we can reuse it). It should indeed be a good model, but seems just different enough that we can't reuse it directly. String keys: This basically means we can't search in LevelDB, right? I think that's fine. Bernhard, WDYT? Using proto values is probably a good idea anyway, even if proto use isn't very common in Chrome. Certainly better than writing custom serialization. Keeping everything in memory: I don't think we want to commit to always keeping all images in memory. But I can't imagine that this is a general LevelDB limitation..?
,
May 3 2016
It's not a general LevelDB limitation, merely on the wrappers. We'd either need to use the LevelDB without the wrappers, and do our own serialization, or add lookup and range support to the wrappers ourselves. We could do it, but the choice is less clear. If you want to do something like "load data for snippets, minus the images" then that sounds more SQL-like. You want some columns and not others.
,
May 3 2016
Just to add: fundamentally, LevelDB can have data and images loaded separately with a prefix. For example, snippet:https://example.com/moon-actually-is-cheese has the data; thumbnail:https://example.com/moon-actually-is-cheese has the image. We still need support from ProtoDatabase<> to load the range ["snippet:", "snippet;") and later lookup "thumbnail:https://example.com/001.html". We can do it, but it's another thing that complicates LevelDB and makes it less obvious a choice.
,
May 3 2016
Thanks! I'd still say we go with LevelDB. For now, it's fine to keep everything in memory, and if we ever need to change that, it's not too hard to do. bauerb and sfiera, WDYT?
,
May 3 2016
I agree. I think LevelDB will work better for now. One caveat: I read leveldb_proto, and adding lookup/range support should be easy, but its OWNERS might prefer to keep the API small. Then, changing to not keep everything in memory becomes less easy (though still possible).
,
May 3 2016
I'm sure extending leveldb_proto can be worked out (if it even becomes necessary). If we're particularly worried, we could ask the owners if they'd be very opposed to that, but I can't imagine they would be.
,
May 3 2016
,
May 4 2016
OK, SGTM.
,
May 4 2016
Alright, then let's consider the investigation done! Implementation will be tracked on bug 605870 .
,
May 5 2016
I was talking with Bernhard earlier, and was wondering why we are communicating the snippets to the browser in JSON instead of as a protocol buffer? With protocol buffers, reading the snippets in would be easier and transmitting them would consume less user data. Additionally, if we store the snippets using ProtoDatabase/LevelDB, it would make sense to just have one representation of the snippets.
,
May 6 2016
Good question - in fact, the json we fetch is just a serialized proto. Maybe OnePlatform doesn't support (binary) protos? Either way, I'm not sure we'll be able to use the same representation for now, since what ChromeReader sends isn't exactly the same that we'd want to store. When we have our own server, then I expect we'd use the same proto for storage and transmission.
,
Jul 1 2016
,
Jul 1 2016
,
Jul 1 2016
|
||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||
Comment 1 by markusheintz@chromium.org
, Mar 18 2016