New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 594122 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: May 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 584278
issue 605870



Sign in to add a comment

[tracking bug] Find a good storage mechanism for snippets

Project Member Reported by treib@chromium.org, Mar 11 2016

Issue description

Right now, snippets are persisted in prefs, and thumbnails will likely be stored as raw files at first. We should find a better, common storage format - maybe LevelDB or SQLite.
 
Owner: markusheintz@chromium.org

Comment 2 by treib@chromium.org, Mar 18 2016

Status: Assigned (was: Available)
If you want to look into this, great! Note that it's not an MVP requirement though. At least for the first iteration, it should be good enough to just store the images as files.

Comment 3 by fi...@chromium.org, Mar 31 2016

Blocking: 584266

Comment 4 by treib@chromium.org, Apr 11 2016

Cc: markusheintz@chromium.org
 Issue 602259  has been merged into this issue.

Comment 5 by treib@chromium.org, Apr 22 2016

Blocking: 584429

Comment 6 by treib@chromium.org, Apr 22 2016

Labels: -zine-mr-mile-v2 zine-mr-mile-MVP
Owner: ----
Status: Available (was: Assigned)
Moving to MVP, since we more or less need this to properly support caching of thumbnails.
Also setting to available, in case someone finds time to look at this while Markus is out.

Comment 7 by bauerb@chromium.org, Apr 22 2016

Blockedon: 605870

Comment 8 by bauerb@chromium.org, Apr 22 2016

Blocking: -584429

Comment 9 by treib@chromium.org, Apr 22 2016

Blocking: -584266 584278
Blocking: 605870
Blockedon: -605870
Labels: -Pri-2 Pri-1
I was looking a little at this bug since it seems like the only zine-mr-mile-MVP bug I could realistically get spun up on quickly enough for the next branch point. Seems there is a ThumbnailDatabase class already that uses SQLite. It's actually a favicon database, but probably quite similar to what we want.

I could probably build something similar, though I don't have enough context yet to know the proper interface.
We'll want to store both snippets and their associated images in the same DB. Not sure if the thumbnail DB does that kind of thing, or it just stores images.

I think the first thing to do here is figure out kind of DB we want, e.g. SQLite or LevelDB or something else. I don't have any idea about their pros and cons.
Owner: sfiera@chromium.org
Status: Assigned (was: Available)
Assigning to sfiera for the investigation - results so far:
SQLite would definitely work for us (example code: the thumbnails/favicon DB), but might be more complex than what we need.
Yeah, SQLite can be a bit hairy in places. OTOH, we could access it directly from Java even when the native library isn't loaded, which might become useful.

LevelDB is simpler, but it's just a plain key-value storage, so we'd have to write our own serialization.
The thumbnail/favicon DB (SQLite-backed) stores image blobs. It definitely stores 16/32px icons, but I think it handles the larger ones too. There are some other fields, like dimensions and urls.

//components/suggestions/image_manager.h (LevelDB-backed) is a good match for what we want, I think. The favicon database is a real database; the suggestions image manager is more of a file-backed cache (though I don't see anything that does eviction). It has a generic name, but it appears to have been specifically written for thumbnails.

ProtoDatabase<> (LevelDB wrapper) is very simple to use if you can use string keys (URLs are sufficient for us?) and proto values (easy enough). Don't have to write any serialization code.

Major downside to LevelDB: it loads everything into memory (or at least, ProtoDatabase<>, used by the image manager, seems to work that way). On the other hand, if we have a non-trivial number of snippets to choose from, we will need to choose some subset to show, and we probably won't want to express that ranking in SQL, so we'll load everything anyway.


I would say that if we're prepared to load all snippets into memory, then LevelDB is definitely the right choice. If we're not, I'll have to dig some more.
Thanks for the investigation!
I actually looked at suggestions::ImageManager (in fact, I just moved suggestions::ImageFetcher into its own components, so that we can reuse it). It should indeed be a good model, but seems just different enough that we can't reuse it directly.

String keys: This basically means we can't search in LevelDB, right? I think that's fine. Bernhard, WDYT?
Using proto values is probably a good idea anyway, even if proto use isn't very common in Chrome. Certainly better than writing custom serialization.

Keeping everything in memory: I don't think we want to commit to always keeping all images in memory. But I can't imagine that this is a general LevelDB limitation..?
It's not a general LevelDB limitation, merely on the wrappers. We'd either need to use the LevelDB without the wrappers, and do our own serialization, or add lookup and range support to the wrappers ourselves. We could do it, but the choice is less clear.

If you want to do something like "load data for snippets, minus the images" then that sounds more SQL-like. You want some columns and not others.
Just to add: fundamentally, LevelDB can have data and images loaded separately with a prefix. For example, snippet:https://example.com/moon-actually-is-cheese has the data; thumbnail:https://example.com/moon-actually-is-cheese has the image. We still need support from ProtoDatabase<> to load the range ["snippet:", "snippet;") and later lookup "thumbnail:https://example.com/001.html".

We can do it, but it's another thing that complicates LevelDB and makes it less obvious a choice.
Thanks!
I'd still say we go with LevelDB. For now, it's fine to keep everything in memory, and if we ever need to change that, it's not too hard to do.

bauerb and sfiera, WDYT?
I agree. I think LevelDB will work better for now.

One caveat: I read leveldb_proto, and adding lookup/range support should be easy, but its OWNERS might prefer to keep the API small. Then, changing to not keep everything in memory becomes less easy (though still possible).
I'm sure extending leveldb_proto can be worked out (if it even becomes necessary). If we're particularly worried, we could ask the owners if they'd be very opposed to that, but I can't imagine they would be.
Labels: zine-mr-iter-13
OK, SGTM.
Status: Fixed (was: Assigned)
Alright, then let's consider the investigation done! Implementation will be tracked on  bug 605870 .
I was talking with Bernhard earlier, and was wondering why we are communicating the snippets to the browser in JSON instead of as a protocol buffer? With protocol buffers, reading the snippets in would be easier and transmitting them would consume less user data.

Additionally, if we store the snippets using ProtoDatabase/LevelDB, it would make sense to just have one representation of the snippets.
Good question - in fact, the json we fetch is just a serialized proto. Maybe OnePlatform doesn't support (binary) protos?

Either way, I'm not sure we'll be able to use the same representation for now, since what ChromeReader sends isn't exactly the same that we'd want to store. When we have our own server, then I expect we'd use the same proto for storage and transmission.
Labels: zine-mr-MVP
Labels: -zine-mr-mvp
Labels: zine-mr-MVP

Sign in to add a comment