Use UTF-8 in DOMStorage |
|||||||||||
Issue descriptionRight now DOMStorage uses NullableString16(), which is just string16 with additional features. What was behind the decision to use string16 in the first place? Can we use string (UTF-8) instead? Or maybe we can replace NullableString16 with something that would decide whether to use UTF-8 or UTF-16 depending on the string itself (but still provide UTF-16 get/set methods)?
,
Aug 24 2016
Thanks for the explanation, jsbell@! A little more context - I'm primarily concerned about DOMStorage memory usage in the browser process. For example browsing yahoo.com can easily bring DOMStorage to ~5 MiBs. So I wonder if we can trick from https://codereview.chromium.org/1768063002 on the browser side? Or maybe we can make DOMStorageArea and friends to save/load opaque byte blobs instead of string16, and prepare (compress) them on the renderer side?
,
Aug 25 2016
Yeah, supporting "WTF-16" / "WTF-8" is plausible (and I'm supportive). I don't know why horo@ didn't pursue that approach further.
,
Aug 29 2016
,
Sep 8 2016
Re: "wtf-8"/"wtf-16" - see https://simonsapin.github.io/wtf-8/ for context. A few notes: * The backing store accepts string16s. It writes those as binary blobs to the database (I'm not sure about endianness?). It would save disk space to serialize those as wtf-8 but that would require migration. * Given the concrete motivation here (browser reduce memory usage), a scoped change would be to have DOMStorageValuesMap hold std::strings but keep APIs generally the same, with base::string16 (wtf-16) <--> std::string (wtf-8) conversions at API boundaries (i.e. DOMStorageArea) and inside the database impls. Not something I'm going to get to, so moving to "Available"
,
Sep 8 2016
BTW, as wtf-8 consumes more memory on some strings than wtf-16, maybe it makes sense to first add UMA of some sort to measure "compressability" of DOM storage strings? I.e. measure average ratio of string.length() / WTF8Length(string).
,
Sep 8 2016
Yeah, gather data first. :) Also, we could use a fast, general purpose compressor like third_party/snappy instead. My hunch is that it would be an even bigger savings than wtf-8 in most cases, if what's getting stored looks at all like general web content (e.g. cached HTML or JS data)
,
Feb 23 2017
mek@ - FYI; this may fall out of the new impl
,
Feb 23 2017
Certainly using UTF-8 won't automagically happen. But if we do want to change the storage format doing so at the same time as switching to the new backend probably makes sense, as we would avoid having to do multiple migrations. On disk of course (at least in the new backend) things will already be compressed by snappy (at least I believe leveldb does that by default?), so the question would really be about memory usage in the browser process (also a little bit about memory usage in the renderer process, but there it's easier to solve since onion-soupyfying would result in just using WTFStrings in the renderer). Actually, at the mojo level the new implementation just stores opaque byte arrays for keys and values. So what would need to change is how the renderer converts strings to byte arrays, and it shouldn't be too difficult to change that to either convert to UTF-8, or maybe just store keys and values as a one-byte format flag (8-bit vs 16-bit, possibly other formats later) and the raw data as it is stored in WTFString, to avoid as many conversions as possible.
,
Feb 23 2017
See #c2 for context - the motivating factor is indeed saving browser memory prior to writing.
Thanks for clarifying that the new impl. uses opaque byte arrays. So "lenient" conversion ("WTF-8") in the renderer would work, or a format flag as you suggest.
,
Feb 23 2017
,
Feb 24 2017
,
Mar 22 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f8d3cf5cfc015e0cbffc2a1fa65b6b96f85fe353 commit f8d3cf5cfc015e0cbffc2a1fa65b6b96f85fe353 Author: mek <mek@chromium.org> Date: Wed Mar 22 01:44:42 2017 Future-proof data format used for LocalStorage. To enable future changes to the way data is stored in localstorage without requiring a full migration of all existing data (for example adding compression, or simply storing 8-bit only strings more efficiently), add a tag to each encoded string to indicate the data format used. BUG= 640707 Review-Url: https://codereview.chromium.org/2748023008 Cr-Commit-Position: refs/heads/master@{#458614} [modify] https://crrev.com/f8d3cf5cfc015e0cbffc2a1fa65b6b96f85fe353/content/renderer/dom_storage/local_storage_cached_area.cc
,
Apr 3 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/86396aa4a030f140170a52d24972f90cfd75bce9 commit 86396aa4a030f140170a52d24972f90cfd75bce9 Author: mek <mek@chromium.org> Date: Mon Apr 03 00:09:32 2017 Fix localstorage data migration. https://codereview.chromium.org/2748023008 changed the data format for the new localstorage backend, but broke migration of data from the old format in the process. This change adds a end-to-end test for migration and fixes the migration again. BUG= 640707 , 586194 Review-Url: https://codereview.chromium.org/2767103002 Cr-Commit-Position: refs/heads/master@{#461351} [modify] https://crrev.com/86396aa4a030f140170a52d24972f90cfd75bce9/content/browser/dom_storage/dom_storage_browsertest.cc [modify] https://crrev.com/86396aa4a030f140170a52d24972f90cfd75bce9/content/browser/dom_storage/local_storage_context_mojo.cc [modify] https://crrev.com/86396aa4a030f140170a52d24972f90cfd75bce9/content/browser/dom_storage/local_storage_context_mojo.h [modify] https://crrev.com/86396aa4a030f140170a52d24972f90cfd75bce9/content/browser/dom_storage/local_storage_context_mojo_unittest.cc
,
May 12 2017
Is there any update on timeline of this feature?
,
Jun 2 2017
Wanted to check in on this as well. We'd really like to have this in M-61.
,
Jun 2 2017
I hope to be able to land the change that makes the new localstorage implementation the default soon, which would make that M61. Currently that still always stores things as UTF-16, but changing it to be more memory efficient at least in the browser process should be a fairly minor change afterwards (basically change Uint8VectorToString16 and String16ToUint8Vector in //src/content/renderer/dom_storage/local_storage_cached_area.cc to encode data differently). Making things more memory efficient renderer side would be trickier for now, but easier after the localstorage implementation is onion-soupifed, although I have no immediate plans to do that. Also none of this changes anything yet for the session storage implementation.
,
Jun 8 2017
Ping on this. I guess the new impl is enabled. Let's do the browser side changes first, thanks!
,
Jun 26 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/65ae6f8d391b1eae9b953ba1c4e7c859684e54bf commit 65ae6f8d391b1eae9b953ba1c4e7c859684e54bf Author: Marijn Kruisselbrink <mek@chromium.org> Date: Mon Jun 26 19:41:42 2017 Reduce memory usage in browser when a string is 8bit only. Bug: 640707 Change-Id: I05b11f638465f17191b709578dee901fb904495a Reviewed-on: https://chromium-review.googlesource.com/528023 Commit-Queue: Marijn Kruisselbrink <mek@chromium.org> Reviewed-by: Dmitry Skiba <dskiba@chromium.org> Cr-Commit-Position: refs/heads/master@{#482360} [modify] https://crrev.com/65ae6f8d391b1eae9b953ba1c4e7c859684e54bf/content/renderer/dom_storage/local_storage_cached_area.cc [modify] https://crrev.com/65ae6f8d391b1eae9b953ba1c4e7c859684e54bf/content/renderer/dom_storage/local_storage_cached_area_unittest.cc
,
Sep 27 2017
What's left to do? When will the work get done? What's the plan, here?
,
Sep 27 2017
What's left to do is session storage. Mojo-ifying session storage would likely give us this as well, but I don't believe anybody is currently planning to work on it.
,
Oct 2 2017
Given the changes in session storage structure to drop values, this is less likely to result in big savings. Moving to P3.
,
Sep 20
,
Jan 15
I don't think there is anything left to do here, so marking this as fixed. In the renderer everything is cached as WTF::String, so using 8-bit chars when that's all the strings use. The browser process doesn't store values unless needed (and it'll rarely be needed for session storage).
,
Yesterday
(34 hours ago)
Really? Wow. So keys & values are usually utf8?
,
Yesterday
(34 hours ago)
keys and values are usually (guesstimate: by several orders of magnitude) composed of code points U+0000 through U+00FF. Blink special-cases such strings to store them as 8 bits per code point in memory (technically, encoded as Latin-1). Strings with other code points are stored with 16 bits per code unit (UTF-16LE*). Blink doesn't UTF-8 encode string in memory. (* = not really; JavaScript strings are sequences of 16 bit code units, which aren't necessarily valid UTF-16LE) The point of this issue was that when holding strings to the browser, strings mostly composed of code points in U+0000 through U+007F will consume less memory if encoded as UTF-8 than if encoded to UTF-16LE. mek@'s last point is that on the Blink side this is moot, per above - there's already a space optimization, although it's *not* related to UTF-8 encoding. (Not sure that made things any clearer...) |
|||||||||||
►
Sign in to add a comment |
|||||||||||
Comment 1 by jsb...@chromium.org
, Aug 24 2016