New issue
Advanced search Search tips

Issue 908487 link

Starred by 4 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Handle big data in chrome.storage.local by implementing new methods

Reported by woxxom@gmail.com, Nov 26

Issue description

This is a feature request for the extensions API.

================================================================

Proposing to extend chrome.storage.local (and maybe .sync too for uniformity) by adding the following methods:

* getAllKeys()
  produces an array of key names just like IndexedDB

* getAsJson(key or arrayOfKeys or objectWithKeys or null)
  produces a single JSON string for the specified keys or all keys for "null"
  advantages: it's superfast; it doesn't pollute V8 memory since it's created natively

================================================================

Currently reading a value from chrome.storage.local spikes memory consumption approximately 10 times the amount of data read in case all the string values are unique and cannot by deduplicated by V8. Depending on the data structure reading 10-100MB could spike to anywhere between 100MB and 3.5GB and beyond, which crashes the extension process.

 	v8::internal::Heap::FatalProcessOutOfMemory
 	v8::internal::Heap::AllocateRawWithRetryOrFail
 	v8::internal::Factory::NewFixedArrayWithFiller
 	v8::internal::Factory::NewFixedArrayWithMap
 	v8::internal::HashTable
 	v8::internal::StringTable::LookupKey
 	v8::internal::StringTable::LookupString
 	v8::internal::LookupIterator::PropertyOrElement
 	v8::internal::JSReceiver::CreateDataProperty
 	v8::Object::CreateDataProperty
 	content::V8ValueConverterImpl::ToV8Object
 	content::V8ValueConverterImpl::ToV8ValueImpl
 	content::V8ValueConverterImpl::ToV8Object
 	content::V8ValueConverterImpl::ToV8ValueImpl
 	content::V8ValueConverterImpl::ToV8Value
 	extensions::APIRequestHandler::ArgumentAdapter::GetArguments
 	extensions::APIRequestHandler::CompleteRequestImpl
 	extensions::APIRequestHandler::CompleteRequest
 	extensions::NativeExtensionBindingsSystem::HandleResponse
 	IPC::MessageT

An extension with "unlimitedStorage" permission can shoot itself in the leg by incrementally storing more than V8 can handle in one call when reading the entire storage via chrome.storage.local.get(null, ...) which will crash the extension process as demonstrated by the attached extension (it can take several minutes so be patient).

Although it's possible to implement workarounds in JavaScript, but that would be slow as it requires adding a virtual layer to the storage by using a dedicated "index" key to store the actual key names, and write the actual data to uniformly named chunks.
 
crash-ext.zip
1.3 KB Download
Cc: jsb...@chromium.org pwnall@chromium.org
Status: Available (was: Unconfirmed)
This is an interesting idea; thanks for bringing it up!

<brainstorming>

Re sync: we put a hard limit on the number of bytes available to extensions to ~100k [1].  With this, even with the relatively poor performance of the storage API, the impact should be pretty minimal.  Now, storage.local is another story entirely, where we can end up storing a lot of data.

The suggestions here are reasonable workarounds; however, I wonder if the better solution is to just encourage extensions to using the open web alternatives.  Storage has come a long way since chrome.storage, and the necessity for it is much lower.  Additionally, storage on the web is improving at a much faster pace.  Would everyone's lives be better if extensions just use IndexedDB (or localStorage) for any local storage?  It seems like this would be a good path forward in most cases.

There are a few things we'd need to make this work:
- Extensions would need to have protected storage (not evicted by the browser)
- We'd need to hook up the unlimitedStorage permission (if it's not already)
- We'd need to make sure we add a keepalive count for the event page during async operations (so that the extension isn't terminated while e.g. waiting to retrieve data).

These seem relatively surmountable.  There are a few more interesting pieces, though:
- The storage API supports an onChanged event, which the extension event page can register for.  This would be harder to wire up with indexed DB.
- Content scripts can read and write to storage, even though they are running on a different origin.

I think my biggest concern for breakage would be around the content script issue.  Intuitively, I don't think we'd want to allow content scripts to access the extension's indexedDB/localStorage (though we potentially *could*), which means that information would have to instead be passed over extension message.  However, our extension messaging API has similar issues to the storage API, where all data is duplicated, JSON-serialized, deserialized, etc.  This means that extensions that need large data in content scripts are still without a good solution (though likely not worse off).

I'm curious for some more thoughts and opinions on this.  Summoning storage experts jsbell@ and pwnall@.  woxxom@, I'd also be curious for your take, and if there's a particular reason you're using storage.local instead of alternatives (since you rightly point out that it is significantly slower).

[1] https://developer.chrome.com/extensions/storage#property-sync
+1 - let's do it!

>  Extensions would need to have protected storage (not evicted by the browser)

Eviction "should" only occur for web schemes. But we should verify.

Re: proxying over extension messaging - if this were to switch from JSON to structured cloning then in theory we'd have full fidelity with what can be stored in Indexed DB (ignoring transferrable-but-not-serializable). Otherwise, a userspace proxying library seems feasible...?

> proxying over extension messaging - if this were to switch from JSON to structured cloning then in theory we'd have full fidelity with what can be stored in Indexed DB (ignoring transferrable-but-not-serializable).

I don't have a good idea of how difficult this would be.  With extension messaging, we can pass messages between otherwise-unrelated renderers (e.g., the extension's background page and a content script injected into a web page).  Does anything outside of blink currently use structured cloning?  If so, I think it'd be a great improvement (for multiple reasons).

> Otherwise, a userspace proxying library seems feasible...?

Yeah, and it would be relatively straightforward.  The downside is that a) it's much slower (because of the extra process hops and serialization), and content scripts sometimes need to act fast, and b) it means content scripts that want to use large storage objects are out of luck (because we have similar limitations/inefficiencies in the messaging code).  Of course, I don't have a good impression of how frequently content scripts need to access excessively-large amounts of data from storage - that might be something we can grab UMA on.

Sign in to add a comment