New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 756193 link

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Last visit > 30 days ago
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Chrome Cache API exception

Project Member Reported by sduraisamy@chromium.org, Aug 16 2017

Issue description

We got the following feedback from one of the enterprise partners:

When trying to write to the caches api, they are getting the following error:
"Uncaught (in promise) DOMException: Entry was not found."

This is not happening always. Is there some way to remove the simple cache backend and create a new empty one when we get this error?

Full log can be found at https://gist.github.com/csexton/a44912524e7afbc9dc1db7c753358b09

Some relevant logs below:

[8685:8865:0815/104506.864619:INFO:remote_commands_service.cc(161)] Remote commands fetched.
[8685:8865:0815/104506.899042:ERROR:simple_version_upgrade.cc(164)] File structure does not match the disk cache backend.
[8685:8865:0815/104506.899291:ERROR:simple_backend_impl.cc(630)] Simple Cache Backend: wrong file structure on disk: /home/chronos/u-1134631a3449ab75f309eabbd2a6c84545e065f4/Storage/ext/bcdedldcbikpjmjbllhobfaohpajojli/def/Service Worker/CacheStorage/9a2b2fcfe89d2acf3d92909929566b9a51c251b1/5aa019f5-27e0-418e-9a62-2bb1a8d207c2
[8685:8866:0815/104506.899685:ERROR:disk_cache.cc(132)] Unable to create cache
[8685:8685:0815/104506.951788:ERROR:multi_user_window_manager_stub.cc(60)] Not implemented reached in virtual void chrome::MultiUserWindowManagerStub::RemoveObserver(chrome::MultiUserWindowManager::Observer *)
[8685:8685:0815/104507.580941:INFO:upload_job_impl.cc(399)] URL fetch completed.
 
rkc@, can you help triage?

Comment 2 by xiy...@chromium.org, Aug 16 2017

Cc: cbentzel@chromium.org
Components: Internals>Network>Cache>Simple
+cbentzel

That error is from simple disk cache backend failing to recognize the existing file. The cache API is a Web API, not platform app API. 


Comment 3 by r...@chromium.org, Aug 16 2017

Cc: r...@chromium.org
Owner: sduraisamy@chromium.org
Status: Assigned (was: Untriaged)
Is this the cache web API?
What is the context of this issue - what are they actually doing?

Cc: gavinp@chromium.org morlovich@chromium.org
I think it's CacheStorage, going by the directory? Maybe it needs to have delete-and-retry logic the way AppCache and ServiceWorker do (unless it does already?), or to just pass force=true to CreateCacheBackend if it doesn't put anything other than the cache in the directory to have CreateCacheBackend take care of that.

The debug output itself suggests that the index file in that directory doesn't look like one of ours --- too small, wrong magic, or some FS trouble. Its size and the first 8 bytes may give some idea of what's wrong, but it's unlikely to be too insightful.

Also, is this on ChromeOS, by any chance?

Yes, it is on Chrome OS.
I believe this issue originated with us, sure looks like our logs there :)

This is on ChromeOS, seeing it one 2 of the 4 test machines. 

The test machines:

- Have different hardware
- Running different ChromeOS versions (M59 and M60)
- Fresh installs from USB recovery as of last week
- One in debug mode with remote devtools enabled

One interesting note: Caches API does work in a sandboxed `<webview>` tag, just not in the chrome app itself. 

Below is the method that is failing with the "Uncaught (in promise) DOMException: Entry was not found."

    loadOrFetchDetails(body, url) {
      let self = this;
      return new Promise(function(resolve, reject) {
        caches.open(CACHE_NAME).then(function(cache) {
          cache.match(url).then(function(matchedResponse) {
            if (matchedResponse) {
              matchedResponse.json().then(function(jsonData) {
                let details = jsonData.details;
                resolve(details, true);
              });
            }
            else {
              let request = new Request(url, {
                method: "POST",
                headers: new Headers({"Content-Type": "application/json"}),
                body: JSON.stringify(body)
              });
              fetch(request).then(function(fetchResponse) {
                cache.put(url, fetchResponse.clone());
                return fetchResponse;
              })
              .then(function(response) {
                response.json().then(function(jsonData) {
                  let details = jsonData.details;
                  resolve(details);
                });
              })
              .catch(function(fetchError) {
                reject("Error fetching details");
              });
            }
          });
        });
      });
    }
  

Happy to supply more details, I can easily reproduce here.

This might give some insight into https://bugs.chromium.org/p/chromium/issues/detail?id=620424 then --- which notes that backend error rate on creation is much higher on ChromeOS.

Thanks for providing the detail, though I am like 50 layers below all this JS stuff. A couple of more questions:

1) Any chance you could hexdump the first 8 bytes of the "index" file in the directory in question? What size is it?

2) Are you doing any sort of syncing of data that may be copying the cache directories around?

1) The index file

Looks like "index" and "index-dir/the-real-index" are both 0 bytes:

    localhost 5aa019f5-27e0-418e-9a62-2bb1a8d207c2 # ls -al ./**
    -rw-r--r--. 1 chronos chronos    0 Aug  9 13:33 ./d70841621db5d4b7_0
    -rw-r--r--. 1 chronos chronos    0 Aug  9 13:33 ./fa5e53aa798c61ca_0
    -rw-r--r--. 1 chronos chronos    0 Aug  9 13:33 ./index
    
    ./index-dir:
    -rw-r--r--. 1 chronos chronos    0 Aug  9 13:33 the-real-index

2) Data syncing

We are not doing anything with syncing or interacting with the filesystem directly at all, everything is using the ChromeOS JS APIs.

That's pretty unexpected. We can make the _0 files 0-length on purposes in some circumstances, but the other ones definitely shouldn't be, and it's unclear how one would even get to that state short a machine crash, since they're written out at pretty different times.

Best I can say for getting out of this state is that caches.delete(CACHE_NAME) might just work and might "unwedge" it, though that's a pretty horrid workaround, but if it does, it would be interesting if trouble re-occurs.


0 sized file feels like that we are FS corruption again. Could you attach /var/log/messages? It would be interesting to see whether there are signs of FS corruption there.
Attached, it was only 17 lines long when I cat'd it. I reproduced the error twice, and noticed each time it added the following message:

    2017-08-17T14:29:01.778706-04:00 DEBUG kernel: [ 9336.979035] SELinux: initialized (dev proc, type proc), uses genfs_contexts
var-log-messages
1.9 KB View Download
Re #13: Looks like the message file just get rolled. Can you attach /var/log/messages.1 ?
Sure thing, attached the previous log file.
var-log-messages.1
283 KB Download
There are FS corruption logs after cryptohome mount. Most likely the user profile stored in the cryptohome is somehow corrupted, which probably explain the 0 sized file. Usually the corruptions are caused by device shuts down uncleaning (power loss, kernel crash etc).


2017-08-17T10:58:54.468322-04:00 INFO cryptohomed[1231]: An asynchronous mount request with sequence id: 1 finished; doing PKCS11 init...
...
2017-08-17T10:58:54.483468-04:00 DEBUG kernel: [    7.682993] Valid eCryptfs headers not found in file header region or xattr region, inode 261890
...

Is there a suggested resolution for ChromeOS? Are we talking about a full ChromeOS recovery?

I can certainly do that, but since we could reproduce I wanted to keep it in this state for any testing.
I'll leave it to xiyuan@ to provide ChromeOS-specific expertise.

WRT to underlying thing: I do suspect we may have an actual robustness bug in CacheStorage layer wrt to how it deals with FS corruption (it's a cache, so we should be dropping it rather than just erroring out), but creating some size 0 files to reproduce your state on disk shouldn't be a problem. 


Re #17: For this particular bug, the corruption might be limited to just the cache file. We can try to following to remedy the situation:

1. Try caches.delete(CACHE_NAME) as in #11 when the exception happens to see if that could get the app out of the state. This would be ideal if the app can handle the situation itself;

2. Try uninstall then install the app. This nukes all the data associated with the app and could bring the app back to a working state;

3. Try remove and the user back. This nukes the whole user profile and re-create the user's cryptohome;

4. If none of the above works, then full recovery is the only way out;

Awesome, option #1 worked!

I was able to run `caches.delete(CACHE_NAME)` in the console and the app went back to caching content for me.

I may add a startup guard to my app that writes a simple value to the caches and reads it out to ensure the filesystem is not corrupt.

One final question about this (and the related issue where it was failing 0.03% the time), have you see this occur once the cache files have been established and are being used? Or is this only when creating the new files?

I wondering if it would be necessary to have a health monitor job periodically check the caches system or if a check during startup is all we need.
We don't have a root cause for that bug (just an observation from statistics) so I can't be sure in general; for the cause you hit a startup check is sufficient, though. 

Sign in to add a comment