
Issue 666015 link

Starred by 7 users

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug




High number of cache errors

Reported by ben.mau...@gmail.com, Nov 16 2016

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36

Example URL:
n/a

Steps to reproduce the problem:
Somehow I got myself into a state where I frequently get "Failed to load resource: net::ERR_CACHE_READ_FAILURE". It seems that once Chrome encounters the failure for a resource, it deletes that cache entry. The resource in question is never retried, and the page simply fails to load.

Not sure how I got into this state -- I did have a hard reboot recently.

Here's an example about:net-internals entry from a failed resource:

t=262775 [st= 0] +REQUEST_ALIVE  [dt=33]
t=262775 [st= 0]   +URL_REQUEST_DELEGATE  [dt=3]
t=262776 [st= 1]      DELEGATE_INFO  [dt=2]
                      --> delegate_info = "extension AdBlock"
t=262778 [st= 3]   -URL_REQUEST_DELEGATE
t=262778 [st= 3]   +URL_REQUEST_START_JOB  [dt=20]
                    --> load_flags = 33024 (MAYBE_USER_GESTURE | VERIFY_EV_CERT)
                    --> method = "GET"
                    --> priority = "LOWEST"
                    --> url = "https://www.prod.facebook.com/rsrc.php/v3iBdl4/yM/l/en_US/7JrhvpvB8h0.js"
t=262778 [st= 3]      URL_REQUEST_DELEGATE  [dt=0]
t=262778 [st= 3]      HTTP_CACHE_GET_BACKEND  [dt=0]
t=262778 [st= 3]      HTTP_CACHE_OPEN_ENTRY  [dt=5]
t=262783 [st= 8]      HTTP_CACHE_ADD_TO_ENTRY  [dt=0]
t=262783 [st= 8]      HTTP_CACHE_READ_INFO  [dt=5]
t=262789 [st=14]      HTTP_CACHE_READ_INFO  [dt=9]
t=262798 [st=23]      URL_REQUEST_DELEGATE  [dt=0]
t=262798 [st=23]      URL_REQUEST_FILTERS_SET
                      --> filters = "FILTER_TYPE_GZIP"
t=262798 [st=23]   -URL_REQUEST_START_JOB
t=262798 [st=23]    URL_REQUEST_DELEGATE  [dt=0]
t=262798 [st=23]    HTTP_CACHE_READ_DATA  [dt=6]
                    --> net_error = -6 (ERR_FILE_NOT_FOUND)
t=262804 [st=29]    FAILED
                    --> net_error = -401 (ERR_CACHE_READ_FAILURE)
t=262808 [st=33] -REQUEST_ALIVE
                  --> net_error = -401 (ERR_CACHE_READ_FAILURE)
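
As an aside, the `load_flags = 33024 (MAYBE_USER_GESTURE | VERIFY_EV_CERT)` line in the trace above is a bitmask that the netlog decodes for us. A minimal sketch of that decoding in Python; the bit positions below are assumptions chosen only so that 33024 (0x8100) maps to the two flags the netlog itself reports, not the real constants from Chromium's load_flags.h:

```python
# Decode a net::LoadFlags-style bitmask into flag names.
# Bit positions are illustrative assumptions, not Chromium's real values.
LOAD_FLAGS = {
    1 << 8:  "MAYBE_USER_GESTURE",  # 0x0100 (assumed)
    1 << 15: "VERIFY_EV_CERT",      # 0x8000 (assumed)
}

def decode_load_flags(value):
    """Return the names of all flags set in `value`."""
    return [name for bit, name in LOAD_FLAGS.items() if value & bit]

print(decode_load_flags(33024))  # -> ['MAYBE_USER_GESTURE', 'VERIFY_EV_CERT']
```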

I did a bit of code searching and found this UMA

Histogram: HttpCache.ReadErrorNonRestartable recorded 149 samples (flags = 0x1)
2  ---O                                                                      (6 = 4.0%)
6  ------------------------------------------------------------------------O (143 = 96.0%)

What is the expected behavior?
I'd expect that, at least for JS/CSS files, if the cache read fails we'd retry from the network.

What went wrong?
see above

Did this work before? N/A 

Chrome version: 54.0.2840.98  Channel: stable
OS Version: OS X 10.11.6
Flash Version: Shockwave Flash 23.0 r0
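
The expected behavior described above (fall back to the network when a cache read fails, instead of failing the whole request) can be sketched as follows. Every name here is hypothetical, a model of the desired behavior rather than Chromium's actual HttpCache code:

```python
# Sketch of the requested fallback: a failed cache read dooms the
# corrupt entry and retries the resource from the network.
# All classes/names are hypothetical, not Chromium APIs.
ERR_CACHE_READ_FAILURE = -401  # value taken from the netlog above

class CacheReadError(Exception):
    pass

def fetch(url, cache, network):
    try:
        return cache.read(url)
    except CacheReadError:
        cache.doom(url)          # drop the corrupt entry...
        return network.get(url)  # ...and fall through to the network

class BrokenCache:
    """Stand-in for a cache whose entries fail on read."""
    def read(self, url):
        raise CacheReadError(ERR_CACHE_READ_FAILURE)
    def doom(self, url):
        pass

class Network:
    """Stand-in for a network fetch that succeeds."""
    def get(self, url):
        return b"fresh bytes"

print(fetch("https://example.com/app.js", BrokenCache(), Network()))
# -> b'fresh bytes'
```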
 

Comment 1 by bmau...@fb.com, Nov 16 2016

Labels: DevRel-Facebook

Comment 2 by bmau...@fb.com, Nov 16 2016

Curious if you guys have any UMA data for how frequent these http cache read errors are.
Cc: gavinp@chromium.org
Components: -Internals>Network Internals>Network>Cache
Labels: -Pri-2 Pri-1
The metric (on the stable channel) doesn't show any recent regression. Let's flip some labels to get this on the cache folks' radar.

Seems like a serious bug, bumping pri and ccing gavinp.

Comment 4 by bmau...@fb.com, Nov 16 2016

Interestingly, I cleared my cache and I'm *still* seeing examples of this happening.
Cc: palmer@chromium.org ligim...@chromium.org
Labels: ReleaseBlock-Stable M-54
Owner: lgar...@chromium.org
Also looping in chromium/src/net/http/OWNERS for prioritizing.
Uh, why am I assigned? Any reason I should expect to have relevant experience?
Cc: -palmer@chromium.org
Components: Internals>Network
Owner: ----
Status: Untriaged (was: Unconfirmed)
Oh, palmer@ and I are on this because we're listed as OWNERS in net/http. Note that we're owners for an extremely narrow scope (`per-file transport_security_state_static.*`, which covers three files) in that folder.

Let's move this into the normal net triage process [1] by setting to Untriaged and adding the explicit `Internals>Network` label to match the triage filter.


[1] https://chromium.googlesource.com/chromium/src/+/master/net/docs/bug-triage.md

Comment 8 by eroman@chromium.org, Nov 17 2016

Cc: pasko@chromium.org

Comment 9 by pasko@chromium.org, Nov 17 2016

This seems like the blockfile cache (otherwise the netlog would be slightly different). All I know about the blockfile cache is that it mmaps a few files and stores multiple entries in them (or one entry per file if the entry is huge, AFAIR). So getting an ERR_FILE_NOT_FOUND from reading an already-open file looks unusual. I have no knowledge of macOS internals/bugs, but one question pops up: is it theoretically possible that something cleans up and unmaps files from the process? What happens when a process hits the limit of open file descriptors?
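
On that last question: the per-process file-descriptor limit can be inspected directly, and exhausting it normally surfaces as EMFILE ("Too many open files") rather than ENOENT, so an fd-limit problem alone would not usually look like ERR_FILE_NOT_FOUND. A quick way to check the limits on macOS/Linux:

```python
# Inspect the open-file-descriptor limits of the current process.
# Hitting the soft limit makes open() fail with EMFILE, not ENOENT.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-fd limit: soft={soft} hard={hard}")
```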
Components: -Internals>Network
This is a cache issue so as net triager I am re-triaging back to cache.
Labels: TE-NeedsTriageHelp
It's out of scope on the TE side, hence adding TE-NeedsTriageHelp so it can be addressed further.
Cc: jkarlin@chromium.org
This happens to me (and several other users I know of at my employer) with great regularity, and when it appears it happens across virtually every site/page I visit frequently. It manifests as the following:

1) Visit a page
2) Main page HTML loads and then all child assets (JS/CSS/images) that have cache entries fail to load, leaving the page in an unusable state
3) Reload page
4) Page renders correctly

In (2), the net-internals event log shows near-identical event timelines as in the original bug description, plus the following for the cache request:

t=4223 [st=0] +DISK_CACHE_ENTRY_IMPL  [dt=2]
               --> created = false
               --> key = "https://drive.google.com/_/drive_fe/_/js/k=drive_fe.main.en_US.9kQxQmW5Gog.O/m=sy2g,sy30,sy2t,sy37,sy44,sy3p,sy2x,sy3z,sy3y,sy46,sy39,sy34,sy3b,sy3i,sy3k,sy48,sy4b,sy40,sy4a,sy3q,sy45,sy47,sy3w,sy2c,sy2e,sy2f,sy3u,sy3x,sy42,sy5c,sy4d,sy2d,sy32,sy3j,sy3r,sy4c,sy2,em3,sy3s,sy5d,AuBwMe,sy2a,sy29,sy2w,sy2b,sy2k,sy43,dl/am=2Yj2AK4c/rt=j/rs=AFB8gszGDwIlXYeGrxxoV8yGdYWReQ4WXQ"
t=4223 [st=0]   +ENTRY_READ_DATA  [dt=0]
                 --> buf_len = 4652
                 --> index = 0
                 --> offset = 0
t=4223 [st=0]   -ENTRY_READ_DATA
                 --> bytes_copied = 4652
t=4225 [st=2]   +ENTRY_READ_DATA  [dt=0]
                 --> buf_len = 32768
                 --> index = 1
                 --> offset = 0
t=4225 [st=2]      ENTRY_DOOM
t=4225 [st=2]   -ENTRY_READ_DATA
                 --> net_error = -6 (ERR_FILE_NOT_FOUND)
t=4225 [st=2] -DISK_CACHE_ENTRY_IMPL

I haven't been able to find an environmental cause for this yet but *have* ruled out the Sophos Antivirus we were using (by switching to another AV vendor). I haven't journaled exactly when it happens but it's at least once a week.

Hoping the additional event log from the cache system itself may be of use and that this can get some attention soon :)

Labels: Hotlist-ConOps
Cc: morlovich@chromium.org
skimbrel:
1. what version+channel are you using?
2. wow, do you always use chrome with netlog on?

These parts of the netlog did not provide any clue to me. If you see the problem regularly, I'd be tangentially interested in how regularly you see such problems if you switch to the "simple cache backend" (about:flags -> simple cache for HTTP -> Enabled). This backend opens and closes many more files in a unit of time, but corruptions are less likely.
@pasko:

1. 58.0.3029.81 on stable channel
2. No. When the problem happens it wipes out cache for a lot of sites at once — so when I see it happen on one page I can turn on netlog by opening chrome://net-internals and then go visit more pages to capture the events.

I am still pretty convinced there's an environmental cause because this only happens to me on my work laptop where there's a bunch of managed-endpoint software running, not on my personal machine at home. Not sure what to go looking for to isolate it and I haven't found a reliable means of reproducing on demand, so it's kinda hard to identify potential causes.
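
Since the breakage wipes out the cache for many sites at once, one low-effort way to spot occurrences after the fact is to save a netlog dump and scan it for cache read failures. A trivial sketch, assuming only that the dump contains text lines like the traces quoted above:

```python
# Scan a saved netlog text dump for cache read failures.
def cache_failure_lines(dump_text):
    """Return every line mentioning ERR_CACHE_READ_FAILURE."""
    return [line for line in dump_text.splitlines()
            if "ERR_CACHE_READ_FAILURE" in line]

sample = """t=262804 [st=29]    FAILED
                    --> net_error = -401 (ERR_CACHE_READ_FAILURE)"""
print(cache_failure_lines(sample))
```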
Oops, hit submit too soon.

I've enabled the simple cache backend -- will run for a few weeks and see if the problem recurs (usually I see the breakage at least once a week).
Have been running with simple cache for a month now and haven't seen this since. Not really noticing performance changes either. Is this new backend intended to go mainline eventually?
It's mainline on Android and Linux. We hope to bring it to other platforms but there are performance issues that we need to overcome first.
