High number of cache errors
Reported by ben.mau...@gmail.com, Nov 16 2016
Issue description
UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36
Example URL: n/a
Steps to reproduce the problem:
Somehow I got myself into a state where I frequently get "Failed to load resource: net::ERR_CACHE_READ_FAILURE". It seems like once I encounter the failure for a resource, Chrome deletes that cache entry, but the resource in question is never retried and the page simply fails to load.
Not sure how I got into this state -- I did have a hard reboot recently.
Here's an example about:net-internals entry from a failed resource:
t=262775 [st= 0] +REQUEST_ALIVE [dt=33]
t=262775 [st= 0] +URL_REQUEST_DELEGATE [dt=3]
t=262776 [st= 1] DELEGATE_INFO [dt=2]
--> delegate_info = "extension AdBlock"
t=262778 [st= 3] -URL_REQUEST_DELEGATE
t=262778 [st= 3] +URL_REQUEST_START_JOB [dt=20]
--> load_flags = 33024 (MAYBE_USER_GESTURE | VERIFY_EV_CERT)
--> method = "GET"
--> priority = "LOWEST"
--> url = "https://www.prod.facebook.com/rsrc.php/v3iBdl4/yM/l/en_US/7JrhvpvB8h0.js"
t=262778 [st= 3] URL_REQUEST_DELEGATE [dt=0]
t=262778 [st= 3] HTTP_CACHE_GET_BACKEND [dt=0]
t=262778 [st= 3] HTTP_CACHE_OPEN_ENTRY [dt=5]
t=262783 [st= 8] HTTP_CACHE_ADD_TO_ENTRY [dt=0]
t=262783 [st= 8] HTTP_CACHE_READ_INFO [dt=5]
t=262789 [st=14] HTTP_CACHE_READ_INFO [dt=9]
t=262798 [st=23] URL_REQUEST_DELEGATE [dt=0]
t=262798 [st=23] URL_REQUEST_FILTERS_SET
--> filters = "FILTER_TYPE_GZIP"
t=262798 [st=23] -URL_REQUEST_START_JOB
t=262798 [st=23] URL_REQUEST_DELEGATE [dt=0]
t=262798 [st=23] HTTP_CACHE_READ_DATA [dt=6]
--> net_error = -6 (ERR_FILE_NOT_FOUND)
t=262804 [st=29] FAILED
--> net_error = -401 (ERR_CACHE_READ_FAILURE)
t=262808 [st=33] -REQUEST_ALIVE
--> net_error = -401 (ERR_CACHE_READ_FAILURE)
I did a bit of code searching and found this UMA histogram:
Histogram: HttpCache.ReadErrorNonRestartable recorded 149 samples (flags = 0x1)
2 ---O (6 = 4.0%)
6 ------------------------------------------------------------------------O (143 = 96.0%)
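For reference, here's roughly how I'd imagine a histogram like this gets recorded (a hypothetical sketch using the standard UMA macros, not the actual call site I found -- the real recording code may well look different):

#include "base/metrics/histogram_macros.h"

// Hypothetical sketch, not the real Chromium call site: record the
// (negated, hence positive) net error whenever a cache read fails in a
// non-restartable way. Bucket 6 would then line up with the -6
// (ERR_FILE_NOT_FOUND) seen in the netlog above.
void RecordNonRestartableReadError(int net_error) {
  UMA_HISTOGRAM_SPARSE_SLOWLY("HttpCache.ReadErrorNonRestartable",
                              -net_error);
}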
What is the expected behavior?
I'd expect that, for JS/CSS files at least, if the cache read failed we'd retry from the network.
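Something along these lines is what I have in mind (a hypothetical sketch with placeholder helpers, not how the HttpCache transaction code is actually structured):

#include <string>

// Placeholder helpers standing in for the cache and network layers; these
// are not real Chromium APIs. A return value of 0 means success (net::OK),
// a negative value is a net error.
int ReadFromCache(const std::string& url, std::string* body);
int FetchFromNetwork(const std::string& url, std::string* body);
void DoomCacheEntry(const std::string& url);

// Expected behavior: if the cached copy can't be read, drop the corrupt
// entry and retry the resource over the network instead of failing the
// page load with ERR_CACHE_READ_FAILURE.
int LoadResource(const std::string& url, std::string* body) {
  int rv = ReadFromCache(url, body);
  if (rv == 0)
    return rv;
  DoomCacheEntry(url);
  return FetchFromNetwork(url, body);
}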
What went wrong?
See above.
Did this work before? N/A
Chrome version: 54.0.2840.98 Channel: stable
OS Version: OS X 10.11.6
Flash Version: Shockwave Flash 23.0 r0
Nov 16 2016
Curious if you guys have any UMA data on how frequent these HTTP cache read errors are.
Nov 16 2016
The metric (on the stable channel) doesn't seem to have any recent regression. Let's flip some labels to get this on the cache folks' radar. Seems like a serious bug; bumping pri and ccing gavinp.
Nov 16 2016
Interestingly, I cleared my cache and I'm *still* seeing examples of this happening.
Nov 16 2016
Also looping in chromium/src/net/http/OWNERS for prioritization.
Nov 16 2016
Uh, why am I assigned? Any reason I should expect to have relevant experience?
Nov 16 2016
Oh, palmer@ and I are on this because we're listed as OWNERS in net/http. Note that we're owners for an extremely narrow scope (`per-file transport_security_state_static.*`, which covers three files) in that folder. Let's move this into the normal net triage process [1] by setting to Untriaged and adding the explicit `Internals>Network` label to match the triage filter.
[1] https://chromium.googlesource.com/chromium/src/+/master/net/docs/bug-triage.md
Nov 17 2016
This seems like the blockfile cache (otherwise the netlog would be slightly different). All I know about the blockfile cache is that it mmaps a few files and stores multiple entries in them (or one file per entry if the entry is huge, AFAIR). So getting an ERR_FILE_NOT_FOUND from reading an already-open file looks unusual. I have no knowledge of macOS internals/bugs, but one question pops up: is it theoretically possible that something cleans up and unmaps files out from under the process? What happens when a process hits the limit of open file descriptors?
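For what it's worth, here is a plain POSIX way to check what descriptor limit a process is running under (nothing Chromium-specific, just a quick diagnostic sketch):

#include <sys/resource.h>
#include <cstdio>

// Print the soft/hard limits on open file descriptors for the current
// process. On macOS the default soft limit is fairly low, so a process
// holding many cache files open could plausibly run into it.
int main() {
  struct rlimit lim;
  if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
    std::perror("getrlimit");
    return 1;
  }
  std::printf("soft limit: %llu, hard limit: %llu\n",
              static_cast<unsigned long long>(lim.rlim_cur),
              static_cast<unsigned long long>(lim.rlim_max));
  return 0;
}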
Nov 18 2016
This is a cache issue, so as the net triager I am re-triaging it back to cache.
Nov 22 2016
It's out of scope from the TE side, hence adding TE-NeedsTriageHelp so it can be addressed further.
May 2 2017
This happens to me (and to several other users I know of at my employer) with great regularity, and when it appears it affects virtually every site/page I visit frequently. It manifests as the following:
1) Visit a page
2) Main page HTML loads and then all child assets (JS/CSS/images) that have cache entries fail to load, leaving the page in an unusable state
3) Reload page
4) Page renders correctly
In (2), the net-internals event log shows event timelines nearly identical to those in the original bug description, plus the following for the cache request:
t=4223 [st=0] +DISK_CACHE_ENTRY_IMPL [dt=2]
--> created = false
--> key = "https://drive.google.com/_/drive_fe/_/js/k=drive_fe.main.en_US.9kQxQmW5Gog.O/m=sy2g,sy30,sy2t,sy37,sy44,sy3p,sy2x,sy3z,sy3y,sy46,sy39,sy34,sy3b,sy3i,sy3k,sy48,sy4b,sy40,sy4a,sy3q,sy45,sy47,sy3w,sy2c,sy2e,sy2f,sy3u,sy3x,sy42,sy5c,sy4d,sy2d,sy32,sy3j,sy3r,sy4c,sy2,em3,sy3s,sy5d,AuBwMe,sy2a,sy29,sy2w,sy2b,sy2k,sy43,dl/am=2Yj2AK4c/rt=j/rs=AFB8gszGDwIlXYeGrxxoV8yGdYWReQ4WXQ"
t=4223 [st=0] +ENTRY_READ_DATA [dt=0]
--> buf_len = 4652
--> index = 0
--> offset = 0
t=4223 [st=0] -ENTRY_READ_DATA
--> bytes_copied = 4652
t=4225 [st=2] +ENTRY_READ_DATA [dt=0]
--> buf_len = 32768
--> index = 1
--> offset = 0
t=4225 [st=2] ENTRY_DOOM
t=4225 [st=2] -ENTRY_READ_DATA
--> net_error = -6 (ERR_FILE_NOT_FOUND)
t=4225 [st=2] -DISK_CACHE_ENTRY_IMPL
I haven't been able to find an environmental cause for this yet but *have* ruled out the Sophos Antivirus we were using (by switching to another AV vendor). I haven't journaled exactly when it happens but it's at least once a week.
Hoping the additional event log from the cache system itself may be of use and that this can get some attention soon :)
May 4 2017
skimbrel: 1. What version+channel are you using? 2. Wow, do you always use Chrome with netlog on? These parts of the netlog did not provide any clue to me. If you see the problem regularly, I'd be tangentially interested in how regularly you see such problems if you switch to the "simple cache backend" (about:flags -> simple cache for HTTP -> Enabled). This backend opens and closes many more files in a unit of time, but corruptions are less likely.
May 4 2017
@pasko: 1. 58.0.3029.81 on the stable channel. 2. No. When the problem happens it wipes out the cache for a lot of sites at once, so when I see it happen on one page I can turn on netlog by opening chrome://net-internals and then visit more pages to capture the events. I am still pretty convinced there's an environmental cause, because this only happens on my work laptop, where there's a bunch of managed-endpoint software running, and not on my personal machine at home. I'm not sure what to go looking for to isolate it, and I haven't found a reliable means of reproducing on demand, so it's kinda hard to identify potential causes.
May 4 2017
Oops, hit submit too soon. I've enabled the simple cache backend and will run with it for a few weeks to see if the problem recurs (usually I see the breakage at least once a week).
Jun 2 2017
I've been running with the simple cache for a month now and haven't seen this since. I'm not really noticing performance changes either. Is this new backend intended to go mainline eventually?
Jun 5 2017
It's mainline on Android and Linux. We hope to bring it to other platforms, but there are performance issues that we need to overcome first.