Requests return ERR_CONTENT_DECODING_FAILED; clearing cache resolves the issue
Reported by
jmatth...@duosecurity.com,
Oct 29
Issue description
UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.20 Safari/537.36
Example URL: Most recent was: https://static.xx.fbcdn.net/rsrc.php/v3irX84/yI/l/en_US/TdX884thumC.js?_nc_srcc=3217
Steps to reproduce the problem:
1. Browse to GSuite or Facebook
2. Notice some functionality isn't working, generally XHR driven features
3. Clear cached images and files via Clear Browsing Data...
4. Reload
What is the expected behavior?
I would hope that I would not have to clear cache frequently. I have only experienced this issue on MacOS 10.14. Canary does not seem to exhibit this behavior, or I haven't used it long enough to reproduce it.
What went wrong?
I'm able to use the inspector to figure out what XHR requests are failing. I can clearly see ERR_CONTENT_DECODING_FAILED and then I can reproduce that by trying to visit the URL directly. Once I clear cache everything works again for an indeterminate period of time.
Did this work before? N/A
Chrome version: 71.0.3578.20 Channel: beta
OS Version: OS X 10.14.1
Flash Version:
FWIW I realize I'm handing you a bit of a heisenbug. I'm interested in anything that might help in mitigating this before we roll 10.14 internally.
,
Oct 29
I do see the 304 response with "content-encoding: br" and DISK_CACHE_ENTRY read:
t=134792 [st= 1] +HTTP_TRANSACTION_READ_HEADERS [dt=12]
t=134804 [st=13] HTTP2_STREAM_UPDATE_SEND_WINDOW
--> delta = 10420224
--> stream_id = 3
--> window_size = 10485760
t=134804 [st=13] HTTP_TRANSACTION_READ_RESPONSE_HEADERS
--> HTTP/1.1 304
status: 304
access-control-allow-credentials: true
cache-control: public,max-age=31536000,immutable
content-type: application/x-javascript; charset=utf-8
x-content-type-options: nosniff
x-xss-protection: 0
content-security-policy: default-src * data: blob:;script-src *.facebook.com *.fbcdn.net *.facebook.net *.google-analytics.com *.virtualearth.net *.google.com 127.0.0.1:* *.spotilocal.com:* 'unsafe-inline' 'unsafe-eval' *.atlassolutions.com blob: data: 'self';style-src data: blob: 'unsafe-inline' *;connect-src *.facebook.com facebook.com *.fbcdn.net *.facebook.net *.spotilocal.com:* wss://*.facebook.com:* https://fb.scanandcleanlocal.com:* *.atlassolutions.com attachment.fbsbx.com ws://localhost:* blob: *.cdninstagram.com 'self' chrome-extension://boadgeojelhgndaghljhdicfkmllpafd chrome-extension://dliochdbjfkdbacpmhlcpmleaejidimm;
timing-allow-origin: *
expires: Tue, 29 Oct 2019 16:55:34 GMT
last-modified: Mon, 01 Jan 2001 08:00:00 GMT
access-control-allow-origin: *
vary: Accept-Encoding
content-encoding: br
content-md5: UVs3Qe9b08CYFxJskOEQtg==
date: Mon, 29 Oct 2018 19:32:13 GMT
t=134804 [st=13] -HTTP_TRANSACTION_READ_HEADERS
t=134804 [st=13] HTTP_CACHE_WRITE_INFO [dt=1]
t=134805 [st=14] HTTP_CACHE_READ_INFO [dt=0]
t=134805 [st=14] NETWORK_DELEGATE_HEADERS_RECEIVED [dt=0]
t=134805 [st=14] URL_REQUEST_FILTERS_SET
--> filters = "BROTLI"
t=134805 [st=14] -URL_REQUEST_START_JOB
t=134805 [st=14] URL_REQUEST_DELEGATE_RESPONSE_STARTED [dt=0]
t=134805 [st=14] HTTP_CACHE_READ_DATA [dt=0]
t=134805 [st=14] URL_REQUEST_JOB_BYTES_READ
--> byte_count = 9592
t=134806 [st=15] FAILED
--> net_error = -330 (ERR_CONTENT_DECODING_FAILED)
t=134806 [st=15] -REQUEST_ALIVE
--> net_error = -330 (ERR_CONTENT_DECODING_FAILED)
83534: DISK_CACHE_ENTRY
https://static.xx.fbcdn.net/rsrc.php/v3irX84/yI/l/en_US/TdX884thumC.js?_nc_srcc=3217
Start Time: 2018-10-29 15:32:11.177
t=132874 [st= 0] +DISK_CACHE_ENTRY_IMPL [dt=60]
--> created = false
--> key = "https://static.xx.fbcdn.net/rsrc.php/v3irX84/yI/l/en_US/TdX884thumC.js?_nc_srcc=3217"
t=132874 [st= 0] +ENTRY_READ_DATA [dt=0]
--> buf_len = 5232
--> index = 0
--> offset = 0
t=132874 [st= 0] -ENTRY_READ_DATA
--> bytes_copied = 5232
t=132932 [st=58] +ENTRY_WRITE_DATA [dt=1]
--> buf_len = 5232
--> index = 0
--> offset = 0
--> truncate = true
t=132933 [st=59] -ENTRY_WRITE_DATA
--> bytes_copied = 5232
t=132933 [st=59] +ENTRY_READ_DATA [dt=0]
--> buf_len = 16
--> index = 2
--> offset = 0
t=132933 [st=59] -ENTRY_READ_DATA
--> bytes_copied = 16
t=132933 [st=59] +ENTRY_READ_DATA [dt=0]
--> buf_len = 32768
--> index = 1
--> offset = 0
t=132933 [st=59] -ENTRY_READ_DATA
--> bytes_copied = 9592
t=132934 [st=60] ENTRY_CLOSE
t=132934 [st=60] -DISK_CACHE_ENTRY_IMPL
,
Oct 29
> When you try to visit the URL directly, do you see an ERR_CONTENT_DECODING_FAILED error page, or just a truncated response?
The error page unfortunately :/
> I assume you aren't using an SSL-decrypting proxy?
Nope; this happens when connected to both my work and home Wi-Fi networks.
,
Oct 30
Since Facebook and Google use HTTPS, we can conclude it's not a network issue, unless there's an SSL-decrypting MitM (which seems unlikely; I suppose malware could be doing that without your knowledge after modifying your root cert store, but I think we can pretty much ignore that possibility). You, however, seem to be the only one who's running into this. I don't think an extension could cause this. Is this only happening on one laptop? It's possible it's failing RAM, HDD, or some other system component. If it's more than one system, that possibility seems pretty unlikely. It could be something mucking with the on-disk files, I suppose. If you're wondering why I'm not looking at Chrome possibilities: I just don't see a way for a Chrome bug to be doing this to you and no one else. There just aren't any configurable Chrome bits that I can think of that affect this. That's certainly not proof it isn't Chrome; I just can't think of any place to go poking if it is a Chrome issue.
,
Oct 30
Is Brotli support always on?
,
Oct 30
Yes, it is, unless someone set up a new field trial that I'm unaware of. I also checked the log, and we sent br in the accept-encoding line, and the server also sent Vary: Accept-Encoding. I also believe we implicitly add that to all responses anyway, so I'm pretty sure it's not that. I should have mentioned that earlier.
,
Oct 30
The initial request seems to be content-encoding: br as well, so it's not a bitflip in the non-updateable headers table. It may be worth trying to switch the cache backend to simple; if the bug still reproduces with that, we could at least pull the cache file for the specific URL....
,
Oct 30
I'm going to work from another 10.14 partition tomorrow to see if I can repro this. There may be others at Duo with this same issue. I'm waiting on reports.
,
Nov 1
I disabled Spotlight indexing on files and folders and this behavior seems to have ceased completely for me.
,
Nov 1
Thanks for the followup! I'm going to go ahead and close the issue, but if it reappears while indexing is still disabled, please comment here and I'll re-open (If it happens again more than a week or two from now, it's best to file a new bug).
,
Nov 2
So metrics do suggest an increase in error rate for this error on OS X 10.14 and 10.14.1 with blockfile cache, with it being much higher than older OS versions or simplecache (apparently) regardless of version...
,
Nov 8
I've gotten 2 new reports of this issue and have had it re-occur myself (though less frequently). What diagnostic information would be helpful here? It seems the net exports might not contain the best data.
,
Nov 8
Opening the bug again, per comment 13.
,
Nov 8
Just had this happen to me again, this time in google calendar with the Zoom meetings plugin (clicking the button did nothing because the JS assets failed to load with ERR_CONTENT_DECODING_FAILED). Clearing cache, reloading page fixed the issue for now. What kind of diagnostics should I provide if this happens again? macOS 10.14.1 Chrome Version 70.0.3538.77
,
Nov 9
I ran into 'ERR_CONTENT_DECODING_FAILED' this AM. FWIW, I had to reboot my mac this AM because it was non-responsive when I tried to wake it. I hit 'ERR_CONTENT_DECODING_FAILED' after Chrome restored my tabs. Included is a trace file
,
Nov 9
Re: comment #15, #16: what file system are you folks using? Sadly our logging may be useless/way too high-level here, since there is a pretty high chance of an OS bug. The log in comment #16 is interesting in that it's using gzip, not brotli, though.
,
Nov 9
I'm on APFS (Encrypted)
,
Nov 9
APFS (Encrypted) also. I think 10.14 forces everyone to APFS now (even if you pass the disable option in the updater).
,
Nov 9
Oh, looks like my Mac got converted as well. Was thinking it may have been too old for that. Well, I have a chance of reproducing it then....
,
Nov 12
I get `Failed to load resource: net::ERR_CONTENT_DECODING_FAILED`
,
Nov 13
Looks a lot like the other logs - we're reading from the cache when we get the error. morlovich is the expert there, but I believe we delete partial entries from the cache on error. If we don't, I suppose it's possible that the issue is we're getting bad data from lower layers of the stack, as opposed to getting corrupted data when reading from a cache entry for a previously successfully received file.
,
Nov 13
The resource byte length matches what I get when I transfer it myself, so something is probably going wrong with payload bytes themselves. I plan on trying to see if hacking the performance stress test to verify the read data can reproduce it. To folks suffering from the problem: going to chrome://flags and forcing on "Simple Cache for HTTP" will likely serve as a workaround. That backend is slower on Macs, especially those with spinning media, but it's likely preferable to what you're seeing.
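To illustrate the observation above (same byte length, yet decoding fails), here is a minimal Python sketch. It is not Chrome's code: it uses gzip from the standard library as a stand-in for Brotli, and the payload plus the `flip_byte` and `try_decode` helpers are made up for illustration. It only shows that a cached body can keep its exact length while one damaged byte is enough for the content decoder to reject it.

```python
import gzip
import zlib


def flip_byte(data: bytes, index: int) -> bytes:
    """Return a copy of data with every bit of one byte inverted (hypothetical helper)."""
    corrupted = bytearray(data)
    corrupted[index] ^= 0xFF
    return bytes(corrupted)


def try_decode(body: bytes) -> bool:
    """Return True if the gzip body decodes cleanly, False on any decoding error."""
    try:
        gzip.decompress(body)
        return True
    except (OSError, EOFError, zlib.error):
        # BadGzipFile (an OSError) covers header/CRC problems, zlib.error covers
        # a broken deflate stream, EOFError covers a stream that ends early.
        return False


# Stand-in for a cached JavaScript resource; the content is invented.
original = "".join(f"var v{i} = {i * i};\n" for i in range(500)).encode()
cached_body = gzip.compress(original)

# Damage one byte in the middle of the compressed stream, past the 10-byte header.
damaged_body = flip_byte(cached_body, len(cached_body) // 2)

print("same length:", len(cached_body) == len(damaged_body))
print("intact body decodes:", try_decode(cached_body))
print("damaged body decodes:", try_decode(damaged_body))
```

A length comparison alone would not catch this kind of damage, which is consistent with the log showing the expected byte count right before the decoding failure.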
,
Nov 16
FYI we're seeing a spike in crashes in Sheets for users on Mac 10.14, and one person who's gotten back to us so far has reported seeing ERR_CONTENT_DECODING_FAILED error in their console.
,
Nov 16
Just a comment that the simple caching model has resolved this for me.
,
Nov 22
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/67e42af2a372cfbc6e24ed867a62cab286bb22b7
commit 67e42af2a372cfbc6e24ed867a62cab286bb22b7
Author: Maks Orlovich <morlovich@chromium.org>
Date: Thu Nov 22 14:05:57 2018
Workaround apparent data corruption in blockfile on OS X 10.14 by switching backends.
This is slower, but it's better than not loading pages at all since important resources got corrupted:(
Bug: 899874
Change-Id: I19f7eccff0c8aa119e522aee9cf728934906b917
Reviewed-on: https://chromium-review.googlesource.com/c/1347109
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Reviewed-by: Elly Fong-Jones <ellyjones@chromium.org>
Reviewed-by: Bence Béky <bnc@chromium.org>
Cr-Commit-Position: refs/heads/master@{#610404}
[modify] https://crrev.com/67e42af2a372cfbc6e24ed867a62cab286bb22b7/components/network_session_configurator/browser/network_session_configurator.cc
[modify] https://crrev.com/67e42af2a372cfbc6e24ed867a62cab286bb22b7/components/network_session_configurator/browser/network_session_configurator_unittest.cc
,
Nov 26
I am asking for a backport incredibly late (Thanksgiving is really inconveniently timed for this release), so here is some data to help make the call on whether it should happen or not. Sadly the documentation is Google-only.
a) Why is this important?
ERR_CONTENT_DECODING_FAILED error rates for resource fetching, split by OS: https://uma.googleplex.com/p/chrome/timeline_v2/?sid=7be812d3c219350ff532eb51b2c64a03 This is on stable, and the important thing to note is that the total is all the subresource fetches, not just failed fetches.
https://buganizer.corp.google.com/issues/119681757 --- report on seeing increased problems from Chrome + OS X 10.14 on gSuite side --- gives some idea of how visible the problem is to the users of a particular product; and this report makes it clear that it can also affect Facebook (highly important for lots of users!), and a product of the bug reporter.
b) How risky is this, and what are the downsides?
This switches the backend on OS X >= 10.14 to the same cache backend we use on Android, ChromeOS, and Linux. It's also been used in a 50% experiment in OS X beta for a while. Stability should not be a problem, but there is a performance regression (which would be why we haven't switched to it). See https://uma.googleplex.com/p/chrome/variations/?sid=ecbb4370eb3601e759a7e7741f3fdf6b It looks a bit better restricted to 10.14, but that might just be early adopters having nicer machines. The spinning media regression is particularly bad, though (but probably still better than non-loading); luckily Macs use those less than Windows. Doing this also makes it hard to find out if Apple changed something that helped, but we could probably do a force-off experiment as well, though population sizes may be tricky.
c) Any other options?
The backend choice can be done via experiment config outside the scope of any release, but then it cannot be OS-version gated.
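The version gate described above can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual change in network_session_configurator.cc: the function name, the string labels, and the rule that an experiment override takes precedence over the OS gate are all assumptions made for the example. The platform defaults follow what this comment states (simple cache on Android, ChromeOS, and Linux; blockfile elsewhere).

```python
from typing import Optional, Tuple


def choose_cache_backend(os_name: str,
                         os_version: Tuple[int, int],
                         experiment_override: Optional[str] = None) -> str:
    """Pick "simple" or "blockfile" (illustrative labels, not Chromium identifiers)."""
    # Assumption: an experiment config, when present, wins over the OS gate,
    # so the workaround could still be forced off remotely.
    if experiment_override in ("simple", "blockfile"):
        return experiment_override
    # Platforms the comment says already use the simple backend.
    if os_name in ("android", "chromeos", "linux"):
        return "simple"
    # The workaround: OS X 10.14 and later avoid blockfile.
    if os_name == "mac" and os_version >= (10, 14):
        return "simple"
    return "blockfile"


print(choose_cache_backend("mac", (10, 13)))               # blockfile
print(choose_cache_backend("mac", (10, 14)))               # simple
print(choose_cache_backend("mac", (10, 14), "blockfile"))  # blockfile
```

Gating on the OS version in code is what an experiment config alone could not do, at the cost of needing a release to ship it.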
,
Nov 26
This bug requires manual review: We are only 7 days from stable. Please contact the milestone owner if you have questions. Owners: benmason@(Android), kariahda@(iOS), kbleicher@(ChromeOS), govind@(Desktop) For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Nov 26
Thank you morlovich@ for all the details at #28. How safe will the merge be with the known performance regression this late in the release cycle? Pls note we only have this week's beta release left before stable promotion next week.
,
Nov 26
Sorry, I don't understand the question; could you elaborate?
,
Nov 27
Approving merge to M71 branch 3578 based on comment #28 as this can be disabled via finch if anything goes wrong. Pls merge ASAP so we can pick it up for tomorrow's last beta release, cutting RC very soon. Thank you.
,
Nov 27
Seems like morlovich@ is OOO. +ellyjones@ (CL reviewer and Mac TL), could you pls do a merge to M71?
,
Nov 27
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/16fb85e34b59b442d370930353db9817e0446e75
commit 16fb85e34b59b442d370930353db9817e0446e75
Author: Maks Orlovich <morlovich@chromium.org>
Date: Tue Nov 27 18:47:21 2018
Merge workaround apparent data corruption in blockfile on OS X 10.14 by switching backends.
This is slower, but it's better than not loading pages at all since important resources got corrupted:(
Bug: 899874
Change-Id: I19f7eccff0c8aa119e522aee9cf728934906b917
Reviewed-on: https://chromium-review.googlesource.com/c/1347109
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Reviewed-by: Elly Fong-Jones <ellyjones@chromium.org>
Reviewed-by: Bence Béky <bnc@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#610404}
(cherry picked from commit 67e42af2a372cfbc6e24ed867a62cab286bb22b7)
Reviewed-on: https://chromium-review.googlesource.com/c/1352470
Reviewed-by: Matt Menke <mmenke@chromium.org>
Cr-Commit-Position: refs/branch-heads/3578@{#825}
Cr-Branched-From: 4226ddf99103e493d7afb23a4c7902ee496108b6-refs/heads/master@{#599034}
[modify] https://crrev.com/16fb85e34b59b442d370930353db9817e0446e75/components/network_session_configurator/browser/network_session_configurator.cc
[modify] https://crrev.com/16fb85e34b59b442d370930353db9817e0446e75/components/network_session_configurator/browser/network_session_configurator_unittest.cc
,
Nov 27
Already done. :)
,
Nov 27
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/16fb85e34b59b442d370930353db9817e0446e75 Commit: 16fb85e34b59b442d370930353db9817e0446e75 Author: morlovich@chromium.org Commiter: mmenke@chromium.org Date: 2018-11-27 18:47:21 +0000 UTC Merge workaround apparent data corruption in blockfile on OS X 10.14 by switching backends. This is slower, but it's better than not loading pages at all since important resources got corrupted:( Bug: 899874 Change-Id: I19f7eccff0c8aa119e522aee9cf728934906b917 Reviewed-on: https://chromium-review.googlesource.com/c/1347109 Commit-Queue: Maks Orlovich <morlovich@chromium.org> Reviewed-by: Elly Fong-Jones <ellyjones@chromium.org> Reviewed-by: Bence Béky <bnc@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#610404}(cherry picked from commit 67e42af2a372cfbc6e24ed867a62cab286bb22b7) Reviewed-on: https://chromium-review.googlesource.com/c/1352470 Reviewed-by: Matt Menke <mmenke@chromium.org> Cr-Commit-Position: refs/branch-heads/3578@{#825} Cr-Branched-From: 4226ddf99103e493d7afb23a4c7902ee496108b6-refs/heads/master@{#599034}
,
Nov 27
Thank you!
,
Nov 27
Thanks Matt!
,
Nov 28
Fixed and merged?
,
Nov 28
OSX 10.14 is still corrupting the blockfile cache, so I don't think that's really fixed?
,
Dec 3
This is happening consistently for me on my personal mac running stable Chrome. Let me know if there's anything I can provide to help with a fix.
,
Dec 3
If you have a source checkout there for some reason, querying the messed up URL with cachetool will provide some information, though likely not enough to be actionable (so it's not worth getting a checkout if one isn't already handy).
,
Dec 3
The machine doesn't have a checkout, and I've never tried building on macOS, so I don't know that I can help much. Is the issue well understood at this point? If clearing my cache will unbreak my machine, I'll go ahead and do that. I'll hold off for the moment in case there's some useful forensics that could be performed.
,
Dec 4
No, it's not well understood. We know that the cache backend that has been used on OS X basically forever and has hardly changed in the last couple of years has suddenly started showing data corruption with OS X 10.14, but I have not been able to reproduce it, and so there is no root cause. It should, however, be worked around in M71 starting from 71.0.3578.75 (by using a different backend).
,
Dec 6
Hi everyone. Just weighing in that myself and a couple of colleagues are seeing these symptoms too. I'm running Mac OS 10.14 (18A391), Chrome Version 70.0.3538.110 (Official Build) (64-bit). I have intermittently received the ERR_CONTENT_DECODING_FAILED in the console for the following URLs; the mention-me.com URLs are served from an AWS S3 bucket.
- https://jmsyst.com/bootstrap/css/bootstrap.css
- https://jmsyst.com/bootstrap/css/bootstrap-responsive.css
- https://static-demo.mention-me.com/dist/ShareEmailViaProvider-c6d548a7e9e18f9f786bf5059fc75ff7ea41d18b.e63b1e597e18fc53b63b.js
- https://static-demo.mention-me.com/css/2dfe92a-d8cc6aa.css
I appreciate it may not help narrow down the cause of the data corruption, but it shows that it's not just Facebook/Google hosted files that are being affected. (I don't know much about brotli; if you have any good resources for research I'd be up for a bedtime read.) Many thanks for producing a temporary fix. Antony
,
Dec 13
I hit this today as well.
,
Dec 13
Is a radar filed? If so, I can help escalate this with Apple. If not, marking it externalDependency will only cause this to wither with no further action.
,
Dec 13
No radar report; we don't have any testcase smaller than Chrome (and it's not reproducible on demand, just happens enough to clearly affect users a lot).
,
Dec 14
Issue 915104 has been merged into this issue.
,
Dec 14
Are there unit tests that stress-test the disk i/o in the cache that might help reproduce this? Should we try to write some? Apple's not going to do anything without a radar filed, thus having it be externalDep isn't useful. Moving back to Assigned so it's tracked. Also, is there a bug filed to revert the switch to the slower backend when this is finally resolved by Apple? We'll likely need to have some kind of special casing by OS-version.
,
Dec 14
There is a stress test in net_perftests, but it (well, modified to actually verify the payload) didn't seem to trigger the bug for me.
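For anyone wanting to try something similar outside a Chromium checkout, the idea of a stress test that verifies the payload can be sketched in a few lines of Python. This is only an illustration: it is not the net_perftests code, the `write_and_verify` function and file naming are hypothetical, and it does not reproduce blockfile's on-disk layout or I/O pattern. The 9592-byte payload size is borrowed from the byte count in the log earlier in this thread.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory: str, file_count: int, size: int, rounds: int) -> int:
    """Write random payloads, read them back repeatedly, and count digest mismatches."""
    expected = {}
    for i in range(file_count):
        payload = os.urandom(size)
        path = os.path.join(directory, f"entry_{i}.bin")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to storage
        expected[path] = hashlib.sha256(payload).hexdigest()

    mismatches = 0
    for _ in range(rounds):
        for path, digest in expected.items():
            with open(path, "rb") as f:
                if hashlib.sha256(f.read()).hexdigest() != digest:
                    mismatches += 1
    return mismatches


with tempfile.TemporaryDirectory() as cache_dir:
    bad = write_and_verify(cache_dir, file_count=50, size=9592, rounds=3)
    print(f"mismatched reads: {bad}")
```

One caveat: read-backs shortly after a write are likely served from the OS page cache, so a run like this may never touch the disk path where corruption would occur, which may be part of why such tests fail to trigger the bug.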
,
Jan 2
morlovich@: is there a bug as requested in #51?
Comment 1 by mmenke@chromium.org
, Oct 29