Issue 899874

Starred by 12 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug


Participants' hotlists:
GSuite-Priorities



Requests return ERR_CONTENT_DECODING_FAILED; clearing cache resolves the issue

Reported by jmatth...@duosecurity.com, Oct 29

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.20 Safari/537.36

Example URL:
Most recent was: https://static.xx.fbcdn.net/rsrc.php/v3irX84/yI/l/en_US/TdX884thumC.js?_nc_srcc=3217

Steps to reproduce the problem:
1. Browse to GSuite or Facebook 
2. Notice some functionality isn't working, generally XHR driven features
3. Clear cached images and files via Clear Browsing Data...
4. Reload

What is the expected behavior?
I would hope that I would not have to clear cache frequently. I have only experienced this issue on MacOS 10.14. Canary does not seem to exhibit this behavior, or I haven't used it long enough to reproduce it.

What went wrong?
I'm able to use the inspector to figure out what XHR requests are failing. I can clearly see ERR_CONTENT_DECODING_FAILED and then I can reproduce that by trying to visit the URL directly. 

Once I clear cache everything works again for an indeterminate period of time. 

Did this work before? N/A 

Chrome version: 71.0.3578.20  Channel: beta
OS Version: OS X 10.14.1
Flash Version: 

FWIW I realize I'm handing you a bit of a heisenbug. I'm interested in anything that might help in mitigating this before we roll 10.14 internally.
 
chrome-net-export-log.json
196 KB
When you try to visit the URL directly, do you see an ERR_CONTENT_DECODING_FAILED error page, or just a truncated response?  Seeing the error page indicates the request failed before we received sufficient data to start decoding anything useful, while getting a partial response means that the data is corrupted part way through.

Google and Facebook both use Brotli encoding, at least for some responses.  I'm not sure how widespread that is outside Google.  Of course, that's not proof of anything.

This failure sounds to me like something is most likely going wrong at either the cache layer or the filter layer, though I'm not at all confident of that.

I assume you aren't using an SSL-decrypting proxy?
Cc: morlovich@chromium.org
I do see the 304 response with "content-encoding: br" and DISK_CACHE_ENTRY read:

t=134792 [st= 1]     +HTTP_TRANSACTION_READ_HEADERS  [dt=12]
t=134804 [st=13]        HTTP2_STREAM_UPDATE_SEND_WINDOW
                        --> delta = 10420224
                        --> stream_id = 3
                        --> window_size = 10485760
t=134804 [st=13]        HTTP_TRANSACTION_READ_RESPONSE_HEADERS
                        --> HTTP/1.1 304
                            status: 304
                            access-control-allow-credentials: true
                            cache-control: public,max-age=31536000,immutable
                            content-type: application/x-javascript; charset=utf-8
                            x-content-type-options: nosniff
                            x-xss-protection: 0
                            content-security-policy: default-src * data: blob:;script-src *.facebook.com *.fbcdn.net *.facebook.net *.google-analytics.com *.virtualearth.net *.google.com 127.0.0.1:* *.spotilocal.com:* 'unsafe-inline' 'unsafe-eval' *.atlassolutions.com blob: data: 'self';style-src data: blob: 'unsafe-inline' *;connect-src *.facebook.com facebook.com *.fbcdn.net *.facebook.net *.spotilocal.com:* wss://*.facebook.com:* https://fb.scanandcleanlocal.com:* *.atlassolutions.com attachment.fbsbx.com ws://localhost:* blob: *.cdninstagram.com 'self' chrome-extension://boadgeojelhgndaghljhdicfkmllpafd chrome-extension://dliochdbjfkdbacpmhlcpmleaejidimm;
                            timing-allow-origin: *
                            expires: Tue, 29 Oct 2019 16:55:34 GMT
                            last-modified: Mon, 01 Jan 2001 08:00:00 GMT
                            access-control-allow-origin: *
                            vary: Accept-Encoding
                            content-encoding: br
                            content-md5: UVs3Qe9b08CYFxJskOEQtg==
                            date: Mon, 29 Oct 2018 19:32:13 GMT
t=134804 [st=13]     -HTTP_TRANSACTION_READ_HEADERS
t=134804 [st=13]      HTTP_CACHE_WRITE_INFO  [dt=1]
t=134805 [st=14]      HTTP_CACHE_READ_INFO  [dt=0]
t=134805 [st=14]      NETWORK_DELEGATE_HEADERS_RECEIVED  [dt=0]
t=134805 [st=14]      URL_REQUEST_FILTERS_SET
                      --> filters = "BROTLI"
t=134805 [st=14]   -URL_REQUEST_START_JOB
t=134805 [st=14]    URL_REQUEST_DELEGATE_RESPONSE_STARTED  [dt=0]
t=134805 [st=14]    HTTP_CACHE_READ_DATA  [dt=0]
t=134805 [st=14]    URL_REQUEST_JOB_BYTES_READ
                    --> byte_count = 9592
t=134806 [st=15]    FAILED
                    --> net_error = -330 (ERR_CONTENT_DECODING_FAILED)
t=134806 [st=15] -REQUEST_ALIVE
                  --> net_error = -330 (ERR_CONTENT_DECODING_FAILED)


83534: DISK_CACHE_ENTRY
https://static.xx.fbcdn.net/rsrc.php/v3irX84/yI/l/en_US/TdX884thumC.js?_nc_srcc=3217
Start Time: 2018-10-29 15:32:11.177

t=132874 [st= 0] +DISK_CACHE_ENTRY_IMPL  [dt=60]
                  --> created = false
                  --> key = "https://static.xx.fbcdn.net/rsrc.php/v3irX84/yI/l/en_US/TdX884thumC.js?_nc_srcc=3217"
t=132874 [st= 0]   +ENTRY_READ_DATA  [dt=0]
                    --> buf_len = 5232
                    --> index = 0
                    --> offset = 0
t=132874 [st= 0]   -ENTRY_READ_DATA
                    --> bytes_copied = 5232
t=132932 [st=58]   +ENTRY_WRITE_DATA  [dt=1]
                    --> buf_len = 5232
                    --> index = 0
                    --> offset = 0
                    --> truncate = true
t=132933 [st=59]   -ENTRY_WRITE_DATA
                    --> bytes_copied = 5232
t=132933 [st=59]   +ENTRY_READ_DATA  [dt=0]
                    --> buf_len = 16
                    --> index = 2
                    --> offset = 0
t=132933 [st=59]   -ENTRY_READ_DATA
                    --> bytes_copied = 16
t=132933 [st=59]   +ENTRY_READ_DATA  [dt=0]
                    --> buf_len = 32768
                    --> index = 1
                    --> offset = 0
t=132933 [st=59]   -ENTRY_READ_DATA
                    --> bytes_copied = 9592
t=132934 [st=60]    ENTRY_CLOSE
t=132934 [st=60] -DISK_CACHE_ENTRY_IMPL

> When you try to visit the URL directly, do you see an ERR_CONTENT_DECODING_FAILED error page, or just a truncated response?

The error page unfortunately :/

> I assume you aren't using an SSL-decrypting proxy?

Nope; to wit, this happens when connected to both my work and home Wi-Fi networks.
Since Facebook and Google use HTTPS, we can conclude it's not a network issue, unless there's an SSL-decrypting MitM. That seems unlikely (malware could in principle do it without your knowledge by modifying your root cert store, but I think we can pretty much ignore that possibility).

You, however, seem to be the only one who's running into this.

I don't think an extension could cause this.

Is this only happening on one laptop?  It's possible it's a failing RAM module, HDD, or some other system component.  If it happens on more than one system, that possibility seems pretty unlikely.

It could be something mucking with the on-disk files, I suppose.

If you're wondering why I'm not looking at Chrome possibilities: I just don't see how a Chrome bug could be doing this to you and no one else.  There just aren't any configurable Chrome bits I can think of that affect this.  That's certainly not proof it isn't Chrome; I just can't think of anywhere to go poking if it is a Chrome issue.
Is Brotli support always on?
Yes, it is, unless someone set up a new field trial that I'm unaware of.  I also checked the log: we sent br in the accept-encoding line, and the server also sent Vary: Accept-Encoding.  I also believe we implicitly add that to all responses anyway, so I'm pretty sure it's not that.

I should have mentioned that earlier.
The initial request seems to be content-encoding: br as well, so it's not a bit flip in the non-updateable headers table.

It may be worth trying to switch the cache backend to simple; if the bug still reproduces with that we could at least pull the cache file for the specific URL....
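For context: net error -330 means the content-decoding filter (here the Brotli decoder) rejected the bytes it was fed, so even a single corrupted byte in the cached body is enough to produce it. A tiny standalone sketch using the public brotli C API (not Chrome's filter code; it assumes the brotli library headers are available) illustrates this:

// Illustration only: one flipped byte in a brotli stream will typically make
// decoding fail, which Chrome surfaces as net error -330
// (ERR_CONTENT_DECODING_FAILED).
#include <brotli/decode.h>
#include <brotli/encode.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
  const char* text = "var x = 'some cached javascript payload';";
  uint8_t encoded[256];
  size_t encoded_size = sizeof(encoded);
  BrotliEncoderCompress(BROTLI_DEFAULT_QUALITY, BROTLI_DEFAULT_WINDOW,
                        BROTLI_DEFAULT_MODE, std::strlen(text),
                        reinterpret_cast<const uint8_t*>(text),
                        &encoded_size, encoded);

  encoded[encoded_size / 2] ^= 0xFF;  // Simulate one corrupted cached byte.

  uint8_t decoded[256];
  size_t decoded_size = sizeof(decoded);
  BrotliDecoderResult result =
      BrotliDecoderDecompress(encoded_size, encoded, &decoded_size, decoded);
  std::printf("decode %s\n",
              result == BROTLI_DECODER_RESULT_SUCCESS ? "succeeded" : "failed");
  return 0;
}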

I'm going to work from another 10.14 partition tomorrow to see if I can repro this. There may be others at Duo with this same issue. I'm waiting on reports.
Labels: Needs-Triage-M71
I disabled Spotlight indexing on files and folders, and this behavior seems to have ceased completely for me.
Status: WontFix (was: Unconfirmed)
Thanks for the follow-up!  I'm going to go ahead and close the issue, but if it reappears while indexing is still disabled, please comment here and I'll re-open (if it happens again more than a week or two from now, it's best to file a new bug).
So metrics do suggest an increased rate of this error on OS X 10.14 and 10.14.1 with the blockfile cache; it's much higher than on older OS versions, or than with simplecache (apparently) regardless of OS version...
I've gotten 2 new reports of this issue and have had it recur myself (though less frequently).

What diagnostic information would be helpful here? It seems the net exports might not contain the best data.
Status: Untriaged (was: WontFix)
Reopening the bug, per comment 13.
Just had this happen to me again, this time in google calendar with the Zoom meetings plugin (clicking the button did nothing because the JS assets failed to load with ERR_CONTENT_DECODING_FAILED). Clearing cache, reloading page fixed the issue for now.

What kind of diagnostics should I provide if this happens again?

macOS 10.14.1
Chrome Version 70.0.3538.77

I ran into 'ERR_CONTENT_DECODING_FAILED' this AM.  FWIW, I had to reboot my Mac this morning because it was non-responsive when I tried to wake it; I hit 'ERR_CONTENT_DECODING_FAILED' after Chrome restored my tabs.  A trace file is included.

chrome-net-export-log.json
709 KB
Re: comment #15, #16: what file system are you folks using?

Sadly, our logging may be useless/way too high-level here, since there is a pretty high chance of an OS bug. The log in comment #16 is interesting, though, in that it's using gzip, not brotli.

I'm on APFS (Encrypted)
APFS (Encrypted) also. I think 10.14 forces everyone to APFS now (even if you pass the disable option in the updater).
Oh, looks like my Mac got converted as well. Was thinking it may have been too old for that. Well, I have a chance of reproducing it then....
I get `Failed to load resource: net::ERR_CONTENT_DECODING_FAILED`
chrome-net-export-log.json
1.3 MB
chrome-net-export-log-kibana.json
743 KB
Looks a lot like the other logs - we're reading from the cache when we get the error.  morlovich is the expert there, but I believe we delete partial entries from the cache on error.  If we don't, I suppose it's possible that the issue is we're getting bad data from lower layers of the stack, as opposed to getting corrupted data when reading from a cache entry for a previously successfully received file.
The resource byte length matches what I get when I transfer it myself, so something is probably going wrong with the payload bytes themselves. I plan to see whether hacking the performance stress test to verify the read data can reproduce it.

To folks suffering from the problem: going to chrome://flags and forcing on "Simple Cache for HTTP" will likely serve as a workaround.  That backend is slower on Macs, especially those with spinning media, but it's likely preferable to what you're seeing.

FYI, we're seeing a spike in crashes in Sheets for users on Mac 10.14, and one person who's gotten back to us so far has reported seeing an ERR_CONTENT_DECODING_FAILED error in their console.
Just a comment that the simple caching model has resolved this for me.
Labels: Hotlist-Partner-GSuite

Comment 27 by bugdroid1@chromium.org, Nov 22

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/67e42af2a372cfbc6e24ed867a62cab286bb22b7

commit 67e42af2a372cfbc6e24ed867a62cab286bb22b7
Author: Maks Orlovich <morlovich@chromium.org>
Date: Thu Nov 22 14:05:57 2018

Workaround apparent data corruption in blockfile on OS X 10.14 by switching backends.

This is slower, but it's better than not loading pages at all since important resources
got corrupted:(

Bug: 899874
Change-Id: I19f7eccff0c8aa119e522aee9cf728934906b917
Reviewed-on: https://chromium-review.googlesource.com/c/1347109
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Reviewed-by: Elly Fong-Jones <ellyjones@chromium.org>
Reviewed-by: Bence Béky <bnc@chromium.org>
Cr-Commit-Position: refs/heads/master@{#610404}
[modify] https://crrev.com/67e42af2a372cfbc6e24ed867a62cab286bb22b7/components/network_session_configurator/browser/network_session_configurator.cc
[modify] https://crrev.com/67e42af2a372cfbc6e24ed867a62cab286bb22b7/components/network_session_configurator/browser/network_session_configurator_unittest.cc
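The change is essentially an OS-version-gated choice of HTTP cache backend. A minimal standalone sketch of that kind of gating (hypothetical names throughout, not the actual diff to network_session_configurator.cc):

// Sketch only: choose the HTTP cache backend based on the macOS version.
// CacheBackend and ChooseCacheBackend() are hypothetical stand-ins for the
// real configurator code.
enum class CacheBackend { kBlockfile, kSimple };

CacheBackend ChooseCacheBackend(int macos_major, int macos_minor) {
  // The blockfile backend shows apparent data corruption on macOS 10.14
  // (this bug), so prefer the simple backend there; older versions keep
  // blockfile, which is faster on Mac.
  if (macos_major > 10 || (macos_major == 10 && macos_minor >= 14))
    return CacheBackend::kSimple;
  return CacheBackend::kBlockfile;
}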

Labels: Merge-Request-71
I am asking for a backport incredibly late (Thanksgiving is really inconveniently timed for this release), so here is some data to help make the call on whether it should happen or not. Sadly, the documentation is Google-only.

a) Why is this important?

ERR_CONTENT_DECODING_FAILED error rates for resource fetching, split by OS:
https://uma.googleplex.com/p/chrome/timeline_v2/?sid=7be812d3c219350ff532eb51b2c64a03
This is on stable, and an important thing to note is that the total is all subresource fetches, not just failed fetches.

https://buganizer.corp.google.com/issues/119681757 --- a report of increased problems from Chrome + OS X 10.14 on the GSuite side --- gives some idea of how visible the problem is to the users of a particular product; and this report makes it clear that it can also affect Facebook (highly important for lots of users!) as well as the bug reporter's own product.

b) How risky is this, and what are the downsides?

This switches the backend on OS X >= 10.14 to the same cache backend we use on Android, ChromeOS, and Linux. It's also been used in a 50% experiment in OS X beta for a while.  Stability should not be a problem, but there is a performance regression (which would be why we haven't switched to it). See https://uma.googleplex.com/p/chrome/variations/?sid=ecbb4370eb3601e759a7e7741f3fdf6b 

It looks a bit better when restricted to 10.14, but that might just be early adopters having nicer machines. The spinning-media regression is particularly bad, though (but probably still better than pages not loading); luckily Macs use spinning media less than Windows machines do.

Doing this also makes it hard to tell if Apple changes something that helps, but we could probably do a force-off experiment as well, though population sizes may be tricky.

c) Any other options?
The backend choice can be done via experiment config outside the scope of any release, but then it cannot be OS-version gated.


Comment 29 by sheriffbot@chromium.org, Nov 26

Labels: -Merge-Request-71 Hotlist-Merge-Review Merge-Review-71
This bug requires manual review: We are only 7 days from stable.
Please contact the milestone owner if you have questions.
Owners: benmason@(Android), kariahda@(iOS), kbleicher@(ChromeOS), govind@(Desktop)

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Thank you, morlovich@, for all the details at #28.  How safe will the merge be with the known performance regression this late in the release cycle? Pls note we only have this week's beta release left before stable promotion next week.
Sorry, I don't understand the question; could you elaborate?

Labels: -Merge-Review-71 Merge-Approved-71
Approving merge to M71 branch 3578 based on comment #28 as this can be disabled via finch if anything goes wrong. Pls merge ASAP so we can pick it up for tomorrow's last beta release, cutting RC very soon. Thank you.
Cc: ellyjo...@chromium.org
Seems like morlovich@ is OOO. 

+ellyjones@ (CL reviewer and Mac TL), could you pls do the merge to M71?

Comment 34 by bugdroid1@chromium.org, Nov 27

Labels: -merge-approved-71 merge-merged-3578
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/16fb85e34b59b442d370930353db9817e0446e75

commit 16fb85e34b59b442d370930353db9817e0446e75
Author: Maks Orlovich <morlovich@chromium.org>
Date: Tue Nov 27 18:47:21 2018

Merge workaround apparent data corruption in blockfile on OS X 10.14 by switching backends.

This is slower, but it's better than not loading pages at all since important resources
got corrupted:(

Bug: 899874
Change-Id: I19f7eccff0c8aa119e522aee9cf728934906b917
Reviewed-on: https://chromium-review.googlesource.com/c/1347109
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Reviewed-by: Elly Fong-Jones <ellyjones@chromium.org>
Reviewed-by: Bence Béky <bnc@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#610404}(cherry picked from commit 67e42af2a372cfbc6e24ed867a62cab286bb22b7)
Reviewed-on: https://chromium-review.googlesource.com/c/1352470
Reviewed-by: Matt Menke <mmenke@chromium.org>
Cr-Commit-Position: refs/branch-heads/3578@{#825}
Cr-Branched-From: 4226ddf99103e493d7afb23a4c7902ee496108b6-refs/heads/master@{#599034}
[modify] https://crrev.com/16fb85e34b59b442d370930353db9817e0446e75/components/network_session_configurator/browser/network_session_configurator.cc
[modify] https://crrev.com/16fb85e34b59b442d370930353db9817e0446e75/components/network_session_configurator/browser/network_session_configurator_unittest.cc

Already done.  :)
Labels: Merge-Merged-71-3578
The following revision refers to this bug: 
https://chromium.googlesource.com/chromium/src.git/+/16fb85e34b59b442d370930353db9817e0446e75

Commit: 16fb85e34b59b442d370930353db9817e0446e75
Author: morlovich@chromium.org
Committer: mmenke@chromium.org
Date: 2018-11-27 18:47:21 +0000 UTC

Merge workaround apparent data corruption in blockfile on OS X 10.14 by switching backends.

This is slower, but it's better than not loading pages at all since important resources
got corrupted:(

Bug: 899874
Change-Id: I19f7eccff0c8aa119e522aee9cf728934906b917
Reviewed-on: https://chromium-review.googlesource.com/c/1347109
Commit-Queue: Maks Orlovich <morlovich@chromium.org>
Reviewed-by: Elly Fong-Jones <ellyjones@chromium.org>
Reviewed-by: Bence Béky <bnc@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#610404}(cherry picked from commit 67e42af2a372cfbc6e24ed867a62cab286bb22b7)
Reviewed-on: https://chromium-review.googlesource.com/c/1352470
Reviewed-by: Matt Menke <mmenke@chromium.org>
Cr-Commit-Position: refs/branch-heads/3578@{#825}
Cr-Branched-From: 4226ddf99103e493d7afb23a4c7902ee496108b6-refs/heads/master@{#599034}
Labels: -Merge-Merged-71-3578
Thank you!
Thanks Matt!

Owner: morlovich@chromium.org
Status: Fixed (was: Untriaged)
Fixed and merged?
Status: ExternalDependency (was: Fixed)
OS X 10.14 is still corrupting the blockfile cache, so I don't think this is really fixed?
This is happening consistently for me on my personal mac running stable Chrome. Let me know if there's anything I can provide to help with a fix.
If you have a source checkout there for some reason, querying the messed-up URL with cachetool will provide some information, though likely not enough to be actionable (so it's not worth getting a checkout if one isn't already handy).
The machine doesn't have a checkout, and I've never tried building on macOS, so I don't know that I can help much.

Is the issue well understood at this point? If clearing my cache will unbreak my machine, I'll go ahead and do that. I'll hold off for the moment in case there's some useful forensics that could be performed.
No, it's not well understood. We know that the cache backend that has been used on OS X basically forever and has hardly changed in the last couple of years has suddenly started showing data corruption with OS X 10.14, but I have not been able to reproduce it, and so there is no root cause. 

It should, however, be worked around in M71 starting from 71.0.3578.75 (by using a different backend).


Hi everyone.
Just weighing in that I and a couple of colleagues are seeing these symptoms too.
I'm running Mac OS 10.14 (18A391), Chrome Version 70.0.3538.110 (Official Build) (64-bit).
I have intermittently received ERR_CONTENT_DECODING_FAILED in the console for the following URLs; the mention-me.com URLs are served from an AWS S3 bucket.
- https://jmsyst.com/bootstrap/css/bootstrap.css
- https://jmsyst.com/bootstrap/css/bootstrap-responsive.css
- https://static-demo.mention-me.com/dist/ShareEmailViaProvider-c6d548a7e9e18f9f786bf5059fc75ff7ea41d18b.e63b1e597e18fc53b63b.js
- https://static-demo.mention-me.com/css/2dfe92a-d8cc6aa.css

I appreciate this may not help narrow down the cause of the data corruption, but it does show that it's not just Facebook/Google-hosted files that are affected. (I don't know much about brotli; if you have any good resources for research, I'd be up for a bedtime read.)

Many thanks for producing a temporary fix.
Antony
Cc: d...@fb.com
I hit this today as well.
Cc: pinkerton@chromium.org
Is a Radar filed? If so, I can help escalate this with Apple. If not, marking it ExternalDependency will only cause this to wither with no further action.
No Radar report; we don't have any test case smaller than Chrome (and it's not reproducible on demand; it just happens often enough to clearly affect users a lot).

Issue 915104 has been merged into this issue.
Status: Assigned (was: ExternalDependency)
Are there unit tests that stress-test the disk I/O in the cache that might help reproduce this? Should we try to write some?

Apple's not going to do anything without a Radar filed, so having it be ExternalDependency isn't useful. Moving back to Assigned so it's tracked.

Also, is there a bug filed to revert the switch to the slower backend once this is finally resolved by Apple? We'll likely need some kind of special-casing by OS version.
There is a stress test in net_perftests, but it (well, modified to actually verify the payload) didn't seem to trigger the bug for me. 
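For anyone who wants to poke at the suspected filesystem-level corruption outside of Chrome, a small standalone write/read-back verification loop (plain C++, no Chromium dependencies; file names, sizes, and iteration counts are arbitrary choices) could be left running on the suspect APFS volume. This is only a hypothetical repro attempt, not the net_perftests stress test:

// Standalone sketch: repeatedly write pseudo-random buffers to a set of
// files, read them back, and report any byte-level mismatch.
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <random>
#include <string>
#include <vector>

int main() {
  std::mt19937 rng(12345);          // Fixed seed so runs are comparable.
  const int kIterations = 10000;
  const size_t kSize = 32 * 1024;   // Arbitrary buffer size.
  std::vector<char> out(kSize), in(kSize);

  for (int i = 0; i < kIterations; ++i) {
    for (char& c : out) c = static_cast<char>(rng());
    std::string path = "stress_" + std::to_string(i % 64) + ".bin";

    // Overwrite one of 64 files in place, then read it straight back.
    std::ofstream(path, std::ios::binary | std::ios::trunc)
        .write(out.data(), out.size());
    std::ifstream(path, std::ios::binary).read(in.data(), in.size());

    if (in != out) {
      std::fprintf(stderr, "Mismatch on iteration %d (%s)\n", i, path.c_str());
      return 1;
    }
  }
  std::puts("No corruption observed.");
  return 0;
}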

morlovich@: is there a bug as requested in #51?
