Files are CHANGED (corrupted) on DOWNLOAD unexpectedly
Reported by alup...@gmail.com, Mar 2 2016
Issue description
UserAgent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2665.0 Safari/537.36
Example URL: https://www.samba.org/ftp/rsync/src/

Steps to reproduce the problem:
1. Left-click to download 'rsync-3.1.2.tar.gz'.

What is the expected behavior?
The file is downloaded correctly (identical to the source, same SHA sums, etc.).

What went wrong?
Instead of the original 872K file, you get a 3MB file:
3287040 Mar 2 12:02 rsync-3.1.2.tar.gz
NOTE: with Firefox 44.0.2, you get the correct file:
892724 Mar 2 12:18 rsync-3.1.2.tar.gz

Did this work before? Yes. Months ago, I suspect.
Chrome version: 51.0.2665.0 Channel: dev
OS Version: 4.4.0
Flash Version: Shockwave Flash 11.2 r999

Tested more extensively on the current, compiled Chromium Version 51.0.2665.0, r378433, Linux 32-bit (but see below for other systems and browsers).

Comments:
This happens inexplicably: some of these files (standard tarballs) download correctly, some do NOT (this is obviously a HUGE problem, given the user's false sense of security).
It does NOT depend on the Linux architecture (32- or 64-bit), and, what is WORSE, the problem has been carried all the way over to the current release of CHROME (!):
Ubuntu 15.10 (64-bit) (latest) with Chrome 48.0.2564.116 (64-bit) (latest)
To TOP it all OFF, I found this bug on Windows 10 Home (64-bit, x64-based processor) with Chrome 48.0.2564.116 m.

This BUG is all the more SEVERE since:
- One of the BASIC functions of a browser is to download (and not Voice/Sound Recognition).
- Users are not made aware of ANY problem while their files are corrupted (yes, a file at the destination that is not identical to the one at the source IS called CORRUPTED).
- It has NOT been detected by any "developer", QA, etc., all the way from CHROMIUM development to the official CHROME release, for who knows how long (months, I would guess)!!!
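The file name alone does not reveal whether the bytes are still compressed. A minimal sketch, assuming the download is a gzip-compressed tarball like the ones above (the function names are hypothetical), classifies a file by its leading bytes instead: gzip streams start with 0x1f 0x8b, and ustar-style tar headers carry the string "ustar" at byte offset 257.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def classify_archive(head: bytes) -> str:
    """Return "gzip", "tar" or "unknown" from the first bytes of a file.

    `head` should hold at least the first 512 bytes (one tar header block).
    """
    if head[:2] == GZIP_MAGIC:
        return "gzip"
    # ustar-based tar headers (POSIX, GNU, PAX) carry "ustar" at offset 257.
    if head[257:262] == b"ustar":
        return "tar"
    return "unknown"


def classify_file(path: str) -> str:
    """Classify a downloaded file by its content, ignoring its name."""
    with open(path, "rb") as f:
        return classify_archive(f.read(512))
```

Run on the two downloads above, one would expect `classify_file("rsync-3.1.2.tar.gz")` to report "gzip" for the 892724-byte Firefox copy and "tar" for the 3287040-byte Chrome copy, assuming the tarball was written in a ustar-based format.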
----------------------------------------------------------
Additional test (good download)
Go to https://openssl.org/source/
Left-click to download 'openssl-1.0.2g.tar.gz'.
Now you get the correct file:
5266102 Mar 2 12:08 openssl-1.0.2g.tar.gz
(check with: 65 Mar 2 11:48 openssl-1.0.2g.tar.gz.sha256)
NOTE: with Firefox 44.0.2, you get the correct file (any surprises?):
5266102 openssl-1.0.2g.tar.gz
-----------------------------------------------------------
NOTE: the "inspiration" for files to test-download: http://www.linuxfromscratch.org/blfs/view/svn/
Examples:
rsync-3.1.2: Chapter 15. Networking Programs
OpenSSL-1.0.2g: Chapter 4. Security
and a host of others to test/compare (and enjoy).
-- Alex
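Comparing against the published digest is what exposes the problem. The 65-byte .sha256 file listed above is consistent with a 64-character hex SHA-256 digest plus a newline. A minimal sketch of the comparison, assuming that layout (the function names are illustrative, not part of any OpenSSL tooling):

```python
import hashlib
import io


def sha256_of_stream(stream: io.BufferedIOBase, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large tarballs need not fit in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def matches_published_digest(stream: io.BufferedIOBase, digest_text: str) -> bool:
    """Compare a stream against published digest text (e.g. a .sha256 file).

    Assumes the hex digest is the first whitespace-separated token, which
    covers both a bare digest and the "digest  filename" layout of sha256sum.
    """
    tokens = digest_text.split()
    if not tokens:
        raise ValueError("digest text is empty")
    return sha256_of_stream(stream) == tokens[0].lower()


def verify_download(path: str, digest_path: str) -> bool:
    """Check a downloaded file against its companion digest file."""
    with open(digest_path, encoding="ascii") as d, open(path, "rb") as f:
        return matches_published_digest(f, d.read())
```

For example, `verify_download("openssl-1.0.2g.tar.gz", "openssl-1.0.2g.tar.gz.sha256")` should return True for an untouched download and False for a copy that was decompressed on the way.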
Comment 2, Mar 2 2016
> This is a server configuration issue, and the browser is working as expected.

"As expected" by whom??? Your response boggles my mind (but does not surprise me):
1. How come other browsers (I gave the example, Firefox) don't "react" the same way (to 'Content-Encoding: gzip', i.e., not as expected by _you_)? EVERYBODY ELSE downloads a file "AS IS", as any file _must_ be downloaded. Sometimes (PDF, etc.) the browser _displays_ the file, but never ever has a regular browser _manipulated_ a file while downloading it! Who asked the Google browser to _decompress_ a file for the user (at the very least, this defeats any checksums the owner offers for the file, i.e., a potential security breach)?
2. Speaking of the subject 'rsync-3.1.2.tar.gz': it arrives named 'rsync-3.1.2.tar.gz', NOT 'rsync-3.1.2.tar' (i.e., without the ".gz"), which would at least have been a clue that the file was decompressed for the user, courtesy of the great Chrome(*ium) browser. Not all users can un-gzip the file, and neither can any old 'tar' :)
3. HOW about the other (identically gzipped) example file, 'openssl-1.0.2g.tar.gz'?
4. I don't have time to list them now, but of many other tarballs (whether 'gz' or 'xz'), some arrived unchanged (as they should), some not.
5. Where is this "decompress-on-download" behavior documented at all (or is it a feature under the user's control, i.e., in the browser settings)?
6. Please give an example of other browsers that have adopted this "new and improved" feature.
In summary, by saying "the browser is working as expected", you actually said, "this bug is not a bug because it was coded that way".
Comment 3, Mar 2 2016
Just a quick note that "as expected" means "according to the HTTP spec", specifically https://tools.ietf.org/html/rfc2616#page-118, which describes the behavior of the Content-Encoding field. If that field says that the content being delivered is compressed for transit by using gzip, the browser is, by the HTTP spec, supposed to uncompress it. Sometimes servers are misconfigured and don't follow the spec. If such servers are common enough, sometimes browsers are coded to work around the server bugs. That doesn't change the fact that, according to the spec, it's a bug in the server.
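Read this way, the Content-Encoding definition in RFC 2616 (section 14.11) describes a layer the recipient strips to recover the media type named by Content-Type. A minimal sketch of that client-side step, handling only gzip (the function is hypothetical and far simpler than a browser's filter chain):

```python
import gzip
from typing import Mapping


def decode_body(headers: Mapping[str, str], body: bytes) -> bytes:
    """Remove the content codings named in Content-Encoding (gzip only).

    RFC 2616 lists codings in the order they were applied, so a
    recipient undoes them in reverse order.
    """
    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("content-encoding", "")
    codings = [c.strip().lower() for c in value.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding in ("gzip", "x-gzip"):
            body = gzip.decompress(body)
        elif coding != "identity":
            # deflate, compress, etc. are left out of this sketch.
            raise ValueError(f"unsupported content coding: {coding!r}")
    return body
```

With no Content-Encoding header the body is returned untouched; with `gzip`, exactly one gzip layer is removed, whatever the file name or Content-Type says.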
Comment 4, Mar 2 2016
Under "3.5 Content Codings", I read:
<snip> Frequently, the entity is stored in coded form, transmitted directly, _and only decoded by the recipient_. [emphasis mine] <snip>

> If that field says that the content being delivered is compressed
> for transit by using gzip, the browser is, by the HTTP spec, supposed to uncompress it.

1. Where, in any HTTP spec, do you see it stated/spec'd that "the browser is, by the HTTP spec, supposed to uncompress it"? Or, for that matter, "it would be nice to", "it would help the poor user if", "it would make your browser better than anything currently around", etc.?
2. You have NOT answered the main questions:
2.1. Why has no other browser ever attempted to decompress "in transit"?
2.2. Why hasn't this Chrome(*ium) decompression (whether acceptable or not) been announced/documented?
3. Do you realize this defeats the SECURITY of a file (transformed maliciously or unintentionally)? For any reputable/important file (which is the case with all the tarball files here), the author always offers the recipient various checksums to guarantee the INTEGRITY of the file.
4. > Sometimes servers are misconfigured and don't follow the spec. If such servers are common enough, sometimes browsers are coded to work around the server bugs. That doesn't change the fact that according to the spec it's a bug in the server.
% file openssl-1.0.2g.tar.gz
openssl-1.0.2g.tar.gz: gzip compressed data, from Unix, last modified: Tue Mar 1 08:36:56 2016, max compression
% file rsync-3.1.2.tar.gz
rsync-3.1.2.tar.gz: gzip compressed data, from Unix, last modified: Mon Dec 21 15:23:09 2015

Are you implying that the server of 'openssl-1.0.2g.tar.gz' was "misconfigured" (by, say, not properly disclosing that the file was gzip-compressed), so that, "according to the spec", Chrome(*ium) "worked around" this server "bug" and delivered the file _untouched_, AS YOU SHOULD HAVE DONE according to me, the rest of the user universe and the history of transmitting tarball (normally compressed) files?
5. Educational note: the billions of users downloading a tarball (compressed or not, the great majority compressed), after checking the checksums for integrity and SECURITY (non-malicious changes, etc.), type
tar -xf <tarball file>
'tar' is intelligent enough to decompress the file (if the file came compressed, as it should) before actually untarring it.
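Point 5 can be illustrated with Python's `tarfile`, whose read mode detects compression much as `tar -xf` does. The sketch below (helper names are hypothetical) also shows why the decompressed download still extracts cleanly: the archive contents are the same, while the bytes, and therefore any published checksum, differ.

```python
import gzip
import io
import tarfile
from typing import List


def list_members(archive: bytes) -> List[str]:
    """List member names, letting tarfile detect any compression itself.

    Mode "r" (equivalent to "r:*") transparently handles gzip, bz2 and xz.
    """
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tf:
        return tf.getnames()


def make_sample_tar() -> bytes:
    """Build a tiny uncompressed tar archive in memory (illustration only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        data = b"hello\n"
        info = tarfile.TarInfo("sample/README")
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    plain = make_sample_tar()
    compressed = gzip.compress(plain)
    # Same member list from both, although the bytes (and checksums) differ.
    print(list_members(plain), list_members(compressed), plain != compressed)
```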
Comment 5, Mar 3 2016
Re #4: Thank you for your feedback.
1. "When present, its value indicates what additional content codings have been applied to the entity-body, and thus what decoding mechanisms must be applied in order to obtain the media-type referenced by the Content-Type header field."
2.1. This has been answered in #3.
2.2. This is the spec-compliant behavior; it is documented in RFC 2616.
3. I agree. It is very unfortunate that the server is misconfigured. I recommend bringing up this issue with the server administrator.
4. The other way around. There is a file you want to download; call it A. It has some Content-Type. The server can serve it without (further) compression, advertising the actual Content-Type, without any Content-Encoding header. Then the browser will save file A unmodified. What happens instead is that the server is misconfigured: it adds a bogus Content-Encoding: gzip header without actually performing (another) gzip compression on A. As a result, the spec says that the browser must decompress A, so what the user gets is not A. Note that whether A is itself a gzip-compressed file is irrelevant.
5. Thanks, I actually was not aware that one could drop the -z flag. Good to know.
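The scenario in point 4 can be simulated in a few lines. This is a toy model, not real server or browser code: `serve` and `client_save` are hypothetical, and the client strips exactly one gzip layer whenever the header says gzip.

```python
import gzip
from typing import Dict, Tuple

Response = Tuple[Dict[str, str], bytes]


def serve(file_a: bytes, mode: str) -> Response:
    """Hypothetical server sending file A, which is already a .tar.gz."""
    headers = {"Content-Type": "application/x-gzip"}
    if mode == "plain":
        # Correct: bytes as stored, no Content-Encoding header.
        return headers, file_a
    if mode == "transit-gzip":
        # Also correct: gzip A once more for transit and say so.
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(file_a)
    if mode == "bogus-label":
        # Misconfigured: claims a gzip layer that was never added.
        headers["Content-Encoding"] = "gzip"
        return headers, file_a
    raise ValueError(f"unknown mode: {mode!r}")


def client_save(headers: Dict[str, str], body: bytes) -> bytes:
    """Spec-following client: undo one gzip layer if the header says so."""
    if headers.get("Content-Encoding", "").lower() in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body
```

The client behaves identically in all three cases; only the mislabeled response leaves the user with the inner tar instead of A.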
Comment 6, Mar 3 2016
Also see the "Automatic Decompression" section at https://redmine.lighttpd.net/projects/1/wiki/Docs_ModSetEnv: it confirms that the expected behavior on the browser's part is to decompress the response if a Content-Encoding: gzip header is present.
Comment 7, Mar 5 2016
In Comment #4, alupu01 asks:
1. Where, in any HTTP spec, do you see stated/spec'd that "the browser is, by the HTTP spec, supposed to uncompress it"?
In Comment #5, b...@chromium.org replies:
1. "When present, its value indicates what additional content codings have been applied to the entity-body, and thus what decoding mechanisms must be applied in order to obtain the media-type referenced by the Content-Type header field."
------
WHERE does it specify that the BROWSER (of all possible entities) MUST apply the decoding mechanism in order to obtain the media-type? It just says "must be applied". Everybody other than Chrome(*ium) understands that if "decoding must be applied", it is applied at the user's end, by the user. For decompressing a 'gz' file, the last thing a user needs is a _browser_, when 'gzip' is one of the most widely available utilities; and, ironically, as I mentioned in my Comment #4, point 5, 'tar' itself performs this function for free.
Ever since "coding" was first applied ('compress' in early Unix systems, then 'gzip', and later 'bzip2' and 'xz'), the "decoding" (i.e., decompression) has been applied at the user's end, _by the user_. All the other browsers (Firefox, IE, Opera, Safari, ETC.) have understood this simple and obvious fact since the early days (please note the date of RFC 2616, June 1999, when 'gzip' was the usual choice; 'bzip2' was only a few years old, and 'xz', so widely used now, did not exist yet).
At the very least, a 'gz' file decompressed by the Chrome browser (WRONG anyway, as I've been saying all along) should arrive at the user's end WITHOUT the ".gz" extension, so as to give the user at least a (late) WARNING that the SECURITY of the file is in jeopardy, since the file now fails any checksums the author of the (compressed) file provided.
Comment 8, Apr 9 2016
Update

The GOOD
Somebody at Google finally read RFC 2616 correctly, and now the file 'rsync-3.1.2.tar.gz' (this bug submission) and a few other "gz" files I spot-checked are downloaded "as is" (i.e., not decompressed), as they should be (I told you so :). Now, on arrival, 'rsync-3.1.2.tar.gz' has size 892724 bytes (A-OK) with both the latest 51.0.2700.0 Dev and 49.0.2623.110 Chrome Stable.

The BAD
Some files, like 'openssh-7.2p2.tar.gz', are still decompressed at times on download (at the whim of the browser, it seems, on both latest versions above):
Go to 'http://ftp.openbsd.org/pub/OpenBSD/OpenSSH/portable/'
Click (left mouse button, to download the file) on 'openssh-7.2p2.tar.gz'
Size on arrival: 7249920 bytes (decompressed, BAD)
Go to 'ftp://ftp.openbsd.org/pub/OpenBSD/OpenSSH/portable/'
Click (left mouse button, to download the file) on 'openssh-7.2p2.tar.gz'
Size on arrival: 1499808 bytes (in "pristine condition", GOOD)

And the (possibly) PRETTY
Once the developers move past the lame explanations (like "the server is misconfigured", "the wrong header", etc.) and spend a tenth as much time as I did to provide an air-tight submission here, they'll fix this UGLY bug too. As a reminder, Wget, Firefox and IE have always downloaded these files (whether over FTP or HTTP) correctly (you can check :).
As another reminder: like the main "rsync" fix above, this can all still be done discreetly. You provide the spiel about "bad servers, etc." for public consumption here, while behind the scenes, in a couple of months, both the 51.0.2700.0+ Dev and 49.0.2623.112+ Chrome Stable will come up with the correct file download as if by magic!
Come on guys, you can do it! I'm rooting for you! You're now over the huge hurdle: reading and understanding the RFC. Understanding and fixing this pesky little bug is nothing by comparison (explaining how the same file ends up with two different sizes depending on the type of download to someone with High School+ _is_ a bit of a challenge, though, I admit).
-- Alex
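The two openssh sizes are consistent with one copy being a decompressed tar. With default settings, GNU tar pads archives to whole records of 20 × 512 = 10240 bytes, so 3287040 = 321 × 10240 and 7249920 = 708 × 10240 look like bare tar archives, while gzip output has no such alignment. FTP has no Content-Encoding header at all, so an FTP download arrives as stored. The check below is a heuristic only (other tar implementations or blocking factors pad differently), and it infers, rather than shows, that the openbsd.org HTTP server labels the file as gzip-encoded:

```python
TAR_RECORD_SIZE = 20 * 512  # GNU tar default: blocking factor 20 x 512-byte blocks


def fills_whole_tar_records(size: int) -> bool:
    """Heuristic: an uncompressed GNU tar archive written with default
    settings is padded to a whole number of 10240-byte records."""
    return size > 0 and size % TAR_RECORD_SIZE == 0


# Sizes reported in this thread (bytes).
reported = {
    "rsync-3.1.2.tar.gz, Chrome/HTTP": 3287040,
    "rsync-3.1.2.tar.gz, Firefox": 892724,
    "openssh-7.2p2.tar.gz, Chrome/HTTP": 7249920,
    "openssh-7.2p2.tar.gz, Chrome/FTP": 1499808,
}

for label, size in reported.items():
    print(f"{label}: {size} bytes, {size / TAR_RECORD_SIZE:.2f} records,"
          f" whole records: {fills_whole_tar_records(size)}")
```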
Comment 1 by mef@chromium.org, Mar 2 2016
Status: WontFix (was: Unconfirmed)
I've reproduced this issue on Version 49.0.2623.63 beta (64-bit). It appears that the downloaded rsync-3.1.2.tar.gz file is a decompressed tar; renaming it to rsync-3.1.2.tar shows the content. This happens because the server sends a 'Content-Encoding: gzip' response header, so the browser applies the gzip filter to decompress the content:

t=24057 [st= 2] -HTTP_STREAM_REQUEST
t=24057 [st= 2] +HTTP_TRANSACTION_SEND_REQUEST [dt=0]
t=24057 [st= 2] HTTP_TRANSACTION_SEND_REQUEST_HEADERS
    --> GET /ftp/rsync/src/rsync-3.1.2.tar.gz HTTP/1.1
        Host: www.samba.org
        Connection: keep-alive
        Upgrade-Insecure-Requests: 1
        Accept-Encoding: gzip, deflate, sdch
        Accept-Language: en-US,en;q=0.8
t=24057 [st= 2] -HTTP_TRANSACTION_SEND_REQUEST
t=24057 [st= 2] +HTTP_TRANSACTION_READ_HEADERS [dt=102]
t=24057 [st= 2] HTTP_STREAM_PARSER_READ_HEADERS [dt=102]
t=24159 [st=104] HTTP_TRANSACTION_READ_RESPONSE_HEADERS
    --> HTTP/1.1 200 OK
        Date: Wed, 02 Mar 2016 19:17:03 GMT
        Server: Apache
        Content-Type: application/x-gzip
        Content-Encoding: gzip
t=24159 [st=104] -HTTP_TRANSACTION_READ_HEADERS
t=24159 [st=104] HTTP_CACHE_WRITE_INFO [dt=0]
t=24159 [st=104] HTTP_CACHE_WRITE_DATA [dt=0]
t=24159 [st=104] HTTP_CACHE_WRITE_INFO [dt=0]
t=24159 [st=104] URL_REQUEST_DELEGATE [dt=0]
t=24159 [st=104] URL_REQUEST_FILTERS_SET
    --> filters = "FILTER_TYPE_GZIP"
t=24159 [st=104] -URL_REQUEST_START_JOB

This is a server configuration issue, and the browser is working as expected.
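The response headers in the log show the combination at issue: a gzip media type plus Content-Encoding: gzip. A minimal diagnostic sketch (hypothetical helpers, not anything Chrome implements) parses such a header block and flags the pairing. It can only flag a candidate, because the pairing is legitimate when the server really did gzip the file again for transit.

```python
from typing import Dict

GZIP_MEDIA_TYPES = {"application/x-gzip", "application/gzip"}


def parse_response_headers(raw: str) -> Dict[str, str]:
    """Parse a status line plus "Name: value" lines, keyed by lowercase name."""
    lines = [line.strip() for line in raw.strip().splitlines()]
    headers: Dict[str, str] = {}
    for line in lines[1:]:  # skip the status line, e.g. "HTTP/1.1 200 OK"
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def suspect_bogus_gzip_encoding(headers: Dict[str, str]) -> bool:
    """Flag a gzip media type whose response also claims a gzip Content-Encoding."""
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    coding = headers.get("content-encoding", "").strip().lower()
    return media_type in GZIP_MEDIA_TYPES and coding in ("gzip", "x-gzip")
```

Fed the response header block from the log above, the check returns True; without the Content-Encoding line it returns False.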