FTP site parse error
Reported by
tar...@tarick.nl,
Dec 23 2016
|
|||||||
Issue descriptionChrome Version : 55.0.2883.87 (Official Build) m (32-bit) URLs (if applicable) : ftp://ftp.ni.com/pub/devzone/ Other browsers tested: Add OK or FAIL, along with the version, after other browsers where you have tested this issue: Safari: - Firefox: OK IE: OK What steps will reproduce the problem? (1) go to: ftp://ftp.ni.com/pub/devzone/ What is the expected result? a ftp website file listing What happens instead? empty page Please provide any additional information below. Attach a screenshot if possible.
,
Dec 26 2016
,
Dec 27 2016
,
Dec 28 2016
I've verified that the problem is definitely in the character set detection/conversion code. The new logic detects the encoding as Shift_JIS, and then ICU returns some sort of error. Not going to dig deeper (Not familiar with the guts of ICU, just wanted to confirm it was an issue with the new content-encoding detection).
,
Jan 3 2017
Somehow the new detector thinks the text is in Shift_JIS while it apparently is in latin. Looking now.
,
Jan 3 2017
The new encoding detector(CED) got tripped by a couple of occurrences of byte 0x82 in the ftp directory listing, and thinks that the listing is in Shift_JIS (0x82 appears very frequently in Shift_JIS-encoded char sequences where the value is used as a sort of char-opening byte). This is unfortunate limitation of the new detector. I'm afraid we should bite the bullet and just take what it does, considering the fact that no detector is perfect and its merits outweighs the shortcomings in terms of speed and accuracy (it gives the right encoding for lots of other documents which couldn't be handled by the old detector that was replaced by CED). One remedy I can think of is make the conversion less strict to return the listing even when conversion on some filenames fails. Other web browsers like Firefox also suffer the same problem and shows the filename with a broken character in it but still displays the whole listing for users to at least use other files. This is possible by switching the option param of |CodepageToUTF16| from base::OnStringConversionError::FAIL to base::OnStringConversionError::SKIP. +phajdan.jr owner of net/ftp for his thoughts.
,
Jan 4 2017
One thing to correct is that the listing is not in latin (ISO-8859-1) but simply broken. 'é' (small e acute), the letter that caused the conversion to fail, is encoded to 0x82. It should have been 0xe9, a kind of glitch that often happens due to mismatch in encoding when uploading files. Firefox also shows gibberish characters for them, and cannot download them via click.
Either the site owner or the uploader was aware of this issue - probably that's why there are a version of same files with normalized name ('e' instead of 'é') also uploaded and available for download. So displaying all the other files in the listing can get around this problem of ftp listing having files of broken encoding.
Just to vindicate CED - old encoding detector simply ignored those odd letters and detected it as latin, while CED made the best effort to deduce the right encoding with those letters as a clue and came up with Shift_JIS, which I don't think is really a bad job after all.
,
Jan 4 2017
,
Jan 4 2017
I see, so unless CED distinguish 0x82 of FTP and Shift_JIS, it won't work at all. I have no idea how IE11 and many file managers handle this, isn't it normal to revert change if new update breaks things?
,
Jan 4 2017
Depends on the significance of the breakage, and what it broke. This breakage seems minor (It's FTP, and only FTP sites sending us bad data). It broke sites sending us bad data by guessing the wrong text encoding, FTP has no specified way of figuring out encoding, so beyond being uncommon, behavior here can be no better than best effort.
,
Jan 4 2017
Not to say this shouldn't be fixed, but it doesn't seem like this is a big enough issue to be worth rushing a fix to stable channel.
,
Jan 4 2017
I got the point, thanks for reply. So if there is really Shift_JIS in FTP file list, like in my case, fix for CED might be impossible. 2017-01-04 23:43 GMT+09:00 mme… via monorail < monorail+v2.3079273537@chromium.org>:
,
Jan 4 2017
With real Shift_JIS files in the list, the new detector may actually be working better, it's just that our old error handling code is causing problems. So rather than a revert, jinsukkim's suggestion of improving how we handle decoding errors would be a much better fix than just reverting the change.
,
Jan 5 2017
Hi, I was the one who encountered the error and I am myself a programmer so thought I might chip as well, considering the comprehensive discussion that followed. I do agree that reverting the changes would be way to mayor for such a small bug and indeed just changing from ::fail to ::skip would fix the issue. It is true that when a webmaster doesn't notice the special characters and doesn't upload a copy without the special characters, it would still fail to produce the downloadable links. But that is preferred over no page showing up at all in my book.
,
Jan 6 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ed3ef90ecc3c01246a3d838fbeefdab72c94c228 commit ed3ef90ecc3c01246a3d838fbeefdab72c94c228 Author: Jinsuk Kim <jinsukkim@chromium.org> Date: Fri Jan 06 02:56:52 2017 Make FTP directory parser less strict FTP directory listing returns empty result if encoding detection or conversion fails. This CL puts less strict constraint on conversion so that the whole list can survive some filenames with broken chars and be shown to users. Other browsers seem to work this way. The broken chars can happen due to wrong encoding detection, which cannot be avoided 100%, as reported in the bug. BUG= 676762 R=mmenke@chromium.org, phajdan.jr@chromium.org Review-Url: https://codereview.chromium.org/2608213002 . Cr-Commit-Position: refs/heads/master@{#441856} [add] https://crrev.com/ed3ef90ecc3c01246a3d838fbeefdab72c94c228/net/data/ftp/dir-listing-ls-34 [add] https://crrev.com/ed3ef90ecc3c01246a3d838fbeefdab72c94c228/net/data/ftp/dir-listing-ls-34.expected [modify] https://crrev.com/ed3ef90ecc3c01246a3d838fbeefdab72c94c228/net/ftp/ftp_directory_listing_parser.cc [modify] https://crrev.com/ed3ef90ecc3c01246a3d838fbeefdab72c94c228/net/ftp/ftp_directory_listing_parser_unittest.cc
,
Jan 6 2017
,
Jan 6 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9c484365968491ca416461d7c982aca155ff26bf commit 9c484365968491ca416461d7c982aca155ff26bf Author: loyso <loyso@chromium.org> Date: Fri Jan 06 03:58:18 2017 Revert of Make FTP directory parser less strict (patchset #3 id:40001 of https://codereview.chromium.org/2608213002/ ) Reason for revert: Fails on all platforms. https://uberchromegw.corp.google.com/i/chromium.win/builders/Win7%20Tests%20%281%29/builds/62042 Original issue's description: > Make FTP directory parser less strict > > FTP directory listing returns empty result if encoding detection or > conversion fails. This CL puts less strict constraint on conversion > so that the whole list can survive some filenames with broken chars > and be shown to users. Other browsers seem to work this way. > > The broken chars can happen due to wrong encoding detection, which > cannot be avoided 100%, as reported in the bug. > > BUG= 676762 > R=mmenke@chromium.org, phajdan.jr@chromium.org > > Review-Url: https://codereview.chromium.org/2608213002 . > Cr-Commit-Position: refs/heads/master@{#441856} > Committed: https://chromium.googlesource.com/chromium/src/+/ed3ef90ecc3c01246a3d838fbeefdab72c94c228 TBR=phajdan.jr@chromium.org,mmenke@chromium.org,jinsukkim@chromium.org # Skipping CQ checks because original CL landed less than 1 days ago. NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG= 676762 Review-Url: https://codereview.chromium.org/2614103002 Cr-Commit-Position: refs/heads/master@{#441866} [delete] https://crrev.com/b4271bc30bd0c92be3fcf3c4cc92737edc55e7fb/net/data/ftp/dir-listing-ls-34 [delete] https://crrev.com/b4271bc30bd0c92be3fcf3c4cc92737edc55e7fb/net/data/ftp/dir-listing-ls-34.expected [modify] https://crrev.com/9c484365968491ca416461d7c982aca155ff26bf/net/ftp/ftp_directory_listing_parser.cc [modify] https://crrev.com/9c484365968491ca416461d7c982aca155ff26bf/net/ftp/ftp_directory_listing_parser_unittest.cc
,
Jan 6 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/9276998c3811db276e04aff731b8aebfdae8ce71 commit 9276998c3811db276e04aff731b8aebfdae8ce71 Author: Jinsuk Kim <jinsukkim@chromium.org> Date: Fri Jan 06 21:48:20 2017 Landing again: "Make FTP directory parser less strict" The new testfiles added in the CL had a problem. Fixed it now. This reverts commit 9c484365968491ca416461d7c982aca155ff26bf. BUG= 676762 R=mmenke@chromium.org Review-Url: https://codereview.chromium.org/2613093003 . Cr-Commit-Position: refs/heads/master@{#442064} [add] https://crrev.com/9276998c3811db276e04aff731b8aebfdae8ce71/net/data/ftp/dir-listing-ls-34 [add] https://crrev.com/9276998c3811db276e04aff731b8aebfdae8ce71/net/data/ftp/dir-listing-ls-34.expected [modify] https://crrev.com/9276998c3811db276e04aff731b8aebfdae8ce71/net/ftp/ftp_directory_listing_parser.cc [modify] https://crrev.com/9276998c3811db276e04aff731b8aebfdae8ce71/net/ftp/ftp_directory_listing_parser_unittest.cc |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by sureshkumari@chromium.org
, Dec 26 2016Labels: -Type-Bug -Pri-3 hasbisect-per-revision M-57 OS-Linux OS-Mac OS-Windows Pri-1 Type-Bug-Regression
Owner: jinsuk...@chromium.org