New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 676762 link

Starred by 4 users

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux , Windows , Mac
Pri: 1
Type: Bug-Regression



Sign in to add a comment

FTP site parse error

Reported by tar...@tarick.nl, Dec 23 2016

Issue description

Chrome Version       : 55.0.2883.87 (Official Build) m (32-bit)
URLs (if applicable) : ftp://ftp.ni.com/pub/devzone/
Other browsers tested:
  Add OK or FAIL, along with the version, after other browsers where you
have tested this issue:
     Safari: -
    Firefox: OK
         IE: OK

What steps will reproduce the problem?
(1) go to: ftp://ftp.ni.com/pub/devzone/

What is the expected result?
a ftp website file listing

What happens instead?
empty page

Please provide any additional information below. Attach a screenshot if
possible.

 
Components: Blink>HTML>Parser
Labels: -Type-Bug -Pri-3 hasbisect-per-revision M-57 OS-Linux OS-Mac OS-Windows Pri-1 Type-Bug-Regression
Owner: jinsuk...@chromium.org
Able to reproduce the issue on Windows 7, Mac 10.12.2 and Linux-Ubuntu-14.04  using chrome stable version #55.0.2883.87 and latest canary #57.0.2962.0.

This is regression issue broken in M54.Please find the bisect information as below

Bisect Information:
=====================
Good build: 54.0.2825.0  Revision(410913)
Bad Build : 54.0.2826.0  Revision(411209)

Change Log URL:
--------------
https://chromium.googlesource.com/chromium/src/+log/713aedbdd27d1bd0d431a02220748c740b471af9..a160544b475601684a4f88677f2077cd5312f8ff

Possible suspect:
----------------
https://chromium.googlesource.com/chromium/src/+/70de1704feea35840fbed4bc1e864a24b620569e

Review-Url:https://codereview.chromium.org/2168003003

Jinsuk Kim@  could you please look into this issue if it is related to your change,else please help us in finding the appropriate owner for this issue.

Thanks...!!

Status: Assigned (was: Unconfirmed)

Comment 3 by tkent@chromium.org, Dec 27 2016

Components: -Blink>HTML>Parser Internals>Network>FTP

Comment 4 by mmenke@chromium.org, Dec 28 2016

Cc: mmenke@chromium.org
I've verified that the problem is definitely in the character set detection/conversion code.  The new logic detects the encoding as Shift_JIS, and then ICU returns some sort of error.

Not going to dig deeper (Not familiar with the guts of ICU, just wanted to confirm it was an issue with the new content-encoding detection).
Status: Started (was: Assigned)
Somehow the new detector thinks the text is in Shift_JIS while it apparently is in latin. Looking now.
Cc: phajdan.jr@chromium.org
The new encoding detector(CED) got tripped by a couple of occurrences of byte 0x82 in the ftp directory listing, and thinks that the listing is in Shift_JIS (0x82 appears very frequently in Shift_JIS-encoded char sequences where the value is used as a sort of char-opening byte).

This is unfortunate limitation of the new detector. I'm afraid we should bite the bullet and just take what it does, considering the fact that no detector is perfect and its merits outweighs the shortcomings in terms of speed and accuracy (it gives the right encoding for lots of other documents which couldn't be handled by the old detector that was replaced by CED).  

One remedy I can think of is make the conversion less strict to return the listing even when conversion on some filenames fails. Other web browsers like Firefox also suffer the same problem and shows the filename with a broken character in it but still displays the whole listing for users to at least use other files.

This is possible by switching the option param of |CodepageToUTF16| from base::OnStringConversionError::FAIL to base::OnStringConversionError::SKIP.

+phajdan.jr owner of net/ftp for his thoughts.

One thing to correct is that the listing is not in latin (ISO-8859-1) but simply broken. 'é' (small e acute), the letter that caused the conversion to fail, is encoded to 0x82. It should have been 0xe9, a kind of glitch that often happens due to mismatch in encoding when uploading files. Firefox also shows gibberish characters for them, and cannot download them via click.

Either the site owner or the uploader was aware of this issue - probably that's why there are a version of same files with normalized name ('e' instead of 'é') also uploaded and available for download. So displaying all the other files in the listing can get around this problem of ftp listing having files of broken encoding.

Just to vindicate CED - old encoding detector simply ignored those odd letters and detected it as latin, while CED made the best effort to deduce the right encoding with those letters as a clue and came up with Shift_JIS, which I don't think is really a bad job after all.
Cc: jinsuk...@chromium.org
 Issue 673781  has been merged into this issue.
I see, so unless CED distinguish 0x82 of FTP and Shift_JIS, it won't work at all.
I have no idea how IE11 and many file managers handle this, isn't it normal to revert change if new update breaks things?
Depends on the significance of the breakage, and what it broke.  This breakage seems minor (It's FTP, and only FTP sites sending us bad data).

It broke sites sending us bad data by guessing the wrong text encoding, FTP has no specified way of figuring out encoding, so beyond being uncommon, behavior here can be no better than best effort.
Not to say this shouldn't be fixed, but it doesn't seem like this is a big enough issue to be worth rushing a fix to stable channel.
I got the point, thanks for reply.
So if there is really Shift_JIS in FTP file list, like in my case, fix for
CED might be impossible.

2017-01-04 23:43 GMT+09:00 mme… via monorail <
monorail+v2.3079273537@chromium.org>:
With real Shift_JIS files in the list, the new detector may actually be working better, it's just that our old error handling code is causing problems.  So rather than a revert, jinsukkim's suggestion of improving how we handle decoding errors would be a much better fix than just reverting the change.

Comment 14 by tar...@tarick.nl, Jan 5 2017

Hi, 
I was the one who encountered the error and I am myself a programmer so thought I might chip as well, considering the comprehensive discussion that followed.

I do agree that reverting the changes would be way to mayor for such a small bug and indeed just changing from ::fail to ::skip would fix the issue.

It is true that when a webmaster doesn't notice the special characters and doesn't upload a copy without the special characters, it would still fail to produce the downloadable links. But that is preferred over no page showing up at all in my book.
Project Member

Comment 15 by bugdroid1@chromium.org, Jan 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ed3ef90ecc3c01246a3d838fbeefdab72c94c228

commit ed3ef90ecc3c01246a3d838fbeefdab72c94c228
Author: Jinsuk Kim <jinsukkim@chromium.org>
Date: Fri Jan 06 02:56:52 2017

Make FTP directory parser less strict

FTP directory listing returns empty result if encoding detection or
conversion fails. This CL puts less strict constraint on conversion
so that the whole list can survive some filenames with broken chars
and be shown to users. Other browsers seem to work this way.

The broken chars can happen due to wrong encoding detection, which
cannot be avoided 100%, as reported in the bug.

BUG= 676762 
R=mmenke@chromium.org, phajdan.jr@chromium.org

Review-Url: https://codereview.chromium.org/2608213002 .
Cr-Commit-Position: refs/heads/master@{#441856}

[add] https://crrev.com/ed3ef90ecc3c01246a3d838fbeefdab72c94c228/net/data/ftp/dir-listing-ls-34
[add] https://crrev.com/ed3ef90ecc3c01246a3d838fbeefdab72c94c228/net/data/ftp/dir-listing-ls-34.expected
[modify] https://crrev.com/ed3ef90ecc3c01246a3d838fbeefdab72c94c228/net/ftp/ftp_directory_listing_parser.cc
[modify] https://crrev.com/ed3ef90ecc3c01246a3d838fbeefdab72c94c228/net/ftp/ftp_directory_listing_parser_unittest.cc

Status: Fixed (was: Started)
Project Member

Comment 17 by bugdroid1@chromium.org, Jan 6 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9c484365968491ca416461d7c982aca155ff26bf

commit 9c484365968491ca416461d7c982aca155ff26bf
Author: loyso <loyso@chromium.org>
Date: Fri Jan 06 03:58:18 2017

Revert of Make FTP directory parser less strict (patchset #3 id:40001 of https://codereview.chromium.org/2608213002/ )

Reason for revert:
Fails on all platforms.

https://uberchromegw.corp.google.com/i/chromium.win/builders/Win7%20Tests%20%281%29/builds/62042

Original issue's description:
> Make FTP directory parser less strict
>
> FTP directory listing returns empty result if encoding detection or
> conversion fails. This CL puts less strict constraint on conversion
> so that the whole list can survive some filenames with broken chars
> and be shown to users. Other browsers seem to work this way.
>
> The broken chars can happen due to wrong encoding detection, which
> cannot be avoided 100%, as reported in the bug.
>
> BUG= 676762 
> R=mmenke@chromium.org, phajdan.jr@chromium.org
>
> Review-Url: https://codereview.chromium.org/2608213002 .
> Cr-Commit-Position: refs/heads/master@{#441856}
> Committed: https://chromium.googlesource.com/chromium/src/+/ed3ef90ecc3c01246a3d838fbeefdab72c94c228

TBR=phajdan.jr@chromium.org,mmenke@chromium.org,jinsukkim@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= 676762 

Review-Url: https://codereview.chromium.org/2614103002
Cr-Commit-Position: refs/heads/master@{#441866}

[delete] https://crrev.com/b4271bc30bd0c92be3fcf3c4cc92737edc55e7fb/net/data/ftp/dir-listing-ls-34
[delete] https://crrev.com/b4271bc30bd0c92be3fcf3c4cc92737edc55e7fb/net/data/ftp/dir-listing-ls-34.expected
[modify] https://crrev.com/9c484365968491ca416461d7c982aca155ff26bf/net/ftp/ftp_directory_listing_parser.cc
[modify] https://crrev.com/9c484365968491ca416461d7c982aca155ff26bf/net/ftp/ftp_directory_listing_parser_unittest.cc

Sign in to add a comment