Filenames in NFD are not normalized to NFC on upload
Reported by
christia...@gmail.com,
Dec 14 2016
Issue description
UserAgent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36
Example URL: https://cfm.ciscllc.de/index.cfm
Steps to reproduce the problem:
1. Upload a file with a special character in its name (ä, ü, ö, ...)
2. The script should rename it: remove blanks, replace ä with a, ö with o, ü with u, and some others
3. These characters are not replaced in Chrome (at least the latest version) and FF on macOS and OS X (tested from 10.11 and up)
Chrome on Windows and Linux works as expected, so this is a Mac-Chrome problem. Safari on the same macOS and OS X versions works!
What is the expected behavior?
Characters should be renamed. Only umlauts are NOT replaced: ä becomes a?, ö becomes o?
The server FS should not be the problem, as every other OS with Chrome and FF, and even OS X with Safari, IS working!
What went wrong?
These characters are not replaced in Chrome (at least the latest version) and FF on macOS and OS X (tested from 10.11 and up)
Does it occur on multiple sites: Yes
Is it a problem with a plugin? N/A
Did this work before? N/A
Does this work in other browsers? Yes
Chrome version: 54.0.2840.99 Channel: stable
OS Version: 10.11
Flash Version: Shockwave Flash 24.0 r0
,
Dec 15 2016
This is a server-side issue. All browsers except Safari send filenames without any modification. Only Safari normalizes filenames into NFC form, and the server doesn't take care of NFD filenames.
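For illustration, a minimal sketch of what handling NFD filenames on the server could look like, assuming a Python backend. The function name `sanitize_filename` and the replacement table are hypothetical and only mirror the renaming rules described in the report; they are not the reporter's actual script.

```python
import unicodedata

# Hypothetical replacement table modelled on the report:
# drop blanks and map umlauts to their base letters.
REPLACEMENTS = {"ä": "a", "ö": "o", "ü": "u", " ": ""}

def sanitize_filename(name: str) -> str:
    """Normalize to NFC first, then apply the character replacements.

    Without the NFC step, an NFD "ä" arrives as "a" + U+0308 (combining
    diaeresis), so a lookup for the precomposed "ä" never matches.
    """
    nfc = unicodedata.normalize("NFC", name)
    return "".join(REPLACEMENTS.get(ch, ch) for ch in nfc)
```

With this in place, a name uploaded from macOS as `"a\u0308 b.txt"` and the same name sent precomposed as `"\u00e4 b.txt"` both end up as `ab.txt`.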
,
Feb 14 2017
I'm afraid we can't leave it up to web servers. Even Gmail doesn't take care of this issue. Yesterday, I attached two files with Korean names from a Mac, and the recipient on Windows was puzzled by the file names in NFD.
Moreover, no file system other than the Mac OS filesystem converts NFD to NFC, so a filename sent from a Mac in NFD takes 2 to 3 times as many characters as NFC when saved to disk on Windows/Linux/Android. '한글' takes two code points in NFC, while in NFD it takes six code points (and can be shown as ᄒ ᅡ ᆫ ᄀ ᅳ ᆯ).
I don't know about today, but Firefox/Mozilla certainly did normalize filenames to NFC in the mid-2000s, because that's what I coded back then. [1] I'm curious why Firefox has changed its behavior since. And I thought Chrome at one point (perhaps because WebKit did?) normalized filenames to NFC on upload.
Perhaps this issue has to be discussed at W3C/WHATWG.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=227547
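The code point counts above can be checked with a short Python sketch. The helper name `code_points` is made up for this example; the Hangul string is written with escapes so that the literal is guaranteed to be in NFC regardless of how this page is stored.

```python
import unicodedata

# '한글' written as explicit NFC code points (U+D55C, U+AE00).
HANGEUL_NFC = "\ud55c\uae00"

def code_points(text: str, form: str) -> list[str]:
    """Return the code points of `text` after normalizing to `form`."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize(form, text)]

nfc = code_points(HANGEUL_NFC, "NFC")  # 2 precomposed syllables
nfd = code_points(HANGEUL_NFC, "NFD")  # 6 conjoining jamo

print(len(nfc), nfc)
print(len(nfd), nfd)
```

Each precomposed syllable decomposes into three jamo here (leading consonant, vowel, trailing consonant), which is where the factor of three comes from. Syllables without a trailing consonant decompose into two.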
,
Feb 21 2017
We stopped normalizing to NFC some years ago because of the inconsistency with File.name, which was not NFC. The options are:
A) No normalization for file names in FormData and File.name
B) NFC for file names in FormData and File.name
C) NFC for file names in FormData, no normalization for File.name
Firefox and Chrome are A, WebKit is C. IMO, both A and B are acceptable. If we don't expose NFD file names at all, it's a user-invisible implementation detail.
Kinuko-san, did you have NFC/NFD issues with the filesystem API?
,
Feb 24 2017
,
Apr 5 2017
,
Jun 30 2017
kinuko@, tzik@, do you have any insight into file name normalization?
,
Jun 30 2017
,
Oct 25 2017
,
Dec 27 2017
Kinuko-san, whom should we consult to make progress on this? This issue has been stalled for a long time.
,
Jan 9 2018
Would it be possible to normalize all parts that are not in the CJK compatibility ideographs block of Unicode to NFC? This seems to be (very approximately) what is needed to "undo" macOS decomposition without making breaking changes to unified characters used in names.
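A rough sketch of that idea in Python, to make the proposal concrete. The function names are hypothetical, and treating "the CJK compatibility ideographs block" as U+F900..U+FAFF plus the supplement at U+2F800..U+2FA1F is an assumption of this example, not something stated in the comment. Normalizing each run separately is also a simplification.

```python
import unicodedata
from itertools import groupby

def is_cjk_compat_ideograph(ch: str) -> bool:
    """Assumed protected ranges: CJK Compatibility Ideographs and Supplement."""
    cp = ord(ch)
    return 0xF900 <= cp <= 0xFAFF or 0x2F800 <= cp <= 0x2FA1F

def selective_nfc(name: str) -> str:
    """NFC-normalize everything except runs of CJK compatibility ideographs."""
    parts = []
    for protected, run in groupby(name, key=is_cjk_compat_ideograph):
        text = "".join(run)
        # Protected runs are copied through unchanged; plain NFC would
        # replace them with their unified ideographs.
        parts.append(text if protected else unicodedata.normalize("NFC", text))
    return "".join(parts)
```

For example, plain NFC maps U+F900 to the unified ideograph U+8C48, whereas `selective_nfc("a\u0308\uf900")` recomposes the umlaut to U+00E4 and leaves U+F900 as it is.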
,
Jan 9 2018
Comment 1 by nyerramilli@chromium.org
, Dec 14 2016