toUpperCase for Georgian strings returns modified strings
Reported by
w.fi...@gmail.com,
Aug 1
Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3508.0 Safari/537.36

Steps to reproduce the problem:
1. Type into the console: 'ქართული'.charCodeAt(0)
2. Then type: 'ქართული'.toUpperCase().charCodeAt(0)
Or use this jsfiddle: https://jsfiddle.net/dg3o79jc/

What is the expected behavior?
Character codes should be equal.

What went wrong?
The character codes for standard and uppercased strings are different.

Did this work before? Yes 68.0.3440.75

Chrome version: 70.0.3508.0 Channel: canary
OS Version: 10.0
Flash Version:
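For reference, the two console steps can be combined into one snippet. This is only a sketch of the reproduction; the variable names are illustrative and not part of the original report. The second value is not annotated because it depends on the Unicode data shipped with the engine.

```javascript
// Reproduction sketch: compare the first code unit before and after toUpperCase().
const word = 'ქართული';

const before = word.charCodeAt(0);
const after = word.toUpperCase().charCodeAt(0);

console.log(before.toString(16)); // 10e5 (Georgian letter ქ)
console.log(after.toString(16));  // engine-dependent: differs from 10e5 once the ICU update is in
console.log(before === after);    // what the report expected to be true
```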
,
Aug 1
Bisected to e9126f1d03725c2ae97d524985971d66089eede3 "Update ICU to 62.1"
Landed in 69.0.3488.0

I think the codes should be different, as per the specification that Chrome now honors. However, the current output in Chrome doesn't make sense and cannot even be displayed: all I see is 7 empty boxes.

I would expect to see ႵႠႰႧႳႪႨ as a result of 'ქართული'.toUpperCase()
I would expect to see 4277 as a result of 'ქართული'.toUpperCase().charCodeAt(0)

https://unicode.org/charts/PDF/U10A0.pdf
ქ code is 10e5 (4325)
ქ uppercase is Ⴕ
Ⴕ code is 10b5 (4277)
,
Aug 2
Thanks for filing the issue!

C#2 @woxxom: Your inputs were very helpful.

Able to reproduce the issue on reported chrome version 68.0.3440.75 and on the latest canary 70.0.3508.0 using Mac 10.13.1, Ubuntu 17.10 and Windows 10.

Bisect Information:
-----------------------
Good Build: 69.0.3487.0
Bad Build: 69.0.3488.0

Change log from Omahaproxy: https://chromium.googlesource.com/chromium/src/+log/69.0.3487.0..69.0.3488.0?pretty=fuller&n=10000
Suspecting: https://chromium.googlesource.com/chromium/src/+/e9126f1d03725c2ae97d524985971d66089eede3
Review URL: https://chromium-review.googlesource.com/1111818

@Jungshik Shin: Please help in assigning it to the right owner if this is not related to your change.

Note: Adding RB-Stable for M-69 as this seems to be a recent regression, please remove if not required.
,
Aug 2
You got empty boxes because you don't have a font to cover newly encoded Georgian uppercase letters. re comment 2: Who said that the uppercase of U+10E5 is U+10B5? Its uppercase is U+1CA5 as Chrome correctly gives you. See https://www.unicode.org/charts/PDF/U10A0.pdf
,
Aug 2
The above PDF has the following:

1. U+10A0 - U+10C5 Capital letters (Khutsuri)
This is the uppercase of the old ecclesiastical alphabet. The style shown in the code charts is known as Asomtavruli. See the Georgian Supplement block for lowercase Nuskhuri.

2. U+10D0 - U+10F0 Mkhedruli
This is the lowercase of the modern secular alphabet. Modern Georgian orthography uses these letters for most text, including at the beginnings of sentences and names. See the Georgian Extended block for uppercase Mtavruli.

See also https://www.unicode.org/charts/PDF/U1C90.pdf that has the following:

U+1C90 - U+1CBF Capital letters (Mtavruli)
This is the special uppercase of the modern secular alphabet. Modern Georgian orthography uses these letters to emphasize words and phrases analogously to Latin "all caps" style. See the Georgian block for lowercase Mkhedruli.
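The relationship between the two blocks can be checked with a small sketch. It assumes a constant offset of 0x0BC0 between a Mkhedruli letter and its Mtavruli counterpart (for example U+10E5 to U+1CA5, as quoted earlier in this thread), and assumes the cased letters are U+10D0 to U+10FA plus U+10FD to U+10FF. The helper name expectedMtavruli is hypothetical and is not an ICU or V8 API.

```javascript
// Assumption: Mtavruli capitals sit at a fixed offset from Mkhedruli letters.
const MTAVRULI_OFFSET = 0x1C90 - 0x10D0; // 0x0BC0

// Hypothetical helper: maps Mkhedruli letters to Mtavruli, leaves everything else alone.
function expectedMtavruli(text) {
  let out = '';
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    // Assumed cased ranges; U+10FB and U+10FC are skipped on purpose.
    const isMkhedruli =
      (cp >= 0x10D0 && cp <= 0x10FA) || (cp >= 0x10FD && cp <= 0x10FF);
    out += isMkhedruli ? String.fromCodePoint(cp + MTAVRULI_OFFSET) : ch;
  }
  return out;
}

const georgianWord = 'ქართული';
const expected = expectedMtavruli(georgianWord);

console.log(expected.codePointAt(0).toString(16)); // 1ca5
// True only on engines whose case data includes the Unicode 11 Georgian changes.
console.log(georgianWord.toUpperCase() === expected);
```

Without a font that covers the Georgian Extended block, the result is still rendered as empty boxes even though the code points are correct.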
,
Aug 2
See also http://unicode.org/versions/Unicode11.0.0/ for Georgian changes.
Comment 1 by swarnasree.mukkala@chromium.org
, Aug 1