
Issue 869770


Issue metadata

Status: WontFix
Owner:
Closed: Aug 2
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux, Windows, Mac
Pri: 1
Type: Bug




toUpperCase for Georgian strings returns modified strings

Reported by w.fi...@gmail.com, Aug 1

Issue description

UserAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3508.0 Safari/537.36

Steps to reproduce the problem:
1. Type into the console: 'ქართული'.charCodeAt(0)
2. Then type: 'ქართული'.toUpperCase().charCodeAt(0)

Or use this jsfiddle: https://jsfiddle.net/dg3o79jc/
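
The two console steps above can also be run as a single script. This is only a minimal sketch: the helper name `firstCode` is hypothetical and not part of the report, and whether the two codes differ depends on the Unicode data bundled with the engine, so no output is shown here.

```javascript
// Hypothetical helper: UTF-16 code unit of the first character, as "U+XXXX".
function firstCode(str) {
  return 'U+' + str.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0');
}

const original = 'ქართული';
const uppercased = original.toUpperCase();

// The second value depends on the Unicode/ICU data the engine ships with.
console.log(firstCode(original), firstCode(uppercased));
console.log(
  original.charCodeAt(0) === uppercased.charCodeAt(0)
    ? 'codes are equal'
    : 'codes differ'
);
```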

What is the expected behavior?
Character codes should be equal.

What went wrong?
The character codes for standard and uppercased strings are different.

Did this work before? Yes 68.0.3440.75

Chrome version: 70.0.3508.0  Channel: canary
OS Version: 10.0
Flash Version:
 
Labels: Needs-Bisect Needs-Triage-M70
Bisected to e9126f1d03725c2ae97d524985971d66089eede3
"Update ICU to 62.1"
Landed in 69.0.3488.0

I think the codes should differ, as per the specification, which Chrome now honors.
However, the current output in Chrome doesn't make sense and can't even be displayed: all I see is 7 empty boxes.

I would expect to see ႵႠႰႧႳႪႨ as a result of 'ქართული'.toUpperCase()
I would expect to see 4277 as a result of 'ქართული'.toUpperCase().charCodeAt(0)

https://unicode.org/charts/PDF/U10A0.pdf

ქ code is 10e5 (4325)
ქ uppercase is Ⴕ
Ⴕ code is 10b5 (4277)

Cc: vamshi.kommuri@chromium.org
Components: -Blink Blink>Fonts
Labels: -Pri-2 -Needs-Bisect ReleaseBlock-Stable Triaged-ET M-69 Target-70 RegressedIn-69 hasbisect FoundIn-70 Target-69 FoundIn-69 OS-Linux OS-Mac Pri-1
Owner: js...@chromium.org
Status: Assigned (was: Unconfirmed)
Thanks for filing the issue!
C#2 @woxxom: Your inputs were very helpful.

Able to reproduce the issue on the reported Chrome version 68.0.3440.75 and on the latest canary 70.0.3508.0 on Mac 10.13.1, Ubuntu 17.10, and Windows 10.


Bisect Information:
-----------------------
Good Build: 69.0.3487.0
Bad Build:  69.0.3488.0

Change log from Omahaproxy:
https://chromium.googlesource.com/chromium/src/+log/69.0.3487.0..69.0.3488.0?pretty=fuller&n=10000
Suspecting: https://chromium.googlesource.com/chromium/src/+/e9126f1d03725c2ae97d524985971d66089eede3
Review URL: https://chromium-review.googlesource.com/1111818

@Jungshik Shin: Please help in assigning it to the right owner if this is not related to your change.
Note: Adding RB-Stable for M-69 as this seems to be a recent regression; please remove if not required.
Components: -Blink>Fonts Blink>CSS
Status: WontFix (was: Assigned)
You got empty boxes because you don't have a font that covers the newly encoded Georgian uppercase letters.

re comment 2:

Who said that the uppercase of U+10E5 is U+10B5? Its uppercase is U+1CA5, as Chrome correctly gives you.
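
The U+10E5 to U+1CA5 mapping can be checked with plain arithmetic, independent of the engine's ICU version. The sketch below assumes that the Georgian Extended block mirrors the Georgian block at a fixed offset (U+10D0 to U+1C90) and that the mapped ranges are U+10D0..U+10FA and U+10FD..U+10FF; those range bounds and the names `MTAVRULI_OFFSET`, `mtavruliUppercase` and `hex` are assumptions for illustration, not taken from Chrome or ICU source.

```javascript
// Assumed fixed offset between Mkhedruli (U+10D0...) and Mtavruli (U+1C90...).
const MTAVRULI_OFFSET = 0x1C90 - 0x10D0; // 0x0BC0

// Hypothetical helper: uppercase a Mkhedruli code point to Mtavruli.
function mtavruliUppercase(codePoint) {
  // Assumed ranges with an uppercase counterpart; everything else is unchanged.
  const inMainRange = codePoint >= 0x10D0 && codePoint <= 0x10FA;
  const inTailRange = codePoint >= 0x10FD && codePoint <= 0x10FF;
  return inMainRange || inTailRange ? codePoint + MTAVRULI_OFFSET : codePoint;
}

// Format a code point as "U+XXXX".
function hex(codePoint) {
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

console.log(hex(mtavruliUppercase(0x10E5))); // U+1CA5
console.log(0x10E5, 0x1CA5, 0x10B5);         // 4325 7333 4277
```

U+10B5 (4277) sits in the Asomtavruli range of the older ecclesiastical alphabet, which is why it is not the result of uppercasing U+10E5.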

See https://www.unicode.org/charts/PDF/U10A0.pdf
The above PDF has the following:

1. U+10A0 - U+10C5
Capital letters (Khutsuri) 

This is the uppercase of the old ecclesiastical alphabet. The
style shown in the code charts is known as Asomtavruli. See
the Georgian Supplement block for lowercase Nuskhuri.

2. U+10D0 - U+10F0
Mkhedruli
This is the lowercase of the modern secular alphabet. Modern
Georgian orthography uses these letters for most text,
including at the beginnings of sentences and names. See the
Georgian Extended block for uppercase Mtavruli.

See also https://www.unicode.org/charts/PDF/U1C90.pdf, which has the following:
U+1C90 - U+1CBF
Capital letters (Mtavruli)
This is the special uppercase of the modern secular alphabet.
Modern Georgian orthography uses these letters to
emphasize words and phrases analogously to Latin "all caps"
style. See the Georgian block for lowercase Mkhedruli.
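
To see which of the quoted ranges a given character falls into, a small lookup is enough. This is a sketch that uses only the three ranges quoted from the charts above; the names `GEORGIAN_RANGES` and `georgianRangeName` are hypothetical, and code points outside those ranges (including other Georgian letters) are simply reported as not covered.

```javascript
// Ranges copied from the chart excerpts quoted in this comment.
const GEORGIAN_RANGES = [
  { start: 0x10A0, end: 0x10C5, name: 'Asomtavruli capital (Khutsuri)' },
  { start: 0x10D0, end: 0x10F0, name: 'Mkhedruli' },
  { start: 0x1C90, end: 0x1CBF, name: 'Mtavruli capital' },
];

// Hypothetical helper: name the quoted range containing the first code point.
function georgianRangeName(char) {
  const cp = char.codePointAt(0);
  const match = GEORGIAN_RANGES.find((r) => cp >= r.start && cp <= r.end);
  return match ? match.name : 'not in the quoted ranges';
}

console.log(georgianRangeName('\u10E5')); // Mkhedruli
console.log(georgianRangeName('\u10B5')); // Asomtavruli capital (Khutsuri)
console.log(georgianRangeName('\u1CA5')); // Mtavruli capital
```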
Labels: -Type-Bug-Regression Type-Bug
See also http://unicode.org/versions/Unicode11.0.0/ for Georgian changes.
Labels: -ReleaseBlock-Stable -M-69 -RegressedIn-69 -Target-69 -Target-70 -Needs-Triage-M70
Components: -Blink>CSS Blink>JavaScript>Internationalization
