
Issue 747649

Starred by 2 users

Issue metadata

Status: WontFix
Owner:
Closed: Jul 2017
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug




maxlength attribute on text inputs doesn't handle surrogate pair characters properly

Reported by edwardke...@gmail.com, Jul 22 2017

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3155.0 Safari/537.36

Steps to reproduce the problem:
1. Create a textarea element with maxlength=1
2. Paste 👨 into the textarea

What is the expected behavior?
The character 👨 should be pasted into the textarea

What went wrong?
The character 👨 is not pasted in the textarea

Did this work before? N/A 

Chrome version: 61.0.3155.0  Channel: stable
OS Version: OS X 10.12.1
Flash Version: 

The character 👨 is encoded as a surrogate pair in UTF-16. JavaScript string length counts 16-bit UTF-16 code units rather than code points, so the browser treats the input as a 2-character string and refuses to paste it into the textarea.
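
The counting difference described above can be checked in any JavaScript console. This is only a minimal sketch; the variable names are chosen for illustration.

```javascript
// U+1F468 (the 👨 character) lies outside the Basic Multilingual Plane,
// so UTF-16 stores it as two 16-bit code units (a surrogate pair).
const man = "\u{1F468}";

// .length counts UTF-16 code units.
const codeUnits = man.length;

// The string iterator (used by the spread operator) walks code points.
const codePoints = [...man].length;

console.log(codeUnits, codePoints); // 2 1

// The two halves of the surrogate pair, in hexadecimal.
const high = man.charCodeAt(0).toString(16);
const low = man.charCodeAt(1).toString(16);
console.log(high, low); // d83d dc68
```

With maxlength=1, the length of 2 is what gets compared against the limit, which is why the paste is rejected.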

This behavior becomes tricky when you need to limit input that may contain characters outside the Basic Multilingual Plane, such as the kanji 𠮷 used in Japanese, or most emoji. It is manageable where you have access to APIs that can handle a string by its code points, such as applying the spread operator to the string or using the unicode flag with a regex match. For DOM attributes like maxlength or minlength, however, developers have no control over this behavior, and the count does not reflect the number of code points.
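
As a rough sketch of the script-side workaround mentioned above, the spread operator and the regex unicode flag can both count by code point. The helper names `countCodePoints`, `countCodePointsRegex`, and `truncateToCodePoints` are hypothetical and are not part of the DOM or of any library.

```javascript
// Hypothetical helpers for illustration only.

// Count code points with the string iterator (spread operator).
function countCodePoints(text) {
  return [...text].length;
}

// Same count with the regex unicode flag: under /u, "." matches a whole
// code point, and the s flag lets it match line terminators as well.
function countCodePointsRegex(text) {
  const matches = text.match(/./gsu);
  return matches === null ? 0 : matches.length;
}

// Keep at most `limit` code points without splitting a surrogate pair.
function truncateToCodePoints(text, limit) {
  return [...text].slice(0, limit).join("");
}

const sample = "\u{20BB7}a"; // 𠮷 followed by "a"
console.log(sample.length);                   // 3
console.log(countCodePoints(sample));         // 2
console.log(countCodePointsRegex(sample));    // 2
console.log(truncateToCodePoints(sample, 1)); // 𠮷
```

A page could call something like `truncateToCodePoints` from an `input` event handler instead of relying on maxlength, but that only approximates the attribute's behavior and does not change how maxlength itself counts.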
 
The WebIDL specification says the following about UTF-16 strings and code points:

https://www.w3.org/TR/WebIDL-1/#dfn-obtain-unicode

> The DOMString type corresponds to the set of all possible sequences of code units. Such sequences are commonly interpreted as UTF-16 encoded strings [RFC2781] although this is not required

However, I am not 100% sure what this means, in particular whether a JavaScript string SHOULD be counted by code point or not. Also, judging from Chromium's LayoutTests, the current behavior seems to be expected rather than a bug, but I still think there is room for consideration, since it is problematic for an emoji to be counted as more than 1 character.

https://cs.chromium.org/chromium/src/third_party/WebKit/LayoutTests/fast/forms/textarea/textarea-maxlength.html?q=maxlength&dr=C&l=10

Comment 2 by kojii@chromium.org, Jul 24 2017

Components: -Blink Blink>Forms>Text
Status: Untriaged (was: Unconfirmed)

Comment 3 by tkent@chromium.org, Jul 24 2017

Owner: tkent@chromium.org
Status: WontFix (was: Untriaged)
This works as expected according to the current specification.

We know the current behavior doesn't work well in many cases.
There is a specification issue: https://github.com/whatwg/html/issues/1467

Thanks a lot for the link, will follow the discussion there. 
