Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2014
Cc:
HW: ----
NextAction: ----
OS: ----
Priority: 2
Type: Bug




Supplementary characters are allowed as identifier names in ES6

Project Member Reported by yangguo@chromium.org, Oct 8 2014

Issue description

ES6 allows supplementary characters (i.e. characters with code points > 0xFFFF)
in identifiers, while ES5 doesn’t. See
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-names-and-keywords,
and note how `IdentifierStart` refers to `UnicodeIDStart`, which is defined as
‘any Unicode code point with the Unicode property “ID_Start” or
“Other_ID_Start”’. This doesn’t exclude supplementary code points, but ES5 did
(see https://bugs.ecmascript.org/show_bug.cgi?id=469#c0). The same applies to
`IdentifierPart` and `UnicodeIDContinue`.

To illustrate the difference in code: compare
https://gist.github.com/mathiasbynens/6334847#file-javascript-identifier-rege...
(ES5) with
https://gist.github.com/mathiasbynens/6334847#file-javascript-identifier-rege...
(ES6).
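The difference captured by the two gists can also be sketched directly. The Python below is a minimal illustration, not V8 code: it approximates `UnicodeIDStart`/`UnicodeIDContinue` with Python's `str.isidentifier()` (which uses the closely related XID_Start/XID_Continue properties), and all helper names are hypothetical. The ES5 view checks each 16-bit code unit, so a supplementary letter such as U+10400 splits into two surrogates that are not identifier characters; the ES6 view checks whole code points.

```python
ZWNJ, ZWJ = "\u200c", "\u200d"


def to_utf16_units(s: str) -> list[int]:
    """Split a string into UTF-16 code units (surrogate pairs above 0xFFFF)."""
    units = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))    # lead (high) surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # trail (low) surrogate
        else:
            units.append(cp)
    return units


def is_id_start(ch: str) -> bool:
    # Approximation: str.isidentifier() tests XID_Start (and accepts "_");
    # ES additionally allows "$" at the start of an identifier.
    return ch == "$" or ch.isidentifier()


def is_id_part(ch: str) -> bool:
    # Prefixing "a" turns isidentifier() into an XID_Continue test;
    # ES additionally allows "$", ZWNJ and ZWJ inside an identifier.
    return ch in ("$", ZWNJ, ZWJ) or ("a" + ch).isidentifier()


def valid_identifier(chars) -> bool:
    chars = list(chars)
    return bool(chars) and is_id_start(chars[0]) and all(
        is_id_part(c) for c in chars[1:])


def is_identifier_es5(name: str) -> bool:
    # ES5 view: source text is a sequence of 16-bit code units.
    return valid_identifier(chr(u) for u in to_utf16_units(name))


def is_identifier_es6(name: str) -> bool:
    # ES6 view: source text is a sequence of code points.
    return valid_identifier(name)
```

With this sketch, U+10400 (a Deseret letter) is rejected under the ES5 view and accepted under the ES6 view, while U+1F4A9 is rejected under both, since it is a symbol rather than a letter.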
 
The main blocker here is that unibrow, our Unicode table generator script, only supports the BMP (Basic Multilingual Plane), so the tables it generates don't cover the supplementary planes. I'm not sure how much work it would take to solve that, or how much it would bloat the Unicode tables.
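How much the tables grow depends largely on their encoding. One common approach, shown here as a hypothetical sketch and not as unibrow's actual format, is to store sorted, inclusive code point ranges and binary-search them. Entries are ordinary integers rather than 16-bit values, so supplementary code points fit naturally, and the table size grows with the number of ranges rather than the number of characters. The three ranges below are an illustrative subset, not real ID_Start data.

```python
from bisect import bisect_right

# Hypothetical, heavily truncated ID_Start table: sorted, non-overlapping,
# inclusive (start, end) ranges. Real data would be generated from the
# Unicode Character Database.
ID_START_RANGES = [
    (0x0041, 0x005A),    # A-Z
    (0x0061, 0x007A),    # a-z
    (0x10400, 0x1044F),  # Deseret block (supplementary plane 1)
]
_STARTS = [start for start, _ in ID_START_RANGES]


def in_ranges(cp: int) -> bool:
    """Binary-search the range table; works for any code point up to 0x10FFFF."""
    i = bisect_right(_STARTS, cp) - 1  # last range starting at or before cp
    return i >= 0 and cp <= ID_START_RANGES[i][1]
```

A BMP-only table keyed on 16-bit values has no way to express an entry like 0x10400; widening the range endpoints is enough to lift that limit.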

On a lighter note, U+1F4A9 (PILE OF POO) is not allowed in identifier names anyway, so at least we are not missing out on much :)
Project Member

Comment 4 by bugdroid1@chromium.org, Oct 10 2014

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/0dd69ec4392844c08ff62ab9294d9e9d8425cebc

commit 0dd69ec4392844c08ff62ab9294d9e9d8425cebc
Author: yangguo@chromium.org <yangguo@chromium.org>
Date: Fri Oct 10 07:13:46 2014

Allow identifier code points from supplementary multilingual planes.

ES5.1 section 6 ("Source Text"):
"Throughout the rest of this document, the phrase “code unit” and the
word “character” will be used to refer to a 16-bit unsigned value
used to represent a single 16-bit unit of text."

This changed in ES6 draft section 10.1 ("Source Text"):
"The ECMAScript code is expressed using Unicode, version 5.1 or later.
ECMAScript source text is a sequence of code points. All Unicode code
point values from U+0000 to U+10FFFF, including surrogate code points,
may occur in source text where permitted by the ECMAScript grammars."

This patch is to reflect this spec change.
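Under the ES6 wording, a scanner that reads UTF-16 input has to combine a lead/trail surrogate pair into one code point before asking whether it is an identifier character. The Python below is a minimal sketch of that step under that assumption; it is not the C++ in `src/scanner.h`, and the function names are hypothetical. A lone surrogate is passed through as its own code point, in line with the quoted text that surrogate code points may occur in source text.

```python
def is_lead(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF


def is_trail(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF


def next_code_point(units: list[int], i: int) -> tuple[int, int]:
    """Return (code point, next index) for the code unit at position i.

    A lead surrogate followed by a trail surrogate is combined into one
    supplementary code point; any other unit, including a lone surrogate,
    is returned unchanged as a single code point.
    """
    u = units[i]
    if is_lead(u) and i + 1 < len(units) and is_trail(units[i + 1]):
        cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
        return cp, i + 2
    return u, i + 1


def code_points(units: list[int]) -> list[int]:
    """Decode a UTF-16 code unit sequence into code points, scanner-style."""
    out, i = [], 0
    while i < len(units):
        cp, i = next_code_point(units, i)
        out.append(cp)
    return out
```

For example, `code_points([0xD801, 0xDC00])` yields the single code point 0x10400, which an ID_Start check can then accept, whereas checking 0xD801 and 0xDC00 separately (the ES5 view) would reject both.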

BUG= v8:3617 
LOG=Y
R=jochen@chromium.org

Review URL: https://codereview.chromium.org/640193002

git-svn-id: https://v8.googlecode.com/svn/branches/bleeding_edge@24510 ce2b1a6d-e550-0410-aec6-3dcde31c8c00

[modify] https://chromium.googlesource.com/v8/v8.git/+/0dd69ec4392844c08ff62ab9294d9e9d8425cebc/BUILD.gn
[add] https://chromium.googlesource.com/v8/v8.git/+/0dd69ec4392844c08ff62ab9294d9e9d8425cebc/src/char-predicates.cc
[modify] https://chromium.googlesource.com/v8/v8.git/+/0dd69ec4392844c08ff62ab9294d9e9d8425cebc/src/char-predicates.h
[modify] https://chromium.googlesource.com/v8/v8.git/+/0dd69ec4392844c08ff62ab9294d9e9d8425cebc/src/scanner.h
[add] https://chromium.googlesource.com/v8/v8.git/+/0dd69ec4392844c08ff62ab9294d9e9d8425cebc/test/intl/general/smp-identifier.js
[add] https://chromium.googlesource.com/v8/v8.git/+/0dd69ec4392844c08ff62ab9294d9e9d8425cebc/test/mjsunit/parse-surrogates.js
[modify] https://chromium.googlesource.com/v8/v8.git/+/0dd69ec4392844c08ff62ab9294d9e9d8425cebc/test/unittests/unicode/unicode-predicates-unittest.cc
[modify] https://chromium.googlesource.com/v8/v8.git/+/0dd69ec4392844c08ff62ab9294d9e9d8425cebc/tools/gyp/v8.gyp

Project Member

Comment 5 by bugdroid1@chromium.org, Oct 10 2014

Status: Fixed
Labels: Priority-2
