New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Starred by 7 users
Status: Fixed
Owner:
Closed: Nov 11
Cc:
HW: ----
OS: ----
Priority: 2
Type: FeatureRequest


Sign in to add a comment
Implement unicode regexp property class
Project Member Reported by yangguo@chromium.org, Feb 9 2016 Back to list
This is currently stage0, but that doesn't prohibit us from experimenting, right? Property classes have the syntax:
/\p{<property alias>}/u
or negated
/\P{<property alias>}/u
and only work in combination with the unicode flag.

It's not entirely clear what <property alias> can be. It should work for character categories:
http://www.fileformat.info/info/unicode/category/index.htm


Spec draft here: https://github.com/tc39/proposal-regexp-unicode-property-escapes

 
Labels: -Type-Bug Type-FeatureRequest
Project Member Comment 4 by bugdroid1@chromium.org, Mar 7 2016
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/7dc13c2a8cc44c0ec58e31caaa17653e9d2e5a89

commit 7dc13c2a8cc44c0ec58e31caaa17653e9d2e5a89
Author: yangguo <yangguo@chromium.org>
Date: Mon Mar 07 18:11:55 2016

Revert of [regexp] extend property classes by script category. (patchset #1 id:1 of https://codereview.chromium.org/1774513002/ )

Reason for revert:
wrong noi18n expectations

Original issue's description:
> [regexp] extend property classes by script category.
>
> R=littledan@chromium.org
> BUG= v8:4743 
> LOG=N
>
> Committed: https://crrev.com/22f6735ccbe2e341d341e61b9c38ce308b8da655
> Cr-Commit-Position: refs/heads/master@{#34553}

TBR=littledan@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= v8:4743 

Review URL: https://codereview.chromium.org/1772563003

Cr-Commit-Position: refs/heads/master@{#34554}

[modify] https://crrev.com/7dc13c2a8cc44c0ec58e31caaa17653e9d2e5a89/src/char-predicates-inl.h
[modify] https://crrev.com/7dc13c2a8cc44c0ec58e31caaa17653e9d2e5a89/src/char-predicates.h
[modify] https://crrev.com/7dc13c2a8cc44c0ec58e31caaa17653e9d2e5a89/src/regexp/regexp-parser.cc
[delete] https://crrev.com/22f6735ccbe2e341d341e61b9c38ce308b8da655/test/mjsunit/harmony/regexp-property-script-category.js
[rename] https://crrev.com/7dc13c2a8cc44c0ec58e31caaa17653e9d2e5a89/test/mjsunit/harmony/unicode-regexp-property-class.js

Comment 10 by math...@qiwi.be, May 20 2016
Cc: math...@qiwi.be
Comment 11 Deleted
Comment 12 by math...@qiwi.be, Jun 11 2016
There is now a formal proposal to add this functionality to the language: https://github.com/mathiasbynens/es-regexp-unicode-property-escapes
Comment 13 by phistuck@gmail.com, Jun 11 2016
Cool! Add it to the stage 0 list?
https://github.com/tc39/proposals/blob/master/stage-0-proposals.md
Project Member Comment 15 by bugdroid1@chromium.org, Jun 14 2016
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/3fe12ef8fab76dfeeab6b568067bd9be8d8a6b37

commit 3fe12ef8fab76dfeeab6b568067bd9be8d8a6b37
Author: yangguo <yangguo@chromium.org>
Date: Tue Jun 14 13:58:53 2016

Revert of [regexp] implement \p{Any}, \p{Ascii}, and \p{Assigned}. (patchset #3 id:40001 of https://codereview.chromium.org/2059113002/ )

Reason for revert:
compile failure

Original issue's description:
> [regexp] implement \p{Any}, \p{Ascii}, and \p{Assigned}.
>
> R=littledan@chromium.org, mathias@qiwi.be
> BUG= v8:4743 
>
> Committed: https://crrev.com/92bfd13457c80f02be01551f4ea9a5badfe0e4c4
> Cr-Commit-Position: refs/heads/master@{#36969}

TBR=littledan@chromium.org,mathias@qiwi.be
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG= v8:4743 

Review-Url: https://codereview.chromium.org/2065083002
Cr-Commit-Position: refs/heads/master@{#36970}

[modify] https://crrev.com/3fe12ef8fab76dfeeab6b568067bd9be8d8a6b37/src/regexp/regexp-parser.cc
[modify] https://crrev.com/3fe12ef8fab76dfeeab6b568067bd9be8d8a6b37/src/regexp/regexp-parser.h
[modify] https://crrev.com/3fe12ef8fab76dfeeab6b568067bd9be8d8a6b37/test/mjsunit/harmony/regexp-property-general-category.js
[delete] https://crrev.com/92bfd13457c80f02be01551f4ea9a5badfe0e4c4/test/mjsunit/harmony/regexp-property-lu-ui.js
[delete] https://crrev.com/92bfd13457c80f02be01551f4ea9a5badfe0e4c4/test/mjsunit/harmony/regexp-property-special.js

Project Member Comment 17 by bugdroid1@chromium.org, Nov 18 2016
Labels: Priority-2
Cc: js...@chromium.org
The current spec [0] requires recognizing Other_ID_Start and Other_ID_Continue as binary property classes.

ICU 57 doesn't have a UProperty for Other_ID_Start and Other_ID_Continue. Instead, it seems to consider these part of ID_Start and ID_Continue respectively. For example 

I'm hesitant to just include the definitions of these two binary property classes directly in V8. Is there anything we can do in ICU?

[0] https://github.com/tc39/proposal-regexp-unicode-property-escapes
Comment 20 by math...@qiwi.be, Apr 7 2017
Some background:

> The `Other_ID_Start` property provides a small list of characters that
> qualified as `ID_Start` characters in some previous version of Unicode
> solely on the basis of their `General_Category` properties, but that no
> longer qualify in the current version. These are called “grandfathered
> characters”.

— http://unicode.org/reports/tr31/#Backward_Compatibility

As such, I’d rather not remove these potentially useful properties from the proposal, unless they prove to be painful to implement or require hardcoding.
Comment 21 by math...@qiwi.be, Apr 7 2017
Here are the code points matching these properties:

$ node
> require('unicode-9.0.0/Binary_Property/Other_ID_Start/code-points.js')
[0x1885, 0x1886, 0x2118, 0x212E, 0x309B, 0x309C]
> require('unicode-9.0.0/Binary_Property/Other_ID_Continue/code-points.js')
[0xB7, 0x387, 0x1369, 0x136A, 0x136B, 0x136C, 0x136D, 0x136E, 0x136F, 0x1370, 0x1371, 0x19DA]

These lists are smaller than I’d expected. Still, deferring to ICU somehow would be preferable to hardcoding them.
Project Member Comment 22 by bugdroid1@chromium.org, Apr 7 2017
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/749c00ef454d08807d659ebcbfb92169e2d1dd06

commit 749c00ef454d08807d659ebcbfb92169e2d1dd06
Author: yangguo <yangguo@chromium.org>
Date: Fri Apr 07 18:35:30 2017

[regexp] stage --harmony-regexp-property

R=adamk@chromium.org
BUG= v8:4743 

Review-Url: https://codereview.chromium.org/2801263002
Cr-Commit-Position: refs/heads/master@{#44493}

[modify] https://crrev.com/749c00ef454d08807d659ebcbfb92169e2d1dd06/src/flag-definitions.h

Project Member Comment 23 by bugdroid1@chromium.org, Apr 10 2017
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/582921454d593607cf1c49322f2b412fe7002a2b

commit 582921454d593607cf1c49322f2b412fe7002a2b
Author: yangguo <yangguo@chromium.org>
Date: Mon Apr 10 10:19:01 2017

[regexp] add more tests for binary property classes.

R=jgruber@chromium.org
BUG= v8:4743 

Review-Url: https://codereview.chromium.org/2803693006
Cr-Commit-Position: refs/heads/master@{#44512}

[modify] https://crrev.com/582921454d593607cf1c49322f2b412fe7002a2b/test/mjsunit/harmony/regexp-property-binary.js

Project Member Comment 24 by bugdroid1@chromium.org, Apr 11 2017
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/26e5d0129ca501e4dbadaa098e65222687ce4ab0

commit 26e5d0129ca501e4dbadaa098e65222687ce4ab0
Author: yangguo <yangguo@chromium.org>
Date: Tue Apr 11 07:10:33 2017

[regexp] implement \p{Other_ID_Start} and \p{Other_ID_Continue}.

Other_ID_Start and Other_ID_Continue are not supported by ICU, so for
now we implement these manually as special binary property classes.

R=jgruber@chromium.org
BUG= v8:4743 

Review-Url: https://codereview.chromium.org/2808803002
Cr-Commit-Position: refs/heads/master@{#44549}

[modify] https://crrev.com/26e5d0129ca501e4dbadaa098e65222687ce4ab0/src/regexp/regexp-parser.cc
[modify] https://crrev.com/26e5d0129ca501e4dbadaa098e65222687ce4ab0/test/mjsunit/harmony/regexp-property-special.js

Comment 25 by math...@qiwi.be, Apr 11 2017
Per https://github.com/tc39/proposal-regexp-unicode-property-escapes/issues/27 deprecated binary properties and contributory binary properties (such as `Other_ID_Start` and `Other_ID_Continue`) have been removed from the proposal.

`Composition_Exclusion` and `Prepended_Concatenation_Mark` might get removed, too, once I figure out why ICU doesn’t support them.

Patch: https://codereview.chromium.org/2809143003/
Project Member Comment 26 by bugdroid1@chromium.org, Apr 11 2017
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/f956279ed618f026cec357b4a36f0efcc77e2fd6

commit f956279ed618f026cec357b4a36f0efcc77e2fd6
Author: mathias <mathias@qiwi.be>
Date: Tue Apr 11 12:40:41 2017

[regexp] remove \p{Other_ID_Start} and \p{Other_ID_Continue}

The spec proposal has been updated to drop contributory binary
properties such as `Other_ID_Start` and `Other_ID_Continue`.

This patch reverts commit 26e5d0129ca501e4dbadaa098e65222687ce4ab0 and
adds tests to ensure these properties are not supported.

R=
BUG= v8:4743 

Review-Url: https://codereview.chromium.org/2809143003
Cr-Commit-Position: refs/heads/master@{#44569}

[modify] https://crrev.com/f956279ed618f026cec357b4a36f0efcc77e2fd6/src/regexp/regexp-parser.cc
[modify] https://crrev.com/f956279ed618f026cec357b4a36f0efcc77e2fd6/test/mjsunit/harmony/regexp-property-special.js

Comment 28 by math...@qiwi.be, Apr 15 2017
Exhaustive tests for all supported properties (and their aliases), values (and their aliases) against all Unicode code points: https://github.com/tc39/test262/tree/master/test/built-ins/RegExp/property-escapes

V8’s implementation currently fails the following:

```
FAIL output/Emoji_Component.js
 Expected no error, got SyntaxError: Invalid regular expression: /^\p{Emoji_Component}+$/: Invalid property name
```

This failure was expected since ICU currently doesn’t support `Emoji_Component`. However, it will likely be included in ICU 60: https://ssl.icu-project.org/trac/ticket/13062

```
FAIL output/Emoji.js
  `\p{Emoji}` should match U+01F9E6 (`🧦`)

FAIL output/Emoji_Presentation.js
  `\p{Emoji_Presentation}` should match U+01F9E6 (`🧦`)

FAIL output/Emoji_Modifier_Base.js
  `\P{Emoji_Modifier_Base}` should match U+01F93C (`🤼`)
```

These can be fixed by updating to ICU 59.1, which uses Emoji 5.0 data.

Thanks Mathias!

V8 follows Chrome wrt ICU version upgrades, so I guess we will just wait.

What about Composition_Exclusion and Prepended_Concatenation_Mark? Does Test262 not test them?
Comment 30 by math...@qiwi.be, Apr 15 2017
https://github.com/tc39/proposal-regexp-unicode-property-escapes/issues/27 has some background on this, but TL;DR:

1. `Prepended_Concatenation_Mark` has been removed from the proposal because it’s an informative property rather than a normative one.

2. `Full_Composition_Exclusion` and `Composition_Exclusion` have been removed from the proposal because even UTS18 RL2.7 dropped them in revision 19. http://unicode.org/reports/tr18/#Modifications

I have an open PR to remove those three test files from Test262: https://github.com/tc39/test262/pull/980. The latest version of the tests and the script to generate them can always be found here: https://github.com/mathiasbynens/unicode-property-escapes-tests
Blockedon: chromium:711901
Regarding Full_Composition_Exclusion: would it be a spec violation to parse it?
Comment 33 by math...@qiwi.be, Apr 15 2017
Yes, it would — https://tc39.github.io/proposal-regexp-unicode-property-escapes/#sec-static-semantics-unicodematchproperty-p says:

> To ensure interoperability, implementations must not extend Unicode property support to the remaining enumeration properties.

(I plan to contribute a few manually-written tests to Test262 that assert those properties throw, among other things.)
Comment 34 by js...@chromium.org, Apr 17 2017
Blockedon: chromium:699469
Project Member Comment 35 by bugdroid1@chromium.org, Apr 18 2017
Comment 36 by math...@qiwi.be, May 1 2017
Blockedon: 896 2728
Note that any SyntaxErrors in Unicode RegExp property escapes should be early errors instead of runtime errors.
Comment 37 by js...@chromium.org, May 15 2017
re: Other_ID_Start and Other_ID_Continue

I see little value in supporting them. 

If they're necessary for some hard-to-imagine reasons, Other_ID_Start can be derived by subtracting [:L:][:Nl:] from [:ID_Start:]]. 

Other_ID_Continue can be derived subtracting [[:ID_Start:][:Mn:][:Mc:][:Nd:][:Pc:] from [:ID_Continue:].  

( and adding [:Pattern_Syntax:] and [:Pattern_WhiteSpace:] ). 

http://unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers


Comment 38 by js...@chromium.org, May 15 2017
Sorry I wrote comment without reading all the comments. It's good that Other_ID_* were dropped from the spec. 

Anyway, ICU in Chromium's tree was updated to 59.1. Note that not all Emoji-related properties are supported in ICU 59.1 (the release cycles of Emoji, Unicode and ICU are kinda 'tangled' up. They are expected to be harmonized in the future). 
Comment 39 by math...@qiwi.be, May 16 2017
@jshin, is there a tracking bug for ICU 60 already? Perhaps we can reuse https://bugs.chromium.org/p/chromium/issues/detail?id=711901 for that.
ICU 60 will not be released until October. 
Anyway, Emoji_Component(?)-related properties support was added to ICU trunk. 
If we really want to support them now, we can merge those changes to our copy of ICU. 

Mathias, what if we just remove Emoji_Component for the initial version of RegExp Unicode property escapes, and add it in in a future revision? This could let us get the overall feature to users faster and reduce the burden on other engines who are trying to implement it.
Did any other implementers mark `Emoji_Component` as a blocker? Not everyone is using ICU, right?

So far the open bugs tracking the implementation have received little attention: https://github.com/tc39/proposal-regexp-unicode-property-escapes/issues/4#issue-159392455
> Not everyone is using ICU, right?

Chrome, Firefox and Safari do.  (I suspect Edge is moving in that direction as well). 
From <https://github.com/tc39/proposal-regexp-unicode-property-escapes/issues/29>:

    > 'a-'.match(/[\p{Script=Latin}-\p{Script=Cyrillic}]/gu)
    [ 'a', '-' ]

This should throw.
1. Is there a way to programmatically get the list of supported Scripts/Script_Extensions?

2. Is there a way to programmatically get the Scripts/Script_Extensions a code point belongs to?
There is no built-in API for either of those.
#45 - you can propose them as additions to the ECMAScript Internationalization API perhaps.
In Node.js, Intl (ICU) has only English locale by default, while all the RegExp Scripts/Script_Extensions are supported in this restricted variant. It seems it would be better to expose already available data somehow.
Cc: jgruber@chromium.org
Description: Show this description
Blockedon: chromium:766816
Chrome ToT now has ICU 60.1 with Emoji_Component support. v8 will pick it up very soon (within a day). 
With the new ICU,

    d8> /\p{Script_Extensions=Bopomofo}/u.test('\u3012')
    true

This seems like a bug per https://unicode.org/Public/10.0.0/ucd/ScriptExtensions.txt.
The above issue with U+3012 repeats for Script_Extensions={Bopomofo,Hangul,Hiragana,Katakana}.

Additionally, `\p{Script_Extensions=Yi}` now matches U+3000 (and consequently, `\p{Script_Extensions=Common}` no longer matches it), which also seems incorrect.

Spotted in this try run: https://build.chromium.org/p/tryserver.v8/builders/v8_linux64_asan_rel_ng_triggered/builds/29053/steps/Test262

Project Member Comment 56 by bugdroid1@chromium.org, Nov 8
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/63536799acf2962b6064d608e180971ca5bc88f0

commit 63536799acf2962b6064d608e180971ca5bc88f0
Author: Mathias Bynens <mathias@chromium.org>
Date: Wed Nov 08 16:35:50 2017

[regexp] Support Emoji_Component property class

This patch adds support for Emoji_Component within Unicode property
escapes in regular expressions.

The Emoji_Component binary property was added in Emoji data v5
and is supported in ICU 60.1.

An `#if` directive is used to prevent breaking Node.js until they
update their ICU.

BUG= v8:4743 

Cq-Include-Trybots: master.tryserver.v8:v8_linux_noi18n_rel_ng
Change-Id: If1b49a4c175e88f1840ca5ef8d57829d6d8c3291
Reviewed-on: https://chromium-review.googlesource.com/758261
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49229}
[modify] https://crrev.com/63536799acf2962b6064d608e180971ca5bc88f0/src/regexp/regexp-parser.cc
[modify] https://crrev.com/63536799acf2962b6064d608e180971ca5bc88f0/test/mjsunit/harmony/regexp-property-binary.js
[modify] https://crrev.com/63536799acf2962b6064d608e180971ca5bc88f0/test/test262/test262.status

Project Member Comment 57 by bugdroid1@chromium.org, Nov 8
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/4cc8e1d9bc7527507d4c6d67f06636aebdc2c4d4

commit 4cc8e1d9bc7527507d4c6d67f06636aebdc2c4d4
Author: Mathias Bynens <mathias@chromium.org>
Date: Wed Nov 08 17:10:30 2017

[test] Enable passing Unicode property class tests

Instead of disabling the entire RegExp Unicode property escape test
suite, this patch explicitly lists the failing tests and only disables
them.

BUG= v8:4743 

Cq-Include-Trybots: master.tryserver.v8:v8_linux_noi18n_rel_ng
Change-Id: If398eb4fcc8d9d96279dc5afb29489e699744d9f
Reviewed-on: https://chromium-review.googlesource.com/758757
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49230}
[modify] https://crrev.com/4cc8e1d9bc7527507d4c6d67f06636aebdc2c4d4/test/test262/test262.status

> Spotted in this try run: https://build.chromium.org/p/tryserver.v8/builders/v8_linux64_asan_rel_ng_triggered/builds/29053/steps/Test262

As you found out in http://bugs.icu-project.org/trac/ticket/13466, it's fixed in ICU ToT.  I'll cherry pick that fix to Chrome's copy of ICU. 

BTW, I wonder why the test in question didn't fail when I tried ICU 60 at https://chromium-review.googlesource.com/c/v8/v8/+/731764 .  Perhaps, Test262 was updated after that? 
Oh..  'built-ins/RegExp/property-escapes/generated/*' had been skipped until you enabled it. 
Project Member Comment 60 by bugdroid1@chromium.org, Nov 9
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/9237c08dd22ba057789a3dfab101e723a965735b

commit 9237c08dd22ba057789a3dfab101e723a965735b
Author: Mathias Bynens <mathias@chromium.org>
Date: Thu Nov 09 07:52:35 2017

[regexp] Support Regional_Indicator property class

This patch adds support for Regional_Indicator within Unicode property
escapes in regular expressions.

The Regional_Indicator binary property was added in Unicode v10 and is
supported in ICU 60.1.

An `#if` directive is used to prevent breaking Node.js until they
update their ICU.

BUG= v8:4743 

Cq-Include-Trybots: master.tryserver.v8:v8_linux_noi18n_rel_ng
Change-Id: I7acec13c8ae7552558a0f852937984bba828e738
Reviewed-on: https://chromium-review.googlesource.com/758273
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49250}
[modify] https://crrev.com/9237c08dd22ba057789a3dfab101e723a965735b/src/regexp/regexp-parser.cc
[modify] https://crrev.com/9237c08dd22ba057789a3dfab101e723a965735b/test/mjsunit/harmony/regexp-property-binary.js
[modify] https://crrev.com/9237c08dd22ba057789a3dfab101e723a965735b/test/test262/test262.status

Project Member Comment 61 by bugdroid1@chromium.org, Nov 9
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/0db90bc5270a21b60c36262de75015c5bddff072

commit 0db90bc5270a21b60c36262de75015c5bddff072
Author: Mathias Bynens <mathias@chromium.org>
Date: Thu Nov 09 09:09:25 2017

[regexp] Include unicode/uvernum.h in parser

This patch explicitly includes unicode/uvernum.h in the regular
expression parser.

It should be removed once we no longer need to check
`U_ICU_VERSION_MAJOR_NUM` during preprocessing, i.e. once Node.js
updates their ICU. This is an ongoing effort:
https://github.com/nodejs/node/pull/16876

BUG= v8:4743 

Change-Id: I3cd9447b481249a9035d9fb00745057da8809c58
Reviewed-on: https://chromium-review.googlesource.com/758407
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49253}
[modify] https://crrev.com/0db90bc5270a21b60c36262de75015c5bddff072/src/regexp/regexp-parser.cc

Project Member Comment 62 by bugdroid1@chromium.org, Nov 9
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/6aa3349afb306364ccf6709cbdf0698e76e89eca

commit 6aa3349afb306364ccf6709cbdf0698e76e89eca
Author: Yang Guo <yangguo@chromium.org>
Date: Thu Nov 09 09:16:47 2017

Revert "[regexp] Include unicode/uvernum.h in parser"

This reverts commit 0db90bc5270a21b60c36262de75015c5bddff072.

Reason for revert: https://build.chromium.org/p/client.v8/builders/V8%20Linux%20-%20noi18n%20-%20debug/builds/17335

You need to also check whether i18n is on, e.g. #ifdef V8_INTL_SUPPORT.

Original change's description:
> [regexp] Include unicode/uvernum.h in parser
> 
> This patch explicitly includes unicode/uvernum.h in the regular
> expression parser.
> 
> It should be removed once we no longer need to check
> `U_ICU_VERSION_MAJOR_NUM` during preprocessing, i.e. once Node.js
> updates their ICU. This is an ongoing effort:
> https://github.com/nodejs/node/pull/16876
> 
> BUG= v8:4743 
> 
> Change-Id: I3cd9447b481249a9035d9fb00745057da8809c58
> Reviewed-on: https://chromium-review.googlesource.com/758407
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Yang Guo <yangguo@chromium.org>
> Commit-Queue: Mathias Bynens <mathias@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#49253}

TBR=yangguo@chromium.org,jshin@chromium.org,jgruber@chromium.org,mathias@chromium.org

Change-Id: I58d6b7a49b707c97153b8b0aec141248f5c669e1
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  v8:4743 
Reviewed-on: https://chromium-review.googlesource.com/759777
Reviewed-by: Yang Guo <yangguo@chromium.org>
Commit-Queue: Yang Guo <yangguo@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49255}
[modify] https://crrev.com/6aa3349afb306364ccf6709cbdf0698e76e89eca/src/regexp/regexp-parser.cc

Project Member Comment 64 by bugdroid1@chromium.org, Nov 9
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/bc4bd08e63d7a39c814d725c64fde070f2cdc399

commit bc4bd08e63d7a39c814d725c64fde070f2cdc399
Author: Mathias Bynens <mathias@chromium.org>
Date: Thu Nov 09 09:49:25 2017

Reland "[regexp] Include unicode/uvernum.h in parser"

This is a reland of 0db90bc5270a21b60c36262de75015c5bddff072
Original change's description:
> [regexp] Include unicode/uvernum.h in parser
>
> This patch explicitly includes unicode/uvernum.h in the regular
> expression parser.
>
> It should be removed once we no longer need to check
> `U_ICU_VERSION_MAJOR_NUM` during preprocessing, i.e. once Node.js
> updates their ICU. This is an ongoing effort:
> https://github.com/nodejs/node/pull/16876
>
> BUG= v8:4743 
>
> Change-Id: I3cd9447b481249a9035d9fb00745057da8809c58
> Reviewed-on: https://chromium-review.googlesource.com/758407
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Yang Guo <yangguo@chromium.org>
> Commit-Queue: Mathias Bynens <mathias@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#49253}

Bug:  v8:4743 
Change-Id: Id3f375f27fb5eaa4129884f99095d16763bd6e86
Reviewed-on: https://chromium-review.googlesource.com/758861
Reviewed-by: Yang Guo <yangguo@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49260}
[modify] https://crrev.com/bc4bd08e63d7a39c814d725c64fde070f2cdc399/src/regexp/regexp-parser.cc

Project Member Comment 65 by bugdroid1@chromium.org, Nov 9
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fff82867091b079c72e3d00d40b52b2dc133871b

commit fff82867091b079c72e3d00d40b52b2dc133871b
Author: Jungshik Shin <jshin@chromium.org>
Date: Thu Nov 09 12:29:01 2017

Roll ICU to 5ed26985

There's only one change in the roll:

  https://chromium.googlesource.com/chromium/deps/icu/+/5ed26985

TBR=mathias@chromium.org

Bug:  v8:4743 
Test: v8: test262/built-ins/RegExp/property-escapes/generated/*
Change-Id: I43fdb5f639605b0965d2b39b3bbb957103194268
Reviewed-on: https://chromium-review.googlesource.com/760243
Commit-Queue: Jungshik Shin <jshin@chromium.org>
Reviewed-by: Jungshik Shin <jshin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#515148}
[modify] https://crrev.com/fff82867091b079c72e3d00d40b52b2dc133871b/DEPS

You should not need to include unicode/uvernum.h directly as long as you include *any* public ICU header. They all end up including several core-definition headers including this one.
Project Member Comment 67 by bugdroid1@chromium.org, Nov 10
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/54637463a32e3a5ab62750eda1d17b16fd0b8d09

commit 54637463a32e3a5ab62750eda1d17b16fd0b8d09
Author: Mathias Bynens <mathias@chromium.org>
Date: Fri Nov 10 09:09:20 2017

[test] Re-enable all Unicode property tests

The DEPS roll in f3a2e34d updated ICU to a version that includes
a fix for https://ssl.icu-project.org/trac/ticket/13462. As a
result, our Script_Extension data is now correct again.

This patch re-enables the Test262 tests that were failing due to this
ICU data bug.

BUG= v8:4743 

Cq-Include-Trybots: master.tryserver.v8:v8_linux_noi18n_rel_ng
Change-Id: If0f08693ed0355c59b9c02aa6d941dab1588431c
Reviewed-on: https://chromium-review.googlesource.com/761616
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49284}
[modify] https://crrev.com/54637463a32e3a5ab62750eda1d17b16fd0b8d09/test/test262/test262.status

Project Member Comment 68 by bugdroid1@chromium.org, Nov 11
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/6f890fd5b212eb441a4b87364922d5286fdc505e

commit 6f890fd5b212eb441a4b87364922d5286fdc505e
Author: Mathias Bynens <mathias@chromium.org>
Date: Sat Nov 11 09:39:25 2017

Ship RegExp Unicode property escapes

Intent to ship:
https://groups.google.com/d/msg/v8-users/isa2JrstEbE/i6UvFO7RBwAJ

BUG= v8:4743 

Change-Id: I62cdc6fa1f114fc8be26c5cf354c45a15ce49ab7
Reviewed-on: https://chromium-review.googlesource.com/761776
Reviewed-by: Adam Klein <adamk@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49312}
[modify] https://crrev.com/6f890fd5b212eb441a4b87364922d5286fdc505e/src/flag-definitions.h

Status: Fixed
Project Member Comment 70 by bugdroid1@chromium.org, Nov 11
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/30d50b3cc7d5126adc7600b41bbd15dcf7841174

commit 30d50b3cc7d5126adc7600b41bbd15dcf7841174
Author: Michael Achenbach <machenbach@chromium.org>
Date: Sat Nov 11 12:59:58 2017

Revert "Ship RegExp Unicode property escapes"

This reverts commit 6f890fd5b212eb441a4b87364922d5286fdc505e.

Reason for revert: Breaks layout test:
https://build.chromium.org/p/client.v8.fyi/builders/V8-Blink%20Linux%2064/builds/19671

https://github.com/v8/v8/wiki/Blink-layout-tests

Original change's description:
> Ship RegExp Unicode property escapes
> 
> Intent to ship:
> https://groups.google.com/d/msg/v8-users/isa2JrstEbE/i6UvFO7RBwAJ
> 
> BUG= v8:4743 
> 
> Change-Id: I62cdc6fa1f114fc8be26c5cf354c45a15ce49ab7
> Reviewed-on: https://chromium-review.googlesource.com/761776
> Reviewed-by: Adam Klein <adamk@chromium.org>
> Commit-Queue: Mathias Bynens <mathias@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#49312}

TBR=adamk@chromium.org,yangguo@chromium.org,mathias@chromium.org

Change-Id: I5e09070582cf247f61429a1d45ae5faa3d21518e
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  v8:4743 
Reviewed-on: https://chromium-review.googlesource.com/765348
Reviewed-by: Michael Achenbach <machenbach@chromium.org>
Commit-Queue: Michael Achenbach <machenbach@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49313}
[modify] https://crrev.com/30d50b3cc7d5126adc7600b41bbd15dcf7841174/src/flag-definitions.h

Project Member Comment 71 by bugdroid1@chromium.org, Nov 11
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9fcebbe1b25bd4ac2a51aef552e43f15f7850817

commit 9fcebbe1b25bd4ac2a51aef552e43f15f7850817
Author: Mathias Bynens <mathias@chromium.org>
Date: Sat Nov 11 15:55:45 2017

Temporarily disable <input pattern> tests

This patch temporarily disables some tests in preparation for shipping
RegExp Unicode property escapes in V8.

The tests can be re-enabled once that feature’s shipped and the error
messages in the test have been updated.

BUG= v8:4743 

Change-Id: I60e18aae01018b22842e6b24812a9f9bc5afb97b
Reviewed-on: https://chromium-review.googlesource.com/765328
Reviewed-by: Peter Beverloo <peter@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#515844}
[modify] https://crrev.com/9fcebbe1b25bd4ac2a51aef552e43f15f7850817/third_party/WebKit/LayoutTests/TestExpectations

Project Member Comment 72 by bugdroid1@chromium.org, Nov 11
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/2feb99dc8ac75f20d2e5c9c1b343e923476851ea

commit 2feb99dc8ac75f20d2e5c9c1b343e923476851ea
Author: Mathias Bynens <mathias@chromium.org>
Date: Sat Nov 11 17:18:45 2017

Reland "Ship RegExp Unicode property escapes" 🔥

This is a reland of 6f890fd5b212eb441a4b87364922d5286fdc505e
Original change's description:
> Ship RegExp Unicode property escapes
>
> Intent to ship:
> https://groups.google.com/d/msg/v8-users/isa2JrstEbE/i6UvFO7RBwAJ
>
> BUG= v8:4743 
>
> Change-Id: I62cdc6fa1f114fc8be26c5cf354c45a15ce49ab7
> Reviewed-on: https://chromium-review.googlesource.com/761776
> Reviewed-by: Adam Klein <adamk@chromium.org>
> Commit-Queue: Mathias Bynens <mathias@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#49312}

Bug:  v8:4743 
Change-Id: I6618242b147472a9a083573299ed3b3dda5ac6ed
Reviewed-on: https://chromium-review.googlesource.com/764908
Reviewed-by: Toon Verwaest <verwaest@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49314}
[modify] https://crrev.com/2feb99dc8ac75f20d2e5c9c1b343e923476851ea/src/flag-definitions.h

Project Member Comment 73 by bugdroid1@chromium.org, Nov 14
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c7f24fba3104b579b1fadde313ece93f6dc334b9

commit c7f24fba3104b579b1fadde313ece93f6dc334b9
Author: Mathias Bynens <mathias@chromium.org>
Date: Tue Nov 14 11:24:38 2017

Re-enable <input pattern> tests

9fcebbe1b25bd4ac2a51aef552e43f15f7850817 temporarily disabled these
tests in preparation for shipping RegExp Unicode property escapes in
V8.

Now that Unicode property escapes have shipped, the test expectations
can be updated accordingly, and the tests can be re-enabled.

BUG= v8:4743 

Change-Id: I6fc3add3692dd1da7da59b83704b98db415fcd8e
Reviewed-on: https://chromium-review.googlesource.com/765368
Reviewed-by: Peter Beverloo <peter@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#516266}
[modify] https://crrev.com/c7f24fba3104b579b1fadde313ece93f6dc334b9/third_party/WebKit/LayoutTests/TestExpectations
[modify] https://crrev.com/c7f24fba3104b579b1fadde313ece93f6dc334b9/third_party/WebKit/LayoutTests/fast/forms/ValidityState-patternMismatch-expected.txt

Project Member Comment 74 by bugdroid1@chromium.org, Today (7 hours ago)
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/0da56e74cf8ab9317b4e6e0a0abe0593de5bc57c

commit 0da56e74cf8ab9317b4e6e0a0abe0593de5bc57c
Author: jgruber <jgruber@chromium.org>
Date: Fri Dec 15 14:17:34 2017

[regexp] Restrict unicode property value expressions

The unicode property escape syntax restricts unicode property names and
unicode property values to consist only of characters taken from the
[a-zA-Z0-9_] character class. See the spec at:

https://tc39.github.io/proposal-regexp-unicode-property-escapes/

In most cases, we do not actually need to validate that this is the
case, since subsequent property lookup in ICU will fail (and throw a
SyntaxError) if the given property does not exist.

However, there one special case. The ICU lookup takes the property name
as a null-terminated string, so it will accept carefully malformed
property names (e.g. '\p{Number\0[}'). This can end up confusing the
regexp parser.

With this CL, we explicitly restrict potential property names / values
to the character set as specified.

Bug:  v8:4743 , chromium:793793
Change-Id: Ic97deea8602571ec6793b79c4bb858e1c7597405
Reviewed-on: https://chromium-review.googlesource.com/824272
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Reviewed-by: Sathya Gunasekaran <gsathya@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#50130}
[modify] https://crrev.com/0da56e74cf8ab9317b4e6e0a0abe0593de5bc57c/src/regexp/regexp-parser.cc
[add] https://crrev.com/0da56e74cf8ab9317b4e6e0a0abe0593de5bc57c/test/mjsunit/regress/regress-793793.js

Sign in to add a comment